The AOL data mess

August 7th, 2006

Not surprisingly this is the kind of topic that spreads like wildfire across blogland.
AOL search data snippet

AOL Research released (link to Google cache page) the search queries of hundreds of thousands of its users over a three-month period. While user IDs are not included in the data set, all the search terms have been left untouched. Needless to say, lots of searches could include all sorts of private information that could identify a user.

The problems in the realm of privacy are obvious and have been discussed by many others so I won’t bother with that part. (See the blog posts linked above.) By not focusing on that aspect I do not mean to diminish its importance. I think it’s very grave. But many others are talking about it so I’ll focus on another aspect of this fiasco.

As someone who has research interests in this area and has been trying to get search companies to release some data for purely academic purposes, I find an incident like this extremely unfortunate. Not that search companies have been particularly cooperative so far (not surprisingly, given this case), but chances for future cooperation in this realm have just taken a nosedive.

To some extent I understand. No company wants to end up with this kind of a mess on their hands. And it would take way too much work on their part to remove all identifying information from a data set of this sort. I still wonder if there are possible work-arounds though, such as allowing access on the premises or some such solution. But again, that’s a lot of trouble, and why would they want to bother? Researchers like me would like to think we can bring something new to the table, but that may not be worth the risk.

Note, however, that dealing with sensitive data is nothing new in academic research. People are given access to very detailed Census data, for example, and confidentiality is preserved. From what I can tell, the problem here did not stem from researchers; it was someone at AOL who was careless with the information. But the outcome will likely be less access to data for all sorts of researchers.

Another question of interest: Now that these data have been made public what are the chances for approval from a university’s institutional review board for work on this data set? (Alex raises related questions as well.) Would an approval be granted? These users did not consent to their data being used for such purposes. But the data have been made public and theoretically do not contain any identifying information. Even if they do, the researcher could promise that results would only be reported in the aggregate leaving out any potentially identifying information. Hmm…

For sure, this will be a great example in class when I teach about the privacy implications of online behavior.

Not surprisingly, people are already crunching the data set; here are some tidbits from it.

A propos the little snippet I grabbed from the data (see image above), see this paper of mine for an exploration of spelling mistakes made while using search engines and browsing the Web. About a third of that sample was AOL users.

The image above is from data in the xxx-01.txt file.

Scrollable ads

August 7th, 2006

GMail does something very smart with the Sponsored Links it displays in the Webclips area just above the message view area: it lets the user scroll back and forth among the ads.

Maybe I’m an odd one for actually looking at ads on occasion, but sometimes they do tell you about helpful or interesting information and services. So I like to click on them sometimes. However, more often than not, I just glance at them out of the corner of my eye as I am about to move to another page. What then happens is that the ad changes. In GMail, I can just click on the back button in Webclips and get the ad back (or whatever RSS feed I may have missed).

GMail Webclip

On most sites this is not possible (e.g. Yahoo! Mail). If you click the back button of your browser, chances are that some other ad is dynamically generated on the page you were just viewing by the time you return to it. It’s a bummer as some of those ads could be of interest to users a split second later.

Lowering the least bloggable unit

August 7th, 2006

I think I’ve been putting too high a threshold on the least bloggable unit* around here recently (although some may disagree). That is, I have all sorts of thoughts on IT and other matters that I could blog about, but I don’t bother, because I don’t have that much to say. There are also time constraints. More serious thoughts and posts require more time and needless to say time is limited around here.

So this is just to say that I may start posting more often, but in smaller chunks.

* Interestingly, it turns out that the phrase “least bloggable unit” has been used once in blog world so far: on Crooked Timber of all places in a comment by Sean Carroll.

Links for 2006-08-07

August 7th, 2006

Links for 2006-08-05

August 5th, 2006

Links for 2006-08-04

August 4th, 2006

Without pain on a plane

August 3rd, 2006

I am back from my trip to Argentina mentioned earlier and am happy to say that the long flight didn’t mess things up too much. I suspect the lack of time-zone change from Chicago to Buenos Aires helped quite a bit, but I would like to think my master preparedness was useful, too.

I did end up taking an hour-long nap after I got to Buenos Aires, but then was well-equipped to spend a good chunk of Saturday exploring the city. And what a fabulous city it is! It was my first time in Argentina, but after this visit I am convinced it was not the last. (The first batch of photos is available on Flickr now. More coming soon.)

As a side note on how some people try to make a long-distance relationship work, consider the story of the person sitting next to me on the flight there. He works in DC, but has a wife and young child in Argentina. Twice a month he gets on a plane Friday evening for the ten-hour flight to Buenos Aires to spend less than 48 hours with his family, returning Sunday night so he can be back at work on Monday morning. Ouch.

Here is a list of ways to minimize fatigue generated by long flights, many drawn from responses to this post. I ran out to buy noise-canceling headphones after so many people recommended them. Great idea: I am convinced they made a huge difference!

  • noise-canceling headphones (and/or earplugs)
  • water
  • eye mask
  • nasal spray (to counter dry air)
  • a bit of reading/game
  • easily accessible pen (so you can fill out immigration/customs paperwork whenever you want)
  • some type of sleeping pill (either over-the-counter or prescription)
  • at most a small item underneath the seat in front of you
  • an extra sweater/coat and the blanket they give you
  • resisting the need to eat everything you are served
  • small snacks (both sweet and not) so you can eat when you want
  • occasional stretching
  • in case of annoyances, a bit of meditation to block out the environment
  • resisting the urge to watch several movies
  • aisle seat if you want freedom to move (but only if you don’t mind the chance of being bumped by the flight attendants and passersby), window seat if you want to use the side of the plane as a headrest (but only if you don’t mind the cold and having less access to movement)
  • adjusting headrest to avoid leaning/falling on neighbor
  • getting legs up (perhaps on small piece of luggage) for improved circulation
  • a good night’s sleep the night before

On the way back I got upgraded to business class so other than a bit of fatigue, the adjustment took even less out of me.

Links for 2006-07-29

July 29th, 2006

Links for 2006-07-28

July 28th, 2006

Long flight, little time-zone change

July 27th, 2006

I’m preparing for a short trip to Buenos Aires and am seeking advice on how to approach the trip for the least amount of fatigue. CT folks seem to have a wealth of experience in the travel domain so I thought I’d ask if anyone had ideas for me. I am only going for a few days so when I get there at 9am I want to be ready to start exploring town instead of spending hours in bed. But is that realistic after a ten-hour flight? I have a three-hour layover in DC, which may add to my fatigue. I’m usually not so good at sleeping on planes (except in business class) so I don’t know if I can count on that much.

I have lots of experience with cross-continental travel and long flights so that’s not the issue. (The longest trip was probably when we moved to Honolulu from Budapest for a few months.) I have been taking such flights ever since I was nine, but it has always involved significant time-zone changes. Is it the long trip, the time-zone change or a combination of the two that causes one to be completely useless after a trip from the U.S. to Europe? I’m hoping most of it has to do with the time change so I can avoid it this time around.

For entertainment, I am bringing the manual of my new digital camera and a small English-Spanish dictionary and phrasebook, both of which I was happy to find in my favorite dictionary brand today at the local store. (I wouldn’t bother with a dictionary for a few days, but I figured it was worth getting one given my move to California in a month. I hadn’t planned to get a phrasebook, but I am a sucker for those little Langenscheidt books.)

Links for 2006-07-27

July 27th, 2006

Links for 2006-07-26

July 26th, 2006

Links for 2006-07-25

July 25th, 2006

Links for 2006-07-24

July 24th, 2006

But what if you meet a man?

July 23rd, 2006

Interesting anecdote in the comments to this post over at Science + Professor + Woman = Me. This is a conversation between the commenter and her chair, a man, about getting the signature for two graduate students to join her lab.

    Chair: I’m not sure that I can sign off on your being the advisor for these students.

    Me [Pam]: Excuse me? (Background: two new federally-funded three-yr grants, each with a doctoral stipend available for a student)

    Chair: Well, how do I know you are not going to meet a man and run off and be with him?

    (I kid you not, he said that).

    Me: You don’t. But how do I know that you aren’t going to meet a man and run off with him, and abandon the department?

    (He didn’t think it was funny – but he signed the forms.)

Same-sex waltz

July 22nd, 2006

This week, Chicago has been hosting Gay Games VII. It’s been fun to have all the various high-quality sports competitions in town. Of course, as a spectator, there is not much difference when you watch the competitions at these events vs. others, since most sports tend to be divided by gender. However, couples sports (like figure skating or dancing) may look a bit different. But actually, only if you focus in on the gender aspect.

It should not be much surprise to anyone who’s been paying attention that I opted to go see the Dance Sports event. I only made it to the A-level competition of the men’s Latin dances and the women’s 10-dance, but this was just as well since this is the highest level under international rules. It was superb.

Anecdotally, my impression has been that most people in Chicagoland have either been excited about the Gay Games in town or haven’t paid much attention. But of course there is the occasional hostile approach. You really do have to wonder why people can’t just let others be as you’re standing there in the ballroom with all the energy and enthusiasm from both the crowd and the participants. Better yet, imagine if people realized that they could even get something out of these events themselves, like enjoying the hard work of some very talented people.

The surprise of the event for me was to find out that the World Champion couple for men’s Latin hails from Hungary. In the Gay Games this week they placed third. I found out from them that Budapest will be hosting this year’s Same Sex Dance Competition. This made me wonder how the competition (and related associations and studios) got that particular name. Is use of the term “gay” exclusionary? Is it less politically charged to say “same sex”? Is the idea that not everyone who participates is gay? Anyone know the history of this? Apologies if I’m missing something obvious.

Links for 2006-07-22

July 22nd, 2006

Aussie, Aussie, Aussie!

July 21st, 2006

…, …, …!

I’m going to Australia in about two months. I’ve been interested in visiting ever since I read Jill Ker Conway’s Road from Coorain, which was almost 15 years ago.

The reason I’m particularly excited about all this today is that I just received my tourist visa. Via email. Cool. Yes, talk about a good use of IT by government services. I had submitted my application just four days ago. (Anyone want to tear into this regarding security concerns?)

I got very anxious earlier this week when I realized I needed a visa to go to Australia. I feel like I’ve done my fair share of standing in lines for visas at 5am. Luckily, after a bit of browsing I realized that citizens of certain countries could apply for visitor visas online.

I HATE getting tourist visas. I don’t like the process involved in getting student/work visas either, but tourist visas bother me more. I don’t see why Australia needs to know so much about my various medical conditions just to allow me to visit for a week. In any case, being able to fill out the form in my living room without having to run around for x copies of y dimension passport photos made a big difference.

My most frustrating visa experience to date was at the Canadian embassy in NYC a few years ago. It was unbelievable how they treated people. They also sent people home, one after another – after the requisite five hours of standing in the freezing cold, of course – for paperwork that they never stated was required. I decided not to return to Canada until I could go without having to obtain a visa.

Links for 2006-07-21

July 21st, 2006

Links for 2006-07-20

July 20th, 2006