Update on an interesting data set...
The Consumerist wrote about this (the AOL release of query strings and results):
... "Combine these ego searches with porn queries and you have a serious embarrassment"...
... They might come across the entry for User 17556639 ...
interesting.
Actually, The Consumerist wrote a lot about this today. Scary now.
And like I thought - it is hard to put the cat back into the bag. Mirror sites are up now. That is a genie that is not going back into the bottle.


25 Comments:
I wonder how many of the "worst" searches are from law enforcement...
I agree that the release of this data raises lots of interesting issues, but is it really that surprising that there are a load of perverts and sickos out there on the Internet?
This guy isn't about to murder his wife. What's more interesting is his fleeting desire for 'Steak and Cheese' intermingled with his perverted fascination for necrophilic images.
I started trying to imagine if "steak and cheese" could be a code phrase for some kind of obscene and unhealthy practice -- I stopped myself after about two seconds of that.
Andy, how do you know he is not planning on killing his wife (hmmm - do you use AOL)?
I think the government will end up with the right to "monitor" what is being searched for anonymously. They will use that information to garner search warrants to see who user ID XXX is (or IP in the case of Google, Ask.com, etc.).
Is that a bad thing? The employees at Google already have this information and more than likely sell it in parts to corporations now anyway.
>> How do you know he is not planning on killing his wife (hmmm - do you use AOL)?
Maybe it was his wife that signed them up for AOL service -- that could get the charge reduced to "manslaughter" at least.
I'm also thinking that this data set could be the basis for some interesting exercises in SQL ... for example ...
"Compose a query to identify the number of porn addicts with an interest in yoga and fly fishing who have shopped for a new vehicle between the hours of 2am and 5am on at least three sundays in any six month period, and the number of times they've had difficulty finding the Nissan Auto website."
Look in the data. 15 lines down. A search from the same IP. 'How to salvage burnt steak from psychopathic, murderous husband?'
must really be hating that AOL doesn't index him.
It hit news.com:
http://news.com.com/2100-1030_3-6102793.html?tag=nefd.top
I'd be willing to bet that it was a network admin seeing a spike in traffic (1000 users downloading the file) that caused an investigation into what was being downloaded.
Pd
it hit /. as of 4:41 EDT.
The files are still apparently available from:
http://thepiratebay.org/details.php?id=3510027
Pd
David,
I'm also thinking that this data set could be the basis for some interesting exercises in SQL ... for example ...
"Compose a query to identify the number of porn addicts with an interest in yoga and fly fishing who have shopped for a new vehicle between the hours of 2am and 5am on at least three sundays in any six month period, and the number of times they've had difficulty finding the Nissan Auto website."
I imagine this is exactly what Google Sets does behind the scenes.
Anon said
I think the government will end up with the right to "monitor" what is being searched for anonymously. They will use that information to garner search warrants to see who user ID XXX is (or IP in the case of Google, Ask.com, etc.).
Is that a bad thing?
At the risk of sounding like an old hippy, I think that's a *terrible* thing, actually. It's so easy to say 'but if someone is a criminal, what's the harm?'.
I dread the authorities investigating me because of what I might be interested in (however perverse), be thinking about, or maybe even have been affected by as a victim.
Can I recommend Orwell's 1984?
Niall said...
must really be hating that AOL doesn't index him.
I *know*. I was sooo close to my 15 minutes of infamy!
Anon here again,
In reply to Doug about "sounding like a hippy" - these companies are gathering all sorts of information on people who use their services. They can use it as they please and sell it to whomever they please (most of the terms of agreement I have seen allow them to change that agreement whenever they want).
Why did AOL fight off the government? To keep their members' information private, or to keep the ability to sell that information (to the government or the highest bidder)?
anon,
I'm not so naive as to think that companies don't want to use the information however they like, nor do I doubt they're legally entitled to do so *at the moment*.
That doesn't change the fact that I think it's a bad thing and using *possible* criminality to justify additional use is an extremely slippery slope.
Companies and governments would like to get away with all sorts of things that we don't let them get away with, because we campaign for legislation to limit them. Just saying 'well, they can already do so, so why not?' seems to be defeatist, pessimistic and a little cynical. (Then again, who can blame people for being a little cynical!)
I get your drift, though, so I won't be turning these comments into a long on-going argument ;-)
I'd be curious to know about the most recurrent search strings on asktom now (without of course having the data set published ;)
Alberto Dell'Era said...
I'd be curious to know about the most recurrent search strings on asktom now
"what is a bind variable" - Is that a good guess?
http://thepiratebay.org/details.php?id=3510027
doesn't work, try:
http://thepiratebay.org/details.php?id=3510426
http://thepiratebay.org/details.php?id=3510326
cheers
Links to some more commentary on comp.risks.
word: xlbpfht sounds like Bill the Cat is now working on X11.
Does anyone think it is hypocritical to call this an outrage and then download the data?
where did you see "outrage"?
I personally expressed my thought that this could be considered a breach of privacy - after downloading it and seeing what was actually in there.
But I don't see any outrage and then download?
I wasn't referring to you - sorry if it came across that way, I meant in general. There is a conversation about this on slashdot.org at http://yro.slashdot.org/article.pl?sid=06/08/07/2022244
I myself find it to be a breach of privacy but I downloaded it to check it out as well. Am I a hypocrite?
I think you have to either
a) download it
b) read the extracts others have provided
to really become 'outraged' (they have 'outed' at least one person so far - contacted her and confirmed those were her searches).
Before you see the data, it almost sounds harmless. I thought it was probably a really bad idea in the first place (given the recent fuss Google put up over handing this stuff over to the government...)
and it has been proven out to be so.
But, I don't think it is hypocritical to be "outraged" (eg: upset, not happy, displeased...) and download it.
The first user identified from AOL data
http://www.nytimes.com/2006/08/09/technology/09aol.html?_r=1&oref=slogin
Wonder how many more would roll out soon... :-)
well, it's not just a matter of releasing the data or not. Is it even kosher to *keep* it in the first place?
Part of our work is redirecting search engine traffic after capturing ad and search term data. We get quite a few GB of logs per day. The logs are processed for specific information and aggregated before they even reach our databases. Then the aggregations are used for ROI, SEO, etc.
But for "legal reasons", we must keep the original logs for a number of years. Got heaps of TBs of those, all compressed and safely stashed away. Only had to pull out one log once from 6 months before, to resolve an issue with a client who claimed we hadn't got their traffic.
But like I did, so can others. And that worries me. One thing is to confirm data for a client, another totally different one is to data-mine the logs.
And that is exactly what is going to be done with the AOL logs...
here we are, two weeks after the fact and the axe is finally falling. This morning's local Northern Virginia news programs announce that three AOL employees are leaving the company over this fiasco.
I guess the worst part of the business end of our work is knowing that whatever you do, "The road to Hell is paved with good intentions."; "No good deed goes unpunished."; and blame will be apportioned