Monday, August 07, 2006

Update on an interesting data set...

The Consumerist wrote about this (the AOL release of query strings and results)

... "Combine these ego searches with porn queries and you have a serious embarrassment"...

... They might come across the entry for User 17556639 ...

interesting.

Actually, the Consumerist wrote a lot about this today. Scary now.

And like I thought - it is hard to put the cat back into the bag. Mirror sites are up now. That is a genie that is not going back into the bottle.

25 Comments:

Anonymous Anonymous said....

I wonder how many of the "worst" searches are from law enforcement...

Mon Aug 07, 01:24:00 PM EDT  

Anonymous Andy C said....

I agree that the release of this data raises lots of interesting issues, but is it really that surprising that there are a load of perverts and sickos out there on the Internet?

This guy isn't about to murder his wife. What's more interesting is his fleeting desire for 'Steak and Cheese' intermingled with his perverted fascination for necrophilic images.

Mon Aug 07, 02:24:00 PM EDT  

Blogger David Aldridge said....

I started trying to imagine if "steak and cheese" could be a code phrase for some kind of obscene and unhealthy practice -- I stopped myself after about two seconds of that.

Mon Aug 07, 02:55:00 PM EDT  

Anonymous Anonymous said....

Andy, how do you know he is not planning on killing his wife (hmmm - do you use AOL)?

I think the government will end up with the right to "monitor" what is being searched for anonymously. They will use that information to garner search warrants to see who user ID XXX is (or IP in the case of google, ask.com, etc...)

Is that a bad thing? The employees at google already have this information and more than likely sell it in parts to corporations now anyway.

Mon Aug 07, 03:01:00 PM EDT  

Blogger David Aldridge said....

>> How do you know he is not planning on killing his wife (hmmm - do you use AOL)?

Maybe it was his wife that signed them up for AOL service -- that could get the charge reduced to "manslaughter" at least.

I'm also thinking that this data set could be the basis for some interesting exercises in SQL ... for example ...

"Compose a query to identify the number of porn addicts with an interest in yoga and fly fishing who have shopped for a new vehicle between the hours of 2am and 5am on at least three sundays in any six month period, and the number of times they've had difficulty finding the Nissan Auto website."

Mon Aug 07, 04:19:00 PM EDT  

Anonymous Andy C said....

Look in the data. 15 lines down. A search from the same IP. 'How to salvage burnt steak from psychopathic, murderous husband ?'

Mon Aug 07, 04:23:00 PM EDT  

Blogger Niall said....

must really be hating that AOL doesn't index him.

Mon Aug 07, 04:30:00 PM EDT  

Anonymous Anonymous said....

It hit news.com:
http://news.com.com/2100-1030_3-6102793.html?tag=nefd.top

I'd be willing to bet that it was a network admin seeing a spike in traffic (1000 users downloading the file) that caused an investigation into what was being downloaded.

Pd

Mon Aug 07, 04:45:00 PM EDT  

Anonymous Anonymous said....

it hit /. as of 4:41 EDT.

The files are still apparently available from:

http://thepiratebay.org/details.php?id=3510027

Pd

Mon Aug 07, 04:49:00 PM EDT  

Anonymous Cameron said....

David,
I'm also thinking that this data set could be the basis for some interesting exercises in SQL ... for example ...

"Compose a query to identify the number of porn addicts with an interest in yoga and fly fishing who have shopped for a new vehicle between the hours of 2am and 5am on at least three sundays in any six month period, and the number of times they've had difficulty finding the Nissan Auto website."


I imagine this is exactly what Google Sets does behind the scenes.

Mon Aug 07, 04:56:00 PM EDT  

Blogger Doug Burns said....

Anon said

I think the government will end up with the right to "monitor" what is being searched for anonymously. They will use that information to garner search warrants to see who user ID XXX is (or IP in the case of google, ask.com, etc...)

Is that a bad thing?


At the risk of sounding like an old hippy, I think that's a *terrible* thing, actually. It's so easy to say 'but if someone is a criminal, what's the harm?'.

I dread the authorities investigating me because of what I might be interested in (however perverse), be thinking about, or maybe even have been affected by as a victim.

Can I recommend Orwell's 1984?

Mon Aug 07, 05:28:00 PM EDT  

Blogger Doug Burns said....

Niall said...

must really be hating that AOL doesn't index him.

I *know*. I was sooo close to my 15 minutes of infamy!

Mon Aug 07, 05:30:00 PM EDT  

Anonymous Anonymous said....

Anon here again,

In reply to Doug about "sounding like a hippy" - these companies are gathering all sorts of information on people that use their services. They can use it as they please and sell it to whomever they please (most of the terms of agreement I have seen allow them to change that agreement whenever they want).

Why did AOL fight off the government? To keep their members' information private or to keep the ability to sell that information (to the government or highest bidder)?

Mon Aug 07, 11:11:00 PM EDT  

Blogger Doug Burns said....

anon,

I'm not so naive as to think that companies don't want to use the information however they like, nor do I doubt they're legally entitled to do so *at the moment*.

That doesn't change the fact that I think it's a bad thing and using *possible* criminality to justify additional use is an extremely slippery slope.

Companies and governments would like to get away with all sorts of things that we don't let them get away with, because we campaign for legislation to limit them. Just saying 'well, they can already do so, so why not?' seems to be defeatist, pessimistic and a little cynical. (Then again, who can blame people for being a little cynical!)

I get your drift, though, so I won't be turning these comments into a long on-going argument ;-)

Tue Aug 08, 02:35:00 AM EDT  

Blogger Alberto Dell'Era said....

I'd be curious to know about the most recurrent search strings on asktom now (without of course having the data set published ;)

Tue Aug 08, 09:59:00 AM EDT  

Anonymous Anonymous said....


Alberto Dell'Era said...
I'd be curious to know about the most recurrent search strings on asktom now


"what is a bind variable" - Is that a good guess?

Tue Aug 08, 02:25:00 PM EDT  

Blogger jsuarezcasana said....

http://thepiratebay.org/details.php?id=3510027

doesn't work try:

http://thepiratebay.org/details.php?id=3510426
http://thepiratebay.org/details.php?id=3510326

cheers

Tue Aug 08, 05:32:00 PM EDT  

Blogger Joel Garry said....

Links to some more commentary on comp.risks.

word: xlbpfht sounds like Bill the Cat is now working on X11.

Tue Aug 08, 08:11:00 PM EDT  

Anonymous Anonymous said....

Does anyone think it is hypocritical to call this an outrage and then download the data?

Tue Aug 08, 10:55:00 PM EDT  

Blogger Thomas Kyte said....

where did you see "outrage"?

I personally expressed my thought that this could be considered a breach of privacy - after downloading it and seeing what was actually in there.

But I don't see any outrage and then download?

Wed Aug 09, 12:28:00 AM EDT  

Anonymous Anonymous said....

I wasn't referring to you - sorry if it came across that way, I meant in general. There is a conversation about this on slashdot.org at http://yro.slashdot.org/article.pl?sid=06/08/07/2022244
I myself find it to be a breach of privacy but I downloaded it to check it out as well. Am I a hypocrite?

Wed Aug 09, 11:18:00 AM EDT  

Blogger Thomas Kyte said....

I think you have to either

a) download it
b) read the extracts others have provided

to really become 'outraged' (they have 'outed' at least one person so far - contacted her and confirmed those were her searches).

Before you see the data, it almost sounds harmless. I thought it was probably a really bad idea in the first place (given the recent fuss google put up over handing this stuff over to the government...)

and it has been proven out to be so.

But, I don't think it is hypocritical to be "outraged" (eg: upset, not happy, displeased...) and download it.

Wed Aug 09, 11:22:00 AM EDT  

Anonymous Anonymous said....

The first user identified from AOL data

http://www.nytimes.com/2006/08/09/technology/09aol.html?_r=1&oref=slogin

Wonder how many more would roll out soon... :-)

Wed Aug 09, 12:03:00 PM EDT  

Blogger Noons said....

well, it's not just a matter of releasing the data or not. Is it even kosher to *keep* it in the first place?

Part of our work is redirecting search engine traffic after capturing ad and search term data. We get quite a few GB of logs per day. The logs are processed for specific information and aggregated before they even reach our databases. Then the aggregations are used for ROI, SEO, etc.

But for "legal reasons", we must keep the original logs for a number of years. Got heaps of TBs of those, all compressed and safely stashed away. Only had to pull out one log once from 6 months before, to resolve an issue with a client who claimed we hadn't got their traffic.

But like I did, so can others. And that worries me. It is one thing to confirm data for a client; it is another thing entirely to data-mine the logs.

And that is exactly what is going to be done with the AOL logs...

Wed Aug 09, 12:43:00 PM EDT  

Anonymous Gus Spier said....

Here we are, two weeks after the fact, and the axe is finally falling. This morning's local Northern Virginia news programs announced that three AOL employees are leaving the company over this fiasco.

I guess the worst part of the business end of our work is knowing that whatever you do, "The road to Hell is paved with good intentions."; "No good deed goes unpunished."; and blame will be apportioned.

Tue Aug 22, 09:33:00 AM EDT  
