Radar O'Reilly, by Alistair Croll, @acroll, August 2, 2012
Data doesn't invade people's lives. Lack of control over how it's used does.
What's really driving so-called big data isn't the volume of information. It turns out big data doesn't have to be all that big. Rather, it's about a reconsideration of the fundamental economics of analyzing data.
... The advent of clouds, platforms like Hadoop, and the inexorable march of Moore's Law mean that analyzing data is now trivially inexpensive. And when things become so cheap that they're practically free, big changes happen - just look at the advent of steam power, or the copying of digital music, or the rise of home printing. Abundance replaces scarcity, and we invent new business models.
... With the new, data-is-abundant model, we collect first and ask questions later. The schema comes after the collection. Indeed, big data success stories like Splunk, Palantir, and others are prized because of their ability to make sense of content well after it's been collected - sometimes called a schema-less query. This means we collect information long before we decide what it's for.
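To make "the schema comes after the collection" concrete, here is a minimal sketch in Python. It is not how Splunk or Palantir work internally; the event log, the field names, and the `query` helper are all made up for illustration. The idea is only that raw lines are stored untouched and a structure is imposed when someone finally asks a question.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced).
RAW_LOG = [
    '{"user": "u1", "action": "play", "track": "song-a"}',
    '{"user": "u2", "action": "search", "terms": "cheap flights"}',
    '{"user": "u1", "action": "play", "track": "song-b", "device": "phone"}',
    'not even json',
]


def query(raw_lines, fields, where=None):
    """Apply a schema at read time.

    Parses each stored line, skips anything that is not a JSON object or
    lacks one of the requested fields, and yields the requested fields
    as a tuple.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines stay in storage, just not in this answer
        if not isinstance(record, dict):
            continue
        if not all(field in record for field in fields):
            continue
        if where is not None and not where(record):
            continue
        yield tuple(record[field] for field in fields)


# A question invented long after the data was collected.
plays = list(query(RAW_LOG, ["user", "track"],
                   where=lambda r: r.get("action") == "play"))
print(plays)
```

Nothing in the stored lines says what they are for. The same log answers a question about listening habits today and a completely different question tomorrow, and that choice is made at query time.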
And this is a dangerous thing. ...
We're great at using taste to predict things about people. OkCupid's 2010 blog post "The Real Stuff White People Like" showed just how easily we can use information to guess at race. It's a real eye-opener (and the guys who wrote it didn't include everything they learned - some of it was a bit too controversial). They simply looked for the words that one group used often and other groups rarely used. The result was a list of "trigger" words for a particular race or gender.
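The word-comparison idea is simple enough to sketch in a few lines of Python. The post's exact scoring method is not described here, so the smoothed frequency ratio below is an assumption, and the group labels, the toy profiles, and the `distinctive_words` function are hypothetical.

```python
from collections import Counter


def distinctive_words(texts_by_group, target, top_n=3):
    """Rank words used by `target` by how much more often that group uses
    them than everyone else does.

    Assumption: score = smoothed relative frequency in the target group
    divided by smoothed relative frequency in all other groups combined.
    """
    counts = {
        group: Counter(word for text in texts for word in text.lower().split())
        for group, texts in texts_by_group.items()
    }
    target_counts = counts[target]
    other_counts = Counter()
    for group, group_counts in counts.items():
        if group != target:
            other_counts.update(group_counts)

    target_total = sum(target_counts.values())
    other_total = sum(other_counts.values())
    vocab_size = len(set(target_counts) | set(other_counts))

    def score(word):
        # Add-one smoothing so words absent from one side do not divide by zero.
        p_target = (target_counts[word] + 1) / (target_total + vocab_size)
        p_other = (other_counts[word] + 1) / (other_total + vocab_size)
        return p_target / p_other

    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(target_counts, key=lambda w: (-score(w), w))[:top_n]


# Toy profile text for two made-up groups.
profiles = {
    "group_a": ["hiking camping coffee", "camping coffee books"],
    "group_b": ["soccer coffee movies", "soccer movies books"],
}

print(distinctive_words(profiles, "group_a", top_n=2))
print(distinctive_words(profiles, "group_b", top_n=2))
```

Words both groups use, like "coffee" and "books" in this toy data, score low and drop out; what remains is the vocabulary that separates one group from the rest.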
Now run this backwards. If I know you like these things, or see you mention them in blog posts, on Facebook, or in tweets, then there's a good chance I know your gender and your race, and maybe even your religion and your sexual orientation. That also means I can personalize my marketing efforts towards you.
That makes it a civil rights issue.
If I collect information on the music you listen to, you might assume I will use that data to suggest new songs, or share it with your friends. But instead, I could use it to guess at your racial background. And then I could use that guess to deny you a loan.