The Noisy Channel, Daniel Tunkelang, January 4th, 2011
The increasing volume of data that we generate as a species is a story so overplayed as to have become trite. Indeed, a vast amount of this data is in the public domain, including the full text and common n-grams of books, genome research, the United States census, and much more. There is also open-source software not only to crawl the web, but also to search the data you crawl. So, if you're an aspiring data scientist and just want to get your hands on data, there's no excuse: go out and get it!
But perhaps you'd like to make a career out of your jones for big data. Luckily for you, some of the hottest companies around are hiring data scientists!
Of course, those jobs aren't for everyone. To get an idea of the requisite math and computer science skills, I suggest you read the answers on Quora to "How do I become a data scientist?" I'm also a fan of Hilary Mason's definition, cited in Ryan Kim's "Wanted: Data Scientists to Turn Information Into Gold": a data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning. You can see Hilary's full explanation in a blog post she co-authored with Chris Wiggins, entitled "A Taxonomy of Data Science".