Daniel Tunkelang reports on three days at the O'Reilly Strata Conference, an assembly of two thousand people focused on data science and its applications.
Daniel Tunkelang, March 2, 2012
I spent the last three days at the O'Reilly Strata Conference, an assembly of two thousand people focused on data science and its applications. Vendor-fests from my past life in the enterprise software world have left me wary of industry conferences, but Strata is an exceptionally good one.
The speakers were a who's who of data science, including Lucene and Hadoop creator Doug Cutting, search user interface pioneer Marti Hearst, and Google chief economist
I spent Tuesday in the Deep Data session, billed as a no-holds-barred program for data scientists. My two favorite talks:
- Claudia Perlich, winner of three KDD Cups, talked about using information to pick the right action and to influence people such that they behave in a way that is better for them, better for us, and possibly better for society in general.
- Monica Rogati, my colleague at LinkedIn and the epitome of a data scientist, delivered a fantastic talk about machine learning models and training data in the real world, extending Peter Norvig's point about the unreasonable effectiveness of data to observe that more data beats clever algorithms but better data beats more data.
But the most fun that day was the Oxford-style debate featuring Drew Conway, Pete Skomoroch, Mike Driscoll, DJ Patil, Amy Heineike, Pete Warden, and Toby Segaran.
The question proposed was absurdly
if you had to hire your first data scientist and could only hire one, would you pick a domain expert or a machine learning expert?
After the moderator suppressed some initial attempts to hedge ("both", "it depends", etc.), the debaters ripped into the question by taking extreme positions and defending them with gusto. It was a lot of fun, with enthusiastic audience participation and the debaters exploiting their inside knowledge of their opponents' work histories. In the end, the machine learning side won by a small margin.
... the person who stole the show was Google's Avinash Kaushik, who talked about making love with data to find orgasm-inducing actions to change the world and make more money. Unfortunately, this was the one talk that was not recorded, but you can read the summary on Avinash's Google+ page.
... featured O'Reilly's Alex Howard moderating Intelius Chief Privacy Officer Jim Adler and NYU PhD student Solon Barocas on a panel provocatively titled "If Data Wants to Be Free, is Privacy a Prison?".
It was a great discussion, and I enjoyed the opportunity to offer my own provocative question through Twitter. Since the panelists were arguing that it was unethical to infer private facts from public data, I asked if they were trying to establish a new form of