Data Science - Part 2, Steve Miller, Information Management Blogs, May 3, 2011
I was a bit taken aback on the first day of O'Reilly's Strata Conference on data science in early February. The crowd was decidedly younger than those I generally encounter at business intelligence conferences. And while the topics of conversation - big data, data integration, statistics and visualization - were similar, the products of focus were very different.
Instead of Oracle and Netezza, data storage buzz was about MapReduce/Hadoop and Cassandra. Instead of Informatica and Kettle, data integration discussion was on the languages Python and Ruby. Instead of OLAP and dashboards, analytics attention was on predictive models and machine learning. And instead of mature project organizations with well-specified roles, it seemed the data science teams were small, with a few jack-of-all-trades individuals handling most of the work.
I wasn't sure at first what to make of my observations. Trained as a statistician, I liked the practical focus of data science on both statistics and data integration. But having spent the last 25 years in decision support and business intelligence, I got the sense that DS was unappreciative of the history of intelligence in business.
It almost seemed that data science suffered from "not invented here" syndrome. So I determined then and there to investigate the differences between BI and DS further. Fortunately, I was able to get a lot of help from the seminal article "What is Data Science?" by prolific O'Reilly Unix author and industry expert Mike Loukides.