Data Science Tools – Are Proprietary Vendors Still Relevant?
We examine and quantify the dramatic impact of open source tools like R and Python on SAS, IBM, Microsoft, and other proprietary Data Science vendors. We also investigate how open source tools are faring against each other, which are growing and which are falling, and look at the R versus Python debate.
By Daniel Chalef, Domino Data Lab.
With the recent publication of Gartner’s Magic Quadrant for Advanced Analytics, we wanted to know how proprietary data science software vendors were faring against open source challengers. We discovered compelling evidence that open source tools have had a dramatic impact on SAS, IBM, Microsoft and others.
We also investigated how open source tools were faring against each other. Which tools have seen the most growth, and which are falling behind? And what of the R versus Python debate?
We used two sources of trend data in this study: Google Trends search data, and tag usage on the StackOverflow Q&A site.
All’s not well in Magic Quadrant land
Within larger companies, the use of SAS, IBM SPSS, and other products is pervasive. Despite this, interest in these products and product suites is waning. This is perhaps unsurprising: Gartner’s customer interviews revealed low satisfaction, implementation challenges, high prices, and concerns about lack of pricing transparency with these tools.
From 2008 to 2015, search volume for SAS fell 26%, and search volume for Microsoft Analysis Services fell a dramatic 46%. Over the same period, IBM SPSS and Cognos lost 29% and 37%, respectively.
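For readers who want to reproduce this kind of figure, here is a minimal sketch of the percentage-change arithmetic. The article does not say how the 2008 and 2015 endpoints were aggregated, so the function name `percent_change` and the index values below are hypothetical, chosen only to illustrate a 26% decline on a Google Trends style relative index.

```python
def percent_change(start: float, end: float) -> float:
    """Return the percentage change from start to end (negative means a decline)."""
    if start == 0:
        raise ValueError("start must be non-zero to compute a percentage change")
    return (end - start) / start * 100.0


# Hypothetical relative search-interest values, not data from the study.
index_2008 = 100.0
index_2015 = 74.0

change = percent_change(index_2008, index_2015)
print(f"Change in search interest: {change:.0f}%")
```

Because Google Trends reports relative interest rather than absolute query counts, a calculation like this describes a change in share of search interest, not in the raw number of searches.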
Enterprise analytics vendors such as Microsoft and IBM have responded to the changing market landscape with SaaS or cloud-based solutions. We’ve included these in our study: Microsoft with Cortana Analytics and Azure Machine Learning, and IBM with Watson Analytics.
Of these technologies, only Watson Analytics has seen enough search traffic for Google Trends to report on it.
Open source: knocking it out of the park
Open source analytics tools have seen significant growth in interest over the last five years. Many of them show search volumes and growth rates far exceeding those of proprietary vendors. This tells a compelling story about the future of the data science tools market.
For a sense of scale, we have included Apache Hadoop, a popular distributed data processing technology, and SAS in the Google Trends traffic chart. As expected, Hadoop, R and Python saw the highest search volume.