Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis
Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; Almost 50% have used Deep Learning tools; SQL is steady; Consolidation continues.
Here is my initial analysis based on remaining participants, after "lone" voters were removed. More detailed association analysis and anonymized data will be published later.
Top Analytics, Data Science, Machine Learning Software
Fig 1: KDnuggets Analytics/Data Science 2019 Software Poll: top tools in 2019, and their share in the 2017, 2018 polls
Interestingly, we see the same group of top 11 tools (each with at least 20% share) in 2019 as in 2018.
Table 1: Top Analytics/Data Science/ML Software in 2019 KDnuggets Poll
Here 201N % share is % of voters who used this software in year 201N.
The average number of tools per respondent was 6.7, consistent with 7.0 in the 2018 Poll and 6.75 in the 2017 Poll.
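To make the share definition concrete: each respondent can select several tools, so the shares add up to well over 100%, and the sum of all shares equals the average number of tools per respondent. Below is a minimal sketch of that relationship. The three ballots are made up for illustration and are not poll data, and the names `tool_shares` and `average_tools` are hypothetical helpers, not part of any KDnuggets tooling.

```python
from collections import Counter


def tool_shares(ballots):
    """Return {tool: fraction of voters who selected that tool}."""
    counts = Counter(tool for ballot in ballots for tool in set(ballot))
    n_voters = len(ballots)
    return {tool: count / n_voters for tool, count in counts.items()}


def average_tools(ballots):
    """Average number of distinct tools selected per voter."""
    return sum(len(set(ballot)) for ballot in ballots) / len(ballots)


# Hypothetical ballots, for illustration only (NOT actual poll responses).
ballots = [
    {"Python", "SQL", "Keras"},
    {"Python", "R"},
    {"R", "SQL", "Excel", "Python"},
]

shares = tool_shares(ballots)
print(shares["Python"])        # 1.0: all three voters selected Python
print(average_tools(ballots))  # 3.0: (3 + 2 + 4) tools / 3 voters
# The shares sum to the average number of tools (up to float rounding).
print(round(sum(shares.values()), 6))
```

Read this way, an average of 6.7 tools means the shares of all tools in the poll add up to roughly 670%.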
Here are some observations on 3-year trends for top tools.
Python stayed at the top, with almost the same share (65.8% vs 65.6%) of respondents as in 2018.
RapidMiner kept its share at around 51%, which was a reflection of both a large user base and a successful campaign to motivate its users. I note that RapidMiner is not a current advertiser on KDnuggets.
The R language share has declined 2 years in a row, but less this year than in the previous year. Several users commented that RStudio should be included, and we will include it in the next poll.
The shares for Deep Learning platforms TensorFlow and especially Keras have grown each year, reflecting the growing usage of Deep Learning in many applications.
SQL is steady, with a share above 30% for many years. So, if you are an aspiring Data Scientist, learn not only TensorFlow but also SQL - it will likely be useful for many more years.
Trends
In 2019 we added a number of new entries, and eight of them received at least 25 votes:
- XGBoost, 12.7%
- Apache Kafka, 6.0%
- Google BigQuery, 5.2%
- LightGBM, 3.1%
- fastai library, 2.4%
- Apache Storm, 1.9%
- CatBoost, 1.8%
The table below lists the tools that were included in the KDnuggets Poll in 2018, have grown 20% or more in share, and reached at least 25 voters in 2019.
Table 2: Major Analytics/Data Science/ML Software with the largest increase in usage
|Software||2019 % share||2018 % share||% change|
|Databricks Unified Analytics Platform||2.6%||1.2%||115%|
|Microsoft other ML/Data Science tools||1.8%||1.3%||35%|
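The last column in these tables appears to be the relative change in share from 2018 to 2019. A short sketch of that arithmetic follows, assuming the formula (2019 share - 2018 share) / 2018 share; the function name `relative_change` is illustrative. Recomputing from the rounded shares printed here gives values close to, but not identical with, the published ones, presumably because the published figures were derived from unrounded vote counts.

```python
def relative_change(share_2019, share_2018):
    """Relative change in share from 2018 to 2019, in percent."""
    return (share_2019 - share_2018) / share_2018 * 100


# (2019 share, 2018 share) in percent, as printed in Table 2.
rows = {
    "Databricks Unified Analytics Platform": (2.6, 1.2),
    "Microsoft other ML/Data Science tools": (1.8, 1.3),
}

for name, (s19, s18) in rows.items():
    # Prints +116.7% and +38.5%, versus the published 115% and 35%.
    print(f"{name}: {relative_change(s19, s18):+.1f}%")
```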
Continuing Consolidation?
There were 48 tools with 2% or higher share in 2018, and among them 14 (less than one third) have increased share in 2019, while 34 have decreased their share. This trend, which also existed in 2018, suggests continuing consolidation of Data Science / Machine Learning platforms.
Tools that had at least 2% share in 2018 and declined 25% or more in their share in 2019 are in the next table.
Table 3: Major Analytics/Data Science Platform with the largest decline in usage
|Platform||2019 % share||2018 % share||% change|
|IBM DSX/Watson Studio||1.9%||4.5%||-58.3%|
|IBM SPSS Modeler||2.4%||4.9%||-51.2%|
|Microsoft Machine Learning Server||1.2%||2.1%||-41.8%|
|IBM SPSS Statistics||5.3%||8.0%||-33.6%|
Some of the decline may be due to the lack of a vendor campaign to vote in the KDnuggets Poll, and some may reflect a decline in the popularity of the platform, as is probably the case for IBM.
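The selection rule for this table (at least 2% share in 2018 and a decline of 25% or more) can be sketched as a simple filter. The inputs are the rounded shares printed in this article, with a few rows from the other tables added as counterexamples. The function name `major_decliners` and its parameter names are illustrative, and the recomputed changes differ slightly from the published ones because they start from rounded shares.

```python
# (2019 share, 2018 share) in percent, as printed in this article.
shares = {
    "IBM DSX/Watson Studio": (1.9, 4.5),
    "IBM SPSS Modeler": (2.4, 4.9),
    "Microsoft Machine Learning Server": (1.2, 2.1),
    "IBM SPSS Statistics": (5.3, 8.0),
    "SQL on Hadoop tools": (8.4, 10.2),
    "Hadoop: Commercial Tools": (4.5, 5.7),
    "Databricks Unified Analytics Platform": (2.6, 1.2),
}


def major_decliners(shares, min_share_2018=2.0, min_decline=25.0):
    """Tools with enough 2018 share whose share fell by min_decline % or more."""
    selected = []
    for name, (s19, s18) in shares.items():
        change = (s19 - s18) / s18 * 100  # relative change, in percent
        if s18 >= min_share_2018 and change <= -min_decline:
            selected.append((name, change))
    # Largest decline first.
    return sorted(selected, key=lambda item: item[1])


for name, change in major_decliners(shares):
    print(f"{name}: {change:.1f}%")
```

On these inputs the filter keeps the four platforms listed in the table and drops the two Hadoop rows, whose declines are below 25%, as well as Databricks, which grew.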
Deep Learning Tools
The share of users of Deep Learning tools jumped to 49.8% (!!), from 33% of voters in 2018 and 32% in 2017.
TensorFlow remains the dominant platform, and Keras continues to grow as a very popular wrapper on top of TensorFlow. PyTorch has also significantly increased its share. The share of most of the other Deep Learning tools (except for MXNet) has declined.
Table 4: Major Deep Learning Platforms
|Platform||2019 % share||2018 % share||% change|
|Other Deep Learning Tools||5.6%||4.9%||15.2%|
|Microsoft Cognitive Toolkit||1.6%||3.0%||-45.5%|
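When reading the jump in overall Deep Learning usage (33% in 2018 to 49.8% in 2019), it helps to separate the change in percentage points from the relative growth. The sketch below does only that arithmetic on the two figures quoted above; the variable names are illustrative.

```python
dl_share_2018 = 33.0  # % of voters using Deep Learning tools in 2018
dl_share_2019 = 49.8  # % of voters using Deep Learning tools in 2019

# Absolute difference between the two shares, in percentage points.
point_change = dl_share_2019 - dl_share_2018

# Growth relative to the 2018 share, in percent.
relative_growth = point_change / dl_share_2018 * 100

print(f"{point_change:.1f} percentage points")     # 16.8 percentage points
print(f"{relative_growth:.0f}% relative growth")   # 51% relative growth
```

The "% change" column in the table is of the second kind: it compares each 2019 share to its own 2018 share.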
Big Data Tools
In 2019, 37% used Big Data Tools vs 33% in 2018. Apache Spark continues to be ahead of Hadoop, and we see the emergence of streaming Big Data platforms, like Apache Storm, Flink, or WSO2 Stream Processor. The table below shows the details, with na indicating that the software was not included in the 2018 poll.
|Software||2019 % share||2018 % share||% change|
|Hadoop: Open Source Tools||12.1%||11.0%||10.2%|
|SQL on Hadoop tools||8.4%||10.2%||-17.3%|
|Hadoop: Commercial Tools||4.5%||5.7%||-20.1%|
|WSO2 Stream Processor||0.5%||na||na|
Here are the main programming languages sorted by popularity.
|Language||2019 % share||2018 % share||% change|
|Other programming and data languages||5.7%||6.9%||-17.1%|
The next page shows regional participation and results for the last 3 years.