Python leads the 11 top Data Science, Machine Learning platforms: Trends and Analysis

Python continues to lead the top Data Science platforms, but R and RapidMiner hold their share; almost 50% have used Deep Learning tools; SQL is steady; consolidation continues.



The 20th annual KDnuggets Software Poll had over 1,800 participants. The average voter chose 6.1 different tools, so voters with just one choice stood out. We removed about 180 such "lone" votes (2/3 were from one vendor), because even if they represented legitimate users of that tool, their experience was not representative of what Data Scientists do in 2019.

Here is my initial analysis based on remaining participants, after "lone" voters were removed. More detailed association analysis and anonymized data will be published later.

Top Analytics, Data Science, Machine Learning Software



Fig 1: KDnuggets Analytics/Data Science 2019 Software Poll: top tools in 2019, and their share in the 2017, 2018 polls

Interestingly, we see the same group of top 11 tools (each with at least 20% share) in 2019 as in 2018.

Table 1: Top Analytics/Data Science/ML Software in 2019 KDnuggets Poll
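| Software | 2019 % share | 2018 % share | 2017 % share |
| --- | --- | --- | --- |
| Python | 65.8% | 65.6% | 59.0% |
| RapidMiner | 51.2% | 52.7% | 31.9% |
| R Language | 46.6% | 48.5% | 56.6% |
| Excel | 34.8% | 39.1% | 31.5% |
| Anaconda | 33.9% | 33.4% | 24.3% |
| SQL Language | 32.8% | 39.6% | 39.2% |
| Tensorflow | 31.7% | 29.9% | 22.7% |
| Keras | 26.6% | 22.2% | 10.7% |
| scikit-learn | 25.5% | 24.4% | 21.9% |
| Tableau | 22.1% | 26.4% | 21.8% |
| Apache Spark | 21.0% | 21.5% | 25.5% |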



Here, 201N % share is the percentage of voters who used this software in year 201N.

The average number of tools per respondent was 6.7, consistent with 7.0 in the 2018 poll and 6.75 in the 2017 poll.
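As a rough illustration of how these figures are defined, the sketch below computes per-tool share and the average number of tools per respondent from a few made-up ballots, after dropping single-tool ("lone") ballots. The ballots and the function name `summarize` are hypothetical; the actual poll data and processing are not shown in this post.

```python
from collections import Counter

def summarize(ballots):
    """Drop single-tool ballots, then return (share per tool in %, mean tools per voter)."""
    kept = [set(b) for b in ballots if len(set(b)) > 1]
    if not kept:
        raise ValueError("no ballots left after removing lone votes")
    counts = Counter(tool for b in kept for tool in b)
    shares = {tool: 100.0 * n / len(kept) for tool, n in counts.items()}
    avg_tools = sum(len(b) for b in kept) / len(kept)
    return shares, avg_tools

# Hypothetical ballots, for illustration only (not poll data).
ballots = [
    ["Python", "SQL", "Keras"],
    ["R", "SQL"],
    ["RapidMiner"],  # a "lone" vote, removed before analysis
    ["Python", "Tensorflow", "Keras", "SQL"],
]
shares, avg_tools = summarize(ballots)

# Back-of-the-envelope check using the rounded figures quoted in this post:
# about 1,800 voters averaging 6.1 tools, minus about 180 one-tool ballots.
approx_avg_after_removal = (1800 * 6.1 - 180) / (1800 - 180)
```

Because shares are computed per voter, they sum to well over 100%. The last line gives roughly 6.67, which suggests (as an assumption, since the post does not say so explicitly) that the 6.1 average was measured before lone votes were removed and the 6.7 average after.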

Here are some observations on 3-year trends for top tools.

Python stayed at the top, with almost the same share (65.8% vs 65.6%) of respondents as in 2018.

RapidMiner kept its share at around 51%, which was a reflection of both a large user base and a successful campaign to motivate its users. I note that RapidMiner is not a current advertiser on KDnuggets.

R language share has declined two years in a row, but less this year than in the previous year. Several users commented that RStudio should be included, and we will include it in the next poll.

The shares for Deep Learning platforms Tensorflow and especially Keras have grown each year, reflecting the growing usage of Deep Learning in many applications.

SQL is steady, with a share above 30% for many years. So, if you are an aspiring Data Scientist, learn not only TensorFlow but also SQL - it will likely be useful for many more years.

Trends

In 2019 we added a number of new entries, and eight of them received at least 25 votes:
  • XGBoost, 12.7%
  • Javascript, 6.8%
  • Apache Kafka, 6.0%
  • Google Bigquery, 5.2%
  • LightGBM, 3.1%
  • fastai library, 2.4%
  • Apache Storm, 1.9%
  • CatBoost, 1.8%


The table below lists the tools that were included in the KDnuggets Poll in 2018, grew 20% or more in share, and reached at least 25 voters in 2019.

Table 2: Major Analytics/Data Science/ML Software with the largest increase in usage
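| Software | 2019 % share | 2018 % share | % change |
| --- | --- | --- | --- |
| BigML | 2.6% | 0.9% | 199% |
| Julia | 1.7% | 0.7% | 150% |
| Databricks Unified Analytics Platform | 2.6% | 1.2% | 115% |
| PyTorch | 11.3% | 6.4% | 76% |
| Microsoft other ML/Data Science tools | 1.8% | 1.3% | 35% |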
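The "% change" column is a relative change in share, not a difference in percentage points. A minimal sketch of that calculation, using the rounded shares from Table 2 (the function name `pct_change` is just for illustration):

```python
def pct_change(share_2019, share_2018):
    """Relative change in share, expressed as a percent of the 2018 share."""
    return 100.0 * (share_2019 - share_2018) / share_2018

# Rounded shares copied from Table 2.
pytorch = pct_change(11.3, 6.4)
databricks = pct_change(2.6, 1.2)

print(f"PyTorch: {pytorch:.1f}%")
print(f"Databricks: {databricks:.1f}%")
```

Recomputing from the rounded shares gives values close to, but not exactly matching, the published column. The likely reason, stated here as an assumption, is that the published changes were computed from unrounded shares.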


Continuing Consolidation?

There were 48 tools with a 2% or higher share in 2018; among them, 14 (less than one third) increased their share in 2019, while 34 decreased. This trend, which was also present in 2018, suggests continuing consolidation of Data Science / Machine Learning platforms.

Tools that had at least a 2% share in 2018 and whose share declined 25% or more in 2019 are listed in the next table.

Table 3: Major Analytics/Data Science Platform with the largest decline in usage
| Platform | 2019 % share | 2018 % share | % change |
| --- | --- | --- | --- |
| Dataiku | 2.0% | 6.3% | -68.2% |
| TIBCO Spotfire | 1.2% | 3.1% | -62.2% |
| IBM DSX/Watson Studio | 1.9% | 4.5% | -58.3% |
| IBM SPSS Modeler | 2.4% | 4.9% | -51.2% |
| Microsoft Machine Learning Server | 1.2% | 2.1% | -41.8% |
| Weka | 6.7% | 11.4% | -41.4% |
| MATLAB | 6.1% | 9.3% | -34.5% |
| IBM SPSS Statistics | 5.3% | 8.0% | -33.6% |


Some of the decline may be due to the lack of a vendor campaign to vote in the KDnuggets Poll, and some may reflect a decline in the platform's popularity, as is probably the case for IBM.
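The selection rule for the largest decliners (at least a 2% share in 2018 and a relative decline of 25% or more) can be sketched as a simple filter. The rows below are rounded shares copied from the tables in this post; the function name `big_decliners` and its parameter names are hypothetical.

```python
# (tool, 2019 share %, 2018 share %), rounded values copied from this post.
rows = [
    ("Dataiku", 2.0, 6.3),
    ("Weka", 6.7, 11.4),
    ("MATLAB", 6.1, 9.3),
    ("Tableau", 22.1, 26.4),
    ("Python", 65.8, 65.6),
]

def big_decliners(rows, min_share_2018=2.0, min_drop_pct=25.0):
    """Return (tool, % change) for tools meeting both thresholds, largest drop first."""
    selected = []
    for tool, share_2019, share_2018 in rows:
        if share_2018 < min_share_2018:
            continue  # too small in 2018 to be counted
        change = 100.0 * (share_2019 - share_2018) / share_2018
        if change <= -min_drop_pct:
            selected.append((tool, round(change, 1)))
    return sorted(selected, key=lambda item: item[1])

decliners = big_decliners(rows)
names = [tool for tool, _ in decliners]
```

With these rows, Tableau (down about 16%) and Python (roughly flat) are filtered out, while Dataiku, Weka, and MATLAB remain. The recomputed changes differ slightly from the published ones because the inputs here are rounded.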

Deep Learning Tools

The share of users of Deep Learning tools jumped to 49.8% (!!), from 33% of voters in 2018 and 32% in 2017.

Tensorflow remains the dominant platform, and Keras continues to grow as a very popular wrapper on top of Tensorflow. PyTorch has also significantly increased its share. The share of most other Deep Learning tools (except for MXnet) has declined.

Table 4: Major Deep Learning Platforms
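| Platform | 2019 % share | 2018 % share | % change |
| --- | --- | --- | --- |
| Tensorflow | 31.7% | 29.9% | 5.8% |
| Keras | 26.6% | 22.2% | 19.7% |
| PyTorch | 11.3% | 6.4% | 75.5% |
| Other Deep Learning Tools | 5.6% | 4.9% | 15.2% |
| DeepLearning4J | 2.5% | 3.4% | -25.6% |
| Apache MXnet | 1.7% | 1.5% | 13.1% |
| Microsoft Cognitive Toolkit | 1.6% | 3.0% | -45.5% |
| Theano | 1.6% | 4.9% | -67.4% |
| Torch | 0.9% | 1.0% | -6.1% |
| TFLearn | 0.7% | 1.1% | -34.7% |
| Caffe | 0.6% | 1.5% | -58.3% |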


Big Data Tools

In 2019, 37% used Big Data tools vs 33% in 2018. Apache Spark continues to be ahead of Hadoop, and we see the emergence of streaming Big Data platforms such as Apache Storm, Flink, and WSO2 Stream Processor. The table below shows the details, with na indicating that the software was not included in the 2018 poll.

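| Platform | 2019 % share | 2018 % share | % change |
| --- | --- | --- | --- |
| Apache Spark | 21.0% | 21.5% | -2.3% |
| Hadoop: Open Source Tools | 12.1% | 11.0% | 10.2% |
| SQL on Hadoop tools | 8.4% | 10.2% | -17.3% |
| Apache Kafka | 6.0% | na | na |
| Google Bigquery | 5.2% | na | na |
| Hadoop: Commercial Tools | 4.5% | 5.7% | -20.1% |
| Apache Storm | 1.9% | na | na |
| Flink | 0.8% | na | na |
| WSO2 Stream Processor | 0.5% | na | na |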


Programming Languages

Python and R continue to dominate. The new entry this year was Javascript, which got a respectable 6.8% share. Julia's share has increased, while most other languages have declined.

Here are the main programming languages sorted by popularity.

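| Platform | 2019 % share | 2018 % share | % change |
| --- | --- | --- | --- |
| Python | 65.8% | 65.6% | 0.2% |
| R Language | 46.6% | 48.5% | -4.0% |
| SQL Language | 32.8% | 39.6% | -17.2% |
| Java | 12.4% | 15.1% | -17.7% |
| Unix shell/awk | 7.9% | 9.2% | -13.4% |
| C/C++ | 7.1% | 6.8% | 3.7% |
| Javascript | 6.8% | na | na |
| Other programming and data languages | 5.7% | 6.9% | -17.1% |
| Scala | 3.5% | 5.9% | -41.0% |
| Julia | 1.7% | 0.7% | 150.4% |
| Perl | 1.3% | 1.0% | 25.2% |
| Lisp | 0.4% | 0.3% | 46.1% |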


The next page shows regional participation and results for the last 3 years.