What Big Data, Data Science, Deep Learning software goes together?


We analyze the associations between top Data Science tools, compare Commercial vs Free/Open Source tools, rank tools by R vs Python bias, identify the tools most associated with Big Data and with Deep Learning, and uncover strong regional differences.



Fig. 5: KDnuggets Data Science Software Poll, Deep Learning affinity for top tools


We note that Big Data tools have higher Deep Learning affinity (not surprising, since DL needs a lot of data). We also see that the languages C/C++, Scala, Java, and Python have higher DL affinity.

The commercial tool with the highest DL affinity is MATLAB, thanks to its Deep Learning toolbox.

Next we combine Big Data and Deep Learning measures on the same chart.

Fig. 6: KDnuggets Data Science Software Poll, Big Data vs Deep Learning affinity for top tools


We note that there is an overall pattern: higher Big Data affinity corresponds to higher Deep Learning affinity. Scala is a big outlier, ranking near the top on both charts, so if you want to be a Deep Big Data Scientist, learn Scala.

Next we compute the R and Python bias by region. Here R%(region) is the percentage of users in that region who use R, and R_bias(region) = log2( R%(region) / R%(all) ). Python bias is defined likewise.
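As a rough illustration of this definition, the sketch below computes the bias in Python from the R % and Python % figures reported in the regional table at the end of this article. The names SHARES, OVERALL, BIAS and bias are made up for this example and are not taken from the poll's own analysis.

```python
import math

# (R %, Python %) per region, copied from the regional table in this article.
SHARES = {
    "US/Canada":      (49.2, 46.2),
    "W. Europe":      (47.7, 48.0),
    "Asia":           (57.5, 44.3),
    "E. Europe":      (40.4, 45.0),
    "Latin America":  (53.8, 37.3),
    "Africa/MidEast": (43.4, 34.9),
    "Australia/NZ":   (54.0, 52.4),
}

# The "All" row of the same table: overall R % and Python %.
OVERALL = (49.0, 45.8)


def bias(region_pct, all_pct):
    """log2 of the regional share divided by the overall share."""
    return math.log2(region_pct / all_pct)


# Hypothetical helper structure: region -> (R bias, Python bias).
BIAS = {
    region: (bias(r_pct, OVERALL[0]), bias(py_pct, OVERALL[1]))
    for region, (r_pct, py_pct) in SHARES.items()
}

for region, (r_bias, py_bias) in BIAS.items():
    print(f"{region:15s} R bias {r_bias:+.2f}   Python bias {py_bias:+.2f}")
```

A bias of 0 means the region matches the overall share, +1 would mean twice the overall share, and -1 half of it. Because the inputs are percentages rounded to one decimal, the results are approximate.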

Fig. 7: KDnuggets Data Science Software Poll, Regional bias for R & Python


We note that US/Canada are average, W. Europe has a Python bias, Asia has a strong R bias, E. Europe is much weaker on R, while Latin America is much stronger on R and weaker on Python.

Finally, here is the table with the regional distribution. Here R % is the percentage of users in that region who use R, and similarly for Python, Big Data, and Deep Learning (DL).

Region           Count   R %     Python %   Big Data %   DL %
US/Canada        1164    49.2%   46.2%      36.4%        14.6%
W. Europe         903    47.7%   48.0%      40.3%        18.4%
Asia              273    57.5%   44.3%      49.8%        22.7%
E. Europe         240    40.4%   45.0%      39.6%        18.3%
Latin America     169    53.8%   37.3%      32.5%        21.9%
Africa/MidEast     83    43.4%   34.9%      44.6%        26.5%
Australia/NZ       63    54.0%   52.4%      33.3%        14.3%
All              2895    49.0%   45.8%      39.1%        17.6%
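One way to read the "All" row is as a count-weighted average of the regional rows. The sketch below checks that reading against the numbers in the table. It is only a consistency check under that assumption, and the names ROWS, ALL_ROW, COLUMNS, weighted_pct and WEIGHTED are invented for this example.

```python
# region -> (Count, R %, Python %, Big Data %, DL %), copied from the table.
ROWS = {
    "US/Canada":      (1164, 49.2, 46.2, 36.4, 14.6),
    "W. Europe":      (903,  47.7, 48.0, 40.3, 18.4),
    "Asia":           (273,  57.5, 44.3, 49.8, 22.7),
    "E. Europe":      (240,  40.4, 45.0, 39.6, 18.3),
    "Latin America":  (169,  53.8, 37.3, 32.5, 21.9),
    "Africa/MidEast": (83,   43.4, 34.9, 44.6, 26.5),
    "Australia/NZ":   (63,   54.0, 52.4, 33.3, 14.3),
}

# The "All" row as published.
ALL_ROW = (2895, 49.0, 45.8, 39.1, 17.6)
COLUMNS = ("R", "Python", "Big Data", "DL")

# Total number of respondents across the regional rows.
total = sum(row[0] for row in ROWS.values())


def weighted_pct(col):
    """Average of column `col` across regions, weighted by respondent count."""
    return sum(row[0] * row[col] for row in ROWS.values()) / total


WEIGHTED = {name: weighted_pct(i + 1) for i, name in enumerate(COLUMNS)}

print("total respondents:", total)
for i, name in enumerate(COLUMNS):
    print(f"{name:9s} weighted {WEIGHTED[name]:.1f}%   published {ALL_ROW[i + 1]:.1f}%")
```

Small differences from the published row are expected, since the regional percentages are themselves rounded to one decimal.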


Download anonymized data from www.kdnuggets.com/aps/sw16-ord-reg-n-votes.csv and let me know what you find!