What Big Data, Data Science, Deep Learning software goes together?
We analyze the associations between top Data Science tools, compare Commercial vs Free/Open Source tools, rank tools by R vs Python bias, identify the tools most associated with Big Data and those most associated with Deep Learning, and uncover strong regional differences.
Fig. 5: Deep Learning affinity for top tools (KDnuggets Data Science Software Poll)
We note that Big Data tools have higher Deep Learning affinity (not surprising, since DL needs a lot of data). We also see that the languages C/C++, Scala, Java, and Python have higher DL affinity.
The commercial tool with the highest DL affinity is MATLAB, thanks to its Deep Learning toolbox.
Next we combine Big Data and Deep Learning measures on the same chart.
Fig. 6: Big Data vs Deep Learning affinity for top tools (KDnuggets Data Science Software Poll)
We note an overall pattern: higher Big Data affinity corresponds to higher Deep Learning affinity. Scala is a big outlier, ranking near the top on both measures, so if you want to be a Deep Big Data Scientist, learn Scala.
Next we compute the R and Python bias by region. Here R%(region) is the percentage of users in that region who use R, R_bias(region) = log2(R%(region) / R%(all)), and likewise for Python.
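The bias formula can be sketched in Python as below. This is a minimal illustration, not the code used for the poll analysis; the function name `tool_bias` is hypothetical, and its inputs are assumed to be the two percentages R%(region) and R%(all) (or the Python equivalents). A bias of +1 means the tool is used twice as often in that region as overall, and -1 means half as often.

```python
import math


def tool_bias(region_pct: float, overall_pct: float) -> float:
    """Hypothetical helper: log2 ratio of a tool's share in one region
    to its share among all voters, e.g. R_bias = log2(R%(region) / R%(all))."""
    # A log ratio is undefined for zero or negative shares.
    if region_pct <= 0 or overall_pct <= 0:
        raise ValueError("percentages must be positive for a log ratio")
    return math.log2(region_pct / overall_pct)
```

Using a base-2 log makes the scale symmetric: doubling and halving a tool's share give biases of equal size and opposite sign.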
Fig. 7: Regional bias for R & Python (KDnuggets Data Science Software Poll)
We note that US/Canada are average, W. Europe has a Python bias, Asia has a strong R bias, E. Europe is much weaker on R, while Latin America is much stronger on R and weaker on Python.
Finally, here is the table with the regional distribution. Here R % is the percentage of users in that region who use R, and similarly for Python, Big Data, and Deep Learning.
| Region | Count | R % | Python % | Big Data % | DL % |
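A table like this one could be computed from per-voter records roughly as follows. This is a minimal sketch under stated assumptions: the record layout (a region name plus a set of tool or category labels for each voter), the label strings, and the function name `regional_distribution` are hypothetical and not taken from the downloadable CSV.

```python
from collections import defaultdict


def regional_distribution(votes, columns):
    """Hypothetical helper.

    votes:   iterable of (region, set_of_labels), one entry per voter
             (assumed layout, e.g. ("Asia", {"R", "Big Data"})).
    columns: labels to report, e.g. ["R", "Python", "Big Data", "DL"].
    Returns {region: {"Count": n, label: percent_of_voters, ...}}.
    """
    counts = defaultdict(int)                       # voters per region
    hits = defaultdict(lambda: defaultdict(int))    # region -> label -> voters using it
    for region, labels in votes:
        counts[region] += 1
        for col in columns:
            if col in labels:
                hits[region][col] += 1

    table = {}
    for region, n in counts.items():
        row = {"Count": n}
        for col in columns:
            # Percentage of this region's voters who reported the label.
            row[col] = 100.0 * hits[region][col] / n
        table[region] = row
    return table
```

Each percentage is taken over the voters in its own region, which matches the table's definition of R % and makes regions of different sizes comparable.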
Download anonymized data from www.kdnuggets.com/aps/sw16-ord-reg-n-votes dot csv and let me know what you find!