KDnuggets Home » Polls » Languages for analytics/data mining (Aug 2014)

Languages for analytics / data mining / data science

What programming/statistics languages did you use for analytics / data mining / data science work in 2014?
Language used (% of voters)                       2014 (719 total)  2013 (713 total)  2012 (579 total)
R (352 voters in 2014)                            49.0%             60.9%             52.5%
SAS (262)                                         36.4%             20.8%             19.7%
Python (252)                                      35.0%             38.8%             36.1%
SQL (220)                                         30.6%             36.6%             32.1%
Java (89)                                         12.4%             16.5%             21.2%
Unix shell/awk/sed (63)                           8.8%              11.1%             14.7%
Pig Latin/Hive/other Hadoop-based languages (61)  8.5%              8.0%              6.7%
SPSS (58)                                         8.1%              not asked         not asked
MATLAB (45)                                       6.3%              12.5%             13.1%
Scala (28)                                        3.9%              2.2%              2.4%
C/C++ (26)                                        3.6%              9.3%              14.3%
Julia (21)                                        2.9%              0.7%              0.3%
Other low-level languages (20)                    2.8%              5.9%              11.4%
Perl (19)                                         2.6%              4.5%              9.0%
GNU Octave (17)                                   2.4%              5.6%              5.9%
Ruby (9)                                          1.3%              2.2%              3.8%
Lisp/Clojure (5)                                  0.7%              1.0%              4.3%
F# (0)                                            0%                1.7%              not asked
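
As a rough illustration of how the 2014 percentages relate to the vote counts, the sketch below divides each count by the 719 total voters. The function and variable names are hypothetical, and only the top four languages are included; the counts themselves are copied from the results above.

```python
# Vote counts and total copied from the 2014 results above.
TOTAL_VOTERS_2014 = 719
votes_2014 = {"R": 352, "SAS": 262, "Python": 252, "SQL": 220}


def share(votes, total):
    """Return the percentage of voters who picked a language, to one decimal."""
    return round(100.0 * votes / total, 1)


# Hypothetical helper mapping: language -> share of 2014 voters in percent.
shares = {lang: share(count, TOTAL_VOTERS_2014) for lang, count in votes_2014.items()}

for lang, pct in shares.items():
    print(f"{lang}: {pct}%")

# The shares of these four languages alone add up to more than 100%,
# which is consistent with voters selecting more than one language.
print(f"Sum of top-4 shares: {sum(shares.values()):.1f}%")
```

Because each voter could apparently pick several languages, the column is a share of voters per language, not a split of 100% across languages.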

Notes

Comparing these results with the similar KDnuggets Polls from
2013 (What programming/statistics languages you used for analytics / data mining in 2013?) and
2012 (What programming/statistics languages you used for analytics / data mining in the past 12 months?),
we note several changes and trends.

1. A big increase in SAS user participation in 2014, perhaps driven partly by a change in the composition of KDnuggets readers and partly by increased visibility of this poll among SAS users.
SAS voters also had a high percentage of "lone" votes: in 2014, 58% of them said they used only SAS, compared to 26% in 2013. The share of "lone" votes in 2014 was 20.5% for R, 14% for Python, and 4.5% for SQL.

2. Consolidation among the top 4 languages (R, SAS, Python, and SQL) and a decline in usage of less popular languages for data mining: Java, Unix shell, MATLAB, C/C++, Perl, Octave, Ruby, Lisp, F#.

3. Languages with the highest growth in 2014 were

  • Julia, 316% growth, from 0.7% share in 2013 to 2.9% in 2014
  • SAS, 76% growth, from 20.8% in 2013 to 36.4% in 2014
  • Scala, 74% growth, from 2.2% in 2013 to 3.9% in 2014
The languages with the largest decline in share of usage were
  • F#, 100% decline, from 1.7% share in 2013 to zero in 2014
  • C/C++, 60% decline, from 9.3% in 2013 to 3.6% in 2014
  • GNU Octave, 57% decline, from 5.6% in 2013 to 2.4% in 2014
  • MATLAB, 50% decline, from 12.5% in 2013 to 6.3% in 2014
  • Ruby, 44% decline, from 2.2% in 2013 to 1.3% in 2014
  • Perl, 41% decline, from 4.5% in 2013 to 2.6% in 2014
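
The growth and decline figures above are relative changes in share between 2013 and 2014. A minimal sketch of that arithmetic follows, using the rounded shares from the lists above; the function name and data layout are hypothetical. Because the inputs are rounded to one decimal, the results can differ by a point or two from the published figures, which were presumably computed from unrounded shares (an assumption).

```python
def relative_change(old_share, new_share):
    """Percent change from old_share to new_share (both given in percent)."""
    if old_share == 0:
        raise ValueError("relative change is undefined when the old share is zero")
    return 100.0 * (new_share - old_share) / old_share


# (2013 share, 2014 share) pairs copied from the lists above.
share_pairs = {
    "Julia": (0.7, 2.9),
    "SAS": (20.8, 36.4),
    "Scala": (2.2, 3.9),
    "F#": (1.7, 0.0),
    "MATLAB": (12.5, 6.3),
}

# Positive values are growth, negative values are decline.
changes = {lang: relative_change(old, new) for lang, (old, new) in share_pairs.items()}

for lang, change in changes.items():
    print(f"{lang}: {change:+.0f}%")
```

Note that a relative change says little about absolute usage: Julia's large growth rate starts from a 0.7% share, while the SAS growth corresponds to a much larger gain in percentage points.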

Among other programming languages, William Dwinnell mentioned Compiled BASIC (PowerBASIC).

Regional participation was

  • US/Canada: 51.6%
  • Europe: 26.7%
  • Asia: 13.3%
  • Latin America: 3.7%
  • Africa/Middle East: 3.5%
  • AU/NZ: 2.0%
This is similar to 2013, but with more participation from Asia and Africa/Middle East (led by Israel and Turkey), and less from Latin America (main decline from Brazil, perhaps still depressed from the World Cup loss).
