KDnuggets Home :: News » 2012 » Aug » Poll results: Top languages for analytics/data mining programming  (  12:n18 | Next > )


Poll results: Top languages for analytics/data mining programming

          

R is now used by over 50% of data miners. R, Python, and SQL were the most popular programming languages. Python, Lisp/Clojure, and Unix tools showed the highest growth in 2012, while Java and MATLAB declined slightly in popularity.

Sometimes the high-level visual GUI of your favorite data mining tool is not enough, and you need to code an algorithm or, more frequently, a data wrangling/cleaning process.

The latest KDnuggets Poll asked "What programming/statistics languages you used for analytics / data mining in the past 12 months?"

On average, KDnuggets readers used 2.5 languages. R, Python, and SQL were the most popular, while Lisp/Clojure (*), Python, and Unix tools showed the highest growth. R is now used by over 50% of data miners. Hadoop-based languages, however, were used by only about 7% of voters.
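Because each voter could select several languages, the "2.5 languages on average" figure is simply the total number of selections divided by the number of voters. A minimal sketch, assuming the per-language voter counts from the results table below; it treats "Other" and "None" as one selection each, so it is an approximation rather than the poll's official calculation:

```python
# Per-language voter counts from the 2012 results table (579 voters in total).
# Voters could pick several languages, so the counts sum to more than 579.
VOTES_2012 = {
    "R": 304, "Python": 209, "SQL": 186, "Java": 123, "SAS": 114,
    "Unix shell/awk/sed": 85, "C/C++": 83, "MATLAB": 76, "Perl": 52,
    "Pig/Hive/Hadoop-based": 39, "GNU Octave": 34, "Lisp/Clojure": 25,
    "Ruby": 22, "Scala": 14, "Julia": 2, "Other": 66, "None": 4,
}
TOTAL_VOTERS = 579


def mean_languages_per_voter(votes, total_voters):
    """Average number of selections per voter in a multi-choice poll."""
    return sum(votes.values()) / total_voters


if __name__ == "__main__":
    # 1438 selections / 579 voters
    print(f"{mean_languages_per_voter(VOTES_2012, TOTAL_VOTERS):.2f}")
```

With these counts the result is about 2.48, which rounds to the 2.5 languages quoted above.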

Compared with the results of the 2011 KDnuggets Poll ("What languages you used for data mining / data analysis?"), the languages with the highest growth were:

  1. Lisp/Clojure: 525% increase, to 4.4% in 2012 from 0.7% in 2011 (*). The 2011 poll listed Lisp only, so the results are not fully comparable.
  2. Python: 49% increase, to 36.5% from 24.6%
  3. Unix shell/awk/sed: 44% increase, to 14.5% from 10.4%
  4. R: 16% increase, to 52.5% from 45.1%
The languages with declining usage were Java (down 12%) and MATLAB (down 10%).
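The growth and decline figures are relative changes in usage share: (share in 2012 - share in 2011) / share in 2011. The sketch below applies that formula to a few rows of the results table. It is an illustration, not the poll's own computation; recomputing from the rounded table percentages can differ by about a point from the quoted figures (Java comes out at roughly -13% rather than -12%), possibly because the quoted figures were computed from unrounded shares.

```python
def percent_change(new_share, old_share):
    """Relative change, in percent, between two usage shares given in %."""
    if old_share == 0:
        raise ValueError("old share must be non-zero")
    return (new_share - old_share) / old_share * 100


# (2012 share, 2011 share) in %, taken from the results table.
SHARES = {
    "R": (52.5, 45.1),
    "Java": (21.2, 24.4),
    "MATLAB": (13.1, 14.6),
}

if __name__ == "__main__":
    for lang, (new, old) in SHARES.items():
        # "+.0f" prints an explicit sign and rounds to whole percent
        print(f"{lang}: {percent_change(new, old):+.0f}%")
```

This prints +16% for R, -13% for Java, and -10% for MATLAB.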

The language most often used alongside R was Python (and vice versa).
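A statement like "Python was most often used alongside R" comes from counting how often pairs of languages appear on the same ballot. The per-voter ballots are not included in this article, so the sketch below uses a small hypothetical set of ballots purely to show the counting technique; the names `pair_counts`, `most_common_partner`, and `BALLOTS` are illustrative, not part of any KDnuggets tool.

```python
from collections import Counter
from itertools import combinations


def pair_counts(ballots):
    """Count how often each unordered pair of languages appears on one ballot."""
    counts = Counter()
    for ballot in ballots:
        # sorting makes ("Python", "R") and ("R", "Python") the same key
        for pair in combinations(sorted(set(ballot)), 2):
            counts[pair] += 1
    return counts


def most_common_partner(ballots, language):
    """Return the language most often selected together with `language`."""
    partners = Counter()
    for (a, b), n in pair_counts(ballots).items():
        if a == language:
            partners[b] += n
        elif b == language:
            partners[a] += n
    return partners.most_common(1)[0][0] if partners else None


# Hypothetical ballots, for illustration only (not the actual poll data).
BALLOTS = [
    {"R", "Python", "SQL"},
    {"R", "Python"},
    {"R", "SAS"},
    {"Python", "Java"},
    {"R", "Python", "Unix shell/awk/sed"},
]

if __name__ == "__main__":
    print(most_common_partner(BALLOTS, "R"))
    print(most_common_partner(BALLOTS, "Python"))
```

On these toy ballots, Python is R's most frequent partner and R is Python's, mirroring the "and vice versa" relationship described above.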

Here are the results:

What programming/statistics languages you used for analytics / data mining in the past 12 months? [579 voters]

Language                                      Voters (2012)   % users 2012   % users 2011
R                                                       304          52.5%          45.1%
Python                                                  209          36.1%          24.6%
SQL                                                     186          32.1%          32.3%
Java                                                    123          21.2%          24.4%
SAS                                                     114          19.7%          21.2%
Unix shell/awk/sed                                       85          14.7%          10.4%
C/C++                                                    83          14.3%          12.8%
MATLAB                                                   76          13.1%          14.6%
Perl                                                     52           9.0%           7.9%
Pig, Hive, or other Hadoop-based languages               39           6.7%           6.1%
GNU Octave                                               34           5.9%   N/A for 2011
Lisp/Clojure                                             25           4.4%   0.7% (Lisp only)
Ruby                                                     22           3.8%   N/A for 2011
Scala                                                    14           2.4%   N/A for 2011
Julia                                                     2           0.3%   N/A for 2011
Other                                                    66          11.6%          12.3%
None                                                      4           0.7%           1.2%
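Each "% users" value is the number of voters who selected a language divided by the 579 respondents. Since voters could pick several languages, the shares add up to far more than 100%. A minimal sketch of the calculation for the top three rows (a few other rows, such as Lisp/Clojure and Other, come out 0.1 to 0.2 points different when recomputed this way):

```python
TOTAL_VOTERS = 579


def share_percent(voters, total_voters):
    """Percentage of all poll respondents who selected a language."""
    return 100 * voters / total_voters


if __name__ == "__main__":
    # Voter counts from the 2012 column of the table
    for lang, n in [("R", 304), ("Python", 209), ("SQL", 186)]:
        print(f"{lang}: {share_percent(n, TOTAL_VOTERS):.1f}%")
```

This reproduces the table's 52.5%, 36.1%, and 32.1% for R, Python, and SQL.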

Regional participation was:

  • US/Canada: 49.6%
  • W. Europe: 24.2%
  • Asia: 8.8%
  • E. Europe: 5.4%
  • Latin America: 5.1%
  • AU/NZ: 3.5%
  • Africa/Middle East: 3.3%

Comments

Gregory PS, Editor, Additional languages: After the poll, I also got suggestions for

  • Groovy: a powerful object-oriented language that extends Java while making it much easier to use for analytics programmers who are used to languages like SAS.
  • Also, PHP and SQL

ed, Excel
More data analysis is done in Excel than with all these tools combined, even if it is just for a quick look at the data.
Gregory PS, Editor:
Excel is great for a quick look at data, but Excel macros are not a good programming language when you need more complex data wrangling.

dean h nelson, Languages for Text Analytics
Is there a reason that the Microsoft languages VB and VBA were left off the list? (Maybe I'm a newbie who has missed something, e.g. that we are dealing with Unix-viable languages only.)

Karl Rexer, SPSS should be on this list also
SPSS has a programming language as well as the GUI, so SPSS should be on this list also.
Gregory PS, Editor, SPSS Macro language
SPSS Clementine also has a macro language, and I have written very large programs in it for microarray data analysis, but in my opinion it is not really a separate language like SAS or R.

RGPSoftware
What about MDX? I would have thought that MDX would be on the list (used with OLAP cubes)... Maybe it falls into "Other"...


Copyright © 2012 KDnuggets.