R is now used by over 50% of data miners. R, Python, and SQL were the most popular programming languages. Python, Lisp/Clojure, and Unix tools showed the highest growth in 2012, while Java and MATLAB slightly declined in popularity.
Sometimes the high-level visual GUI of your favorite data mining tool is not enough, and you need to code an algorithm or, more frequently, some data wrangling or cleaning process.
The latest KDnuggets Poll asked:
"What programming/statistics languages you used for analytics / data mining in the past 12 months?"
On average, KDnuggets readers used 2.5 languages. R, Python, and SQL were the most popular, and Lisp/Clojure(*), Python, and Unix tools showed the highest growth. R is now used by over 50% of data miners. However, Hadoop-based languages were used by only about 7% of voters.
Compared with the 2011 KDnuggets Poll, "What languages you used for data mining / data analysis?", the languages with the highest growth were:
- Lisp/Clojure, 525% increase, to 4.4% in 2012 (for Lisp/Clojure), from 0.7% in 2011 (*) (for Lisp only, so the results are not fully comparable)
- Python, 49% increase, to 36.5%, from 24.6%
- Unix shell/awk/sed, 44% increase, to 14.5%, from 10.4%
- R, 16% increase, to 52.5%, from 45.1%.
The languages with a declining share of users were Java (down 12%) and MATLAB (down 10%).
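The growth and decline figures above are relative changes in the share of voters, not differences in percentage points. As a rough sketch of that arithmetic, the snippet below recomputes the change for a few languages from the rounded percentages in the results table. The function name `relative_change` and the dictionary layout are illustrative choices, not part of the poll. Because the inputs are rounded, and because the list and the table quote slightly different 2012 shares for Python and Unix tools, the recomputed values can differ from the quoted ones by a few points.

```python
# Usage shares (% of voters) copied from the results table: (2012, 2011).
shares = {
    "R": (52.5, 45.1),
    "Python": (36.1, 24.6),
    "Unix shell/awk/sed": (14.7, 10.4),
    "Java": (21.2, 24.4),
    "MATLAB": (13.1, 14.6),
}


def relative_change(new, old):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Relative change per language, positive for growth and negative for decline.
changes = {lang: relative_change(new, old) for lang, (new, old) in shares.items()}

# Print from the largest growth to the largest decline.
for lang, pct in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:20s} {pct:+.1f}%")
```

A change from 45.1% to 52.5% is a gain of 7.4 percentage points, but a relative increase of about 16%, which is the convention the list uses.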
The most popular language used along with R was Python (and vice versa).
Here are the results:
What programming/statistics languages you used for analytics / data mining in the past 12 months? [579 voters]

| Language (voters in 2012) | % users in 2012 | % users in 2011 |
|---|---|---|
| R (304) | 52.5% | 45.1% |
| Python (209) | 36.1% | 24.6% |
| SQL (186) | 32.1% | 32.3% |
| Java (123) | 21.2% | 24.4% |
| SAS (114) | 19.7% | 21.2% |
| Unix shell/awk/sed (85) | 14.7% | 10.4% |
| C/C++ (83) | 14.3% | 12.8% |
| MATLAB (76) | 13.1% | 14.6% |
| Perl (52) | 9.0% | 7.9% |
| Pig, Hive, or other Hadoop-based languages (39) | 6.7% | 6.1% |
| GNU Octave (34) | 5.9% | N/A for 2011 |
| Lisp/Clojure (25) | 4.4% | 0.7% (Lisp only) |
| Ruby (22) | 3.8% | N/A for 2011 |
| Scala (14) | 2.4% | N/A for 2011 |
| Julia (2) | 0.3% | N/A for 2011 |
| Other (66) | 11.6% | 12.3% |
| None (4) | 0.7% | 1.2% |
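As a quick check on how the table relates to the summary figures, the sketch below derives each language's share from its vote count and the 579 total voters, and estimates the average number of languages per voter. The variable names are illustrative. Counting "Other" as a single language and leaving out "None" are assumptions made here, not something the poll states. A couple of rows (for example Lisp/Clojure and Other) come out a tenth or two away from the published percentages.

```python
TOTAL_VOTERS = 579  # number of voters reported for the 2012 poll

# Vote counts per language, copied from the results table ("None" excluded).
votes = {
    "R": 304, "Python": 209, "SQL": 186, "Java": 123, "SAS": 114,
    "Unix shell/awk/sed": 85, "C/C++": 83, "MATLAB": 76, "Perl": 52,
    "Pig/Hive/other Hadoop-based": 39, "GNU Octave": 34, "Lisp/Clojure": 25,
    "Ruby": 22, "Scala": 14, "Julia": 2,
    "Other": 66,  # assumption: treated as one language per vote
}

# Share of voters who selected each language, in percent.
share = {lang: 100.0 * n / TOTAL_VOTERS for lang, n in votes.items()}

# Each voter could pick several languages, so total selections divided by
# the number of voters approximates the average languages used per voter.
avg_languages = sum(votes.values()) / TOTAL_VOTERS

for lang, pct in share.items():
    print(f"{lang:30s} {pct:5.1f}%")
print(f"Average languages per voter: {avg_languages:.1f}")
```

Under these assumptions the average is consistent with the roughly 2.5 languages per reader reported above.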
Regional participation was:
- US/Canada: 49.6%
- W. Europe: 24.2%
- Asia: 8.8%
- E. Europe: 5.4%
- Latin America: 5.1%
- AU/NZ: 3.5%
- Africa/Middle East: 3.3%
Comments
Gregory PS, Editor, Additional languages:
After the poll, I also got suggestions for
- Groovy - a very powerful object-oriented language that extends Java while making it much easier to use for analytic programmers who are used to languages like SAS.
- Also, PHP and SQL
ed, Excel
More data analysis is done in Excel than in all these tools combined, even if it is just for a quick look at the data.
Gregory PS, Editor:
Excel is great for a quick look at data, but Excel macros are not a good programming language when you need more complex data wrangling.
dean h nelson, Languages for Text Analytics
Is there a reason that Microsoft languages VB and VBA have been left off the list?
(Maybe I'm a newbie that has missed something -- i.e. that we may be dealing with unix-viable languages only)
Karl Rexer, SPSS should be on this list also
SPSS has a programming language as well as the GUI, so SPSS should be on this list also.
Gregory PS, Editor, SPSS Macro language
SPSS Clementine also has a macro language, and I have written very large programs in it for microarray data analysis, but in my opinion it is not really a separate language like SAS or R.
RGPSoftware
What about MDX? I would have thought that MDX would be on the list (used with OLAP cubes)... Maybe it falls into "Other"...