Sometimes the high-level visual GUI of your favorite data mining tool is not enough, and you need to code an algorithm or, more frequently, some data wrangling or cleaning process.
The latest KDnuggets Poll asked: "What programming/statistics languages you used for analytics / data mining in the past 12 months?"
On average, KDnuggets readers used 2.5 languages. R, Python, and SQL were the most popular, and the highest growth was in Lisp/Clojure(*), Python, and Unix tools. R is now used by over 50% of data miners. However, Hadoop-based languages were used by only about 7% of voters.
Compared with the 2011 KDnuggets Poll, "What languages you used for data mining / data analysis?", the languages with the highest growth were:
- Lisp/Clojure, 525% increase, to 4.4% in 2012, from 0.7% in 2011 (*) (the 2011 figure is for Lisp only, so the results are not fully comparable)
- Python, 49% increase, to 36.5%, from 24.6%
- Unix shell/awk/sed, 44% increase, to 14.5%, from 10.4%
- R, 16% increase, to 52.5%, from 45.1%.
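The growth figures above are relative changes in the share of voters, not differences in percentage points. A minimal sketch of that arithmetic, using only the rounded shares quoted in the list (the function name `relative_growth` is illustrative). Because the inputs are rounded, the results need not match the quoted growth figures exactly.

```python
# Shares of voters in percent, as quoted in the list above: (2011, 2012).
shares = {
    "Lisp/Clojure": (0.7, 4.4),          # the 2011 figure is for Lisp only
    "Python": (24.6, 36.5),
    "Unix shell/awk/sed": (10.4, 14.5),
    "R": (45.1, 52.5),
}


def relative_growth(old, new):
    """Percent change from `old` to `new`, relative to `old`."""
    return (new - old) / old * 100.0


growth = {lang: relative_growth(old, new) for lang, (old, new) in shares.items()}

# Print languages from fastest to slowest growth.
for lang, pct in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang}: {pct:.0f}% increase")
```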
The most popular language used along with R was Python (and vice versa).
Here are the results:
What programming/statistics languages you used for analytics / data mining in the past 12 months? [579 voters]

| Language | Voters in 2012 | Note |
|---|---|---|
| R | 304 | |
| Python | 209 | |
| SQL | 186 | |
| Java | 123 | |
| SAS | 114 | |
| Unix shell/awk/sed | 85 | |
| C/C++ | 83 | |
| MATLAB | 76 | |
| Perl | 52 | |
| Pig, Hive, or other Hadoop-based languages | 39 | |
| GNU Octave | 34 | N/A for 2011 |
| Lisp/Clojure | 25 | |
| Ruby | 22 | N/A for 2011 |
| Scala | 14 | N/A for 2011 |
| Julia | 2 | N/A for 2011 |
| Other | 66 | |
| None | 4 | |
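The per-language shares and the average number of languages per voter can be derived from the vote counts above. The sketch below is illustrative rather than the poll's own method: it assumes each voter could select several languages and that "None" does not count as a language. Shares computed this way may differ slightly from percentages quoted elsewhere in the article.

```python
TOTAL_VOTERS = 579

# Vote counts copied from the results above.
votes = {
    "R": 304,
    "Python": 209,
    "SQL": 186,
    "Java": 123,
    "SAS": 114,
    "Unix shell/awk/sed": 85,
    "C/C++": 83,
    "MATLAB": 76,
    "Perl": 52,
    "Pig, Hive, or other Hadoop-based languages": 39,
    "GNU Octave": 34,
    "Lisp/Clojure": 25,
    "Ruby": 22,
    "Scala": 14,
    "Julia": 2,
    "Other": 66,
    "None": 4,
}

# Share of voters (in percent) who reported using each language.
share = {lang: 100.0 * n / TOTAL_VOTERS for lang, n in votes.items()}

# Average number of languages selected per voter.
# Assumption: "None" is excluded because it is not a language.
languages_per_voter = (
    sum(n for lang, n in votes.items() if lang != "None") / TOTAL_VOTERS
)

for lang, pct in share.items():
    print(f"{lang}: {pct:.1f}%")
print(f"Average languages per voter: {languages_per_voter:.1f}")
```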
Regional participation was:
- US/Canada: 49.6%
- W. Europe: 24.2%
- Asia: 8.8%
- E. Europe: 5.4%
- Latin America: 5.1%
- AU/NZ: 3.5%
- Africa/Middle East: 3.3%
Comments
Gregory PS, Editor, Additional languages
After the poll, I also got suggestions for
- Groovy - a very powerful object-oriented language that extends Java while making it much easier to use for analytic programmers who are used to languages like SAS.
- Also, PHP and SQL
ed, Excel
More data analysis is done in Excel than in all these tools combined, even if it is only for a quick look at the data.
Gregory PS, Editor:
Excel is great for a quick look at data, but Excel macros are not a good programming language when you need more complex data wrangling.
dean h nelson, Languages for Text Analytics
Is there a reason that Microsoft languages VB and VBA have been left off the list?
(Maybe I'm a newbie who has missed something, i.e. that we may be dealing with Unix-viable languages only.)
Karl Rexer, SPSS should be on this list also
SPSS has a programming language as well as the GUI, so SPSS should be on this list also.
Gregory PS, Editor, SPSS Macro language
SPSS Clementine also has a macro language, and I have written very large programs in it for microarray data analysis, but in my opinion it is not really a separate language like SAS or R.
RGPSoftware
What about MDX? I would have thought that MDX would be on the list (used with OLAP cubes)... Maybe it falls into "Other"...