KDnuggets Home » Polls » Tools / Languages for Data Cleaning (Sep 2008)

What tools/languages do you typically use for data manipulation/cleaning?


 
  
Poll
What tools/languages do you typically use for data manipulation/cleaning? [219 voters total]

SQL / database system (89) 40.6%
Excel (64) 29.2%
SAS (57) 26.0%
R (36) 16.4%
non-SAS data mining tool (35) 16.0%
Java (32) 14.6%
Perl (30) 13.7%
Python (24) 11.0%
Other (18) 8.2%
MATLAB (17) 7.8%
shell/awk/gawk (14) 6.4%
C++/C# (14) 6.4%
Other statistical languages (11) 5.0%
C (11) 5.0%
Other scripting languages (8) 3.7%
Other compiled languages (8) 3.7%
S-Plus (3) 1.4%
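
The percentages above are each option's votes divided by the 219 voters, not by the total number of selections. The counts add up to more than 219, so voters evidently could pick several options. As a rough sketch (the variable and function names here are illustrative and not part of the poll), the arithmetic can be reproduced like this:

```python
# Vote counts copied from the poll results above.
votes = {
    "SQL / database system": 89,
    "Excel": 64,
    "SAS": 57,
    "R": 36,
    "non-SAS data mining tool": 35,
    "Java": 32,
    "Perl": 30,
    "Python": 24,
    "Other": 18,
    "MATLAB": 17,
    "shell/awk/gawk": 14,
    "C++/C#": 14,
    "Other statistical languages": 11,
    "C": 11,
    "Other scripting languages": 8,
    "Other compiled languages": 8,
    "S-Plus": 3,
}

TOTAL_VOTERS = 219  # from the poll header


def share(count, total=TOTAL_VOTERS):
    """Percentage of voters who selected an option, rounded to one decimal."""
    return round(100 * count / total, 1)


# Percentage per option, relative to the number of voters.
percentages = {tool: share(count) for tool, count in votes.items()}

# Total selections exceed the number of voters when multiple answers are allowed.
total_selections = sum(votes.values())
selections_per_voter = total_selections / TOTAL_VOTERS

for tool, pct in percentages.items():
    print(f"{tool} ({votes[tool]}) {pct}%")
print(f"Total selections: {total_selections}")
print(f"Average selections per voter: {selections_per_voter:.2f}")
```

Because each share is computed against the voter count, the percentages are not expected to sum to 100%.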


Comments

Karl Rexer, Other potential response options?
It seems to me that it would be good to list other data mining and statistics tools as potential response options. Most notably absent are SPSS and Clementine. The Rexer Analytics data miner survey showed them to be two of the most commonly used data mining tools. Other KDnuggets polls have also shown that many readers use these tools.

Additionally, I see that the most frequent response is currently "SQL/database system". Future polls on this topic may want to break this into 3 parts: 1) SQL, 2) Oracle, and 3) other database systems.


Estevam R. Hruschka Jr., Alchemy platform
I used to write my own Java code for this task. Since I started using the Alchemy software (http://alchemy.cs.washington.edu/), however, I have been surprisingly happy with the potential of this tool. The idea of combining the benefits of a graphical model with a first-order logic system is really impressive.


Tim Manns, Do you mean actually used or what runs?
I find the question a little ambiguous because you could use a tool that auto-generates code. For example, I use SPSS Clementine to do my data preparation and cleaning, but in the background this is automatically converted into SQL and run entirely in the data warehouse. I don't often write the raw SQL, but it is what performs the data manipulation. Should I simply respond with "SQL"?


Editor: The reason SAS is mentioned and SPSS is not is that SAS is also a language that is frequently used for data processing and cleaning, while SPSS Clementine does not really have a separate language useful for data prep.
