KDnuggets Home » Polls » Operating System and Environment for Data Mining

Poll: Operating System and Environment for Data Mining


What operating systems / environments do you frequently use for your data mining / analytics work? (Choose up to 4) [192 voters]

Windows XP (127) 66.1%
Linux (70) 36.5%
Windows Vista (27) 14.1%
MacOS (22) 11.5%
Windows Other (18) 9.4%
Unix other (15) 7.8%
Solaris (8) 4.2%
GNU Cygwin (7) 3.6%
Other (2) 1.0%

Tim Manns, OS or architecture?
The question is about operating system but I am thinking of architecture.

I use the SPSS Clementine tool on an average-spec Windows laptop, which connects to the Clementine Server engine on a 4 x dual-core Windows Server, which in turn connects to a monster 200 AMP (cluster) Teradata warehouse system.

The data mining work I build in the desktop UI is automatically compiled into an internal command code and sent to the Windows server for processing; some or all of it may be further converted into optimised SQL and forwarded to the data warehouse.

This three-tier architecture requires a Windows desktop, but supports many operating systems for the server and for the database or data file system. Some users skip the middle server component.

In my case the data warehouse usually does all the processing as SQL. Occasionally the server box does some sampling and model building. The desktop rarely even 'sees' the data (corporate security concerns are part of this).
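To illustrate the idea of letting the warehouse do the processing as SQL, here is a minimal sketch in Python. It is not how Clementine works internally; an in-memory SQLite database stands in for the warehouse, and the table name `calls`, its columns, and both function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the data warehouse (an assumption for
# illustration; the comment above describes a Teradata system).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (customer_id INTEGER, minutes REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0), (3, 2.0), (3, 3.0)],
)


def total_minutes_client_side(conn):
    """Pull every row to the client and aggregate there."""
    totals = {}
    rows = conn.execute("SELECT customer_id, minutes FROM calls")
    for customer_id, minutes in rows:
        totals[customer_id] = totals.get(customer_id, 0.0) + minutes
    return totals


def total_minutes_pushed_down(conn):
    """Let the database aggregate; only one row per customer comes back."""
    query = "SELECT customer_id, SUM(minutes) FROM calls GROUP BY customer_id"
    return dict(conn.execute(query).fetchall())
```

Both functions return the same totals, but the second moves only the aggregated rows across the connection, which is the reason the desktop rarely needs to see the raw data.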

I often work from home, connecting my laptop through the internet to our corporate network. My 'queries' are compiled and processed on the data warehouse.

- disclosure: I left SPSS two years ago...

Sam Steingold, CPU? RAM?
An interesting question is whether the CPU is 64 or 32 bit and how much RAM the machine has.
Also, I wonder how many respondents reported their desktop OS rather than that of the server on which the actual data crunching happens, e.g. when using an RDBMS.
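For anyone who wants to check these figures on their own machine, the following is a rough sketch using only the Python standard library. The function name `describe_machine` is made up for this example. Note that the pointer width reflects the running Python build, which may be 32-bit even on a 64-bit CPU, and that the RAM lookup relies on `os.sysconf`, which is not available on Windows.

```python
import os
import platform
import struct


def describe_machine():
    """Return basic facts about the machine running this interpreter."""
    info = {
        "os": platform.system(),
        "machine": platform.machine(),
        # Pointer size of the running Python build: 32 or 64 bits.
        "pointer_bits": struct.calcsize("P") * 8,
        "cpu_count": os.cpu_count(),
        "ram_bytes": None,  # stays None where it cannot be determined
    }
    # Physical RAM via sysconf only works on some Unix-like systems.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
        info["ram_bytes"] = page_size * page_count
    except (AttributeError, ValueError, OSError):
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_machine().items():
        print(f"{key}: {value}")
```

Running this on the server rather than the desktop would answer the question for the machine that actually does the crunching.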

Frank Xavier, Operating system only minor concern
For most state-of-the-art data mining solutions the underlying operating system is a minor issue. Java-based data mining software such as the open source data mining suite RapidMiner runs on any major operating system as long as Java is available:
Windows Vista, Windows XP, MacOS, Linux, Solaris, AIX, HP UX, other UNIX systems, etc.

Issues with higher relevance to the underlying system (hardware and software) are, for example, the amount of main memory available and how it can be addressed (e.g. 32-bit vs. 64-bit hardware, operating system, and data mining software), and how well these data mining solutions support parallel data mining. Here the RapidMiner Enterprise Edition offers 64-bit and multi-core support. The same probably also applies to the server versions of SAS and SPSS, but memory addressing and parallelism are often a problem with desktop-based data mining tools like SPSS Clementine.
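The 32-bit vs. 64-bit point can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope estimate only: the helper names are invented for this example, the table is assumed to be dense with 8-byte values, and the limit shown is the theoretical address space, which is larger than what a 32-bit process can actually use in practice.

```python
def max_addressable_bytes(pointer_bits):
    """Theoretical size of the address space for a given pointer width."""
    return 2 ** pointer_bits


def dense_table_bytes(rows, columns, bytes_per_value=8):
    """Memory needed to hold a dense table of fixed-width values."""
    return rows * columns * bytes_per_value


def fits_in_address_space(rows, columns, pointer_bits):
    """True if the dense table is no larger than the address space."""
    return dense_table_bytes(rows, columns) <= max_addressable_bytes(pointer_bits)
```

With 32-bit pointers the address space is 4 GiB, so a hypothetical table of 10 million rows by 100 columns (8 GB as doubles) cannot be held in memory at once, while a 64-bit process is limited by installed RAM rather than by addressing.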
