Mikut Data Mining Tools Big List – Update
An update of the Excel table describing 325 recent and historical data mining tools is now online (Excel format); 31 of them have been added since the last update in November 2012. The additions include newly published tools and some well-established tools with a statistical background.
Here is the full updated table of tools (XLS format), which contains supplementary material for the paper
R. Mikut, M. Reischl: "Data Mining Tools". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 1, September/October 2011. DOI: 10.1002/widm.24
Please help the authors to improve this Excel table:
Contact: ralf.mikut@kit.edu
Here are parts of the table with the active tools:
License code: CO - commercial, OS - open source.
Data Mining Systems:
| Tool | Company | License | Remarks |
|---|---|---|---|
| 11 Ants | 11Ants Analytics | CO | family of data mining tools with a focus on business applications |
| ADAPA | Zementis Inc. | CO | ADAPA decision engine, a framework to deploy, integrate, and execute predictive models in PMML; add-ins for Excel; IBM cloud solution (Software as a Service, SaaS) |
| Coheris SPAD Data Mining | Coheris | CO | company also provides solutions for text mining; formerly the company SPAD |
| D2K - Data to Knowledge | U. of Illinois | CO/OS | additional tools for EA and text mining, tool I2K for images under development, free academic version, see Alcala09, no developments since 2004 |
| Data Applied | Data Applied | CO | web service for data analysis, SaaS |
| DataDetective | Sentient | CO | with tools for fuzzy matching, applications on CRM, crime analysis, fraud detection |
| GhostMiner | FQS Poland / Fujitsu | CO | multi model support |
| IBM SPSS Modeler | IBM | CO | formerly Clementine, now in cooperation with IBM, Predictive Analytics Software (PASW); SPSS has been an IBM company since 2009 |
| InfiniteInsight | KXEN | CO | (Knowledge eXtraction ENgines) providing predictive software tools (based on Vapnik Learning Theory) to application providers and system integrators |
| JMP | SAS Institute | CO | free trial, additional special tools for genomics |
| KnowledgeStudio | ANGOSS Software | CO | PMML support and code generation |
| Model Builder | FICO | CO | company's former name Fair Isaac Corporation |
| Oracle Data Mining (ODM) | Oracle | CO | provides GUI, PL/SQL-interface, and Java-interface to Attribute Importance, Bayes Classification, Association Rules, Clustering, SVM |
| Partek Discovery Suite | Partek Incorporated | CO | additional special solutions for genomics, free demos |
| PolyAnalyst | Megaputer | CO | from Goebel99, support for text mining |
| Predixion Enterprise Insight | Predixion Software | CO | data mining suite with a focus on standard workflows, big data support, cloud options, OEM options possible |
| RapidAnalytics | Rapid-I GmbH | CO/OS | server built on top of RapidMiner, focussed on client-server solutions, user and user rights management, web interfaces, web services, process scheduler, reports, dashboards; collaborative access for teams and companies with many users |
| RapidMiner | Rapid-I GmbH | OS | formerly YALE, more than 1000 algorithms and operators for data mining, text mining, web mining, time series analysis and forecasting, audio mining, image mining, predictive analytics, ETL, reporting; integrates Weka, R, and Hadoop (Radoop); repository at sourceforge.net/projects/rapidminer/ |
| Revolution R Enterprise | Revolution Analytics | OS/CO | based on open source software R with many additional tools for big data (e.g. Hadoop support) and database coupling, some commercial parts also free for academic use |
| Salford Predictive Modeling Suite (SPM) | Salford Systems | CO | includes the formerly separate tools CART, MARS, TreeNet, Random Forests |
| SAS Enterprise Miner | SAS Institute | CO | one of the world's leading tools, enterprise oriented |
| SQL Server Analysis Services | Microsoft | CO | special coupling to SAP software |
| Stata | StataCorp LP | CO | originally from statistics, many methods included |
| STATISTICA | StatSoft | CO | additional tools for text mining |
| Think Enterprise Data Miner (EDM) | thinkAnalytics | CO | massively scalable, embeddable, Java-based real-time data-mining platform, former name K.wiz |
| TIBCO Spotfire Miner | TIBCO | CO | coupling to S-Plus, R |
| scikit-learn | various | OS | Python-based collection of data mining tools |
| WEKA | U. of Waikato | OS | best-known software, integrated in many other tools, different extensions, e.g. WEKA-CG for human genetics |
Libraries for Data Mining:
| Name | Company | License | Remarks |
|---|---|---|---|
| Fast Artificial Neural Network Library (FANN) | various | OS | multilayer artificial neural networks in C |
| JAVA Data Mining Package | various | OS | JAVA based, alpha version, no update since 2009 |
| Julia | various | OS | open source language for technical computing, still under development (started in 2012), includes some data mining libraries (e.g. decision trees, clustering, LIBSVM), aims at fast analysis for big data, parallel processing etc. |
| LibSVM | National Taiwan University | OS | for support vector classification and regression, C++- and Java-based |
| MLC++ | Silicon Graphics, U. of Stanford | OS | C++ library for supervised learning, included in SGI's MineSet |
| NAG Data Mining Components | Numerical Algorithms Group Ltd (NAG) | CO | components in C++ |
| Neurofusion | Alyuda Research | CO | general-purpose ANN C++ library that can be used to create, train and apply constructive neural networks for solving both regression and classification problems |
| OpenNN | various | OS | open ANN library, multilayer perceptron neural network in C++, former name Flood |
| OpenPR | various | OS | library for image processing, pattern recognition, computer vision and natural language processing, based on C++, Scilab support |
| Orange | U. Ljubljana | OS | Python scripts, extensions for text mining and bioinformatics, see Chen07, Alcala09 |
| ROOT | CERN | OS | C++ support, LGPL license, general parallel processing framework |
| SMILE | U. of Pittsburgh | OS | specialized to Bayesian Networks, developed since 1998 |
| Waffles | various | OS | C++ library, additional command line functionality, some exotic methods |
| XELOPES Library | Prudsys | CO/OS | in Java, C++, different license models, PMML support |
| WEKA | U. of Waikato | OS | best-known software, integrated in many other tools, different extensions, e.g. WEKA-CG for human genetics |
Get the full table at
sourceforge.net/projects/gait-cad/files/wiley_irdmkd_data_mining_tools/tools.xls/download
The color code for the Excel tools table is:
- green: active and relevant tools
- yellow: less active and/or less relevant tools
- red: historical tools or not yet available tools
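The License column in the tables above mixes single codes (CO, OS) with combined ones (CO/OS, OS/CO), so a plain string comparison misses some tools. As a rough illustration only, here is a minimal Python sketch that filters rows by license code. It assumes the rows have been copied out as pipe-delimited text; the helper names `parse_table` and `has_license` are hypothetical and not part of any tool listed here, and the sample holds just three rows taken from the table.

```python
def parse_table(text):
    """Parse a pipe-delimited table into a list of dicts keyed by the header row."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        if header is None:
            header = cells
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def has_license(row, code):
    """True if the row's License cell contains the code, e.g. 'OS' in 'OS/CO'."""
    return code in row["License"].split("/")


# Three rows copied from the table above (remarks shortened).
SAMPLE = """
| Tool | Company | License | Remarks |
|---|---|---|---|
| RapidMiner | Rapid-I GmbH | OS | formerly YALE |
| JMP | SAS Institute | CO | free trial |
| Revolution R Enterprise | Revolution Analytics | OS/CO | based on R |
"""

rows = parse_table(SAMPLE)
open_source = [r["Tool"] for r in rows if has_license(r, "OS")]
print(open_source)
```

Splitting the License cell on `/` treats CO/OS and OS/CO the same way, so tools with mixed licensing show up under both codes. The sketch assumes no cell contains a literal `|`.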