
Libraries and Development Kits for Data Mining


commercial:

  • AC2 (from Isoft), a set of libraries for building data mining solutions on the server side.
  • Analytics1305 Machine Learning Library, with over 15 feature-rich core algorithms, based on the vendor's C++ framework.
  • GEPSR, a COM component for integrating Gene Expression Programming into custom applications.
  • ILNumerics, a numerical library for .NET that turns C# into a first-class mathematical language, with a Matlab-like high-level syntax, high performance, and 2D/3D visualization features. Free Community Edition (GPL).
  • IMSL Numerical Libraries, embeddable mathematical and statistical algorithms written in C, C#/.NET, Java and Fortran that are used in a broad range of data mining applications.
  • Juttle, an open source platform to build analytics and visualizations, with a powerful dataflow programming language, adapters to read from various data backends, and an integrated visualizations library.
  • KnowledgeSTUDIO SDK, offering enterprise application developers access to an extensive API and library of data mining components.
  • K.wiz, an open Java data mining and knowledge discovery platform providing a full API and an extensive range of data mining components. K.wiz Application Enterprise enables building packaged analytical applications using HTML, XML and JavaScript.
  • The LPA Data Mining Toolkit, an embeddable collection of routines that support the discovery of association rules within an RDBMS.
  • Microsoft OLE DB for Data Mining, SQL Server 2005 Books Online.
  • MLF: machine learning framework for Mathematica, a multi-method system for creating understandable computational models from data.
  • NAG Data Mining Components, statistics and machine learning components for data cleaning, transformation and model building -- for creating applications with data mining functionality.
  • Neusciences aXi.DecisionTree and aXi.Kohonen, ActiveX controls for building decision trees and Kohonen clustering. Includes a Delphi interface.
  • PolyAnalyst COM, an SDK offering a large selection of machine learning algorithms as separate COM components for simple integration in external applications.
  • TextAnalyst COM, an SDK for building intelligent applications with semantic analysis, summarization, clustering, categorization, and retrieval of texts and fragments.
  • Waffles, an open source library of machine learning and data mining tools in C++; includes several classifiers, clustering algorithms, dimensionality reduction algorithms, and much more.
  • WizWhy-OCX, an ActiveX control that includes all the functions of the WizWhy rule-finding product.
  • XAffinity(tm), ActiveX toolkit (DLL) for association and sequential analysis in SQL databases.
  • XELOPES, an open platform-independent and data-source-independent library for Embedded Data Mining.

free and open-source:

  • Apache Mahout, a suite of machine learning libraries designed to be scalable and robust.
  • Data Mining Template Library (DMTL), an open-source collection of generic algorithms and data structures for mining complex patterns, including itemsets, sequences, trees and graphs.
  • dlib C++ library, provides readily usable SVM classification, kernel-based regression, neural network, and Bayesian network inference algorithms.
  • Java Data Mining Package (JDMP), an open source Java library for data analysis and machine learning.
  • mloss, machine learning open source software, includes libraries and components.
  • Mulan, an open-source Java library for learning from multi-label data, includes algorithms for classification, ranking, feature selection, and evaluation.
  • Orange, C++ components for data mining; includes preprocessing, modeling and data exploration techniques.
  • Sav Z, Java(TM) API language for developing high-performance mobile object-relational database applications (an improved JDBC). Free download!
  • scikit-learn, machine learning in Python.
  • VFML (Very Fast Machine Learning) library for mining very large databases and data streams. Written in C, it includes highly scalable implementations of several widely used machine learning algorithms and tools for data preparation, testing, and rapid development of stream mining systems.
  • Weka, a collection of machine learning algorithms for solving real-world data mining problems (in Java).
  • YCML, includes machine learning and optimization algorithms, based on YCMatrix, a matrix library; can be used in iOS and OS X applications. On GitHub (GPLv3).