3 Generations of Machine Learning and Data Mining Tools
Three different paradigms are available for implementing Machine Learning (ML) algorithms, both from the literature and from the open source community.
Impetus Blog, by Dr. Vijay Srinivas Agneeswaran, Feb 05, 2013
Three Generations of Tools for Realizing Machine Learning Algorithms
... I give my view of the three generations of Machine Learning tools available to us today:
The first generation ML tools can facilitate deep analytics, as they offer a wide set of ML algorithms. However, not all of them can work on large data sets (terabytes to petabytes of data) because of scalability limitations: they are limited by the non-distributed nature of the tool. In other words, they are vertically scalable (you can increase the processing power of the node on which the tool runs), but not horizontally scalable (not all of them can run on a cluster). No doubt they are addressing those limits by building Hadoop connectors. I am quite sure the traditionalists are up in arms against me on this...
The second generation tools ... provide the ability to scale to large data sets by implementing the algorithms over Hadoop, the open source Map-Reduce implementation. These tools are maturing fast and are open source. ... Mahout has a set of algorithms for clustering and classification, as well as a very good recommendation algorithm. ... Mahout implements only a small subset of ML algorithms over Hadoop - only 25 algorithms are of production quality, with only 8-9 usable over Hadoop.
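To make "implementing the algorithms over Map-Reduce" concrete, here is a minimal sketch of one k-means iteration written as a map step and a reduce step. It is plain, single-process Python, not Hadoop or Mahout code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `kmeans_iteration`) are hypothetical and chosen only for illustration, and the `shuffle` function merely stands in for the grouping by key that a Map-Reduce framework performs between the two phases.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (centroid_index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def shuffle(pairs):
    """Group emitted values by key (illustrative stand-in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: average all points assigned to one centroid."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            total[d] += point[d]
        count += n
    return key, tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce pass and return the updated centroids."""
    groups = shuffle(map_phase(points, centroids))
    new_centroids = list(centroids)  # centroids with no assigned points stay put
    for key, values in groups.items():
        _, new_centroids[key] = reduce_phase(key, values)
    return new_centroids


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids = [(0, 0), (10, 10)]
updated = kmeans_iteration(points, centroids)
```

The map step touches each point independently, so it can be spread across many nodes. Only the per-centroid sums and counts need to be brought together in the reduce step, and that is what lets this style of algorithm scale horizontally. A full k-means run would repeat `kmeans_iteration` until the centroids stop moving.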