2016 Silver Blog

Top 10 Machine Learning Projects on Github

The top 10 machine learning projects on Github include a number of libraries, frameworks, and educational resources. Have a look at the tools others are using and the resources they are learning from.

5. Pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.


★ 3799 stars, 598 forks

Pattern is a Python-based web mining toolkit from the Computational Linguistics & Psycholinguistics (CLiPS) research center at the University of Antwerp. It provides tools for scraping, machine learning, natural language processing, network analysis, and visualization, and can easily mine data from several well-known web services. According to the project, it is well documented and includes numerous examples and unit tests.
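The scrape-then-analyse workflow that Pattern packages up can be sketched with the Python standard library alone. This is not Pattern's API: it assumes the page has already been downloaded as an HTML string, and the function names and the tiny stopword list are hypothetical. It strips markup (skipping `<script>` and `<style>` contents), tokenizes, drops stopwords, and counts words.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def word_frequencies(html, stopwords=frozenset({"the", "a", "of", "and", "is"})):
    """Hypothetical helper: HTML string -> Counter of lowercase, non-stopword words."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.parts).lower()
    words = re.findall(r"[a-z]+", text)  # crude tokenizer: runs of ASCII letters
    return Counter(w for w in words if w not in stopwords)


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<p>The cat and the dog.</p><p>A cat is a cat.</p>"
        "<script>var cat = 1;</script></body></html>")
print(word_frequencies(page).most_common(2))  # [('cat', 3), ('dog', 1)]
```

A real toolkit adds robust downloading, language-aware tokenization, and tagging on top of this; the sketch only shows the shape of the pipeline.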

(Image: CLiPS Pattern)

6. NuPIC (Numenta Platform for Intelligent Computing)

A brain-inspired machine intelligence platform, and a biologically accurate neural network based on cortical learning algorithms.


★ 3647 stars, 987 forks

NuPIC implements the Hierarchical Temporal Memory (HTM) machine learning algorithms. HTM attempts to model the computation performed by the neocortex, and focuses on storing and recalling spatial and temporal patterns. This makes NuPIC well suited to anomaly detection, where the goal is to flag input that breaks previously learned patterns.
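A quantity at the core of HTM anomaly detection is the raw anomaly score: the fraction of currently active columns that the model did not predict at the previous time step. Below is a minimal sketch of that formula, assuming the active and predicted column sets are already available. Producing them (the spatial pooler and temporal memory) is the hard part and is not shown, this is not NuPIC's API, and the column indices are made-up illustrative values.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were NOT predicted.

    0.0 means the input was fully expected; 1.0 means it was entirely novel.
    """
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active, nothing surprising
    predicted_and_active = active & set(predicted_columns)
    return 1.0 - len(predicted_and_active) / len(active)


# Sparse sets of column indices, as an HTM-style model might produce (illustrative values).
predicted = {3, 17, 42, 88, 120}
print(raw_anomaly_score({3, 17, 42, 88, 120}, predicted))   # 0.0: matches the prediction
print(raw_anomaly_score({3, 17, 42, 200, 310}, predicted))  # 0.4: 2 of 5 columns unexpected
```

Because the representations are sparse, even a partial overlap between active and predicted columns is strong evidence that the input fits a learned sequence.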

7. Vowpal Wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.


★ 2949 stars, 827 forks

Vowpal Wabbit is designed for fast modelling of massive datasets and supports parallel learning. The project was started at Yahoo! and is currently developed at Microsoft Research. It uses out-of-core learning, and has been used to learn a tera-feature dataset in an hour across 1000 compute nodes.
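Two of the techniques listed above, hashing and online learning, fit together naturally, and a toy version shows why they suit massive datasets. In this pure-Python sketch (not Vowpal Wabbit's implementation), feature names are hashed into a fixed-size weight table, so no feature dictionary has to be kept, and a linear model is updated by stochastic gradient descent one example at a time, so the data can be streamed rather than loaded into memory. The table size, squared loss, learning rate, and class names are arbitrary assumptions for illustration.

```python
import zlib

NUM_BITS = 18                      # weight table has 2**18 slots (arbitrary choice)
MASK = (1 << NUM_BITS) - 1


def hash_features(features):
    """Hashing trick: map {feature_name: value} to {slot: value}.

    zlib.crc32 is used because Python's built-in hash() of str is salted per process.
    """
    hashed = {}
    for name, value in features.items():
        slot = zlib.crc32(name.encode("utf-8")) & MASK
        hashed[slot] = hashed.get(slot, 0.0) + value  # colliding features share a slot
    return hashed


class OnlineLinearRegressor:
    """Squared-loss linear model updated by SGD, one example at a time."""

    def __init__(self, learning_rate=0.05):
        self.learning_rate = learning_rate
        self.weights = {}          # sparse: only slots that were ever touched

    def _score(self, hashed):
        return sum(self.weights.get(slot, 0.0) * value for slot, value in hashed.items())

    def predict(self, features):
        return self._score(hash_features(features))

    def learn(self, features, label):
        hashed = hash_features(features)
        error = self._score(hashed) - label  # gradient of 0.5 * error**2 w.r.t. the score
        for slot, value in hashed.items():
            self.weights[slot] = self.weights.get(slot, 0.0) - self.learning_rate * error * value


# Stream noise-free examples of y = 2x + 1 through the model several times.
model = OnlineLinearRegressor()
for _ in range(200):
    for x in [0.0, 0.5, 1.0, 1.5, 2.0]:
        model.learn({"bias": 1.0, "x": x}, 2.0 * x + 1.0)
print(round(model.predict({"bias": 1.0, "x": 3.0}), 2))  # close to 7.0
```

Memory use is bounded by the table size no matter how many distinct feature names appear; the price is that an occasional hash collision makes two features share a weight.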

8. aerosolve

A machine learning package built for humans.


★ 2538 stars, 245 forks

aerosolve aims to differ from other libraries by focusing on human-friendly debugging facilities, Scala code for training, an image content analysis engine for easy image ranking, and a feature transformation language that gives users flexibility and control over features. aerosolve uses a Thrift-based feature representation in which features are logically grouped, so that transformations can be applied to, and interactions created between, entire feature groups at once.
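aerosolve defines its feature representation in Thrift and its transforms in its own feature transformation language; neither is reproduced here. The plain-Python sketch below, with made-up family and feature names (`price`, `amenities`, and so on) and hypothetical helper functions, only illustrates the benefit of logical grouping: one call transforms every feature in a family, and another creates interactions between two whole families.

```python
import math

# Hypothetical example: features grouped into named families.
listing = {
    "price": {"nightly": 120.0, "cleaning_fee": 40.0},
    "location": {"dist_to_center_km": 3.5},
    "amenities": {"wifi": 1.0, "kitchen": 1.0},
}


def transform_family(example, family, fn, output_family):
    """Apply fn to every feature in one family at once; store results in a new family."""
    example[output_family] = {name: fn(value) for name, value in example[family].items()}
    return example


def cross_families(example, left, right, output_family):
    """Create a pairwise interaction feature for every (left, right) feature pair."""
    example[output_family] = {
        f"{a}^{b}": va * vb
        for a, va in example[left].items()
        for b, vb in example[right].items()
    }
    return example


transform_family(listing, "price", math.log1p, "log_price")
cross_families(listing, "log_price", "amenities", "log_price_x_amenities")
print(sorted(listing["log_price_x_amenities"]))
# ['cleaning_fee^kitchen', 'cleaning_fee^wifi', 'nightly^kitchen', 'nightly^wifi']
```

Adding a new price feature later requires no change to either call, which is the practical appeal of operating on groups rather than individual features.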

(Image: aerosolve Reviews)

9. GoLearn

Machine Learning for Go.


★ 2334 stars, 215 forks

GoLearn is an actively developed machine learning library for Go. Its goal is to provide a fully featured, simple-to-use, customizable package for Go developers. GoLearn implements the scikit-learn-style fit/predict interface familiar to many users, which makes it easy to swap out estimators, and provides helper functions such as cross-validation and train/test splitting.
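The sketch below illustrates the fit/predict pattern in Python rather than Go and does not use GoLearn's API; the class and function names are hypothetical. Because every estimator exposes the same `fit` and `predict` methods, the same train/test split and k-fold cross-validation helpers work unchanged for a majority-class baseline and a 1-nearest-neighbour classifier.

```python
import random
from collections import Counter
from typing import List, Protocol, Sequence

Row = Sequence[float]


class Estimator(Protocol):
    """Anything with fit/predict can be evaluated by the helpers below."""
    def fit(self, X: Sequence[Row], y: Sequence[str]) -> "Estimator": ...
    def predict(self, X: Sequence[Row]) -> List[str]: ...


class MajorityClass:
    """Baseline that always predicts the most common training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestNeighbor:
    """1-nearest-neighbour classifier (squared Euclidean distance)."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def nearest_label(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, t)) for t in self.X]
            return self.y[dists.index(min(dists))]
        return [nearest_label(row) for row in X]


def accuracy(est: Estimator, X_train, y_train, X_test, y_test) -> float:
    predictions = est.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices once, then cut them into a train part and a test part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def cross_val_accuracy(est: Estimator, X, y, k=5, seed=0):
    """Mean accuracy over k folds of shuffled indices; each fold is the test set once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        scores.append(accuracy(est, [X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / k


# Two well-separated clusters; the same helpers work for either estimator.
X = [[i * 0.1, i * 0.1] for i in range(10)] + [[5 + i * 0.1, 5 + i * 0.1] for i in range(10)]
y = ["a"] * 10 + ["b"] * 10
X_tr, X_te, y_tr, y_te = train_test_split(X, y)
for est in (MajorityClass(), NearestNeighbor()):
    print(type(est).__name__, accuracy(est, X_tr, y_tr, X_te, y_te),
          cross_val_accuracy(est, X, y))
```

Swapping estimators is a one-word change in the final loop; the evaluation code never needs to know which model it is scoring.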

10. Code for Machine Learning for Hackers

Code accompanying the book "Machine Learning for Hackers."


★ 2003 stars, 1446 forks

This repo contains the code from the O'Reilly book Machine Learning for Hackers. All of the code is in R and relies on numerous R packages. Topics covered include the all-too-common tasks of classification, ranking, and regression, as well as statistical procedures such as principal component analysis and multidimensional scaling.
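The book's code is in R, and the following is not taken from it. As a small illustration (Python, standard library only) of one procedure the book covers, principal component analysis, the sketch below finds the first principal component by power iteration on the sample covariance matrix and projects each centred row onto it to get a one-dimensional score. Power iteration is just one way to compute it; R's `prcomp`, for example, uses a singular value decomposition of the centred data matrix.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (means, unit vector) for the leading principal component,
    found by power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]  # w = C v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, component):
    """1-D scores: each centred row's coordinate along the component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(component)))
            for r in rows]


# Points close to the line y = 2x, with a small alternating offset.
rows = [[x, 2 * x + (0.1 if i % 2 else -0.1)] for i, x in enumerate([-2, -1, 0, 1, 2])]
means, pc1 = first_principal_component(rows)
print([round(c, 3) for c in pc1])  # roughly the direction (1, 2) / sqrt(5)
print([round(s, 2) for s in project(rows, means, pc1)])
```

Most of the variance lies along the line y = 2x, so the recovered component points along it and the one-dimensional scores preserve nearly all of the spread in the data.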

* Determined by the top returned results to the query "machine learning" on Github search, sorted by most stars, as of December 10, 2015, 1:00PM EST.