2016 Silver Blog: Top 10 Machine Learning Projects on Github

The top 10 machine learning projects on Github include a number of libraries, frameworks, and education resources. Have a look at the tools others are using, and the resources they are learning from.

Open source software is an important piece of the data science puzzle.

According to the most recent KDnuggets data science software poll results, 73% of data scientists used free software in the previous 12 months. While there are many sources of such tools on the internet, Github has become a de facto clearinghouse for all types of open source software, including tools used in the data science community. The importance, and central position, of machine learning to the field of data science does not need to be pointed out.

The following is an overview of the top 10 machine learning projects on Github.*

[Figure: Github Machine Learning Repo Stars vs Forks]

1. Scikit-learn

Machine learning in Python.


★ 8641 stars, 5125 forks

The top project is, unsurprisingly, the go-to machine learning library for Pythonistas the world over, from industry to academia. Scikit-learn leverages the Python scientific computing stack, built on NumPy, SciPy, and matplotlib. As general purpose a toolkit as there could be, Scikit-learn contains classification, regression, and clustering algorithms, as well as data-preparation and model-evaluation tools.
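To make that breadth concrete, here is a minimal sketch that chains a data-preparation step, a classifier, and an evaluation metric. It is not taken from the project itself: the iris dataset, the logistic regression model, the 75/25 split, and the helper name `train_and_score` are all illustrative assumptions, and it assumes a scikit-learn version that provides `sklearn.model_selection` (0.18 or later).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_score(random_state=0):
    """Hypothetical helper: fit a scaled classifier on iris, return test accuracy."""
    # Load a small built-in dataset (illustrative choice, not from the article).
    X, y = load_iris(return_X_y=True)

    # Hold out 25% of the samples for evaluation, keeping class proportions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # Data preparation (feature scaling) and classification in one pipeline.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Model evaluation: fraction of held-out samples classified correctly.
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"test accuracy: {train_and_score():.3f}")
```

Swapping `LogisticRegression` for a regressor or a clustering estimator keeps the same `fit`/`predict` pattern, which is a large part of why the library works as a general purpose toolkit.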


2. Awesome Machine Learning

A curated list of awesome Machine Learning frameworks, libraries and software.


★ 8404 stars, 1885 forks

This is a curated list of machine learning libraries, frameworks, and software. The list is categorized by language, and further by machine learning category (general purpose, computer vision, natural language processing, etc.). It also includes data visualization tools, which opens it up as more of a generalized data science list in some sense... which is a good thing.

3. PredictionIO

PredictionIO, a machine learning server for developers and ML engineers. Built on Apache Spark, HBase and Spray.


★ 8145 stars, 1002 forks

PredictionIO is a general purpose framework. It includes several customizable template engines for well-known tasks such as classification and recommendation, connects to existing applications through REST APIs or SDKs, and supports Spark MLlib. Since it is built on top of Spark and utilizes its ecosystem, it should come as no surprise that PredictionIO is developed mainly in Scala.


4. Dive Into Machine Learning

Dive into Machine Learning with Python Jupyter notebook and scikit-learn.


★ 4326 stars, 342 forks

This is a collection of IPython notebook tutorials for scikit-learn, as well as a number of links to related Python-specific and general machine learning topics, and more general data science information. The author isn't greedy either; they are quick to point out many other tutorials covering similar ground, in case this one doesn't tickle your fancy. The repo contains no software, but if you're new to Python machine learning, it may be worth checking out.