KDnuggets Home » News » 2016 » Aug » Software » Top Machine Learning Projects for Julia ( 16:n31 )

Top Machine Learning Projects for Julia

Julia is gaining traction as a legitimate alternative programming language for analytics tasks. Learn more about these 5 machine learning related projects.

If you don't know, Julia is "a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments."


Julia is fast, and enjoys support from and integration with the Jupyter notebook environment. Julia can call C directly without a wrapper, integrates top-tier open source C and Fortran code into its Base library, and can easily call Python as well. Julia is built for parallel and cloud computing, and has attracted particular interest from the analytics and scientific computing communities. According to KDnuggets' most recent analytics software poll, Julia placed 8th on the list of most used programming languages. This isn't exactly destroying the competition, yet JavaScript did not even make the list, and a recent offering of top machine learning projects in that language proved very popular.

I originally come from a general computer science background, moving toward machine learning specialization from there, and so my weapon of choice has always been Python. When I pushed myself to take up another language, Julia became my focus, and so here we are.

The following is a collection of machine learning projects for Julia. They aren't all machine learning libraries; one of the projects is a collection of supporting functionality for implementing machine learning algorithms. The selection of these projects is, unfortunately, not objective; I have continually found that attempting to quantify and rank disparate projects really leads nowhere interesting and diminishes the overall usefulness of such a list. Therefore, the items selected for inclusion are those which I, myself, have decided to use for my adventures in learning Julia.

As always, items are numbered for convenience and amusement. The top 5 machine learning projects, as subjectively selected by me, are as follows (feel free to tweet me with your dissatisfaction with my choices, if you feel it necessary):

1. MLBase.jl

This seems like a good place to start. MLBase is a self-described "Swiss knife for machine learning." MLBase does not implement any machine learning algorithms; it is a collection of supporting tools, such as those for preprocessing, score-based classification, performance evaluation metrics, model tuning, and more.
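To make "performance evaluation metrics" concrete, here is a minimal sketch of two of the most common ones, a confusion matrix and a correct rate (accuracy). It is written in plain Python, the language I am coming from, and it illustrates the idea only; the function names `confusion_matrix` and `correct_rate` are my own choices for this example and should not be read as MLBase's API.

```python
def confusion_matrix(truth, predicted, labels):
    """Count (true label, predicted label) pairs.

    Rows are true labels, columns are predicted labels,
    both in the order given by `labels`.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def correct_rate(truth, predicted):
    """Fraction of predictions that match the true labels."""
    hits = sum(1 for t, p in zip(truth, predicted) if t == p)
    return hits / len(truth)


# Toy data, made up for illustration.
truth = ["cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "cat", "cat", "dog"]

cm = confusion_matrix(truth, predicted, labels=["cat", "dog"])
acc = correct_rate(truth, predicted)
print(cm)   # rows: true cat, true dog
print(acc)
```

Here one of the three dogs is mislabeled as a cat, which shows up as the single off-diagonal count in the matrix.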

MLBase's documentation is good, and comes with a number of code examples for each of its tools.

2. ScikitLearn.jl

For those of us coming from Python, this is a potential lifesaver.

From its repo:

ScikitLearn.jl implements the popular scikit-learn interface and algorithms in Julia. It supports both models from the Julia ecosystem and those of the scikit-learn library (via PyCall.jl).

ScikitLearn.jl is quick to point out that it is not an official port of scikit-learn; however, the fact that it implements scikit-learn's familiar interface and combines both Python and Julia models makes it an attractive library.

The project has a great quick start guide as well as a number of fantastic examples as Jupyter notebooks.

3. MachineLearning.jl

And now, machine learning algorithms in Julia itself. MachineLearning.jl has not had a commit in a year; however, given that it aims to be a general purpose machine learning library for Julia, with a number of algorithms and support tools, it's a good stopover for those exploring machine learning in the language.

Project goals, from its repo:

Initially, the package will be targeted towards the machine learning practitioner, working with a dataset that fits in memory on a single machine. Longer term, I hope this will both target much larger datasets and be valuable for state of the art machine learning research as well.

The library currently includes the following algorithms: decision tree classifier, random forest classifier, basic neural network, and Bayesian additive regression trees. It also includes functionality for splitting datasets into training and testing sets and performing cross-validation.
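For readers new to the idea, the cross-validation functionality mentioned above boils down to partitioning the row indices of a dataset into k folds and holding each fold out in turn. The following is a rough Python sketch of that bookkeeping under my own assumptions; `kfold_indices` is a hypothetical helper written for this example, not a function from MachineLearning.jl.

```python
import random


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n row indices are shuffled once, dealt into k folds,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(kfold_indices(n=10, k=5))
for train, test in splits:
    print(sorted(test), "held out;", len(train), "rows used for training")
```

A single train/test split is the special case of using just one of these pairs.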

It may not be clear what the future holds for MachineLearning.jl, but the project provides some basic functionality for experimentation as well as an environment for picking up machine learning in Julia.

4. Mocha.jl


Mocha ticks an awful lot of the boxes you would expect of a modern deep learning library. From its repo:

Mocha is a Deep Learning framework for Julia, inspired by the C++ framework Caffe. Efficient implementations of general stochastic gradient solvers and common layers in Mocha could be used to train deep / shallow (convolutional) neural networks, with (optional) unsupervised pre-training via (stacked) auto-encoders.

Mocha's documentation has a collection of tutorials and a thorough user guide; Mocha is definitely not an under-documented deep learning project.
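If "stochastic gradient solvers" is an unfamiliar phrase, the core loop is small enough to show in full. The sketch below fits a one-variable linear model with plain stochastic gradient descent in Python. It is a toy under stated assumptions (noise-free data, a fixed learning rate, squared error loss) and says nothing about how Mocha itself structures its solvers; `sgd_linear_fit` is a name invented for this example.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.1, epochs=500, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)            # visit the samples in a new order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2x + 1, made up for illustration.
xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

A deep learning framework applies the same update rule to the weights of every layer, with the gradients supplied by backpropagation.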

5. TextAnalysis.jl

TextAnalysis.jl is an actively developed Julia library for text analysis. It provides functionality for the preprocessing of documents, corpus creation, document term matrices, TF-IDF, Latent Semantic Analysis, Latent Dirichlet Allocation, and more. It seems to be the place to start if you are interested in text analytics using Julia.
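To give a feel for what a document term matrix with TF-IDF weighting contains, here is a small Python sketch. It assumes whitespace tokenization and the unsmoothed textbook weighting (term frequency times the log of inverse document frequency); libraries, TextAnalysis.jl included, may tokenize, smooth, or normalize differently. The `tf_idf` function is a hypothetical helper for this example only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, matrix) with one row per document.

    Each entry is (count of term in doc / doc length)
    * log(number of docs / number of docs containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [
            (counts[term] / len(doc)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ]
        matrix.append(row)
    return vocabulary, matrix


# Toy corpus, made up for illustration.
docs = ["julia is fast", "python is popular", "julia calls python"]
vocab, weights = tf_idf(docs)
for doc, row in zip(docs, weights):
    print(doc, [round(value, 3) for value in row])
```

Terms that appear in only one document, such as "fast", receive the highest weights, while a term shared across documents, such as "is", is discounted.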

If you are interested in learning Julia, this is a good place to start.

If you want to find additional algorithm-specific machine learning projects for Julia, look here.