Choosing an Open Source Machine Learning Library: TensorFlow, Theano, Torch, scikit-learn, Caffe

Open source is at the heart of innovation and the rapid evolution of technology these days. Here we discuss how to choose open source machine learning tools for different use cases.

Torch: Facebook-Backed Framework Powered by Lua Scripting Language

Torch is often called the easiest deep learning tool for beginners. It has a simple scripting language, Lua, and a helpful community sharing an impressive array of tutorials and packages for almost any deep learning purpose. Despite using a less common language than Python, it’s widely adopted – Facebook, Google, and Twitter are known for using it in their AI projects.

Datasets and models

You can find a list of popular datasets that can be loaded into Torch on its GitHub cheatsheet page. Moreover, Facebook released official code implementing Deep Residual Networks (ResNets), along with pre-trained models and instructions for fine-tuning them on your own datasets.

Audience and learning curve

Regardless of the differences and similarities between frameworks, the choice will always come down to the language. The pool of experienced Lua engineers will always be smaller than the pool of Python engineers. However, Lua is significantly easier to read, which is reflected in the simple syntax of Torch. The active Torch contributors swear by Lua, so it’s a framework of choice for both novices and those wishing to expand their toolset.

Use cases

Facebook used Torch to create DeepText, a tool that categorizes the text posts shared on the site minute by minute and enables more personalized content targeting. With the help of Torch, Twitter has been able to recommend posts through an algorithmic timeline instead of reverse chronological order.

scikit-learn: Accessible and Robust Framework from the Python Ecosystem

In November 2016, scikit-learn became the number one open source machine learning project for Python, according to KDnuggets.

scikit-learn is a high-level framework designed for supervised and unsupervised machine learning algorithms. As one of the components of the Python scientific ecosystem, it’s built on top of the NumPy and SciPy libraries, each responsible for lower-level data science tasks. While NumPy sits on Python and deals with numerical computing, the SciPy library covers more specific numerical routines such as optimization and interpolation. scikit-learn, in turn, was built specifically for machine learning. The relationship between the three, along with other tools in the Python ecosystem, reflects different levels in the data science field: the higher you go, the more specific the problems you can solve.

Figure: The Python NumPy-based ecosystem includes tools for array-oriented computing.
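
The layering described above can be illustrated with a minimal sketch that solves the same toy problem at each level. The data, the helper function line, and all variable names are made up for illustration; only the library calls (numpy.array, scipy.optimize.curve_fit, and sklearn.linear_model.LinearRegression) are real APIs.

```python
import numpy as np
from scipy import optimize
from sklearn.linear_model import LinearRegression

# NumPy level: plain array-oriented numerical computing.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0  # toy, noise-free data on a line with slope 2 and intercept 1


# SciPy level: a general-purpose numerical routine (least-squares curve fitting).
def line(x, slope, intercept):
    """Hypothetical model function used only for this illustration."""
    return slope * x + intercept


(scipy_slope, scipy_intercept), _ = optimize.curve_fit(line, x, y)

# scikit-learn level: a machine learning estimator with a fit/predict interface.
# Estimators expect a 2-D feature matrix, hence the reshape.
model = LinearRegression().fit(x.reshape(-1, 1), y)
sk_slope, sk_intercept = model.coef_[0], model.intercept_

print(scipy_slope, scipy_intercept)
print(sk_slope, sk_intercept)
```

Both approaches should recover roughly the same slope and intercept. The difference is in the level of abstraction: SciPy asks you to define the model function yourself, while scikit-learn wraps the whole task in a ready-made estimator.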

Datasets and models

The library already includes a few standard datasets for classification and regression, although they are too small to represent real-life situations. Still, the diabetes dataset for measuring disease progression and the iris plants dataset for pattern recognition are good for illustrating how machine learning algorithms in scikit-learn behave. Moreover, the library provides information about loading datasets from external sources, includes sample generators for tasks like multiclass classification and decomposition, and offers recommendations on using popular datasets.
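
As a minimal sketch of the bundled datasets and sample generators mentioned above, the snippet below loads the iris and diabetes datasets and generates a synthetic multiclass problem. The generator parameters (200 samples, 10 features, 3 classes) are arbitrary values chosen for illustration.

```python
from sklearn.datasets import load_diabetes, load_iris, make_classification

# Bundled toy datasets: small enough to load instantly.
iris = load_iris()          # classification: 150 samples, 4 features, 3 classes
diabetes = load_diabetes()  # regression: 442 samples, 10 features

print(iris.data.shape, iris.target_names)
print(diabetes.data.shape)

# Sample generator: a synthetic multiclass classification problem.
# The sizes below are illustrative assumptions, not recommendations.
X_syn, y_syn = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,  # fixed seed so the generated data is reproducible
)
print(X_syn.shape, y_syn.shape)
```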

Audience and learning curve

Despite being a robust library, scikit-learn focuses on ease of use and documentation. Given its simplicity and numerous well-described examples, it’s an accessible tool for non-experts and neophyte engineers, enabling quick application of machine learning algorithms to data. According to testimonials by software shops AWeber and Yhat, scikit-learn is well suited to production environments with limited time and human resources.
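
To give a sense of what quick application looks like in practice, here is a short sketch that trains and evaluates a classifier in a handful of lines. The choice of a k-nearest neighbors classifier, the 25% test split, and the random seed are assumptions made for this example; any other scikit-learn estimator would follow the same fit and score pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load features and labels from the bundled iris dataset.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every scikit-learn estimator shares the same interface: fit, predict, score.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in a different algorithm usually means changing only the import and the line that constructs the estimator, which is a large part of why the library is approachable for newcomers.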

Use cases

scikit-learn has been adopted by a plethora of successful brands, like Spotify, Evernote, and e-commerce giant Birchbox, for product recommendations and customer service. However, you don’t have to be an expert to explore data science with the library. For example, the technology and engineering school Télécom ParisTech uses the library in its machine learning courses to allow students to quickly solve interesting problems.

Caffe/Caffe2: Easy to Learn Tool with Abundance of Pre-Trained Models

While Theano and Torch are designed for research, Caffe is a special-purpose machine learning library for image classification; it isn’t fit for text, sound, or time series data. Support from Facebook and the recently open-sourced Caffe2 have made the library a popular tool with 248 GitHub contributors.

Although Caffe has been criticized for slow development, its successor Caffe2 has been eliminating the problems of the original technology by adding flexibility, a lightweight design, and support for mobile deployment.

Datasets and models

Caffe encourages users to get familiar with datasets provided by both the industry and other users. The team fosters collaboration and links to the most popular datasets on which models have already been trained with Caffe. One of the biggest benefits of the framework is Model Zoo, a vast reservoir of pre-trained models created by developers and researchers, which lets you use a model as is, combine models, or learn to train a model of your own.

Audience and learning curve

The Caffe team claims that you can skip the learning part and start exploring deep learning with the existing models straight away. The library is targeted at developers who want to experience deep learning firsthand, and it offers resources that promise to expand as the community develops.

Use cases

Caffe is built around state-of-the-art Convolutional Neural Networks (CNNs), deep neural networks that have been successfully applied to visual imagery analysis and even power vision in self-driving cars. With them, Caffe allowed Facebook to develop its real-time video filtering tool for applying famous artistic styles to videos. Pinterest also used Caffe to expand its visual search function and allow users to discover specific objects in a picture.

When Demand Matches Proposition

The number of machine learning tools appearing on the market and the number of projects undertaken by businesses of all sizes and fields create a continuous, self-supporting cycle. The more ML efforts businesses initiate, the more tools and services are created, and therefore the cheaper and more accessible they become.

Even now, at the dawn of machine learning, we have such a wide range of options that it’s hard to make a choice. The most important takeaway is that each of these projects was created for a particular set of scenarios, and your task is to find the ones that best fit your approach.

Original. Reposted with permission.