PredictionIO (Open Source Version) vs Microsoft Azure Machine Learning

Azure Machine Learning and PredictionIO are tools that both have similar visions and similar features, but when digging deeper you’ll notice key differences and key advantages to each.

By Louis Dorard.

Azure Machine Learning and PredictionIO both make it easier for developers and data scientists to put predictive models learned from data into production. The former is a platform for creating predictive APIs hosted on the Microsoft Azure cloud, whereas the latter is an open source machine learning server that you run on your own infrastructure, also to expose predictive models as APIs. Last month, Microsoft announced a new version of Azure ML at the Strata conference, with new features such as support for Hadoop and Spark, both of which are already used in PredictionIO. It’s clear that both companies have similar visions and similar features, but when digging deeper you’ll notice key differences and key advantages to each.

Similar visions

One major hurdle that companies encounter in their machine learning projects is taking data scientists’ work to production in order to deliver predictions to end users (who’ll ultimately benefit from them). There’s clearly a divide between data scientists and system folks, which makes things complicated. Azure ML and PredictionIO both provide solutions to the “last mile problem” of deploying predictive models into production by simplifying the process.
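
To make the idea of a “predictive API” concrete, here is a minimal sketch in Python using only the standard library. It is not how Azure ML or PredictionIO work internally; the `predict` function, its feature names and its weights are hypothetical stand-ins for a model a data scientist would have trained.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a linear score with made-up weights."""
    weights = {"visits": 0.5, "cart_items": 2.0}  # hypothetical values
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return {"score": score, "will_buy": score >= 3.0}


def handle_query(body):
    """Turn a JSON request body into a JSON response body."""
    features = json.loads(body)
    return json.dumps(predict(features))


class PredictionHandler(BaseHTTPRequestHandler):
    """Answers POST requests carrying a JSON object of feature values."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request_body = self.rfile.read(length).decode("utf-8")
        payload = handle_query(request_body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Blocks forever; call it from a script to expose the model over HTTP."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Calling `handle_query('{"visits": 2, "cart_items": 1}')` returns the prediction as a JSON string, and `serve()` makes the same logic reachable over HTTP. Everything this sketch leaves out (training, versioning, monitoring, scaling) is what both tools aim to take off your hands.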

Both tools cover most of today’s applied machine learning tasks: classification, regression, recommendation, clustering and anomaly detection. With both, you can easily swap machine learning algorithms until you settle on what works best for your problem. On Azure, Microsoft’s algorithms are used by default (supposedly the same as those used in Xbox, Bing, Cortana...), whereas PredictionIO comes with Spark’s MLlib library, a deep learning library and other JVM-based algorithm libraries. You can still use other libraries or your own custom algorithms. Microsoft recently added support for Python, which makes it possible to take existing code based on scikit-learn and Pandas, for instance, and run it on Azure. Because PredictionIO is open source, you are theoretically not restricted to any particular language, but you might want to take advantage of Spark and Scala for distributed processing.
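
Swapping algorithms until one works best boils down to giving every candidate the same interface and scoring each on held-out data. The sketch below illustrates that idea in plain Python; the two toy classifiers, the data and the `accuracy` helper are made up for the example and are not part of either product.

```python
from collections import Counter


class MajorityClass:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predicts the label whose mean training point is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda lbl: squared_distance(row, self.centroids[lbl]))
            for row in X
        ]


def accuracy(model, X_train, y_train, X_test, y_test):
    """Train on one split, return the share of correct predictions on the other."""
    predictions = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


# Toy data, invented for illustration: two well-separated groups of points.
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]]
y_train = ["a", "a", "a", "b", "b"]
X_test = [[1.1, 1.0], [4.8, 5.1], [5.1, 5.0], [0.8, 0.9]]
y_test = ["a", "b", "b", "a"]

# Because both candidates share fit/predict, trying another one is a one-line change.
candidates = {"majority": MajorityClass(), "centroid": NearestCentroid()}
scores = {
    name: accuracy(model, X_train, y_train, X_test, y_test)
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

The same fit/predict convention is what lets existing scikit-learn code slot into such a comparison with little change.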

Both organizations have also been working on the ability to create and reuse predefined templates and workflows, which help their users launch predictive APIs more quickly than with traditional development methods. Microsoft uses its Azure Marketplace to let people share templates, and PredictionIO has built a templates gallery. Take the example of e-commerce websites: adding product recommendations is more or less the same problem from one site to the next, so you will rarely need custom development; you can just reuse an existing template and tie it to your own data sources.
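
As a rough illustration of what “reuse a template and tie it to your own data” means, the sketch below separates a generic recommendation recipe from the site-specific data source. `CoPurchaseTemplate` and `shop_purchases` are hypothetical names invented for this example; real templates on either platform are considerably richer.

```python
from collections import defaultdict
from itertools import combinations


class CoPurchaseTemplate:
    """Hypothetical template: 'customers who bought X also bought Y'."""

    def __init__(self, data_source):
        # data_source: any callable returning an iterable of (user, item) pairs.
        self.data_source = data_source
        self.co_counts = defaultdict(lambda: defaultdict(int))

    def train(self):
        # Group purchases by user, then count how often two items share a basket.
        baskets = defaultdict(set)
        for user, item in self.data_source():
            baskets[user].add(item)
        for items in baskets.values():
            for a, b in combinations(sorted(items), 2):
                self.co_counts[a][b] += 1
                self.co_counts[b][a] += 1
        return self

    def recommend(self, item, n=3):
        neighbours = self.co_counts.get(item, {})
        # Highest co-purchase count first; ties broken alphabetically.
        ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:n]]


def shop_purchases():
    """The only site-specific part: where the purchase events come from."""
    return [
        ("ann", "tent"), ("ann", "stove"), ("ann", "lantern"),
        ("bob", "tent"), ("bob", "stove"),
        ("cat", "tent"), ("cat", "boots"),
    ]


engine = CoPurchaseTemplate(shop_purchases).train()
top = engine.recommend("tent", n=2)
print(top)
```

Another shop would keep `CoPurchaseTemplate` untouched and only supply its own function in place of `shop_purchases`.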

Azure’s strengths

Although installing PredictionIO is super easy (just a one-line command, or you can fire up an already provisioned Amazon instance or a snap in 5 seconds), with Azure there’s nothing to install at all. Everything is done from the browser, including the authoring of models and experiments. This happens within a canvas (quite similar to that of RapidMiner) where each data processing task is represented as a block, and where you compose blocks by dragging and dropping them and connecting those that are compatible. The canvas exposes the details of machine learning algorithms, but it makes their use accessible to non-programmers (or bad programmers). You can work on the canvas with others thanks to the team sharing feature, which allows several people to collaborate on the same workspace.

Azure ML’s interface with its canvas (in the middle)

One advantage of working with a cloud platform such as Azure is auto-scaling: models are deployed elastically, so you don’t have to worry about scaling out your APIs. If you’re adding a recommender system to your e-commerce site and it gets popular, you won’t be in trouble when traffic increases. The learning and prediction phases both run on Microsoft’s infrastructure, which you don’t have to manage. You could replicate this auto-scaling with PredictionIO in the cloud, for example with their predefined Amazon CloudFormation stack, but it’s a little more involved.
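
To give a feel for what an auto-scaler decides on your behalf, here is a deliberately simple sketch of a scaling rule. It is an assumption made for illustration, not the policy used by Azure or by any CloudFormation stack; the function name, parameters and limits are all hypothetical.

```python
import math


def replicas_needed(requests_per_second, capacity_per_replica,
                    min_replicas=1, max_replicas=20):
    """Number of API servers to run so that each stays within its capacity.

    All parameters are illustrative: capacity_per_replica is the load one
    server can handle, and the min/max bounds keep the service available
    and the bill bounded.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Quiet night, normal day, and a traffic spike after the site gets popular.
for load in (0, 120, 5000):
    print(load, replicas_needed(load, capacity_per_replica=50))
```

With a managed platform, this kind of decision and the provisioning that follows happen without your involvement; on your own infrastructure, you are the one who has to wire up the monitoring and the rule.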