PredictionIO (Open Source Version) vs Microsoft Azure Machine Learning

Azure Machine Learning and PredictionIO share a similar vision and similar features, but digging deeper reveals key differences and distinct advantages to each.

PredictionIO’s strengths

In Azure ML, the ability to operationalize models and turn them into APIs with a click comes at a price: first, your data has to live in Azure, which is a no-go for organizations with sensitive data; second, you pay for both the infrastructure and the service (with PredictionIO you only pay for your infrastructure). There is a free tier, but deploying in production requires the standard tier at $10/month per seat, on top of which you pay for “studio” usage at $1/hour (for experimenting) and for API usage, billed on the number of hours the API is active and the number of transactions (see Azure’s pricing details).

Several advantages of PredictionIO come from the fact that it’s open source and based on Spark:

  • You own everything you do with it and you can host anywhere you like: locally (even without an internet connection), on your own infrastructure, on a private cloud, or on a public one.
  • You can write and use your own custom data processing tasks in Scala, and their execution will be distributed if you have a cluster of compute nodes, whereas custom scripts on Azure always run on a single node; a minimal sketch follows this list.
  • There are no restrictions on the size of the training data you can ingest (10 GB max on Azure) or on the number of concurrent requests to your model’s API (20 max on Azure).
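
To make the second point concrete, here is a minimal sketch of the kind of custom data processing task you could run. It is plain Spark code in Scala rather than anything PredictionIO-specific, and the input path and “userId,itemId,rating” CSV layout are made up for illustration; on a Spark cluster, every transformation in this job is automatically distributed across the worker nodes.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CustomFeatureJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("custom-feature-job"))

    // Hypothetical input: one "userId,itemId,rating" line per event.
    val events = sc.textFile("hdfs:///events/ratings.csv")

    // Custom feature extraction: average rating given by each user.
    // Each stage below runs in parallel across the cluster's nodes.
    val avgRatingPerUser = events
      .map(_.split(","))
      .map(fields => (fields(0), (fields(2).toDouble, 1L)))
      .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
      .mapValues { case (sum, count) => sum / count }

    avgRatingPerUser.saveAsTextFile("hdfs:///features/avg-rating-per-user")
    sc.stop()
  }
}
```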

Beyond being open source, PredictionIO is particularly interesting for its DASE framework, which structures data processing flows. DASE refers to the four stages in the creation of a predictive model (called an “engine” in PredictionIO terminology): Data preparation, Algorithm, Serving, Evaluation.

  • Data preparation is where you extract features from your data sources.
  • Algorithm is where you specify a type of machine learning model along with its parameters.
  • Serving is where you add any business logic to post-process model outputs and deliver responses to prediction API queries (see the sketch below).
  • Evaluation is where you define metrics and methods to assess the performance of models and thus compare them.
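
As an illustration of how these stages map to code, here is a sketch of a Serving component, loosely modeled on the shape of PredictionIO’s engine templates. The Query, ItemScore and PredictedResult case classes are hypothetical types for an item-scoring engine, and the io.prediction.controller import path corresponds to the pre-Apache releases (Apache PredictionIO later moved to the org.apache.predictionio package).

```scala
import io.prediction.controller.LServing

// Hypothetical engine-specific types for an item-scoring engine.
case class Query(user: String, num: Int) extends Serializable
case class ItemScore(item: String, score: Double) extends Serializable
case class PredictedResult(itemScores: Array[ItemScore]) extends Serializable

// The Serving component receives one PredictedResult per algorithm configured
// in the engine and can apply business logic before the response is returned
// to the caller of the prediction API.
class Serving extends LServing[Query, PredictedResult] {
  override def serve(query: Query,
                     predictedResults: Seq[PredictedResult]): PredictedResult = {
    // Example business rule: merge the outputs of all algorithms, keep the
    // best score per item, and return the top `query.num` items.
    val merged = predictedResults
      .flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) => ItemScore(item, scores.map(_.score).max) }
      .toArray
      .sortBy(-_.score)
      .take(query.num)
    PredictedResult(merged)
  }
}
```

The other stages follow the same pattern: data source and preparator classes produce the training data, an algorithm class implements training and prediction, and an evaluation defines the metrics; an engine factory then assembles them into a deployable engine.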

PredictionIO Engine

[Figure: PredictionIO’s engine components and the commands used to work with them (build, train, deploy)]