Machine Learning Wars: Amazon vs Google vs BigML vs PredicSis
Comparing four machine learning APIs (Amazon Machine Learning, BigML, Google Prediction API and PredicSis) on real data from Kaggle, we find the most accurate, the fastest, the best trade-off, and a surprise last place.
By Louis Dorard
UPDATE - NEW BIGML RESULTS: As pointed out by Francisco Martin, if you just change the objective field (SeriousDlqin2yrs) to be numeric instead of categorical, BigML's accuracy for a single model goes to 0.853 (whereas it was initially reported as 0.790 - the accuracy in the table above and the Kaggle rank below have been updated to reflect that).
Amazon ML (Machine Learning) made a lot of noise when it came out last month. Shortly afterwards, someone posted a link to Google Prediction API on Hacker News and it quickly became one of the most popular posts. Google’s product is quite similar to Amazon’s, but it is actually much older: it was introduced in 2011. This gave me the idea of comparing the performance of Amazon’s new ML API with that of Google’s. For that, I used the Kaggle “Give Me Some Credit” challenge. But I didn’t stop there: I also included startups that provide competing APIs in this comparison, namely PredicSis and BigML. In this wave of new ML services, the giant tech companies are getting all the headlines, but bigger companies do not necessarily have better products.
Here is a tweet-size summary:
- Amazon Machine Learning: most accurate
- BigML: fastest
- PredicSis: best trade-off
- Google (Prediction API): last
Methodology
The ML problem in the Kaggle credit challenge is binary classification: you are given a dataset of input-output pairs, where each input describes an individual who has applied for credit and the output says whether that person later defaulted. The idea is to use ML to predict whether a new credit applicant will default.
ML has two phases: train and predict. The “train” phase uses a set of input-output examples to create a model that maps inputs to outputs. The “predict” phase applies the model to new inputs to get predictions of the associated outputs. Amazon ML, Google Prediction API, PredicSis and BigML all have similar API methods for each phase:
- One method that takes in a dataset (in CSV format, for instance), and that returns the id of a model trained on this dataset.
- One method that takes a model id and an input, and that returns a prediction.
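To make that two-method shape concrete, here is a minimal sketch in Python. It is not the client of any of the four services: `ToyPredictionService`, its `train` and `predict` methods, the nearest-centroid model behind them and the toy rows are all assumptions made up for illustration. Only the `SeriousDlqin2yrs` column name comes from the challenge data.

```python
import csv
import io
import uuid


class ToyPredictionService:
    """Hypothetical in-memory stand-in for a hosted ML API (no vendor's real client)."""

    def __init__(self):
        self._models = {}  # model id -> {label: {feature name: mean value}}

    def train(self, csv_text, target):
        """Train phase: take a CSV dataset, return the id of a model trained on it."""
        sums, counts = {}, {}
        for row in csv.DictReader(io.StringIO(csv_text)):
            label = row.pop(target)  # the output column; the rest are inputs
            counts[label] = counts.get(label, 0) + 1
            acc = sums.setdefault(label, {})
            for name, value in row.items():
                acc[name] = acc.get(name, 0.0) + float(value)
        # The "model" is just the mean input vector (centroid) of each class.
        centroids = {
            label: {name: total / counts[label] for name, total in acc.items()}
            for label, acc in sums.items()
        }
        model_id = uuid.uuid4().hex
        self._models[model_id] = centroids
        return model_id

    def predict(self, model_id, features):
        """Predict phase: take a model id and one input, return the predicted label."""
        centroids = self._models[model_id]

        def squared_distance(center):
            return sum((float(features[name]) - mean) ** 2
                       for name, mean in center.items())

        # Pick the class whose centroid is closest to the input.
        return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Made-up rows for illustration; these are not taken from the Kaggle dataset.
TOY_CSV = """debt_ratio,late_payments,SeriousDlqin2yrs
0.2,0,0
0.3,1,0
0.9,4,1
0.8,5,1
"""

service = ToyPredictionService()
model_id = service.train(TOY_CSV, target="SeriousDlqin2yrs")
prediction = service.predict(model_id, {"debt_ratio": 0.85, "late_payments": 4})
print(model_id, prediction)
```

The real services differ in how you upload the data and in what sits behind the model id, but the calling pattern is the same: train once, keep the id, then send one input per prediction request.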