Cloud Machine Learning Wars: Amazon vs IBM Watson vs Microsoft Azure

Amazon recently announced Amazon Machine Learning, a cloud machine learning solution for Amazon Web Services. Able to pull data effortlessly from RDS, S3 and Redshift, the product could pose a significant threat to Microsoft Azure ML and IBM Watson Analytics.

Model Selection
Amazon Machine Learning simply requires that the user select a target variable. The user then chooses whether to train with default settings or to configure a custom model. Even under the Advanced Settings for model creation, the available choices are limited to model size, L1 vs. L2 regularization (or neither), and the magnitude of the regularization parameter (which Amazon calls the regularization amount). As with data preprocessing, Amazon Machine Learning assumes that its target customers want an automatic solution. Presumably, under the hood Amazon is running either logistic regression or an SVM for classification, and linear regression or low-order polynomial regression to predict numerical quantities.
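To make the regularization settings concrete, here is a minimal sketch of logistic regression trained by stochastic gradient descent with an optional L1 or L2 penalty. It is not Amazon's implementation, which is not public. The function name train_logistic and the parameters reg_type, reg_amount, lr and passes are hypothetical names chosen for illustration.

```python
import math


def sign(value):
    """Return -1.0, 0.0 or 1.0 depending on the sign of value."""
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def train_logistic(rows, labels, reg_type=None, reg_amount=0.0, lr=0.1, passes=200):
    """Fit logistic regression with SGD.

    reg_type is None, "l1" or "l2"; reg_amount scales the penalty.
    All names here are illustrative, not part of any Amazon API.
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(passes):
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = prediction - y  # gradient of the log loss w.r.t. z
            for j in range(n_features):
                grad = error * x[j]
                if reg_type == "l2":
                    grad += reg_amount * weights[j]        # shrinks weights proportionally
                elif reg_type == "l1":
                    grad += reg_amount * sign(weights[j])  # constant pull toward zero
                weights[j] -= lr * grad
            bias -= lr * error  # the bias term is left unregularized
    return weights, bias


# Toy data: one feature, classes separated around x = 1.5.
rows = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
for reg in (None, "l1", "l2"):
    weights, bias = train_logistic(rows, labels, reg_type=reg, reg_amount=1.0)
    print(reg, weights, bias)
```

On separable toy data like this, the unregularized weight keeps growing with more passes, while either penalty holds it in check. A larger regularization amount trades training fit for smaller weights.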

Once a model is selected, the service asks whether the user would like to hold out data from the training set for validation or to provide holdout data from a different source. With these selections made, Amazon ML trains the model on the given dataset. Using the sample dataset of dummy bank customers (5MB in size), training takes roughly 10 minutes. For a binary classification task, the evaluation metric Amazon ML reports is the area under the ROC curve (AUC).
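For readers unfamiliar with these two steps, the following sketch shows one way to hold out part of a dataset and to compute AUC from labels and scores. The helper names holdout_split and auc, and the 30% holdout fraction, are assumptions made for illustration. The article does not say how Amazon ML splits data or computes the metric internally.

```python
import random


def holdout_split(rows, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (training_rows, holdout_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train_rows, holdout_rows = holdout_split(list(range(10)))
print(len(train_rows), len(holdout_rows))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch. A production implementation would typically sort the scores once instead. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.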


Amazon's pricing model is straightforward. In addition to the standard charges for data storage, etc., Amazon charges for making predictions. Each real-time prediction costs $0.0001, and batch predictions cost $0.10 per 1,000 predictions (the same price per prediction). Additionally, Amazon charges $0.42 per hour for model building. This last charge is somewhat opaque: the user may have no way to reason about how long a model should take to build, knowing neither which precise model is chosen, nor the number of model parameters, nor how many passes will be made through the data (assuming the service is running in default mode). A nice additional feature for the service would be cost estimation prior to running the algorithm, especially for models run with the default settings, which are the hardest to reason about.
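The arithmetic behind these prices is simple enough to sketch. The function below uses the rates quoted above; the name estimate_cost and its parameters are hypothetical, and it ignores storage and any other AWS charges. It also assumes the billed build time equals the time the user waits, which the article does not confirm.

```python
# Rates as quoted in this article; they may not reflect current pricing.
REALTIME_PRICE_PER_PREDICTION = 0.0001
BATCH_PRICE_PER_1000_PREDICTIONS = 0.10
MODEL_BUILD_PRICE_PER_HOUR = 0.42


def estimate_cost(build_hours, realtime_predictions=0, batch_predictions=0):
    """Return the estimated cost in dollars for building a model and predicting."""
    build_cost = build_hours * MODEL_BUILD_PRICE_PER_HOUR
    realtime_cost = realtime_predictions * REALTIME_PRICE_PER_PREDICTION
    batch_cost = batch_predictions / 1000 * BATCH_PRICE_PER_1000_PREDICTIONS
    return build_cost + realtime_cost + batch_cost


# A 10-minute build, as observed on the sample dataset, plus one million
# batch predictions.
print(round(estimate_cost(10 / 60, batch_predictions=1_000_000), 2))
```

Under these assumptions, a 10-minute build costs about seven cents, so for any sizable workload the prediction charges dominate: one million predictions cost $100 in either mode.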


Amazon's cloud machine learning service is narrower in scope than either IBM's Watson Analytics or Microsoft's Azure ML offerings in the space. However, it is by far the smoothest of the services. The service has a clear use case, data acquisition is effortless, and it's clear who might use the product. In contrast, Azure ML assumes that its customers know nearly enough to build a model themselves but want a GUI. Watson Analytics, when we tested it, couldn't handle enterprise-scale data. Watson appeared focused more on data visualization and exploration than on specific prediction problems. As Amazon's service does not feature deep learning or machine perception functionality, and can only be trained on supplied datasets (as opposed to more universal datasets like ImageNet or large text corpora), it's unlikely to compete directly with MetaMind.

Zachary Chase Lipton is a PhD student in the Computer Science and Engineering department at the University of California, San Diego. Funded by the Division of Biomedical Informatics, he is interested in both theoretical foundations and applications of machine learning. In addition to his work at UCSD, he has interned at Microsoft Research Labs. He will be working for Amazon this summer as a Machine Learning Scientist.