HPE Haven OnDemand and Microsoft Azure Machine Learning: Power Tools for Developers and Data Scientists
While both HPE and Microsoft offer machine learning platforms with numerous possibilities for developers and data scientists, HPE Haven OnDemand is a diverse collection of data APIs designed with flexibility in mind, allowing developers to quickly perform data tasks in the cloud.
Data is everywhere. Data is big, complex, and growing exponentially in volume. Perhaps most importantly, data is not a fad, and the challenges associated with it are not going anywhere. Organizations are inundated with data these days, and turning it from liability into asset is a challenge; the greatest potential asset is insight. With so many data-related tools available today, it can be difficult to know where to begin looking for help.
This is where HPE Haven OnDemand comes in. HPE Haven OnDemand is a cloud services platform which simplifies how you can interact with data, allowing it to be transformed into an asset anytime, anywhere. HPE Haven OnDemand provides a collection of machine learning application programming interfaces (APIs) for interacting with structured and unstructured data in a variety of ways.
A sample of HPE Haven OnDemand’s 60+ APIs includes:
- Connectors – website, Dropbox, SharePoint, enterprise onsite
- Format conversion – Optical Character Recognition (OCR), text extraction
- Text analysis – language identification, sentiment analysis, document categorization
- Index and search – find related concepts, find similar, add to text index, parametric search
- Prediction – prediction and recommendation
HPE Haven OnDemand also includes APIs for anomaly detection, trend analysis, and a variety of other analytics APIs. A full overview of the APIs can be found here.
HPE Haven OnDemand is currently making headlines with its Machine Learning as a Service offering, hosted on Microsoft’s Azure cloud. Since both companies offer a machine learning platform, this article contrasts the two offerings and toolsets. We will take an introductory look at HPE Haven OnDemand, with a focus on one of the most common and useful contemporary data-related tasks: prediction. A prediction task will be undertaken and the process discussed. To put HPE Haven OnDemand’s services in perspective, a similar process will be carried out using Azure Machine Learning, and the differences between the two platforms will be highlighted.
HPE Haven OnDemand
First off, signing up for HPE Haven OnDemand is straightforward, with common sign-on options including Google, Facebook, and Twitter authentication, as well as HP Passport. I was signed up with my Google ID in a matter of seconds. The only other step necessary prior to employing the various APIs is to generate an API key, after which I was on my way. More information on getting started can be found here.
In order to make predictions, some data is obviously required. For this walkthrough, the classic Adult dataset, constructed from 1994 US census data, will be used to predict whether or not a given individual has an income in excess of $50K per year, based on 14 attributes including age, occupation, and marital status. More on the Adult dataset can be found here.
Of course, the data needs to be cleaned and prepared. After removing instances with missing values, the dataset was split into training and testing sets. Note that, while missing values were removed for simplicity, HPE Haven OnDemand is perfectly able to deal with such data incompleteness.
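The cleaning and splitting step above can be sketched in a few lines of Python. The rows below are invented Adult-style samples for illustration; the real dataset marks missing values with "?", which is the convention assumed here:

```python
import random

# Invented Adult-style rows; "?" marks a missing value, as in the real dataset.
rows = [
    ["39", "State-gov", "Bachelors", "Never-married", "<=50K"],
    ["50", "?", "Bachelors", "Married-civ-spouse", ">50K"],
    ["38", "Private", "HS-grad", "Divorced", "<=50K"],
    ["53", "Private", "11th", "Married-civ-spouse", "<=50K"],
]

# Drop any instance containing a missing value.
clean = [r for r in rows if "?" not in r]

# Shuffle, then split into training and testing sets (75/25 here).
random.seed(42)
random.shuffle(clean)
split = int(len(clean) * 0.75)
train, test = clean[:split], clean[split:]
```

With a real file, the same filtering loop would simply read rows via the `csv` module instead of an inline list.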
Train Prediction API
In order to perform prediction, whether with HPE Haven OnDemand or any other tool, a basic two-step process must be followed: first, a predictive model is trained; the model is then used to make predictions. Using the HPE Haven OnDemand Train Prediction API, a model can be trained using a simple web interface, HTTPS, or a cURL call. This flexibility allows the APIs to be called from just about any programming language. Also note that a selection of supported client libraries exists to simplify API calls and integrate functionality into a variety of apps. This particular walkthrough will employ the web interface.
In order to use the API, training data must be provided, and HPE Haven OnDemand accommodates JSON or CSV files, URLs, and object store references. The prediction_field and service_name parameters must also be set, which represent the dataset attribute to be predicted and a unique name for the constructed prediction model, respectively.
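As a sketch, the required parameters could be assembled as follows. Note that the endpoint URL, the example API key, and the `adult_income_model` name are assumptions for illustration; check the Train Prediction API reference for the exact path and version before calling it:

```python
# Endpoint URL is an assumption based on Haven OnDemand's URL scheme;
# verify against the API documentation before use.
TRAIN_URL = "https://api.havenondemand.com/1/api/async/trainprediction/v1"  # assumed

def build_params(api_key, prediction_field, service_name):
    """Assemble the required parameters for a Train Prediction call."""
    return {
        "apikey": api_key,                     # key generated from your account
        "prediction_field": prediction_field,  # dataset attribute to predict
        "service_name": service_name,          # unique name for the model
    }

params = build_params("your-api-key", "class", "adult_income_model")
# A POST to TRAIN_URL with these parameters plus the training CSV attached
# (e.g. via curl -F, or any HTTP client) starts the training job.
```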
Supported data types currently include RICH_TEXT, DOUBLE, and INTEGER. To facilitate data validation, I edited the Adult dataset CSV training and testing files by inserting first a header row defining the column names, immediately under which I added a row describing the datatype of each attribute (RICH_TEXT, DOUBLE, INTEGER). Training could now be performed via the ‘Try It’ tab of the API website.
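The two inserted rows can be added programmatically. This sketch uses a few invented Adult-style columns and rows; with the real files, only the header and datatype lists would change:

```python
import csv
import io

# Invented sample of original CSV content (no header row yet).
raw = "39,State-gov,77516\n50,Self-emp-not-inc,83311\n"

header = ["age", "workclass", "fnlwgt"]       # column names
dtypes = ["INTEGER", "RICH_TEXT", "INTEGER"]  # Haven OnDemand data types

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)   # first row: column names
writer.writerow(dtypes)   # second row: datatype of each attribute
for row in csv.reader(io.StringIO(raw)):
    writer.writerow(row)  # original data rows, unchanged

annotated = out.getvalue()
```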
After selecting the file for upload and providing the few required parameters, the service was started in asynchronous mode (HPE Haven OnDemand warns that prediction can be a lengthy process), and a few seconds later returned this output:
Success! HPE Haven OnDemand trained a model just as simply as it had promised. It also provided a cURL call below the output which could be used to perform the same task via direct call in the future, or integrated into our own application. Sample HTTPS calls are also provided in the API overview.
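Because the service runs asynchronously, the response contains a job ID that is then polled for status. The response field name and status URL below are assumptions illustrating that pattern, not the documented format; consult the API docs for the real shapes:

```python
import json

# Assumed shape of an async response; the real field name may differ.
sample_response = json.loads('{"jobID": "abc123"}')

# Assumed status endpoint illustrating the polling pattern.
STATUS_URL = "https://api.havenondemand.com/1/job/status/{}"

def status_url(response):
    """Build the polling URL for an asynchronous training job."""
    return STATUS_URL.format(response["jobID"])

url = status_url(sample_response)
# Repeatedly GET `url` (with your apikey) until the job reports completion,
# then retrieve the trained model's details from the final result.
```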
It should be noted that HPE Haven OnDemand uses a number of prediction algorithms and runs each several times with different parameters in order to select and return the best-performing statistical model. This delivers a strong result without the need to build, test, and continually tune the parameters of numerous prediction algorithms yourself.
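That select-the-best idea can be illustrated with a toy validation loop. The data and the two candidate "models" here are invented stand-ins for the real algorithms Haven OnDemand tries internally:

```python
# Invented (age, earns_over_50k) validation pairs.
validation = [(25, 0), (32, 0), (45, 1), (51, 1), (60, 1), (22, 0)]

def majority_class(age):
    return 0  # always predict <=50K

def age_threshold(age):
    return 1 if age >= 40 else 0  # predict >50K for age 40 and up

def accuracy(model):
    """Fraction of validation pairs the model labels correctly."""
    hits = sum(1 for age, label in validation if model(age) == label)
    return hits / len(validation)

# Try every candidate and keep the one that scores best on held-out data.
candidates = {"majority_class": majority_class, "age_threshold": age_threshold}
best_name = max(candidates, key=lambda name: accuracy(candidates[name]))
```

A real service would sweep many algorithms and parameter settings, but the selection step is the same: score each candidate on held-out data and return the winner.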