API for Prediction and Machine Learning: poll results and analysis
APIs are sets of procedures that provide easy-to-use, automated, robust solutions to recurring programming challenges. Here, we analyze the machine learning APIs that major players in the big data domain are providing.
By Jasmien Lismont, Tine Van Calster, Maria Oskarsdottir, Jan Vanthienen, Bart Baesens, Wilfried Lemahieu (KU Leuven).
In April 2015, a poll on KDnuggets surveyed the use of machine learning APIs (ML APIs) for analytics. The poll received 53 responses, which are presented in this report.
Usage of ML APIs
Figure 1 shows which APIs are used by more than 10% of a total of 46 respondents. Note that respondents could select more than one answer. Strikingly, Indico, a text and image analytics tool, is used by 41.3% of respondents. This might indicate that text and image mining techniques are either lacking in stand-alone tools or highly complex to use there. A similar tool is Alchemy API (10.9%). Next in line is Microsoft’s Azure (19.6%), which allows for both quick, user-friendly analytics and more elaborate analytics using R or Python. Other tools focus on scalable advanced analytics by connecting to the cloud or using parallel processing, such as GraphLab (17.4%), H2O (17.4%), PredictionIO (15.2%) and Google’s Prediction API (10.9%). BigML also scores relatively high (13%). It focuses on rapid, easy-to-use analytics and is limited to a couple of problem settings and techniques.
Figure 1: ML APIs used by more than 10% of the respondents.
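Because respondents could select more than one API, the Figure 1 percentages do not add up to 100%. The short Python sketch below converts the reported percentages back into approximate respondent counts. It assumes that each percentage is a whole number of respondents divided by 46 and rounded to one decimal; the implied counts are a reconstruction under that assumption, not figures published with the poll.

```python
# Percentages reported for Figure 1 (46 respondents, multiple answers allowed).
N_RESPONDENTS = 46
reported_pct = {
    "Indico": 41.3,
    "Azure": 19.6,
    "GraphLab": 17.4,
    "H2O": 17.4,
    "PredictionIO": 15.2,
    "BigML": 13.0,
    "Alchemy API": 10.9,
    "Google Prediction API": 10.9,
}


def implied_count(pct, n):
    """Nearest whole number of respondents behind a reported percentage.

    Assumption: the percentage is count / n, rounded to one decimal.
    """
    return round(pct * n / 100)


counts = {api: implied_count(p, N_RESPONDENTS) for api, p in reported_pct.items()}
total_selections = sum(counts.values())
pct_sum = round(sum(reported_pct.values()), 1)

for api, count in counts.items():
    print(f"{api:<22} {count:>2} respondents")

# A sum above 100% is expected for a multiple-answer question.
print("sum of reported percentages:", pct_sum)
print("selections per respondent (listed APIs only):",
      round(total_selections / N_RESPONDENTS, 2))
```

The selections-per-respondent figure covers only the APIs listed in Figure 1, so it is a lower bound: APIs chosen by fewer than 10% of respondents are not included.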
Furthermore, we analyzed the top three preferred APIs of a total of 42 respondents, as shown in Figure 2. Again, Indico is well ahead (47.6%), but GraphLab (16.7%), H2O (16.7%), Azure (16.7%) and BigML (14.3%) also score well. This might indicate that analysts are trying out different tools, although they do not always prefer those tools in the end. Moreover, it might also indicate, on the one hand, a strong need and preference for text and image mining tools and, on the other hand, a need for easy-to-use, scalable tools.
Figure 2: Top 3 ML APIs (preferred by more than 10% of respondents).
User motivation for ML APIs
Before we dive into the reasons why respondents choose certain APIs, we look at their use of stand-alone tools such as SAS, RapidMiner, R, Excel, Weka, etc., and at the perceived security of APIs.
Usage of stand-alone tools next to ML APIs
A combination of stand-alone tools and ML APIs is the most common setup, reported by 49% of the 49 respondents (see Figure 3). Nevertheless, 20.4% use only one API and 16.3% combine multiple APIs. Another 14.3% apply only stand-alone tools.
Figure 3: Usage of stand-alone tools.
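Unlike the usage question, the four categories in Figure 3 are mutually exclusive, so their shares should add up to 100%. The Python sketch below checks this and derives the share of respondents who use at least one ML API. As before, the whole-number counts are an assumption (each percentage taken as a count out of 49, rounded to one decimal), and the category labels are paraphrased from the text.

```python
# Shares reported for Figure 3 (49 respondents, one answer each).
N_RESPONDENTS = 49
usage_pct = {
    "stand-alone tools combined with ML APIs": 49.0,
    "one API only": 20.4,
    "multiple APIs": 16.3,
    "stand-alone tools only": 14.3,
}

# Assumption: each share is a whole-number count divided by 49.
usage_counts = {
    label: round(pct * N_RESPONDENTS / 100) for label, pct in usage_pct.items()
}

# Sanity checks: exclusive categories should cover every respondent once.
pct_total = round(sum(usage_pct.values()), 1)
count_total = sum(usage_counts.values())

# Everyone except the "stand-alone tools only" group uses at least one ML API.
api_users = count_total - usage_counts["stand-alone tools only"]
api_user_share = round(100 * api_users / N_RESPONDENTS, 1)

print("shares add up to:", pct_total, "%")
print("implied counts add up to:", count_total)
print("respondents using at least one ML API:", api_users, f"({api_user_share}%)")
```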
Perceived security of ML APIs
Figure 4 shows whether respondents feel that an API is secure enough with regard to the privacy of their data. The majority (53.1%) of 49 respondents believe the tools are secure enough, but almost one third (28.6%) believe that an API needs the option of a private cloud before they can trust its security. The remaining 18.4% do not believe APIs are secure enough at this moment. This might suggest that API providers need to investigate how they can ensure the security of their APIs and/or convince their users of that security.
Figure 4: How secure are ML APIs according to our respondents?
Motivation for ML API usage
Finally, we take a look at the motivation for applying certain APIs, as shown in Figure 5. Almost 60% (57.8%) of 45 respondents use ML APIs because they are easy to use. This might explain the preference for tools such as BigML, for instance. Furthermore, almost 50% want access to known and proven algorithms and are motivated by fast development. The top five is completed by scalability (33.3%) and easy evaluation (31.1%). Budgeting reasons might also come into play, since APIs often demand lower initial investments and might require less management. Accessibility and good integration with other tools also explain the use of APIs. Less important, however, are end-to-end support for business problems (which might signal that APIs are often combined with other tools), pooled resources, data storage solutions, visualization, and sharing.
Figure 5: Motivation for ML APIs.