From Big Data Platforms to Platform-less Machine Learning
The rise of serverless architectures, together with cloud providers' marketplaces, creates significant momentum to democratize big data analytics. Machine learning and AI services are far more valuable, tangible and easier for businesses to understand than clumsy big data platforms.
By Stepan Pushkarev – CTO of Hydrosphere.io
Platform-centric and vendor-driven architectures should be deprecated if the companies using them consider superior technology to be their core differentiator from competitors. Amazon would never have become the giant it is today if its ecommerce business had been built on Magento or its warehouse operations had been powered by SAP ERP.
Lessons learned from past technology choices should shape the roadmaps of big data analytics projects; otherwise, yet another Data Science, IoT, Big Data, AI, DevOps or Streaming platform will lock you in and slow down your innovation in the long run. Numerous legacy mainframe, marketing, content management and billing platforms slow enterprises down all over the world instead of improving their processes.
Most companies need to rely on a core in-house architecture built on an ecosystem of interchangeable analytics, machine learning and AI services. This democratizes the technology stack and unlocks the advantage of plugging in ready-to-go serverless applications.
What is a “Service”?
For the sake of simplicity, consider analytics, machine learning or AI services as smart SaaS applications that can either be deployed in your own cloud or be hosted by a third-party provider.
Players in this industry are more or less familiar with serverless architectures based on AWS offerings. Following this model and packaging recommendation, image recognition, fraud detection, pricing optimization and risk modeling services as seamless add-ons for the AWS, Azure or GCE ecosystems makes them much more attractive to prospective clients.
The beauty of this approach lies in the ability to cherry-pick in-house enterprise services, commercial applications and open source projects, all of which can then be deployed as services.
Buy analytics services.
Pick and choose self-contained offerings for text processing, image recognition, predictive search and more, according to your requirements. These services bring immediate value and can later be replaced with newer alternatives.
Third-party services can be consumed as SaaS or, if they are packaged appropriately, deployed on-premises, right in your own cloud.
Build your own services.
Prepare and train your in-house team to deliver personalization, recommendation or log analytics services as internal smart SaaS applications with a clear API, a bounded context and SLA requirements, instead of incomplete machine learning models trained in a closed-off environment.
Even if the service is hosted on your own servers, from an operations perspective it has to be delivered and consumed like a serverless application.
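To make "a clear API, a bounded context and SLA requirements" a bit more concrete, here is a minimal Python sketch of an internal recommendation service. It is an illustration under stated assumptions, not Hydrosphere.io's design: the class and field names (RecommendationService, RecommendationRequest, sla_ms) are hypothetical, the "model" is a toy popularity ranking standing in for a trained model, and the SLA is modeled simply as a per-request latency budget that each response reports against.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical request/response types: the service's entire public contract.
@dataclass(frozen=True)
class RecommendationRequest:
    user_id: str
    limit: int = 5


@dataclass(frozen=True)
class RecommendationResponse:
    user_id: str
    items: List[str]
    latency_ms: float
    within_sla: bool


class RecommendationService:
    """Wraps a model behind a narrow API and reports against a latency SLA."""

    def __init__(self, model: Callable[[str], List[str]], sla_ms: float = 100.0):
        self._model = model  # any callable: user_id -> ranked item ids
        self.sla_ms = sla_ms  # assumed SLA: max latency per request, in ms

    def recommend(self, request: RecommendationRequest) -> RecommendationResponse:
        # Validate input at the boundary so callers get a clear error.
        if request.limit <= 0:
            raise ValueError("limit must be positive")
        start = time.perf_counter()
        items = self._model(request.user_id)[: request.limit]
        latency_ms = (time.perf_counter() - start) * 1000.0
        return RecommendationResponse(
            user_id=request.user_id,
            items=items,
            latency_ms=latency_ms,
            within_sla=latency_ms <= self.sla_ms,
        )


# Toy stand-in for a trained model: the same popularity ranking for everyone.
POPULAR = ["book-1", "book-2", "book-3", "book-4", "book-5", "book-6"]


def popularity_model(user_id: str) -> List[str]:
    return list(POPULAR)


service = RecommendationService(popularity_model, sla_ms=100.0)
resp = service.recommend(RecommendationRequest(user_id="u42", limit=3))
print(resp.items, resp.within_sla)
```

Because consumers only see the request and response types, the model behind the service can be retrained or replaced without touching any caller, which is the operational "serverless" experience the paragraph above asks for.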
Hydrosphere.io builds open-source tools that simplify deploying, serving and monitoring the quality of machine learning pipelines trained with Apache Spark. Predictive services can then be packaged and deployed on your own infrastructure and data lake, whether built on AWS, Azure or GCE or on Hortonworks, Cloudera, EMR or MapR distributions.
Prediction.io combines Apache Spark, HBase, Spray and Elasticsearch into an end-to-end machine learning server.
Besides the open source stack, there are plenty of proprietary MLaaS options, such as Azure Machine Learning, Amazon Machine Learning, Google Prediction API, BigML and DataRobot. These cover most common ML use cases, but as with any black-box drag-and-drop system, they pose obvious challenges, such as integration with data feeds and the DevOps toolset.
The purpose of this paper is not to compare cloud machine learning tools. The main idea is to propose an ecosystem that democratizes all of these platforms and frameworks. With the right architecture, an image recognition service could be built on the Google Cloud Vision API and later replaced with IBM Watson, while, in the same ecosystem, a churn prediction service runs on Apache Spark and Hydrosphere.io.
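The "right architecture" for swapping one vision vendor for another can be sketched in Python as a narrow interface with interchangeable adapters, in the spirit of the ports-and-adapters pattern. This is a minimal sketch under stated assumptions: VisionVendorA and VisionVendorB are hypothetical stubs that return fixed labels and do not call the Google Cloud Vision API or IBM Watson; a real adapter would wrap the corresponding vendor SDK. The point is that ProductTagger depends only on the ImageLabeler interface, so the backend can be switched by configuration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ImageLabeler(ABC):
    """Interface the business code depends on; vendors plug in behind it."""

    @abstractmethod
    def labels(self, image: bytes) -> List[str]:
        ...


class VisionVendorA(ImageLabeler):
    # Hypothetical stub; a real adapter might call the Google Cloud Vision API.
    def labels(self, image: bytes) -> List[str]:
        return ["cat", "sofa"]


class VisionVendorB(ImageLabeler):
    # Hypothetical stub; a real adapter might call IBM Watson.
    def labels(self, image: bytes) -> List[str]:
        return ["Cat", "Couch"]


class ProductTagger:
    """Business service that knows nothing about any specific vendor."""

    def __init__(self, labeler: ImageLabeler):
        self.labeler = labeler

    def tag(self, image: bytes) -> List[str]:
        # Normalize vendor output (case, duplicates, order) behind the interface.
        return sorted({label.lower() for label in self.labeler.labels(image)})


# Hypothetical registry: operations pick the backend by name, not by code change.
BACKENDS: Dict[str, Type[ImageLabeler]] = {
    "vendor_a": VisionVendorA,
    "vendor_b": VisionVendorB,
}


def build_tagger(backend: str) -> ProductTagger:
    return ProductTagger(BACKENDS[backend]())


print(build_tagger("vendor_a").tag(b"raw image bytes"))
print(build_tagger("vendor_b").tag(b"raw image bytes"))
```

Switching from "vendor_a" to "vendor_b" changes only a configuration value; ProductTagger and its callers stay the same, and the normalization step keeps vendor-specific quirks such as label casing from leaking into the rest of the system.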
Sell your analytics services.
If you are a data scientist or developer, you have an excellent opportunity to build your own machine learning service covering a use case from a particular domain and sell it through the same "as a service" ecosystem. This can create a recurring revenue stream for you, since you sit right between the cloud provider and the enterprise customer.
The rise of serverless and microservices architectures, working in seamless conjunction with all sorts of cloud provider marketplaces, creates significant momentum for the community to democratize big data analytics and make it accessible to enterprise customers. Machine learning and AI services are far more valuable, tangible and easier for businesses to understand than the half-baked monoliths of clumsy big data platforms.
Bio: Stepan Pushkarev is CTO of Hydrosphere.io. Stepan co-founded and led engineering teams for eCommerce, IoT and Ad-tech companies. Having been responsible for the full product stack, from math models and infrastructure & operations to web & mobile applications, as well as for hiring, establishing engineering culture and setting up the delivery process, he combines strong technology, management and entrepreneurial backgrounds.