Accelerating AI with MLOps

Companies are racing to adopt AI, but despite its vast promise, most AI projects fail. Examining and resolving operational issues upfront can help AI initiatives reach their full potential.



By Yochay Ettun, CEO and Co-founder of cnvrg.io.

Despite its vast potential, artificial intelligence (AI) hasn't taken hold in most industries. The majority of AI projects hit a wall. Accenture estimates that 80% to 85% of companies' AI projects are stuck in the proof-of-concept stage, and nearly 80% of enterprises fail to scale AI deployments across the organization.

 

Obstacles to operationalizing AI

 

Once a model is built, many organizations struggle to get it into production. It can take months, even years, to deliver value from AI. According to Gartner's research, only 53% of AI projects make it from prototype to production.

There are still very few established standards for developing AI/machine learning (ML) models. Data scientists are skilled mathematicians, but to bring models to production successfully, they also need experience in software development methodologies and DevOps. Models are often created from scratch without consistent software development processes, testing procedures, or KPIs for measuring their performance.

Skilled data scientists are a scarce resource. Companies would prefer they spend their time on what they do best: delivering high-impact algorithms. But more often than not, due to unforeseen operational issues, they are consumed by other tasks, including tracking and monitoring, configuration, compute resource management, serving infrastructure, feature extraction, and model deployment.

In addition, today's infrastructure landscape is a jungle. There are endless combinations of compute options a data scientist can use for different AI workloads, including CPUs, GPUs, AI accelerators, cloud, on-premises, hybrid cloud, co-location, and others. As a result, executing jobs with high performance at a reasonable price involves considerable complexity and unforeseen challenges.

 

Accelerating AI deployments with MLOps

 

This is where MLOps (machine learning operations) can make a difference in helping data scientists tackle operational issues. MLOps is a set of practices for collaboration and communication between data scientists and operations professionals to automate the deployment of machine learning and deep learning models in large-scale production environments. It makes it easier to align models with business needs, implement standard and consistent development processes, and scale and manage diverse data pipelines.
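To make that concrete, here is a minimal, framework-agnostic sketch of what a consistent train-validate-package pipeline might look like in Python: every candidate model goes through the same steps and must clear a fixed accuracy gate before it is packaged for serving. The file names, the 0.80 threshold, and the step layout are illustrative assumptions, not features of any particular MLOps product.

```python
# Minimal sketch of a repeatable train-validate-package pipeline.
# Paths, the 0.80 accuracy gate, and step names are illustrative assumptions.
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_GATE = 0.80            # consistent KPI every candidate model must pass
ARTIFACT_DIR = Path("artifacts")


def load_data(csv_path: str = "training_data.csv"):
    """Load a labeled dataset (assumed to have a 'label' column) and split it."""
    df = pd.read_csv(csv_path)
    X, y = df.drop(columns=["label"]), df["label"]
    return train_test_split(X, y, test_size=0.2, random_state=42)


def train(X_train, y_train):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    return model


def validate(model, X_test, y_test) -> float:
    return accuracy_score(y_test, model.predict(X_test))


def package(model, accuracy: float) -> None:
    """Serialize the model and its metrics as deployable artifacts."""
    ARTIFACT_DIR.mkdir(exist_ok=True)
    joblib.dump(model, ARTIFACT_DIR / "model.joblib")
    (ARTIFACT_DIR / "metrics.txt").write_text(f"accuracy={accuracy:.4f}\n")


if __name__ == "__main__":
    X_train, X_test, y_train, y_test = load_data()
    model = train(X_train, y_train)
    accuracy = validate(model, X_test, y_test)
    if accuracy < ACCURACY_GATE:            # block deployment of weak models
        raise SystemExit(f"Model rejected: accuracy {accuracy:.3f} < {ACCURACY_GATE}")
    package(model, accuracy)
```

In practice, an MLOps platform would version the artifacts, schedule the runs, and record the metrics automatically; the point of the sketch is simply that the process is codified and repeatable rather than ad hoc.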

Most companies use MLOps to automate ML pipelines, monitoring, lifecycle management, and governance. Using MLOps, models can be monitored to detect degradation in performance or drift in results that indicates the models need to be retrained with more complete or fresher data. It is also used to continuously experiment with new methods and take advantage of new technology, such as the latest language processing and text and image analysis.
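As a rough illustration of that monitoring loop, the sketch below watches two common signals: a Kolmogorov-Smirnov test that compares a feature's live distribution against its training distribution, and a rolling accuracy window that flags degradation. The thresholds and window size are assumptions chosen for the example; a real deployment would tune them per model and typically hand the retraining trigger to an automated pipeline.

```python
# Illustrative monitoring sketch: flag data drift and accuracy degradation.
# Thresholds and window size are example assumptions, not recommended defaults.
from collections import deque

import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01      # flag drift when distributions differ this strongly
ACCURACY_FLOOR = 0.75     # flag degradation below this rolling accuracy
WINDOW = 1_000            # number of recent predictions to track

recent_correct = deque(maxlen=WINDOW)


def check_feature_drift(training_values: np.ndarray, live_values: np.ndarray) -> bool:
    """Two-sample Kolmogorov-Smirnov test between training and live data."""
    result = ks_2samp(training_values, live_values)
    return result.pvalue < DRIFT_P_VALUE


def record_outcome(prediction, actual) -> bool:
    """Track rolling accuracy; return True once the model looks degraded."""
    recent_correct.append(prediction == actual)
    if len(recent_correct) < WINDOW:
        return False                      # not enough evidence yet
    return np.mean(recent_correct) < ACCURACY_FLOOR


if __name__ == "__main__":
    # Simulated example: the live feature distribution has shifted.
    rng = np.random.default_rng(0)
    train_feature = rng.normal(0.0, 1.0, 10_000)
    live_feature = rng.normal(0.5, 1.0, 2_000)
    if check_feature_drift(train_feature, live_feature):
        print("Drift detected: schedule retraining with fresher data")
```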

The demand for MLOps solutions is huge. A community-led effort to catalog the different tools in the space, including open-source options, produced a list of over 330 MLOps tools. A report from Cognilytica predicts exponential growth in the market for MLOps tools, reaching USD 125 billion by 2025, a 33 percent annual growth rate.

There is a great deal of evidence that companies that implement MLOps have a higher likelihood of success. Here are two examples.

ST Unitas, a leading global EdTech company and owner of The Princeton Review, the top US-based education service, needed to scale up deployments of its e-learning systems, including CONECTS, a learning platform that processes images of math problems and presents personalized tests to increase scores for three million students. After implementing an MLOps platform, ST Unitas scaled up its AI activity by 10X in just a few months, without hiring additional engineers, improving both performance and throughput.

Playtika, a leading game-entertainment company with over 10 million daily active users (DAU), uses massive amounts of data to reshape the gaming landscape by tailoring the user experience to in-game actions. Playtika's large-scale ML processes gathered data on millions of users and thousands of features every minute using batch and web services, but neither approach could process the data quickly enough to produce predictions in real time. By implementing an MLOps platform, the company increased performance by 40% and gained up to a 50% increase in successful throughput. Playtika can now process 9TB of data daily, giving its scientists the data they need to create a continuously changing, adaptive game environment and keep producing top-grossing games.

With the new online economy, companies are racing to use AI insights to become truly data-driven. But even if a machine learning model is built on solid algorithms and trained successfully on a complete set of clean data, AI can never reach its true potential if the model can't run on the existing IT infrastructure. Examining and resolving operational issues upfront can help AI projects reach successful completion.

 

Bio: Yochay Ettun is an experienced tech leader, included in the 2020 Forbes 30 Under 30 list for his achievements in AI advancement and for the establishment of cnvrg.io. Yochay has been writing code since the age of 7. He served in the intelligence unit of the Israel Defense Forces (IDF) for 4 years and received a BSc in Computer Science from the Hebrew University of Jerusalem (HUJI), where he founded the HUJI Innovation Lab. He previously served as CTO of Webbing labs and was a consultant for AI and machine learning companies. After 3 years of consulting, Yochay, along with co-founder Leah Kolben, decided to create a tool to help data scientists and companies scale their AI and machine learning with cnvrg.io. The company continues to help data science teams from Fortune 500 companies manage, build, and automate machine learning from research to production.
