Moving from Data Science to Machine Learning Engineering

The world of machine learning — and software — is changing. Read this article to find out how, and what you can do to stay ahead of it.

By Caleb Kaiser, Cortex Labs

For the last 20 years, machine learning has been about one question: Can we train a model to do something?

Something, of course, can be any task. Predict the next word in a sentence, recognize faces in a photo, generate a certain sound. The goal was to see if machine learning worked, if we could make accurate predictions.

Thanks to decades of work by data scientists, we now have a lot of models that can do a lot of somethings:

  • OpenAI’s GPT-2 (and now GPT-3) can generate passably human text.
  • Object detection models like YOLOv5 (debates over the official version aside) can parse objects from 140 frames of video per second.
  • Text-to-speech models like Tacotron 2 can generate human-sounding speech.

The work being done by data scientists and ML researchers is incredible, and as a result, a second question has naturally arisen:

What can we build with these models, and how can we do it?

This is notably not a data science question. This is an engineering question. To answer it, a new discipline has emerged—machine learning engineering.


Machine learning engineering is how machine learning gets applied to real-world problems

The difference between data science and machine learning engineering can feel a little intangible at first, and so it’s helpful to look at a few examples.


1. From image classification, to ML-generated catalogues

Image classification and keyword extraction are classic problems of computer vision and natural language processing, respectively. Glisten uses an ensemble of models trained for both tasks to create an API that extracts structured information from product images:


Source: TechCrunch


The models themselves are impressive feats of data science. The Glisten API, however, is a feat of machine learning engineering.


2. From object detection, to poacher prevention

Wildlife Protection Solutions is a small nonprofit that uses technology to protect endangered species. Recently, they upgraded their video monitoring system to incorporate an object detection model trained to recognize poachers. The model has already doubled the system’s detection rate:


Source: Silverpond


Object detection models like YOLOv4 are successes of data science, and Highlighter—the platform WPS used to train their model—is an impressive data science tool. WPS’s poacher detection system, however, is a feat of machine learning engineering.


3. From machine translation to a COVID-19 moonshot

Machine translation refers to the use of machine learning to “translate” data from one form to another—sometimes between human languages, and sometimes between entirely different formats.

PostEra is a medicinal chemistry platform that uses machine translation to “translate” a compound into an engineering blueprint. Currently, chemists are using the platform in an open-source effort to find a treatment for COVID-19:


Source: PostEra


Developing a model that can translate a molecule into a series of “routes” (transformations to go from one molecule to another) is a feat of data science. Building the PostEra platform is a feat of machine learning engineering.


4. From text generation, to ML dungeon masters

OpenAI’s GPT-2 was, at the time of its release, the most powerful text generating model in history. At an insane 1.5 billion parameters, it represented a big step forward in transformer models.

AI Dungeon is a classic dungeon crawler with a twist: its dungeon master is actually GPT-2, fine-tuned on text from choose-your-own-adventure stories:


Training GPT-2 is a historic feat of data science. Building a dungeon crawler out of it is a feat of machine learning engineering.

All of these platforms stand on the shoulders of data science. They wouldn’t work if they couldn’t train a model for their tasks. But, in order to apply these models to real world problems, they need to be engineered into applications.

Put another way, machine learning engineering is how the innovations of data science manifest outside of ML research.

The central challenge machine learning engineering presents, however, is that it introduces an entirely new category of engineering problems—ones we don’t have easy answers for just yet.


What goes into machine learning engineering

At a high level, we can say that machine learning engineering refers to all the tasks required to take a trained model and build a production application around it.

To make this more tangible, we can use a simple example.

Let’s go back to AI Dungeon, the ML-powered dungeon crawler. The game’s architecture is simple. Players input some text, the game makes a call to the model, the model generates a response, and the game displays it. The obvious way to build this is to deploy the model as a microservice.

In theory, this should be similar to deploying any other web service. Wrap the model in an API with something like FastAPI, containerize it with Docker, deploy to a Kubernetes cluster, and expose it with a load balancer.
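To make the "wrap the model in an API" step concrete, here is a minimal sketch. It deliberately swaps FastAPI for Python's standard-library http.server so that it runs with no dependencies, and the `generate` function is a hypothetical stub standing in for a real GPT-2 call. The route-free handler, the `prompt` and `text` field names, and port 8080 are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to a loaded GPT-2 model."""
    return prompt + " ... and the story continues."


def handle_predict(raw_body: bytes) -> Tuple[int, dict]:
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt:
        return 400, {"error": "'prompt' must be a non-empty string"}
    return 200, {"text": generate(prompt)}


class PredictHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and answers with the model's output as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_predict(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service. Keeping the request logic in `handle_predict`, separate from the HTTP plumbing, means it can be tested without opening a socket, and the same function could sit behind a FastAPI route instead.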


In practice, GPT-2 complicates things:

  • GPT-2 is huge. The fully trained model is over 5 GB. In order to serve it, you need a cluster provisioned with large instance types.
  • GPT-2 is resource-intensive. A single prediction can lock up a GPU for extended periods of time. Low latency is difficult to achieve, and a single instance cannot handle many requests at once.
  • GPT-2 is expensive. As a result of the above facts, deploying GPT-2 to production means that—assuming you have a decent amount of traffic—you will be running many large GPU instances, which gets expensive.

When you consider that the game attracted over 1 million players soon after its release, these problems become more severe.
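A back-of-the-envelope calculation shows how quickly this adds up. The sketch below is only an estimate under stated assumptions: every number in the usage example (request rate, seconds per prediction, hourly price) is hypothetical and is not AI Dungeon's actual traffic or any provider's real pricing.

```python
import math


def instances_needed(requests_per_second, seconds_per_prediction,
                     concurrent_per_instance=1):
    """Estimate how many instances are needed to keep up with traffic.

    Average in-flight requests = arrival rate x time per request
    (Little's law), divided by how many requests one instance can
    process at the same time.
    """
    in_flight = requests_per_second * seconds_per_prediction
    return max(1, math.ceil(in_flight / concurrent_per_instance))


def monthly_cost(instances, hourly_price, hours_per_month=730):
    """Cost of running a fixed number of instances around the clock."""
    return instances * hourly_price * hours_per_month


# Hypothetical load: 50 requests/s, 2 s of GPU time each, 1 request at a time.
fleet = instances_needed(50, 2.0)
bill = monthly_cost(fleet, hourly_price=1.00)
```

With those made-up inputs, the fleet comes out to 100 instances and the bill to 73,000 per month in whatever currency the hourly price is quoted in. The point is the shape of the formula: cost scales linearly with both traffic and per-prediction latency.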

Writing a performant API, provisioning a cluster with GPU instances, using spot instances to optimize costs, configuring autoscaling for inference workloads, implementing rolling updates so that the API doesn’t crash every time they update the model: it’s a lot of engineering work, and this is a simple ML application.
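Of the tasks in that list, rolling updates are the easiest to illustrate in a few lines. The sketch below is a toy simulation, not how any particular orchestrator implements them: the fleet is just a list of version labels, and `rolling_update` and `batch_size` are names invented for the example.

```python
def rolling_update(fleet, new_version, batch_size=1):
    """Replace replicas in small batches so most keep serving traffic.

    Returns one (replicas_still_serving, fleet_after_step) tuple per
    batch, so the caller can see that capacity never drops to zero.
    """
    fleet = list(fleet)  # work on a copy; the caller's list is untouched
    steps = []
    for start in range(0, len(fleet), batch_size):
        batch = range(start, min(start + batch_size, len(fleet)))
        # Replicas outside the current batch keep taking requests.
        serving = len(fleet) - len(batch)
        for i in batch:
            fleet[i] = new_version
        steps.append((serving, list(fleet)))
    return steps


steps = rolling_update(["v1"] * 4, "v2", batch_size=2)
```

Here four replicas are updated two at a time, so two replicas are serving at every step. A real system would also wait for each new replica to pass a health check before moving on, which matters more than usual when a replica has to load several gigabytes of model weights before it can answer.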

There are a number of common features—retraining, monitoring, multi-model endpoints, batch prediction, etc.—needed for many ML applications, each of which would raise the level of complexity significantly.

Solving these problems is what a machine learning engineer (in conjunction with an ML platform team, depending on the org) does, and their job is made significantly harder by the fact that most tooling for working with machine learning was designed for data science, not engineering.

Fortunately, this is changing.


We’re building a platform for machine learning engineering—not data science

A couple years ago, a few of us transitioned from software engineering to MLE. After spending weeks hacking data science workflows and writing glue code to make ML applications work, we started thinking about how we could apply software engineering principles to machine learning engineering.

For example, look at AI Dungeon. If they were building a normal API—one that didn’t involve GPT-2—they would use something like Lambda to spin up their API in 15 minutes. Because of the ML-specific challenges of serving GPT-2, however, orchestration tools from software engineering won’t work.

But, why shouldn’t the principles still apply?

So, we started working on tools for machine learning engineering, tools that applied those principles. Cortex, our open source API platform, makes it as easy as possible for machine learning engineers to deploy models as APIs, using an interface that will be familiar to any software engineer:


Source: Cortex repo


The API platform is actually what AI Dungeon—as well as every other ML startup listed above—used to deploy their models. The design philosophy behind it, and all of our work at Cortex, is very simple:

We treat the challenges of machine learning engineering as engineering—not data science—problems.

For the API platform, that means that instead of notebooks—which are difficult to version, rely on hidden state, and allow for arbitrary execution order—we use YAML and Python files. Instead of a GUI with a “Deploy” button, we built a CLI, through which you can actually manage deployments.
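As a rough illustration of why a plain Python file is easier to reason about than a notebook, consider the sketch below. It is not Cortex's actual interface: the `TextGenerator` class, its `config` keys, and the payload shape are all hypothetical, and the dictionary stands in for values that would live in a YAML file.

```python
class TextGenerator:
    """A model wrapper with no hidden state and a fixed execution order."""

    def __init__(self, config):
        # Everything the wrapper depends on is set here, once, from
        # configuration that can be checked into version control.
        self.max_words = config["max_words"]
        self.suffix = config["suffix"]

    def predict(self, payload):
        # Stand-in for real inference: truncate the input and mark it.
        words = payload["text"].split()[: self.max_words]
        return " ".join(words) + self.suffix


# Stand-in for the contents of a YAML config file.
config = {"max_words": 3, "suffix": "..."}
model = TextGenerator(config)
result = model.predict({"text": "a b c d"})
```

Because initialization and prediction are separate, explicit steps, the file produces the same behavior no matter who runs it or in what environment, and a diff of the file shows exactly what changed between deployments.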

You can apply this philosophy to many of the challenges of using machine learning in production.

Reproducibility, for example, isn’t only a challenge in machine learning. It’s a problem in software engineering too—but we use version control to solve it. And while traditional version control software like Git doesn’t work for machine learning, you can still apply the principles. DVC (Data Version Control), which applies Git-like version control to training data, code, and their resulting models, does just this:


Source: DVC
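The principle at work can be sketched in a few lines: hash the large file, and commit a small pointer containing that hash instead of the file itself. This is an illustration of the general idea only, not DVC's implementation or file format. The `.pointer.json` suffix, the field names, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path):
    """Write a small JSON file that identifies the data by content hash.

    The pointer is tiny and text-based, so it can be committed to Git
    while the data itself lives in separate storage.
    """
    data_path = Path(data_path)
    pointer = {
        "path": data_path.name,
        "sha256": file_sha256(data_path),
        "size": data_path.stat().st_size,
    }
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path
```

If the training data changes by even one byte, the hash in the pointer changes, and that change shows up in an ordinary Git diff alongside the code that used the data.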


And what about all those files of boilerplate and glue code needed to initialize a model and generate predictions? In software engineering, we’d design a framework for this.

Finally, we’re seeing this happen in machine learning engineering too. Hugging Face’s Transformers library, for example, provides an easy interface for most popular transformer models:


Source: Hugging Face


With those six lines of Python, you can download, initialize, and serve predictions from GPT-2, one of the most powerful text generating models. That’s six lines of Python to do something not even mature, well-funded teams could do three years ago.
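For readers without the original snippet in front of them, a roughly equivalent sketch using the Transformers `pipeline` helper looks like the following. It is not necessarily the code shown in the source: the wrapper name `generate_text` and its default length are chosen for illustration, and it assumes the `transformers` package and a backend such as PyTorch are installed. The first call downloads the GPT-2 weights.

```python
def generate_text(prompt, max_length=50):
    """Generate a continuation of `prompt` with GPT-2."""
    # Imported inside the function because transformers is a heavy
    # third-party dependency that not every environment has installed.
    from transformers import pipeline

    # Downloads (on first use) and initializes the pretrained model.
    generator = pipeline("text-generation", model="gpt2")
    outputs = generator(prompt, max_length=max_length, num_return_sequences=1)
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate_text("You enter a dark cave and")` returns the prompt followed by the model's continuation. In a real service the pipeline would be created once at startup rather than on every call, since loading the model is far slower than running it.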

What makes us so excited about this ecosystem—beyond the fact that we’re a part of it—is that it represents the bridge between decades of research into machine learning and the problems people face every day. Every time one of these projects removes a barrier to machine learning engineering, it becomes that much easier for a new team to solve a problem with machine learning.

In the future, machine learning is going to become a part of every engineer’s stack. There will hardly be a problem ML doesn’t touch. The pace at which this occurs is entirely dependent on how quickly we can develop platforms like Cortex, and accelerate the proliferation of machine learning engineering.

If that is exciting to you too, we’re always happy to welcome new contributors.

Bio: Caleb Kaiser (@KaiserFrose) is on the founding team of Cortex Labs, where he helps maintain Cortex.

Original. Reposted with permission.