Is There a Way to Bridge the MLOps Tools Gap?

Converting Jupyter notebooks to a well-designed software system is a mandatory step in every ML project. But there is a notable lack of tooling to assist developers with such translation, beyond the basic nbconvert utility.



Interactive notebooks, such as Jupyter, are essential for artificial intelligence/machine learning (AI/ML) development, but they are ill-suited for production environments. Converting notebooks into a well-designed software system is therefore a mandatory step in every ML project. Yet there is a notable lack of tooling to assist developers with this translation beyond the basic nbconvert utility.
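For context, nbconvert's translation is purely mechanical: it concatenates a notebook's code cells into a script, preserving whatever structure (or lack of it) the notebook had. A minimal sketch using nbconvert's Python API (the notebook filename here is a hypothetical example):

    # Mechanically export a notebook's code cells to a Python script.
    from nbconvert import PythonExporter

    exporter = PythonExporter()
    # "analysis.ipynb" is a placeholder for your own notebook.
    source, _resources = exporter.from_filename("analysis.ipynb")

    with open("analysis.py", "w") as f:
        f.write(source)

The resulting script runs top to bottom exactly as the notebook did, which is precisely the problem the rest of this article is concerned with.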

 

Notebooks are the go-to Integrated Development Environment (IDE) of Data Science

 

Even before Jupyter was developed, mathematicians, researchers and analysts used interactive “notebook-style” development environments (for example, Mathematica). For data exploration and statistical analysis, immediate feedback and visualization are essential for knowing whether a given workflow will lead to a working model.

There are three problems commonly faced by data scientists when trying to move from a notebook model to a production-ready ML system:

  1. Running the ML code at regularly scheduled intervals (a cron service) or as a resilient and scalable web service; 
  2. Applying the code to larger datasets that do not fit on a single compute node (distributed computing), seamlessly from within the notebook; and 
  3. Finding IDE convenience features suitable for developing large code bases. Jupyter itself lacks many “creature comforts” of modern, developer-friendly IDEs.

A large wave of tools has appeared recently to address these concerns: robust Jupyter-alternative notebooks like Deepnote and Hex, notebook schedulers and executors like Papermill, and full-service cloud setups like Databricks. However, all of these tools have one thing in common: they are built around notebooks and assist in using notebooks in production settings. With any of them, you’ll run into two issues: 1) the code structure in the notebook stays the same (usually poorly maintainable “spaghetti code”), and 2) the environment the notebook is scheduled to run in is the notebook kernel, a programming-language interpreter optimized for developer interactivity rather than runtime efficiency.
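Papermill illustrates the pattern: it parameterizes and executes a notebook as-is, kernel and all. A minimal sketch using Papermill's documented Python API (the file names and parameter are hypothetical):

    # Execute a notebook end to end with injected parameters.
    # The notebook itself, spaghetti code and all, remains the unit of execution.
    import papermill as pm

    pm.execute_notebook(
        "train_model.ipynb",                  # hypothetical input notebook
        "train_model_output.ipynb",           # executed copy, outputs included
        parameters={"learning_rate": 0.01},   # injected into the notebook's "parameters" cell
    )

Scheduling this call from cron or an orchestrator does put a notebook into production, but the code structure and the kernel-based runtime come along with it.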

 

But is Productionizing Notebooks Even a Good Idea?

 

While interactive code interpreters are immensely useful for exploratory data analysis (EDA) and reporting, they are not suitable for quality production code for several reasons:

  1. There is no test harness;
  2. Notebooks discourage modularity and encourage spaghetti scripting;

 

Note: While you could technically import everything_else in a cell and do your development in everything_else.py, this is a) rarely done in practice and b) makes it harder to iterate back to the original notebook with the plots and tables suitable for EDA.
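For illustration, a minimal sketch of the pattern described in the note above (the module and function names are hypothetical):

    # everything_else.py: a hypothetical module holding the reusable logic.
    import pandas as pd

    def clean_sales_data(df: pd.DataFrame) -> pd.DataFrame:
        """Drop rows with missing values and normalize column names."""
        df = df.dropna()
        df.columns = [c.strip().lower() for c in df.columns]
        return df

    # In a notebook cell, the module can be reloaded automatically as it changes:
    #   %load_ext autoreload
    #   %autoreload 2
    #   import everything_else
    #   everything_else.clean_sales_data(raw_df).head()

The autoreload magic eases the friction somewhat, but the plots and tables still live in the notebook, which is why the iteration loop suffers.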

 

  3. Fault tolerance: If one part of the notebook (a single function, for example) fails, or a machine reboots, data scientists need the ability to pick up work from the last stopping point rather than starting from the beginning. Tools like Airflow and Luigi exist for exactly this reason (a minimal Airflow sketch follows this list);
  4. Handling more incoming requests (horizontal scaling, which is different from the scalability provided by Spark); and 

 

Note: One could build a scalable web service around a notebook (for example, https://www.qwak.ai), but in reality, this is something data scientists usually ask engineers to help execute.

 

  5. Code reviews and versioning for notebooks are problematic. IDE support is getting better, but it is still not as good as for “normal” code.
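To make the fault-tolerance point concrete, here is a minimal sketch of the style of pipeline Airflow encourages, with hypothetical task names and function bodies. Each step is a separate task, so a failed run can be retried from the failing step instead of from the top:

    # A hypothetical two-step pipeline. Airflow tracks per-task state,
    # so a failure in train() can be retried without re-running extract().
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull raw data")   # placeholder for a real extraction step

    def train():
        print("fit the model")   # placeholder for a real training step

    with DAG(
        dag_id="train_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        train_task = PythonOperator(task_id="train", python_callable=train)
        extract_task >> train_task   # train runs only after extract succeeds

A notebook, by contrast, restarts from the first cell after any failure.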

But perhaps the largest challenge to overcome is the growing gap between data science and the rest of the enterprise stack. A typical marketing tech stack, for example, is maintained by its own engineering team, leaving data science as the weakest link in the chain.

 

We need to reduce workflow bottlenecks and help data science realize data’s full potential 

 

We need a clear separation between the data science development environment and the production stack so that each can adequately address its own areas of concern. Existing tools that try to do both have proven only to compromise both environments. Asking data scientists to be more mindful of engineering concerns during development substantially reduces workflow productivity. And it’s not realistic to fundamentally change the way data science operates. So the question is: is it possible to bridge the gap with a single tool?

LineaPy is also among the new tools to crop up lately, but it was built specifically with this tooling gap in mind. It gives data scientists a way to work exactly as they do now, while also reaping the benefits of the good engineering ‘stuff’ that comes with the production stack. LineaPy doesn’t try to change the data science or production environments; rather, it attempts to act as a much-needed bridge between them. (It can be used in interactive computing environments like Jupyter Notebook/Lab or IPython.)
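As a rough illustration of the workflow, based on LineaPy’s documented artifact API (exact signatures may vary across versions): inside a LineaPy-instrumented session (for example, after running %load_ext lineapy in Jupyter), a data scientist marks a value as an artifact, and LineaPy slices the session’s execution history down to just the code that produced that value:

    import lineapy

    # Hypothetical stand-in for the result of real, messy notebook work.
    model = {"weights": [0.1, 0.2]}

    # Mark the value we care about as a named artifact.
    artifact = lineapy.save(model, "fraud_model")

    # Retrieve the cleaned-up code that produced the artifact; LineaPy
    # slices away exploratory detours that did not contribute to it.
    print(artifact.get_code())

The extracted code, rather than the notebook itself, is what moves toward production.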

The existence of a tool like LineaPy shows that it’s possible to respect the separation of concerns while still closing the gap between development and production. In general, we should avoid jury-rigging existing tools that serve clearly defined purposes into doing something else. When we do, we create more frustration than we resolve.
 
 
Mike Arov is a principal machine learning engineer at PostClick, a leading solution for digital advertising conversions.