Tips for a cost-effective machine learning project

Spoiler: you don’t need a VM running 24/7 to handle 16 requests a day.

By François Paupier, Data Engineer


Street art by Mike Mozart


You just released a machine learning project. It can be a new product at your start-up, a proof of concept for a client demo, or a personal project that enriches your portfolio. You are not looking for a production-grade site; you want to get the job done. For cheap. So that a few users can test your product.

How do you make your project available cost-effectively?

This post is a follow-up to and an update of a previous post, where I introduced RapLyrics, a text-generating web app that uses ML to write rap music lyrics.

This project has been serving punchlines for a year now. Here, I share the updated architecture that let us cut our cloud provider bill from $50/month to less than $1/month.

I use this project as an example, but the approach discussed here applies to any similar project with flexible latency requirements.


What to expect?

First, I describe the architecture of the service and what we want to deliver. Then, I define possible ways to achieve our objective. Finally, I zoom in on how we drastically reduced our compute cost using serverless functions.


Service anatomy



First, the user fetches the static assets, then locally executes the JS that calls the prediction server to generate lyrics.


  1. First, the user fetches the static files.
  2. Then, the client-side JavaScript calls the prediction server.
  3. Finally, the server returns the prediction.

The logical separation of concerns remains the same between the initial solution and the new one introduced below. We only update the underlying technology.


Initial solution design


Dollar bills burning

What paying $600 a year for a portfolio project feels like — Photo by Jp Valery on Unsplash


When we developed raplyrics, we wanted it to be dead simple. It started as a project built and tested on our machines. Then, we pushed it to the cloud.

Several ways to serve a machine learning model exist today. Back in the day, we wanted to get our hands dirty, so we implemented our serving strategy.

Advice: don’t develop your own machine learning model-serving framework — mature solutions such as TensorFlow Serving exist. TF Serving has exhaustive documentation, and it will be far more efficient than rolling your own.

That being said, let’s get back to our project. We separated the front end from the back end:

  1. The client tier was an Apache HTTP server.
  2. The server tier was a Python Flask app running the lyric-generation model.
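For context, the server tier can be sketched as a minimal Flask app with a single prediction endpoint. This is a hedged sketch, not the project’s actual code: the route name, payload shape, and `generate_lyric` stub are assumptions standing in for the real model call.

```python
# Minimal sketch of the server tier: a Flask app exposing one
# prediction endpoint. `generate_lyric` is a hypothetical stand-in
# for the real lyric-generation model.
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_lyric(seed: str) -> str:
    # Placeholder for the ML model inference.
    return f"{seed} ... (generated punchline)"

@app.route("/api/generate", methods=["POST"])
def predict():
    seed = request.get_json(force=True).get("input", "")
    return jsonify({"lyric": generate_lyric(seed)})

# Run locally with: flask --app <this module> run
```

The front end then only needs to POST a seed string to `/api/generate` and render the returned lyric.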

We bought a domain name, deployed our code to an EC2 instance, and we were ready to serve users.


The problem

It’s all fun and games until the free trial expires. After the initial 12 months, the monthly bill rocketed to roughly $45–50 for this single project, while it served only 32 users in September 2019.

The truth is, we had a virtual machine with 2 GB of RAM running 24/7 to serve dozens of users.


Updated solution design

After the free trial, it became clear that something was wrong in the way we approached this project.

The typical user of our service knows this website is a personal project; it sets the level of expectation.

The typical user generates a dozen lyrics and then goes away.

We know what we want to achieve: serve a two-tier architecture with a front end that handles user input and calls a service to generate lyrics. The front and back are loosely coupled (the front end only holds a reference to the back-end endpoint).

What are the possibilities, the possible Hows?


Listing the options


  • A — Deploy the same project to another cloud provider that offers free credits. Repeat.

That’s possible. For example, if you come from AWS, GCP’s $300 of free credits can keep you running for a while. Maybe you only need this portfolio project or client proof of concept for a limited time.
We want to keep our project around for a while, so option A is not a great fit.

  • B — Use a static website for the client tier and serve requests through API calls to a serverless compute service.

What Is Serverless Computing?

Serverless computing is a method of providing backend services on an as-used basis. Servers are still used, but a company that gets backend services from a serverless vendor is charged based on usage, not a fixed amount of bandwidth or number of servers. — From Cloudflare

We chose option B: a static website, with our API exposed on a serverless compute service. The drawback of option B is the added latency of cold starts. A cold start is the first run of a serverless function after a period of inactivity; it usually takes longer than a warm start.


Static website and serverless compute in action

Now that we defined how we want to do it, we can focus on the choice of technology.

Hosting the static page

Multiple static hosting solutions exist. We chose Netlify: it gets the job done in the least amount of time. Basic hosting, a custom domain name, and an SSL certificate are free on Netlify.
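One option worth knowing about: Netlify can proxy API calls to a back end through a redirect rule, so the front end only ever references a relative path. This is a sketch of such a rule in a `netlify.toml`; the cloud function URL is a placeholder, not the project’s real endpoint.

```toml
# Hypothetical netlify.toml redirect: requests to /api/* are
# proxied (status 200) to the serverless back end, so the static
# site holds no hard-coded provider URL.
[[redirects]]
  from = "/api/*"
  to = "https://REGION-PROJECT.cloudfunctions.net/:splat"
  status = 200
```

This keeps the front and back loosely coupled: swapping cloud providers only means changing this one rule.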

Serving the API with serverless computing

Each cloud provider offers a serverless computing service; we chose Google Cloud and its Cloud Functions.

Google Cloud has a tutorial on serving machine learning models through Cloud Functions. With this tutorial as a baseline, we were able to serve our model with a little refactoring.

Each cloud provider tends to handle cloud functions in a slightly different way. Google Cloud also offers Cloud Run, a serverless compute service based on containers: you provide a Dockerfile, which makes it easier to move the project from one cloud provider to another.
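For illustration, a Cloud Run Dockerfile for a Flask app might look like the sketch below. It assumes a `requirements.txt` that includes `gunicorn` and an `app.py` exposing a Flask `app` object; none of these names come from the project itself.

```dockerfile
# Hypothetical Dockerfile for running a Flask app on Cloud Run.
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run injects the PORT environment variable at runtime.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
```

Because the container format is standard, the same image can be deployed to any provider that runs containers.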

On the cold start latency

For cold starts, we have to fetch the model weights (150 MB) from a bucket before the Python app can load them into memory. In those cases, the response time can reach 40 s. For warm starts, the response time is usually below 2 s. For a portfolio project, we are OK with this cost/latency tradeoff.
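To see why this tradeoff pays off, here is a back-of-the-envelope comparison of an always-on VM versus per-invocation serverless billing. All prices below are illustrative assumptions for the sake of the arithmetic, not anyone’s current pricing; check your provider’s rate card.

```python
# Rough monthly cost comparison (illustrative prices only).
HOURS_PER_MONTH = 730

# Always-on VM: billed 24/7 regardless of traffic.
vm_hourly_rate = 0.06                 # assumed $/hour for a small 2 GB VM
vm_monthly = vm_hourly_rate * HOURS_PER_MONTH

# Serverless: billed per invocation plus compute time actually used.
invocations = 32 * 12                 # ~32 users, a dozen lyrics each
price_per_invocation = 0.0000004      # assumed $/request
gb_seconds = invocations * 5 * 0.5    # ~5 s per request at 512 MB
price_per_gb_second = 0.0000025       # assumed $/GB-second
serverless_monthly = (invocations * price_per_invocation
                      + gb_seconds * price_per_gb_second)

print(f"VM: ~${vm_monthly:.2f}/month, serverless: ~${serverless_monthly:.4f}/month")
```

At this traffic level the serverless bill is effectively rounding error, which matches the $50/month to under $1/month drop described above.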

We added some UI elements to our front end to make it explicit that the first prediction may take some time.



You don’t need a full production scale set up to serve your small project. Aim for the most cost-effective solution.

  • Latency requirements for portfolio projects are not the same as those of production services.
  • A static website backed by an API served through serverless computing is a cost-effective way to serve your project.

Some tricks are required to handle state and to load resources from the network efficiently, but the savings on the bill are worth it.

Thanks to Cyril for his thoughtful feedback on the article.



RapLyrics’ source code is available on GitHub.

Hosting a static website

Serverless compute services

  • AWS — Run code without thinking about servers on AWS, AWS Lambda
  • Google Cloud — Event-driven serverless compute platform, Cloud Functions
  • Azure — Event-driven serverless compute, Azure Functions
  • Alibaba — A fully hosted and serverless running environment, Function Compute

Bio: François Paupier is a versatile data engineer, located in Boston, MA.

Original. Reposted with permission.