12-Hour Machine Learning Challenge: Build & deploy an app with Streamlit and DevOps tools
This article will present the knowledge, process, tools, and frameworks required for completing a 12-hour ML challenge. I hope you can find it useful for your personal or professional projects.
By Ian Xiao, Engagement Lead at Dessa
TL;DR: In this article, I want to share my learnings, process, tools, and frameworks for completing a 12-hour ML challenge. I hope you can find it useful for your personal or professional projects.
Here is a table of contents to help you navigate:
- Step 1: Find a Good Problem
- Step 2: Define the Constraints
- Step 3: Think, Simplify, & Prioritize
- Step 4: Sprint Planning
- Step 5: The App
- Step 6: Lessons Learned
- Bonus: Process & Tools for Lazy Programmers
Disclaimer: this is not sponsored by Streamlit, any of the tools I mention, nor any of the firms I work for.
Step 1: Find a Good Problem (The Christmas Problem)
Well, Christmas. It used to be the time of the year when I hung out with my wife and puppy on the couch and binge-watched movies and shows.
Then, this Christmas. Something changed. For some reason, most of the stuff I found on Netflix or YouTube seemed quite boring. Maybe I’ve reached a tipping point of zero utility gain from watching similar content pushed by the recommendation algorithms. The algorithms that know me so well (maybe too well).
I realized a problem: I am trapped by the recommendation algorithms that know me so well — I am imprisoned digitally (this post takes a more design lens).
I can’t seem to find stuff outside of the content bubble. Everything the algorithms think I am interested in has gradually become boring; it’s ironic. I want to get out!
The point is this: find a problem that’s annoying enough. It doesn’t have to be curing cancer or eliminating hunger (if you can, bravo!), just something meaningful enough so you are willing to commit and get started.
Step 2: Define Constraints (the 12 Hour Challenge)
- ~12 hours of total working time; they don’t need to be consecutive hours
- must ship a usable and stable app for users other than myself
- must have an ML component, but no unnecessary complexity
- must share the work & learnings with others (a.k.a. write this post)
- (the experience must be fun)
Why have a deadline? According to Matt:
… having a deadline focuses individuals on prioritizing what they need to focus on in order to get their project to a workable state. Individuals must factor in the time it takes to design a project, to come up with a solution, deal with any unforeseen technicalities and everything in between to make it to the deadline.
(So, why only 12 hours instead of 48? Well, I am not as intense as Matt. If you decide to do this, pick a time frame that works best for you and stick with it. The point is to execute and ship.)
Here is my rough time budget for all the work that’s involved:
- 2 hours: have a rough design of the app (e.g. research, UX, architecture).
- 8 hours: re-design, build, and test the app iteratively.
- 2 hours: write, edit, and publish this article (and this).
Step 3: Think, Simplify, Prioritize, and Repeat
Before coding, I need to address a few important questions to 1) crystalize what exactly I need to build and 2) prioritize what to build in the 12 hours. Although not exhaustive, here are some guiding questions:
Putting a Product hat on, who are the users? What do the users want and need? How do their needs differ by segment? Which user group should I target first? What are the features to address the needs? …
Putting a Data Science hat on, what data do I need, and what is available? Do I need BI analytics or a predictive model? What business and model metrics should I use? How do I measure performance? …
Putting a Designer hat on, what emotions does the app need to trigger? What colour scheme should I use? What does the user journey look like? Given the features, what is the best user interaction? …
Putting an Engineer hat on, how many users does the app need to support at a time? What does the development-to-deployment process look like? Which technology stack balances prototyping speed and scalability? …
Putting a Business hat on, how do I monetize the app? How do I grow and sustain the audience for the app? How do I minimize the cost of running the technology and operations? …
As you can imagine, this exercise can get overwhelming quickly. Be sure to resist the urge to solve everything.
Ultimately, here are the top three “user wants/needs” I can address and the corresponding features in 8 hours of development:
- I want surprises: the app should be able to suggest movies I haven’t seen before or that differ from my normal viewing history. > “Today’s Pick & Filtering”
- I need to choose: the app should be able to show a trailer and provide some information about the movie quality. > “Trailer and Rating”
- I want to control: the app should offer a simple way for users to control how different the suggestions are. > “Filter Panel & Smart Exploration”
Here are a few things I’d love to build, but de-prioritized:
- user authentication / messaging
- back-up on the Cloud
- multi-model recommendation
- customer service bot
With the features in mind, here is the rough architecture design of the solution, key components, and their interactions.
Note: this is the output of an iterative process. Your initial thinking might look very different. See Lessons Learned for tips on how to decide what to build vs. not.
Step 4: Sprint Planning & Execution
I decided to build this in four 2-hour sprints. Here were the rough outcomes of each sprint:
Sprint 1: an automated development-to-deployment pipeline; a simple click-able “Today’s Pick” and filtering features served on Heroku.
Sprint 2: Build out the ETL; a set of automatic test cases for the ETL; improved front-end with YouTube Trailers & Personalized section with dummy data. Run time optimization.
Sprint 3: Build out API for Smart Exploration. Integrate with front-end with a dummy model. Research on modelling options. More run time optimization.
Sprint 4: Refactor and optimize a KNN-based Collaborative Filtering model. Add modelling test cases. Code clean-up and more optimization.
Step 5: Ta-dah.
YAME was born. Now you can use YAME here to find something interesting for your weekdays, weekends, date nights, and family gatherings. The app aims to provide the convenience of a search engine while offering control without overloading the users.
Convenience: The landing page has five movies the system recommends. It updates daily. The algorithm picks movies across years and genres; it tries to be unbiased.
Some control: if you don’t like what you see or just wonder what’s out there, you can choose the year and genre using the panel on the left.
More control without sacrificing convenience: If you really want something else, you can explore based on how “adventurous” you feel today with a simple interface. This UI allows users to have an option to choose. Users can decide what they might want to see without being cognitively overloaded.
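The first two behaviours above (picks that refresh daily, plus year and genre filtering) can be sketched in a few lines. This is a minimal illustration, not YAME's code: the `Movie` class, `todays_picks`, `filter_movies`, and the placeholder catalogue are all hypothetical, and seeding the random generator with the date is just one simple way to keep the picks stable within a day.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Movie:
    title: str
    year: int
    genre: str


# Hypothetical placeholder catalogue: three genres, three years each.
CATALOGUE = [
    Movie(f"Placeholder {genre} {year}", year, genre)
    for genre in ("comedy", "drama", "sci-fi")
    for year in (1995, 2005, 2015)
]


def todays_picks(movies, day, k=5):
    """Return up to k picks that stay fixed for a given day and span genres."""
    rng = random.Random(day.toordinal())  # same date -> same seed -> same picks
    # Group by genre so that no single genre dominates the landing page.
    by_genre = {}
    for movie in movies:
        by_genre.setdefault(movie.genre, []).append(movie)
    genres = sorted(by_genre)
    rng.shuffle(genres)
    for genre in genres:
        rng.shuffle(by_genre[genre])
    # Round-robin over the shuffled genres until we have k picks.
    picks = []
    while len(picks) < k and any(by_genre.values()):
        for genre in genres:
            if by_genre[genre] and len(picks) < k:
                picks.append(by_genre[genre].pop())
    return picks


def filter_movies(movies, year=None, genre=None):
    """Keep movies matching the chosen year and/or genre (None means 'any')."""
    return [
        m for m in movies
        if (year is None or m.year == year) and (genre is None or m.genre == genre)
    ]
```

In a Streamlit front end, the `year` and `genre` arguments could come from sidebar widgets such as `st.sidebar.selectbox`, and the resulting list could be rendered with `st.write`.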
Step 6: Lessons Learned
1/ Be safe, be fast, be lazy. Automate tests before anything else. If you find yourself manually testing something regularly, invest a bit of time and automate it. Automated testing with PyTest and CircleCI saved me a lot of headaches. For an ML app, you should have two sets of tests: one for software testing (e.g. unit and integration tests) and one for model testing (e.g. minimum performance and edge cases). Dynamic test cases (inputs driven by random numbers) also help catch bugs in edge cases that are hard to anticipate.
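To illustrate the two sets of tests and a dynamic test case, here is a minimal sketch in the plain `assert` style that PyTest collects from functions named `test_*`. The functions under test (`clean_rating`, `recommend`) are hypothetical stand-ins I made up for the example, not YAME's ETL or model.

```python
import random


def clean_rating(raw):
    """Hypothetical ETL helper: parse a rating string and clamp it to [0, 10]."""
    return max(0.0, min(10.0, float(raw)))


def recommend(seen, catalogue, k=5):
    """Hypothetical model stand-in: suggest up to k titles the user has not seen."""
    return [title for title in catalogue if title not in seen][:k]


# --- Software test: does the ETL code do what it says? ---
def test_clean_rating_clamps_out_of_range_values():
    assert clean_rating("11.5") == 10.0
    assert clean_rating("-3") == 0.0
    assert clean_rating("7.2") == 7.2


# --- Model tests: minimum expected behaviour and an edge case ---
def test_recommend_never_returns_seen_titles():
    catalogue = [f"movie-{i}" for i in range(20)]
    seen = {"movie-0", "movie-1"}
    assert not seen & set(recommend(seen, catalogue))


def test_recommend_handles_user_who_has_seen_everything():
    catalogue = ["movie-0", "movie-1"]
    assert recommend(set(catalogue), catalogue) == []


# --- Dynamic test: inputs driven by random numbers ---
def test_recommend_with_random_histories():
    rng = random.Random(0)  # fixed seed keeps any failure reproducible
    catalogue = [f"movie-{i}" for i in range(50)]
    for _ in range(100):
        seen = set(rng.sample(catalogue, rng.randint(0, len(catalogue))))
        picks = recommend(seen, catalogue, k=5)
        assert len(picks) == min(5, len(catalogue) - len(seen))
        assert not seen & set(picks)
```

Saved as a file whose name starts with `test_`, this would be picked up by running `pytest` in the project directory, which is also the command a CI job would typically run.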
2/ Avoid the Kaggle Trap. Since I only budgeted ~4 hours to work on the ML component, the key is to build a just-good-enough model to validate the functionality and usefulness of the ML feature. It’s very easy to fall into the trap of “Kaggle Mode” (e.g. spending lots of time building complex models for small performance gain). I use a Model-UX analysis to help set the boundary. This analysis is not meant to be a scientific exercise, but a tool to keep you away from Kaggling.
Note: the threshold for minimum model performance varies by use case. For example, an app that shows synthetic faces using a GAN, or one that detects fraud, will likely need very good model performance to convince users of its usefulness.
So, my strategy was to start with the simplest model: a “model” that’s driven by a random number generator. Although it sounds naive from a modelling standpoint, it adds the greatest value to the UX with the least amount of development time (~5 mins). Users can play with a Personalization feature that didn’t exist before. It doesn’t really matter whether it’s providing the “best” recommendation; the key is to validate the feature. Then I evolved the model, first into a rule-based one and then into a KNN-based Collaborative Filtering algorithm.
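To make that progression concrete, here is a minimal sketch of its two ends: a random "model" and a small user-based KNN collaborative filter that share the same call shape, so one can be swapped for the other without touching the front end. The ratings data, the function names, and the choice of cosine similarity are illustrative assumptions; the article does not describe YAME's actual implementation.

```python
import math
import random

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"C": 5, "E": 4},
    "dan": {"A": 4, "D": 5, "E": 2},
}


def random_recommend(user, ratings, k=2, seed=None):
    """Baseline 'model': sample unseen movies uniformly at random."""
    rng = random.Random(seed)
    seen = ratings.get(user, {})
    all_movies = {movie for r in ratings.values() for movie in r}
    unseen = sorted(all_movies - set(seen))
    return rng.sample(unseen, min(k, len(unseen)))


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_recommend(user, ratings, k=2, n_neighbors=2):
    """User-based KNN: score unseen movies by similarity-weighted neighbour ratings."""
    seen = ratings[user]
    # Rank the other users by similarity and keep the closest n_neighbors.
    neighbors = sorted(
        ((cosine(seen, r), name) for name, r in ratings.items() if name != user),
        reverse=True,
    )[:n_neighbors]
    scores = {}
    for sim, name in neighbors:
        for movie, rating in ratings[name].items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]
```

With this toy data, `knn_recommend("ann", RATINGS)` returns `["D", "E"]`: bob and dan are ann's nearest neighbours, and both rated "D" highly. The random baseline can only return the same two unseen titles in some order, which is exactly why it is good enough to validate the feature before investing in the model.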
3/ Building is fun, prioritizing isn’t. Here are some tips to make it easier:
- Start with the most annoying and profitable problem (don’t care too much about profit in this exercise).
- Think of an ideal solution & budget how much time you need to build it; keep in mind that you will likely under-estimate, but it’s okay.
- Cut the time to 1/3, re-think the solution and see if you are comfortable implementing without a significant amount of research (some research is still good for learning)
- Repeat until the scope fits into a 2- to 4-hour timeframe
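The cut-and-repeat loop in the last two tips is simple arithmetic. A minimal sketch follows, where the function name `cut_scope` and the 4-hour default target are my own illustrative choices:

```python
def cut_scope(estimate_hours, target_hours=4.0):
    """Divide the estimate by 3 until it fits the target; return every step."""
    if estimate_hours <= 0 or target_hours <= 0:
        raise ValueError("hours must be positive")
    steps = [estimate_hours]
    while steps[-1] > target_hours:
        steps.append(steps[-1] / 3)  # each cut means re-thinking the solution
    return steps
```

For example, `cut_scope(27)` returns `[27, 9.0, 3.0]`: an ideal solution estimated at 27 hours needs two rounds of re-thinking before it fits a single sprint. A cut can also land below 2 hours, which simply leaves some slack.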
If you like and want to support YAME, please check out my Patreon page. The support will go towards covering the cost of running and improving YAME (e.g. server, website, etc.).
Until Next Time,
Bonus: Process & Tools for Lazy Programmers
For anyone who’s interested (and got this far), I wanted a workflow that’s as automated as possible, so I can spend my time designing and coding instead of doing manual testing or moving code around. Everyone has their own preferences. The key for me is being able to iterate fast and be ready to scale.
From a tech stack standpoint, here are the tools I chose (also a few alternatives):
- Python as the programming language for general workflow, ETL, and modelling. (alternative: SQL for ETL, R for Modelling, and Java for workflow)
- Streamlit as the front-end tool. It’s Python-based. Out of the box, it comes with most of the widgets I need for the user experience, and it’s web- and mobile-friendly. It encouraged me to focus on user experience as much as modelling. Jupyter is great, but I feel like it tends to keep people in the Kaggle Trap. (alternative: Flask, Django, or React for the front end; Jupyter Notebook for Analysis and Model Experimentation)
- Postgres as the back-end database tool. (alternative: GCP, AWS, Azure; note that SQLite doesn’t work with Heroku below if you want to follow the same setup)
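On Heroku, the Postgres add-on exposes its connection string through the `DATABASE_URL` config variable. A minimal sketch of turning that URL into connection parameters with only the standard library is shown here. The function name, the local fallback URL, and the returned key names are illustrative assumptions; check the key names against whichever Postgres driver you use.

```python
import os
from urllib.parse import urlparse


def parse_database_url(url):
    """Split a postgres:// URL into its connection parameters."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # default Postgres port if none is given
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/"),
    }


# Fall back to a hypothetical local database when the variable is not set,
# so the same code runs on a laptop and on the hosted app.
url = os.environ.get("DATABASE_URL", "postgres://user:secret@localhost:5432/app_dev")
config = parse_database_url(url)
```

Reading the URL from the environment keeps credentials out of the codebase and lets the same code run unchanged in development, CI, and production.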
From a DevOps standpoint, here are the tools:
- PyCharm as the IDE (alternative: Sublime, Atom)
- GitHub for code versioning (alternative: DVC)
- PyTest for managing test cases and running automated tests
- Circle CI for Continuous Integration and Deployment (alternative: Jenkins)
- Heroku for web-hosting (alternative: Cloud solutions such as GCP, AWS, Azure, or Paperspace)
If you are as lazy as I am as a programmer, I highly recommend investing the time upfront to set up this DevOps workflow. It saves lots of time on manual testing and deployment. More importantly, it really safeguards your codebase from stupid bugs.
Note: I didn’t choose the alternatives because I wanted to avoid over-engineering, and because I’m not familiar enough with them to gain any efficiency.
If you like this article, you may also like these …
Data Science is Boring
How I cope with the boring days of deploying Machine Learning
We Created a Lazy AI
How to Design and Implement Reinforcement Learning for the Real World
A Doomed Marriage of ML and Agile
How not to apply Agile on an ML project
How to develop and manage a happy data science team
The Last Defense against Another AI Winter
The numbers, five tactical solutions, and a quick survey
The Last-Mile Problem of AI
One Thing Many Data Scientists Don’t Think Enough About
Bio: Ian Xiao is Engagement Lead at Dessa, deploying machine learning at enterprises. He leads business and technical teams to deploy Machine Learning solutions and improve Marketing & Sales for the F100 enterprises.
Original. Reposted with permission.