The Forgotten Algorithm

This article explores Monte Carlo Simulation with Streamlit.

By Ian Xiao, Engagement Lead at Dessa

TL;DR: When we talk about Machine Learning, we often think of supervised and unsupervised learning. In this article, I want to discuss an often forgotten but equally powerful algorithm: Monte Carlo Simulation. I will share a general design framework and a few techniques with an interactive tool. You can also find a list of good simulation tools at the end of the article.

Disclaimer: This article is not sponsored by Streamlit, any of the tools I mention, or any of the firms I work for. I use data science and machine learning interchangeably.

The Undeserving One

In the recent Machine Learning (ML) uprising, supervised and unsupervised learning algorithms, such as classification with Deep Learning and clustering with K-means, got most of the spotlight. While these algorithms receive flattering praise from the enthusiastic community, something equally powerful and elegant sits calmly and quietly in a dark corner. Its name is Monte Carlo: the forgotten and undeserving hero of atomic physics, modern finance, and gambling (or a villain, depending on your opinion of these matters).

Note: I will refer to supervised and unsupervised learning methods as “ML algorithms” and Monte Carlo methods as “Simulation” for brevity.

A Short History

Stanislaw Ulam, Enrico Fermi, and John von Neumann, the geniuses at Los Alamos, invented, improved, and popularized the Monte Carlo method in the 1940s for a not-so-noble cause (hint: it’s not for the bomb). Watch the video to find out more.

A Short History of Monte Carlo Simulation (YouTube)

What is Monte Carlo Simulation?

If I were to summarize what Monte Carlo simulation is in one sentence, here it is: Fake it a billion times until we kind of know what the reality is.

On a technical (and more serious) level, the goal of the Monte Carlo method is to approximate the expectations of outcomes given various inputs, uncertainty, and system dynamics. This video walks through some high-level mathematics for those who are interested.

Monte Carlo Approximation, YouTube
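
To make the idea of approximating an expectation concrete, here is a minimal sketch using only the Python standard library. The function name monte_carlo_expectation, the toy outcome x squared, and the uniform input are my own assumptions for illustration; they are not taken from the video or from any specific tool. The sample mean of many random draws approximates the expected outcome, and the standard error indicates how far off the estimate is likely to be.

```python
import random
import statistics


def monte_carlo_expectation(outcome, sample_input, n_trials=100_000, seed=42):
    """Approximate E[outcome(X)] by averaging over random draws of X."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    draws = [outcome(sample_input(rng)) for _ in range(n_trials)]
    mean = statistics.fmean(draws)
    # The standard error shrinks roughly like 1 / sqrt(n_trials).
    std_error = statistics.stdev(draws) / n_trials ** 0.5
    return mean, std_error


# Toy problem (assumed): X is uniform on [0, 1) and the outcome is X squared.
# The exact expectation is 1/3, so the estimate can be checked by hand.
estimate, std_error = monte_carlo_expectation(
    outcome=lambda x: x ** 2,
    sample_input=lambda rng: rng.random(),
)
print(f"estimate = {estimate:.4f} +/- {std_error:.4f} (exact value is 1/3)")
```

Increasing n_trials tightens the estimate, which is the "fake it many times" idea in practice.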

Why use Simulation?

If I were to highlight one (oversimplified) advantage of Simulation over ML algorithms, it would be this: Exploration. We use Simulation to understand the inner workings of any system at any scale (e.g. the world, a community, a company, a team, a person, a fleet, a car, a wheel, an atom, etc.).

By re-creating a system virtually with simulations, we can calculate and analyze hypothetical results without actually changing the world or waiting for real events to happen. In other words, Simulations allow us to ask bold questions and develop tactics to manage various future outcomes without much risk and investment.

When to use Simulation, instead of ML?

According to Benjamin Schumann, a well-known simulation expert, Simulation is process-driven while ML is data-centric. To produce a good Simulation, we need to understand the process and underlying principles of a system. In contrast, we can create reasonably good predictions with ML using only data from a data warehouse and some out-of-the-box algorithms.

In other words, creating a good simulation is often more expensive, both financially and cognitively. So why would we ever use Simulation?

Well, consider three simple questions:

Do you have data in a data warehouse to represent the business problem?
Do you have enough of these data — quantity- and quality-wise — to build a good ML model?
Is prediction more important than exploration (e.g. ask what-if questions and develop tactics to support business decisions)?

If you answer “No” to any of these, then you should consider using Simulation instead of ML algorithms.

How to Design a Monte Carlo Simulation?

At a minimum, creating a Monte Carlo Simulation follows a 3-step process:

Simulation Process, Author’s Analysis

As you can see, creating a Monte Carlo simulation still requires data and, more importantly, some understanding of the system dynamics (e.g. the relationship between sales volume and price). Obtaining such knowledge typically requires talking to experts, studying process flows, and observing real business operations.
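
The sketch below shows one way such a simulation could look in Python for the sales volume and price example. It assumes a common reading of the three steps (draw uncertain inputs, apply the system dynamics, collect the outcomes), which may not match the figure exactly. The linear demand curve, the distributions, and every number in it are hypothetical placeholders; in a real project they would come from data and from the experts mentioned above.

```python
import random
import statistics


def simulate_revenue(price, n_trials=10_000, seed=0):
    """Return simulated revenues for one price under assumed uncertainty."""
    rng = random.Random(seed)
    revenues = []
    for _ in range(n_trials):
        # Step 1: draw the uncertain inputs (hypothetical distributions).
        base_demand = rng.gauss(1000, 100)       # units sold if price were 0
        price_sensitivity = rng.uniform(8, 12)   # units lost per unit of price
        # Step 2: apply the system dynamics (assumed linear demand curve).
        volume = max(0.0, base_demand - price_sensitivity * price)
        # Step 3: record the outcome of this trial.
        revenues.append(price * volume)
    return revenues


revenues = simulate_revenue(price=40)
mean_revenue = statistics.fmean(revenues)
cut_points = statistics.quantiles(revenues, n=20)  # 5%, 10%, ..., 95%
p5, p95 = cut_points[0], cut_points[-1]
print(f"mean revenue: {mean_revenue:,.0f}")
print(f"90% of trials fall between {p5:,.0f} and {p95:,.0f}")
```

Reporting a range alongside the mean is the point of the exercise: the spread shows how much the uncertainty in the inputs matters for the outcome.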

Yet Another Simulator

To see how the basic concepts come to life, you can go to Yet Another Simulator, an interactive tool I developed using Streamlit.

On the Welcome Page, you can play with various input setups and observe how the outcome changes depending on the function you apply.

Welcome Page of Yet Another Simulator, Author’s Work
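
Changing one input at a time and watching the outcome move is also the core of a simple one-at-a-time sensitivity analysis. The sketch below is not the code behind Yet Another Simulator; the profit model, the parameter names, and the baseline values are all assumptions made up for illustration. Each input is raised by 10% in turn, and the relative change in mean simulated profit is recorded.

```python
import random
import statistics


def mean_profit(price, unit_cost, demand_mean, n_trials=5_000, seed=1):
    """Mean simulated profit for a toy model with uncertain demand."""
    rng = random.Random(seed)  # same seed for every scenario (fair comparison)
    profits = []
    for _ in range(n_trials):
        # Demand is uncertain: normal noise with a 10% standard deviation.
        demand = max(0.0, rng.gauss(demand_mean, 0.1 * demand_mean))
        profits.append(demand * (price - unit_cost))
    return statistics.fmean(profits)


# Hypothetical baseline scenario.
baseline = {"price": 20.0, "unit_cost": 12.0, "demand_mean": 500.0}
base_profit = mean_profit(**baseline)

# Bump each input by 10% while holding the others at their baseline values.
sensitivity = {}
for name in baseline:
    bumped = dict(baseline, **{name: baseline[name] * 1.10})
    sensitivity[name] = (mean_profit(**bumped) - base_profit) / base_profit

# Print the inputs from most to least influential.
for name, change in sorted(sensitivity.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} +10% -> mean profit {change:+.1%}")
```

Reusing the same seed across scenarios means the differences come from the changed input rather than from random noise between runs.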

In addition to the basic example, the tool includes 4 case studies that discuss various design techniques, such as Influence Diagram, Sensitivity Analysis, Optimization, and Combining ML with Simulation.

For example, in the CMO example, I discuss how to use the Influence Diagram to help design a simulation to solve an advertisement budget allocation problem.

Influence Diagram, Author’s Work

Finally, you will step into the shoes of a Data Scientist who advises the Chief Marketing Officer (CMO). Your goal is to help the CMO to decide how much to spend on advertising, explore various scenarios, and come up with tactics to maximize return under different uncertainties.

Ad Budget Allocation, Author’s Work
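
For readers who prefer code, here is a rough sketch of how an ad budget question could be explored with simulation. It is not the model used in the tool: the diminishing-returns response curve, the distributions, and all dollar figures are assumptions I made up for illustration. For each candidate budget, it simulates the net return many times and then compares the mean outcome and the share of trials that lose money.

```python
import math
import random
import statistics


def simulate_net_return(budget, n_trials=5_000, seed=7):
    """Simulated net returns (revenue minus spend) for one ad budget."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # Uncertain inputs (hypothetical): revenue ceiling and saturation point.
        max_revenue = rng.gauss(500_000, 75_000)
        saturation = rng.uniform(80_000, 120_000)
        # Assumed dynamics: each extra dollar of ads brings in less than the last.
        revenue = max_revenue * (1 - math.exp(-budget / saturation))
        results.append(revenue - budget)
    return results


# Explore a grid of candidate budgets (what-if scenarios).
summary = {}
for budget in range(0, 400_001, 50_000):
    outcomes = simulate_net_return(budget)
    mean_return = statistics.fmean(outcomes)
    loss_share = sum(o < 0 for o in outcomes) / len(outcomes)
    summary[budget] = (mean_return, loss_share)
    print(f"budget {budget:>7,}: mean net return {mean_return:>10,.0f}, "
          f"share of losing trials {loss_share:.1%}")

best_budget = max(summary, key=lambda b: summary[b][0])
print(f"budget with the highest mean net return: {best_budget:,}")
```

Looking at the share of losing trials next to the mean matters here, because a CMO may prefer a slightly lower expected return if it comes with less downside risk.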

I hope the examples illustrate how Monte Carlo Simulation works, its strength in allowing us to explore compared to ML algorithms, and how you can design useful simulations with different design techniques.

Some of the case studies are still under active development. Sign up here to get notified when they are ready.

To Sum Up

I hope this article offers another look at the Monte Carlo method; we often forget such a useful tool in today’s ML discussion. Simulation has many strengths that traditional ML algorithms can’t provide — for example, the ability to explore big questions under tremendous uncertainty.

In an upcoming article, I will discuss how to combine ML and Simulation in a real business setting to get the best of both worlds and how to articulate the implications of the different simulated scenarios.

Stay tuned by following me on Medium, LinkedIn, or Twitter.

Until next time,

Ian

If you like this article, you may also like these:

12-Hour ML Challenge
How to build & deploy an ML app with Streamlit and DevOps tools

A Doomed Marriage of ML and Agile
How not to apply Agile on an ML project

Data Science is Boring
How I cope with the boring days of deploying Machine Learning

The Last Defense against Another AI Winter
The numbers, five tactical solutions, and a quick survey

The Last-Mile Problem of AI
One Thing Many Data Scientists Don’t Think Enough About

We Created a Lazy AI
How to Design and Implement Reinforcement Learning for the Real World

Popular Tools

When I discuss Simulation, many people ask for suggestions on tools. Here is a list of the tools I know; choose the ones that fit your purpose. Enjoy.

AnyLogic (This is probably the go-to tool for simulation professionals; Freemium)
Simio (Freemium)
Yasai (Excel Add-In, Free)
Oracle Crystal Ball (Freemium)
SimPy (Python Package, Free)
Hash (start-up in stealth mode as of the time of writing. Pretty solid founding team. Probably Freemium)

Reference

History of Decision Tree — http://pages.stat.wisc.edu/~loh/treeprogs/guide/LohISI14.pdf

History of Clustering — https://link.springer.com/chapter/10.1007/978-3-540-73560-1_15

Time to Marry Simulation and ML — https://www.benjamin-schumann.com/blog/2018/5/7/time-to-marry-simulation-models-and-machine-learning

Taxonomy of Simulation — https://gamingthepast.net/theory-practice/simulation-design-guide/

What is Monte Carlo and How it works — https://www.palisade.com/risk/monte_carlo_simulation.asp

Bio: Ian Xiao is Engagement Lead at Dessa, deploying machine learning at enterprises. He leads business and technical teams to deploy Machine Learning solutions and improve Marketing & Sales for F100 enterprises.

Original. Reposted with permission.