No order left behind; no shopper left idle.
This post is about how we use Monte Carlo simulations to balance supply and demand in a rapidly growing, high-variance marketplace.
Simulation-Based Optimization
Monte Carlo simulation methods offer several advantages for problems like this:
- no dependence on the previous week’s outcomes
- the ability to model complex systems as interactions between random variables
- stochastic optimization that accounts for variance
The idea is to simulate many different variations of demand and supply, then solve for a set of staffing levels that minimizes the combined cost of idleness and lost deliveries across all of the simulation runs.
Each simulation needs to represent a future date for the market we want to staff. This means that we need orders, stores, locations of shoppers, predictive models to figure out how long it takes to fulfill the orders, and an estimate of how many shoppers are likely to cancel.
We start with how many orders we expect on a future date. We have a demand forecasting algorithm that produces a point estimate of demand for each market for each day. But actual demand doesn’t always match the forecast and is prone to variance. To account for this variability, we build a distribution around our demand forecast based on its error distribution, and sample from it to determine the number of orders we need to fulfill in each simulation.
Example of a demand forecast distribution. Mean = Demand Forecast.
Let’s say we sample a number from this distribution and it is 3,100 deliveries. We then sample that many orders from all the orders that have historically been placed in that market.
Spatial distribution of demand in a part of San Francisco (noise added on top of actual data)
If we do this over and over again, we capture both the variance in the number of orders and the variance in their spatial and temporal density.
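Under stated assumptions, the two sampling steps above can be sketched in a few lines of Python. The forecast value, error distribution, and order format are all illustrative stand-ins, not production data:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical inputs: a point forecast and a stand-in for the
# historically observed forecast errors.
demand_forecast = 3000
forecast_errors = rng.normal(0, 150, size=365)

# Historical orders for this market: (store_id, lat, lon, due_hour).
historical_orders = [(i % 20, 37.77 + rng.normal(0, 0.02),
                      -122.42 + rng.normal(0, 0.02), 8 + i % 14)
                     for i in range(10_000)]

def sample_simulation_orders(forecast, errors, orders, rng):
    """Draw one simulated demand scenario.

    1. Sample an order count from the forecast's error distribution.
    2. Bootstrap that many orders from the historical order pool, which
       preserves the observed spatial and temporal density.
    """
    n_orders = max(0, int(forecast + rng.choice(errors)))
    idx = rng.integers(0, len(orders), size=n_orders)
    return [orders[i] for i in idx]

scenario = sample_simulation_orders(demand_forecast, forecast_errors,
                                    historical_orders, rng)
```

Each call produces a different scenario; running this hundreds of times yields the spread of demand the staffing plan has to cover.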
Not every shopper works the same way: some are slow, some are super fast. We model their speeds as a gamma distribution and sample from it. Shoppers are also not static in space. They begin their shifts somewhere between their home locations and store locations, and during their shifts they can be anywhere between their last delivery location and the nearest store. We model the distributions of their locations using Markov models.
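A minimal sketch of the shopper model, assuming illustrative gamma parameters and a simple home-to-store interpolation for start-of-shift locations (a stand-in for the full Markov location model):

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical gamma parameters for shopper speed (orders per hour);
# in practice these would be fit to each market's historical data.
SPEED_SHAPE, SPEED_SCALE = 4.0, 0.5  # mean speed = shape * scale = 2.0

def sample_shoppers(n, rng):
    """Sample shopper speeds from a gamma distribution and starting
    locations between home and store."""
    speeds = rng.gamma(SPEED_SHAPE, SPEED_SCALE, size=n)
    homes = rng.uniform([37.70, -122.50], [37.80, -122.38], size=(n, 2))
    stores = rng.uniform([37.70, -122.50], [37.80, -122.38], size=(n, 2))
    # Start each shift at a random point on the home-to-store segment.
    t = rng.uniform(0, 1, size=(n, 1))
    starts = homes + t * (stores - homes)
    return speeds, starts

speeds, starts = sample_shoppers(500, rng)
```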
Staffing for each simulation
Once we bootstrap a simulation with orders, shoppers, and stores, we figure out the ideal staffing required for that simulation.
An example of a Simulation bootstrapped with orders, shoppers and stores.
We do so by solving our fulfillment problem, which is essentially a Vehicle Routing Problem (VRP) with time windows. We use the same algorithm that we actually use to fulfill our orders on a real-time basis, so that planning matches reality as closely as possible.
Solve a VRP for each simulation
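The real system solves a full VRP with time windows; as a toy stand-in, here is a greedy chaining heuristic that shows how due times limit how many orders one route can absorb. The 30-minute fulfillment time and the order format are assumptions, not Instacart’s actual algorithm:

```python
def build_routes(orders, fulfill_minutes=30):
    """Greedily chain orders into routes. Each order is a
    (ready_minute, due_minute) pair; a route can take the next order
    only if it can still be finished before that order's due time."""
    routes = []
    for ready, due in sorted(orders, key=lambda o: o[1]):
        placed = False
        for route in routes:
            # Route entries are (ready, due, finish) triples.
            finish = max(route[-1][2], ready) + fulfill_minutes
            if finish <= due:
                route.append((ready, due, finish))
                placed = True
                break
        if not placed:
            finish = ready + fulfill_minutes
            if finish <= due:
                routes.append([(ready, due, finish)])
            # else: infeasible even alone, i.e. a lost delivery

    return routes

# Three orders with loose due times chain into a single route (shopper):
one_shopper = build_routes([(0, 40), (0, 80), (10, 120)])
# Two orders with tight due times need two routes (shoppers):
two_shoppers = build_routes([(0, 40), (0, 50)])
```

The number of routes produced is a rough proxy for the shoppers that simulation requires; a real VRP solver would also optimize travel between stores and delivery locations.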
Solving the above problem gives us an optimal set of routes, each with a start and end time based on when the orders are due and how long they take to fulfill. From these optimal routes, we work backwards to figure out how many shoppers we need. This approach also automatically staffs more shoppers at stores that appear in more of the optimal routes.
Example of working backwards from trips to generate staffing levels
Example of number of shoppers required by hour of day
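The working-backwards step amounts to counting how many routes are active in each hour. A small sketch with illustrative route times:

```python
from collections import Counter

# Hypothetical optimal routes from the VRP solve: (start_hour, end_hour)
# pairs, half-open on the right. Values are illustrative.
routes = [(8, 11), (9, 12), (9, 10), (13, 16), (14, 17), (14, 15)]

def staffing_by_hour(routes):
    """Each route is driven by one shopper, so the number of concurrent
    routes at hour h is the number of shoppers required at hour h for
    this simulation."""
    required = Counter()
    for start, end in routes:
        for h in range(start, end):
            required[h] += 1
    return dict(sorted(required.items()))

print(staffing_by_hour(routes))
# → {8: 1, 9: 3, 10: 2, 11: 1, 13: 1, 14: 3, 15: 2, 16: 1}
```

Hour 9 needs three shoppers, for example, because three of the routes overlap there.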
To account for all this variability, we repeat the above process hundreds of times, sampling a new scenario and solving for staffing in each simulation.
Aggregate results from all simulations
Solving For Final Staffing Levels
Ultimately we have to solve for one final set of staffing levels that minimizes all our costs. Let’s say x_h represents the number of shoppers required at hour h, where h ranges over the staffed hours (8 AM to 10 PM) and each x_h is a non-negative integer. The objective is to solve for all x_h.
x_h (h: 8 AM to 10 PM) are integer variables that determine the number of shoppers required
In this particular simulation, if x_h takes the values shown by the red line, then in the regions marked 1 we overstaff and incur an idleness cost, and in the regions marked 2 we understaff and incur a lost-deliveries cost.
By minimizing these costs across all simulation runs, we solve for all x_h:
- N: number of simulation runs
- i: simulation index
- h: hour of day
- ld_cost: lost deliveries cost
- x_h: non-negative integer variables
- l_h,i: shoppers required at hour h in simulation i
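The exact objective isn’t reproduced here, so the following is a minimal sketch under an assumed form: a per-hour idleness cost when x_h exceeds l_h,i and a lost-deliveries cost when it falls short, summed over all simulations. With that form the problem separates by hour, so each x_h can be found with a small integer search (the cost values and required-shopper data are illustrative):

```python
# Assumed objective (not the exact production formulation):
#   minimize  sum_i sum_h [ idle_cost * max(0, x_h - l_{h,i})
#                         + ld_cost   * max(0, l_{h,i} - x_h) ]

idle_cost, ld_cost = 1.0, 5.0

# l[h] lists shoppers required at hour h across simulations (toy data).
l = {8: [3, 4, 5, 4], 9: [6, 7, 9, 8]}

def solve_staffing(l, idle_cost, ld_cost):
    """Pick the integer x_h minimizing total cost across simulations."""
    x = {}
    for h, reqs in l.items():
        candidates = range(0, max(reqs) + 1)
        x[h] = min(candidates,
                   key=lambda c: sum(idle_cost * max(0, c - r)
                                     + ld_cost * max(0, r - c)
                                     for r in reqs))
    return x

print(solve_staffing(l, idle_cost, ld_cost))
# → {8: 5, 9: 9}
```

Because a lost delivery is assumed to cost far more than an idle shopper-hour, the optimum here lands near the high end of the sampled requirements; the ratio of the two costs controls that trade-off.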
Final set of staffing levels
The red line represents the values of x_h after solving the above optimization problem.
Putting everything we discussed so far together, the architecture of our staffing engine looks like this:
We have a core simulation engine that runs several scenarios. We then have an optimizer that takes these simulation runs and constructs a final set of staffing levels.
To test the performance of this approach, we used the generated staffing levels in production, tracked metrics for our two main objectives, Lost Deliveries and Idleness, and compared them with our previous approaches.
Comparing density plots of Lost Deliveries and Idleness
In markets that we have historically staffed high to encourage growth, we were able to significantly reduce idleness while maintaining almost the same level of lost deliveries. This means we can keep growing fast while keeping our shoppers busy and their earning potential high.
Lost deliveries remained steady while idleness dropped significantly in our new markets
In markets where we have historically kept staffing tight to maintain low idleness, we were able to lower our lost deliveries while keeping idleness the same. This makes us even more available to our customers and allows us to grow faster.
Idleness remained steady while lost deliveries dropped in our more mature markets
There’s more coming
In a follow-up post, we will discuss how we expanded this approach to staff for hyper-growth, along with some of the lessons we learnt from this project.
If you are interested in working on one of the many challenging problems we have at Instacart, check out our careers page.
Original. Reposted with permission.