Topics: AI | Data Science | Data Visualization | Deep Learning | Machine Learning | NLP | Python | R | Statistics

KDnuggets Home » News » 2017 » Oct » Tutorials, Overviews » No order left behind; no shopper left idle. ( 17:n41 )

No order left behind; no shopper left idle.

This post is about how we use Monte Carlo simulations to balance supply and demand in a rapidly growing, high-variance marketplace.

Simulation Based Optimization

Monte Carlo simulation methods offer several advantages for problems like this:

  • no dependencies on previous week’s outcomes
  • model complex systems as interactions between random variables
  • perform stochastic optimization to deal with variances better

The idea is to simulate a lot of different variations of demand and supply, and solve for a set of staffing levels that minimizes idleness and lost deliveries costs across all of the simulation runs.

Each simulation needs to represent a future date for the market we want to staff for. This means that we need orders, stores, locations of shoppers, predictive models to figure out how long it takes to fulfill the orders, and an estimate on how many shoppers are likely to cancel.

Simulating Orders

We first have to start with how many orders we expect on a future date. We have a demand forecasting algorithm that produces a point estimate for demand for each market for each day. But the actual demand doesn’t always match the forecast and is prone to variance. To account for the variability, we build a distribution of our demand forecast based on its error distribution to determine the number of orders we need to fulfill in each simulation.

Example of a demand forecast distribution. Mean = Demand Forecast.

Let’s say we sample a number from this distribution and it is 3100 deliveries. We then sample those many number of orders from all the orders that have been placed historically in that market.

Spatial distribution of demand in a part of San Francisco (noise added on top of actual data)

If we do this over and over again, we address both the variance in the number of orders, and also the variance in space and time density.

Simulating Shoppers

Not every shopper has the same style of working. Some are slow, some are super fast. We model their speeds as a gamma distribution and sample from that. They are also not static in space. They begin their shifts somewhere between their home locations and store locations, and can be anywhere between their last delivery location and a nearest store location during their shifts. By using Markov models, we model distributions of their locations.

Staffing for each simulation

Once we bootstrap a simulation with orders, shoppers, and stores, we figure out the ideal staffing required for that simulation.

An example of a Simulation bootstrapped with orders, shoppers and stores.

We do so by solving our fulfillment problem, which is basically a Vehicle Routing Problem (VRP) with time windows. We use the same algorithm that we actually use to fulfill our orders in a real-time basis, so as to be consistent and make planning match reality as closely as possible.

Solve a VRP for each simulation

Solving the above problem gives us an optimal set of routes, each of which has a start and end time, based on when they are due and how long it takes to fulfill them. From these optimal routes, we work backwards and figure out how many shoppers we need. This approach also allows us to automatically staff more shoppers in stores that are part of more optimal routes.

Example of working backwards from trips to generate staffing levels

Example of number of shoppers required by hour of day

Multiple Simulations

To account for all the variability, we repeat the above process hundreds of times by repeatedly sampling and solving for staffing in each simulation.

Aggregate results from all simulations

Solving For Final Staffing Levels

Ultimately we have to solve for one final set of staffing levels that minimizes all our costs. Let’s say x_h (h: 8 AM to 10PM, x_h are integer variables that range from [0, ∞)represents the number of shoppers required at hour h. The objective is to solve for all x_h.

x_h (h: 8AM to 10 PM) are integer variables that determine number of shoppers required

In this particular simulation, if x_h take values represented by the red line, in the regions marked with 1️, we overstaff and hence incur an idleness cost. In the regions marked with 2, we understaff and hence incur a lost deliveries cost.
By minimizing these costs across all simulation runs, we solve for all x_h:

  • N: number of simulation runs
  • h: hour of day
  • ld_cost: lost deliveries cost
  • x_h: non-negative integer variables
  • l_h,i: shoppers required at hour h in simulation i

Final set of staffing levels

The red line represents the values of x_h after solving the above optimization problem.


Putting everything we discussed so far together, the architecture of our staffing engine looks like this:

We have a core simulation engine that runs several scenarios. We then have an optimizer that takes these simulation runs and constructs a final set of staffing levels.



To test the performance of this approach, we used the staffing levels generated through this approach in production and tracked metrics for our two main objectives: Lost Deliveries and Idleness, and compared them with our previous approaches.

Comparing density plots of Lost Deliveries and Idleness

In our markets that we have historically staffed high to encourage growth, we were able to significantly reduce idleness while maintaining almost the same amount of lost deliveries. This means that we are able to keep growing fast while keeping our shoppers busy and their earning potential high.

Lost deliveries remained steady but significantly lower idleness in our new markets

In markets where we have historically kept staffing tight to maintain a low level of idleness, we were able to lower our lost deliveries while keeping our idleness the same. This makes us even more available for our customers and allows us to grow faster.

Idleness remained steady but lower lost deliveries in our more mature markets


There’s more coming

In a follow up post, we will discuss how we expanded on this approach to staff for hyper-growth and some of the lessons we learnt from doing this project.

If you are interested in working on one of the many challenging problems we have at Instacart, check out our careers page.

Bio: Jagannath Putrevu (@jputrevu) is a Data Scientist at Instacart.

Original. Reposted with permission.


Sign Up

By subscribing you accept KDnuggets Privacy Policy