Linear Programming and Discrete Optimization with Python using PuLP

Knowledge of such optimization techniques is extremely useful for data scientists and machine learning (ML) practitioners: discrete and continuous optimization lie at the heart of modern ML and AI systems, as well as of data-driven business analytics processes.

Solving the problem and printing the solution

PuLP offers quite a few choices of solver backends (COIN-MP, Gurobi, CPLEX, and others). For this problem, we do not specify any solver and let the program fall back to its default choice depending on the problem structure.
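As a concrete illustration, here is a minimal, self-contained sketch (using a hypothetical toy problem, not the diet data) of calling `solve()` with no solver argument, so that PuLP falls back to its bundled default solver (CBC):

```python
from pulp import LpProblem, LpMinimize, LpVariable, LpStatus, value

# Hypothetical toy problem, purely for illustration
prob = LpProblem("Toy_Problem", LpMinimize)
x = LpVariable("x", lowBound=0)
y = LpVariable("y", lowBound=0)

prob += 2 * x + 3 * y        # objective: minimize this cost
prob += x + y >= 10          # a single demand constraint

prob.solve()                 # no argument: PuLP's default solver is used
print(LpStatus[prob.status])
```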



We can print the status of the solution. Note that although the status is optimal in this case, it need not be so: if the problem is ill-formulated or there is not sufficient information, the solution may be infeasible or unbounded.

# The status of the solution is printed to the screen
print("Status:", LpStatus[prob.status])

>> Status: Optimal


The full solution contains all the variables, including those with zero weights. But only the variables with non-zero values interest us, i.e. the ones that should be included in the optimal diet plan. So, we can scan through the problem variables and print a variable only if its quantity is positive.

for v in prob.variables():
    if v.varValue > 0:
        print(v.name, "=", v.varValue)

>> Food_Frozen_Broccoli = 6.9242113
   Food_Scrambled_Eggs = 6.060891
   Food__Baked_Potatoes = 1.0806324


So, the optimal solution is to eat about 6.92 servings of frozen broccoli, 6.06 servings of scrambled eggs, and 1.08 servings of baked potatoes!


You are welcome to download the whole notebook and the data file, and experiment with various constraints to change your diet plan. The code is in my GitHub repository.

Finally, we can print the value of the objective function, i.e. the cost of the diet in this case:

obj = value(prob.objective)
print("The total cost of this balanced diet is: ${}".format(round(obj,2)))


>> The total cost of this balanced diet is: $5.52



What if we want a solution with whole numbers?

As we can see, the optimal result came back with fractional numbers of servings for the food items. This may not be practical, and we may want to force the solution to contain only integer serving quantities.

This brings us to the technique of integer programming. The algorithm used for the previous optimization was plain linear programming, where the variables were allowed to take any real value. Integer programming forces some or all of the variables to take only integer values.

In fact, integer programming is a harder computational problem than linear programming. Integer variables make an optimization problem non-convex, and therefore far more difficult to solve. Memory and solution time may rise exponentially as you add more integer variables.

Fortunately, PuLP can solve optimization problems with this kind of restriction too.

The code is almost identical to before, so it is not repeated here. The only difference is that the variables are defined as belonging to the Integer category, as opposed to Continuous.

food_integer = LpVariable.dicts("Food",food_items,0,cat='Integer')


For this problem, it changes the optimal solution slightly, adding iceberg lettuce to the diet and increasing the cost by $0.06. You will also notice a perceptible increase in the computation time for the solution process.

Therefore, the optimal balanced diet with whole servings consists of
Food_Frozen_Broccoli = 7.0
Food_Raw_Lettuce_Iceberg = 1.0
Food_Scrambled_Eggs = 6.0
Food__Baked_Potatoes = 1.0
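To make the change concrete, here is a self-contained toy sketch of an integer-constrained version (with made-up costs and calorie counts, not the article's dataset):

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

# Hypothetical two-item toy diet: minimize cost subject to a minimum-calorie
# constraint, with servings restricted to whole numbers via cat='Integer'
food_items = ["broccoli", "eggs"]
cost = {"broccoli": 0.16, "eggs": 0.33}       # $ per serving (made-up)
calories = {"broccoli": 73.8, "eggs": 141.0}  # kcal per serving (made-up)

prob = LpProblem("Integer_Diet", LpMinimize)
food_integer = LpVariable.dicts("Food", food_items, 0, cat='Integer')

prob += lpSum(cost[f] * food_integer[f] for f in food_items)
prob += lpSum(calories[f] * food_integer[f] for f in food_items) >= 800

prob.solve()
for f in food_items:
    print(f, "=", food_integer[f].varValue)
```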


A cool application of integer programming is solving a driver-scheduling problem, which can be NP-hard. See this article (and also note how the authors compute the costs of various actions and use them in the optimization problem):

An Intro to Integer Programming for Engineers: Simplified Bus Scheduling


How to incorporate binary decisions in a linear programming problem?

Often, we want to include some kind of 'if-then-else' decision logic in the optimization problem.

What if we don’t want both broccoli and iceberg lettuce to be included in the diet (but only one of them is fine)?

How do we represent such decision logic in this framework?


It turns out that, for this kind of logic, you need to introduce another type of variable, called an indicator variable. These variables are binary in nature and can indicate the presence or absence of a variable in the optimal solution.

Involving indicator function as a constraint in a LP problem

But for this particular problem, there is an apparent difficulty with using indicator variables. Ideally, you want the cost/nutritional value of a food item to be included in the constraint equation if the indicator variable is 1, and ignored if it is zero. Mathematically, it is intuitive to write this as the product of the original term (involving the food item) and the indicator variable. But the moment you do that, you are multiplying two variables and making the problem nonlinear! It then falls under the domain of quadratic programming (QP), quadratic because the terms are now products of two linear terms.

The popular machine learning technique Support Vector Machine essentially solves a quadratic programming problem.

What is an intuitive explanation of quadratic programming, and how is it defined in SVM?

However, this general concept of using an indicator variable to express binary logic in a linear programming problem is extremely useful. We link to a problem of solving a Sudoku puzzle with LP in the next section, where this trick is used.

It turns out that there is a clever trick to incorporate such binary logic in this LP without making it a QP problem.

We can denote the binary variables as food_chosen and instantiate them as Integer with lower and upper bounds of 0 and 1.

food_chosen = LpVariable.dicts("Chosen",food_items,0,1,cat='Integer')


Then we write constraints linking the usual food_vars and the binary food_chosen variables, and add them to the problem:

for f in food_items:
    prob += food_vars[f] >= food_chosen[f] * 0.1
    prob += food_vars[f] <= food_chosen[f] * 1e5


If you stare at the code long enough, you will realize that it effectively gives food_vars a non-zero value only if the corresponding food_chosen indicator variable is 1. But this way we avoid the direct multiplication and keep the problem structure linear.

To incorporate the either/or condition on broccoli and iceberg lettuce, we just add a simple constraint:

prob += food_chosen['Frozen Broccoli'] + food_chosen['Raw Iceberg Lettuce'] <= 1


This ensures the sum of these two binary variables is at most 1, which means only one of them can be included in the optimal solution but not both.
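Putting the pieces together, here is a self-contained toy sketch of the whole trick (hypothetical items and numbers, not the article's dataset): linking constraints between quantities and indicators, plus an either/or restriction on two of the items:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

# Hypothetical items and costs, purely for illustration
items = ["a", "b", "c"]
cost = {"a": 1.0, "b": 1.2, "c": 3.0}

prob = LpProblem("Either_Or", LpMinimize)
qty = LpVariable.dicts("Qty", items, 0)                          # continuous amounts
chosen = LpVariable.dicts("Chosen", items, 0, 1, cat='Integer')  # 0/1 indicators

prob += lpSum(cost[i] * qty[i] for i in items)   # minimize total cost
prob += lpSum(qty[i] for i in items) >= 10       # need 10 units in total

for i in items:
    prob += qty[i] >= 0.1 * chosen[i]    # if chosen, take at least 0.1 units
    prob += qty[i] <= 1e5 * chosen[i]    # if not chosen, quantity must be 0

prob += chosen["a"] + chosen["b"] <= 1   # either "a" or "b", never both

prob.solve()
```

Since "a" is the cheapest item here, the solver satisfies the whole demand with it alone, and the either/or constraint then keeps "b" out.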


More applications of linear/integer programming

In this article, we showed the basic flow of setting up and solving a simple linear programming problem with Python. However, if you look around, you will find countless examples of engineering and business problems which can be transformed into some form of LP and then solved using efficient solvers. Following are some of the canonical examples to get you started thinking,

Many machine learning algorithms also use the general class of optimization of which linear programming is a subset: convex optimization. See the following article for more information about it,

What lies beneath? Optimization at the heart of Machine Learning


Summary and conclusion

In this article, we illustrated solving a simple diet optimization problem with linear and integer programming techniques using the Python package PuLP. It is noteworthy that even the widely used SciPy has a linear optimization method built in. Readers are encouraged to try various other Python libraries and choose a good method for themselves.
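For example, SciPy's built-in `scipy.optimize.linprog` can solve a small LP directly (a toy problem here, not the diet data); note that it accepts only "<=" inequality constraints, so a ">=" constraint must be negated on both sides:

```python
from scipy.optimize import linprog

# minimize 2x + 3y  subject to  x + y >= 10,  x >= 0, y >= 0
c = [2, 3]
A_ub = [[-1, -1]]   # -x - y <= -10  encodes  x + y >= 10
b_ub = [-10]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])

print(res.fun)  # optimal cost
print(res.x)    # optimal (x, y)
```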

If you have any questions or ideas to share, please contact the author at tirthajyoti[AT] Also, you can check the author’s GitHub repositories for other fun code snippets in Python, R, or MATLAB and machine learning resources. If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.


Bio: Tirthajyoti Sarkar is the Senior Principal Engineer at ON Semiconductor working on Deep Learning/Machine Learning based design automation projects.

Original. Reposted with permission.