Genetic Programming in Python: The Knapsack Problem

This article explores the knapsack problem. We will discuss why it is difficult to solve traditionally and how genetic programming can help find a "good enough" solution. We will then look at a Python implementation of this solution to test out for ourselves.



The Knapsack Problem Using Genetic Programming: A Python Implementation
Image created with DALL•E

 

Editor's note: As pointed out by reader Monem in the comments below, this is an example of using a genetic algorithm to solve the knapsack problem, not genetic programming. I got involved in the production of the article and made a very basic error that carried all the way through. For transparency, I'm leaving it as is, with this correction note. Apologies for the confusion.

 

In this article, we will look at the knapsack problem, a classic in computer science. We will explain why it is difficult to solve using traditional computational methods, and how genetic programming can help find a "good enough" solution. Afterwards, we will look at a Python implementation of just such a solution to test out for ourselves.

 

The Knapsack Problem

 
The knapsack problem can be used to illustrate the difficulty of solving complex computational problems. In its simplest form, one is given a knapsack of a certain capacity, a set of items with their sizes and values, and asked to maximize the value of the items placed in the knapsack without exceeding the capacity. The knapsack problem can be formulated in many ways, but it is generally considered to be a difficult problem to solve when employing traditional algorithms.

The difficulty of the knapsack problem lies in the fact that it is an NP-complete problem, which means there is no known algorithm that is guaranteed to find the globally optimal solution in polynomial time. As the number of items grows, the problem quickly becomes intractable for exact methods: brute-force search guarantees an optimal answer but its run time grows exponentially with the number of items, while heuristics run faster but may not find the optimal solution.
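To make the cost of an exact approach concrete, here is a minimal brute-force solver (an illustrative sketch, not part of the article's program) that enumerates every subset of the items with itertools and keeps the best feasible one. It is guaranteed optimal, but it must examine 2^n subsets for n items, so its run time doubles with every item added.

```python
from itertools import combinations

def knapsack_brute_force(items, max_weight):
    """Exact but exponential: examines all 2**len(items) subsets.

    items is a list of (weight, value) pairs.
    """
    best_value = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            weight = sum(w for w, _ in subset)
            value = sum(v for _, v in subset)
            if weight <= max_weight and value > best_value:
                best_value = value
    return best_value

# tiny example: three (weight, value) pairs, capacity 4
print(knapsack_brute_force([(1, 2), (2, 3), (3, 4)], 4))  # 6
```

At 20 items this is about a million subsets; at 40 items, over a trillion. Heuristic methods trade the optimality guarantee for run times that stay manageable.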

 

Genetic Programming

 
Genetic programming, however, can provide an alternative method for finding a solution to the knapsack problem. Genetic programming is a technique that uses evolutionary algorithms to search for solutions to complex problems. By using genetic programming, it is possible to quickly find a solution that is “good enough” for the given problem. It can also be used to optimize and improve upon solutions.

In genetic programming, a set of possible solutions (the initial generation) is randomly generated and then evaluated against a set of criteria. The solutions that best fit the criteria are selected, and genetic operations are applied to create new solution variants (subsequent generations). Each new generation of variants is evaluated in turn, and the process repeats until an optimal, or best "good enough", solution is found.

The advantage of using genetic programming to solve the knapsack problem is that a good enough solution can be found quickly, without exhaustively searching through every possible combination of items. This makes it far more practical than brute-force search as the number of items grows.

 

Explaining the Code

 
Now that we have an idea of what the knapsack problem is, what genetic programming is, and why we would use the latter to try and solve the former, let's have a look at a Python implementation of what we have described above.

We will go through the important functions one by one, and then look at the holistic program afterwards.

The program is implemented in pure Python, using no third-party libraries.

 

Generate random population

 
This function generates a random population of a given size. It loops through the given size and, for each iteration, creates a chromosome: a list of 0s and 1s with one gene per available item, generated using the random.choice() function. Each chromosome is appended to the population list. Finally, the function prints a message and returns the population list.

def generate_population(size):
	population = []
	for _ in range(size):
		genes = [0, 1]
		chromosome = []
		for _ in range(len(items)):
			chromosome.append(random.choice(genes))
		population.append(chromosome)
	print("Generated a random population of size", size)
	return population

 

Calculate chromosome fitness

 
This function is used to calculate the fitness of a chromosome. It takes a chromosome as an argument and iterates through it. If the value of the chromosome at a given index is 1, it adds the corresponding item's weight and value to the total weight and total value respectively. If the total weight exceeds the maximum weight, the fitness is set to 0. Otherwise, the fitness is set to the total value. This function is used in genetic algorithms to determine the fitness of a given chromosome.

def calculate_fitness(chromosome):
	total_weight = 0
	total_value = 0
	for i in range(len(chromosome)):
		if chromosome[i] == 1:
			total_weight += items[i][0]
			total_value += items[i][1]
	if total_weight > max_weight:
		return 0
	else:
		return total_value
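As a quick sanity check, here is how this fitness function behaves on two hand-picked chromosomes, one feasible and one overweight, using the item list and weight limit from the full program later in the article:

```python
# item list and weight limit copied from the full program later in the article
items = [[1, 2], [2, 4], [3, 4], [4, 5], [5, 7], [6, 9]]
max_weight = 10

def calculate_fitness(chromosome):
    total_weight = 0
    total_value = 0
    for i in range(len(chromosome)):
        if chromosome[i] == 1:
            total_weight += items[i][0]
            total_value += items[i][1]
    if total_weight > max_weight:
        return 0
    return total_value

print(calculate_fitness([1, 0, 1, 0, 0, 1]))  # weight 1+3+6 = 10 <= 10, fitness = 2+4+9 = 15
print(calculate_fitness([0, 0, 0, 1, 1, 1]))  # weight 4+5+6 = 15 > 10, fitness = 0
```

Note that an overweight chromosome gets a fitness of 0 rather than a partial score, which effectively removes it from the running during selection.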

 

Select chromosomes

 
This function is used to select two chromosomes from a population for crossover. It first calculates the fitness values of each chromosome in the population using the calculate_fitness() function. It then normalizes the fitness values by dividing each value by the sum of all fitness values. Finally, it uses the random.choices() function to randomly select two chromosomes from the population based on the normalized fitness values. The two selected chromosomes are then returned as the parent chromosomes for crossover.

def select_chromosomes(population):
	fitness_values = []
	for chromosome in population:
		fitness_values.append(calculate_fitness(chromosome))
	
	# guard against division by zero when every chromosome is overweight
	total_fitness = sum(fitness_values)
	if total_fitness == 0:
		fitness_values = [1] * len(fitness_values)
	else:
		fitness_values = [float(i)/total_fitness for i in fitness_values]
	
	parent1 = random.choices(population, weights=fitness_values, k=1)[0]
	parent2 = random.choices(population, weights=fitness_values, k=1)[0]
	
	print("Selected two chromosomes for crossover")
	return parent1, parent2
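To see what the normalization step does, here is a small standalone example of turning raw fitness values (made up for illustration) into selection weights for random.choices():

```python
import random

fitness_values = [6, 0, 10, 4]  # made-up raw fitnesses for four chromosomes
total = sum(fitness_values)
weights = [f / total for f in fitness_values]
print(weights)  # [0.3, 0.0, 0.5, 0.2]

random.seed(42)  # seeded only so this demo is reproducible
population = ["A", "B", "C", "D"]  # stand-ins for chromosomes
parent = random.choices(population, weights=weights, k=1)[0]
```

A chromosome with fitness 0 (here "B") has zero selection weight and can never be chosen, which is how overweight solutions are pushed out of the gene pool.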

 

Perform crossover

 
This function performs crossover between two chromosomes. It takes two parent chromosomes as input and randomly selects a crossover point. It then creates two child chromosomes by combining the two parent chromosomes at the crossover point. The first child chromosome is created by taking the first part of the first parent chromosome and the second part of the second parent chromosome. The second child chromosome is created by taking the first part of the second parent chromosome and the second part of the first parent chromosome. Finally, the two child chromosomes are returned as the output.

def crossover(parent1, parent2):
	crossover_point = random.randint(0, len(items)-1)
	child1 = parent1[0:crossover_point] + parent2[crossover_point:]
	child2 = parent2[0:crossover_point] + parent1[crossover_point:]
	
	print("Performed crossover between two chromosomes")
	return child1, child2
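To make the slicing concrete, here is the same crossover logic with a fixed (rather than random) crossover point, so the result is deterministic:

```python
parent1 = [1, 1, 1, 1, 1, 1]
parent2 = [0, 0, 0, 0, 0, 0]
crossover_point = 3  # fixed for illustration; the article picks it at random

# each child takes one part from each parent, split at the crossover point
child1 = parent1[0:crossover_point] + parent2[crossover_point:]
child2 = parent2[0:crossover_point] + parent1[crossover_point:]

print(child1)  # [1, 1, 1, 0, 0, 0]
print(child2)  # [0, 0, 0, 1, 1, 1]
```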

 

Perform mutation

 
This function performs a mutation on a chromosome. It takes in a chromosome as an argument and uses the random module to pick a random mutation point between 0 and the chromosome length minus one. The gene at that point is then flipped: a 0 becomes a 1, and a 1 becomes a 0. The function then prints a message and returns the mutated chromosome. This function can be used to simulate genetic mutations in a population of organisms.

def mutate(chromosome):
	mutation_point = random.randint(0, len(items)-1)
	if chromosome[mutation_point] == 0:
		chromosome[mutation_point] = 1
	else:
		chromosome[mutation_point] = 0
	print("Performed mutation on a chromosome")
	return chromosome

 

Get best chromosome

 
This function takes in a population of chromosomes and returns the best chromosome from the population. It does this by first creating a list of fitness values for each chromosome in the population. It then finds the maximum fitness value and its corresponding index in the list. Finally, it returns the chromosome at the index of the maximum fitness value. This function is useful for finding the best chromosome from a population of chromosomes in order to use it for further operations.

def get_best(population):
	fitness_values = []
	for chromosome in population:
		fitness_values.append(calculate_fitness(chromosome))

	max_value = max(fitness_values)
	max_index = fitness_values.index(max_value)
	return population[max_index]

 

The control loop

 
This code performs the evolutionary loop that evolves the population of chromosomes. For each of the specified number of generations, two parent chromosomes are selected from the population, and crossover is performed on them to generate two new chromosomes. Each new chromosome is then independently subjected to mutation with probability mutation_probability: a random number between 0 and 1 is drawn, and if it falls below that threshold the chromosome is mutated. Finally, the old population is replaced with a new one consisting of the two new chromosomes plus the remaining chromosomes from the old population.

for _ in range(generations):
	# select two chromosomes for crossover
	parent1, parent2 = select_chromosomes(population)

	# perform crossover to generate two new chromosomes
	child1, child2 = crossover(parent1, parent2)

	# perform mutation on the two new chromosomes
	if random.uniform(0, 1) < mutation_probability:
		child1 = mutate(child1)
	if random.uniform(0, 1) < mutation_probability:
		child2 = mutate(child2)

	# replace the old population with the new population
	population = [child1, child2] + population[2:]

 

The Full Python Implementation

 
If we take the above functions and control loop, add a list of items along with a few parameters and some output to the console, we get the following full Python implementation.

Note that all of the parameters are hardcoded for simplicity; however, with little trouble one could instead accept command line arguments or solicit user input for any of them, including the number, value, and weight of available items.

import random

# function to generate a random population
def generate_population(size):
	population = []
	for _ in range(size):
		genes = [0, 1]
		chromosome = []
		for _ in range(len(items)):
			chromosome.append(random.choice(genes))
		population.append(chromosome)
	print("Generated a random population of size", size)
	return population

# function to calculate the fitness of a chromosome
def calculate_fitness(chromosome):
	total_weight = 0
	total_value = 0
	for i in range(len(chromosome)):
		if chromosome[i] == 1:
			total_weight += items[i][0]
			total_value += items[i][1]
	if total_weight > max_weight:
		return 0
	else:
		return total_value

# function to select two chromosomes for crossover
def select_chromosomes(population):
	fitness_values = []
	for chromosome in population:
		fitness_values.append(calculate_fitness(chromosome))
	
	# guard against division by zero when every chromosome is overweight
	total_fitness = sum(fitness_values)
	if total_fitness == 0:
		fitness_values = [1] * len(fitness_values)
	else:
		fitness_values = [float(i)/total_fitness for i in fitness_values]
	
	parent1 = random.choices(population, weights=fitness_values, k=1)[0]
	parent2 = random.choices(population, weights=fitness_values, k=1)[0]
	
	print("Selected two chromosomes for crossover")
	return parent1, parent2

# function to perform crossover between two chromosomes
def crossover(parent1, parent2):
	crossover_point = random.randint(0, len(items)-1)
	child1 = parent1[0:crossover_point] + parent2[crossover_point:]
	child2 = parent2[0:crossover_point] + parent1[crossover_point:]
	
	print("Performed crossover between two chromosomes")
	return child1, child2

# function to perform mutation on a chromosome
def mutate(chromosome):
	mutation_point = random.randint(0, len(items)-1)
	if chromosome[mutation_point] == 0:
		chromosome[mutation_point] = 1
	else:
		chromosome[mutation_point] = 0
	print("Performed mutation on a chromosome")
	return chromosome

# function to get the best chromosome from the population
def get_best(population):
	fitness_values = []
	for chromosome in population:
		fitness_values.append(calculate_fitness(chromosome))

	max_value = max(fitness_values)
	max_index = fitness_values.index(max_value)
	return population[max_index]


# items that can be put in the knapsack
items = [
		[1, 2],
		[2, 4],
		[3, 4],
		[4, 5],
		[5, 7],
		[6, 9]
	]

# print available items
print("Available items:\n", items)

# parameters for genetic algorithm
max_weight = 10
population_size = 10
mutation_probability = 0.2
generations = 10

print("\nGenetic algorithm parameters:")
print("Max weight:", max_weight)
print("Population:", population_size)
print("Mutation probability:", mutation_probability)
print("Generations:", generations, "\n")
print("Performing genetic evolution:")

# generate a random population
population = generate_population(population_size)

# evolve the population for specified number of generations
for _ in range(generations):
	# select two chromosomes for crossover
	parent1, parent2 = select_chromosomes(population)

	# perform crossover to generate two new chromosomes
	child1, child2 = crossover(parent1, parent2)

	# perform mutation on the two new chromosomes
	if random.uniform(0, 1) < mutation_probability:
		child1 = mutate(child1)
	if random.uniform(0, 1) < mutation_probability:
		child2 = mutate(child2)

	# replace the old population with the new population
	population = [child1, child2] + population[2:]

# get the best chromosome from the population
best = get_best(population)

# get the weight and value of the best solution
total_weight = 0
total_value = 0
for i in range(len(best)):
	if best[i] == 1:
		total_weight += items[i][0]
		total_value += items[i][1]

# print the best solution
print("\nThe best solution:")
print("Weight:", total_weight)
print("Value:", total_value)

 

Save the above code to the file knapsack_ga.py, and then run it by typing python knapsack_ga.py.

A sample output is as follows:

Available items:
[[1, 2], [2, 4], [3, 4], [4, 5], [5, 7], [6, 9]]

Genetic algorithm parameters:
Max weight: 10
Population: 10
Mutation probability: 0.2
Generations: 10 

Performing genetic evolution:
Generated a random population of size 10
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Performed mutation on a chromosome
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Performed mutation on a chromosome
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Performed mutation on a chromosome
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Selected two chromosomes for crossover
Performed crossover between two chromosomes
Performed mutation on a chromosome

The best solution:
Weight: 10
Value: 14
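
Note that the value of 14 found above is good but not optimal. Because this instance has only six items, and therefore just 2^6 = 64 possible chromosomes, we can afford a brute-force check of how close the genetic algorithm got:

```python
from itertools import product

# the instance from the article
items = [[1, 2], [2, 4], [3, 4], [4, 5], [5, 7], [6, 9]]
max_weight = 10

best_value = 0
for bits in product([0, 1], repeat=len(items)):  # all 64 possible chromosomes
    weight = sum(items[i][0] for i in range(len(items)) if bits[i])
    value = sum(items[i][1] for i in range(len(items)) if bits[i])
    if weight <= max_weight and value > best_value:
        best_value = value

print(best_value)  # 15
```

The exact optimum is 15 (for example, items of weight 1, 3, and 6: total weight 10, total value 2 + 4 + 9 = 15), so the sample run landed at roughly 93% of optimal, a reasonable "good enough" result for ten generations on a population of ten.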

 

And there you go. You now know how to use genetic programming to solve the knapsack problem. With a little ingenuity, the above script could be modified to solve all sorts of computationally complex problems for their best "good enough" solutions.

Thanks for reading!

 
Portions of this article were plotted and/or written with the assistance of GPT-3.

 
 
Matthew Mayo (@mattmayo13) is a Data Scientist and the Editor-in-Chief of KDnuggets, the seminal online Data Science and Machine Learning resource. His interests lie in natural language processing, algorithm design and optimization, unsupervised learning, neural networks, and automated approaches to machine learning. Matthew holds a Master's degree in computer science and a graduate diploma in data mining. He can be reached at editor1 at kdnuggets[dot]com.