Packing your knapsack: Data structures meet the real world

Congrats, you have been selected to populate the first Mars colony! Grocery stores on Mars are still a bit short of goods . . . and hard to find. So, you will have to eventually grow your own food. In the meantime, for the first few months, you will have goods shipped to sustain you.

1. Abstracting the problem away

The problem is, your crates can’t weight more than 1000 kilograms, and that’s a hard limit.

To make things harder, you can choose only from a limited set of things, already packed in boxes:

  • Potatoes, 800 kgs
  • Rice, 300 kgs
  • Wheat flour, 400 kgs
  • Peanut butter, 20 kgs
  • Tomato cans, 300 kgs
  • Beans, 300 kgs
  • Strawberry jam, 50 kgs

You’ll get water for free, so no worries about that, but you can either take a whole crate or not take it. You’d certainly like to have some choice, and not a ton of potatoes (aka The Martian experience).

But, at the same time, the expedition’s main concern is keeping you well sustained and energetic throughout your stay, so the main discriminant to choose what goes with you will be the nutritional values. Let’s say the total calories will be a good indica­tor. Table 1.1 sheds a different light on the list of available goods.

Since the actual content is irrelevant for the decision (despite your understandable protests, mission control is very firm on that point), the only things that matter are the weight and total calories provided for each of the boxes.

Hence, our problem can be abstracted as “choose any number of items from a set, without the chance to take a fraction of any item, such that their total weight won’t be over 1000 kgs, and in such a way that the total amount of calories is maximized.”

2. Looking for solutions

Now that we have stated our problem, we can start looking for solutions. You might be tempted to start packing your crate with boxes starting from the one with the highest calories value. That would be the potatoes box weighing 800 kgs.

But if you do so, neither rice nor wheat flour will fit in the crate, and their com­bined calorie count exceeds, by far, any other combination you can create within the remaining 200 kgs left. The best value you get with this strategy is 1,749,400 calories, picking potatoes, strawberry jam, and peanut butter.

So, what would have looked like the most natural approach, a greedy algorithm that at each step chooses the best immediate option, does not work. This problem needs to be carefully thought through.

Time for brainstorming. You gather your logistics team and together look for a solution.

Soon enough, someone suggests that, rather than the total calories, you should look at the calories per kg. So you update table 1.1 with a new column, and sort it accordingly. The result is shown in table 1.2.

Then you try to go top to bottom, picking up the food with the highest ratio of calo­ries per unit of weight, ending up with peanut butter, rice, wheat flour, and strawberry jam, for a total of 2,813,800 calories.

That’s much better than our first result. It’s clear that this was a step in the right direction. But looking closely, it’s apparent that adding peanut butter prevented us from taking beans, which would have allowed us to increase the total value of the crate even more. The good news is that at least you won’t be forced to follow The Martian’s diet anymore; no potatoes go on Mars this time.

After a few more hours of brainstorming, you are all pretty much ready to give up: the only way you can solve this is to check, for each item, if you can get a better solu­tion by including it or leaving it out. And the only way you know of doing so is enu­merating all possible solutions, filtering out the ones that are above the weight threshold, and picking the best of the remaining ones. This is what’s called a brute force algorithm, and we know from math that it’s a very expensive one.

Since for each item you can either pack it or leave it, the possible solutions are . You guys are not too keen on going through a hundred solutions, but after a few hours, despite understanding why it’s called brute force, you are totally exhausted but almost done with it.

Then the news breaks: someone called from mission control, and following com­plaints from some settlers-to-be, 25 new food items have been added to the list, includ­ing sugar, oranges, soy, and marmite (don’t ask . . . ).

After checking your numbers everyone is in dismay: now you’d have approximately 4 billion different combinations to check.

3. Algorithms to the rescue

At that point, it’s clear that you need a computer program to crunch the numbers and make the best decision.

You write one yourself in the next few hours, but even that takes a long time to run, another couple of hours. Now, as much as you’d like to apply the same diet to all the colonists, it turns out some of them have allergies: a quarter of them can’t eat gluten, and apparently many swear they are allergic to marmite. So you’ll have to run the algorithm again a few times, each time taking into account individual allergies. To make thing worse, mission control is considering adding 30 more items to the list to make up for what people with allergies will miss. If that happens, we will end up with 62 total possible items, and the program will have to go over several billions of billions of possible combinations. You try that, and after a day the program is still running, nowhere close to termination.

The whole team is ready to give up and go back to the potatoes diet, when some­one remembers there is a person on the launch team who had an algorithms book on his desk.

You call them in, and they immediately see the problem for what it is: the 0-1 knap­sack problem. The bad news is that it is an NP-complete7 problem, and this means it is hard to solve, as there is no “quick” (as in polynomial with respect to the number of items) known algorithm that computes the optimal solution.

There is, however, also good news: there exist a pseudo-polynomial solution, a solu­tion using dynamic programming that takes time proportional to the max capacity of the knapsack. Luckily, the capacity of the crate is limited: so the solution takes takes a num­ber of steps equal to the number of possible values for the capacity filled multiplied by the number of items, so assuming the smallest unit is 1kg, it only takes 1000 x 62 steps. Wow, that’s much better than 262! And, in fact, once you rewrite the algorithm, it finds the best solution in a matter of seconds.

At this time, you are willing to take the algorithm as a black box and plug it in with­out too many questions. Still, this decision is crucial to your survival . . . it seems like a situation where you could use some deep knowledge of how an algorithm works.

For our initial example, it turns out that the best possible combination we can find is rice, wheat flour, and beans, for a total of 3,256,000 calories. That would be quite a good result compared to our first attempt, right?

You probably had already guessed the best combination, but if it seems too easy in the initial example, where we only have seven items, try the same with hundreds of dif­ferent products, in a setup closer to the real scenario, and see how many years it takes you to find the best solution by hand!

We could be happy with this solution; it’s the best we can find given the constraint.

4. Thinking (literally) outside of the box

But in our narration, this is the point when the real expert in algorithms comes in. For instance, imagine that a distinguished academic is visiting our facilities while prepar­ing the mission, and has been invited to help with computing the best route to save fuel. During lunch break someone proudly tells her about how you brilliantly solved the problem with packing goods. And then she asks an apparently naive question: Why can’t you change the size of the boxes?

The answer will likely be either that “this is the way it has always been,” or that “items come already packed from vendors, and it would cost time and money to change this.”

And that’s when the expert will explain that if we remove the constraint on the box size, the 0-1 knapsack problem, which is an NP-complete problem, becomes the unbounded knapsack problem, for which we have a linear-time[5] greedy solution that is usually better than the best solution for the 0-1 version.

Translated into human-understandable language, we can turn this problem into one that’s easier to solve and that will allow us to pack the crates with the largest possi­ble total of calories. The problem statement now becomes this:

Given a set of items, choose any subset of items or fraction of items from it, such that their total weight won’t be over 1000 kgs, and in such a way that the total amount of calories is maximized.

And yes, it is worth the time to go through the overhead of repacking everything, because we get a nice improvement.

Specifically, if we can take any fraction of the original weights for each product, we can simply pack products starting from the one with the highest ratio of calories per kg (peanut butter, in this example), and when we get to one box that would not fit in the remaining available space, we take a fraction of it to fill the gap, and repack it. So, in the end we wouldn’t even have to repack all the goods, just one.

The best solution would then be peanut butter, rice, wheat flour, strawberry jam, and 230 kilograms of beans, for a total of 3,342,800 calories.

5. Happy ending

So, in our story, the future Mars settlers will have greater chances of survival and won’t go into depression because of a diet comprising only potatoes with a peanut butter and strawberry dip.

From a computational point of view, we moved from incorrect algorithms (the greedy solutions taking the highest total, or highest ratio first), to a correct but unfea­sible algorithm (the brute force solution enumerating all the possible combinations) to, finally, a smart solution which would organize the computation in a more efficient fashion.

The next step, as important or even more so, brought us to thinking outside of the box to simplify our problem, by removing some constraints, and thus finding an easier algorithm and a better solution. This is actually another golden rule: always study your requirements thoroughly, question them, and when possible, try to remove them if that brings you a solution that is at least as valuable, or a slightly less valuable one that can be reached with a much lower cost. Of course, in this process, other concerns (such as laws and security, at all levels) must be taken into consideration, so some con­straints can’t be removed.

In the process of describing algorithms, as we explained in the previous section, we would next describe our solution in detail and provide implementation guidelines.

We will skip these steps for the 0-1 knapsack dynamic programming algorithm, both because it’s an algorithm, not a data structure, and it is thoroughly described in litera­ture. Not to mention that in this chapter its purpose was just illustrating the following:

  • How important it is to avoid bad choices for the algorithms and data structures we use
  • The process we will follow in the next chapters when it comes to introducing a problem and its solutions

Source: Rocca Marcello La (2021), Advanced Algorithms and Data Structures, Manning Publications (2021)

Leave a Reply

Your email address will not be published. Required fields are marked *