Summary


Linear programming is a technique to optimize linear objective functions subject to linear equality and linear inequality constraints. An instance of this is called a linear program (LP).

𝒫: {max or min} 𝑐1𝑥1 + … + 𝑐𝑛𝑥𝑛
s.t. 𝐴1,1𝑥1 + … + 𝐴1,𝑛𝑥𝑛 {≤ or ≥ or =} 𝑏1
     ⋮
     𝐴𝑚,1𝑥1 + … + 𝐴𝑚,𝑛𝑥𝑛 {≤ or ≥ or =} 𝑏𝑚
     𝑥1, …, 𝑥𝑛 ≥ 0

𝒫 is the program with 𝑛 decision variables 𝑥𝑖 and a linear objective function with coefficients 𝑐𝑖. There are 𝑚 linear constraints with coefficients 𝐴𝑗,𝑖, each using ≤, ≥, or =. The last constraint is non-negativity: 𝑥1, …, 𝑥𝑛 ≥ 0.

Example

You're at your favorite arcade, which houses two claw cranes 𝐾1 and 𝐾2. If you spend $𝑥1 on 𝐾1, you get stuff worth $𝑥1/3. If you spend $𝑥2 on 𝐾2, you get stuff worth $𝑥2/2. You must spend at least as much on 𝐾1 as you do on 𝐾2. You have at most $60 to spend on 𝐾1 and 𝐾2 combined. Money spent on neither 𝐾1 nor 𝐾2 is lost. How do you get the most out of 𝐾1 and 𝐾2?

๐’ซ:max๐‘ฅ1/3+๐‘ฅ2/2 ๐‘ฅ1โˆ’๐‘ฅ2โ‰ฅ0 ๐‘ฅ1+๐‘ฅ2โ‰ค60 ๐‘ฅ1,๐‘ฅ2โ‰ฅ0

You want to put as much as you can into 𝐾2. However, since you must spend at least as much on 𝐾1, the best you can do is split evenly: 𝑥1 = 𝑥2 = 30, for a payoff of 30/3 + 30/2 = $25.
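As a sanity check, this LP can be handed to an off-the-shelf solver, for example scipy.optimize.linprog (a sketch; linprog minimizes, so we negate the objective, and the ≥ constraint is flipped into ≤ form):

```python
from scipy.optimize import linprog

# linprog solves min c^T x s.t. A_ub @ x <= b_ub, with x >= 0 by default.
# Our LP is a maximization, so negate the objective coefficients.
c = [-1/3, -1/2]          # max x1/3 + x2/2  ->  min -x1/3 - x2/2
A_ub = [[-1, 1],          # x1 - x2 >= 0  ->  -x1 + x2 <= 0
        [1, 1]]           # x1 + x2 <= 60
b_ub = [0, 60]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x, -res.fun)    # optimal split (30, 30), worth $25
```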

Vectors and Matrices

For vectors 𝑣⃗1, 𝑣⃗2 of the same length 𝑛, 𝑣⃗1 {< or ≤ or = or ≥ or >} 𝑣⃗2 if and only if all components satisfy that relation.

Notation: 0⃗ is the zero vector, 𝐴 denotes a matrix, and 𝐴𝑇 is its transpose.

Maximization Standard Form

๐’ซ:max๐‘1๐‘ฅ1+โ€ฆ+๐‘๐‘›๐‘ฅ๐‘› ๐ด1,1๐‘ฅ1+โ€ฆ+๐ด1,๐‘›๐‘ฅ๐‘›โ‰ค๐‘1 โ‹ฎโ‹ฑโ‹ฎโ‹ฎโ‹ฎ ๐ด๐‘š,1๐‘ฅ1+โ€ฆ+๐ด๐‘š,๐‘›๐‘ฅ๐‘›โ‰ค๐‘๐‘š ๐‘ฅ1,โ€ฆ,๐‘ฅ๐‘›โ‰ฅ0

Minimization Standard Form

๐’ซ:min๐‘1๐‘ฅ1+โ€ฆ+๐‘๐‘›๐‘ฅ๐‘› ๐ด1,1๐‘ฅ1+โ€ฆ+๐ด1,๐‘›๐‘ฅ๐‘›โ‰ฅ๐‘1 โ‹ฎโ‹ฑโ‹ฎโ‹ฎโ‹ฎ ๐ด๐‘š,1๐‘ฅ1+โ€ฆ+๐ด๐‘š,๐‘›๐‘ฅ๐‘›โ‰ฅ๐‘๐‘š ๐‘ฅ1,โ€ฆ,๐‘ฅ๐‘›โ‰ฅ0

Let 𝑥⃗ = (𝑥1, …, 𝑥𝑛)𝑇, 𝑐⃗ = (𝑐1, …, 𝑐𝑛)𝑇, 𝑏⃗ = (𝑏1, …, 𝑏𝑚)𝑇, and let 𝐴 be the 𝑚 × 𝑛 matrix of coefficients 𝐴𝑗,𝑖. Then we can express the maximization standard form in 3 lines:

𝒫: max 𝑐⃗𝑇𝑥⃗
s.t. 𝐴𝑥⃗ ≤ 𝑏⃗
     𝑥⃗ ≥ 0⃗

Feasibility

A feasible solution is an 𝑥̃ satisfying 𝐴𝑥̃ ≤ 𝑏⃗ and 𝑥̃ ≥ 0⃗. The feasible region FEASIBLE(𝒫) is the set of all feasible solutions. 𝒫 is feasible if FEASIBLE(𝒫) ≠ ∅, and infeasible otherwise.
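Checking whether a candidate point is feasible is just componentwise arithmetic. A minimal numpy sketch, using the arcade example's 𝐴 and 𝑏⃗ for concreteness:

```python
import numpy as np

A = np.array([[-1.0, 1.0],   # -x1 + x2 <= 0   (i.e. x1 >= x2)
              [1.0, 1.0]])   #  x1 + x2 <= 60
b = np.array([0.0, 60.0])

def is_feasible(x):
    """True iff A @ x <= b and x >= 0 hold componentwise."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(A @ x <= b) and np.all(x >= 0))

print(is_feasible([30, 30]))   # True: both constraints hold with equality
print(is_feasible([10, 20]))   # False: violates x1 >= x2
```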

A constraint can also be degenerate, meaning removing it does not alter the feasible region.

Boundedness

FEASIBLE(๐’ซ) is bounded if it fits in a finite ball, and unbounded otherwise. If ๐’ซ is feasible, the largest value the objective function takes over the feasible region is the optimal value. A feasible solution achieving the optimal value is an optimal solution โƒ—๐‘ฅโˆ—.

If 𝒫 is feasible and its optimal value is finite, 𝒫 is bounded; otherwise it is unbounded. If 𝒫 is bounded, it has an optimal solution and a unique optimal value; multiple optimal solutions can achieve that same value (degeneracy).
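Solvers report these cases explicitly. For instance, scipy.optimize.linprog returns status 2 for an infeasible program and status 3 for an unbounded one (a sketch with two deliberately pathological one-variable LPs):

```python
from scipy.optimize import linprog

# Unbounded: max x1 subject only to x1 >= 0 -- the objective grows forever.
unbounded = linprog(c=[-1], bounds=[(0, None)])
print(unbounded.status)   # 3 = problem is unbounded

# Infeasible: x1 <= -1 contradicts the bound x1 >= 0.
infeasible = linprog(c=[1], A_ub=[[1]], b_ub=[-1], bounds=[(0, None)])
print(infeasible.status)  # 2 = problem is infeasible
```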

Geometric Visualization

n = 1

One decision variable 𝑥1.

Key takeaways:

  1. The feasible region is a convex set
  2. If an optimal solution exists, then some corner point is optimal; as a corollary of convexity, the feasible region cannot split into two disjoint pieces

n = 2

Notice that FEASIBLE(𝒫) once again is a convex set and has an optimal solution at a corner point.

Solving Linear Programs

One of the first algorithms was the Simplex method, which walks from corner to corner of the feasible region; though worst-case exponential time, it is quite efficient in practice. Next came the Ellipsoid method, which, though polynomial time, is often slow in practice. The methods used today are interior-point methods, which are both polynomial time and efficient in practice.

As a result, linear programs can be solved in polynomial time in the worst case.

Duality

Consider the 𝑛 = 2 example:

𝒫: max 𝑥1 + 2𝑥2
s.t. −3𝑥1 + 2𝑥2 ≤ 10
     𝑥1 + 𝑥2 ≤ 15
     𝑥1, 𝑥2 ≥ 0

Since 𝑥1 ≥ 0,

𝑥1 + 2𝑥2 ≤ 2𝑥1 + 2𝑥2 = 2(𝑥1 + 𝑥2) ≤ 2 · 15 = 30.

This is close to the optimal value of 26. Let's try again:

𝑥1 + 2𝑥2 ≤ (7/6)𝑥1 + 2𝑥2 = (1/6)(−3𝑥1 + 2𝑥2) + (5/3)(𝑥1 + 𝑥2) ≤ (1/6) · 10 + (5/3) · 15 = 80/3.

One last time:

𝑥1 + 2𝑥2 = (1/5)(−3𝑥1 + 2𝑥2) + (8/5)(𝑥1 + 𝑥2) ≤ (1/5) · 10 + (8/5) · 15 = 26

Hence the optimal value is at most 26, and it is attained at (𝑥1, 𝑥2) = (4, 11). It turns out we can always find such a combination of the constraints.

The general method gives us the dual program ๐’Ÿ:

𝒟: min 10𝑦1 + 15𝑦2
s.t. −3𝑦1 + 𝑦2 ≥ 1
     2𝑦1 + 𝑦2 ≥ 2
     𝑦1, 𝑦2 ≥ 0

𝒫's optimal value ≤ 𝒟's optimal value

To see why, take any 𝑦⃗ feasible for 𝒟. Then

𝑥1 + 2𝑥2 ≤ (−3𝑦1 + 𝑦2)𝑥1 + (2𝑦1 + 𝑦2)𝑥2

since −3𝑦1 + 𝑦2 ≥ 1, 2𝑦1 + 𝑦2 ≥ 2, and 𝑥1, 𝑥2 ≥ 0. Rearranging the right-hand side:

(−3𝑦1 + 𝑦2)𝑥1 + (2𝑦1 + 𝑦2)𝑥2 = (−3𝑥1 + 2𝑥2)𝑦1 + (𝑥1 + 𝑥2)𝑦2 ≤ 10𝑦1 + 15𝑦2

since −3𝑥1 + 2𝑥2 ≤ 10, 𝑥1 + 𝑥2 ≤ 15, and 𝑦1, 𝑦2 ≥ 0. Therefore, the optimal value of 𝒫 ≤ 10𝑦1 + 15𝑦2.
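We can also check numerically that the primal and dual of this example meet at 26, again via scipy.optimize.linprog (a sketch; linprog minimizes, so the max objective is negated and each ≥ row is multiplied by −1 into ≤ form):

```python
from scipy.optimize import linprog

# Primal: max x1 + 2*x2  s.t.  -3x1 + 2x2 <= 10,  x1 + x2 <= 15,  x >= 0
primal = linprog(c=[-1, -2], A_ub=[[-3, 2], [1, 1]], b_ub=[10, 15])

# Dual: min 10y1 + 15y2  s.t.  -3y1 + y2 >= 1,  2y1 + y2 >= 2,  y >= 0
# (each >= row negated to fit linprog's <= form)
dual = linprog(c=[10, 15], A_ub=[[3, -1], [-2, -1]], b_ub=[-1, -2])

print(-primal.fun, dual.fun)   # both 26.0
print(primal.x, dual.x)        # x* = (4, 11), y* = (0.2, 1.6)
```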

Given a primal in maximization standard form:

𝒫: max 𝑐⃗𝑇𝑥⃗  s.t.  𝐴𝑥⃗ ≤ 𝑏⃗,  𝑥⃗ ≥ 0⃗

the dual is the minimization problem:

𝒟: min 𝑏⃗𝑇𝑦⃗  s.t.  𝐴𝑇𝑦⃗ ≥ 𝑐⃗,  𝑦⃗ ≥ 0⃗

The 𝑚 primal constraints give rise to 𝑚 dual decision variables, and the 𝑛 primal decision variables give rise to 𝑛 dual constraints.

Note

The dual of the dual is the primal.

When a linear program is not in standard form, taking duals is a bit trickier. The general correspondence is:

Primal (max)          Dual (min)
≤ constraint          𝑦𝑖 ≥ 0
≥ constraint          𝑦𝑖 ≤ 0
= constraint          𝑦𝑖 unrestricted
𝑥𝑗 ≥ 0                𝑗-th dual constraint uses ≥
𝑥𝑗 unrestricted       𝑗-th dual constraint uses =

The dual objective is min 𝑏⃗𝑇𝑦⃗ and the dual constraints come from 𝐴𝑇𝑦⃗ vs 𝑐⃗, with the direction (≥ or =) determined by the table above.

Example

๐’ซ:max๐‘ฅ1โˆ’๐‘ฅ2+2๐‘ฅ3 โˆ’๐‘ฅ1+3๐‘ฅ2โ‰คโˆ’3 ๐‘ฅ1+3๐‘ฅ3โ‰ฅ2 ๐‘ฅ2+๐‘ฅ3=7 ๐‘ฅ1,๐‘ฅ2โ‰ฅ0

Note that 𝑥3 is unrestricted (no non-negativity constraint). Applying the rules:

  • Constraint 1 (โ‰ค) โ†’ ๐‘ฆ1โ‰ฅ0
  • Constraint 2 (โ‰ฅ) โ†’ ๐‘ฆ2โ‰ค0
  • Constraint 3 (=) โ†’ ๐‘ฆ3 unrestricted
  • ๐‘ฅ1โ‰ฅ0 โ†’ dual constraint 1 uses โ‰ฅ
  • ๐‘ฅ2โ‰ฅ0 โ†’ dual constraint 2 uses โ‰ฅ
  • ๐‘ฅ3 unrestricted โ†’ dual constraint 3 uses =

𝒟: min −3𝑦1 + 2𝑦2 + 7𝑦3
s.t. −𝑦1 + 𝑦2 ≥ 1
     3𝑦1 + 𝑦3 ≥ −1
     3𝑦2 + 𝑦3 = 2
     𝑦1 ≥ 0, 𝑦2 ≤ 0
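Incidentally, this particular primal happens to be unbounded: fixing 𝑥2 = 0, 𝑥3 = 7 satisfies every constraint for all 𝑥1 ≥ 3, and the objective grows with 𝑥1. By weak duality the dual must then be infeasible, and indeed −𝑦1 + 𝑦2 ≥ 1 is impossible with 𝑦1 ≥ 0 and 𝑦2 ≤ 0. A scipy.optimize.linprog sketch (negated objective, ≥ rows flipped to ≤, per-variable bounds for the sign conditions):

```python
from scipy.optimize import linprog

# Primal as a minimization of the negated objective; >= rows negated to <=.
primal = linprog(
    c=[-1, 1, -2],                     # max x1 - x2 + 2x3
    A_ub=[[-1, 3, 0], [-1, 0, -3]],    # -x1 + 3x2 <= -3; -(x1 + 3x3) <= -2
    b_ub=[-3, -2],
    A_eq=[[0, 1, 1]], b_eq=[7],        # x2 + x3 = 7
    bounds=[(0, None), (0, None), (None, None)],  # x3 unrestricted
)

# Dual; >= rows negated to <=, with y2 <= 0 and y3 unrestricted.
dual = linprog(
    c=[-3, 2, 7],                      # min -3y1 + 2y2 + 7y3
    A_ub=[[1, -1, 0], [-3, 0, -1]],    # -(-y1 + y2) <= -1; -(3y1 + y3) <= 1
    b_ub=[-1, 1],
    A_eq=[[0, 3, 1]], b_eq=[2],        # 3y2 + y3 = 2
    bounds=[(0, None), (None, 0), (None, None)],
)

print(primal.status)   # not 0: the primal has no finite optimum
print(dual.status)     # 2 = infeasible
```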

Duality Theorems

Weak Duality

If 𝑥⃗ is feasible for 𝒫 and 𝑦⃗ is feasible for 𝒟, then 𝑐⃗𝑇𝑥⃗ ≤ 𝑏⃗𝑇𝑦⃗.

It follows that if 𝒫 is unbounded, then 𝒟 is infeasible; likewise, if 𝒟 is unbounded, then 𝒫 is infeasible.

Strong Duality

If 𝒫 is bounded, then 𝒟 is also bounded and their optimal values are equal. That is, for optimal solutions 𝑥⃗∗ of 𝒫 and 𝑦⃗∗ of 𝒟,

𝑐⃗𝑇𝑥⃗∗ = 𝑏⃗𝑇𝑦⃗∗

This is the main mental model to keep: the primal is the thing you actually want, and the dual is a systematic way to prove you cannot do better. In max-flow terms, the dual object will turn out to be a cut.

Complementary Slackness

From the proof of weak duality, the two inequalities in the chain 𝑐⃗𝑇𝑥⃗ ≤ 𝑦⃗𝑇𝐴𝑥⃗ ≤ 𝑏⃗𝑇𝑦⃗ must both be tight at optimality (by strong duality). This forces:

Complementary Slackness

Let 𝑥⃗∗ and 𝑦⃗∗ be optimal for 𝒫 and 𝒟 respectively. Then for all 𝑖 and 𝑗:

  1. 𝑦∗𝑖 · (𝑏𝑖 − (𝐴𝑥⃗∗)𝑖) = 0 — if the 𝑖-th primal constraint is not tight, then 𝑦∗𝑖 = 0
  2. 𝑥∗𝑗 · ((𝐴𝑇𝑦⃗∗)𝑗 − 𝑐𝑗) = 0 — if the 𝑗-th dual constraint is not tight, then 𝑥∗𝑗 = 0

In other words, every dual variable "paired" with a slack primal constraint must be zero, and vice versa. This is useful for verifying optimality or solving for one solution given the other.
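For the earlier 𝑛 = 2 example, the optimal pair works out to 𝑥⃗∗ = (4, 11) and 𝑦⃗∗ = (1/5, 8/5) (values computed here for illustration). A small numpy sketch checks both conditions:

```python
import numpy as np

# The n = 2 example: max c^T x s.t. A x <= b, with its optimal pair.
A = np.array([[-3.0, 2.0], [1.0, 1.0]])
b = np.array([10.0, 15.0])
c = np.array([1.0, 2.0])
x_star = np.array([4.0, 11.0])
y_star = np.array([1 / 5, 8 / 5])

# Condition 1: y*_i * (b_i - (A x*)_i) = 0 for every constraint i.
print(np.allclose(y_star * (b - A @ x_star), 0))    # True: both rows tight
# Condition 2: x*_j * ((A^T y*)_j - c_j) = 0 for every variable j.
print(np.allclose(x_star * (A.T @ y_star - c), 0))  # True: both columns tight
```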

What are these 𝑦∗s? Duality tells us which constraints to combine to optimally bound the primal objective. It also flags degenerate primal constraints: if some optimal dual solution has 𝑦∗𝑖 = 0, constraint 𝑖 can be tossed out without changing the optimal value.

Minimum Spanning Tree Connection

Recall Minimum Spanning Trees. Today, we will assume all edge weights are positive and distinct. Kruskal's algorithm repeatedly takes the minimum weight edge that doesn't form a cycle until you get a minimum spanning tree. How can we connect this to linear programming?
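As a refresher before the LP view, Kruskal's algorithm with a union–find structure can be sketched as follows (a minimal sketch; edges are (weight, u, v) tuples):

```python
def kruskal(n, edges):
    """Return the MST edge list for a connected graph on vertices 0..n-1.

    edges: iterable of (weight, u, v); weights assumed positive and distinct.
    """
    parent = list(range(n))

    def find(v):                      # union-find root, with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):     # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                  # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Triangle with weights 1, 2, 3: the MST keeps the two cheapest edges.
print(kruskal(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))  # [(1, 0, 1), (2, 1, 2)]
```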

Given 𝐺 = (𝑉, 𝐸), finding a minimum spanning tree can be seen as a decision for each 𝑒 ∈ 𝐸, i.e. does 𝑒 ∈ 𝑇 or not, such that the chosen edges form a spanning tree and their total weight is minimized.

Assign |𝐸| decision variables 𝑥𝑒, one per 𝑒 ∈ 𝐸, acting as indicators: 𝑥𝑒 = 1 if 𝑒 ∈ 𝑇 and 𝑥𝑒 = 0 otherwise. The problem is: how do we ensure the chosen edges form a spanning tree?

Let 𝑆 be a spanning subgraph of 𝐺. A partition Π = {𝑉1, …, 𝑉𝑘} splits 𝑉 into 𝑘 disjoint parts, with 𝑉𝑖 ∩ 𝑉𝑗 = ∅ for all 𝑖 ≠ 𝑗; we write |Π| = 𝑘. An edge crosses Π if its endpoints lie in different parts. Claim: if 𝑆 is connected, then for every partition Π, 𝑆 has at least |Π| − 1 edges crossing it. The converse also holds: if 𝑆 has at least |Π| − 1 crossing edges for every partition Π, then 𝑆 is connected (take Π to be the connected components of 𝑆: were there 𝑘 ≥ 2 of them, no edge of 𝑆 would cross).

Claim: since all edge weights are positive, 𝑆 is a minimum-weight connected spanning subgraph iff 𝑆 is a minimum spanning tree of 𝐺 (an edge on a cycle can be removed without disconnecting 𝑆, so a minimum-weight choice contains no cycles). In particular, a spanning tree 𝑇 on 𝑛 vertices has 𝑛 − 1 edges and crosses every partition Π with at least |Π| − 1 of them.

This gives us the LP relaxation:

𝒫: min ∑𝑒∈𝐸 𝑤𝑒𝑥𝑒
s.t. ∑𝑒 crossing Π 𝑥𝑒 ≥ |Π| − 1   for all partitions Π
     𝑥𝑒 ≥ 0   for all 𝑒 ∈ 𝐸

Note that we dropped the integrality constraint 𝑥𝑒 ∈ {0, 1} and relaxed it to 𝑥𝑒 ≥ 0. In general, LP relaxations give a lower bound on the integer optimum. The key result here is that this LP is integral: its optimal solution has 𝑥𝑒 ∈ {0, 1} for all 𝑒, so the LP relaxation solves the MST problem exactly.
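To see this on a tiny instance, take a triangle with edge weights 1, 2, 3. The non-trivial partitions of its 3 vertices are the three 2-vs-1 splits (each needing ≥ 1 crossing edge) and the all-singletons partition (needing ≥ 2); the one-part partition gives a vacuous constraint. Solving the relaxation with scipy.optimize.linprog (a sketch) returns the integral MST indicator:

```python
from scipy.optimize import linprog

# Triangle K3: edges a = {0,1} (w=1), b = {1,2} (w=2), c = {0,2} (w=3).
w = [1, 2, 3]
# Partition constraints, negated into linprog's <= form:
A_ub = [[-1, -1, -1],   # {{0},{1},{2}}: all three edges cross, need >= 2
        [-1, 0, -1],    # {{0},{1,2}}: a and c cross, need >= 1
        [-1, -1, 0],    # {{1},{0,2}}: a and b cross, need >= 1
        [0, -1, -1]]    # {{2},{0,1}}: b and c cross, need >= 1
b_ub = [-2, -1, -1, -1]

res = linprog(c=w, A_ub=A_ub, b_ub=b_ub)   # x >= 0 is the default bound
print(res.x, res.fun)   # integral optimum x = (1, 1, 0), cost 3 = MST weight
```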

Theorem

The MST LP relaxation is integral, i.e. its optimal solution is the indicator vector of the minimum spanning tree.

To see why, consider the dual. There is one dual variable 𝑦Π ≥ 0 for each partition Π. The dual is:

𝒟: max ∑Π (|Π| − 1)𝑦Π
s.t. ∑Π : 𝑒 crosses Π 𝑦Π ≤ 𝑤𝑒   for all 𝑒 ∈ 𝐸
     𝑦Π ≥ 0   for all partitions Π

By complementary slackness, at optimality:

  • If ๐‘ฅโˆ—๐‘’>0 (edge is in the tree), then โˆ‘ฮ :๐‘’crossesฮ ๐‘ฆโˆ—ฮ =๐‘ค๐‘’ (dual constraint is tight)
  • If ๐‘ฆโˆ—ฮ >0 (partition is โ€œactiveโ€), then โˆ‘๐‘’crossingฮ ๐‘ฅโˆ—๐‘’=|ฮ |โˆ’1 (primal constraint is tight, i.e. the tree crosses ฮ  with exactly |ฮ |โˆ’1 edges)

This connects back to Kruskal's algorithm: at each step, Kruskal picks the cheapest edge crossing some partition of the connected components. The dual variables 𝑦Π can be interpreted as the "price" Kruskal's algorithm pays at each merge step, and the algorithm implicitly constructs a dual feasible solution that matches the primal cost, certifying optimality via strong duality.