In the previous post, I discussed how an induction on dimension approach could establish Hilbert’s nullstellensatz, which we interpreted as a result describing all the obstructions to solving a system of polynomial equations and inequations over an algebraically closed field. Today, I want to point out that exactly the same approach also gives the Hahn-Banach theorem (at least in finite dimensions), which we interpret as a result describing all the obstructions to solving a system of linear inequalities over the reals (or in other words, a linear programming problem); this formulation of the Hahn-Banach theorem is sometimes known as Farkas’ lemma. Then I would like to discuss some standard applications of the Hahn-Banach theorem, such as the separation theorem of Dieudonné, the minimax theorem of von Neumann, Menger’s theorem, and Helly’s theorem (which was mentioned recently in an earlier post).

To simplify the exposition we shall only work in finite dimensions and with finite complexity objects, such as finite systems of linear inequalities, or convex polytopes with only finitely many sides. The results can be extended to the infinite complexity setting but this requires a bit of care, and can distract from the main ideas, so I am ignoring all of these extensions here.

Let us first phrase a formulation of the Hahn-Banach theorem – namely, Farkas’ lemma – which is deliberately chosen to mimic that of the nullstellensatz in the preceding post. We consider systems of linear inequalities of the form

P_1(x), \ldots, P_m(x) \geq 0

where x \in {\Bbb R}^d lies in a finite-dimensional real vector space, and P_1,\ldots,P_m: {\Bbb R}^d \to {\Bbb R} are affine-linear functionals. We are interested in the classic linear programming problem of whether such a system admits a solution. One obvious obstruction would be if the above system of inequalities is inconsistent, in the sense that it implies -1 \geq 0. More precisely, if we can find non-negative reals q_1,\ldots,q_m \geq 0 such that q_1 P_1 + \ldots + q_m P_m = -1, then the above system is not solvable. Farkas’ lemma asserts that this is in fact the only obstruction:

Farkas’ lemma. Let P_1,\ldots,P_m: {\Bbb R}^d \to {\Bbb R} be affine-linear functionals. Then exactly one of the following statements holds:

  1. The system of inequalities P_1(x), \ldots, P_m(x) \geq 0 has a solution x \in {\Bbb R}^d.
  2. There exist non-negative reals q_1,\ldots,q_m \geq 0 such that q_1 P_1 + \ldots + q_m P_m = -1.

As in the previous post, we prove this by induction on d. The trivial case d=0 could be used as the base case, but again it is instructive to look at the d=1 case first before starting the induction.

If d=1, then each inequality P_j(x) \geq 0 can be rescaled into one of three forms: x-a_j \geq 0, b_j-x \geq 0, or c_j \geq 0, where a_j, b_j, or c_j is a real number. The latter inequalities are either trivially true or trivially false, and can be discarded in either case. As for the inequalities of the first and second type, they can be solved so long as all of the a_j which appear here are less than or equal to all of the b_j which appear. If this is not the case, then we have b_j < a_k for some j,k, which allows us to fashion -1 as a non-negative linear combination of (x-a_k) and (b_j-x), and the claim follows.
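
For instance, if b_j < a_k, then

\frac{1}{a_k - b_j}(x - a_k) + \frac{1}{a_k - b_j}(b_j - x) = \frac{b_j - a_k}{a_k - b_j} = -1,

and the coefficients \frac{1}{a_k - b_j} are indeed non-negative.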

Now suppose that d \geq 2 and the claim has already been proven for d-1. As in the previous post, we now split x = (x',t) for x' \in {\Bbb R}^{d-1} and t \in {\Bbb R}. Each linear inequality P_j(x',t) \geq 0 can now be rescaled into one of three forms: t - a_j(x') \geq 0, b_j(x') - t \geq 0, or c_j(x') \geq 0, where a_j, b_j, c_j are now affine-linear functionals on {\Bbb R}^{d-1}.

We fix x' and ask what properties x' must obey in order for the above system to be solvable in t. By the one-dimensional analysis, we know that the necessary and sufficient conditions are that c_j(x') \geq 0 for all c_j, and that a_j(x') \leq b_k(x') for all a_j and b_k. If we can find an x' obeying these inequalities, then we are in conclusion (1) and we are done. Otherwise, we apply the induction hypothesis and conclude that we can fashion -1 as a non-negative linear combination of the c_j(x') and of the b_k(x') - a_j(x'). But each b_k(x') - a_j(x') can in turn be expressed as a non-negative linear combination of t - a_j(x') and b_k(x') - t, and so we are in conclusion (2) as desired.
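
For readers who like to compute, the dichotomy in Farkas’ lemma can be observed numerically: an LP solver either produces a solution x or a certificate q. Here is a minimal sketch assuming scipy is available; the encoding P_j(x) = A[j] \cdot x + b[j] and the example system are my own choices for illustration.

```python
# Minimal sketch of the Farkas dichotomy (assumes scipy; the example
# system and the encoding P_j(x) = A[j] @ x + b[j] are illustrative).
import numpy as np
from scipy.optimize import linprog

# A deliberately inconsistent system in one variable: x - 1 >= 0 and -x >= 0.
A = np.array([[1.0], [-1.0]])
b = np.array([-1.0, 0.0])
m, d = A.shape

# Alternative (1): find x with A @ x + b >= 0, i.e. -A @ x <= b.
feas = linprog(c=np.zeros(d), A_ub=-A, b_ub=b, bounds=[(None, None)] * d)
if feas.status == 0:
    print("solution x =", feas.x)
else:
    # Alternative (2): find q >= 0 with q @ A = 0 and q @ b = -1,
    # so that q_1 P_1 + ... + q_m P_m = -1 identically.
    cert = linprog(c=np.zeros(m), A_eq=np.vstack([A.T, b]),
                   b_eq=np.append(np.zeros(d), -1.0), bounds=[(0, None)] * m)
    print("certificate q =", cert.x)  # here q = (1, 1): (x - 1) + (-x) = -1
```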

Exercise: use Farkas’ lemma to derive the duality theorem in linear programming.

— Applications —

Now we connect the above lemma to results which are closer to the Hahn-Banach theorem in its traditional form. We begin with

Separation theorem. Let A, B be disjoint convex polytopes in {\Bbb R}^d. Then there exists an affine-linear functional P: {\Bbb R}^d \to {\Bbb R} such that P(x) \geq 1 for x\in A and P(x) \leq -1 for x\in B.

Proof. We can view the system of inequalities P(x) - 1 \geq 0 for x \in A and -1-P(x) \geq 0 for x \in B as a system of linear inequalities on P (or, if you wish, on the coefficients of P); note that by convexity it suffices to impose these inequalities at the finitely many vertices of A and B, so that the system is finite and Farkas’ lemma applies. If this system is solvable then we are done, so suppose the system is not solvable. Applying Farkas’ lemma, we conclude that there exist x_1,\ldots,x_m \in A and y_1,\ldots,y_n \in B and non-negative constants q_1,\ldots,q_m, r_1,\ldots,r_n such that

\sum_{i=1}^m q_i (P(x_i)-1) + \sum_{j=1}^n r_j (-1-P(y_j)) = -1

for all P. Setting P to be an arbitrary constant, we conclude

\sum_{i=1}^m q_i = \sum_{j=1}^n r_j = 1/2,

and so by the affine nature of P, we can rearrange the above identity as

P(\sum_{i=1}^m 2q_i x_i) = P( \sum_{j=1}^n 2r_j y_j)

for all P; since affine functions separate points, we conclude that

\sum_{i=1}^m 2q_i x_i = \sum_{j=1}^n 2r_j y_j.

But by convexity the left-hand side is in A and the right-hand side in B, a contradiction. \Box.
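
As the proof suggests, finding the separating functional is itself a linear programming problem, and by convexity it suffices to impose the constraints at the finitely many vertices of each polytope. The following sketch assumes scipy; the function name separate and the vertex-list encoding are illustrative choices of mine, not anything from the theorem itself.

```python
# Sketch: compute a separating affine functional P(x) = w @ x + beta
# for two disjoint polytopes given by their vertices (assumes scipy).
import numpy as np
from scipy.optimize import linprog

def separate(A_vertices, B_vertices):
    """Find (w, beta) with w @ a + beta >= 1 on A and w @ b + beta <= -1 on B."""
    A_v, B_v = np.asarray(A_vertices, float), np.asarray(B_vertices, float)
    d = A_v.shape[1]
    # Constraints in "A_ub z <= b_ub" form, for the unknown z = (w, beta):
    #   -(w @ a + beta) <= -1 for each vertex a of A,
    #    (w @ b + beta) <= -1 for each vertex b of B.
    A_ub = np.vstack([np.hstack([-A_v, -np.ones((len(A_v), 1))]),
                      np.hstack([ B_v,  np.ones((len(B_v), 1))])])
    b_ub = -np.ones(len(A_v) + len(B_v))
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return (res.x[:d], res.x[d]) if res.status == 0 else None

# Two disjoint triangles in the plane:
print(separate([(0, 0), (1, 0), (0, 1)], [(3, 3), (4, 3), (3, 4)]))
```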

The above theorem asserts that any two disjoint convex polytopes can be separated by a hyperplane. One can establish more generally that any two disjoint convex bodies can be separated by a hyperplane; in particular, this implies that if a convex function always exceeds a concave function, then there is an affine linear function separating the two. From this it is a short step to the Hahn-Banach theorem, at least in the setting of finite-dimensional spaces: if one wants to find a linear functional \lambda: {\Bbb R}^d \to {\Bbb R} which has prescribed values on some subspace W, and lies between some concave and convex functions (e.g. -\|x\| \leq \lambda(x) \leq \|x\| for some semi-norm \|\cdot\|), then by quotienting out W we can reduce to the previous problem.

We turn now to the minimax theorem. Consider a zero-sum game between two players, Alice and Bob. Alice can pick any one of n strategies using a probability distribution p = (p_1,\ldots,p_n) of her choosing; simultaneously, Bob can pick any one of m strategies using a probability distribution q = (q_1,\ldots,q_m) of his choosing. Alice’s expected payoff F then takes the form F(p,q) := \sum_{i=1}^n \sum_{j=1}^m c_{i,j} p_i q_j for some fixed real coefficients c_{i,j}; Bob’s expected payoff in this zero-sum game is then -F(p,q).

Minimax theorem. Given any coefficients c_{i,j}, there exists a unique optimal payoff \alpha such that

  1. (Alice can expect to win at least \alpha) There exists an optimal strategy p_* for Alice such that F( p_*, q) \geq \alpha for all q;
  2. (Bob can expect to lose at most \alpha) There exists an optimal strategy q_* for Bob such that F(p,q_*) \leq \alpha for all p.

Proof. If \alpha_1 obeys (1) with strategy p_* and \alpha_2 obeys (2) with strategy q_*, then playing p_* against q_* gives \alpha_1 \leq F(p_*,q_*) \leq \alpha_2. Thus the supremum of the set of \alpha which obey (1) is finite, and less than or equal to the infimum of the set of \alpha which obey (2), which is also finite. To finish the proof, it suffices to show that these two numbers are equal. If they were not equal, then we could find an \alpha for which neither (1) nor (2) was true.

If (1) could not be satisfied, this means that the system of linear inequalities

p_1,\ldots,p_n \geq 0; p_1+\ldots+p_n \leq 1; F(p,q) \geq \alpha \hbox{ for all } q

has no solution. (Here we may assume \alpha > 0, after adding a large constant to all of the c_{i,j} if necessary; this ensures that any solution with p_1 + \ldots + p_n < 1 could be rescaled up to a genuine strategy, so that insolubility of this system is indeed equivalent to the failure of (1).) Since F(p,q) is linear in q, the condition F(p,q) \geq \alpha for all q need only be imposed at the m pure strategies, so we can replace this system with

p_1,\ldots,p_n \geq 0; p_1+\ldots+p_n \leq 1; \sum_{i=1}^n c_{i,j} p_i \geq \alpha \hbox{ for all } j.

Applying Farkas’ lemma, we conclude that some non-negative combination of p_1,\ldots,p_n, 1-p_1-\ldots-p_n, and \sum_{i=1}^n c_{i,j} p_i - \alpha for 1 \leq j \leq m is identically equal to -1. A little bit of algebra shows that there must therefore exist a strategy q for Bob such that -1 can be expressed as a non-negative combination of p_1,\ldots,p_n, 1-p_1-\ldots-p_n, and F(p,q)-\alpha. In particular, F(p,q)-\alpha must be negative for all strategies p for Alice, and so (2) is true, a contradiction. \Box.
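
The proof above is essentially an exercise in LP duality, and the value \alpha and an optimal strategy for Alice can be computed by the same linear program. Here is a brief sketch, again assuming scipy; the encoding of the game and the example payoff matrix are my own.

```python
# Sketch: value of a zero-sum game via linear programming (assumes scipy).
import numpy as np
from scipy.optimize import linprog

def game_value(C):
    """Optimal payoff alpha and optimal strategy p for Alice; C[i, j] is
    Alice's payoff when she plays i and Bob plays j."""
    C = np.asarray(C, dtype=float)
    n, m = C.shape
    # Variables z = (p_1, ..., p_n, alpha); maximize alpha subject to
    # alpha - sum_i C[i, j] p_i <= 0 for each pure strategy j of Bob.
    A_ub = np.hstack([-C.T, np.ones((m, 1))])
    A_eq = np.append(np.ones(n), 0.0).reshape(1, n + 1)  # sum_i p_i = 1
    res = linprog(c=np.append(np.zeros(n), -1.0), A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return -res.fun, res.x[:n]

# Matching pennies: the value is 0 and Alice's optimal strategy is (1/2, 1/2).
print(game_value([[1, -1], [-1, 1]]))
```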

The minimax theorem can be used to give game-theoretic proofs of various theorems of Hahn-Banach type. Here is one example:

Menger’s theorem. Let G be a directed graph, and let v and w be non-adjacent vertices in G. Then the maxflow from v to w in G (the largest number of internally vertex-disjoint paths one can find in G from v to w) is equal to the min-cut (the least number of vertices (other than v or w) one needs to delete from G to disconnect v from w).

The proof we give here is definitely not the shortest proof of Menger’s theorem, but it does illustrate how game-theoretic techniques can be used to prove combinatorial theorems.

Proof. Consider the following game. Bob picks a path from v to w, and Alice picks a vertex (other than v or w). If Bob’s path hits Alice’s vertex, then Alice wins 1 (and Bob wins -1); otherwise Alice wins 0 (and Bob wins 0 as well). Let \alpha be Alice’s optimal payoff. Observe that we can prove \alpha \leq 1/maxflow by letting Bob pick one of some maximal collection of disjoint paths from v to w at random as his strategy; conversely, we can prove \alpha \geq 1/mincut by letting Alice pick a vertex from some minimal cut set at random as her strategy. To finish the proof we need to show that in fact 1/\alpha \geq mincut and 1/\alpha \leq maxflow.

Let’s first show that 1/\alpha \geq mincut. Let’s assume that Alice and Bob are both playing optimal strategies, to get the optimal payoff \alpha for Alice. Then it is not hard to use the optimality to show that every path that Bob might play must be hit by Alice with probability exactly \alpha (and any other path will be hit by Alice with probability at least \alpha), and conversely every vertex that Alice might pick will be hit by Bob with probability \alpha (and any other vertex will be hit by Bob with probability at most \alpha).

Now suppose that two of Bob’s paths intersect at some intermediate vertex u. One can show that the two resulting sub-paths from v to u must have an equal chance of being hit by Alice, otherwise by swapping those two sub-paths one can create a path which Alice hits with probability strictly less than \alpha, a contradiction. Similarly, the two sub-paths from u to w must have equal chance of being hit by Alice.

Now consider all the vertices u that Alice can pick for which there exists a path of Bob which hits u before it hits any other vertex of Alice. Let U be the set of such u. Every path from v to w must hit U, because if it were possible to avoid U and instead hit another vertex of Alice first, it would again be possible to create a path that Alice hits with probability strictly less than \alpha, by the above discussion. Thus, U is a cut set. Any given path of Bob hits exactly one vertex in U (again by the above discussion). Since each u in U has a probability \alpha of being hit by Bob, we thus see that this cut set has size exactly 1/\alpha. Thus 1/\alpha \geq mincut as desired.

Now we show that 1/\alpha \leq maxflow. Define an \alpha-flow to be a collection of non-negative weights on the directed edges of G such that

  1. the net flow at v (the total inflow minus total outflow) is -1, and the net flow at w is +1;
  2. for any other vertex u, the net flow at u is zero, and total inflow or outflow at u is at most \alpha.

We first observe that at least one \alpha-flow exists. Indeed, if we pick one of Bob’s optimal strategies, and weight each edge by the probability that Bob’s path passes through that edge, one easily verifies that this gives an \alpha-flow.

Given an \alpha-flow, consider the directed graph G' consisting of those directed edges on which the weight of the \alpha-flow is strictly between 0 and \alpha.  Observe that if an edge in G' terminates at a vertex with total inflow less than \alpha, then there must be an edge in G' exiting this vertex also; conversely, if an edge in G' terminates at a vertex with total inflow at least \alpha, then there must be another edge in G' coming into this vertex also.  Similarly, if an edge in G' originates at a vertex with total outflow less than \alpha, there must be an edge in G' entering this vertex; and if it originates at a vertex with total outflow at least \alpha, then there must be another edge in G' exiting this vertex.  Thus, if G' is not empty, then one can chase an edge backwards and forwards and end up with a simple oriented cycle consisting of edges in G' and reversals of edges in G', such that at any vertex with total inflow at least \alpha, every edge in G' entering this vertex in the cycle is immediately followed by a reversed edge in G' exiting the vertex, and similarly at any vertex with total outflow at least \alpha, every reversed edge in G' entering the vertex is immediately followed by an edge in G' exiting the vertex.

Given such a cycle, one can modify the \alpha-flow on this cycle by an epsilon, increasing the flow weight by \epsilon on the edges of G' which the cycle traverses forwards, and reducing it by the same \epsilon on the edges which the cycle traverses in reverse; note that this preserves the property of being an \alpha-flow. Increasing \epsilon, we eventually drive one of these weights down to zero or up to \alpha, thus reducing the number of edges in G'. Repeating this procedure, we eventually arrive at an \alpha-flow for which G' is empty, so that every edge has weight either zero or \alpha.

Now, as v has outflow at least 1 and every vertex adjacent to v can have inflow at most \alpha (recall that v is not adjacent to w), the flow must propagate from v to at least 1/\alpha other vertices. Each of these vertices must eventually flow to w (which is the only vertex with positive net flow) by at least one path; moreover, since every edge now has weight either zero or \alpha and every intermediate vertex has total inflow and outflow at most \alpha, each intermediate vertex supports at most one edge of the flow in and one edge out, so these paths are vertex-disjoint. Thus we have 1/\alpha \leq maxflow as desired. \Box.
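
One can check Menger’s theorem on small examples with a graph library. The sketch below assumes the networkx package, whose node_connectivity and minimum_node_cut routines compute the two quantities in the theorem; the example digraph is my own.

```python
# Sketch: verifying Menger's theorem on a small digraph (assumes networkx).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("v", "a"), ("v", "b"), ("a", "c"), ("b", "c"),
                  ("a", "w"), ("b", "w"), ("c", "w")])

# Maximum number of internally vertex-disjoint paths from v to w ...
maxflow = nx.node_connectivity(G, "v", "w")
# ... equals the size of a minimum vertex cut separating v from w.
mincut = nx.minimum_node_cut(G, "v", "w")
print(maxflow, mincut)  # 2 and a cut such as {'a', 'b'}
```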

The above argument can also be generalised to prove the max-flow min-cut theorem, but we will not do so here.

Exercise: use Menger’s theorem to prove Hall’s marriage theorem.

Now we turn to Helly’s theorem. One formulation of this theorem is the following:

Helly’s theorem. Let B_1,\ldots,B_m be a collection of convex bodies in {\Bbb R}^d with m > d. If every d+1 of these convex bodies have a point in common, then all m of these convex bodies have a point in common.

The reader is invited to verify Helly’s theorem in the d=1 case, to get a flavour as to what is going on.

For simplicity we shall just consider the case when each of the B_1,\ldots,B_m is a convex polytope.

Proof. A convex polytope is the intersection of finitely many half-spaces. From this, one quickly sees that to prove Helly’s theorem for convex polytopes, it suffices to do so for half-spaces. Each half-space can be expressed in the form \{ x: P(x) \geq c \} for some linear functional P: {\Bbb R}^d \to {\Bbb R} and c \in {\Bbb R}. Note that by duality, one can view (P,c) as living in a (d+1)-dimensional vector space ({\Bbb R}^d)^* \times {\Bbb R}.

Let us say that there are m half-spaces involved, and let (P_1,c_1),\ldots, (P_m,c_m) be the corresponding pairs. It suffices to show that if the system P_1(x) \geq c_1,\ldots,P_m(x) \geq c_m has no solution, then there is some sub-collection P_{i_1},\ldots,P_{i_j} with j \leq d+1 such that P_{i_1}(x) \geq c_{i_1},\ldots,P_{i_j}(x) \geq c_{i_j} also has no solution.

By Farkas’ lemma, we know that the system P_1(x) \geq c_1,\ldots,P_m(x) \geq c_m has no solution if and only if there exist non-negative q_1,\ldots,q_m with q_1 P_1 + \ldots + q_m P_m = 0 and q_1 c_1 + \ldots + q_m c_m = 1; in other words, if and only if (0,1) is a non-negative linear combination of the (P_1,c_1),\ldots,(P_m,c_m). So everything reduces to establishing

Dual Helly theorem. Suppose that a vector v in a (d+1)-dimensional space can be expressed as a non-negative linear combination of a collection of vectors v_1,\ldots,v_m. Then v is also a non-negative combination of at most d+1 of the vectors from that collection.

To prove this theorem, we use an argument a little reminiscent of that used to prove 1/\alpha \leq maxflow in the proof of Menger’s theorem. Suppose we can express v as a non-negative linear combination of some of the v_i. If at most d+1 of the vectors have non-zero coefficients attached to them then we are done. Now suppose instead that at least d+2 vectors have non-zero coefficients, say v_1,\ldots,v_{d+2} have non-zero coefficients. Then there is a linear dependency among these vectors, which allows us to find coefficients c_1,\ldots,c_{d+2}, not all zero, such that c_1 v_1 + \ldots + c_{d+2} v_{d+2} = 0; replacing the c_i by their negations if necessary, we may assume that at least one of the c_i is negative. We can then perturb our preceding non-negative combination by an \epsilon multiple of this equation to obtain a new representation of v as a non-negative combination of vectors. Increasing \epsilon, we must eventually send one of the coefficients to zero, decreasing the total number of vectors with non-zero coefficients. Iterating this procedure we eventually obtain the dual Helly theorem and hence the original version of Helly’s theorem.\Box
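
The coefficient-reduction step in this proof is effectively an algorithm (a conic variant of Carathéodory’s theorem) and can be implemented directly. Here is a sketch of my own, using scipy.linalg.null_space to produce the linear dependency; the function name reduce_support and the example vectors are illustrative.

```python
# Sketch of the dual Helly / conic Caratheodory reduction (assumes scipy).
import numpy as np
from scipy.linalg import null_space

def reduce_support(V, q, tol=1e-10):
    """Rewrite v = V @ q (with q >= 0) as a non-negative combination of at
    most k = V.shape[0] columns of V, via the perturbation argument above."""
    q = np.array(q, dtype=float)
    k = V.shape[0]
    while True:
        S = np.flatnonzero(q > tol)           # current support
        if len(S) <= k:
            return q
        c = null_space(V[:, S])[:, 0]         # dependency among support vectors
        if c.min() >= -tol:                   # ensure some entry is negative
            c = -c
        neg = c < -tol
        eps = np.min(q[S][neg] / -c[neg])     # largest step keeping q >= 0
        q[S] = np.maximum(q[S] + eps * c, 0.0)

V = np.array([[1.0, 0.0, 1.0, 2.0],           # four vectors in R^2 (columns)
              [0.0, 1.0, 1.0, 1.0]])
q = np.ones(4)
q_new = reduce_support(V, q)
print(q_new, V @ q_new)                       # same vector, support size <= 2
```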

[Update, Apr 30: Proof of Helly’s theorem fixed.]