There are many situations in combinatorics in which one is running some sort of iteration algorithm to continually “improve” some object $A$; each loop of the algorithm replaces $A$ with some better version $A'$ of itself, until some desired property of $A$ is attained and the algorithm halts. In order for such arguments to yield a useful conclusion, it is often necessary that the algorithm halts in a finite amount of time, or (even better), in a bounded amount of time. (In general, one cannot use infinitary iteration tools, such as transfinite induction or Zorn’s lemma, in combinatorial settings, because the iteration processes used to improve some target object $A$ often degrade some other finitary quantity $B$ in the process, and an infinite iteration would then have the undesirable effect of making $B$ infinite.)
A basic strategy to ensure termination of an algorithm is to exploit a monotonicity property, or more precisely to show that some key quantity keeps increasing (or keeps decreasing) with each loop of the algorithm, while simultaneously staying bounded. (Or, as the economist Herbert Stein was fond of saying, “If something cannot go on forever, it must stop.”)
Here are four common flavours of this monotonicity strategy:
- The mass increment argument. This is perhaps the most familiar way to ensure termination: make each improved object $A'$ “heavier” than the previous one $A$ by some non-trivial amount (e.g. by ensuring that the cardinality of $A'$ is strictly greater than that of $A$, thus $|A'| \geq |A|+1$). Dually, one can try to force the amount of “mass” remaining “outside” of $A$ in some sense to decrease at every stage of the iteration. If there is a good upper bound on the “mass” of $A$ that stays essentially fixed throughout the iteration process, and a lower bound on the mass increment at each stage, then the argument terminates. Many “greedy algorithm” arguments are of this type. The proof of the Hahn decomposition theorem in measure theory also falls into this category. The general strategy here is to keep looking for useful pieces of mass outside of $A$, and add them to $A$ to form $A'$, thus exploiting the additivity properties of mass. Eventually no further usable mass remains to be added (i.e. $A$ is maximal in some sense), and this should force some desirable property on $A$.
- The density increment argument. This is a variant of the mass increment argument, in which one increments the “density” of $A$ rather than the “mass”. For instance, $A$ might be contained in some ambient space $P$, and one seeks to improve $A$ to $A'$ (and $P$ to $P'$) in such a way that the density of the new object in the new ambient space is better than that of the previous object (e.g. $|A'|/|P'| \geq |A|/|P| + c$ for some $c > 0$). On the other hand, the density of $A$ is clearly bounded above by $1$. As long as one has a sufficiently good lower bound on the density increment at each stage, one can conclude an upper bound on the number of iterations in the algorithm. The prototypical example of this is Roth’s proof of his theorem that every set of integers of positive upper density contains an arithmetic progression of length three. The general strategy here is to keep looking for useful density fluctuations inside $A$, and then “zoom in” to a region of increased density by reducing $A$ and $P$ appropriately. Eventually no further usable density fluctuation remains (i.e. $A$ is uniformly distributed), and this should force some desirable property on $A$.
- The energy increment argument. This is an “$L^2$” analogue of the “$L^1$”-based mass increment argument (or the “$L^\infty$”-based density increment argument), in which one seeks to increment the amount of “energy” that $A$ captures from some reference object $X$, or (equivalently) to decrement the amount of energy of $X$ which is still “orthogonal” to $A$. Here $A$ and $X$ are related somehow to a Hilbert space, and the energy involves the norm on that space. A classic example of this type of argument is the existence of orthogonal projections onto closed subspaces of a Hilbert space; this leads among other things to the construction of conditional expectation in measure theory, which then underlies a number of arguments in ergodic theory, as discussed for instance in this earlier blog post. Another basic example is the standard proof of the Szemerédi regularity lemma (where the “energy” is often referred to as the “index”). These examples are related; see this blog post for further discussion. The general strategy here is to keep looking for useful pieces of energy orthogonal to $A$, and add them to $A$ to form $A'$, thus exploiting square-additivity properties of energy, such as Pythagoras’ theorem. Eventually, no further usable energy outside of $A$ remains to be added (i.e. $A$ is maximal in some sense), and this should force some desirable property on $A$.
- The rank reduction argument. Here, one seeks to make each new object $A'$ have a lower “rank”, “dimension”, or “order” than the previous one. A classic example here is the proof of the linear algebra fact that given any finite set of vectors, there exists a linearly independent subset which spans the same subspace; the proof of the more general Steinitz exchange lemma is in the same spirit. The general strategy here is to keep looking for “collisions” or “dependencies” within $A$, and use them to collapse $A$ to an object $A'$ of lower rank. Eventually, no further usable collisions within $A$ remain, and this should force some desirable property on $A$.
Much of my own work in additive combinatorics relies heavily on at least one of these types of arguments (and, in some cases, on a nested combination of two or more of them). Many arguments in nonlinear partial differential equations also have a similar flavour, relying on various monotonicity formulae for solutions to such equations, though the objective in PDE is usually slightly different, in that one wants to keep control of a solution as one approaches a singularity (or as some time or space coordinate goes off to infinity), rather than to ensure termination of an algorithm. (On the other hand, many arguments in the theory of concentration compactness, which is used heavily in PDE, do have the same algorithm-terminating flavour as the combinatorial arguments; see this earlier blog post for more discussion.)
Recently, a new species of monotonicity argument was introduced by Moser, as the primary tool in his elegant new proof of the Lovász local lemma. This argument could be dubbed an entropy compression argument, and only applies to probabilistic algorithms which require a certain collection $R$ of random “bits” or other random choices as part of the input. Thus each loop of the algorithm takes an object $A$ (which may also have been generated randomly) and some portion of the random string $R$ to (deterministically) create a better object $A'$ (and a shorter random string $R'$, formed by throwing away those bits of $R$ that were used in the loop). The key point is to design the algorithm to be partially reversible, in the sense that given $A'$ and $R'$ and some additional data $H'$ that logs the cumulative history of the algorithm up to this point, one can reconstruct $A$ together with the remaining portion of $R$ not already contained in $R'$. Thus, each stage of the argument compresses the information-theoretic content of the string $A+R$ into the string $A'+R'+H'$ in a lossless fashion. However, a random variable such as $A+R$ cannot be compressed losslessly into a string of expected size smaller than the Shannon entropy of that variable. Thus, if one has a good lower bound on the entropy of $A+R$, and if the length of $A'+R'+H'$ is significantly less than that of $A+R$ (i.e. we need the marginal growth in the length of the history file $H'$ per iteration to be less than the marginal amount of randomness used per iteration), then there is a limit as to how many times the algorithm can be run, much as there is a limit as to how many times a random data file can be compressed before no further length reduction occurs.
It is interesting to compare this method with the ones discussed earlier. In the previous methods, the failure of the algorithm to halt led to a new iteration of the object $A$ which was “heavier”, “denser”, captured more “energy”, or had “lower rank” than the previous instance of $A$. Here, the failure of the algorithm to halt leads to new information that can be used to “compress” $A$ (or more precisely, the full state $A+R$) into a smaller amount of space. I don’t know yet of any application of this new type of termination strategy to the fields I work in, but one could imagine that it could eventually be of use (perhaps to show that solutions to PDE with sufficiently “random” initial data can avoid singularity formation?), so I thought I would discuss it here.
Below the fold I give a special case of Moser’s argument, based on a blog post of Lance Fortnow on this topic.
Rather than deal with the Lovász local lemma in full generality, I will follow Fortnow and work with a special case of this lemma involving the $k$-satisfiability problem (in conjunctive normal form). Here, one is given a set of boolean variables $x_1,\ldots,x_n$ together with their negations $\neg x_1,\ldots,\neg x_n$; we refer to the variables and their negations collectively as literals. We fix an integer $k \geq 2$, and define a (length $k$) clause to be a disjunction of $k$ literals, for instance

$$x_3 \vee \neg x_5 \vee x_9$$

is a clause of length three, which is true unless $x_3$ is false, $x_5$ is true, and $x_9$ is false. We define the support of a clause to be the set of variables that are involved in the clause, thus for instance $x_3 \vee \neg x_5 \vee x_9$ has support $\{x_3, x_5, x_9\}$. To avoid degeneracy we assume that no clause uses a variable more than once (or equivalently, all supports have cardinality exactly $k$), thus for instance we do not consider $x_3 \vee x_3 \vee x_9$ or $x_3 \vee \neg x_3 \vee x_9$ to be clauses.
Note that the failure of a clause reveals complete information about all of the boolean variables in the support; this will be an important fact later on.
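To make this observation concrete, here is a minimal Python sketch. The encoding (a clause as a tuple of `(variable index, is_positive)` pairs), the sample clause on variables 3, 5, 9, and the helper names `support`, `satisfies` and `violating_values` are all illustrative assumptions rather than notation from the argument. The sketch enumerates every assignment on the support and confirms that exactly one of them violates the clause, so knowing that a clause fails pins down all of the bits on its support.

```python
from itertools import product

# Assumed encoding: a literal is (variable index, True for x_i / False for its negation),
# and a clause is a tuple of literals on distinct variables.
example = ((3, True), (5, False), (9, True))  # hypothetical clause: x3 or (not x5) or x9

def support(clause):
    """Set of variable indices appearing in the clause."""
    return {var for var, _ in clause}

def satisfies(assignment, clause):
    """True if at least one literal of the clause holds under the assignment (a dict)."""
    return any(assignment[var] == positive for var, positive in clause)

def violating_values(clause):
    """The unique values on the support that make every literal false."""
    return {var: not positive for var, positive in clause}

# Enumerate all assignments on the support and keep those that violate the clause.
variables = sorted(support(example))
violators = [
    dict(zip(variables, values))
    for values in product([False, True], repeat=len(variables))
    if not satisfies(dict(zip(variables, values)), example)
]
```

Here `violators` ends up containing a single assignment, the one returned by `violating_values(example)`.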
The $k$-satisfiability problem is the following: given a set $S$ of clauses of length $k$ involving $n$ boolean variables $x_1,\ldots,x_n$, is there a way to assign truth values to each of the $x_1,\ldots,x_n$, so that all of the clauses are simultaneously satisfied?
For general $S$, this problem is easy for $k=2$ (essentially equivalent to the problem of $2$-colouring a graph), but NP-complete for $k \geq 3$ (this is the famous Cook-Levin theorem). But the problem becomes simpler if one makes some more assumptions on the set $S$ of clauses. For instance, if the clauses in $S$ have disjoint supports, then they can be satisfied independently of each other, and so one easily has a positive answer to the satisfiability problem in this case. (Indeed, one only needs each clause in $S$ to have one variable in its support that is disjoint from all the other supports in order to make this argument work.)
Now suppose that the clauses $S$ are not completely disjoint, but have a limited amount of overlap; thus most clauses in $S$ have disjoint supports, but not all. With too much overlap, of course, one expects satisfiability to fail (e.g. if $S$ is the set of all length $k$ clauses). But with a sufficiently small amount of overlap, one still has satisfiability:
Theorem 1 (Lovász local lemma, special case) Suppose that $S$ is a set of length $k$ clauses, such that the support of each clause $s$ in $S$ intersects at most $2^{k-C}$ supports of clauses in $S$ (including $s$ itself), where $C$ is a sufficiently large absolute constant. Then the clauses in $S$ are simultaneously satisfiable.
One of the reasons that this result is powerful is that the bounds here are uniform in the number $n$ of variables. Apart from the loss of $C$, this result is sharp; consider for instance the set $S$ of all $2^k$ clauses with support $\{x_1,\ldots,x_k\}$, which is clearly unsatisfiable.
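The sharpness example can be checked by brute force for small parameters. The following is only a sketch under assumptions: the clause encoding (a literal as `(variable index, is_positive)`) and the function names `all_clauses` and `satisfiable` are hypothetical, and the exhaustive search is feasible only for tiny instances. It builds every clause on a fixed support of size three and tests whether any assignment satisfies them all, then repeats the test with one clause removed.

```python
from itertools import product

def all_clauses(k):
    """All 2**k clauses whose support is the variables 0..k-1.
    Assumed encoding: a literal is (variable index, True if positive / False if negated)."""
    return [tuple(enumerate(signs)) for signs in product([False, True], repeat=k)]

def satisfiable(clauses, n):
    """Brute-force search over all 2**n assignments; only sensible for tiny n."""
    return any(
        all(any(values[var] == positive for var, positive in clause) for clause in clauses)
        for values in product([False, True], repeat=n)
    )

k = 3
full = all_clauses(k)
full_family_ok = satisfiable(full, k)        # every assignment violates exactly one clause
smaller_family_ok = satisfiable(full[1:], k)  # drop one clause and try again
```

Each assignment violates exactly one clause of the full family (the one whose literals are all false under it), which is why the full family fails while any proper subfamily succeeds.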
The standard proof of this theorem proceeds by assigning each of the $n$ boolean variables $x_1,\ldots,x_n$ a truth value independently at random (with each truth value occurring with an equal probability of $1/2$); then each of the clauses in $S$ has a positive probability of holding (in fact, the probability is $1-2^{-k}$). Furthermore, if $E_s$ denotes the event that a clause $s$ is satisfied, then the $E_s$ are mostly independent of each other; indeed, each event $E_s$ is independent of all but at most $2^{k-C}$ other events $E_{s'}$. Applying the Lovász local lemma, one concludes that the $E_s$ simultaneously hold with positive probability (if $C$ is a little bit larger than $\log_2 e$), and the claim follows.
The textbook proof of the Lovász local lemma is short but nonconstructive; in particular, it does not easily offer any quick way to compute an actual satisfying assignment for $x_1,\ldots,x_n$, only saying that such an assignment exists. Moser’s argument, by contrast, gives a simple and natural algorithm to locate such an assignment (and thus prove Theorem 1). (The constant $C$ becomes $3$ rather than $\log_2 e$, although the $\log_2 e$ bound has since been recovered in a paper of Moser and Tardos.)
As with the usual proof, one begins by randomly assigning truth values $a_1,\ldots,a_n \in \{\mathrm{true}, \mathrm{false}\}$ to $x_1,\ldots,x_n$; call this random assignment $A = (a_1,\ldots,a_n)$. If $A$ satisfied all the clauses in $S$, we would be done. However, it is likely that there will be some non-empty subset $T$ of clauses in $S$ which are not satisfied by $A$.
We would now like to modify $A$ in such a manner as to reduce the number $|T|$ of violated clauses. If, for instance, we could always find a modification $A'$ of $A$ whose set $T'$ of violated clauses was strictly smaller than $T$ (assuming of course that $T$ is non-empty), then we could iterate and be done (this is basically a mass decrement argument). One obvious way to try to achieve this is to pick a clause $s$ in $T$ that is violated by $A$, and modify the values of $A$ on the support of $s$ to create a modified assignment $A'$ that satisfies $s$, which is easily accomplished; in fact, any non-trivial modification of $A$ on the support will work here. In order to maximize the amount of entropy in the system (which is what one wants to do for an entropy compression argument), we will choose this modification of $A$ randomly; in particular, we will use $k$ fresh random bits to replace the $k$ bits of $A$ in the support of $s$. (By doing so, there is a small probability ($2^{-k}$) that we in fact do not change $A$ at all, but the argument is (very) slightly simpler if we do not bother to try to eliminate this case.)
If all the clauses had disjoint supports, then this strategy would work without difficulty. But when the supports are not disjoint, one has a problem: every time one modifies $A$ to “fix” a clause $s$ by modifying the variables on the support of $s$, one may cause other clauses $s'$ whose supports overlap those of $s$ to fail, thus potentially increasing the size of $T$ by as much as $2^{k-C}-1$. One could then try fixing all the clauses which were broken by the first fix, but it appears that the number of clauses needed to repair could grow indefinitely with this procedure, and one might never terminate in a state in which all clauses are simultaneously satisfied.
The key observation of Moser, as alluded to earlier, is that each failure of a clause $s$ for an assignment $A$ reveals $k$ bits of information about $A$, namely the exact values that $A$ assigns to the support of $s$. The plan is then to use each failure of a clause as a part of a compression protocol that compresses $A$ (plus some other data) losslessly into a smaller amount of space. A crucial point is that at each stage of the process, the clause one is trying to fix is almost always going to be one that overlapped the clause that one had just previously fixed. Thus the total number of possibilities for each clause, given the previous clauses, is basically $2^{k-C}$, which requires only $k-C$ bits of storage, compared with the $k$ bits of entropy that have been eliminated. This is what is going to force the algorithm to terminate in finite time (with positive probability).
Let’s make the details more precise. We will need the following objects:
- A truth assignment $A$ of $n$ truth values $a_1,\ldots,a_n$, which is initially assigned randomly, but which will be modified as the algorithm progresses;
- A long random string $R$ of bits, from which we will make future random choices, with each random bit being removed from $R$ as it is read.
We also need a recursive algorithm $\mathrm{Fix}(s)$, which modifies the string $A$ to satisfy a clause $s$ in $S$ (and, as a bonus, may also make $A$ obey some other clauses in $S$ that it did not previously satisfy). It is defined recursively:
- Step 1. If $A$ already satisfies $s$, do nothing (i.e. leave $A$ unchanged).
- Step 2. Otherwise, read off $k$ random bits from $R$ (thus shortening $R$ by $k$ bits), and use these to replace the $k$ bits of $A$ on the support of $s$ in the obvious manner (ordering the support of $s$ by some fixed ordering, and assigning the $j^{th}$ bit from $R$ to the $j^{th}$ variable in the support for $1 \leq j \leq k$).
- Step 3. Next, find all the clauses $s'$ in $S$ whose supports intersect that of $s$, and which $A$ now violates; this is a collection of at most $2^{k-C}$ clauses, possibly including $s$ itself. Order these clauses $s'$ in some arbitrary fashion, and then apply $\mathrm{Fix}(s')$ to each such clause in turn. (Thus the original algorithm $\mathrm{Fix}(s)$ is put “on hold” on some CPU stack while all the child processes $\mathrm{Fix}(s')$ are executed; once all of the child processes are complete, then $\mathrm{Fix}(s)$ terminates also.)
An easy induction shows that if $\mathrm{Fix}(s)$ terminates, then the resulting modification of $A$ will satisfy $s$; and furthermore, any other clause $s'$ in $S$ which was already satisfied by $A$ before $\mathrm{Fix}(s)$ was called, will continue to be satisfied by $A$ after $\mathrm{Fix}(s)$ is called. Thus, $\mathrm{Fix}(s)$ can only serve to decrease the number of unsatisfied clauses $T$ in $S$, and so one can fix all the clauses by calling $\mathrm{Fix}(s)$ once for each clause in $T$ – provided that these algorithms all terminate.
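The three steps of the recursive routine can be sketched in Python as follows. This is a minimal illustration under assumptions, not a tuned implementation: the clause encoding (a tuple of `(variable index, is_positive)` literals), the names `fix`, `solve`, `overlaps` and `random_bits`, and the use of a seeded generator in place of a pre-sampled random string are all choices made for this sketch. The recursion is only expected to stay shallow when the overlap hypothesis of Theorem 1 (or something close to it) holds; on other inputs it may run for a long time or exceed Python's recursion limit.

```python
import random

def satisfies(assignment, clause):
    """True if some literal (variable index, is_positive) of the clause holds."""
    return any(assignment[var] == positive for var, positive in clause)

def overlaps(c1, c2):
    """True if the two clauses share at least one variable."""
    return bool({v for v, _ in c1} & {v for v, _ in c2})

def fix(clause, clauses, assignment, bits):
    # Step 1: nothing to do if the clause already holds.
    if satisfies(assignment, clause):
        return
    # Step 2: overwrite the support with fresh random bits, in the clause's fixed order.
    for var, _ in clause:
        assignment[var] = next(bits)
    # Step 3: list the overlapping clauses that are now violated (possibly including
    # `clause` itself) and repair each in turn; this call stays on the stack meanwhile.
    broken = [c for c in clauses if overlaps(c, clause) and not satisfies(assignment, c)]
    for other in broken:
        fix(other, clauses, assignment, bits)

def random_bits(rng):
    """Endless stream of fair bits standing in for the random string."""
    while True:
        yield rng.random() < 0.5

def solve(clauses, n, seed=0):
    rng = random.Random(seed)
    bits = random_bits(rng)
    assignment = [next(bits) for _ in range(n)]  # initial random assignment
    # Calling fix on every clause is harmless: Step 1 skips the satisfied ones.
    for clause in clauses:
        fix(clause, clauses, assignment, bits)
    return assignment

# Small hypothetical instance with k = 3 and a little overlap between supports.
clauses = [
    ((0, True), (1, True), (2, True)),
    ((2, False), (3, True), (4, True)),
    ((4, False), (5, False), (0, False)),
]
result = solve(clauses, n=6)
```

If `solve` returns, every clause is satisfied by `result`, in line with the induction described above; the entropy argument is what controls how long one has to wait for that to happen.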
Each time Step 2 of the Fix algorithm is called, the assignment $A$ changes to a new assignment $A'$, and the random string $R$ changes to a shorter string $R'$. Is this process reversible? Yes – provided that one knows what clause $s$ was being fixed by this instance of the algorithm. Indeed, if $s, A', R'$ are known, then $A$ can be recovered by changing the assignment of $A'$ on the support of $s$ to the only set of choices that violates $s$, while $R$ can be recovered from $R'$ by appending to $R'$ the bits of $A'$ on the support of $s$.
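The reversibility of Step 2 can be illustrated with a short Python sketch. The representation is an assumption made for the illustration: the random string is a list whose bits are consumed from the end, a clause is a tuple of `(variable index, is_positive)` literals, and `apply_step2` and `undo_step2` are hypothetical helper names. The forward map requires, as in the algorithm, that the clause is violated before the step.

```python
def apply_step2(clause, assignment, bits):
    """Forward Step 2: overwrite the support of `clause` with bits taken from the end of `bits`."""
    new_assignment, new_bits = dict(assignment), list(bits)
    for var, _ in clause:
        new_assignment[var] = new_bits.pop()
    return new_assignment, new_bits

def undo_step2(clause, new_assignment, new_bits):
    """Inverse of Step 2, assuming the clause was violated before the step."""
    old_assignment, old_bits = dict(new_assignment), list(new_bits)
    # Walk the support backwards so the bits return to the string in their original order.
    for var, positive in reversed(clause):
        old_bits.append(new_assignment[var])  # the consumed bit is visible in the new assignment
        old_assignment[var] = not positive    # the only value that makes this literal false
    return old_assignment, old_bits

# Hypothetical example: x0 or (not x1) or x2, violated by the starting assignment.
clause = ((0, True), (1, False), (2, True))
A = {0: False, 1: True, 2: False, 3: True}
R = [True, False, False, True, True]
A_new, R_new = apply_step2(clause, A, R)
A_back, R_back = undo_step2(clause, A_new, R_new)
```

Here `(A_back, R_back)` coincides with the original `(A, R)`, using nothing beyond the clause, the new assignment and the shortened string.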
This type of reversibility does not seem very useful for an entropy compression argument, because while $R'$ is shorter than $R$ by $k$ bits, it requires about $\log |S|$ bits to store the clause $s$. So the map $A + R \mapsto A' + R' + s$ is only a compression if $\log |S| < k$, which is not what is being assumed here (and in any case the satisfiability of $S$ in the case $\log |S| < k$ is trivial from the union bound).
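For completeness, the union bound computation referred to here runs as follows, with logarithms taken to base $2$ and the variables assigned independently and uniformly at random; each clause of length $k$ is violated by exactly one of the $2^k$ assignments on its support.

```latex
\[
\mathbf{P}(\text{some clause in } S \text{ is violated})
\;\le\; \sum_{s \in S} \mathbf{P}(s \text{ is violated})
\;=\; |S| \, 2^{-k}
\;<\; 1
\qquad \text{whenever } |S| < 2^k .
\]
```

So with positive probability no clause is violated, and a satisfying assignment exists.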
The key trick is that while it does indeed take $\log |S|$ bits to store any given clause $s$, there is an economy of scale: after many recursive applications of the fix algorithm, the marginal amount of bits needed to store $s$ drops to merely $k-C+O(1)$, which is less than $k$ if $C$ is large enough, and which will therefore make the entropy compression argument work.
Let’s see why this is the case. Observe that the clauses $s$ for which the above algorithm $\mathrm{Fix}(s)$ is called come in two categories. Firstly, there are those $s$ which came from the original list $T$ of failed clauses. Each of these will require $O(\log |S|)$ bits to store – but there are only $|T|$ of them. Since $|T| \leq |S|$, the net amount of storage space required for these clauses is $O(|S| \log |S|)$ at most. Actually, one can just store the subset $T$ of $S$ using $|S|$ bits (one for each element of $S$, to record whether it lies in $T$ or not).
Of more interest is the other category of clauses $s$, in which $\mathrm{Fix}(s)$ is called recursively from some previously invoked call $\mathrm{Fix}(s')$ to the fix algorithm. But then $s$ is one of the at most $2^{k-C}$ clauses in $S$ whose support intersects that of $s'$. Thus one can encode $s$ using $s'$ and a number between $1$ and $2^{k-C}$, representing the position of $s$ (with respect to some arbitrarily chosen fixed ordering of $S$) in the list of all clauses in $S$ whose supports intersect that of $s'$. Let us call this number the index of the call $\mathrm{Fix}(s)$.
Now imagine that while the Fix routine is called, a running log file (or history) $H$ of the routine is kept, which records each time one of the original $|T|$ calls $\mathrm{Fix}(s)$ with $s \in T$ is invoked, and also records the index of any other call $\mathrm{Fix}(s')$ made during the recursive procedure. Finally, we assume that this log file records a termination symbol whenever a $\mathrm{Fix}$ routine terminates. By performing a stack trace, one sees that whenever a Fix routine is called, the clause $s$ that is being repaired by that routine can be deduced from an inspection of the log file $H$ up to that point.
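The stack trace can be made explicit with a small Python sketch. The log format used here is an assumption for illustration only: entries are `("root", i)` for a top-level call on the clause at position `i` of the clause list, `("child", j)` for a recursive call whose index is `j` (counted from zero) among the clauses overlapping the parent, and `("end", None)` for a termination symbol. The names `neighbours` and `replay` are likewise hypothetical.

```python
def neighbours(clause, clauses):
    """Clauses whose support meets that of `clause`, in the fixed order of `clauses`."""
    supp = {var for var, _ in clause}
    return [c for c in clauses if supp & {var for var, _ in c}]

def replay(log, clauses):
    """Recover, from the log alone, the sequence of clauses that the calls repaired."""
    stack, repaired = [], []
    for kind, value in log:
        if kind == "end":
            stack.pop()                      # the innermost call has terminated
            continue
        if kind == "root":
            current = clauses[value]         # top-level call: stored by position in the list
        else:
            # Recursive call: only its index among the parent's neighbours was stored.
            current = neighbours(stack[-1], clauses)[value]
        stack.append(current)
        repaired.append(current)
    return repaired

# Hypothetical instance: c0 and c1 share variable 2, c2 is disjoint from both.
c0 = ((0, True), (1, True), (2, True))
c1 = ((2, False), (3, True), (4, True))
c2 = ((5, True), (6, True), (7, True))
clauses = [c0, c1, c2]
log = [("root", 0), ("child", 1), ("end", None), ("child", 0), ("end", None),
       ("end", None), ("root", 2), ("end", None)]
sequence = replay(log, clauses)
```

In this example the replay yields the calls on `c0`, `c1`, `c0` again, and `c2`, even though each recursive entry stores only a small index rather than a full clause name.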
As a consequence, at any intermediate stage in the process of all these fix calls, the original state $A+R$ of the assignment and the random string of bits can be deduced from the current state $A'+R'$ of these objects, plus the history $H'$ up to that point.
Now suppose for contradiction that $S$ is not satisfiable; thus the stack of fix calls can never completely terminate. We trace through this stack for $M$ steps, where $M$ is some large number to be chosen later. After these steps, the random string $R$ has shortened by an amount of $Mk$; if we set $R$ to initially have length $Mk$, then the string is now completely empty, $R' = \emptyset$. On the other hand, the history $H'$ has size at most $O(|S|) + M(k-C+O(1))$, since it takes $|S|$ bits to store the initial clauses in $T$, $O(|S|)$ bits to record all the instances when Step 1 occurs, and every subsequent call to Fix generates a $(k-C)$-bit number, plus possibly a termination symbol of size $O(1)$. Thus we have a lossless compression algorithm $A + R \mapsto A' + H'$ from $n + Mk$ completely random bits to $n + O(|S|) + M(k-C+O(1))$ bits (recall that $A$ and $R$ were chosen randomly, and independently of each other). But since $n + Mk$ random bits cannot be compressed losslessly into any smaller space, we have the entropy bound

$$n + O(|S|) + M(k-C+O(1)) \geq n + Mk \ \ \ \ \ (1)$$

which leads to a contradiction if $M$ is large enough (and if $C$ is larger than an absolute constant). This proves Theorem 1.
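To see how the contradiction arises quantitatively, one can give names to the implied constants. In the sketch below, $c_1$ and $c_2$ are hypothetical labels for the absolute constants hidden in the $O(|S|)$ and $O(1)$ terms of the history size; they are introduced only for this computation.

```latex
\begin{align*}
n + c_1 |S| + M (k - C + c_2) &\ge n + M k \\
\iff \quad c_1 |S| &\ge M (C - c_2) \\
\iff \quad M &\le \frac{c_1 |S|}{C - c_2} \qquad \text{(assuming } C > c_2\text{)} .
\end{align*}
```

Thus, once $C$ exceeds the constant $c_2$, the entropy bound can only hold for $M$ of size $O(|S|)$, and taking $M$ larger than this gives the contradiction.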
Remark 1 Observe that the above argument in fact gives an explicit bound on $M$, and with a small bit of additional effort, it can be converted into a probabilistic algorithm that (with high probability) computes a satisfying assignment for $S$ in time polynomial in $|S|$ and $n$.
Remark 2 One can replace the usage of randomness and Shannon entropy in the above argument with Kolmogorov complexity instead; thus, one sets $A + R$ to be a string of $n + Mk$ bits which cannot be computed by any algorithm of length $n + O(|S|) + M(k-C+O(1))$, the existence of which is guaranteed as soon as (1) is violated; the proof now becomes deterministic, except of course for the problem of building the high-complexity string, which by its definition can only be constructed quickly by probabilistic methods. This is the approach taken by Fortnow.