In the field of analysis, it is common to make a distinction between “hard”, “quantitative”, or “finitary” analysis on one hand, and “soft”, “qualitative”, or “infinitary” analysis on the other. “Hard analysis” is mostly concerned with finite quantities (e.g. the cardinality of finite sets, the measure of bounded sets, the value of convergent integrals, the norm of finite-dimensional vectors, etc.) and their quantitative properties (in particular, upper and lower bounds). “Soft analysis”, on the other hand, tends to deal with more infinitary objects (e.g. sequences, measurable sets and functions, \sigma-algebras, Banach spaces, etc.) and their qualitative properties (convergence, boundedness, integrability, completeness, compactness, etc.). To put it more symbolically, hard analysis is the mathematics of \varepsilon, N, O(), and \leq[1]; soft analysis is the mathematics of 0, \infty, \in, and \to.

At first glance, the two types of analysis look very different; they deal with different types of objects, ask different types of questions, and seem to use different techniques in their proofs. They even use[2] different axioms of mathematics; the axiom of infinity, the axiom of choice, and the Dedekind completeness axiom for the real numbers are often invoked in soft analysis, but rarely in hard analysis. (As a consequence, there are occasionally some finitary results that can be proven easily by soft analysis but are in fact impossible to prove via hard analysis methods; the Paris-Harrington theorem gives a famous example.) Because of all these differences, it is common for analysts to specialise in only one of the two types of analysis. For instance, as a general rule (and with notable exceptions), discrete mathematicians, computer scientists, real-variable harmonic analysts, and analytic number theorists tend to rely on “hard analysis” tools, whereas functional analysts, operator algebraists, abstract harmonic analysts, and ergodic theorists tend to rely on “soft analysis” tools. (PDE is an interesting intermediate case in which both types of analysis are popular and useful, though many practitioners of PDE still prefer to primarily use just one of the two types. Another interesting transition occurs on the interface between point-set topology, which largely uses soft analysis, and metric geometry, which largely uses hard analysis. Also, the ineffective bounds which crop up from time to time in analytic number theory are a sort of hybrid of hard and soft analysis. Finally, there are examples of evolution of a field from soft analysis to hard (e.g. Banach space geometry) or vice versa (e.g. recent developments in extremal combinatorics, particularly in relation to the regularity lemma).)

It is fairly well known that the results obtained by hard and soft analysis respectively can be connected to each other by various “correspondence principles” or “compactness principles”. It is however my belief that the relationship between the two types of analysis is in fact much closer[3] than just this; in many cases, qualitative analysis can be viewed as a convenient abstraction of quantitative analysis, in which the precise dependencies between various finite quantities have been efficiently concealed from view by use of infinitary notation. Conversely, quantitative analysis can often be viewed as a more precise and detailed refinement of qualitative analysis. Furthermore, a method from hard analysis often has some analogue in soft analysis and vice versa, though the language and notation of the analogue may look completely different from that of the original. I therefore feel that it is often profitable for a practitioner of one type of analysis to learn about the other, as they both offer their own strengths, weaknesses, and intuition, and knowledge of one gives more insight[4] into the workings of the other. I wish to illustrate this point here using a simple but not terribly well known result, which I shall call the “finite convergence principle” (thanks to Ben Green for suggesting this name; Jennifer Chayes has also suggested the “metastability principle”). It is the finitary analogue of an utterly trivial infinitary result – namely, that every bounded monotone sequence converges – but sometimes, a careful analysis of a trivial result can be surprisingly revealing, as I hope to demonstrate here.

Before I discuss this principle, let me first present an informal, incomplete, and inaccurate “dictionary” between soft and hard analysis, to try to give a rough idea of the (partial) correspondences between the two:

Soft analysis                 | Hard analysis
------------------------------|------------------------------------------------------------------
x finite                      | x bounded (e.g. x = O(1))
x vanishes                    | x small (e.g. |x| \leq \varepsilon)
x infinite                    | x large (e.g. |x| \geq N)
x_n \to 0                     | quantitative decay bound (e.g. x_n = O(n^{-c}))
x_n is convergent             | x_n is metastable (see below)
f uniformly continuous        | Lipschitz or Hölder bound on f (e.g. |f(x)-f(y)| = O(|x-y|))
f \in X                       | \|f\|_X = O(1)
E compact                     | metric entropy bound on E
E is Lebesgue measurable      | E is (quantitatively) approximated by bounded complexity sets
V is generated by S           | V is an algorithm initialised by S
u locally extremises F(u)     | u has no nearby competitor with significantly better value of F

One can draw two conclusions from this table:

  1. Soft analysis statements can often be stated both succinctly and rigorously, by using precisely defined and useful concepts (e.g. compactness, measurability, etc.). In hard analysis, one usually has to sacrifice one or the other: either one is rigorous but verbose (using lots of parameters such as \varepsilon, N, etc.), or succinct but “fuzzy” (using intuitive but vaguely defined concepts such as “size”, “complexity”, “nearby”, etc.).
  2. A single concept in soft analysis can have multiple hard analysis counterparts; a “naive” translation of a statement in soft analysis into hard analysis may be incorrect. (In particular, one should not use the above table blindly to convert from one to the other.)

Anyway, back to the finite convergence principle. The infinite convergence principle is well known, though perhaps not by this name, to anyone who has taken undergraduate analysis:

Infinite convergence principle. Every bounded monotone sequence x_n of real numbers is convergent.

This basic principle – essentially equivalent to the Dedekind completeness axiom for the real numbers – is of course fundamental to many parts of infinitary analysis, most obviously in the theory of infinite sequences and series, but it is also implicit in just about any context in which one studies an “infinite” object (e.g. an infinite-dimensional vector space) by expressing it as a monotone limit of “finite” objects. It is undoubtedly an important tool in soft analysis. What, then, is its counterpart in hard analysis?

We will answer this question presently, but let us first make the infinite convergence principle a bit more quantitative. We may as well normalise the bounded sequence x_n to lie between 0 and 1. Expanding out the “epsilon-delta” definition of convergence, we obtain

Infinite convergence principle (again). If 0 \leq x_1 \leq x_2 \leq \ldots \leq 1, then there exists a real number x such that for every \varepsilon > 0, there exists an N such that |x_n-x| \leq \varepsilon for all n \geq N.

There are quite a lot of quantifiers here. One can cut down the complexity a little bit by replacing the notion of a convergent sequence with that of a Cauchy sequence. This lets us eliminate the need for a limit x, which does not have an obvious finitary counterpart. So we get

Infinite convergence principle (yet again). If \varepsilon > 0 and 0 \leq x_1 \leq x_2 \leq \ldots \leq 1, there exists an N such that |x_n-x_m| \leq \varepsilon for all n,m \geq N.

Note now that one does not need the real number system to make this principle both meaningful and non-trivial; the principle already works quite well when restricted to the rationals. (Exercise: prove this principle for the rationals without constructing the real number system.) Informally speaking, this principle asserts that every bounded monotone sequence is eventually stable up to error \varepsilon.

Now let’s try to find the finitary (quantitative) equivalent of this principle. The most naive thing to do is simply to replace the infinite sequence by a finite sequence, thus

Finite convergence principle (first attempt). If \varepsilon > 0 and 0 \leq x_1 \leq x_2 \leq \ldots \leq x_M \leq 1, there exists an N such that |x_n-x_m| \leq \varepsilon for all N \leq n,m \leq M.

But this is trivially true; one can simply set N equal to M (or any number larger than M). So one needs to strengthen the claim. What about making N independent of M, and dependent only on \varepsilon?

Finite convergence principle (second attempt). If \varepsilon > 0 and 0 \leq x_1 \leq x_2 \leq \ldots \leq x_M \leq 1, there exists an N = N(\varepsilon) depending only on \varepsilon such that |x_n-x_m| \leq \varepsilon for all N \leq n,m \leq M.

But this is trivially false; consider for instance a sequence x_i which equals zero except at i=M, at which point we jump up to x_M = 1. We are not going to get the Cauchy property unless we set N to be as large as M… but we can’t do that if we only want N to depend on \varepsilon.

So, is there anything non-trivial that one can say at all about finite bounded monotone sequences? Well, we have the pigeonhole principle:

Pigeonhole principle. If \varepsilon > 0 and 0 \leq x_1 \leq x_2 \leq \ldots \leq x_M \leq 1 is such that M \geq 1/\varepsilon + 1, there exists 1 \leq N < M such that |x_{N+1} - x_N| \leq \varepsilon.

Indeed, if the gaps between each element x_N of the sequence and the next x_{N+1} were always larger than \varepsilon, then x_M - x_1 would exceed (M-1)\varepsilon \geq 1, a contradiction. This principle is true, but it is too weak to be considered a true finitary version of the infinite convergence principle; indeed, we see that the pigeonhole principle easily implies

Weak infinite convergence principle. If 0 \leq x_1 \leq x_2 \leq \ldots \leq 1, then \liminf_{n \to \infty} |x_{n+1}-x_n| = 0.

but does not obviously imply the full infinite convergence principle.

The problem is that the pigeonhole principle only establishes instantaneous stability of the sequence at some point n, whereas the infinite convergence principle concludes the permanent stability of the sequence after some point N. To get a better finitary match to the infinite convergence principle, we need to extend the region of stability that the pigeonhole principle offers. Now, one can do some trivial extensions such as

Pigeonhole principle (second version). If \varepsilon > 0 and k \geq 1 and 0 \leq x_1 \leq x_2 \leq \ldots \leq x_M \leq 1 is such that M \geq k/\varepsilon + k, there exists 1 \leq N < N+k \leq M such that |x_n - x_m| \leq \varepsilon for all N \leq n,m \leq N+k,

which one can quickly deduce from the first pigeonhole principle by considering the sparsified sequence x_k, x_{2k}, x_{3k}, \ldots. But this is only a little bit better, as it now gives the infinitary statement

Slightly less weak infinite convergence principle. If 0 \leq x_1 \leq x_2 \leq \ldots \leq 1, then \liminf_{n \to \infty} |x_{n+k}-x_n| = 0 for all k,

but is still not strong enough to imply the infinite convergence principle in its full strength. Nevertheless, it shows that we can extend the realm of stability offered by the pigeonhole principle. One can for instance sparsify further, replacing n+k with 2n:

Pigeonhole principle (third version). If \varepsilon > 0 and k \geq 1 and 0 \leq x_1 \leq x_2 \leq \ldots \leq x_M \leq 1 is such that M \geq 2^{1/\varepsilon + 1}, there exists 1 \leq N < 2N \leq M such that |x_n - x_m| \leq \varepsilon for all N \leq n,m \leq 2N.

This can be proven by applying the first version of the pigeonhole principle to the sparsified sequence x_1, x_2, x_4, x_8, \ldots. This corresponds to an infinite convergence principle in which the conclusion is that \liminf_{n \to \infty} |x_{2n} - x_n| = 0.

One can of course keep doing this, achieving various sparsified versions of the pigeonhole principle which each capture part of the infinite convergence principle. To get the full infinite convergence principle, one cannot use any single such sparsified version of the pigeonhole principle, but instead must take all of them at once. This is the full strength of the finite convergence principle:

Finite convergence principle. If \varepsilon > 0 and F: {\Bbb Z}_+ \to {\Bbb Z}_+ is a function and 0 \leq x_1 \leq x_2 \leq \ldots \leq x_M \leq 1 is such that M is sufficiently large depending on F and \varepsilon, then there exists 1 \leq N < N+F(N) \leq M such that |x_n - x_m| \leq \varepsilon for all N \leq n,m \leq N+F(N).

This principle is easily proven by appealing to the first pigeonhole principle with the sparsified sequence x_{i_1}, x_{i_2}, x_{i_3}, \ldots where the indices are defined recursively by i_1 := 1 and i_{j+1} := i_j + F(i_j). This gives an explicit bound on M, namely M := i_{\lceil 1/\varepsilon\rceil + 1}. Note that the first pigeonhole principle corresponds to the case F(N) \equiv 1, the second pigeonhole principle to the case F(N) \equiv k, and the third to the case F(N) = N. A particularly useful case for applications is when F grows exponentially in N, in which case M grows tower-exponentially in 1/\varepsilon.
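
To see the proof in action, here is a minimal Python sketch (my own illustration, not part of the original argument; all names are invented) which builds the sparsified indices, computes the resulting bound on M, and locates a metastable window in a given monotone sequence:

    from math import ceil

    def sparsified_indices(eps, F):
        # i_1 = 1, i_{j+1} = i_j + F(i_j), with J = ceil(1/eps) + 1 terms,
        # which is enough for the pigeonhole principle to apply.
        idx = [1]
        while len(idx) < ceil(1 / eps) + 1:
            idx.append(idx[-1] + F(idx[-1]))
        return idx

    def metastable_window(x, eps, F):
        # Given a monotone sequence x[1], ..., x[M] in [0,1] (x[0] is a dummy),
        # return N with |x_n - x_m| <= eps for all N <= n, m <= N + F(N).
        idx = sparsified_indices(eps, F)
        assert len(x) - 1 >= idx[-1], "sequence shorter than the bound M = i_J"
        # Pigeonhole: some consecutive sparsified gap is at most eps, and
        # monotonicity then stabilises the whole block [i_j, i_j + F(i_j)].
        for a, b in zip(idx, idx[1:]):
            if x[b] - x[a] <= eps:
                return a
        raise AssertionError("unreachable: the pigeonhole principle applies")

    # Example: F(N) = N (the third pigeonhole principle), eps = 1/4, x_n = 1 - 1/n.
    F, eps = (lambda n: n), 0.25
    M = sparsified_indices(eps, F)[-1]
    x = [0.0] + [1 - 1 / n for n in range(1, M + 1)]
    N = metastable_window(x, eps, F)
    print(f"metastable on [{N}, {N + F(N)}], within the bound M = {M}")

Running the same sketch with an exponentially growing F, say F = lambda n: 2**n, lets one watch the indices i_j, and hence the bound M, blow up tower-exponentially in 1/\varepsilon, as claimed above.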

Informally, the above principle asserts that any sufficiently long (but finite) bounded monotone sequence will experience arbitrarily high-quality amounts of metastability with a specified error tolerance \varepsilon, in which the duration F(N) of the metastability exceeds the time N of onset of the metastability by an arbitrary function F which is specified in advance.

Let us now convince ourselves that this is the true finitary version of the infinite convergence principle, by deducing them from each other:

The finite convergence principle implies the infinite convergence principle. Suppose for contradiction that the infinite convergence principle failed. Untangling the quantifiers, this asserts that there is an infinite sequence 0 \leq x_1 \leq x_2 \leq \ldots \leq 1 and an \varepsilon > 0 with the property that, given any positive integer N, there exists a larger integer N+F(N) such that x_{N+F(N)} - x_N > \varepsilon. But this contradicts the finite convergence principle (applied, with this \varepsilon and F, to a sufficiently long initial segment of the sequence). \Box

The infinite convergence principle implies the finite convergence principle. Suppose for contradiction that the finite convergence principle failed. Untangling the quantifiers, this asserts that there exists \varepsilon > 0 and a function F, together with a collection 0 \leq x^{(i)}_1 \leq \ldots \leq x^{(i)}_{M_i} \leq 1 of bounded monotone sequences whose length M_i goes to infinity, such that for each one of these sequences, there does not exist 1 \leq N < N+F(N) \leq M_i such that |x^{(i)}_n - x^{(i)}_m| \leq \varepsilon for all N \leq n,m \leq N+F(N). Let us extend each of the finite bounded sequences to infinite bounded sequences in some arbitrary manner, e.g. defining x^{(i)}_n = 1 whenever n > M_i. The space of all bounded sequences is well-known[5] to be sequentially compact in the product topology, thus after refining the i labels to a subsequence if necessary, we can assume that the sequences (x^{(i)}_n)_{n=1}^\infty converge in the product topology (i.e. pointwise) to a new limit sequence (x_n)_{n=1}^\infty. Since each of the original sequences was bounded in the interval [0,1] and monotone, the limit sequence is also. Furthermore, we claim that there does not exist any N \geq 1 for which |x_n - x_m| < \varepsilon for all N \leq n,m \leq N+F(N). Indeed, if this were the case, then by pointwise convergence we would also have |x^{(i)}_n - x^{(i)}_m| < \varepsilon for all N \leq n,m \leq N+F(N) and all sufficiently large i, but this contradicts the construction of the x^{(i)}_n. But now we see that this infinite bounded monotone sequence (x_n)_{n=1}^\infty contradicts the infinite convergence principle. \Box

One can draw some morals from the above discussion:

  1. The finitary version of an infinitary statement can be significantly more verbose and ugly-looking than the infinitary original, and the arrangement of quantifiers becomes crucial.
  2. The “naive” finitisation of an infinitary statement is often not the correct one.
  3. While the finitary version of an infinitary statement is indeed quantitative, the bounds obtained can be quite poor (e.g. tower-exponential or worse).
  4. The deduction of the infinitary statement from the finitary one is quite short, as long as one is willing to work indirectly (arguing by contradiction).
  5. The deduction of the finitary statement from the infinitary one is a bit more complicated, but still straightforward, and relies primarily on compactness.
  6. In particular, the equivalence of the finitary and infinitary formulations requires a non-trivial amount of infinitary mathematics (though in this particular case, we can at least leave the ultrafilters out of it).

These morals apply not only to the finite and infinite convergence principle, but to many other pairs of finitary and infinitary statements, for instance Szemerédi’s theorem on one hand and the Furstenberg recurrence theorem on the other, as briefly discussed in my Simons lecture on these topics. (In these contexts, the correspondence between the finitary and infinitary statements is known as the Furstenberg correspondence principle.)

— Applications —

So, we’ve now extracted a quantitative finitary equivalent of the infinitary principle that every bounded monotone sequence converges. But can we actually use this finite convergence principle for some non-trivial finitary application? The answer is a definite yes: the finite convergence principle (implicitly) underlies the famous Szemerédi regularity lemma, which is a major tool in graph theory, and also underlies some rather less well known regularity lemmas, such as the arithmetic regularity lemma of Green. More generally, this principle seems to often arise in any finitary application in which tower-exponential bounds are inevitably involved.

Before plunging into these applications, let us first establish a Hilbert space version[6] of the convergence principle. Given a (closed) subspace X of a Hilbert space H, and a vector v \in H, let \pi_X v be the orthogonal projection from v onto X. If X is finite dimensional, then this projection can be defined in a finitary way, for instance by applying the Gram-Schmidt orthogonalisation procedure to a basis of X. If X is infinite dimensional, then even the existence of the orthogonal projection is not completely trivial, and in fact relies ultimately on the infinite convergence principle. Closely related to the existence of this projection is the following monotone continuity property:
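
In the finite-dimensional case the projection is completely explicit; the following small numpy sketch (an illustration of mine, not from the original text) orthonormalises a spanning set by Gram-Schmidt and projects onto its span:

    import numpy as np

    def orthonormalise(vectors, tol=1e-12):
        # Gram-Schmidt: turn a spanning set into an orthonormal basis.
        basis = []
        for v in vectors:
            w = v - sum(np.dot(b, v) * b for b in basis)  # strip existing components
            norm = np.linalg.norm(w)
            if norm > tol:                                # discard dependent vectors
                basis.append(w / norm)
        return basis

    def project(v, vectors):
        # Orthogonal projection pi_X(v) onto X = span(vectors).
        return sum(np.dot(b, v) * b for b in orthonormalise(vectors))

    # Example: project (1,1,1) onto the xy-plane.
    v = np.array([1.0, 1.0, 1.0])
    X = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
    print(project(v, X))  # [1. 1. 0.]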

Hilbert space infinite convergence principle. Let 0 \subset X_1 \subset X_2 \subset \ldots \subset H be a nested sequence of subspaces of a Hilbert space H, and let X := \overline{\bigcup_{n=1}^\infty X_n} be the monotone closed limit of the X_n. Then for any vector v, \pi_{X_n} v converges strongly in H to \pi_X v.

As with the infinite convergence principle in [0,1], there is a Cauchy sequence version which already captures the bulk of the content:

Hilbert space infinite convergence principle (again). Let 0 \subset X_1 \subset X_2 \subset \ldots \subset H be a nested sequence of subspaces of a Hilbert space H, and let \varepsilon > 0. Then for any vector v there exists N such that \| \pi_{X_n} v - \pi_{X_m} v \|_H^2 \leq \varepsilon for all n,m \geq N.

One can deduce this principle from the analogous principle in [0,1] by first normalising \|v\|_H = 1, and then observing from Pythagoras’ theorem that \| \pi_{X_n} v \|_H^2 (which one should view as the energy of X_n as measured relative to v) is a bounded monotone sequence from 0 to 1. Applying the infinite convergence principle, followed by Pythagoras’ theorem yet again, we obtain the claim. Once one sees this, one immediately concludes that there is also a finitary equivalent:
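To spell out the two uses of Pythagoras: since X_m \subset X_n for m \leq n, the increment \pi_{X_n} v - \pi_{X_m} v is orthogonal to \pi_{X_m} v, and hence

\| \pi_{X_n} v - \pi_{X_m} v \|_H^2 = \| \pi_{X_n} v \|_H^2 - \| \pi_{X_m} v \|_H^2.

Thus the projections \pi_{X_n} v are metastable (or Cauchy) precisely when the scalar energies \| \pi_{X_n} v \|_H^2 are, and the latter form exactly the kind of bounded monotone sequence to which the numerical convergence principles apply.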

Hilbert space finite convergence principle. If \varepsilon > 0 and F: {\Bbb Z}_+ \to {\Bbb Z}_+, and 0 \subset X_1 \subset X_2 \subset \ldots \subset X_M \subset H is such that M is sufficiently large depending on F and \varepsilon, then for any vector v with \|v\|_H \leq 1 there exists 1 \leq N \leq N + F(N) \leq M such that \| \pi_{X_n} v - \pi_{X_m} v \|_H^2 \leq \varepsilon for all N \leq n,m \leq N+F(N).

Informally, given a long enough sequence of nested subspaces, and a given bounded vector v, one can find an arbitrarily good region of metastability in the orthogonal projections of v into these subspaces.

From this principle one can then quickly deduce the Szemerédi regularity lemma as follows. Let G = (V,E) be a graph. One can think of the adjacency matrix 1_E of this graph as an element of the (finite-dimensional) Hilbert space L^2(V \times V), where the product space V \times V is given normalised counting measure (and the discrete sigma-algebra 2^V \times 2^V). We can construct a nested sequence {\mathcal B}_0 \subset {\mathcal B}_1 \subset {\mathcal B}_2 \subset \ldots of \sigma-algebras in V (which one can think of as a sequence of increasingly fine partitions of V), together with the attendant sequence L^2({\mathcal B}_0 \times {\mathcal B}_0) \subset L^2({\mathcal B}_1 \times {\mathcal B}_1) \subset \ldots of subspaces (these correspond to functions on V \times V which are constant on the product of any pair of cells in the partition), by the following greedy algorithm[7] (a toy rendering in code follows the list):

  1. Initialise {\mathcal B}_0 := \{ \emptyset, V\} to be the trivial \sigma-algebra (i.e. the trivial partition).
  2. Given {\mathcal B}_n, let f_n := {\mathbb E}( 1_E | {\mathcal B}_n \times {\mathcal B}_n) be the orthogonal projection of 1_E to the space L^2({\mathcal B}_n \times {\mathcal B}_n) (thus the value on any product of cells is just the edge density between that pair of cells), and let g_n := 1_E - f_n be the deviation of the graph from its density.
  3. Let A_n, B_n be subsets of V which maximise the discrepancy \frac{1}{|V|^2} \sum_{a \in A_n} \sum_{b \in B_n} g_n(a,b).
  4. Let {\mathcal B}_{n+1} be the \sigma-algebra generated by {\mathcal B}_n and A_n, B_n. Now replace n by n+1 and return to Step 2.
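
To make the algorithm concrete, here is a deliberately naive Python rendering (entirely my own toy code; the maximisation in Step 3 is exponential in |V|, so this is for illustration on very small graphs only):

    import itertools
    import numpy as np

    def greedy_regularity_partitions(adj, steps):
        # adj: n x n 0/1 adjacency matrix 1_E; returns B_0, ..., B_steps,
        # each partition given as a list of cells (frozensets of vertices).
        n = adj.shape[0]
        V = list(range(n))
        partition = [frozenset(V)]                 # Step 1: trivial partition B_0
        history = [partition]
        for _ in range(steps):
            # Step 2: f = E(1_E | B x B) averages adj over each pair of cells.
            f = np.zeros_like(adj, dtype=float)
            for P in partition:
                for Q in partition:
                    block = np.ix_(list(P), list(Q))
                    f[block] = adj[block].mean()
            g = adj - f                            # deviation from the density
            # Step 3: maximise the discrepancy. For fixed A, the optimal B
            # simply takes every b whose column sum over A is positive, so
            # we only need to brute-force over A.
            best = (-1.0, set(), set())
            for r in range(n + 1):
                for A in itertools.combinations(V, r):
                    col = g[list(A), :].sum(axis=0)
                    B = [b for b in V if col[b] > 0]
                    disc = col[B].sum() / n ** 2
                    if disc > best[0]:
                        best = (disc, set(A), set(B))
            _, A, B = best
            # Step 4: refine each cell by A and B.
            partition = [S for P in partition
                         for S in (P & A & B, (P & A) - B, (P & B) - A, P - A - B)
                         if S]
            history.append(partition)
        return history

    # Example: a small random graph on 8 vertices.
    rng = np.random.default_rng(0)
    adj = (rng.random((8, 8)) < 0.5).astype(int)
    adj = np.triu(adj, 1); adj = adj + adj.T       # symmetrise, no loops
    for i, part in enumerate(greedy_regularity_partitions(adj, 3)):
        print(f"B_{i}: {len(part)} cells")

Note that each refinement step at most quadruples the number of cells, which is where the 2^{2N} cell count quoted below comes from.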

Let \varepsilon > 0 and F: {\Bbb Z}_+ \to {\Bbb Z}_+ be a function. Applying the Hilbert space finite convergence principle to the above sequence of vector spaces, we obtain some N of bounded size (depending on \varepsilon and F) such that

\| f_n - f_m \|_{L^2(V \times V)} \leq \varepsilon^2 (*)

for all N \leq n, m \leq N+F(N). By a further application of the pigeonhole principle (for Hilbert spaces), one can find N \leq n < N+F(N) such that

\| f_{n+1} - f_n\|_{L^2(V \times V)} \leq \varepsilon^2/F(N).

What this basically means is that the partition {\mathcal B}_n is very regular, in that even the greediest way to refine this partition does not significantly capture any more of the fluctuations of the graph G. By choosing F to be a suitable exponentially growing function, one can make the regularity of this partition exceed the number of cells (which is basically 2^{2N}) in the partition {\mathcal B}_N, which is “within epsilon” of the partition {\mathcal B}_n in the sense of (*). Putting all this together, one can get a strong version of the Szemerédi regularity lemma, which implies the usual formulation by a simple argument; see my paper on this topic for further discussion. The choice of F being exponential is what results in the notorious tower-exponential bounds in this regularity lemma (which are necessary, thanks to a result of Gowers). But one can reduce F to, say, a polynomial, resulting in more civilised bounds but with a weaker regularity conclusion. Such a “weak regularity lemma” was for instance established by Frieze and Kannan, and also underlies the “generalised Koopman von Neumann theorem” which is a key component of my result with Ben Green establishing long arithmetic progressions in the primes. In the opposite direction, various flavours of “strong regularity lemma” have appeared in the literature, for instance in this paper of Alon and Shapira, and also turn out to be convenient ways to formulate hypergraph versions of the regularity lemma of adequate strength to imply non-trivial theorems (such as Szemerédi’s theorem).

Rather than using sets which maximise discrepancy, one can also use sublevel sets of the eigenvectors of the adjacency matrix corresponding to the largest eigenvalues of the matrix to generate the partition; see this paper of Frieze and Kannan for details of a closely related construction.

The appearance of spectral theory (eigenvalues and eigenvectors) in this topic brings one into contact with Fourier analysis, especially if one considers circulant matrices (which correspond in graph-theoretic terms to Cayley graphs on a cyclic group). This leads us towards the arithmetic regularity lemma of Green, which regularises a bounded function f on a finite abelian group G in terms of a partition generated by the sublevel sets (Bohr sets) of a bounded number of characters; the precise formulation is a bit lengthy to state properly, although it simplifies substantially in the “finite field model” case when G is a vector space over a small finite field (e.g. {\Bbb F}_2). This arithmetic regularity lemma can also be established using the finite convergence principle (in either the numerical form or the Hilbert space form). Indeed, if we let H = L^2(G) and let V_n be the vector space generated by the characters associated to the n largest Fourier coefficients of f, then by applying the finite convergence principle (with v = f) we can locate a metastable region, where there is not much going on (in an L^2 sense) between V_N and V_{N+F(N)} for some (exponentially growing) function F, thus there is a “spectral gap” of sorts between the N largest Fourier coefficients and the coefficients ranked N+F(N) and beyond. The sublevel sets of characters associated to the N largest coefficients can then be used to regularise the original function f. Similar ideas also appear in an old paper of Bourgain, and in a paper of Green and Konyagin.
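
As an illustration of the spectral-gap step, here is a toy sketch of my own in the cyclic group model, using numpy's FFT (the test function and all parameters are invented):

    import numpy as np

    def spectral_gap(energies, eps, F):
        # energies: the values |\hat f(xi)|^2 sorted in decreasing order, with
        # total at most 1. Returns N such that the Fourier energy between the
        # N-th and (N+F(N))-th largest coefficients is at most eps. This runs
        # the finite convergence principle directly on the cumulative energies:
        # each failed window eats more than eps of energy, so at most 1/eps fail.
        x = np.concatenate([[0.0], np.cumsum(energies)])   # x[n] = energy of top n
        n = 1
        while n + F(n) < len(x):
            if x[n + F(n)] - x[n] <= eps:
                return n
            n = n + F(n)        # jump to the next sparsified index and retry
        raise ValueError("not enough coefficients for this eps and F")

    # Example: a random bounded function on Z_64.
    G = 64
    rng = np.random.default_rng(0)
    f = rng.random(G)
    f_hat = np.fft.fft(f) / G                       # normalised Fourier coefficients
    energies = np.sort(np.abs(f_hat) ** 2)[::-1]    # by Parseval, these sum to <= 1
    N = spectral_gap(energies, eps=0.01, F=lambda n: 2 * n)
    print(f"spectral gap between the top {N} and top {N + 2 * N} coefficients")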

— A non-finitisable statement —

One may think from the above discussion that every infinitary statement concerning, say, the natural numbers, has a finitary analogue which is equivalent to it. It turns out that this is not quite the case (and in fact some subtle issues in logic come in), even for very “natural” and “non-pathological” infinitary statements. In particular, the following innocuous-looking infinitary statement is basically non-finitisable:

Infinite pigeonhole principle. If the natural numbers are divided into finitely many colour classes, then one of the colour classes is infinite.

This principle is of course a triviality in the infinitary world, and there are some obvious candidates for finitary versions (e.g. the finite pigeonhole principle), but they are not equivalent to the infinitary principle the same way that the finite convergence principle is equivalent to the infinite convergence principle; there is a failure of compactness here. There is a finitary version (of sorts), but it is somewhat difficult to state. Define a set function to be any function F which takes a finite set A of natural numbers as input, and returns a natural number F(A) as output. Let us say that a set function F is asymptotically stable near infinite sets[8] if, given any sequence A_1, A_2, A_3, \ldots of finite sets of natural numbers which converges to an infinite set A in the sense that for any finite set I, A_n \cap I = A \cap I for all sufficiently large n, the numbers F(A_n) are constant for sufficiently large n. For instance, the “least element” set function F(A) = \inf( A ) (adopting the non-standard convention that the infimum of the empty set is 0) is asymptotically stable near infinite sets, but the “cardinality” set function F(A) = |A| is not. Anyway, the correct “finitary” version of the infinite pigeonhole principle is
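
The two examples just given can even be checked numerically along a single sequence of finite sets converging to an infinite one; a quick sketch (illustrative code of mine):

    # A_n = (even numbers) intersect {0, ..., n}, which converges to the
    # infinite set of even numbers in the sense defined above.
    def A(n):
        return frozenset(k for k in range(n + 1) if k % 2 == 0)

    def inf_set(S):
        return min(S) if S else 0   # "least element", with inf(empty) = 0

    def card(S):
        return len(S)               # "cardinality"

    print([inf_set(A(n)) for n in range(8)])  # [0, 0, ...]: eventually constant
    print([card(A(n)) for n in range(8)])     # [1, 1, 2, 2, ...]: never stabilises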

“Finitary” infinite pigeonhole principle. Let F be a set function that is asymptotically stable near infinite sets, and let k be a positive integer. Then there exists a positive integer N with the property that whenever the set \{1,\ldots,N\} is divided into k colour classes, at least one of the colour classes A has the property that |A| > F(A).

It is a good exercise to see that this principle is equivalent to the infinite pigeonhole principle. Note that the plain old pigeonhole principle essentially corresponds to the case when F is a constant function – and is thus definitely a very special case. The case when F(A) = f(\inf( A )) for some fixed function f is already very interesting – it is the 1-uniform case of a “strong Ramsey theorem” and is barely provable by finitary means (try it, say for k=10 and F(A) := \inf(A) + 10; what quantitative bound for N do you get?), although the general case of that theorem is not (but does follow easily from the above principle), thanks to the Paris-Harrington theorem. The assumption of asymptotic stability of F near infinite sets is necessary, as one can see by considering the counterexample F(A) := |A|. (For the very smallest cases, the N in this principle can even be found by brute force; see the sketch below.)
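
Here is that brute-force search (my own toy code; the k=10 exercise above is already far beyond the reach of this kind of enumeration):

    from itertools import product

    def least_N(k, F, N_max=12):
        # Brute-force the least N such that EVERY k-colouring of {1,...,N}
        # has a colour class A with |A| > F(A). Exponential in N: toy sizes only.
        for N in range(1, N_max + 1):
            def has_big_class(colouring):
                classes = [frozenset(i + 1 for i, c in enumerate(colouring) if c == j)
                           for j in range(k)]
                return any(len(A) > F(A) for A in classes)
            if all(has_big_class(col) for col in product(range(k), repeat=N)):
                return N
        return None

    inf0 = lambda S: min(S) if S else 0
    print(least_N(2, inf0))   # -> 4: the classes {1}, {2,3} cover at most {1,2,3}
    print(least_N(3, inf0))   # -> 8: {1}, {2,3}, {4,5,6,7} cover at most {1,...,7}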

I am enclosing “finitary” in quotes because while most of the assertion of this principle is finitary, one part still is not, which is the notion of “asymptotically stable near infinite sets”. This is a notion which cannot be precisely formulated in a purely finitary manner, even though the notion of a set function is basically a finitary concept (ignoring for now a subtle issue about what “function” means). If one insists on working in a finitary setting, then one can recast the infinite pigeonhole principle as a schema of finitary principles, one for each set function F that is asymptotically stable near infinite sets, but in order to work out exactly which set functions are asymptotically stable near infinite sets or not requires infinitary mathematics. (And for some (constructible, well-defined) set functions, the asymptotic stability near infinite sets is undecidable; this fact is closely related to the undecidability of the halting problem and is left as an exercise to the reader.)

The topic of exactly which statements in infinitary mathematics are “truly infinitary” is a fascinating one, and is basically a question in reverse mathematics, but we will not be able to discuss it here.

(I am indebted to Harvey Friedman for discussions on the Paris-Harrington theorem and the infinite pigeonhole principle.)

— Footnotes —

(N.B. Footnotes were created following the suggestions from this post by santoki.)

[1] One can subdivide hard analysis into further subcategories by inspecting what kind of inequalities are used. There is “exact hard analysis” where one really uses \leq; “quasi-exact hard analysis” in which one is willing to lose absolute constants (and so one sees notation such as O(), \lesssim, or \ll); “logarithmically coarse hard analysis” in which one is willing to lose quantities such as \log^{O(1)} N which are “logarithmic” in some key parameter N; and “polynomially coarse hard analysis” in which one is willing to lose quantities such as N^{O(1)} which are polynomial in key parameters. Finally, there is “coarse analysis” in which one is willing to lose arbitrary functions of key parameters. The relationships between these flavours of hard analysis are interesting, but will have to wait to be discussed elsewhere.

[2] One can use these axioms to make finer distinctions, for instance “strongly finitary” analysis, in which one is not even willing to use real numbers, but instead only works with finite complexity numbers (e.g. rationals), and “strongly infinitary” analysis, in which one freely uses the axiom of choice (or related concepts such as ultrafilters). There are also hybrids between finitary and infinitary analysis, such as “pre-infinitary” analysis, in which one takes sequences of increasingly large or complex objects, and uses phrases such as “passing to a subsequence if necessary” a lot, but does not actually “jump to the limit”; we also have “pseudo-finitary” analysis, of which non-standard analysis is the most prominent example, in which infinitary methods are re-expressed using infinitesimals or other pseudo-finitary objects. Again, these distinctions will have to wait to be discussed elsewhere.

[3] There are rigorous results from proof theory, such as Herbrand’s theorem, which can allow one to automatically convert certain types of qualitative arguments into quantitative ones, but there appears to be only a small amount of literature applying these general results in logic to actual soft analysis arguments. (See the comments to this post for some recent developments in this area.)

[4] For instance, in my result with Ben Green establishing arbitrarily long arithmetic progressions of primes, the argument was (necessarily) finitary in nature, but it was absolutely essential for us to be aware of the infinitary arguments and intuition that had been developed in ergodic theory, as we had to adapt such arguments to the finitary setting in order to conclude our proof, and it would have been far less evident how to discover such arguments if we were always restricted to looking at finitary settings. In general, it seems that infinitary methods are good for “long-range” mathematics, as by ignoring all quantitative issues one can move more rapidly to uncover qualitatively new kinds of results, whereas finitary methods are good for “short-range” mathematics, in which existing “soft” results are refined and understood much better via the process of making them increasingly sharp, precise, and quantitative. I feel therefore that these two methods are complementary, and are both important to deepening our understanding of mathematics as a whole.

[5] This looks like Tychonoff’s theorem, but because we require sequential compactness here rather than topological compactness, the result here is in fact much closer in spirit to the Arzelà-Ascoli theorem. In particular, the axiom of choice is not actually used here; instead, one repeatedly uses the Bolzano-Weierstrass theorem for the interval [0,1] followed by a diagonalisation argument. The astute reader here will observe that the Bolzano-Weierstrass theorem is essentially equivalent to the infinite convergence principle! Fortunately, there is no circularity here, because we are only using this theorem in order to deduce the finite convergence principle from the infinite, and not the other way around.

[6] One could also view this as a “noncommutative” or “quantum” version of the convergence principle, but this is somewhat of an abuse of terminology, despite the presence of the Hilbert space, since we don’t actually have any noncommutativity or any other quantum weirdness going on.

[7] One can also replace this greedy algorithm by a random algorithm, in which each {\mathcal B}_{n+1} is obtained from {\mathcal B}_n by adding a neighbourhood N(v_n) of a randomly chosen vertex v_n in V. This use of randomisation appears in the infinitary setting in this paper of myself, and in the finitary setting in this paper of Ishigami.

[8] This notion is very roughly equivalent to that of a function F(A) defined for all sets of integers (both finite and infinite) which can always be “computed” in “finite time” even if the set is infinite. But one should take this informal definition with a large grain of salt: while there is indeed an algorithm for computing F(A) for any given set A which will eventually give the right answer, you might not be able to tell when the algorithm has finished! A good example is the function F(A) := \inf(A), which is asymptotically stable near infinite sets: you can “compute” this function for any set A by initialising the answer to 0, running a counter n from 0 to infinity, and resetting the answer permanently to n the first time n lies in A. As long as A is non-empty, this algorithm terminates in finite time with the correct answer; if A is empty, the algorithm gives the right answer from the beginning, but you can never be sure of this fact! In contrast, the cardinality |A| of a possibly infinite set A cannot be computed even in this rather unsatisfactory sense of having a running “provisional answer” which is guaranteed to eventually be correct.
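
In code, the “provisional answer” procedure for \inf(A) described here might look as follows (a sketch of mine, with a membership oracle standing in for the set A and an arbitrary cut-off standing in for “running forever”):

    def provisional_inf(contains, n_max):
        # Run a counter upwards; replace the provisional answer 0 by n the
        # first time n lies in A. If A is non-empty we eventually KNOW the
        # answer; if A is empty the answer 0 is correct but never certified.
        for n in range(n_max + 1):
            if contains(n):
                return n
        return 0

    print(provisional_inf(lambda n: n % 7 == 3, 100))   # -> 3 (certified)
    print(provisional_inf(lambda n: False, 100))        # -> 0 (provisional only)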

[Update, May 24: Typos corrected. A distinction drawn between the strong Ramsey theorem (a theorem that can be stated in finitary mathematics), and the Paris-Harrington theorem (which shows that the strong Ramsey theorem is unprovable by finitary means). Acknowledgment to Harvey Friedman added, as well as a link to reverse mathematics.]

[Update, May 25: functional analysis changed to operator algebras. Footnote [3] modified in view of recent comments.]

[Update, May 31: \inf(A \cup \{0\}) corrected to “\inf(A) when A is non-empty, and 0 if A is empty”.]

[Update, Aug 19, 2008: Definition of asymptotic stability corrected; thanks to Jaime Gaspar for pointing out the problem.]