In this second lecture, I wish to talk about the dichotomy between structure and randomness as it manifests itself in four closely related areas of mathematics:

  • Combinatorial number theory, which seeks to find patterns in unstructured dense sets (or colourings) of integers;
  • Ergodic theory (or more specifically, multiple recurrence theory), which seeks to find patterns in positive-measure sets under the action of a discrete dynamical system on probability spaces (or more specifically, measure-preserving actions of the integers {\Bbb Z});
  • Graph theory, or more specifically the portion of this theory concerned with finding patterns in large unstructured dense graphs; and
  • Ergodic graph theory, which is a very new and undeveloped subject, which roughly speaking seems to be concerned with the patterns within a measure-preserving action of the infinite permutation group S_\infty, which is one of several models we have available to study infinite “limits” of graphs.

The two “discrete” (or “finitary”, or “quantitative”) fields of combinatorial number theory and graph theory happen to be related to each other, basically by using the Cayley graph construction; I will give an example of this shortly. The two “continuous” (or “infinitary”, or “qualitative”) fields of ergodic theory and ergodic graph theory are at present only related on the level of analogy and informal intuition, but hopefully some more systematic connections between them will appear soon.

On the other hand, we have some very rigorous connections between combinatorial number theory and ergodic theory, and also (more recently) between graph theory and ergodic graph theory, basically by the procedure of viewing the infinitary continuous setting as a limit of the finitary discrete setting. These two connections go by the names of the Furstenberg correspondence principle and the graph correspondence principle respectively. These principles allow one to tap the power of the infinitary world (for instance, the ability to take limits and perform completions or closures of objects) in order to establish results in the finitary world, or at least to take the intuition gained in the infinitary world and transfer it to a finitary setting. Conversely, the finitary world provides an excellent model setting to refine one’s understanding of infinitary objects, for instance by establishing quantitative analogues of “soft” results obtained in an infinitary manner. I will remark here that this best-of-both-worlds approach, borrowing from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and I in order to establish our result on long arithmetic progressions in the primes. In particular, the infinitary setting is excellent for being able to rigorously define and study concepts (such as structure or randomness) which are much “fuzzier” and harder to pin down exactly in the finitary world.

Let me first discuss the connection between combinatorial number theory and graph theory. We can illustrate this connection with two classical results from the former and latter field respectively:

  • Schur’s theorem: If the positive integers are coloured using finitely many colours, then one can find positive integers x, y such that x, y, x+y all have the same colour.
  • Ramsey’s theorem: If an infinite complete graph is edge-coloured using finitely many colours, then one can find a triangle all of whose edges have the same colour.

(In fact, both of these theorems can be generalised to say much stronger statements, but we will content ourselves with just these special cases). It is in fact easy to see that Schur’s theorem is deducible from Ramsey’s theorem. Indeed, given a colouring of the positive integers, one can create an infinite coloured complete graph (the Cayley graph associated to that colouring) whose vertex set is the integers {\Bbb Z}, and such that an edge {a,b} with a < b is coloured using the colour assigned to b-a. Applying Ramsey’s theorem, together with the elementary identity (c-a) = (b-a) + (c-b), we then quickly deduce Schur’s theorem.

Let us now turn to ergodic theory. The basic object of study here is a measure-preserving system (or probability-preserving system), which is a probability space (X, {\mathcal B}, \mu) (i.e. a set X equipped with a sigma-algebra \mathcal B of measurable sets and a probability measure \mu on that sigma-algebra), together with a shift map T: X \to X, which for simplicity we shall take to be invertible and bi-measurable (so its inverse is also measurable); in particular we have iterated shift maps T^n: X \to X for any integer n, giving rise to an action of the integers {\Bbb Z}. The important property we need is that the shift map is measure-preserving, thus \mu(T(E)) = \mu(E) for all measurable sets E.

In the last lecture we saw that sets of integers could be divided (rather informally) into structured sets, pseudorandom sets, and hybrids between the two. The same is true in ergodic theory – and this time, one can in fact make these notions extremely precise. Let us first start with some examples:

  • The circle shift, in which X := {\Bbb R}/{\Bbb Z} is the standard unit circle with normalised Haar measure, and T(x) := x + \alpha for some fixed real number \alpha. If we identify X with the unit circle in the complex plane via the standard identification x \mapsto e^{2\pi i x}, then the shift corresponds to an anti-clockwise rotation by \alpha. This is a very structured system, and corresponds in combinatorial number theory to Bohr sets such as \{ n \in {\Bbb Z}: 0 < \{ n \alpha \} < 0.01 \}, which implicitly made an appearance in the previous lecture.
  • The two-point shift, in which X := {0,1} with uniform probability measure, and T simply interchanges 0 and 1. This very structured system corresponds to the set A of odd numbers (or of even numbers) mentioned in the previous lecture. More generally, any permutation on a finite set gives rise to a simple measure-preserving system.
  • The skew shift, in which X := ({\Bbb R}/{\Bbb Z})^2 is the 2-torus with normalised Haar measure, and T(x,y) := (x+\alpha,y+x) for some fixed real number \alpha. If we just look at the behaviour of the x-component of this torus we see that the skew shift contains the circle shift as a factor, or equivalently that the skew shift is an extension of the circle shift (in this particular case, since the fibres are circles and the action on the fibres is rotation, we call this a circle extension of the circle shift). This system is also structured (but in a more complicated way than the previous two shifts), and corresponds to quadratically structured sets such as the quadratic Bohr set \{ n \in {\Bbb Z}: 0 < \{ \sqrt{2} n^2 \} < 0.01 \}, which made an appearance in the previous lecture.
  • The Bernoulli shift, in which X := \{0,1\}^{\Bbb Z} \equiv 2^{\Bbb Z} is the space of infinite 0-1 sequences (or equivalently, the space of all sets of integers), equipped with uniform product probability measure, and T is the left shift T (x_n)_{n \in \Bbb Z} := (x_{n+1})_{n \in \Bbb Z}. This is a very random system, corresponding to the random sets B discussed in the previous lecture.
  • Hybrid systems, e.g. products of a circle shift and a Bernoulli shift, or extensions of a circle shift by a Bernoulli system, a doubly skew shift (a circle extension of a circle extension of a circle shift), etc.

One can classify these systems in precise terms according to how the shift action T^n moves sets E around. On the one hand, we have some well-defined notions which represent structure:

  • Trivial systems are such that T^n E = E for all E and all n.
  • Periodic systems are such that for every E, there exists a positive n such that T^n E = E. The two-point shift is an example, as is the circle shift when \alpha is rational.
  • Almost periodic or compact systems are such that for every E and every \epsilon > 0, there exists a positive n such that T^n E and E differ by a set of measure at most \epsilon. The circle shift is a good example of this (thanks to the equidistribution theorem). The term “compact” is used because there is an equivalent characterisation of compact systems, namely that the orbits of the shift in L^2(X) are always precompact in the strong topology.

On the other hand, we have some well-defined terms which represent pseudorandomness:

  • Strongly mixing systems are such that for every E, F, we have \mu(T^n E \cap F) \to \mu(E) \mu(F) as n tends to infinity; the Bernoulli shift is a good example. Informally, this is saying that shifted sets become asymptotically independent of unshifted sets.
  • Weakly mixing systems are such that for every E, F, we have \mu(T^n E \cap F) \to \mu(E) \mu(F) as n tends to infinity after excluding a set of exceptional values of n of asymptotic density zero. For technical reasons, weak mixing is a better notion to use in the structure-randomness dichotomy than strong mixing (for much the same reason that one always wants to allow negligible sets of measure zero in measure theory).

There are also more complicated (but well-defined) hybrid notions of structure and randomness which we will not give here. We will however briefly discuss the situation for the skew shift. This shift is not almost periodic: most sets A will become increasingly “skewed” as it gets shifted, and will never return to resemble itself again. However, if one restricts attention to the underlying circle shift factor (i.e. restricting attention only to those sets which are unions of vertical fibres), then one recovers almost periodicity. Furthermore, the skew shift is almost periodic relative to the underlying circle shift, in the sense that while the shifts T^n A of a given set A do not return to resemble A globally, they do return to resemble A when restricted to any fixed vertical fibre (this can be shown using the method of Weyl sums from Fourier analysis). Because of this, we say that the skew shift is a compact extension of a compact system.

As discussed in the above examples, every dynamical system is capable of generating some interesting sets of integers, specifically recurrence sets \{ n \in {\Bbb Z}: T^n x_0 \in E \} where E is a set in X and x_0 is a point in X. This set actally captures much of the dynamics of E in the system (especially if X is ergodic and x_0 is “generic”). The Furstenberg correspondence principle reverses this procedure, starting with a set of integers A and using that to generate a dynamical system which “models” that set in a certain way. Modulo some minor technicalities, it works as follows.

  1. As with the Bernoulli shift, we work in the space X := \{0,1\}^{\Bbb Z} \equiv 2^{\Bbb Z}, with the product sigma-algebra and the left shift; but we leave the probability measure \mu (which can be interpreted as the distribution of a certain random subset of the integers) undefined for now. The original set A can now be interpreted as a single point inside X.
  2. Now pick a large number N, and shift A backwards and forwards up to N times, giving rise to 2N+1 sets T^{-N} A, \ldots, T^N A, which can be thought of as 2N+1 points inside X. We consider the uniform distribution on these points, i.e. we shift A by a random amount between -N and N. This gives rise to a discrete probability measure \mu_N on X (which is only supported on 2N+1 points inside X). Each of these measures is approximately invariant under the shift T.
  3. We now let N go to infinity. We apply the (sequential form of the) Banach-Alaoglu theorem, which among other things shows that the space of Borel probability measures on a compact Hausdorff space (which X is) is sequentially compact in the weak-* topology. (This particular version of Banach-Alaoglu can in fact be established by a diagonalisation argument which completely avoids the axiom of choice.) Thus we can find a subsequence of the measures \mu_N which converge in the weak-* topology to a limit \mu (this subsequence and limit may not be unique, but this will not concern us). Since the \mu_N are approximately invariant under T, with the degree of approximation improving with N, one can easily show that the limit measure \mu is shift-invariant.

By using this recipe to construct a measure-preserving system from a set of integers, it is possible to deduce theorems in combinatorial number theory from those in ergodic theory (similarly to how the Cayley graph construction allowed one to deduce theorems in combinatorial number theory from those in graph theory). The most famous example of this concerns the following two deep theorems:

Using the above correspondence principle (or a slight variation thereof), it is not difficult to show that the two theorems are in fact equivalent; see for instance Furstenberg’s book on the subject. The power of these two theorems derives from the fact that the former works for arbitrary sets of positive density, and the latter works for arbitrary measure-preserving systems – there are essentially no structural assumptions on the basic object of study in either, and it is therefore quite remarkable that one can still conclude such a non-trivial result.

The story of Szemerédi’s theorem is quite a long one, which I have discussed in many previous places, and will not do so again here, though I will note here that all the proofs of this theorem exploit the dichotomy between structure and randomness (and there are some good reasons for this – the underlying cause of arithmetic progressions is totally different in the structured and pseudorandom cases). I will however briefly describe how Furstenberg’s recurrence theorem is proven (following the approach of Furstenberg, Katznelson, and Ornstein; there are a couple other ergodic theoretic proofs, including of course Furstenberg’s original proof). The first major step is to establish the Furstenberg structure theorem, which takes an arbitrary measure-preserving system and describes it as a suitable hybrid of a compact system and a weakly mixing system (or more precisely, a weakly mixing extension of a transfinite tower of compact extensions). This theorem relies on Zorn’s lemma, although it is possible to give a proof of the recurrence theorem without recourse to the axiom of choice. The proof requires various tools from infinitary analysis (e.g. the compactness of integral operators) but is relatively straightforward. Next, one makes the rather simple observation that the Furstenberg recurrence theorem is easy to show both for compact systems and for weakly mixing systems. In the former case, the almost periodicity shows that there are lots of integers n for which T^n A is almost identical with A (in the sense that they differ by a set of small measure) – which, after shifting by n again, implies that T^{2n} A is almost identical with T^n A, and so forth – which soon makes it easy to arrange matters so that A \cap T^n A \cap \ldots \cap T^{(k-1)n} A is non-empty. In the latter case, the weak mixing shows that for most n, the sets (or “events”) A and T^n A are almost uncorrelated (or “independent”); similarly, for any fixed m, we have A \cap T^m A and T^n(A \cap T^m A) = T^n A \cap T^{n+m} A almost uncorrelated for n large enough. By using the Cauchy-Schwarz inequality (in the form of a useful lemma of van der Corput) repeatedly, we can eventually show that A, T^n A, \ldots, T^{(k-1)n} A are almost jointly independent (as opposed to being merely almost pairwise independent) for many n, at which point the recurrence theorem is easy to show. It is somewhat more tricky to show that one can also combine these arguments with each other to show that the recurrence property also holds for the transfinite combinations of compact and weakly mixing systems that come out of the Furstenberg structure theorem, but it can be done with a certain amount of effort, and this concludes the proof of the recurrence theorem. This same method of proof turns out, with several additional technical twists, to establish many further varieties of recurrence theorems, which in turn (via the correspondence principle) gives several powerful results in combinatorial number theory, several of which continue to have no non-ergodic proof even today.

(There has also been a significant amount of progress more recently by several ergodic theorists in understanding the “structured” side of the Furstenberg structure theorem, in which dynamical notions of structure, such as compactness, have been converted into algebraic and topological notions of structure, in particular into the actions of nilpotent Lie groups on their homogeneous spaces. This is an important development, and is closely related to the polynomial and generalised polynomial sequences appearing in the previous talk, but it would be beyond the scope of this talk to discuss it here.)

Let us now leave ergodic theory and return to graph theory. Given the power of the Furstenberg correspondence principle, it is natural to look for something similar in graph theory, which would connect up results in finitary graph theory with some infinitary variant. A typical candidate for a finitary graph theory result that one would hope to do this for is the triangle removal lemma, which was discussed in a recent blog post here. That lemma is in fact closely connected with Szemerédi’s theorem, indeed it implies the k=3 case of that theorem (i.e. Roth’s theorem) in much the same way that Ramsey’s theorem implies Schur’s theorem. It does turn out that this is possible, although the infinitary analogues of things like the triangle removal lemma are a little strange-looking (one such analogue can be found in this paper; another can be found in this one). But it is easier to describe the concept of a graph limit. There are several equivalent formulations of this limit, including the notion of a “graphon” introduced by Lovász and Szegedy, the flag algebra construction introduced by Razborov, and the notion of a permutation-invariant measure space introduced by myself. I will discuss my own construction here, which is closely modelled on the Furstenberg correspondence principle. What it does is starts with a sequence G_n of graphs (which one should think of as getting increasingly large, while remaining dense) and extracts a limit object, which is a probability space (X, {\mathcal B}, \mu) together with an action of the permutation group S_\infty on the integers, as follows.

  1. We let X = 2^{\binom{\Bbb Z}{2}} be the space of all graphs on the integers, with the standard product (i.e. weak) topology, and hence product sigma-algebra. This space has an obvious action of the permutation group S_\infty, formed by permuting the vertices.
  2. Each graph G_n generates a random graph on the integers – or equivalently, a probability measure \mu_n in X – as follows. We randomly and independently sample the vertices of the graph G_n infinitely often, creating a sequence (v_{n,i})_{i \in {\Bbb Z}} of vertices in the graph G_n. (Of course, many of these vertices will collide, but this will be not be important for us.) This then creates a random graph on the integers, with any two integers i and j connected by an edge if their associated vertices v_{n,i}, v_{n,j} are distinct and are connected by an edge in G_n. By construction, the probability measure \mu_n associated to this graph is already S_\infty-invariant.
  3. We then let n go to infinity, and extract a weak limit \mu just as with the Furstenberg correspondence principle.

It is then possible to prove results somewhat analogous to the Furstenberg structure theorem and Furstenberg recurrence theorem in this setting, and use this to prove several results in graph theory (as well as its more complicated generalisation, hypergraph theory). I myself am optimistic that by transferring more ideas from traditional ergodic theory into this new setting of “ergodic graph theory”, that one could obtain a new tool for systematically establishing a number of other qualitative results in graph theory, particularly those which are traditionally reliant on the Szemerédi regularity lemma (which is almost a qualitative result itself, given how poor the bounds are). This is however still a work in progress.