In this second lecture, I wish to talk about the dichotomy between structure and randomness as it manifests itself in four closely related areas of mathematics:

*Combinatorial number theory*, which seeks to find patterns in unstructured dense sets (or colourings) of integers;*Ergodic theory*(or more specifically, multiple recurrence theory), which seeks to find patterns in positive-measure sets under the action of a discrete dynamical system on probability spaces (or more specifically, measure-preserving actions of the integers );*Graph theory*, or more specifically the portion of this theory concerned with finding patterns in large unstructured dense graphs; and*Ergodic graph theory*, which is a very new and undeveloped subject, which roughly speaking seems to be concerned with the patterns within a measure-preserving action of the infinite permutation group , which is one of several models we have available to study infinite “limits” of graphs.

The two “discrete” (or “finitary”, or “quantitative”) fields of combinatorial number theory and graph theory happen to be related to each other, basically by using the Cayley graph construction; I will give an example of this shortly. The two “continuous” (or “infinitary”, or “qualitative”) fields of ergodic theory and ergodic graph theory are at present only related on the level of analogy and informal intuition, but hopefully some more systematic connections between them will appear soon.

On the other hand, we have some very rigorous connections between combinatorial number theory and ergodic theory, and also (more recently) between graph theory and ergodic graph theory, basically by the procedure of viewing the infinitary continuous setting as a limit of the finitary discrete setting. These two connections go by the names of the *Furstenberg correspondence principle* and the *graph correspondence principle* respectively. These principles allow one to tap the power of the infinitary world (for instance, the ability to take limits and perform completions or closures of objects) in order to establish results in the finitary world, or at least to take the *intuition* gained in the infinitary world and transfer it to a finitary setting. Conversely, the finitary world provides an excellent model setting to refine one’s understanding of infinitary objects, for instance by establishing quantitative analogues of “soft” results obtained in an infinitary manner. I will remark here that this best-of-both-worlds approach, borrowing from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and I in order to establish our result on long arithmetic progressions in the primes. In particular, the infinitary setting is excellent for being able to rigorously define and study concepts (such as structure or randomness) which are much “fuzzier” and harder to pin down exactly in the finitary world.

Let me first discuss the connection between combinatorial number theory and graph theory. We can illustrate this connection with two classical results from the former and latter field respectively:

- Schur’s theorem: If the positive integers are coloured using finitely many colours, then one can find positive integers x, y such that x, y, x+y all have the same colour.
- Ramsey’s theorem: If an infinite complete graph is edge-coloured using finitely many colours, then one can find a triangle all of whose edges have the same colour.

(In fact, both of these theorems can be generalised to say much stronger statements, but we will content ourselves with just these special cases). It is in fact easy to see that Schur’s theorem is deducible from Ramsey’s theorem. Indeed, given a colouring of the positive integers, one can create an infinite coloured complete graph (the *Cayley graph *associated to that colouring) whose vertex set is the integers , and such that an edge {a,b} with a < b is coloured using the colour assigned to b-a. Applying Ramsey’s theorem, together with the elementary identity (c-a) = (b-a) + (c-b), we then quickly deduce Schur’s theorem.

Let us now turn to ergodic theory. The basic object of study here is a *measure-preserving system* (or *probability-preserving system*), which is a probability space (i.e. a set X equipped with a sigma-algebra of measurable sets and a probability measure on that sigma-algebra), together with a shift map , which for simplicity we shall take to be invertible and bi-measurable (so its inverse is also measurable); in particular we have iterated shift maps for any integer n, giving rise to an action of the integers . The important property we need is that the shift map is measure-preserving, thus for all measurable sets E.

In the last lecture we saw that sets of integers could be divided (rather informally) into structured sets, pseudorandom sets, and hybrids between the two. The same is true in ergodic theory – and this time, one can in fact make these notions extremely precise. Let us first start with some examples:

- The
*circle shift,*in which is the standard unit circle with normalised Haar measure, and for some fixed real number . If we identify X with the unit circle in the complex plane via the standard identification , then the shift corresponds to an anti-clockwise rotation by . This is a very structured system, and corresponds in combinatorial number theory to*Bohr sets*such as , which implicitly made an appearance in the previous lecture. - The
*two-point shift*, in which X := {0,1} with uniform probability measure, and T simply interchanges 0 and 1. This very structured system corresponds to the set A of odd numbers (or of even numbers) mentioned in the previous lecture. More generally, any permutation on a finite set gives rise to a simple measure-preserving system. - The
*skew shift*, in which is the 2-torus with normalised Haar measure, and for some fixed real number . If we just look at the behaviour of the x-component of this torus we see that the skew shift contains the circle shift as a*factor*, or equivalently that the skew shift is an*extension*of the circle shift (in this particular case, since the fibres are circles and the action on the fibres is rotation, we call this a*circle extension*of the circle shift). This system is also structured (but in a more complicated way than the previous two shifts), and corresponds to quadratically structured sets such as the*quadratic Bohr set*, which made an appearance in the previous lecture. - The
*Bernoulli shift*, in which is the space of infinite 0-1 sequences (or equivalently, the space of all sets of integers), equipped with uniform product probability measure, and T is the left shift . This is a very random system, corresponding to the random sets B discussed in the previous lecture. - Hybrid systems, e.g. products of a circle shift and a Bernoulli shift, or extensions of a circle shift by a Bernoulli system, a doubly skew shift (a circle extension of a circle extension of a circle shift), etc.

One can classify these systems in precise terms according to how the shift action moves sets E around. On the one hand, we have some well-defined notions which represent structure:

*Trivial*systems are such that for all E and all n.*Periodic*systems are such that for every E, there exists a positive n such that . The two-point shift is an example, as is the circle shift when is rational.*Almost periodic*or*compact*systems are such that for every E and every , there exists a positive n such that and differ by a set of measure at most . The circle shift is a good example of this (thanks to the equidistribution theorem). The term “compact” is used because there is an equivalent characterisation of compact systems, namely that the orbits of the shift in are always precompact in the strong topology.

On the other hand, we have some well-defined terms which represent pseudorandomness:

*Strongly mixing*systems are such that for every E, F, we have as n tends to infinity; the Bernoulli shift is a good example. Informally, this is saying that shifted sets become asymptotically independent of unshifted sets.*Weakly mixing*systems are such that for every E, F, we have as n tends to infinity after excluding a set of exceptional values of n of asymptotic density zero. For technical reasons, weak mixing is a better notion to use in the structure-randomness dichotomy than strong mixing (for much the same reason that one always wants to allow negligible sets of measure zero in measure theory).

There are also more complicated (but well-defined) hybrid notions of structure and randomness which we will not give here. We will however briefly discuss the situation for the skew shift. This shift is not almost periodic: most sets A will become increasingly “skewed” as it gets shifted, and will never return to resemble itself again. However, if one restricts attention to the underlying circle shift factor (i.e. restricting attention only to those sets which are unions of vertical fibres), then one recovers almost periodicity. Furthermore, the skew shift is almost periodic *relative* to the underlying circle shift, in the sense that while the shifts of a given set A do not return to resemble A globally, they do return to resemble A when restricted to any fixed vertical fibre (this can be shown using the method of Weyl sums from Fourier analysis). Because of this, we say that the skew shift is a *compact extension* of a compact system.

As discussed in the above examples, every dynamical system is capable of generating some interesting sets of integers, specifically *recurrence sets* where E is a set in X and is a point in X. This set actally captures much of the dynamics of E in the system (especially if X is ergodic and is “generic”). The *Furstenberg correspondence principle* reverses this procedure, starting with a set of integers A and using that to generate a dynamical system which “models” that set in a certain way. Modulo some minor technicalities, it works as follows.

- As with the Bernoulli shift, we work in the space , with the product sigma-algebra and the left shift; but we leave the probability measure (which can be interpreted as the distribution of a certain random subset of the integers) undefined for now. The original set A can now be interpreted as a single point inside X.
- Now pick a large number N, and shift A backwards and forwards up to N times, giving rise to 2N+1 sets , which can be thought of as 2N+1 points inside X. We consider the uniform distribution on these points, i.e. we shift A by a random amount between -N and N. This gives rise to a discrete probability measure on X (which is only supported on 2N+1 points inside X). Each of these measures is approximately invariant under the shift T.
- We now let N go to infinity. We apply the (sequential form of the) Banach-Alaoglu theorem, which among other things shows that the space of Borel probability measures on a compact Hausdorff space (which X is) is sequentially compact in the weak-* topology. (This particular version of Banach-Alaoglu can in fact be established by a diagonalisation argument which completely avoids the axiom of choice.) Thus we can find a subsequence of the measures which converge in the weak-* topology to a limit (this subsequence and limit may not be unique, but this will not concern us). Since the are approximately invariant under T, with the degree of approximation improving with N, one can easily show that the limit measure is shift-invariant.

By using this recipe to construct a measure-preserving system from a set of integers, it is possible to deduce theorems in combinatorial number theory from those in ergodic theory (similarly to how the Cayley graph construction allowed one to deduce theorems in combinatorial number theory from those in graph theory). The most famous example of this concerns the following two deep theorems:

- Szemerédi’s theorem: If A is a set of integers of positive upper density, and k is a positive integer, then A contains infinitely many arithmetic progressions of length k. (Note that the case k=2 is trivial.)
- Furstenberg’s recurrence theorem: If E is a set of positive measure in a measure-preserving system, and k is a positive integer, then there are infinitely many integers n for which . (Note that the case k=2 is the more classical Poincaré recurrence theorem).

Using the above correspondence principle (or a slight variation thereof), it is not difficult to show that the two theorems are in fact equivalent; see for instance Furstenberg’s book on the subject. The power of these two theorems derives from the fact that the former works for *arbitrary* sets of positive density, and the latter works for *arbitrary* measure-preserving systems – there are essentially no structural assumptions on the basic object of study in either, and it is therefore quite remarkable that one can still conclude such a non-trivial result.

The story of Szemerédi’s theorem is quite a long one, which I have discussed in many previous places, and will not do so again here, though I will note here that all the proofs of this theorem exploit the dichotomy between structure and randomness (and there are some good reasons for this – the underlying cause of arithmetic progressions is totally different in the structured and pseudorandom cases). I will however briefly describe how Furstenberg’s recurrence theorem is proven (following the approach of Furstenberg, Katznelson, and Ornstein; there are a couple other ergodic theoretic proofs, including of course Furstenberg’s original proof). The first major step is to establish the *Furstenberg structure theorem*, which takes an arbitrary measure-preserving system and describes it as a suitable hybrid of a compact system and a weakly mixing system (or more precisely, a weakly mixing extension of a transfinite tower of compact extensions). This theorem relies on Zorn’s lemma, although it is possible to give a proof of the recurrence theorem without recourse to the axiom of choice. The proof requires various tools from infinitary analysis (e.g. the compactness of integral operators) but is relatively straightforward. Next, one makes the rather simple observation that the Furstenberg recurrence theorem is easy to show both for compact systems and for weakly mixing systems. In the former case, the almost periodicity shows that there are lots of integers n for which is almost identical with A (in the sense that they differ by a set of small measure) – which, after shifting by n again, implies that is almost identical with , and so forth – which soon makes it easy to arrange matters so that is non-empty. In the latter case, the weak mixing shows that for most n, the sets (or “events”) and are almost uncorrelated (or “independent”); similarly, for any fixed m, we have and almost uncorrelated for n large enough. By using the Cauchy-Schwarz inequality (in the form of a useful lemma of van der Corput) repeatedly, we can eventually show that are almost *jointly* independent (as opposed to being merely almost pairwise independent) for many n, at which point the recurrence theorem is easy to show. It is somewhat more tricky to show that one can also combine these arguments with each other to show that the recurrence property also holds for the transfinite combinations of compact and weakly mixing systems that come out of the Furstenberg structure theorem, but it can be done with a certain amount of effort, and this concludes the proof of the recurrence theorem. This same method of proof turns out, with several additional technical twists, to establish many further varieties of recurrence theorems, which in turn (via the correspondence principle) gives several powerful results in combinatorial number theory, several of which continue to have no non-ergodic proof even today.

(There has also been a significant amount of progress more recently by several ergodic theorists in understanding the “structured” side of the Furstenberg structure theorem, in which dynamical notions of structure, such as compactness, have been converted into algebraic and topological notions of structure, in particular into the actions of nilpotent Lie groups on their homogeneous spaces. This is an important development, and is closely related to the polynomial and generalised polynomial sequences appearing in the previous talk, but it would be beyond the scope of this talk to discuss it here.)

Let us now leave ergodic theory and return to graph theory. Given the power of the Furstenberg correspondence principle, it is natural to look for something similar in graph theory, which would connect up results in finitary graph theory with some infinitary variant. A typical candidate for a finitary graph theory result that one would hope to do this for is the triangle removal lemma, which was discussed in a recent blog post here. That lemma is in fact closely connected with Szemerédi’s theorem, indeed it implies the k=3 case of that theorem (i.e. Roth’s theorem) in much the same way that Ramsey’s theorem implies Schur’s theorem. It does turn out that this is possible, although the infinitary analogues of things like the triangle removal lemma are a little strange-looking (one such analogue can be found in this paper; another can be found in this one). But it is easier to describe the concept of a graph limit. There are several equivalent formulations of this limit, including the notion of a “graphon” introduced by Lovász and Szegedy, the flag algebra construction introduced by Razborov, and the notion of a permutation-invariant measure space introduced by myself. I will discuss my own construction here, which is closely modelled on the Furstenberg correspondence principle. What it does is starts with a sequence of graphs (which one should think of as getting increasingly large, while remaining dense) and extracts a limit object, which is a probability space together with an action of the permutation group on the integers, as follows.

- We let be the space of all graphs on the integers, with the standard product (i.e. weak) topology, and hence product sigma-algebra. This space has an obvious action of the permutation group , formed by permuting the vertices.
- Each graph generates a random graph on the integers – or equivalently, a probability measure in X – as follows. We randomly and independently sample the vertices of the graph infinitely often, creating a sequence of vertices in the graph . (Of course, many of these vertices will collide, but this will be not be important for us.) This then creates a random graph on the integers, with any two integers i and j connected by an edge if their associated vertices are distinct and are connected by an edge in . By construction, the probability measure associated to this graph is already -invariant.
- We then let n go to infinity, and extract a weak limit just as with the Furstenberg correspondence principle.

It is then possible to prove results somewhat analogous to the Furstenberg structure theorem and Furstenberg recurrence theorem in this setting, and use this to prove several results in graph theory (as well as its more complicated generalisation, hypergraph theory). I myself am optimistic that by transferring more ideas from traditional ergodic theory into this new setting of “ergodic graph theory”, that one could obtain a new tool for systematically establishing a number of other qualitative results in graph theory, particularly those which are traditionally reliant on the Szemerédi regularity lemma (which is almost a qualitative result itself, given how poor the bounds are). This is however still a work in progress.

## 14 comments

Comments feed for this article

7 April, 2007 at 9:34 pm

rcourantDr. Tao, what do you think about Richard Courant as a mathematician?

7 April, 2007 at 9:47 pm

rcourantTerry,

Do you believe that financial markets are predictable in the long run? Black and Scholes came up with the following:

which gives the price of a derivative at a time .

Thanks!

8 April, 2007 at 12:01 pm

Simons Lecture III: Structure and randomness in PDE « What’s new[...] Hamiltonian PDE, such as the Schrödinger equation , which are heuristically related (via Liouville’s theorem) to measure-preserving actions of the real line (or time axis) , somewhat in analogy to how combinatorial number theory and graph theory were related to measure-preserving actions of and respectively, as discussed in the previous lecture. [...]

10 April, 2007 at 7:55 am

Anonymous“This theorem relies on Zorn’s lemma, although it is possible to give a proof of the recurrence theorem without recourse to the axiom of choice. The proof requires various tools from infinitary analysis (e.g. the compactness of integral operators) but is relatively straightforward.”

Can you give us a link or reference?

10 April, 2007 at 9:02 am

Terence TaoDear Anonymous,

Furstenberg’s original 1977 paper proved the recurrence theorem without Zorn’s lemma, and with some effort can be made fully choice-free. The later proof of Furstenberg-Katznelson-Ornstein in 1982 is perhaps more elegant, proceeding via the Furstenberg structure theorem, but that theorem of course requires transfinite induction (Zorn’s lemma) to prove.

10 April, 2007 at 10:23 am

Not Even Wrong » Blog Archive » Math and Physics Roundup[...] the highest possible quality, see Terry Tao’s postings on his Simons lectures at MIT, here, here and [...]

13 April, 2007 at 11:48 am

HoangDear Professor,

“but that theorem of course requires transfinite induction (Zorn’s lemma) to prove.”

I guess transfinite induction is quite different from Zorn’s lemma. Transfinite induction works with every well-ordered set regardless of Choice (of course this is the case of every set if we assume Choice). As far as I know, the Furstenberg structure theorem makes use of transfinite induction on the ordinal numbers, which are already well-ordered regardless of Choice.

14 April, 2007 at 7:17 pm

Terence TaoDear Hoang,

That’s a good point, though I do view Zorn’s lemma as a generalisation of transfinite induction, which takes place on a poset rather than a well-ordered set. Indeed, one often uses Zorn’s lemma in the following formulation: Assume that a property P holds for at least one element in a poset (base case), that the limit of every chain of elements obeying P, also obeys P (limit ordinal case), and that every non-maximal element of the poset obeying P has a larger element also obeying P (successor ordinal case). Then at least one maximal element of the poset obeys P. In this formulation the similarity to transfinite induction is quite clear.

One can prove Furstenberg’s recurrence theorem using Zorn’s lemma in the above formulation, where the poset is the poset of factors of the original system, and P is the property that Furstenberg’s recurrence theorem holds for that factor. The Furstenberg structure theorem is also proven by Zorn’s lemma, though in this case it is easier to use that lemma in its more usual formuation (to construct a maximal distal factor of a system) than to try to prove this structure theorem in an “inductive” manner.

It turns out that the tower of compact factors that shows up in the Furstenberg structure theorem has an ordinal which is at most countable (and that any such ordinal can arise in this way – a result of Foreman and Beleznay), which probably means that the structure theorem can be proven just using the axiom of countable choice, though I haven’t checked this.

15 April, 2007 at 12:17 pm

Simons Lecture I: Structure and randomness in Fourier analysis and number theory « What’s new[...] first lecture, I describe the dichotomy as it appears in Fourier analysis and in number theory. (In the second, I discuss the dichotomy in ergodic theory and graph theory, while in the third, I discuss [...]

23 May, 2007 at 7:35 pm

Soft analysis, hard analysis, and the finite convergence principle « What’s new[...] theorem on one hand and the Furstenberg recurrence theorem on the other, as briefly discussed in my Simons lecture on these topics. (In these contexts, the correspondence between the finitary and infinitary statements is known as [...]

31 July, 2007 at 4:52 pm

Structure and randomness in combinatorics « What’s new[...] in October. This tutorial covers similar ground as my ICM paper (or slides), or my first two Simons lectures, but focuses more on the “nuts-and-bolts” of how structure theorems actually work to [...]

8 January, 2008 at 7:06 pm

254A, Lecture 1: Overview « What’s new[...] satisfactorily. (In particular, ergodic theory is a framework in which our understanding of the dichotomy between structure and randomness is at is most [...]

31 August, 2008 at 8:58 pm

The correspondence principle and finitary ergodic theory « What’s new[...] that it actually implies the k=3 case of Szemerédi’s theorem. (See also my previous blog posts on related topics.) It turns out that this result can be deduced from some infinitary [...]

1 February, 2009 at 5:52 pm

A massively collaborative mathematical project « What’s new[...] and Roth’s theorem has a combinatorial proof based on the triangle removal lemma (see e.g. my Simons lecture on the subject, or Tim Gowers’ background article for the project), it is thus natural to ask whether the [...]