
Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:

Lemma 1 (Szemerédi regularity lemma) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition {V = V_1 \cup \ldots \cup V_M} for some {M \leq M(\epsilon)} with the property that for all but at most {\epsilon M^2} of the pairs {1 \leq i \leq j \leq M}, the pair {V_i, V_j} is {\epsilon}-regular in the sense that

\displaystyle  | d( A, B ) - d( V_i, V_j ) | \leq \epsilon

whenever {A \subset V_i, B \subset V_j} are such that {|A| \geq \epsilon |V_i|} and {|B| \geq \epsilon |V_j|}, where {d(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|/(|A| |B|)} is the edge density between {A} and {B}. Furthermore, the partition is equitable in the sense that {||V_i| - |V_j|| \leq 1} for all {1 \leq i \leq j \leq M}.

There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of {G}, which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.

For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells {V_i} by their magnitude when counting bad pairs):

Lemma 2 (Szemerédi regularity lemma, weakened variant) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition {V = V_1 \cup \ldots \cup V_M} for some {M \leq M(\epsilon)} with the property that for all pairs {(i,j) \in \{1,\ldots,M\}^2} outside of an exceptional set {\Sigma}, one has

\displaystyle  | E(A,B) - d_{ij} |A| |B| | \ll \epsilon |V_i| |V_j| \ \ \ \ \ (1)

for all {A \subset V_i, B \subset V_j}, where {d_{ij}} is a real number depending only on {i} and {j}, and {E(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|} is the number of edges between {A} and {B}. Furthermore, we have

\displaystyle  \sum_{(i,j) \in \Sigma} |V_i| |V_j| \ll \epsilon |V|^2. \ \ \ \ \ (2)

Let us now prove Lemma 2. We enumerate {V} (after relabeling) as {V = \{1,\ldots,n\}}. The adjacency matrix {T} of the graph {G} is then a self-adjoint {n \times n} matrix, and thus admits an eigenvalue decomposition

\displaystyle  T = \sum_{i=1}^n \lambda_i u_i^* u_i

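for some orthonormal basis {u_1,\ldots,u_n} of {{\bf C}^n} (viewed as row vectors, so that each {u_i^* u_i} is the orthogonal projection onto the line spanned by {u_i}) and some eigenvalues {\lambda_1,\ldots,\lambda_n \in {\bf R}}, which we arrange in decreasing order of magnitude: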

\displaystyle  |\lambda_1| \geq \ldots \geq |\lambda_n|.

We can compute the trace of {T^2} as

\displaystyle  \hbox{tr}(T^2) = \sum_{i=1}^n |\lambda_i|^2.

But we also have {\hbox{tr}(T^2) = 2|E| \leq n^2}, so

\displaystyle  \sum_{i=1}^n |\lambda_i|^2 \leq n^2. \ \ \ \ \ (3)

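Among other things, since each of the {i} largest eigenvalues has magnitude at least {|\lambda_i|}, (3) gives {i |\lambda_i|^2 \leq n^2}, which implies that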

\displaystyle  |\lambda_i| \leq \frac{n}{\sqrt{i}} \ \ \ \ \ (4)

for all {i \geq 1}.
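As a sanity check on (3) and (4), one can compute the spectrum of a small random graph numerically. The following is a minimal pure-Python sketch using only the standard library; the helper names `jacobi_eigenvalues` and `random_graph` are hypothetical, and the eigenvalues are found by the classical cyclic Jacobi rotation method, chosen here only to avoid external dependencies (it plays no role in the argument itself).

```python
import math
import random


def jacobi_eigenvalues(S, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix S via cyclic Jacobi rotations."""
    A = [[float(x) for x in row] for row in S]
    n = len(A)
    for _ in range(max_sweeps):
        off = sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A R (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- R^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]


def random_graph(n, p, seed=0):
    """Adjacency matrix of a random graph G(n, p), together with its number of edges."""
    rng = random.Random(seed)
    T = [[0] * n for _ in range(n)]
    num_edges = 0
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < p:
                T[a][b] = T[b][a] = 1
                num_edges += 1
    return T, num_edges


n = 16
T, num_edges = random_graph(n, 0.4)
# Eigenvalues arranged in decreasing order of magnitude, as in the text.
eigs = sorted(jacobi_eigenvalues(T), key=abs, reverse=True)
print("2|E| =", 2 * num_edges, " sum of lambda_i^2 =", round(sum(l * l for l in eigs), 8))
for i, lam in enumerate(eigs[:4], start=1):
    print(f"|lambda_{i}| = {abs(lam):.3f} <= n/sqrt(i) = {n / math.sqrt(i):.3f}")
```

Since orthogonal rotations preserve the Frobenius norm, the sum of the squared diagonal entries converges to {\hbox{tr}(T^2) = 2|E|}, which is exactly the identity behind (3).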

Let {F: {\bf N} \rightarrow {\bf N}} be a function (depending on {\epsilon}) to be chosen later, with {F(i) \geq i} for all {i}. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find {J \leq C(F,\epsilon)} such that

\displaystyle  \sum_{J \leq i < F(J)} |\lambda_i|^2 \leq \epsilon^3 n^2.

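(Indeed, if one tries {J = 1, F(1), F(F(1)), \ldots} in turn, the windows {J \leq i < F(J)} are disjoint, so by (3) fewer than {\epsilon^{-3}} of them can carry more than {\epsilon^3 n^2}; thus the bound on {J} is basically {F} iterated {1/\epsilon^3} times.) We can now split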

\displaystyle  T = T_1 + T_2 + T_3, \ \ \ \ \ (5)

where {T_1} is the “structured” component

\displaystyle  T_1 := \sum_{i < J} \lambda_i u_i^* u_i, \ \ \ \ \ (6)

{T_2} is the “small” component

\displaystyle  T_2 := \sum_{J \leq i < F(J)} \lambda_i u_i^* u_i, \ \ \ \ \ (7)

and {T_3} is the “pseudorandom” component

\displaystyle  T_3 := \sum_{i \geq F(J)} \lambda_i u_i^* u_i. \ \ \ \ \ (8)
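The choice of {J} made before this splitting can be sketched in Python as follows. This is a minimal illustration, assuming the squared eigenvalue magnitudes are supplied as a list `energies` (with `energies[i-1]` standing for {|\lambda_i|^2}); the function name `choose_J` is hypothetical. It tries {J = 1, F(1), F(F(1)), \ldots} in turn and stops at the first window {J \leq i < F(J)} carrying at most an {\epsilon^3} fraction of the total.

```python
def choose_J(energies, F, eps, total):
    """Find J with sum_{J <= i < F(J)} energies[i-1] <= eps**3 * total.

    energies[i-1] stands for |lambda_i|^2 (eigenvalues indexed from 1); we assume
    sum(energies) <= total, as in (3) with total = n^2, and F(j) >= j for all j.
    Returns (J, number of rejected windows).
    """
    J, rejected = 1, 0
    while True:
        window = sum(energies[J - 1:F(J) - 1])  # the indices J <= i < F(J)
        if window <= eps ** 3 * total:
            return J, rejected
        # This window carries more than eps^3 of the total; the windows tried
        # are disjoint, so this happens fewer than 1/eps^3 times.
        rejected += 1
        J = F(J)


# Synthetic energies 1/(i(i+1)), whose sum telescopes to less than 1.
energies = [1.0 / (i * (i + 1)) for i in range(1, 1001)]
print(choose_J(energies, lambda j: 2 * j, eps=0.6, total=1.0))  # (4, 2)
```

With {F(J) = 2J} the windows {[1,2)} and {[2,4)} are rejected (they carry {1/2} and {1/4} of the total) and {[4,8)} is accepted; a faster-growing {F} makes {J} larger, which is the source of the tower-type bounds mentioned in Remark 1.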

We now design a vertex partition to make {T_1} approximately constant on most cells. For each {i < J}, we partition {V} into {O_{J,\epsilon}(1)} cells on which {u_i} (viewed as a function from {V} to {{\bf C}}) only fluctuates by {O(\epsilon^{3/2} n^{-1/2} /J)}, plus an exceptional cell of size {O( \frac{\epsilon}{J} |V|)} coming from the values where {|u_i|} is excessively large (larger than {\sqrt{\frac{J}{\epsilon}} n^{-1/2}}); the bound on the exceptional cell follows from Chebyshev's inequality, as {u_i} has unit norm. Combining all these partitions together, we can write {V = V_1 \cup \ldots \cup V_{M-1} \cup V_M} for some {M = O_{J,\epsilon}(1)}, where {V_M} has cardinality at most {\epsilon |V|}, and for all {1 \leq i \leq M-1}, the eigenfunctions {u_1,\ldots,u_{J-1}} all fluctuate by at most {O(\epsilon^{3/2} n^{-1/2}/J)} on {V_i}. In particular, if {1 \leq i,j \leq M-1}, then the entries of {T_1} fluctuate by at most {O(\epsilon)} on each block {V_i \times V_j}. Indeed, by (6) each entry of {T_1} is a sum over {l < J} of {\lambda_l} times a product of two entries of {u_l} (one of them conjugated); each such product fluctuates by {O( \sqrt{J/\epsilon}\, \epsilon^{3/2} n^{-1}/J ) = O(\epsilon n^{-1} J^{-1/2})} on the block, and {\sum_{l < J} |\lambda_l| \leq 2n\sqrt{J}} by (4). If we let {d_{ij}} be the mean value of these entries on {V_i \times V_j}, we thus have

\displaystyle  1_B^* T_1 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (9)

for any {1 \leq i,j \leq M-1} and {A \subset V_i, B \subset V_j}, where we view the indicator functions {1_A, 1_B} as column vectors of dimension {n}.
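The partition step can be sketched as follows, assuming real-valued unit vectors (for a real symmetric {T} the eigenvectors may be taken real). The function `spectral_partition` and the three test vectors are hypothetical illustrations: cells are formed by rounding each value {u_l(v)} to a grid of mesh {\eta = \epsilon^{3/2} n^{-1/2}/J}, which a short computation using (4) shows is fine enough for the entries of {T_1} to fluctuate by {O(\epsilon)}, and vertices where some {|u_l(v)|} exceeds {\sqrt{J/\epsilon}\, n^{-1/2}} go to the exceptional cell, which by Chebyshev's inequality has at most {(J-1)\epsilon n/J} elements.

```python
import math
from collections import defaultdict


def spectral_partition(vectors, J, eps):
    """Split range(n) into grid cells for the vectors u_1..u_{J-1}, plus an exceptional cell.

    vectors: list of J - 1 real unit vectors, each a list of length n.
    Inside a non-exceptional cell, every vector varies by less than eta.
    """
    n = len(vectors[0])
    K = math.sqrt(J / eps) / math.sqrt(n)     # threshold for "excessively large" values
    eta = eps ** 1.5 / (J * math.sqrt(n))     # mesh of the rounding grid
    exceptional = sorted(v for v in range(n) if any(abs(u[v]) > K for u in vectors))
    excluded = set(exceptional)
    cells = defaultdict(list)
    for v in range(n):
        if v not in excluded:
            # Vertices sharing the grid box of (u_1(v), ..., u_{J-1}(v)) share a cell.
            key = tuple(math.floor(u[v] / eta) for u in vectors)
            cells[key].append(v)
    return list(cells.values()), exceptional, eta


n, J, eps = 12, 4, 0.5
u1 = [1 / math.sqrt(n)] * n
u2 = [math.sqrt(2 / n) * math.cos(2 * math.pi * v / n) for v in range(n)]
u3 = [1.0 if v == 0 else 0.0 for v in range(n)]  # a "spiky" unit vector
vectors = [u1, u2, u3]
cells, exceptional, eta = spectral_partition(vectors, J, eps)
print(len(cells), "cells; exceptional cell:", exceptional)
```

Here the spike of {u_3} at vertex {0} exceeds the threshold, so that vertex is sent to the exceptional cell; the remaining vertices are grouped by the grid box of their eigenvector values.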

Next, we observe from (7) and the choice of {J} that {\hbox{tr} T_2^2 \leq \epsilon^3 n^2}. If we let {x_{ab}} be the coefficients of {T_2}, we thus have

\displaystyle  \sum_{a,b \in V} |x_{ab}|^2 \leq \epsilon^3 n^2

and hence by Markov’s inequality we have

\displaystyle  \sum_{a \in V_i} \sum_{b \in V_j} |x_{ab}|^2 \leq \epsilon^2 |V_i| |V_j| \ \ \ \ \ (10)

for all pairs {(i,j) \in \{1,\ldots,M-1\}^2} outside of an exceptional set {\Sigma_1} with

\displaystyle  \sum_{(i,j) \in \Sigma_1} |V_i| |V_j| \leq \epsilon |V|^2.

If {(i,j) \in \{1,\ldots,M-1\}^2} avoids {\Sigma_1}, we thus have

\displaystyle  1_B^* T_2 1_A = O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (11)

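for any {A \subset V_i, B \subset V_j}, by (10) and the Cauchy-Schwarz inequality: since {T_2} is self-adjoint we have {|x_{ab}| = |x_{ba}|}, and so {|1_B^* T_2 1_A| \leq (|A| |B|)^{1/2} (\sum_{a \in V_i} \sum_{b \in V_j} |x_{ab}|^2)^{1/2} \leq \epsilon |V_i| |V_j|}.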

Finally, to control {T_3} we see from (4) and (8) that {T_3} has an operator norm of at most {n/\sqrt{F(J)}}. Since {1_A} and {1_B} have norm at most {n^{1/2}}, the Cauchy-Schwarz inequality then gives

\displaystyle  1_B^* T_3 1_A = O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (12)

for any {A, B \subset V}.

Let {\Sigma} be the set of all pairs {(i,j) \in \{1,\ldots,M\}^2} where either {(i,j) \in \Sigma_1}, {i = M}, {j=M}, or

\displaystyle  \min(|V_i|, |V_j|) \leq \frac{\epsilon}{M} n.

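One easily verifies that (2) holds: the pairs with {i=M} or {j=M} contribute at most {2\epsilon n^2} to the sum in (2), the pairs involving a cell of size at most {\frac{\epsilon}{M} n} contribute at most {2\epsilon n^2}, and {\Sigma_1} contributes at most {\epsilon n^2}. If {(i,j) \in \{1,\ldots,M\}^2} is not in {\Sigma}, then by summing (9), (11), (12) and using (5), we see that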

\displaystyle  1_B^* T 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) + O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (13)

for all {A \subset V_i, B \subset V_j}. The left-hand side is just {E(A,B)}. As {(i,j) \not \in \Sigma}, we have

\displaystyle  |V_i|, |V_j| > \frac{\epsilon}{M} n

and so (since {M = O_{J,\epsilon}(1)})

\displaystyle  n^2 / \sqrt{F(J)} \ll_{J,\epsilon} |V_i| |V_j| / \sqrt{F(J)}.

If we let {F} be a sufficiently rapidly growing function of {J} that depends on {\epsilon}, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.

To prove Lemma 1, one argues similarly (after modifying {\epsilon} as necessary), except that the initial partition {V_1,\ldots,V_M} of {V} constructed above needs to be subdivided further into equitable components (of size {\epsilon |V|/M+O(1)}), plus some remainder sets which can be aggregated into an exceptional component of size {O( \epsilon |V| )} (and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.

Remark 1 It is easy to verify that {F} needs to be growing exponentially in {J} in order for the above argument to work, which leads to tower-exponential bounds in the number of cells {M} in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying {F}, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, setting {F(J) := J} essentially gives the weak regularity lemma of Frieze and Kannan.

Remark 2 If we specialise to a Cayley graph, in which {V = (V,+)} is a finite abelian group and {E = \{ (a,b): a-b \in A \}} for some (symmetric) subset {A} of {V}, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes {V_i} are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components {T_1, T_2, T_3} of {T}, representing high, medium, and low eigenvalues of {T}, then become a decomposition associated to high, medium, and low Fourier coefficients of {A}.
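The correspondence in Remark 2 can be checked directly on a small cyclic group. In this minimal sketch (helper names `cayley_adjacency` and `fourier_coefficient` are hypothetical), {V = {\bf Z}/N{\bf Z}} and {T_{ab} = 1} exactly when {a - b \in A}; the character {\chi_\xi(x) = e^{2\pi i x \xi/N}} is then an eigenvector with eigenvalue {\sum_{h \in A} e^{-2\pi i h \xi/N}}, the Fourier coefficient of {1_A} at {\xi}. The example set is the quadratic residues mod {13}, which is symmetric because {-1} is a residue.

```python
import cmath


def cayley_adjacency(A, N):
    """Adjacency matrix of the Cayley graph on Z/NZ: T[a][b] = 1 iff a - b (mod N) is in A."""
    return [[1 if (a - b) % N in A else 0 for b in range(N)] for a in range(N)]


def fourier_coefficient(A, N, xi):
    """sum_{h in A} exp(-2 pi i h xi / N), the eigenvalue of T on the character chi_xi."""
    return sum(cmath.exp(-2j * cmath.pi * h * xi / N) for h in A)


N = 13
A = {1, 3, 4, 9, 10, 12}  # quadratic residues mod 13; symmetric because -1 = 12 is a residue
T = cayley_adjacency(A, N)

eigenvalues, residual = [], 0.0
for xi in range(N):
    chi = [cmath.exp(2j * cmath.pi * x * xi / N) for x in range(N)]  # the character chi_xi
    lam = fourier_coefficient(A, N, xi)
    T_chi = [sum(T[a][b] * chi[b] for b in range(N)) for a in range(N)]
    residual = max(residual, max(abs(T_chi[a] - lam * chi[a]) for a in range(N)))
    eigenvalues.append(lam)

# Ordering frequencies by |lambda_xi| is what separates "high", "medium" and "low"
# Fourier coefficients in the splitting T = T_1 + T_2 + T_3.
order = sorted(range(N), key=lambda xi: -abs(eigenvalues[xi]))
print("max |T chi - lambda chi| =", residual)
print("frequencies by decreasing |lambda|:", order)
```

The trivial character {\xi = 0} carries the largest eigenvalue {|A| = 6}, and Parseval's identity {\sum_\xi |\lambda_\xi|^2 = N|A|} is the Fourier-side version of the trace identity (3).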

Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.

A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem) Let {A} be a set of integers of positive upper density, thus {\lim \sup_{N \rightarrow\infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0}, where {[-N,N] := \{-N, -N+1,\ldots,N\}}. Then {A} contains an arithmetic progression of length {k} for any {k>1}.

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a closed simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occurring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemerédi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions {P_1 = \{ a, a+r_1, a+2r_1, \ldots, a+(k_1-1)r_1\}}, but also progressions of progressions {P_2 := \{ P_1, P_1 + r_2, P_1+2r_2, \ldots, P_1+(k_2-1)r_2\}}, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set {A} (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length {k} that consists entirely of elements of {A}.
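The nested structure of these objects can be made concrete with a short recursive sketch. Here `nested_progression` and `elements` are hypothetical helper names: a rank one progression is a list of integers, and a rank {d} progression is a progression (with step {r_d} and length {k_d}) of rank {d-1} progressions; its underlying set is the generalised arithmetic progression {\{a + n_1 r_1 + \ldots + n_d r_d: 0 \leq n_j < k_j\}}.

```python
def nested_progression(a, steps, lengths):
    """Progression of progressions: rank 1 is a list of integers, rank d a list of rank d-1 ones.

    steps = [r_1, ..., r_d] and lengths = [k_1, ..., k_d]; the outermost level uses r_d, k_d.
    """
    if len(steps) == 1:
        return [a + t * steps[0] for t in range(lengths[0])]
    return [nested_progression(a + t * steps[-1], steps[:-1], lengths[:-1])
            for t in range(lengths[-1])]


def elements(p):
    """Underlying set {a + n_1 r_1 + ... + n_d r_d : 0 <= n_j < k_j}, sorted."""
    if p and isinstance(p[0], list):
        return sorted({x for q in p for x in elements(q)})
    return sorted(set(p))


P2 = nested_progression(0, [1, 10], [3, 2])
print(P2)            # [[0, 1, 2], [10, 11, 12]]
print(elements(P2))  # [0, 1, 2, 10, 11, 12]
```

In the notation above, `P2` is {\{P_1, P_1 + r_2\}} with {P_1 = \{0,1,2\}} and {r_2 = 10}.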

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression {P, P + r, \ldots, P+(k-1)r} of progressions of length {k}, with each progression {P+ir}, {i=0,\ldots,k-1} being quite long, and containing a near-maximal number of elements of {A}, thus

\displaystyle  |A \cap (P+ir)| \approx \delta |P|

where {\delta := \lim \sup_{|P| \rightarrow \infty} \frac{|A \cap P|}{|P|}} is the “maximal density” of {A} along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as {\approx} instead.) By hypothesis, {\delta} is positive. The objective is then to locate a progression {a, a+r', \ldots,a+(k-1)r'} in {A}, with each {a+ir'} in {P+ir} for {i=0,\ldots,k-1}. It may help to view the progression of progressions {P, P + r, \ldots, P+(k-1)r} as a tall thin rectangle {P \times \{0,\ldots,k-1\}}.

If we write {A_i := \{ a \in P: a+ir \in A \}} for {i=0,\ldots,k-1}, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression {a_0,a_1,\ldots,a_{k-1}}, with each {a_i} in {A_i}.
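This reduction can be checked by brute force on a toy example. In the sketch below (hypothetical helpers `blocks` and `progressions_through_blocks`, with {P} an interval and {k \geq 2}), every progression {a_0,\ldots,a_{k-1}} with {a_i \in A_i} and common difference {s} lifts to the progression {a_i + ir = a_0 + i(s+r)}, {i=0,\ldots,k-1}, in {A}.

```python
def blocks(A, P, r, k):
    """A_i = {a in P : a + i r in A} for i = 0, ..., k-1."""
    return [{a for a in P if a + i * r in A} for i in range(k)]


def progressions_through_blocks(A, P, r, k):
    """All (possibly degenerate) progressions (a_0, ..., a_{k-1}) with a_i in A_i; assumes k >= 2."""
    A_sets = blocks(A, P, r, k)
    found = []
    for a0 in sorted(A_sets[0]):
        for a1 in sorted(A_sets[1]):
            s = a1 - a0  # common difference, determined by the first two terms
            if all(a0 + i * s in A_sets[i] for i in range(2, k)):
                found.append(tuple(a0 + i * s for i in range(k)))
    return found


P = range(20)
r, k = 20, 3
A = {3, 5, 27, 30, 49}
for prog in progressions_through_blocks(A, P, r, k):
    lifted = [a + i * r for i, a in enumerate(prog)]  # a k-term progression in A, step s + r
    print(prog, "->", lifted)  # (5, 7, 9) -> [5, 27, 49]
```

Here {A_0 = \{3,5\}}, {A_1 = \{7,10\}}, {A_2 = \{9\}}, and the only admissible choice is {(5,7,9)}, which lifts to the progression {5, 27, 49} of step {22} in {A}.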

By hypothesis, we know already that each set {A_i} has density about {\delta} in {P}:

\displaystyle  |A_i \cap P| \approx \delta |P|. \ \ \ \ \ (1)

Let us now make a “weakly mixing” assumption on the {A_i}, which roughly speaking asserts that

\displaystyle  |A_i \cap E| \approx \delta \sigma |P| \ \ \ \ \ (2)

for “most” subsets {E} of {P} of density {\approx \sigma} of a certain form to be specified shortly. This is a plausible type of assumption if one believes {A_i} to behave like a random set, and if the sets {E} are constructed “independently” of the {A_i} in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions {a_0,\ldots,a_{k-1}} of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims {C(i,j)} for each {0 \leq i \leq j < k}:

  • {C(i,j)}: For most choices of {a_j \in P}, there are {\sim \delta^i |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} with the specified choice of {a_j}, such that {a_l \in A_l} for all {l=0,\ldots,i-1}.

(Actually, to avoid boundary issues one should restrict {a_j} to lie in the middle third of {P}, rather than near the edges, but let us ignore this minor technical detail.) The quantity {\delta^i |P|} is natural here, given that there are {\sim |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} that pass through {a_j} in the {j^{th}} position, and that each one ought to have a probability of {\delta^i} or so that the events {a_0 \in A_0, \ldots, a_{i-1} \in A_{i-1}} simultaneously hold. If one has the claim {C(k-1,k-1)}, then by selecting a typical {a_{k-1}} in {A_{k-1}}, we obtain a progression {a_0,\ldots,a_{k-1}} with {a_i \in A_i} for all {i=0,\ldots,k-1}, as required. (In fact, we obtain about {\delta^k |P|^2} such progressions by this method.)

We can heuristically justify the claims {C(i,j)} by induction on {i}. For {i=0}, the claims {C(0,j)} are clear just from direct counting of progressions (as long as we keep {a_j} away from the edges of {P}). Now suppose that {i>0}, and the claims {C(i-1,j)} have already been proven. For any {i \leq j < k} and for most {a_j \in P}, we have from hypothesis that there are {\sim \delta^{i-1} |P|} progressions {a_0,\ldots,a_{k-1}} in {P} through {a_j} with {a_0 \in A_0,\ldots,a_{i-2}\in A_{i-2}}. Let {E = E(a_j)} be the set of all the values of {a_{i-1}} attained by these progressions, then {|E| \sim \delta^{i-1} |P|}. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of {a_j}, we have

\displaystyle  |A_{i-1} \cap E| \sim \delta^i |P|

which then gives the desired claim {C(i,j)}.

The observant reader will note that we only needed the claim {C(i,j)} in the case {j=k-1} for the above argument, but for technical reasons, the full proof requires one to work with more general values of {j} (also the claim {C(i,j)} needs to be replaced by a more complicated version of itself, but let’s ignore this for the sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block {A_i} of {A}, one can easily concoct a scenario in which this hypothesis fails, by choosing {E} to overlap with {A_i} too strongly, or to be too disjoint from {A_i}. However, one can do better if one can select {A_i} from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). Let {E} be a subset of {P} of density {\approx \sigma}. Then (if {M} is large enough) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E| \lessapprox \delta \sigma |P|.

Proof: The key is the double counting identity

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| = \sum_{a \in E} |A \cap \{ a, a+r, \ldots, a+(M-1) r\}|.

Because {A} has maximal density {\delta} and {M} is large, we have

\displaystyle  |A \cap \{ a, a+r, \ldots, a+(M-1) r\}| \lessapprox \delta M

for each {a}, and thus

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| \lessapprox \delta M |E|.

The claim then follows from the pigeonhole principle. \Box
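The double counting identity is exact for any set {A}, and the pigeonhole step simply picks the index with the smallest intersection. The following toy check is a sketch with a hypothetical helper `double_count` and a randomly generated {A} (which of course need not have the maximal density property used in the proof).

```python
import random


def double_count(A, P, r, M, E):
    """Both sides of the double counting identity, plus the pigeonholed index i.

    A_i = {a in P : a + i r in A} for i = 0, ..., M-1, and E is a subset of P.
    """
    A_blocks = [{a for a in P if a + i * r in A} for i in range(M)]
    lhs = sum(len(A_i & E) for A_i in A_blocks)
    rhs = sum(sum(1 for t in range(M) if a + t * r in A) for a in E)
    # Pigeonhole: the smallest |A_i & E| is at most the average lhs / M.
    best_i = min(range(M), key=lambda i: len(A_blocks[i] & E))
    return lhs, rhs, best_i, A_blocks


rng = random.Random(1)
L, M = 50, 30
r = L                              # the blocks P + i r tile {0, ..., L*M - 1}
P = range(L)
A = {x for x in range(L * M) if rng.random() < 0.3}
E = {a for a in P if rng.random() < 0.5}
lhs, rhs, best_i, A_blocks = double_count(A, P, r, M, E)
print(lhs == rhs, best_i, len(A_blocks[best_i] & E), "<=", rhs / M)
```

In the proof, the maximal density hypothesis bounds each inner count on the right-hand side by roughly {\delta M}, so the pigeonholed index satisfies {|A_i \cap E| \lessapprox \delta |E|}.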

Now suppose we want to obtain weak mixing not just for a single set {E}, but for a small number {E_1,\ldots,E_m} of such sets, i.e. we wish to find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P| \ \ \ \ \ (3)

for all {j=1,\ldots,m}, where {\sigma_j} is the density of {E_j} in {P}. The above proposition gives, for each {j}, a choice of {i} for which (3) holds, but it could be a different {i} for each {j}, and so it is not immediately obvious how to use Proposition 2 to find an {i} for which (3) holds simultaneously for all {j}. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_j} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {m}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: Suppose that the claim failed (for some suitably large {M}). Then, for each {i = 0,\ldots,M-1}, there exists {j \in \{1,\ldots,m\}} such that

\displaystyle  |A_i \cap E_j| \gg \delta \sigma_j |P|.

This can be viewed as a colouring of the interval {\{0,\ldots,M-1\}} by {m} colours. If we take {M} large compared to {m}, van der Waerden’s theorem allows us to then find a long subprogression {i_0, i_0+s, \ldots, i_0+(L-1)s} of {\{0,\ldots,M-1\}} which is monochromatic, so that {j} is constant on this progression. But then this furnishes a counterexample to Proposition 2, applied to the progression of progressions {P+i_0 r, P+(i_0+s)r, \ldots, P+(i_0+(L-1)s)r} and the translated set {E_j + i_0 r}. \Box
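In the proof, each index {i} is coloured by some {j} for which the bound fails, and van der Waerden's theorem supplies a monochromatic progression of indices. The brute-force search below (hypothetical helper `monochromatic_progression`, only practical for tiny inputs) finds such a progression in a finite colouring; the exhaustive check at the end reflects the known van der Waerden number {W(3;2) = 9}.

```python
from itertools import product


def monochromatic_progression(colour, length):
    """Return the first (start, step) with colour constant on start, start+step, ...,
    start+(length-1)*step, or None if there is no such progression."""
    if length < 1:
        raise ValueError("length must be at least 1")
    M = len(colour)
    for start in range(M):
        step = 1
        while start + (length - 1) * step < M:
            if len({colour[start + t * step] for t in range(length)}) == 1:
                return start, step
            step += 1
    return None


print(monochromatic_progression([i % 2 for i in range(20)], 3))  # (0, 2)
print(monochromatic_progression([0, 0, 1, 1, 0, 0, 1, 1], 3))    # None
# Every 2-colouring of 9 consecutive indices contains a monochromatic 3-term progression.
print(all(monochromatic_progression(c, 3) is not None for c in product((0, 1), repeat=9)))  # True
```

The colouring {0,0,1,1,0,0,1,1} of eight indices shows that nine is needed for two colours and length three; in the proof the required length and number of colours are much larger, which is why {M} must be taken large compared to {m}.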

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_j} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {m}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \approx \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: By applying the previous proposition to the collection of sets {E_1,\ldots,E_m} and their complements {P\backslash E_1,\ldots,P \backslash E_m} (thus replacing {m} with {2m}), one can find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

and

\displaystyle  |A_i \cap (P \backslash E_j)| \lessapprox \delta (1-\sigma_j) |P|

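Since {|A_i \cap E_j| + |A_i \cap (P \backslash E_j)| = |A_i| \approx \delta |P|}, both of these upper bounds must in fact be approximate equalities, which gives the claim. \Box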

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number {m} of sets {E_1,\ldots,E_m} for which mixing is established is too small compared with the length {M} of the progression one has to use in order to obtain that mixing. Fortunately, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of sets {E_v} to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {v} in some (large) finite set {V}, let {E_v} be a subset of {P} of density {\approx \sigma_v}. Then (if {M} is large enough, but not dependent on the size of {V}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_v| \approx \delta \sigma_v |P|

simultaneously for almost all {v \in V}.

Proof: We build a bipartite graph {G = (P, V, E)} connecting the progression {P} to the finite set {V} by placing an edge {(a,v)} between an element {a \in P} and an element {v \in V} whenever {a \in E_v}. The number {|E_v| \approx \sigma_v |P|} can then be interpreted as the degree of {v} in this graph, while the number {|A_i \cap E_v|} is the number of neighbours of {v} that land in {A_i}.

We now apply the regularity lemma to this graph {G}. Roughly speaking, what this lemma does is to partition {P} and {V} into almost equally sized cells {P = P_1 \cup \ldots \cup P_m} and {V = V_1 \cup \ldots \cup V_m} such that for most pairs {P_j, V_k} of cells, the graph {G} resembles a random bipartite graph of some density {d_{jk}} between these two cells. The key point is that the number {m} of cells here is bounded uniformly in the size of {P} and {V}. As a consequence of this lemma, one can show that for most vertices {v} in a typical cell {V_k}, the number {|E_v|} is approximately equal to

\displaystyle  |E_v| \approx \sum_{j=1}^m d_{jk} |P_j|

and the number {|A_i \cap E_v|} is approximately equal to

\displaystyle  |A_i \cap E_v| \approx \sum_{j=1}^m d_{jk} |A_i \cap P_j|.

The point here is that the {|V|} different statistics {|A_i \cap E_v|} are now controlled by a mere {m} statistics {|A_i \cap P_j|} (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an {i} for which

\displaystyle  |A_i \cap P_j| \approx \delta |P_j|

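simultaneously for all {j=1,\ldots,m}. For this {i}, the approximations above give {|A_i \cap E_v| \approx \delta |E_v| \approx \delta \sigma_v |P|} for most {v \in V}, and the claim follows. \Box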

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions {P, P+r, \ldots, P+(k-1)r} of progressions containing a near-maximal density of elements of {A}, we will now have to work with a family {(P_\lambda, P_\lambda+r_\lambda,\ldots,P_\lambda+(k-1)r_\lambda)_{\lambda \in \Lambda}} of such progressions of progressions, where {\Lambda} ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given {i}, it should be possible to find many reasonably large subfamilies of this family for which the {i^{th}} terms {P_\lambda + i r_\lambda} of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets {E_v} in Proposition 5 are not allowed to depend on {i}, one also needs the progressions {P_\lambda + i' r_\lambda} for any given {0 \leq i' < i} to be “similar” in the sense that they intersect {A} in the same fashion (thus the sets {A \cap (P_\lambda + i' r_\lambda)} as {\lambda} varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set {\Lambda} in order to gain good mixing properties for at least one of the sets {P_\lambda +i r_\lambda} in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank {d} than the rank {2} progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length {k} progressions, one has to consider generalised progressions of rank as high as {2^k+1}. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by {A} in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitably large generalised progression of some rank {d}, a family of large generalised progressions of rank {d-1} which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of {i}. (But getting this sort of property for all values of {i} simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index {i}, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank {2^k+1} that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals {{\bf Q}} or the reals {{\bf R}} are algebraically incomplete, because there are some non-trivial algebraic equations (such as {x^2=2} in the case of the rationals, or {x^2=-1} in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as {1=0}, just using the laws of algebra), but do not actually have solutions in the specified field.

Similarly, the rationals {{\bf Q}}, when viewed now as a metric space rather than as a field, are also metrically incomplete, because there exist sequences in the rationals (e.g. the decimal approximations {3, 3.1, 3.14, 3.141, \ldots} of the irrational number {\pi}) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.

A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Fraenkel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because they do not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but are not actually provable in this theory.

A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line {{\bf R}} is elementarily incomplete, because there exists a sequence of statements (such as the statements {0 < x < 1/n} for natural numbers {n=1,2,\ldots}) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number {x}) but are not actually simultaneously satisfiable in {{\bf R}}.

In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field {k}, one can take its algebraic completion (or algebraic closure) {\overline{k}}; for instance, {{\bf C} = \overline{{\bf R}}} can be viewed as the algebraic completion of {{\bf R}}. This field is usually significantly larger than the original field {k}, but contains {k} as a subfield, and every element of {\overline{k}} can be described as the solution to some polynomial equation with coefficients in {k}. Furthermore, {\overline{k}} is now algebraically complete (or algebraically closed): every polynomial equation in {\overline{k}} which is potentially satisfiable (in the sense that it does not lead to a contradiction such as {1=0} from the laws of algebra), is actually satisfiable in {\overline{k}}.

Similarly, starting with an arbitrary metric space {X}, one can take its metric completion {\overline{X}}; for instance, {{\bf R} = \overline{{\bf Q}}} can be viewed as the metric completion of {{\bf Q}}. Again, the completion {\overline{X}} is usually much larger than the original metric space {X}, but contains {X} as a subspace, and every element of {\overline{X}} can be described as the limit of some Cauchy sequence in {X}. Furthermore, {\overline{X}} is now a complete metric space: every sequence in {\overline{X}} which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convergent in {\overline{X}}.
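As a concrete illustration of a Cauchy sequence in {{\bf Q}} that is only "potentially" convergent, here is a small Python sketch using exact rational arithmetic. The choice of Newton's iteration for {x^2 = 2} is mine, not from the text; any rational sequence approximating an irrational number would do.

```python
from fractions import Fraction


def sqrt2_newton_sequence(steps: int) -> list[Fraction]:
    """Rational Newton iterates x_{k+1} = (x_k + 2/x_k) / 2 for x^2 = 2, from x_0 = 1.

    Every term lies in Q, consecutive gaps shrink at least geometrically (so the
    sequence is Cauchy), and x_k^2 - 2 -> 0, yet no term squares to 2 because
    sqrt(2) is irrational: the limit exists only in the completion R.
    """
    x = Fraction(1)
    seq = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq


if __name__ == "__main__":
    for x in sqrt2_newton_sequence(4):
        print(x, float(x * x - 2))
```

The first few terms are {1, 3/2, 17/12, 577/408, \ldots}; their squares overshoot {2} by ever smaller positive amounts.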

In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory {T} for a first-order language {L}, there exists at least one completion {\overline{T}} of that theory {T}, which is a consistent theory in which every sentence in {L} which is potentially true in {\overline{T}} (because it does not lead to a contradiction in {\overline{T}}) is actually true in {\overline{T}}. Indeed, the completeness theorem provides at least one model (or structure) {{\mathfrak U}} of the consistent theory {T}, and then the completion {\overline{T} = \hbox{Th}({\mathfrak U})} can be formed by interpreting every sentence in {L} using {{\mathfrak U}} to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory {T} can have multiple inequivalent models {{\mathfrak U}}, giving rise to distinct completions of the same theory.

Finally, if one starts with an arbitrary structure {{\mathfrak U}}, one can form an elementary completion {{}^* {\mathfrak U}} of it, which is a significantly larger structure which contains {{\mathfrak U}} as a substructure, and such that every element of {{}^* {\mathfrak U}} is an elementary limit of a sequence of elements in {{\mathfrak U}} (I will define this term shortly). Furthermore, {{}^* {\mathfrak U}} is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in {{}^* {\mathfrak U}} (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure {{\mathfrak U}}. If {{\mathfrak U}} is the standard universe of all the standard objects one considers in mathematics, then its elementary completion {{}^* {\mathfrak U}} is known as the nonstandard universe, and is the setting for nonstandard analysis.

As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as {{\bf Q}}, then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers {{\bf Z}}, then the resulting completion {{}^* {\bf Z}} will necessarily be uncountable.
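The claim that an algebraic completion of a finite field must be infinite rests on a classical observation: for a finite field {F}, the polynomial {\prod_{a \in F} (x-a) + 1} takes the value {1} at every point of {F}, so it has no root in {F}. The Python sketch below checks this for a few prime fields {{\bf F}_p}, represented as residues mod {p} (an illustrative choice; the argument itself works verbatim for every finite field).

```python
from math import prod


def root_free_values(p: int) -> list[int]:
    """Values of f(x) = prod_{a in F_p} (x - a) + 1 at each x in F_p, for p prime.

    For every x one factor is (x - x) = 0, so f(x) = 1 != 0: the non-constant
    polynomial f has no root in F_p, so F_p is not algebraically closed.
    """
    return [(prod(x - a for a in range(p)) + 1) % p for x in range(p)]


if __name__ == "__main__":
    for p in (2, 3, 5, 7):
        print(p, root_free_values(p))
```

Since the same construction applies to any finite field, an algebraically closed field can never be finite.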

However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.

In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.


In a previous post, we discussed the Szemerédi regularity lemma, and how a given graph could be regularised by partitioning the vertex set into random neighbourhoods. More precisely, we gave a proof of

Lemma 1 (Regularity lemma via random neighbourhoods) Let {\varepsilon > 0}. Then there exist integers {M_1,\ldots,M_m} with the following property: whenever {G = (V,E)} is a graph on finitely many vertices, if one selects one of the integers {M_r} at random from {M_1,\ldots,M_m}, then selects {M_r} vertices {v_1,\ldots,v_{M_r} \in V} uniformly from {V} at random, then the {2^{M_r}} vertex cells {V^{M_r}_1,\ldots,V^{M_r}_{2^{M_r}}} (some of which can be empty) generated by the vertex neighbourhoods {A_t := \{ v \in V: (v,v_t) \in E \}} for {1 \leq t \leq M_r}, will obey the regularity property

\displaystyle  \sum_{(V_i,V_j) \hbox{ not } \varepsilon-\hbox{regular}} |V_i| |V_j| \leq \varepsilon |V|^2 \ \ \ \ \ (1)

with probability at least {1-O(\varepsilon)}, where (writing {V_i} for {V^{M_r}_i}) the sum is over all pairs {1 \leq i \leq j \leq 2^{M_r}} for which {G} is not {\varepsilon}-regular between {V_i} and {V_j}. [Recall that a pair {(V_i,V_j)} is {\varepsilon}-regular for {G} if one has

\displaystyle  |d( A, B ) - d( V_i, V_j )| \leq \varepsilon

for any {A \subset V_i} and {B \subset V_j} with {|A| \geq \varepsilon |V_i|, |B| \geq \varepsilon |V_j|}, where {d(A,B) := |E \cap (A \times B)|/|A| |B|} is the density of edges between {A} and {B}.]
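The sampling procedure in Lemma 1 is easy to phrase algorithmically. The following Python sketch assumes the graph is stored as symmetric adjacency sets without loops, draws {M_r} from a list of candidate values (the actual integers {M_1,\ldots,M_m} depend on {\varepsilon} and are not specified, so the list here is hypothetical), samples vertices independently and uniformly (with replacement, an assumption), and groups each vertex by its membership vector in the neighbourhoods {A_t}. It only builds the cells; it does not check the regularity property (1).

```python
import random
from collections import defaultdict

# Assumed representation: adjacency sets, symmetric, no loops.
Graph = dict[int, set[int]]


def neighbourhood_cells(adj: Graph, samples: list[int]) -> dict[tuple[bool, ...], list[int]]:
    """Venn-diagram cells generated by the neighbourhoods A_t = {v : v ~ v_t}.

    A vertex's key records, for each sampled v_t, whether it lies in A_t, so
    there are at most 2^M keys; only the non-empty cells are returned.
    """
    cells: dict[tuple[bool, ...], list[int]] = defaultdict(list)
    for v in adj:
        key = tuple(v in adj[vt] for vt in samples)
        cells[key].append(v)
    return dict(cells)


def random_neighbourhood_partition(adj: Graph, choices: list[int], rng: random.Random):
    """Draw M_r uniformly from `choices` (a hypothetical stand-in for M_1..M_m),
    then M_r vertices independently and uniformly (with replacement)."""
    m = rng.choice(choices)
    vertices = list(adj)
    samples = [rng.choice(vertices) for _ in range(m)]
    return m, samples, neighbourhood_cells(adj, samples)


if __name__ == "__main__":
    # Toy graph: u ~ v iff u != v and u + v is divisible by 3.
    adj = {v: {u for u in range(8) if u != v and (u + v) % 3 == 0} for v in range(8)}
    m, samples, cells = random_neighbourhood_partition(adj, [1, 2, 3], random.Random(0))
    print(m, samples, cells)
```

Note that the cells are described "implicitly" by the sampled vertices {v_1,\ldots,v_{M_r}}: to locate a vertex one only needs its adjacency to those {M_r} vertices.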

The proof was a combinatorial one, based on the standard energy increment argument.

In this post I would like to discuss an alternate approach to the regularity lemma, which is an infinitary approach passing through a graph-theoretic version of the Furstenberg correspondence principle (mentioned briefly in this earlier post of mine). While this approach superficially looks quite different from the combinatorial approach, it in fact uses many of the same ingredients, most notably a reliance on random neighbourhoods to regularise the graph. This approach was introduced by myself back in 2006, and used by Austin and by Austin and myself to establish some property testing results for hypergraphs; more recently, a closely related infinitary hypergraph removal lemma developed in the 2006 paper was also used by Austin to give new proofs of the multidimensional Szemerédi theorem and of the density Hales-Jewett theorem (the latter being a spinoff of the polymath1 project).

For various technical reasons we will not be able to use the correspondence principle to recover Lemma 1 in its full strength; instead, we will establish the following slightly weaker variant.

Lemma 2 (Regularity lemma via random neighbourhoods, weak version) Let {\varepsilon > 0}. Then there exists an integer {M_*} with the following property: whenever {G = (V,E)} is a graph on finitely many vertices, there exists {1 \leq M \leq M_*} such that if one selects {M} vertices {v_1,\ldots,v_{M} \in V} uniformly from {V} at random, then the {2^{M}} vertex cells {V^{M}_1,\ldots,V^{M}_{2^{M}}} generated by the vertex neighbourhoods {A_t := \{ v \in V: (v,v_t) \in E \}} for {1 \leq t \leq M}, will obey the regularity property (1) with probability at least {1-\varepsilon}.

Roughly speaking, Lemma 1 asserts that one can regularise a large graph {G} with high probability by using {M_r} random neighbourhoods, where {M_r} is chosen at random from one of a number of choices {M_1,\ldots,M_m}; in contrast, the weaker Lemma 2 asserts that one can regularise a large graph {G} with high probability by using some integer {M} from {1,\ldots,M_*}, but the exact choice of {M} depends on {G}, and it is not guaranteed that a randomly chosen {M} will be likely to work. While Lemma 2 is strictly weaker than Lemma 1, it still implies the (weighted) Szemerédi regularity lemma (Lemma 2 from the previous post).


In the theory of dense graphs on {n} vertices, where {n} is large, a fundamental role is played by the Szemerédi regularity lemma:

Lemma 1 (Regularity lemma, standard version) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0} and {k_0 \geq 0}. Then there exists a partition of the vertices {V = V_1 \cup \ldots \cup V_k}, with {k_0 \leq k \leq C(k_0,\epsilon)} bounded below by {k_0} and above by a quantity {C(k_0,\epsilon)} depending only on {k_0, \epsilon}, obeying the following properties:

  • (Equitable partition) For any {1 \leq i,j \leq k}, the cardinalities {|V_i|, |V_j|} of {V_i} and {V_j} differ by at most {1}.
  • (Regularity) For all but at most {\epsilon k^2} pairs {1 \leq i < j \leq k}, the portion of the graph {G} between {V_i} and {V_j} is {\epsilon}-regular in the sense that one has

    \displaystyle  |d( A, B ) - d( V_i, V_j )| \leq \epsilon

    for any {A \subset V_i} and {B \subset V_j} with {|A| \geq \epsilon |V_i|, |B| \geq \epsilon |V_j|}, where {d(A,B) := |E \cap (A \times B)|/|A| |B|} is the density of edges between {A} and {B}.
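To make the definition of {\epsilon}-regularity concrete, here is a brute-force Python check of it for very small graphs, using exact fractions for the densities {d(A,B)}. It enumerates all admissible {A \subset V_i}, {B \subset V_j}, so it is exponential in {|V_i| + |V_j|}; this exhaustive search is exactly the kind of cost that the discussion of efficient regularisation later in the post is concerned with. The adjacency-set representation is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def density(adj, A, B) -> Fraction:
    """d(A,B) = |E ∩ (A x B)| / (|A||B|), counting ordered pairs (a, b)."""
    edges = sum(1 for a in A for b in B if b in adj[a])
    return Fraction(edges, len(A) * len(B))


def is_eps_regular(adj, Vi, Vj, eps) -> bool:
    """Brute-force test of eps-regularity of (Vi, Vj); only for tiny examples."""
    base = density(adj, Vi, Vj)
    # |A| >= eps|Vi| with |A| an integer means |A| >= ceil(eps|Vi|); A must be non-empty.
    min_a = max(1, ceil(eps * len(Vi)))
    min_b = max(1, ceil(eps * len(Vj)))
    for ka in range(min_a, len(Vi) + 1):
        for A in combinations(Vi, ka):
            for kb in range(min_b, len(Vj) + 1):
                for B in combinations(Vj, kb):
                    if abs(density(adj, A, B) - base) > eps:
                        return False
    return True
```

For example, a complete bipartite graph between {V_i} and {V_j} is {\epsilon}-regular for every {\epsilon}, whereas a pair whose edges are concentrated on small subsets {A}, {B} fails the test once {A}, {B} are large enough to be admissible.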

This lemma becomes useful in the regime when {n} is very large compared to {k_0} or {1/\epsilon}, because all the conclusions of the lemma are uniform in {n}. Very roughly speaking, it says that “up to errors of size {\epsilon}”, a large graph can be more or less described completely by a bounded number of quantities {d(V_i, V_j)}. This can be interpreted as saying that the space of all graphs is totally bounded (and hence precompact) in a suitable metric space, thus allowing one to take formal limits of sequences (or subsequences) of graphs; see for instance this paper of Lovász and Szegedy for a discussion.

For various technical reasons it is easier to work with a slightly weaker version of the lemma, which allows for the cells {V_1,\ldots,V_k} to have unequal sizes:

Lemma 2 (Regularity lemma, weighted version) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition of the vertices {V = V_1 \cup \ldots \cup V_k}, with {1 \leq k \leq C(\epsilon)} bounded above by a quantity {C(\epsilon)} depending only on {\epsilon}, obeying the following property:

  • (Regularity) One has

    \displaystyle  \sum_{(V_i,V_j) \hbox{ not } \epsilon-\hbox{regular}} |V_i| |V_j| = O(\epsilon |V|^2) \ \ \ \ \ (1)

    where the sum is over all pairs {1 \leq i \leq j \leq k} for which {G} is not {\epsilon}-regular between {V_i} and {V_j}.
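The left-hand side of (1), normalised by {|V|^2}, is straightforward to compute once an {\epsilon}-regularity test is available. In the Python sketch below that test is passed in as a function (any implementation, for instance a brute-force one, could be supplied), which keeps the weighting by {|V_i| |V_j|} separate from the regularity check itself. The function name is hypothetical.

```python
from collections.abc import Callable, Sequence
from fractions import Fraction
from itertools import combinations_with_replacement


def irregular_mass(
    cells: Sequence[Sequence[int]],
    is_regular: Callable[[Sequence[int], Sequence[int]], bool],
) -> Fraction:
    """Return (sum of |V_i||V_j| over non-regular pairs i <= j) / |V|^2.

    Lemma 2 asks for this quantity to be O(epsilon); pairs of large cells
    dominate, which is the cell weighting by |V_i|/|V| discussed below.
    """
    n = sum(len(c) for c in cells)
    bad = sum(
        len(ci) * len(cj)
        for ci, cj in combinations_with_replacement(cells, 2)  # pairs with i <= j
        if not is_regular(ci, cj)
    )
    return Fraction(bad, n * n)
```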

While Lemma 2 is, strictly speaking, weaker than Lemma 1 in that it does not enforce the equitable size property between the atoms, in practice it seems that the two lemmas are roughly of equal utility; most of the combinatorial consequences of Lemma 1 can also be proven using Lemma 2. The point is that one always has to remember to weight each cell {V_i} by its density {|V_i|/|V|}, rather than by giving each cell an equal weight as in Lemma 1. Lemma 2 also has the advantage that one can easily generalise the result from finite vertex sets {V} to other probability spaces (for instance, one could weight {V} with something other than the uniform distribution). For applications to hypergraph regularity, it turns out to be slightly more convenient to have two partitions (coarse and fine) rather than just one; see for instance my own paper on this topic. In any event, the arguments we give below to prove Lemma 2 can also be modified to give a proof of Lemma 1.

The proof of the regularity lemma is usually conducted by a greedy algorithm. Very roughly speaking, one starts with the trivial partition of {V}. If this partition already regularises the graph, we are done; if not, this means that there are some sets {A} and {B} in which there is a significant density fluctuation beyond what has already been detected by the original partition. One then adds these sets to the partition and iterates the argument. Every time a new density fluctuation is incorporated into the partition that models the original graph, this increases a certain “index” or “energy” of the partition by an amount bounded below by a quantity depending only on {\epsilon}. On the other hand, this energy remains bounded no matter how complex the partition, so eventually one must reach a long “energy plateau” in which no further refinement is possible, at which point one can find the regular partition.
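One common choice of “energy” (assumed here; the paragraph above does not fix a specific one) is the mean square density {\sum_{i,j} \frac{|V_i| |V_j|}{|V|^2} d(V_i,V_j)^2} over ordered pairs of cells. It is at most {1} since densities lie in {[0,1]}, and by the Cauchy-Schwarz inequality it can only increase when the partition is refined. The Python sketch below computes it exactly, assuming adjacency sets and non-empty cells.

```python
from fractions import Fraction


def energy(adj, cells) -> Fraction:
    """Index q(P) = sum over ordered pairs (i, j) of (|V_i||V_j|/|V|^2) d(V_i, V_j)^2."""
    n = sum(len(c) for c in cells)
    total = Fraction(0)
    for Vi in cells:
        for Vj in cells:
            e = sum(1 for a in Vi for b in Vj if b in adj[a])
            # (|Vi||Vj|/n^2) * (e / (|Vi||Vj|))^2 simplifies to e^2 / (|Vi||Vj| n^2).
            total += Fraction(e * e, len(Vi) * len(Vj) * n * n)
    return total
```

Comparing a partition with one of its refinements shows the monotonicity that drives the greedy argument: each refinement step never lowers the index, and the index is capped at {1}.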

One disadvantage of the greedy algorithm is that it is not efficient in the limit {n \rightarrow \infty}, as it requires one to search over all pairs of subsets {A, B} of a given pair {V_i, V_j} of cells, which is an exponentially long search. There are more algorithmically efficient ways to regularise; for instance, a polynomial time algorithm was given by Alon, Duke, Lefmann, Rödl, and Yuster. However, one can do even better, if one is willing to (a) allow cells of unequal size, (b) allow a small probability of failure, (c) have the ability to sample vertices from {G} at random, and (d) allow for the cells to be defined “implicitly” (via their relationships with a fixed set of reference vertices) rather than “explicitly” (as a list of vertices). In that case, one can regularise a graph in a number of operations bounded independently of {n}. Indeed, one has

Lemma 3 (Regularity lemma via random neighbourhoods) Let {\epsilon > 0}. Then there exist integers {M_1,\ldots,M_m} with the following property: whenever {G = (V,E)} is a graph on finitely many vertices, if one selects one of the integers {M_r} at random from {M_1,\ldots,M_m}, then selects {M_r} vertices {v_1,\ldots,v_{M_r} \in V} uniformly from {V} at random, then the {2^{M_r}} vertex cells {V^{M_r}_1,\ldots,V^{M_r}_{2^{M_r}}} (some of which can be empty) generated by the vertex neighbourhoods {A_t := \{ v \in V: (v,v_t) \in E \}} for {1 \leq t \leq M_r}, will obey the conclusions of Lemma 2 with probability at least {1-O(\epsilon)}.

Thus, roughly speaking, one can regularise a graph simply by taking a large number of random vertex neighbourhoods, and using the partition (or Venn diagram) generated by these neighbourhoods as the partition. The intuition is that if there is any non-uniformity in the graph (e.g. if the graph exhibits bipartite behaviour), this will bias the random neighbourhoods to seek out the partitions that would regularise that non-uniformity (e.g. vertex neighbourhoods would begin to fill out the two vertex cells associated to the bipartite property); if one takes sufficiently many such random neighbourhoods, the probability that all detectable non-uniformity is captured by the partition should converge to {1}. (It is more complicated than this, because the finer one makes the partition, the finer the types of non-uniformity one can begin to detect, but this is the basic idea.)
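The bipartite intuition can be checked directly in the simplest case. In a complete bipartite graph with sides {L} and {R}, the neighbourhood of any sampled vertex is the opposite side, so a vertex in {L} lies in {A_t} exactly when {v_t \in R}, and a vertex in {R} exactly when {v_t \in L}. Hence, whichever vertices are sampled (at least one), the Venn diagram of the neighbourhoods has exactly two non-empty cells, namely {L} and {R}. The Python sketch below (adjacency sets assumed, helper names hypothetical) confirms this.

```python
import random


def complete_bipartite(left: list[int], right: list[int]) -> dict[int, set[int]]:
    """Adjacency sets of the complete bipartite graph between `left` and `right`."""
    adj = {v: set() for v in left + right}
    for a in left:
        for b in right:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def cells_from_samples(adj: dict[int, set[int]], samples: list[int]) -> list[list[int]]:
    """Non-empty Venn cells of the sampled vertices' neighbourhoods, sorted."""
    cells: dict[tuple[bool, ...], list[int]] = {}
    for v in adj:
        key = tuple(v in adj[vt] for vt in samples)
        cells.setdefault(key, []).append(v)
    return sorted(sorted(c) for c in cells.values())


if __name__ == "__main__":
    left, right = list(range(4)), list(range(4, 9))
    adj = complete_bipartite(left, right)
    rng = random.Random(1)
    samples = [rng.randrange(9) for _ in range(5)]
    print(samples, cells_from_samples(adj, samples))  # the two sides, whatever the samples
```

For more complicated graphs a single neighbourhood no longer suffices, which is why the lemma takes many random neighbourhoods and randomises their number.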

This fact seems to be reasonably well-known folklore, discovered independently by many authors; it is for instance quite close to the graph property testing results of Alon and Shapira, and also appears implicitly in a paper of Ishigami, as well as a paper of Austin (and perhaps even more implicitly in a paper of myself). However, in none of these papers is the above lemma stated explicitly. I was asked about this lemma recently, so I decided to provide a proof here.

