
Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:

Lemma 1 (Szemerédi regularity lemma) Let ${G = (V,E)}$ be a graph on ${n}$ vertices, and let ${\epsilon > 0}$. Then there exists a partition ${V = V_1 \cup \ldots \cup V_M}$ for some ${M \leq M(\epsilon)}$ with the property that for all but at most ${\epsilon M^2}$ of the pairs ${1 \leq i \leq j \leq M}$, the pair ${V_i, V_j}$ is ${\epsilon}$-regular in the sense that

$\displaystyle | d( A, B ) - d( V_i, V_j ) | \leq \epsilon$

whenever ${A \subset V_i, B \subset V_j}$ are such that ${|A| \geq \epsilon |V_i|}$ and ${|B| \geq \epsilon |V_j|}$, and ${d(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|/|A| |B|}$ is the edge density between ${A}$ and ${B}$. Furthermore, the partition is equitable in the sense that ${||V_i| - |V_j|| \leq 1}$ for all ${1 \leq i \leq j \leq M}$.
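For readers who like to experiment, the definition of an ${\epsilon}$-regular pair can be tested by brute force on tiny examples. The following sketch (the toy graphs, cell sizes and ${\epsilon}$ values are invented for illustration, and the subset enumeration is exponential, so this is only feasible for very small cells) contrasts a complete bipartite pair, which is ${\epsilon}$-regular for every ${\epsilon}$, with the classic "parity" pair, which is not:

```python
import itertools

V1, V2 = list(range(8)), list(range(8, 16))

def density(E, A, B):
    """Edge density d(A,B) = |E cap (A x B)| / (|A||B|)."""
    return sum(1 for a in A for b in B if (a, b) in E) / (len(A) * len(B))

def is_regular(E, V1, V2, eps):
    """Check |d(A,B) - d(V1,V2)| <= eps for all A, B with |A| >= eps|V1|, |B| >= eps|V2|.

    Exponential in |V1| + |V2|: only sensible for tiny cells."""
    d0 = density(E, V1, V2)
    for ka in range(max(1, int(eps * len(V1))), len(V1) + 1):
        for A in itertools.combinations(V1, ka):
            for kb in range(max(1, int(eps * len(V2))), len(V2) + 1):
                for B in itertools.combinations(V2, kb):
                    if abs(density(E, A, B) - d0) > eps:
                        return False
    return True

complete = {(a, b) for a in V1 for b in V2}
parity = {(a, b) for a in V1 for b in V2 if (a + b) % 2 == 0}

assert is_regular(complete, V1, V2, eps=0.5)      # constant density: regular for every eps
assert not is_regular(parity, V1, V2, eps=0.25)   # the even/even sub-pair has density 1, not 1/2
```

The parity graph is the standard example showing that a pair can have the "right" global density ${1/2}$ while being highly irregular on large sub-pairs.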

There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of ${G}$, which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.

For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells ${V_i}$ by their magnitude when counting bad pairs):

Lemma 2 (Szemerédi regularity lemma, weakened variant) Let ${G = (V,E)}$ be a graph on ${n}$ vertices, and let ${\epsilon > 0}$. Then there exists a partition ${V = V_1 \cup \ldots \cup V_M}$ for some ${M \leq M(\epsilon)}$ with the property that for all pairs ${(i,j) \in \{1,\ldots,M\}^2}$ outside of an exceptional set ${\Sigma}$, one has

$\displaystyle | E(A,B) - d_{ij} |A| |B| | \ll \epsilon |V_i| |V_j| \ \ \ \ \ (1)$

whenever ${A \subset V_i, B \subset V_j}$, for some real number ${d_{ij}}$, where ${E(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|}$ is the number of edges between ${A}$ and ${B}$. Furthermore, we have

$\displaystyle \sum_{(i,j) \in \Sigma} |V_i| |V_j| \ll \epsilon |V|^2. \ \ \ \ \ (2)$

Let us now prove Lemma 2. We enumerate ${V}$ (after relabeling) as ${V = \{1,\ldots,n\}}$. The adjacency matrix ${T}$ of the graph ${G}$ is then a self-adjoint ${n \times n}$ matrix, and thus admits an eigenvalue decomposition

$\displaystyle T = \sum_{i=1}^n \lambda_i u_i^* u_i$

for some orthonormal basis ${u_1,\ldots,u_n}$ of ${{\bf C}^n}$ and some eigenvalues ${\lambda_1,\ldots,\lambda_n \in {\bf R}}$, which we arrange in decreasing order of magnitude:

$\displaystyle |\lambda_1| \geq \ldots \geq |\lambda_n|.$

We can compute the trace of ${T^2}$ as

$\displaystyle \hbox{tr}(T^2) = \sum_{i=1}^n |\lambda_i|^2.$

But we also have ${\hbox{tr}(T^2) = 2|E| \leq n^2}$, so

$\displaystyle \sum_{i=1}^n |\lambda_i|^2 \leq n^2. \ \ \ \ \ (3)$

Among other things, this implies that

$\displaystyle |\lambda_i| \leq \frac{n}{\sqrt{i}} \ \ \ \ \ (4)$

for all ${i \geq 1}$.
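Both the trace identity (3) and the bound (4) are easy to confirm numerically; here is a quick check on a random toy graph (the graph itself is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 40
# Random symmetric 0/1 adjacency matrix with zero diagonal.
A = rng.integers(0, 2, size=(n, n))
T = np.triu(A, k=1)
T = T + T.T

lam = np.sort(np.abs(np.linalg.eigvalsh(T)))[::-1]   # |lambda_1| >= ... >= |lambda_n|
num_edges = T.sum() // 2

# (3): tr(T^2) = sum_i |lambda_i|^2 = 2|E| <= n^2
assert abs((lam ** 2).sum() - 2 * num_edges) < 1e-6
assert (lam ** 2).sum() <= n ** 2

# (4): since the |lambda_i| are decreasing, i |lambda_i|^2 <= sum_{j <= i} |lambda_j|^2 <= n^2,
# hence |lambda_i| <= n / sqrt(i).
for i in range(1, n + 1):
    assert lam[i - 1] <= n / np.sqrt(i) + 1e-9
```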

Let ${F: {\bf N} \rightarrow {\bf N}}$ be a function (depending on ${\epsilon}$) to be chosen later, with ${F(i) \geq i}$ for all ${i}$. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find ${J \leq C(F,\epsilon)}$ such that

$\displaystyle \sum_{J \leq i < F(J)} |\lambda_i|^2 \leq \epsilon^3 n^2.$

(Indeed, the bound on ${J}$ is basically ${F}$ iterated ${1/\epsilon^3}$ times.) We can now split

$\displaystyle T = T_1 + T_2 + T_3, \ \ \ \ \ (5)$

where ${T_1}$ is the “structured” component

$\displaystyle T_1 := \sum_{i < J} \lambda_i u_i^* u_i, \ \ \ \ \ (6)$

${T_2}$ is the “small” component

$\displaystyle T_2 := \sum_{J \leq i < F(J)} \lambda_i u_i^* u_i, \ \ \ \ \ (7)$

and ${T_3}$ is the “pseudorandom” component

$\displaystyle T_3 := \sum_{i \geq F(J)} \lambda_i u_i^* u_i. \ \ \ \ \ (8)$
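The selection of ${J}$ and the resulting three-way split can be carried out concretely. In the sketch below (the growth function ${F}$ and the toy graph are invented for the example), ${J}$ is found by iterating ${F}$ until the spectral energy in the window ${[J, F(J))}$ is small, and the three pieces are verified to recombine to ${T}$:

```python
import numpy as np

rng = np.random.default_rng(0)

n, eps = 60, 0.5
A = np.triu(rng.integers(0, 2, (n, n)), 1)
T = (A + A.T).astype(float)

vals, vecs = np.linalg.eigh(T)
order = np.argsort(-np.abs(vals))
lam, u = vals[order], vecs[:, order]      # |lambda_1| >= ... >= |lambda_n|

F = lambda j: 2 * j + 1                   # invented growth function with F(j) > j

# Pigeonhole: iterate F until the energy in the window [J, F(J)) is at most eps^3 n^2.
# The windows visited are disjoint and the total energy is at most n^2, so this stops
# after at most 1/eps^3 iterations.
J = 1
while (lam[J - 1:min(F(J), n + 1) - 1] ** 2).sum() > eps ** 3 * n * n:
    J = F(J)

def spectral_piece(lo, hi):
    """Sum of lambda_i u_i u_i^* over lo <= i < hi (1-indexed)."""
    idx = slice(lo - 1, min(hi, n + 1) - 1)
    return (u[:, idx] * lam[idx]) @ u[:, idx].T

T1 = spectral_piece(1, J)                 # structured
T2 = spectral_piece(J, F(J))              # small
T3 = spectral_piece(F(J), n + 1)          # pseudorandom

assert np.allclose(T1 + T2 + T3, T)                    # the decomposition (5)
assert np.trace(T2 @ T2) <= eps ** 3 * n * n + 1e-6    # guaranteed by the choice of J
```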

We now design a vertex partition to make ${T_1}$ approximately constant on most cells. For each ${i < J}$, we partition ${V}$ into ${O_{J,\epsilon}(1)}$ cells on which ${u_i}$ (viewed as a function from ${V}$ to ${{\bf C}}$) only fluctuates by ${O(\epsilon n^{-1/2} /J)}$, plus an exceptional cell of size ${O( \frac{\epsilon}{J} |V|)}$ coming from the values where ${|u_i|}$ is excessively large (larger than ${\sqrt{\frac{J}{\epsilon}} n^{-1/2}}$). Combining all these partitions together, we can write ${V = V_1 \cup \ldots \cup V_{M-1} \cup V_M}$ for some ${M = O_{J,\epsilon}(1)}$, where ${V_M}$ has cardinality at most ${\epsilon |V|}$, and for all ${1 \leq i \leq M-1}$, the eigenfunctions ${u_1,\ldots,u_{J-1}}$ all fluctuate by at most ${O(\epsilon n^{-1/2}/J)}$ on ${V_i}$. In particular, if ${1 \leq i,j \leq M-1}$, then (by (4) and (6)) the entries of ${T_1}$ fluctuate by at most ${O(\epsilon)}$ on each block ${V_i \times V_j}$. If we let ${d_{ij}}$ be the mean value of these entries on ${V_i \times V_j}$, we thus have

$\displaystyle 1_B^* T_1 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (9)$

for any ${1 \leq i,j \leq M-1}$ and ${A \subset V_i, B \subset V_j}$, where we view the indicator functions ${1_A, 1_B}$ as column vectors of dimension ${n}$.
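The partition in this step is obtained by simultaneously discretising the top eigenvectors: each vertex is labelled by the rounded values of ${u_1,\ldots,u_{J-1}}$ at that vertex, with vertices carrying excessively large eigenvector entries thrown into an exceptional cell. A minimal numerical sketch of this discretisation (the toy graph and the parameter choices are invented; no attempt is made here to track the resulting constants):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

n, J, eps = 60, 3, 0.5
A = np.triu(rng.integers(0, 2, (n, n)), 1)
T = A + A.T

vals, vecs = np.linalg.eigh(T)
order = np.argsort(-np.abs(vals))
top = vecs[:, order[:J - 1]]                 # u_1, ..., u_{J-1} as columns

height = np.sqrt(J / eps) * n ** -0.5        # threshold for "excessively large" entries
grid = eps * n ** -0.5 / J                   # fluctuation scale within a cell

cells = defaultdict(list)
for v in range(n):
    if np.any(np.abs(top[v]) > height):
        cells['exceptional'].append(v)
    else:
        # label = simultaneous rounding of all top eigenvector entries
        cells[tuple(np.floor(top[v] / grid).astype(int))].append(v)

# Within each non-exceptional cell, every u_i fluctuates by at most the grid width.
for label, cell in cells.items():
    if label == 'exceptional':
        continue
    for i in range(J - 1):
        col = top[cell, i]
        assert col.max() - col.min() <= grid + 1e-12
print(len(cells), "cells")
```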

Next, we observe from (3) and (7) that ${\hbox{tr} T_2^2 \leq \epsilon^3 n^2}$. If we let ${x_{ab}}$ be the coefficients of ${T_2}$, we thus have

$\displaystyle \sum_{a,b \in V} |x_{ab}|^2 \leq \epsilon^3 n^2$

and hence by Markov’s inequality we have

$\displaystyle \sum_{a \in V_i} \sum_{b \in V_j} |x_{ab}|^2 \leq \epsilon^2 |V_i| |V_j| \ \ \ \ \ (10)$

for all pairs ${(i,j) \in \{1,\ldots,M-1\}^2}$ outside of an exceptional set ${\Sigma_1}$ with

$\displaystyle \sum_{(i,j) \in \Sigma_1} |V_i| |V_j| \leq \epsilon |V|^2.$

If ${(i,j) \in \{1,\ldots,M-1\}^2}$ avoids ${\Sigma_1}$, we thus have

$\displaystyle 1_B^* T_2 1_A = O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (11)$

for any ${A \subset V_i, B \subset V_j}$, by (10) and the Cauchy-Schwarz inequality.

Finally, to control ${T_3}$ we see from (4) and (8) that ${T_3}$ has an operator norm of at most ${n/F(J)^{1/2}}$. In particular, we have from the Cauchy-Schwarz inequality that

$\displaystyle 1_B^* T_3 1_A = O( n^2 / F(J)^{1/2} ) \ \ \ \ \ (12)$

for any ${A, B \subset V}$.

Let ${\Sigma}$ be the set of all pairs ${(i,j) \in \{1,\ldots,M\}^2}$ where either ${(i,j) \in \Sigma_1}$, ${i = M}$, ${j=M}$, or

$\displaystyle \min(|V_i|, |V_j|) \leq \frac{\epsilon}{M} n.$

One easily verifies that (2) holds. If ${(i,j) \in \{1,\ldots,M\}^2}$ is not in ${\Sigma}$, then by summing (9), (11), (12) and using (5), we see that

$\displaystyle 1_B^* T 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) + O( n^2 / F(J)^{1/2} ) \ \ \ \ \ (13)$

for all ${A \subset V_i, B \subset V_j}$. The left-hand side is just ${E(A,B)}$. As ${(i,j) \not \in \Sigma}$, we have

$\displaystyle |V_i|, |V_j| > \frac{\epsilon}{M} n$

and so (since ${M = O_{J,\epsilon}(1)}$)

$\displaystyle n^2 / F(J)^{1/2} \ll_{J,\epsilon} |V_i| |V_j| / F(J)^{1/2}.$

If we let ${F}$ be a sufficiently rapidly growing function of ${J}$ that depends on ${\epsilon}$, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.
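The Cauchy-Schwarz step used to control ${T_3}$ is an instance of the general inequality ${|1_B^* M 1_A| \leq \|M\|_{op} \|1_A\| \|1_B\| = \|M\|_{op} (|A| |B|)^{1/2}}$, valid for any matrix ${M}$; a quick numerical sanity check on random subsets (the matrix here is an arbitrary stand-in for a spectral component of ${T}$):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 50
M = rng.standard_normal((n, n))
M = (M + M.T) / 2                      # self-adjoint, like each spectral component of T
op_norm = np.linalg.norm(M, 2)         # largest singular value = max |eigenvalue|

for _ in range(100):
    A = rng.random(n) < 0.3            # indicator vector of a random subset A of V
    B = rng.random(n) < 0.3
    bilinear = B.astype(float) @ M @ A.astype(float)   # 1_B^* M 1_A
    assert abs(bilinear) <= op_norm * np.sqrt(A.sum() * B.sum()) + 1e-9
```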

To prove Lemma 1, one argues similarly (after modifying ${\epsilon}$ as necessary), except that the initial partition ${V_1,\ldots,V_M}$ of ${V}$ constructed above needs to be subdivided further into equitable components (of size ${\epsilon |V|/M+O(1)}$), plus some remainder sets which can be aggregated into an exceptional component of size ${O( \epsilon |V| )}$ (and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.
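A schematic version of this equitable refinement step is easy to write down: chop each cell into chunks of a common target size and redistribute the leftovers. (This sketch is purely illustrative: the target size and redistribution rule are invented, and it makes no attempt to preserve regularity under refinement, which is the delicate point glossed over above. It works as written when the leftovers are fewer than the chunks.)

```python
def equitable_refinement(cells, target):
    """Split each cell into chunks of exactly `target` elements, pooling remainders,
    then redistribute the pooled leftovers one element at a time."""
    chunks, leftovers = [], []
    for cell in cells:
        for k in range(0, len(cell) - target + 1, target):
            chunks.append(list(cell[k:k + target]))
        leftovers.extend(cell[len(cell) - len(cell) % target:])
    # redistribute leftovers to make the partition truly equitable
    for i, v in enumerate(leftovers):
        chunks[i % len(chunks)].append(v)
    return chunks

cells = [list(range(0, 13)), list(range(13, 20)), list(range(20, 37))]
parts = equitable_refinement(cells, target=4)
sizes = sorted(len(p) for p in parts)
assert sum(sizes) == 37           # still a partition of all 37 vertices
assert sizes[-1] - sizes[0] <= 1  # equitable: cell sizes differ by at most 1
print(sizes)                      # -> [4, 4, 4, 5, 5, 5, 5, 5]
```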

Remark 1 It is easy to verify that ${F}$ needs to be growing exponentially in ${J}$ in order for the above argument to work, which leads to tower-exponential bounds in the number of cells ${M}$ in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying ${F}$, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, setting ${F(J) := J}$ essentially gives the weak regularity lemma of Frieze and Kannan.

Remark 2 If we specialise to a Cayley graph, in which ${V = (V,+)}$ is a finite abelian group and ${E = \{ (a,b): a-b \in A \}}$ for some (symmetric) subset ${A}$ of ${V}$, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes ${V_i}$ are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components ${T_1, T_2, T_3}$ of ${T}$, representing high, medium, and low eigenvalues of ${T}$, then become a decomposition associated to high, medium, and low Fourier coefficients of ${A}$.

Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.

I’ve just uploaded to the arXiv my paper “Expanding polynomials over finite fields of large characteristic, and a regularity lemma for definable sets“, submitted to Contrib. Disc. Math. The motivation of this paper is to understand a certain polynomial variant of the sum-product phenomenon in finite fields. This phenomenon asserts that if ${A}$ is a non-empty subset of a finite field ${F}$, then either the sumset ${A+A := \{a+b: a,b \in A\}}$ or product set ${A \cdot A := \{ab: a,b \in A \}}$ will be significantly larger than ${A}$, unless ${A}$ is close to a subfield of ${F}$ (or to ${\{1\}}$). In particular, in the regime when ${A}$ is large, say ${|F|^{1-c} < |A| \leq |F|}$, one expects an expansion bound of the form

$\displaystyle |A+A| + |A \cdot A| \gg (|F|/|A|)^{c'} |A| \ \ \ \ \ (1)$

for some absolute constants ${c, c' > 0}$. Results of this type are known; for instance, Hart, Iosevich, and Solymosi obtained precisely this bound for ${(c,c')=(3/10,1/3)}$ (in the case when ${|F|}$ is prime), which was then improved by Garaev to ${(c,c')=(1/3,1/2)}$.

We have focused here on the case when ${A}$ is a large subset of ${F}$, but sum-product estimates are also extremely interesting in the opposite regime in which ${A}$ is allowed to be small (see for instance the papers of Katz-Shen and Li and of Garaev for some recent work in this case, building on some older papers of Bourgain, Katz and myself and of Bourgain, Glibichuk, and Konyagin). However, the techniques used in these two regimes are rather different. For large subsets of ${F}$, it is often profitable to use techniques such as the Fourier transform or the Cauchy-Schwarz inequality to “complete” a sum over a large set (such as ${A}$) into a set over the entire field ${F}$, and then to use identities concerning complete sums (such as the Weil bound on complete exponential sums over a finite field). For small subsets of ${F}$, such techniques are usually quite inefficient, and one has to proceed by somewhat different combinatorial methods which do not try to exploit the ambient field ${F}$. But my paper focuses exclusively on the large ${A}$ regime, and unfortunately does not directly say much (except through reasoning by analogy) about the small ${A}$ case.

Note that it is necessary to have both ${A+A}$ and ${A \cdot A}$ appear on the left-hand side of (1). Indeed, if one just has the sumset ${A+A}$, then one can set ${A}$ to be a long arithmetic progression to give counterexamples to (1). Similarly, if one just has a product set ${A \cdot A}$, then one can set ${A}$ to be a long geometric progression. The sum-product phenomenon can then be viewed as the assertion that a set cannot simultaneously behave like a long arithmetic progression and a long geometric progression, unless it is already very close to behaving like a subfield.
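These obstructions are easy to observe numerically: an arithmetic progression has the smallest possible sumset but an expanding product set, and a geometric progression behaves in the opposite way. A small demonstration over the integers (over a finite field the wraparound modifies the exact counts):

```python
def sumset(A):
    return {a + b for a in A for b in A}

def prodset(A):
    return {a * b for a in A for b in A}

ap = set(range(1, 51))             # arithmetic progression 1, 2, ..., 50
gp = {2 ** i for i in range(50)}   # geometric progression 1, 2, 4, ..., 2^49

assert len(sumset(ap)) == 2 * len(ap) - 1     # A + A is as small as possible: 99 elements
assert len(prodset(gp)) == 2 * len(gp) - 1    # A . A is as small as possible: 99 elements
# ...but each set expands under the opposite operation:
assert len(prodset(ap)) > 5 * len(ap)
assert len(sumset(gp)) > 5 * len(gp)
```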

Now we consider a polynomial variant of the sum-product phenomenon, where we consider a polynomial image

$\displaystyle P(A,A) := \{ P(a,b): a,b \in A \}$

of a set ${A \subset F}$ with respect to a polynomial ${P: F \times F \rightarrow F}$; we can also consider the asymmetric setting of the image

$\displaystyle P(A,B) := \{ P(a,b): a \in A,b \in B \}$

of two subsets ${A,B \subset F}$. The regime we will be interested in is the one where the field ${F}$ is large, and the subsets ${A, B}$ of ${F}$ are also large, but the polynomial ${P}$ has bounded degree. Actually, for technical reasons it will not be enough for us to assume that ${F}$ has large cardinality; we will also need to assume that ${F}$ has large characteristic. (The two concepts are synonymous for fields of prime order, but not in general; for instance, the field with ${2^n}$ elements becomes large as ${n \rightarrow \infty}$ while the characteristic remains fixed at ${2}$, and is thus not going to be covered by the results in this paper.)
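As a quick numerical illustration of the dichotomy between degenerate and non-degenerate polynomials, over a small prime field (the specific polynomials are invented for the example): a polynomial of the form ${P(x,y) = Q(x+y)}$ factors through the sumset and so can never have a large image on a progression, while a non-degenerate polynomial is free to expand.

```python
p = 101
A = set(range(1, 31))                                 # an arithmetic progression in F_p

def image(P, A, B):
    return {P(a, b) % p for a in A for b in B}

sums = {(a + b) % p for a in A for b in A}            # A + A: again a progression, of size 59

degenerate = image(lambda x, y: (x + y) ** 2, A, A)   # P = Q(L(x,y)) with Q(t) = t^2, L = x + y
expander = image(lambda x, y: x * x + y ** 3, A, A)   # an invented non-degenerate example

# A degenerate P factors through the sumset, so its image can never beat |A + A|:
assert len(degenerate) <= len(sums) == 2 * len(A) - 1
# whereas x^2 + y^3 covers at least |A| values already for each fixed y
# (x -> x^2 is injective on {1,...,30} mod 101):
assert len(expander) >= len(A)
print(len(sums), len(degenerate), len(expander))
```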

In this paper of Vu, it was shown that one could replace ${A \cdot A}$ with ${P(A,A)}$ in (1), thus obtaining a bound of the form

$\displaystyle |A+A| + |P(A,A)| \gg (|F|/|A|)^{c'} |A|$

whenever ${|A| \geq |F|^{1-c}}$ for some absolute constants ${c, c' > 0}$, unless the polynomial ${P}$ had the degenerate form ${P(x,y) = Q(L(x,y))}$ for some linear function ${L: F \times F \rightarrow F}$ and polynomial ${Q: F \rightarrow F}$, in which case ${P(A,A)}$ behaves too much like ${A+A}$ to get reasonable expansion. In this paper, we focus instead on the question of bounding ${P(A,A)}$ alone. In particular, one can ask to classify the polynomials ${P}$ for which one has the weak expansion property

$\displaystyle |P(A,A)| \gg (|F|/|A|)^{c'} |A|$

whenever ${|A| \geq |F|^{1-c}}$ for some absolute constants ${c, c' > 0}$. One can also ask for stronger versions of this expander property, such as the moderate expansion property

$\displaystyle |P(A,A)| \gg |F|$

whenever ${|A| \geq |F|^{1-c}}$, or the almost strong expansion property

$\displaystyle |P(A,A)| \geq |F| - O( |F|^{1-c'})$

whenever ${|A| \geq |F|^{1-c}}$. (One can consider even stronger expansion properties, such as the strong expansion property ${|P(A,A)| \geq |F|-O(1)}$, but it was shown by Gyarmati and Sarkozy that this property cannot hold for polynomials of two variables of bounded degree when ${|F| \rightarrow \infty}$.) One can also consider asymmetric versions of these properties, in which one obtains lower bounds on ${|P(A,B)|}$ rather than ${|P(A,A)|}$.

The example of a long arithmetic or geometric progression shows that the polynomials ${P(x,y) = x+y}$ or ${P(x,y) = xy}$ cannot be expanders in any of the above senses, and a similar construction also shows that polynomials of the form ${P(x,y) = Q(f(x)+f(y))}$ or ${P(x,y) = Q(f(x) f(y))}$ for some polynomials ${Q, f: F \rightarrow F}$ cannot be expanders. On the other hand, there are a number of results in the literature establishing expansion for various polynomials in two or more variables that are not of this degenerate form (in part because such results are related to incidence geometry questions in finite fields, such as the finite field version of the Erdos distinct distances problem). For instance, Solymosi established weak expansion for polynomials of the form ${P(x,y) = f(x)+y}$ when ${f}$ is a nonlinear polynomial, with generalisations by Hart, Li, and Shen for various polynomials of the form ${P(x,y) = f(x) + g(y)}$ or ${P(x,y) = f(x) g(y)}$. Further examples of expanding polynomials appear in the work of Shkredov, Iosevich-Rudnev, and Bukh-Tsimerman, as well as the previously mentioned paper of Vu and of Hart-Li-Shen, and these papers in turn cite many further results which are in the spirit of the polynomial expansion bounds discussed here (for instance, dealing with the small ${A}$ regime, or working in other fields such as ${{\bf R}}$ instead of in finite fields ${F}$). We will not summarise all these results here; they are summarised briefly in my paper, and in more detail in the papers of Hart-Li-Shen and of Bukh-Tsimerman. But we will single out one of the results of Bukh-Tsimerman, which is one of the most recent and general of these results, and closest to the results of my own paper. Roughly speaking, in this paper it is shown that a polynomial ${P(x,y)}$ of two variables and bounded degree will be a moderate expander if it is non-composite (in the sense that it does not take the form ${P(x,y) = Q(R(x,y))}$ for some non-linear polynomial ${Q}$ and some polynomial ${R}$, possibly having coefficients in the algebraic closure of ${F}$) and is monic in both ${x}$ and ${y}$, thus it takes the form ${P(x,y) = x^d + S(x,y)}$ for some ${d \geq 1}$ and some polynomial ${S}$ of degree at most ${d-1}$ in ${x}$, and similarly with the roles of ${x}$ and ${y}$ reversed, unless ${P}$ is of the form ${P(x,y) = f(x) + g(y)}$ or ${P(x,y) = f(x) g(y)}$ (in which case the expansion theory is covered to a large extent by the previous work of Hart, Li, and Shen).

Our first main result improves upon the Bukh-Tsimerman result by strengthening the notion of expansion and removing the non-composite and monic hypotheses, but imposes a condition of large characteristic. I’ll state the result here slightly informally as follows:

Theorem 1 (Criterion for moderate expansion) Let ${P: F \times F \rightarrow F}$ be a polynomial of bounded degree over a finite field ${F}$ of sufficiently large characteristic, and suppose that ${P}$ is not of the form ${P(x,y) = Q(f(x)+g(y))}$ or ${P(x,y) = Q(f(x)g(y))}$ for some polynomials ${Q,f,g: F \rightarrow F}$. Then one has the (asymmetric) moderate expansion property

$\displaystyle |P(A,B)| \gg |F|$

whenever ${|A| |B| \ggg |F|^{2-1/8}}$.

This is basically a sharp necessary and sufficient condition for asymmetric moderate expansion for polynomials of two variables. In the paper, analogous sufficient conditions for weak or almost strong expansion are also given, although these are not quite as satisfactory (particularly the conditions for almost strong expansion, which include a somewhat complicated algebraic condition which is not easy to check, and which I would like to simplify further, but was unable to).

The argument here resembles the Bukh-Tsimerman argument in many ways. One can view the result as an assertion about the expansion properties of the graph ${\{ (a,b,P(a,b)): a,b \in F \}}$, which can essentially be thought of as a somewhat sparse three-uniform hypergraph on ${F}$. Being sparse, it is difficult to directly apply techniques from dense graph or hypergraph theory for this situation; however, after a few applications of the Cauchy-Schwarz inequality, it turns out (as observed by Bukh and Tsimerman) that one can essentially convert the problem to one about the expansion properties of the set

$\displaystyle \{ (P(a,c), P(b,c), P(a,d), P(b,d)): a,b,c,d \in F \} \ \ \ \ \ (2)$

(actually, one should view this as a multiset, but let us ignore this technicality) which one expects to be a dense set in ${F^4}$, except in the case when the associated algebraic variety

$\displaystyle \{ (P(a,c), P(b,c), P(a,d), P(b,d)): a,b,c,d \in \overline{F} \}$

fails to be Zariski dense, but it turns out that in this case one can use some differential geometry and Riemann surface arguments (after first invoking the Lefschetz principle and the high characteristic hypothesis to work over the complex numbers instead of over a finite field) to show that ${P}$ is of the form ${Q(f(x)+g(y))}$ or ${Q(f(x)g(y))}$. This reduction is related to the classical fact that the only one-dimensional algebraic groups over the complex numbers are the additive group ${({\bf C},+)}$, the multiplicative group ${({\bf C} \backslash \{0\},\times)}$, or the elliptic curves (but the latter have a group law given by rational functions rather than polynomials, and so ultimately end up being eliminated from consideration, though they would play an important role if one wanted to study the expansion properties of rational functions).

It remains to understand the structure of the set (2). To understand dense graphs or hypergraphs, one of the standard tools of choice is the Szemerédi regularity lemma, which carves up such graphs into a bounded number of cells, with the graph behaving pseudorandomly on most pairs of cells. However, the bounds in this lemma are notoriously poor (the regularity obtained is an inverse tower exponential function of the number of cells), and this makes this lemma unsuitable for the type of expansion properties we seek (in which we want to deal with sets ${A}$ which have a polynomial sparsity, e.g. ${|A| \sim |F|^{1-c}}$). Fortunately, in the case of sets such as (2) which are definable over the language of rings, it turns out that a much stronger regularity lemma is available, which I call the “algebraic regularity lemma”. I’ll state it (again, slightly informally) in the context of graphs as follows:

Lemma 2 (Algebraic regularity lemma) Let ${F}$ be a finite field of large characteristic, and let ${V, W}$ be definable sets over ${F}$ of bounded complexity (i.e. ${V, W}$ are subsets of ${F^n}$, ${F^m}$ for some bounded ${n,m}$ that can be described by some first-order predicate in the language of rings of bounded length and involving boundedly many constants). Let ${E}$ be a definable subset of ${V \times W}$, again of bounded complexity (one can view ${E}$ as a bipartite graph connecting ${V}$ and ${W}$). Then one can partition ${V, W}$ into a bounded number of cells ${V_1,\ldots,V_a}$, ${W_1,\ldots,W_b}$, still definable with bounded complexity, such that for all pairs ${i = 1,\ldots,a}$, ${j=1,\ldots,b}$, one has the regularity property

$\displaystyle |E \cap (A \times B)| = d_{ij} |A| |B| + O( |F|^{-1/4} |V| |W| )$

for all ${A \subset V_i, B \subset W_j}$, where ${d_{ij} := \frac{|E \cap (V_i \times W_j)|}{|V_i| |W_j|}}$ is the density of ${E}$ in ${V_i \times W_j}$.
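To see how much stronger this is than graph regularity, consider a familiar special case: the Paley-type graph ${E = \{(x,y): x - y \hbox{ is a nonzero square}\}}$ over ${F_p}$ with ${p \equiv 1 \hbox{ mod } 4}$ is definable with bounded complexity, and its discrepancy on arbitrary pairs of subsets is already of square-root type, with no need to pass to a partition at all. A quick empirical check (the asserted bound follows from the expander mixing lemma, since the nontrivial eigenvalues of the Paley graph have magnitude ${(1+\sqrt{p})/2}$):

```python
import math, random

random.seed(3)

p = 101                                        # prime with p = 1 mod 4, so the graph is symmetric
squares = {(x * x) % p for x in range(1, p)}   # the nonzero quadratic residues of F_p

def edge_count(A, B):
    """|E cap (A x B)| for the definable set E = {(x,y): x - y is a nonzero square}."""
    return sum(1 for a in A for b in B if (a - b) % p in squares)

for _ in range(20):
    A = random.sample(range(p), random.randint(5, p))
    B = random.sample(range(p), random.randint(5, p))
    discrepancy = abs(edge_count(A, B) - 0.5 * len(A) * len(B))
    # Square-root cancellation on *every* pair of subsets: far better than the
    # eps |A||B| error that graph regularity provides.
    assert discrepancy <= math.sqrt(p) * math.sqrt(len(A) * len(B))
```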

This lemma resembles the Szemerédi regularity lemma, but regularises all pairs of cells (not just most pairs), and the regularity is of polynomial strength in ${|F|}$, rather than inverse tower exponential in the number of cells. Also, the cells are not arbitrary subsets of ${V,W}$, but are themselves definable with bounded complexity, which turns out to be crucial for applications. I am optimistic that this lemma will be useful not just for studying expanding polynomials, but for many other combinatorial questions involving dense subsets of definable sets over finite fields.

The above lemma is stated for graphs ${E \subset V \times W}$, but one can iterate it to obtain an analogous regularisation of hypergraphs ${E \subset V_1 \times \ldots \times V_k}$ for any bounded ${k}$ (for application to (2), we need ${k=4}$). This hypergraph regularity lemma, by the way, is not analogous to the strong hypergraph regularity lemmas of Rodl et al. and Gowers developed in the last six or so years, but closer in spirit to the older (but weaker) hypergraph regularity lemma of Chung which gives the same “order ${1}$” regularity that the graph regularity lemma gives, rather than higher order regularity.

One feature of the proof of Lemma 2 which I found striking was the need to use some fairly high powered technology from algebraic geometry, and in particular the Lang-Weil bound on counting points in varieties over a finite field (discussed in this previous blog post), and also the theory of the etale fundamental group. Let me try to briefly explain why this is the case. A model example of a definable set of bounded complexity ${E}$ is a set ${E \subset F^n \times F^m}$ of the form

$\displaystyle E = \{ (x,y) \in F^n \times F^m: \exists t \in F; P(x,y,t)=0 \}$

for some polynomial ${P: F^n \times F^m \times F \rightarrow F}$. (Actually, it turns out that one can essentially write all definable sets as an intersection of sets of this form; see this previous blog post for more discussion.) To regularise the set ${E}$, it is convenient to square the adjacency matrix, which soon leads to the study of counting functions such as

$\displaystyle \mu(x,x') := | \{ (y,t,t') \in F^m \times F \times F: P(x,y,t) = P(x',y,t') = 0 \}|.$

If one can show that this function ${\mu}$ is “approximately finite rank”, in the sense that, modulo lower order errors (smaller than the main term by a factor of ${O(|F|^{-1/2})}$), this quantity depends only on a bounded number of bits of information about ${x}$ and a bounded number of bits of information about ${x'}$, then a little linear algebra will give the required regularity result.
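Here is a toy instance of this “approximately finite rank” phenomenon that can be checked directly: for the model polynomial ${P(x,y,t) = t^2 - (x+y)}$ with ${x,y}$ scalar (chosen purely for illustration), the count ${\mu(x,x')}$ is exactly constant off the diagonal, the most extreme form of bounded rank. (It is the general case, not this toy computation, that requires Lang-Weil and the étale fundamental group.)

```python
p = 101

# N[a] = number of t in F_p with t^2 = a  (1 for a = 0, 2 for a nonzero square, 0 otherwise)
N = [0] * p
for t in range(p):
    N[(t * t) % p] += 1

def mu(x, xp):
    """mu(x,x') = |{(y,t,t'): t^2 = x + y and t'^2 = x' + y}| for P(x,y,t) = t^2 - (x+y)."""
    return sum(N[(x + y) % p] * N[(xp + y) % p] for y in range(p))

# Off the diagonal, mu is the constant p - 1 (a standard quadratic character sum);
# on the diagonal it jumps to 2p - 1.  So mu depends on O(1) bits of (x, x').
for x in range(0, p, 7):
    for xp in range(0, p, 11):
        assert mu(x, xp) == (2 * p - 1 if x == xp else p - 1)
```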

One can recognise ${\mu(x,x')}$ as counting ${F}$-points of a certain algebraic variety

$\displaystyle V_{x,x'} := \{ (y,t,t') \in \overline{F}^m \times \overline{F} \times \overline{F}: P(x,y,t) = P(x',y,t') = 0 \}.$

The Lang-Weil bound (discussed in this previous post) provides a formula for this count, in terms of the number ${c(x,x')}$ of geometrically irreducible components of ${V_{x,x'}}$ that are defined over ${F}$ (or equivalently, are invariant with respect to the Frobenius endomorphism associated to ${F}$). So the problem boils down to ensuring that this quantity ${c(x,x')}$ is “generically bounded rank”, in the sense that for generic ${x,x'}$, its value depends only on a bounded number of bits of ${x}$ and a bounded number of bits of ${x'}$.

Here is where the étale fundamental group comes in. One can view ${V_{x,x'}}$ as a fibre product ${V_x \times_{\overline{F}^m} V_{x'}}$ of the varieties

$\displaystyle V_x := \{ (y,t) \in \overline{F}^m \times \overline{F}: P(x,y,t) = 0 \}$

and

$\displaystyle V_{x'} := \{ (y,t) \in \overline{F}^m \times \overline{F}: P(x',y,t) = 0 \}$

over ${\overline{F}^m}$. If one is in sufficiently high characteristic (or even better, in zero characteristic, which one can reduce to by an ultraproduct (or nonstandard analysis) construction, similar to that discussed in this previous post), the varieties ${V_x,V_{x'}}$ are generically finite étale covers of ${\overline{F}^m}$, and the fibre product ${V_x \times_{\overline{F}^m} V_{x'}}$ is then also generically a finite étale cover. One can count the components of a finite étale cover of a connected variety by counting the number of orbits of the étale fundamental group acting on a fibre of that variety (much as the number of components of a cover of a connected manifold is the number of orbits of the topological fundamental group acting on that fibre). So if one understands the étale fundamental group of a certain generic subset of ${\overline{F}^m}$ (formed by intersecting together an ${x}$-dependent generic subset of ${\overline{F}^m}$ with an ${x'}$-dependent generic subset), this in principle controls ${c(x,x')}$. It turns out that one can decouple the ${x}$ and ${x'}$ dependence of this fundamental group by using an étale version of the van Kampen theorem for the fundamental group, which I discussed in this previous blog post. With this fact (and another deep fact about the étale fundamental group in zero characteristic, namely that it is topologically finitely generated), one can obtain the desired generic bounded rank property of ${c(x,x')}$, which gives the regularity lemma.

In order to expedite the deployment of all this algebraic geometry (as well as some Riemann surface theory), it is convenient to use the formalism of nonstandard analysis (or the ultraproduct construction), which among other things can convert quantitative, finitary problems in large characteristic into equivalent qualitative, infinitary problems in zero characteristic (in the spirit of this blog post). This allows one to use several tools from those fields as “black boxes”; not just the theory of étale fundamental groups (which are considerably simpler and more favorable in characteristic zero than they are in positive characteristic), but also some results limiting the morphisms between compact Riemann surfaces of high genus (such as the de Franchis theorem, the Riemann-Hurwitz formula, or the fact that all morphisms between elliptic curves are essentially group homomorphisms), which would be quite unwieldy to utilise if one did not first pass to the zero characteristic case (and thence to the complex case) via the ultraproduct construction (followed by the Lefschetz principle).

I found this project to be particularly educational for me, as it forced me to wander outside of my usual range by quite a bit in order to pick up the tools from algebraic geometry and Riemann surfaces that I needed (in particular, I read through several chapters of EGA and SGA for the first time). This did however put me in the slightly unnerving position of having to use results (such as the Riemann existence theorem) whose proofs I have not fully verified for myself, but which are easy to find in the literature, and widely accepted in the field. I suppose this type of dependence on results in the literature is more common in the more structured fields of mathematics than it is in analysis, which by its nature has fewer reusable black boxes, and so key tools often need to be rederived and modified for each new application. (This distinction is discussed further in this article of Gowers.)

Ben Green and I have just uploaded to the arXiv our paper “An arithmetic regularity lemma, an associated counting lemma, and applications“, submitted (a little behind schedule) to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we describe the general-degree version of the arithmetic regularity lemma, which can be viewed as the counterpart of the Szemerédi regularity lemma, in which the object being regularised is a function ${f: [N] \rightarrow [0,1]}$ on a discrete interval ${[N] = \{1,\ldots,N\}}$ rather than a graph, and the type of patterns one wishes to count are additive patterns (such as arithmetic progressions ${n,n+d,\ldots,n+(k-1)d}$) rather than subgraphs. Very roughly speaking, this regularity lemma asserts that all such functions can be decomposed as a degree ${\leq s}$ nilsequence (or more precisely, a variant of a nilsequence that we call a virtual irrational nilsequence), plus a small error, plus a third error which is extremely tiny in the Gowers uniformity norm ${U^{s+1}[N]}$. In principle, at least, the latter two errors can be readily discarded in applications, so that the regularity lemma reduces many questions in additive combinatorics to questions concerning (virtual irrational) nilsequences. To work with these nilsequences, we also establish an arithmetic counting lemma that gives an integral formula for counting additive patterns weighted by such nilsequences.

The regularity lemma is a manifestation of the “dichotomy between structure and randomness”, as discussed for instance in my ICM article or FOCS article. In the degree ${1}$ case ${s=1}$, this result is essentially due to Green. It is powered by the inverse conjecture for the Gowers norms, which Ben, Tamar Ziegler, and I have recently established (paper to be forthcoming shortly; the ${k=4}$ case of our argument is discussed here). The counting lemma is established through the quantitative equidistribution theory of nilmanifolds, which Ben and I set out in this paper.

The regularity and counting lemmas are designed to be used together, and in the paper we give three applications of this combination. Firstly, we give a new proof of Szemerédi’s theorem, which proceeds via an energy increment argument rather than a density increment one. Secondly, we establish a conjecture of Bergelson, Host, and Kra, namely that if ${A \subset [N]}$ has density ${\alpha}$, and ${\epsilon > 0}$, then there exist ${\gg_{\alpha,\epsilon} N}$ shifts ${h}$ for which ${A}$ contains at least ${(\alpha^4 - \epsilon)N}$ arithmetic progressions of length ${k=4}$ of spacing ${h}$. (The ${k=3}$ case of this conjecture was established earlier by Green; the ${k=5}$ case is false, as was shown by Ruzsa in an appendix to the Bergelson-Host-Kra paper.) Thirdly, we establish a variant of a recent result of Gowers-Wolf, showing that the true complexity of a system of linear forms over ${[N]}$ indeed matches the conjectured value predicted in their first paper.
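The quantity counted in the Bergelson-Host-Kra conjecture can be made concrete with a small sketch. The helper `count_aps` below is purely illustrative (not from the paper); it counts ${k}$-term progressions of a fixed spacing ${h}$ inside a set ${A \subset [N]}$:

```python
def count_aps(A, N, k, h):
    """Count k-term arithmetic progressions n, n+h, ..., n+(k-1)h
    (fixed common difference h > 0) lying entirely inside A, a subset
    of {1, ..., N}."""
    S = set(A)
    return sum(1 for n in range(1, N + 1)
               if n + (k - 1) * h <= N
               and all(n + j * h in S for j in range(k)))
```

As a sanity check, if ${A = [N]}$ is the whole interval then the count is ${N - (k-1)h}$; the conjecture (for ${k=4}$) asserts that for many shifts ${h}$ one has `count_aps(A, N, 4, h)` at least ${(\alpha^4-\epsilon)N}$.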

In all three applications, the scheme of proof can be described as follows:

• Apply the arithmetic regularity lemma, and decompose a relevant function ${f}$ into three pieces, ${f_{nil}, f_{sml}, f_{unf}}$.
• The uniform part ${f_{unf}}$ is so tiny in the Gowers uniformity norm that its contribution can be easily dealt with by an appropriate “generalised von Neumann theorem”.
• The contribution of the (virtual, irrational) nilsequence ${f_{nil}}$ can be controlled using the arithmetic counting lemma.
• Finally, one needs to check that the contribution of the small error ${f_{sml}}$ does not overwhelm the main term ${f_{nil}}$. This is the trickiest bit; one often needs to use the counting lemma again to show that one can find a set of arithmetic patterns for ${f_{nil}}$ that is sufficiently “equidistributed” that it is not affected by the small error.

To illustrate the last point, let us give the following example. Suppose we have a set ${A \subset [N]}$ of some positive density (say ${|A| = 10^{-1} N}$) and we have managed to prove that ${A}$ contains a reasonable number of arithmetic progressions of length ${5}$ (say), e.g. it contains at least ${10^{-10} N^2}$ such progressions. Now we perturb ${A}$ by deleting a small number, say ${10^{-2} N}$, of elements from ${A}$ to create a new set ${A'}$. Can we still conclude that the new set ${A'}$ contains any arithmetic progressions of length ${5}$?

Unfortunately, the answer could be no; conceivably, all of the ${10^{-10} N^2}$ arithmetic progressions in ${A}$ could be wiped out by the ${10^{-2} N}$ elements removed from ${A}$, since each such element of ${A}$ could be associated with up to ${|A|}$ (or even ${5|A|}$) arithmetic progressions in ${A}$.

But suppose we knew that the ${10^{-10} N^2}$ arithmetic progressions in ${A}$ were equidistributed, in the sense that each element in ${A}$ belonged to the same number of such arithmetic progressions, namely ${5 \times 10^{-9} N}$. Then each element deleted from ${A}$ only removes at most ${5 \times 10^{-9} N}$ progressions, and so one can safely remove ${10^{-2} N}$ elements from ${A}$ and still retain some arithmetic progressions. The same argument works if the arithmetic progressions are only approximately equidistributed, in the sense that the number of progressions that a given element ${a \in A}$ belongs to concentrates sharply around its mean (for instance, by having a small variance), provided that the equidistribution is sufficiently strong. Fortunately, the arithmetic regularity and counting lemmas are designed to give precisely such a strong equidistribution result.
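The bookkeeping in this example can be checked directly by double counting. The sketch below uses the illustrative constants from the text; the concrete value of ${N}$ is an arbitrary assumption, made only so the quantities are finite numbers:

```python
N = 10**6  # an assumed size, for illustration only

A_size = 10**-1 * N             # |A| = 10^{-1} N
progressions = 10**-10 * N**2   # 5-term APs contained in A
incidences = 5 * progressions   # each progression contributes 5 memberships

# Under perfect equidistribution, each element of A lies in exactly
# incidences / |A| = 5 x 10^{-9} N progressions.
per_element = incidences / A_size

deleted = 10**-2 * N
destroyed = deleted * per_element  # at most 5 x 10^{-11} N^2 progressions lost
surviving = progressions - destroyed
```

With these constants, `destroyed` is only half of `progressions`, so `surviving` is positive: the perturbed set ${A'}$ still contains progressions, exactly as claimed above.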

A succinct (but slightly inaccurate) summary of the regularity+counting lemma strategy would be that in order to solve a problem in additive combinatorics, it “suffices to check it for nilsequences”. But this should come with a caveat, due to the issue of the small error above; in addition to checking it for nilsequences, the answer in the nilsequence case must be sufficiently “dispersed” in a suitable sense, so that it can survive the addition of a small (but not completely negligible) perturbation.

One last “production note”. Like our previous paper with Emmanuel Breuillard, we used Subversion to write this paper, which turned out to be a significant efficiency boost as we could work on different parts of the paper simultaneously (this was particularly important this time round as the paper was somewhat lengthy and complicated, and there was a submission deadline). When doing so, we found it convenient to split the paper into a dozen or so pieces (one for each section of the paper, basically) in order to avoid conflicts, and to help coordinate the writing process. I’m also looking into git (a more advanced version control system), and am planning to use it for another of my joint projects; I hope to be able to comment on the relative strengths of these systems (and with plain old email) in the future.

This week I am at Rutgers University, giving the Lewis Memorial Lectures for this year, which are also concurrently part of a workshop in random matrices. I gave four lectures, three of which were on random matrices, and one of which was on the Szemerédi regularity lemma.

The titles, abstracts, and slides of these talks are as follows.

1. Szemerédi’s lemma revisited. In this general-audience talk, I discuss the Szemerédi regularity lemma (which, roughly speaking, shows that an arbitrary large dense graph can always be viewed as the disjoint union of a bounded number of pseudorandom components), and how it has recently been reinterpreted in a more analytical (and infinitary) language using the theory of graph limits or of exchangeable measures. I also discuss arithmetic analogues of this lemma, including one which (implicitly) underlies my result with Ben Green that the primes contain arbitrarily long arithmetic progressions.
2. Singularity and determinant of random matrices. Here, I present recent progress in understanding the question of how likely a random matrix (e.g. one whose entries are all +1 or -1 with equal probability) is to be invertible, as well as the related question of how large the determinant should be. The case of continuous matrix ensembles (such as the Gaussian ensemble) is well understood, but the discrete case contains some combinatorial difficulties and took longer to understand properly. In particular I present the results of Kahn-Komlós-Szemerédi and later authors showing that discrete random matrices are invertible with exponentially high probability, and also give some results for the distribution of the determinant.
3. The least singular value of random matrices. A more quantitative version of the question “when is a matrix invertible?” is “what is the least singular value of that matrix?”. I present here the recent results of Litvak-Pajor-Rudelson-Tomczak-Jaegermann, Rudelson, myself and Vu, and Rudelson-Vershynin on addressing this question in the discrete case. A central role is played by the inverse Littlewood-Offord theorems of additive combinatorics, which give reasonably sharp necessary conditions for a discrete random walk to concentrate in a small ball.
4. The circular law. One interesting application of the above theory is to extend the circular law for the spectrum of random matrices from the continuous case to the discrete case. Previous arguments of Girko and Bai for the continuous case can be transplanted to the discrete case, but the key new ingredient needed is a least singular value bound for shifted matrices $M-\lambda I$ in order to avoid the spectrum being overwhelmed by pseudospectrum. It turns out that the results of the preceding lecture are almost precisely what are needed to accomplish this.

[Update, Mar 31: first lecture slides corrected.  Thanks to Yoshiyasu Ishigami for pointing out a slight inaccuracy in the text.]

I’ve just uploaded to the arXiv my joint paper with Tim Austin, “On the testability and repair of hereditary hypergraph properties“, which has been submitted to Random Structures and Algorithms. In this paper we prove some positive and negative results for the testability (and the local repairability) of various properties of directed or undirected graphs and hypergraphs, which can be either monochromatic or multicoloured.

The negative results have already been discussed in a previous posting of mine, so today I will focus on the positive results. The property testing results here are finitary results, but it turns out to be rather convenient to use a certain correspondence principle (the hypergraph version of the Furstenberg correspondence principle) to convert the question into one about exchangeable probability measures on spaces of hypergraphs (i.e. on random hypergraphs whose probability distribution is invariant under exchange of vertices). Such objects are also closely related to the “graphons” and “hypergraphons” that emerge as graph limits, as studied by Lovasz-Szegedy, Elek-Szegedy, and others. Somewhat amusingly, once one does so, it then becomes convenient to keep track of objects indexed by vertex sets and how they are exchanged via the language of category theory, and in particular using the concept of a natural transformation to describe such objects as exchangeable measures, graph colourings, and local modification rules. I will try to sketch out some of these connections, after describing the main positive results.

This post is a sequel of sorts to my earlier post on hard and soft analysis, and the finite convergence principle. Here, I want to discuss a well-known theorem in infinitary soft analysis – the Lebesgue differentiation theorem – and whether there is any meaningful finitary version of this result. Along the way, it turns out that we will uncover a simple analogue of the Szemerédi regularity lemma, for subsets of the interval rather than for graphs. (Actually, regularity lemmas seem to appear in just about any context in which fine-scaled objects can be approximated by coarse-scaled ones.) The connection between regularity lemmas and results such as the Lebesgue differentiation theorem was recently highlighted by Elek and Szegedy, while the connection between the finite convergence principle and results such as the pointwise ergodic theorem (which is a close cousin of the Lebesgue differentiation theorem) was recently detailed by Avigad, Gerhardy, and Towsner.

The Lebesgue differentiation theorem has many formulations, but we will avoid the strongest versions and just stick to the following model case for simplicity:

Lebesgue differentiation theorem. If $f: [0,1] \to [0,1]$ is Lebesgue measurable, then for almost every $x \in [0,1]$ we have $f(x) = \lim_{r \to 0} \frac{1}{r} \int_x^{x+r} f(y)\ dy$. Equivalently, the fundamental theorem of calculus $f(x) = \frac{d}{dy} \int_0^y f(z) dz|_{y=x}$ is true for almost every x in [0,1].

Here we use the oriented definite integral, thus $\int_x^y = - \int_y^x$. Specialising to the case where $f = 1_A$ is an indicator function, we obtain the Lebesgue density theorem as a corollary:

Lebesgue density theorem. Let $A \subset [0,1]$ be Lebesgue measurable. Then for almost every $x \in A$, we have $\frac{|A \cap [x-r,x+r]|}{2r} \to 1$ as $r \to 0^+$, where |A| denotes the Lebesgue measure of A.

In other words, almost all the points x of A are points of density of A, which roughly speaking means that as one passes to finer and finer scales, the immediate vicinity of x becomes increasingly saturated with A. (Points of density are like robust versions of interior points, thus the Lebesgue density theorem is an assertion that measurable sets are almost like open sets. This is Littlewood’s first principle.) One can also deduce the Lebesgue differentiation theorem back from the Lebesgue density theorem by approximating f by a finite linear combination of indicator functions; we leave this as an exercise.
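For a set ${A}$ that is a finite union of intervals, the averages in the Lebesgue density theorem can be computed exactly, which gives a quick numerical illustration. The helper `density` below is hypothetical, introduced only for this sketch:

```python
def density(x, r, intervals):
    """Measure of A ∩ [x-r, x+r] divided by 2r, where A is the union of
    the given list of disjoint intervals (a, b) in [0, 1]."""
    total = 0.0
    for a, b in intervals:
        lo, hi = max(a, x - r), min(b, x + r)
        total += max(0.0, hi - lo)
    return total / (2 * r)

# A = [0, 1/2] ∪ [3/4, 1]
A = [(0.0, 0.5), (0.75, 1.0)]
```

At an interior point such as ${x = 1/4}$ the ratio is already ${1}$ for all small ${r}$, while at the boundary point ${x = 1/2}$ it stays at ${1/2}$; this is consistent with the theorem, since the boundary points form a measure-zero set.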

In this second lecture, I wish to talk about the dichotomy between structure and randomness as it manifests itself in four closely related areas of mathematics:

• Combinatorial number theory, which seeks to find patterns in unstructured dense sets (or colourings) of integers;
• Ergodic theory (or more specifically, multiple recurrence theory), which seeks to find patterns in positive-measure sets under the action of a discrete dynamical system on probability spaces (or more specifically, measure-preserving actions of the integers ${\Bbb Z}$);
• Graph theory, or more specifically the portion of this theory concerned with finding patterns in large unstructured dense graphs; and
• Ergodic graph theory, which is a very new and undeveloped subject, which roughly speaking seems to be concerned with the patterns within a measure-preserving action of the infinite permutation group $S_\infty$, which is one of several models we have available to study infinite “limits” of graphs.

The two “discrete” (or “finitary”, or “quantitative”) fields of combinatorial number theory and graph theory happen to be related to each other, basically by using the Cayley graph construction; I will give an example of this shortly. The two “continuous” (or “infinitary”, or “qualitative”) fields of ergodic theory and ergodic graph theory are at present only related on the level of analogy and informal intuition, but hopefully some more systematic connections between them will appear soon.
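The Cayley graph construction mentioned above can be sketched as follows. This is a minimal illustration, assuming the group is ${{\Bbb Z}/N{\Bbb Z}}$ and the connection set ${S}$ is symmetric (the helper name is not from the post):

```python
def cayley_graph(N, S):
    """Cayley graph on Z/NZ: vertices 0, ..., N-1, with an edge {x, y}
    whenever y - x mod N lies in S.  S is assumed symmetric (s in S
    implies -s mod N in S), so the graph is undirected; 0 should not
    be in S, to avoid loops."""
    edges = set()
    for x in range(N):
        for s in S:
            y = (x + s) % N
            if y != x:
                edges.add(frozenset((x, y)))
    return edges
```

For example, ${N = 5}$ with ${S = \{1, 4\}}$ yields the 5-cycle. Patterns in the set ${S}$ (such as arithmetic progressions) then correspond to subgraph patterns in the Cayley graph, which is the basic mechanism relating the two discrete fields.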

On the other hand, we have some very rigorous connections between combinatorial number theory and ergodic theory, and also (more recently) between graph theory and ergodic graph theory, basically by the procedure of viewing the infinitary continuous setting as a limit of the finitary discrete setting. These two connections go by the names of the Furstenberg correspondence principle and the graph correspondence principle respectively. These principles allow one to tap the power of the infinitary world (for instance, the ability to take limits and perform completions or closures of objects) in order to establish results in the finitary world, or at least to take the intuition gained in the infinitary world and transfer it to a finitary setting. Conversely, the finitary world provides an excellent model setting to refine one’s understanding of infinitary objects, for instance by establishing quantitative analogues of “soft” results obtained in an infinitary manner. I will remark here that this best-of-both-worlds approach, borrowing from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and me in order to establish our result on long arithmetic progressions in the primes. In particular, the infinitary setting is excellent for being able to rigorously define and study concepts (such as structure or randomness) which are much “fuzzier” and harder to pin down exactly in the finitary world.

The question in extremal graph theory I wish to discuss here originates from Luca Trevisan; it shows that we still don’t know everything that we should about the “local” properties of large dense graphs.

Let $G = (V,E)$ be a large (undirected) graph, thus V is the vertex set with some large number n of vertices, and E is the collection of edges {x,y} connecting two vertices in the graph. We can allow the graph to have loops {x,x} if one wishes; it’s not terribly important for this question (since the number of loops is so small compared to the total number of edges), so let’s say there are no loops. We define three quantities of the graph G:

• The edge density $0 \leq \alpha \leq 1$, defined as the number of edges in G, divided by the total number of possible edges, i.e. $n(n-1)/2$;
• The triangle density $0 \leq \beta \leq 1$, defined as the number of triangles in G (i.e. unordered triplets {x,y,z} such that {x,y},{y,z}, {z,x} all lie in G), divided by the total number of possible triangles, namely $n(n-1)(n-2)/6$;
• The diamond density $0 \leq \gamma \leq 1$, defined as the number of diamonds in G (i.e. unordered pairs { {x,y,z}, {x,y,w} } of triangles in G which share a common edge), divided by the total number of possible diamonds, namely $n(n-1)(n-2)(n-3)/4$.
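These three densities are straightforward to compute by brute force on small graphs. The sketch below is a hypothetical helper (not part of the discussion above), valid for loopless graphs on ${n \geq 4}$ vertices:

```python
from itertools import combinations

def densities(n, edges):
    """Edge, triangle, and diamond densities of a loopless undirected
    graph on vertices 0, ..., n-1 (n >= 4), given its edge list."""
    E = {frozenset(e) for e in edges}
    alpha = len(E) / (n * (n - 1) / 2)

    triangles = sum(1 for t in combinations(range(n), 3)
                    if all(frozenset(p) in E for p in combinations(t, 2)))
    beta = triangles / (n * (n - 1) * (n - 2) / 6)

    # A diamond is an unordered pair of triangles {x,y,z}, {x,y,w} sharing
    # the common edge {x,y}; count pairs of common neighbours of each edge.
    diamonds = 0
    for e in E:
        x, y = tuple(e)
        common = sum(1 for z in range(n) if z not in e
                     and frozenset((x, z)) in E and frozenset((y, z)) in E)
        diamonds += common * (common - 1) // 2
    gamma = diamonds / (n * (n - 1) * (n - 2) * (n - 3) / 4)
    return alpha, beta, gamma
```

As a check on the normalisations, the complete graph on ${n}$ vertices has all three densities equal to ${1}$: for the diamonds, each of the ${n(n-1)/2}$ edges has ${n-2}$ common neighbours, giving ${(n-2)(n-3)/2}$ diamonds per edge, which matches the total ${n(n-1)(n-2)(n-3)/4}$.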