
In the modern theory of additive combinatorics, a large role is played by the Gowers uniformity norms ${\|f\|_{U^k(G)}}$, where ${k \geq 1}$, ${G = (G,+)}$ is a finite abelian group, and ${f: G \rightarrow {\bf C}}$ is a function (one can also consider these norms in finite approximate groups such as ${[N] = \{1,\dots,N\}}$ instead of finite groups, but we will focus on the group case here for simplicity). These norms can be defined by the formula

$\displaystyle \|f\|_{U^k(G)} := (\mathop{\bf E}_{x,h_1,\dots,h_k \in G} \Delta_{h_1} \dots \Delta_{h_k} f(x))^{1/2^k}$

where we use the averaging notation

$\displaystyle \mathop{\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)$

for any non-empty finite set ${A}$ (with ${|A|}$ denoting the cardinality of ${A}$), and ${\Delta_h}$ is the multiplicative discrete derivative operator

$\displaystyle \Delta_h f(x) := f(x+h) \overline{f(x)}.$
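As a concrete illustration, in the ${k=2}$ case the norm just defined can be checked numerically against the classical Fourier-analytic identity ${\|f\|_{U^2(G)}^4 = \sum_{\xi \in \hat G} |\hat f(\xi)|^4}$ (a standard fact, not proved in this post). The following sketch does this for ${G = {\bf Z}/N{\bf Z}}$ with an arbitrary small ${N}$ and a random ${1}$-bounded ${f}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16  # arbitrary small choice of |G|, for illustration only
f = np.exp(2j * np.pi * rng.random(N))   # a random unimodular f on Z/NZ

# direct computation: ||f||_{U^2}^4 = E_{x,h1,h2} Delta_{h1} Delta_{h2} f(x),
# where Delta_{h1} Delta_{h2} f(x) = f(x+h1+h2) conj(f(x+h1)) conj(f(x+h2)) f(x)
total = 0j
for x in range(N):
    for h1 in range(N):
        for h2 in range(N):
            total += (f[(x + h1 + h2) % N] * np.conj(f[(x + h1) % N])
                      * np.conj(f[(x + h2) % N]) * f[x])
u2_direct = (total.real / N**3) ** 0.25

# Fourier side: sum_xi |fhat(xi)|^4, with fhat(xi) = E_x f(x) e(-xi x / N)
fhat = np.fft.fft(f) / N
u2_fourier = float(np.sum(np.abs(fhat) ** 4)) ** 0.25

assert abs(u2_direct - u2_fourier) < 1e-9
```

The two sides agree to machine precision; the direct triple loop is ${O(N^3)}$ and only intended as a check of the definition.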

One reason why these norms play an important role is that they control various multilinear averages. We give two sample examples here:

Proposition 1 Let ${G = {\bf Z}/N{\bf Z}}$ be a cyclic group of prime order ${N}$.

• (i) If ${a_1,\dots,a_k}$ are distinct elements of ${G}$ for some ${k \geq 2}$, and ${f_1,\dots,f_k: G \rightarrow {\bf C}}$ are ${1}$-bounded functions (thus ${|f_j(x)| \leq 1}$ for all ${j=1,\dots,k}$ and ${x \in G}$), then

$\displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x+a_1 h) \dots f_k(x+a_k h)| \leq \|f_i\|_{U^{k-1}(G)} \ \ \ \ \ (1)$

for any ${i=1,\dots,k}$.

• (ii) If ${f_1,f_2,f_3: G \rightarrow {\bf C}}$ are ${1}$-bounded, then one has

$\displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x) f_2(x+h) f_3(x+h^2)| \ll \|f_3\|_{U^4(G)} + N^{-1/4}.$

We establish these claims a little later in this post.
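As a quick numerical sanity check of part (i) in the ${k=3}$ case (a sketch only; the prime ${N=17}$, the coefficients ${(0,1,2)}$, and the random functions are arbitrary choices, and the ${U^2}$ norms are computed via the standard identity ${\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4}$):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 17  # a small prime, as in the proposition
f = [np.exp(2j * np.pi * rng.random(N)) for _ in range(3)]  # random unimodular f_i

# left-hand side of (1) with k = 3 and (a_1, a_2, a_3) = (0, 1, 2)
total = 0j
for x in range(N):
    for h in range(N):
        total += f[0][x] * f[1][(x + h) % N] * f[2][(x + 2 * h) % N]
lhs = abs(total) / N**2

# ||f_i||_{U^2}^4 = sum_xi |fhat_i(xi)|^4 (standard Fourier identity)
u2 = [float(np.sum(np.abs(np.fft.fft(g) / N) ** 4)) ** 0.25 for g in f]

# (1) holds for every i, hence for the minimum over i
assert lhs <= min(u2) + 1e-9
```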

In some more recent literature (e.g., this paper of Conlon, Fox, and Zhao), the role of the Gowers norms has been played instead by (generalisations of) the cut norm, a concept originating from graph theory. In this blog post, it will be convenient to define these cut norms in the language of probability theory (using boldface to denote random variables).

Definition 2 (Cut norm) Let ${{\bf X}_1,\dots,{\bf X}_k, {\bf Y}_1,\dots,{\bf Y}_l}$ be independent random variables with ${k,l \geq 0}$; to avoid minor technicalities we assume that these random variables are discrete and take values in a finite set. Given a random variable ${{\bf F} = F( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$ that is a function of these independent random variables, we define the cut norm

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} := \sup | \mathop{\bf E} {\bf F} {\bf B}_1 \dots {\bf B}_k |$

where the supremum ranges over all choices ${{\bf B}_1,\dots,{\bf B}_k}$ of random variables ${{\bf B}_i = B_i( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$ that are ${1}$-bounded (thus ${|{\bf B}_i| \leq 1}$ surely), and such that ${{\bf B}_i}$ does not depend on ${{\bf X}_i}$.

If ${l=0}$, we abbreviate ${\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}}$ as ${\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}}$.

Strictly speaking, the cut norm is only a cut semi-norm when ${k=0,1}$, but we will abuse notation by referring to it as a norm nevertheless.

Example 3 If ${G = (V_1,V_2,E)}$ is a bipartite graph, and ${\mathbf{v_1}}$, ${\mathbf{v_2}}$ are independent random variables chosen uniformly from ${V_1,V_2}$ respectively, then

$\displaystyle \| 1_E(\mathbf{v_1},\mathbf{v_2}) \|_{\mathrm{CUT}(\mathbf{v_1}, \mathbf{v_2})}$

$\displaystyle = \sup_{\|f\|_\infty, \|g\|_\infty \leq 1} |\mathop{\bf E}_{v_1 \in V_1, v_2 \in V_2} 1_E(v_1,v_2) f(v_1) g(v_2)|$

where the supremum ranges over all ${1}$-bounded functions ${f: V_1 \rightarrow [-1,1]}$, ${g: V_2 \rightarrow [-1,1]}$. The right hand side is essentially the cut norm of the graph ${G}$, as defined for instance by Frieze and Kannan.
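For a small graph this quantity can be computed by brute force: the expression is bilinear in ${(f,g)}$, so the supremum over ${1}$-bounded real ${f, g}$ is attained at sign vectors. A sketch with an arbitrary ${3 \times 3}$ adjacency matrix:

```python
import itertools
import numpy as np

# a tiny bipartite graph: V1 = V2 = {0,1,2}, with an arbitrary edge set E
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])   # A[v1, v2] = 1_E(v1, v2)
n1, n2 = A.shape

# bilinearity in (f, g) means the supremum over 1-bounded real f, g
# is attained at vertices of the cube, i.e. at sign vectors
best = 0.0
for f in itertools.product([-1, 1], repeat=n1):
    for g in itertools.product([-1, 1], repeat=n2):
        best = max(best, abs(np.array(f) @ A @ np.array(g)) / (n1 * n2))

# since A is non-negative, the all-ones pair is optimal for this graph,
# giving |E|/(|V1||V2|) = 6/9
assert abs(best - 6 / 9) < 1e-12
```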

The cut norm is basically an expectation when ${k=0,1}$:

Example 4 If ${k=0}$, we see from the definition that

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( ; {\bf Y}_1,\dots,{\bf Y}_l )} =| \mathop{\bf E} {\bf F} |.$

If ${k=1}$, one easily checks that

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}; {\bf Y}_1,\dots,{\bf Y}_l )} = \mathop{\bf E} | \mathop{\bf E}_{\bf X} {\bf F} |,$

where ${\mathop{\bf E}_{\bf X} {\bf F} = \mathop{\bf E}( {\bf F} | {\bf Y}_1,\dots,{\bf Y}_l )}$ is the conditional expectation of ${{\bf F}}$ with respect to the ${\sigma}$-algebra generated by all the variables other than ${{\bf X}}$, i.e., the ${\sigma}$-algebra generated by ${{\bf Y}_1,\dots,{\bf Y}_l}$. In particular, if ${{\bf X}, {\bf Y}_1,\dots,{\bf Y}_l}$ are independent random variables drawn uniformly from ${X,Y_1,\dots,Y_l}$ respectively, then

$\displaystyle \| F( {\bf X}; {\bf Y}_1,\dots, {\bf Y}_l) \|_{\mathrm{CUT}( {\bf X}; {\bf Y}_1,\dots,{\bf Y}_l )}$

$\displaystyle = \mathop{\bf E}_{y_1 \in Y_1,\dots, y_l \in Y_l} |\mathop{\bf E}_{x \in X} F(x; y_1,\dots,y_l)|.$
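The ${k=1}$ formula just displayed is easy to verify numerically: the optimal multiplier ${{\bf B}_1 = b({\bf Y})}$ is the conjugate phase of ${\mathop{\bf E}_{x \in X} F(x; y)}$, and no other ${1}$-bounded multiplier does better. A minimal sketch with a single ${Y}$ variable and arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
nx, ny = 5, 7  # arbitrary sizes for the ranges of X and Y
# a random complex-valued F(x, y)
F = rng.standard_normal((nx, ny)) + 1j * rng.standard_normal((nx, ny))

inner = F.mean(axis=0)            # E_x F(x, y), one value for each y
rhs = np.mean(np.abs(inner))      # right-hand side: E_y |E_x F(x, y)|

# witness multiplier b(y): the conjugate phase of E_x F(x, y)
b = np.conj(inner) / np.maximum(np.abs(inner), 1e-15)
lhs_witness = abs(np.mean(inner * b))
assert abs(lhs_witness - rhs) < 1e-10

# no other 1-bounded multiplier b(y) exceeds this value
for _ in range(200):
    b = np.exp(2j * np.pi * rng.random(ny))
    assert abs(np.mean(inner * b)) <= rhs + 1e-10
```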

Here are some basic properties of the cut norm:

Lemma 5 (Basic properties of cut norm) Let ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$ be independent discrete random variables, and ${{\bf F} = F({\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l)}$ a function of these variables.

• (i) (Permutation invariance) The cut norm ${\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}}$ is invariant with respect to permutations of the ${{\bf X}_1,\dots,{\bf X}_k}$, or permutations of the ${{\bf Y}_1,\dots,{\bf Y}_l}$.
• (ii) (Conditioning) One has

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} = \mathop{\bf E} \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}$

where on the right-hand side we view, for each realisation ${y_1,\dots,y_l}$ of ${{\bf Y}_1,\dots,{\bf Y}_l}$, ${{\bf F}}$ as a function ${F( {\bf X}_1,\dots,{\bf X}_k; y_1,\dots,y_l)}$ of the random variables ${{\bf X}_1,\dots, {\bf X}_k}$ alone, thus the right-hand side may be expanded as

$\displaystyle \sum_{y_1,\dots,y_l} \| F( {\bf X}_1,\dots,{\bf X}_k; y_1,\dots,y_l) \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}$

$\displaystyle \times \mathop{\bf P}( {\bf Y}_1=y_1,\dots,{\bf Y}_l=y_l).$

• (iii) (Monotonicity) If ${k \geq 1}$, we have

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \geq \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_{k-1}; {\bf X}_k, {\bf Y}_1,\dots,{\bf Y}_l )}.$

• (iv) (Multiplicative invariances) If ${{\bf B} = B({\bf X}_1,\dots,{\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l)}$ is a ${1}$-bounded function that does not depend on one of the ${{\bf X}_i}$, then

$\displaystyle \| {\bf B} {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.$

In particular, if we additionally assume ${|{\bf B}|=1}$, then

$\displaystyle \| {\bf B} {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} = \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.$

• (v) (Cauchy-Schwarz) If ${k \geq 1}$, one has

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| \Box_{{\bf X}_1, {\bf X}'_1} {\bf F} \|_{\mathrm{CUT}( {\bf X}_2, \dots, {\bf X}_k; {\bf X}_1, {\bf X}'_1, {\bf Y}_1,\dots,{\bf Y}_l )}^{1/2}$

where ${{\bf X}'_1}$ is a copy of ${{\bf X}_1}$ that is independent of ${{\bf X}_1,\dots,{\bf X}_k,{\bf Y}_1,\dots,{\bf Y}_l}$ and ${\Box_{{\bf X}_1, {\bf X}'_1} {\bf F}}$ is the random variable

$\displaystyle \Box_{{\bf X}_1, {\bf X}'_1} {\bf F} := F( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )$

$\displaystyle \times \overline{F}( {\bf X}'_1, {\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l ).$

• (vi) (Averaging) If ${k \geq 1}$ and ${{\bf F} = \mathop{\bf E}_{\bf Z} {\bf F}_{\bf Z}}$, where ${{\bf Z}}$ is another random variable independent of ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$, and ${{\bf F}_{\bf Z} = F_{\bf Z}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}$ is a random variable depending on both ${{\bf Z}}$ and ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$, then

$\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| {\bf F}_{\bf Z} \|_{\mathrm{CUT}( ({\bf X}_1, {\bf Z}), {\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.$

Proof: The claims (i), (ii) are clear from expanding out all the definitions. The claim (iii) also easily follows from the definitions (the left-hand side involves a supremum over a more general class of multipliers ${{\bf B}_1,\dots,{\bf B}_{k}}$, while the right-hand side omits the ${{\bf B}_k}$ multiplier), as does (iv) (the multiplier ${{\bf B}}$ can be absorbed into one of the multipliers in the definition of the cut norm). The claim (vi) follows by expanding out the definitions, and observing that all of the terms in the supremum appearing in the left-hand side also appear as terms in the supremum on the right-hand side. It remains to prove (v). By definition, the left-hand side is the supremum over all quantities of the form

$\displaystyle |{\bf E} {\bf F} {\bf B}_1 \dots {\bf B}_k|$

where the ${{\bf B}_i}$ are ${1}$-bounded functions of ${{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$ that do not depend on ${{\bf X}_i}$. We average out in the ${{\bf X}_1}$ direction (that is, we condition on the variables ${{\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}$), and pull out the factor ${{\bf B}_1}$ (which does not depend on ${{\bf X}_1}$), to write this as

$\displaystyle |{\bf E} {\bf B}_1 {\bf E}_{{\bf X}_1}( {\bf F} {\bf B}_2 \dots {\bf B}_k )|,$

which by Cauchy-Schwarz is bounded by

$\displaystyle ({\bf E} |{\bf E}_{{\bf X}_1}( {\bf F} {\bf B}_2 \dots {\bf B}_k )|^2)^{1/2},$

which can be expanded using the independent copy ${{\bf X}'_1}$ of ${{\bf X}_1}$ as

$\displaystyle |{\bf E} \Box_{{\bf X}_1,{\bf X}'_1} ({\bf F} {\bf B}_2 \dots {\bf B}_k) |^{1/2}.$

Expanding

$\displaystyle \Box_{{\bf X}_1,{\bf X}'_1} ({\bf F} {\bf B}_2 \dots {\bf B}_k) = (\Box_{{\bf X}_1,{\bf X}'_1} {\bf F}) (\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_2) \dots (\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_k)$

and noting that each ${\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_i}$ is ${1}$-bounded and does not depend on ${{\bf X}_i}$ for ${i=2,\dots,k}$, we obtain the claim. $\Box$

Now we can relate the cut norm to Gowers uniformity norms:

Lemma 6 Let ${G}$ be a finite abelian group, let ${{\bf x}, {\bf h}_1,\dots,{\bf h}_k}$ be independent random variables uniformly drawn from ${G}$ for some ${k \geq 0}$, and let ${f: G \rightarrow {\bf C}}$. Then

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x} )} \leq \|f\|_{U^{k+1}(G)} \ \ \ \ \ (2)$

and similarly (if ${k \geq 1}$)

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k; {\bf x} )} \leq \|f\|_{U^{k}(G)} \ \ \ \ \ (3)$

If ${f}$ is additionally assumed to be ${1}$-bounded, we have the converse inequalities

$\displaystyle \|f\|_{U^{k+1}(G)}^{2^{k+1}} \leq \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x} )} \ \ \ \ \ (4)$

and (if ${k \geq 1}$)

$\displaystyle \|f\|_{U^{k}(G)}^{2^{k}} \leq \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k; {\bf x} )}. \ \ \ \ \ (5)$

Proof: Applying Lemma 5(v) ${k}$ times, we can bound

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h_1},\dots,{\bf h_k}, {\bf x} )}$

by

$\displaystyle \| \Box_{{\bf h}_k,{\bf h}'_k} \dots \Box_{{\bf h}_1,{\bf h}'_1} (f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k)) \|_{\mathrm{CUT}( {\bf x}; {\bf h}_1, {\bf h}'_1, \dots, {\bf h}_k, {\bf h}'_k )}^{1/2^k} \ \ \ \ \ (6)$

where ${{\bf h}'_1,\dots,{\bf h}'_k}$ are independent copies of ${{\bf h}_1,\dots,{\bf h}_k}$ that are also independent of ${{\bf x}}$. The expression inside the norm can also be written as

$\displaystyle \Delta_{{\bf h}_k - {\bf h}'_k} \dots \Delta_{{\bf h}_1 - {\bf h}'_1} f({\bf x} + {\bf h}'_1 + \dots + {\bf h}'_k)$

so by Example 4 one can write (6) as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,h'_1,\dots,h'_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k - h'_k} \dots \Delta_{h_1 - h'_1} f(x+h'_1+\dots+h'_k)||^{1/2^k}$

which after some change of variables simplifies to

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)||^{1/2^k}$

which by Cauchy-Schwarz is bounded by

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^2|^{1/2^{k+1}}$

which one can rearrange as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,h_{k+1},x \in G} \Delta_{h_{k+1}} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^{1/2^{k+1}}$

giving (2). A similar argument bounds

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h_1},\dots,{\bf h_k}; {\bf x} )}$

by

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} \mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^{1/2^k}$

which gives (3).

For (4), we can reverse the above steps and expand ${\|f\|_{U^{k+1}(G)}^{2^{k+1}}}$ as

$\displaystyle \mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^2$

which we can write as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} b(h_1,\dots,h_k) \mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|$

for some ${1}$-bounded function ${b}$. This can in turn be expanded as

$\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,x \in G} f(x+h_1+\dots+h_k) b(h_1,\dots,h_k) \prod_{i=1}^k b_i(x,h_1,\dots,h_k)|$

for some ${1}$-bounded functions ${b_i}$ that do not depend on ${h_i}$. By Example 4, this can be written as

$\displaystyle \| f({\bf x} + {\bf h}_1+\dots+{\bf h}_k) b({\bf h}_1,\dots,{\bf h}_k) \prod_{i=1}^k b_i({\bf x},{\bf h}_1,\dots,{\bf h}_k) \|_{\mathrm{CUT}(; {\bf h}_1,\dots,{\bf h}_k, {\bf x})}$

which by several applications of Lemma 5(iii) and then Lemma 5(iv) can be bounded by

$\displaystyle \| f({\bf x} + {\bf h_1}+\dots+{\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x})},$

giving (4). A similar argument gives (5). $\Box$

Now we can prove Proposition 1. We begin with part (i). By permutation we may assume ${i=k}$, then by translation we may assume ${a_k=0}$. Replacing ${x}$ by ${x+h_1+\dots+h_{k-1}}$ and ${h}$ by ${h - a_1^{-1} h_1 - \dots - a_{k-1}^{-1} h_{k-1}}$, we can write the left-hand side of (1) as

$\displaystyle \mathop{\bf E}_{x,h,h_1,\dots,h_{k-1} \in G} f_k(x+h_1+\dots+h_{k-1}) \prod_{i=1}^{k-1} b_i(x,h,h_1,\dots,h_{k-1})$

where

$\displaystyle b_i(x,h,h_1,\dots,h_{k-1})$

$\displaystyle := f_i( x + h_1+\dots+h_{k-1}+ a_i(h - a_1^{-1} h_1 - \dots - a_{k-1}^{-1} h_{k-1}))$

is a ${1}$-bounded function that does not depend on ${h_i}$. Taking ${{\bf x}, {\bf h}, {\bf h}_1,\dots,{\bf h}_{k-1}}$ to be independent random variables drawn uniformly from ${G}$, the left-hand side of (1) can then be written as

$\displaystyle \mathop{\bf E} f_k({\bf x}+{\bf h}_1+\dots+{\bf h}_{k-1}) \prod_{i=1}^{k-1} b_i({\bf x},{\bf h},{\bf h}_1,\dots,{\bf h}_{k-1})$

which by Example 4 is bounded in magnitude by

$\displaystyle \| f_k({\bf x}+{\bf h}_1+\dots+{\bf h}_{k-1}) \prod_{i=1}^{k-1} b_i({\bf x},{\bf h},{\bf h}_1,\dots,{\bf h}_{k-1}) \|_{\mathrm{CUT}(; {\bf h}_1,\dots,{\bf h}_{k-1}, {\bf x}, {\bf h})}.$

After many applications of Lemma 5(iii), (iv), this is bounded by

$\displaystyle \| f_k({\bf x}+{\bf h_1}+\dots+{\bf h_{k-1}}) \|_{\mathrm{CUT}({\bf h}_1,\dots,{\bf h}_{k-1}; {\bf x}, {\bf h})}$

By Lemma 5(ii) we may drop the ${{\bf h}}$ variable, and then the claim follows from Lemma 6.

For part (ii), we replace ${x}$ by ${x+a-h^2}$ and ${h}$ by ${h-a+b}$ to write the left-hand side as

$\displaystyle \mathop{\bf E}_{x, a,b,h \in G} f_1(x+a-h^2) f_2(x+h+b-h^2) f_3(x+a+(h-a+b)^2-h^2);$

the point here is that the first factor does not involve ${b}$, the second factor does not involve ${a}$, and the third factor has no quadratic terms in ${h}$. Letting ${{\bf x}, {\bf a}, {\bf b}, {\bf h}}$ be independent variables drawn uniformly from ${G}$, we can use Example 4 to bound this in magnitude by

$\displaystyle \| f_1({\bf x}+{\bf a}-{\bf h}^2) f_2({\bf x}+{\bf h}+{\bf b}-{\bf h}^2)$

$\displaystyle f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2-{\bf h}^2 ) \|_{\mathrm{CUT}(; {\bf x}, {\bf a}, {\bf b}, {\bf h})}$

which by Lemma 5(i),(iii),(iv) is bounded by

$\displaystyle \| f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 ) \|_{\mathrm{CUT}({\bf a}, {\bf b}; {\bf x}, {\bf h})}$

and then by Lemma 5(v) we may bound this by

$\displaystyle \| \Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 ) \|_{\mathrm{CUT}(;{\bf a}, {\bf a}', {\bf b}, {\bf b}', {\bf x}, {\bf h})}^{1/4}$

which by Example 4 is

$\displaystyle |\mathop{\bf E} \Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 )|^{1/4}.$

Now the expression inside the expectation is the product of four factors, each of which is ${f_3}$ or ${\overline{f}_3}$ applied to an affine form ${{\bf x} + {\bf c} + {\bf d} {\bf h}}$, where ${{\bf c}}$ depends on ${{\bf a}, {\bf a}', {\bf b}, {\bf b}'}$ and the coefficient ${{\bf d}}$ is one of ${2({\bf b}-{\bf a})}$, ${2({\bf b}'-{\bf a})}$, ${2({\bf b}-{\bf a}')}$, ${2({\bf b}'-{\bf a}')}$. With probability ${1-O(1/N)}$, these four values of ${{\bf d}}$ are distinct, and then by part (i) we have

$\displaystyle |\mathop{\bf E}(\Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 )|{\bf a}, {\bf a}', {\bf b}, {\bf b}')| \leq \|f_3\|_{U^4({\bf Z}/N{\bf Z})}.$

When they are not distinct, we can instead bound this quantity by ${1}$. Taking expectations in ${{\bf a}, {\bf a}', {\bf b}, {\bf b}'}$, we obtain the claim. $\Box$

The analogue of the inverse ${U^2}$ theorem for cut norms is the following claim (which I learned from Ben Green):

Lemma 7 (${U^2}$-type inverse theorem) Let ${\mathbf{x}, \mathbf{h}}$ be independent random variables drawn uniformly from a finite abelian group ${G}$, and let ${f: G \rightarrow {\bf C}}$ be ${1}$-bounded. Then we have

$\displaystyle \| f(\mathbf{x} + \mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} = \sup_{\xi \in\hat G} \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})}$

where ${\hat G}$ is the group of characters of ${G}$, that is, homomorphisms ${\xi: x \mapsto \xi \cdot x}$ from ${G}$ to ${{\bf R}/{\bf Z}}$, and ${e(\theta) := e^{2\pi i \theta}}$.

Proof: Suppose first that ${\| f(\mathbf{x} + \mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} > \delta}$ for some ${\delta}$, then by definition

$\displaystyle |\mathop{\bf E}_{x,h \in G} f(x+h) a(x) b(h)| > \delta$

for some ${1}$-bounded ${a,b: G \rightarrow {\bf C}}$. By Fourier expansion, the left-hand side is also

$\displaystyle \sum_{\xi \in \hat G} \hat f(-\xi) \hat a(\xi) \hat b(\xi)$

where ${\hat f(\xi) := \mathop{\bf E}_{x \in G} f(x) e(-\xi \cdot x)}$. From Plancherel’s theorem we have

$\displaystyle \sum_{\xi \in \hat G} |\hat a(\xi)|^2, \sum_{\xi \in \hat G} |\hat b(\xi)|^2 \leq 1$

hence by Hölder’s inequality one has ${|\hat f(-\xi)| > \delta}$ for some ${\xi \in \hat G}$, and hence

$\displaystyle \sup_{\xi \in\hat G} \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})} > \delta. \ \ \ \ \ (7)$

Conversely, suppose (7) holds. Then there is ${\xi \in \hat G}$ such that

$\displaystyle \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})} > \delta$

which on substitution and Example 4 implies

$\displaystyle \| f(\mathbf{x}+\mathbf{h}) e(\xi \cdot (\mathbf{x}+\mathbf{h})) \|_{\mathrm{CUT}(;\mathbf{x}, \mathbf{h})} > \delta.$

The term ${e(\xi \cdot (\mathbf{x}+\mathbf{h}))}$ splits into the product of a factor ${e(\xi \cdot \mathbf{x})}$ not depending on ${\mathbf{h}}$, and a factor ${e(\xi \cdot \mathbf{h})}$ not depending on ${\mathbf{x}}$. Applying Lemma 5(iii), (iv) we conclude that

$\displaystyle \| f(\mathbf{x}+\mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} > \delta.$

The claim follows. $\Box$
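Lemma 7 can be illustrated numerically: taking ${a(x) = b(x) = e(-\xi_0 \cdot x)}$ with ${\xi_0}$ a frequency maximising ${|\hat f|}$ attains the value ${\max_\xi |\hat f(\xi)|}$ in the cut norm, while random ${1}$-bounded multipliers never exceed it. A sketch for ${G = {\bf Z}/N{\bf Z}}$ with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32  # arbitrary modulus, for illustration only
f = np.exp(2j * np.pi * rng.random(N))   # a random unimodular f on Z/NZ
fhat = np.fft.fft(f) / N                 # fhat(xi) = E_x f(x) e(-xi x / N)
target = np.abs(fhat).max()

x = np.arange(N)
M = f[(x[:, None] + x[None, :]) % N]     # matrix of values f(x + h)

# character multipliers a(x) = b(x) = e(-xi0 x / N) attain max |fhat|
xi0 = int(np.argmax(np.abs(fhat)))
a = np.exp(-2j * np.pi * xi0 * x / N)
val = abs(np.mean(M * a[:, None] * a[None, :]))
assert abs(val - target) < 1e-9

# random 1-bounded multipliers never beat the supremum, per Lemma 7
for _ in range(100):
    b1 = np.exp(2j * np.pi * rng.random(N))
    b2 = np.exp(2j * np.pi * rng.random(N))
    assert abs(np.mean(M * b1[:, None] * b2[None, :])) <= target + 1e-9
```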

The higher order inverse theorems are much less trivial (and the optimal quantitative bounds are not currently known). However, there is a useful degree lowering argument, due to Peluse and Prendiville, that can allow one to lower the order of a uniformity norm in some cases. We give a simple version of this argument here:

Lemma 8 (Degree lowering argument, special case) Let ${G}$ be a finite abelian group, let ${Y}$ be a non-empty finite set, and let ${f: G \rightarrow {\bf C}}$ be a function of the form ${f(x) := \mathop{\bf E}_{y \in Y} F_y(x)}$ for some ${1}$-bounded functions ${F_y: G \rightarrow {\bf C}}$ indexed by ${y \in Y}$. Suppose that

$\displaystyle \|f\|_{U^k(G)} \geq \delta$

for some ${k \geq 2}$ and ${0 < \delta \leq 1}$. Then one of the following claims holds (with implied constants allowed to depend on ${k}$):

• (i) (Degree lowering) one has ${\|f\|_{U^{k-1}(G)} \gg \delta^{O(1)}}$.
• (ii) (Non-zero frequency) There exist ${h_1,\dots,h_{k-2} \in G}$ and non-zero ${\xi \in \hat G}$ such that

$\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{h_1} \dots \Delta_{h_{k-2}} F_y(x) e( \xi \cdot x )| \gg \delta^{O(1)}.$

There are more sophisticated versions of this argument in which the frequency ${\xi}$ is “minor arc” rather than “zero frequency”, and then the Gowers norms are localised to suitable large arithmetic progressions; this is implicit in the above-mentioned paper of Peluse and Prendiville.

Proof: One can write

$\displaystyle \|f\|_{U^k(G)}^{2^k} = \mathop{\bf E}_{h_1,\dots,h_{k-2} \in G} \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)}^4$

and hence we conclude that

$\displaystyle \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)} \gg \delta^{O(1)}$

for a set ${\Sigma}$ of tuples ${(h_1,\dots,h_{k-2}) \in G^{k-2}}$ of density ${\gg \delta^{O(1)}}$. Applying Lemma 6 and Lemma 7, we see that for each such tuple, there exists ${\phi(h_1,\dots,h_{k-2}) \in \hat G}$ such that

$\displaystyle \| \Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x}) e( \phi(h_1,\dots,h_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x})} \gg \delta^{O(1)}, \ \ \ \ \ (8)$

where ${{\bf x}}$ is drawn uniformly from ${G}$.

Let us adopt the convention that ${e( \phi(h_1,\dots,h_{k-2}) \cdot {\bf x} )}$ vanishes for ${(h_1,\dots,h_{k-2})}$ not in ${\Sigma}$; then from Lemma 5(ii) we have

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}; {\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)},$

where ${{\bf h}_1,\dots,{\bf h}_{k-2}}$ are independent random variables drawn uniformly from ${G}$ and also independent of ${{\bf x}}$. By repeated application of Lemma 5(iii) we then have

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)}.$

Expanding out ${\Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x})}$ and using Lemma 5(iv) repeatedly we conclude that

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)}.$

From definition of ${f}$ we then have

$\displaystyle \| {\bf E}_{y \in Y} F_y({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})}$

$\displaystyle \gg \delta^{O(1)}.$

By Lemma 5(vi), we see that the left-hand side is bounded by

$\displaystyle \| F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}(({\bf x}, {\bf y}),{\bf h}_1,\dots, {\bf h}_{k-2})},$

where ${{\bf y}}$ is drawn uniformly from ${Y}$, independently of ${{\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2}}$. By repeated application of Lemma 5(i), (v), we conclude that

$\displaystyle \| \Box_{{\bf h}_1, {\bf h}'_1} \dots \Box_{{\bf h}_{k-2}, {\bf h}'_{k-2}} (F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} )) \|_{\mathrm{CUT}(({\bf x},{\bf y}); {\bf h}_1,{\bf h}'_1,\dots, {\bf h}_{k-2}, {\bf h}'_{k-2})} \gg \delta^{O(1)},$

where ${{\bf h}'_1,\dots,{\bf h}'_{k-2}}$ are independent copies of ${{\bf h}_1,\dots,{\bf h}_{k-2}}$ that are also independent of ${{\bf x}}$, ${{\bf y}}$. By Lemma 5(ii) and Example 4 we conclude that

$\displaystyle |\mathop{\bf E}( \Box_{{\bf h}_1, {\bf h}'_1} \dots \Box_{{\bf h}_{k-2}, {\bf h}'_{k-2}} (F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} )) | {\bf h}_1,{\bf h}'_1,\dots, {\bf h}_{k-2}, {\bf h}'_{k-2}) )| \gg \delta^{O(1)} \ \ \ \ \ (9)$

with probability ${\gg \delta^{O(1)}}$.

The left-hand side can be rewritten as

$\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{{\bf h}_1 - {\bf h}'_1} \dots \Delta_{{\bf h}_{k-2} - {\bf h}'_{k-2}} F_y( x + {\bf h}'_1 + \dots + {\bf h}'_{k-2})$

$\displaystyle e( \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot x )|$

where ${\delta_{{\bf h}_1, {\bf h}'_1}}$ is the additive version of ${\Box_{{\bf h}_1, {\bf h}'_1}}$, thus

$\displaystyle \delta_{{\bf h}_1, {\bf h}'_1} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) := \phi({\bf h}_1,\dots,{\bf h}_{k-2}) - \phi({\bf h}'_1,\dots,{\bf h}_{k-2}).$

Translating ${x}$, we can simplify this a little to

$\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{{\bf h}_1 - {\bf h}'_1} \dots \Delta_{{\bf h}_{k-2} - {\bf h}'_{k-2}} F_y( x ) e( \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot x )|.$

If the frequency ${\delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2})}$ is ever non-vanishing in the event (9) then conclusion (ii) applies. We conclude that

$\displaystyle \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = 0$

with probability ${\gg \delta^{O(1)}}$. In particular, by the pigeonhole principle, there exist ${h'_1,\dots,h'_{k-2} \in G}$ such that

$\displaystyle \delta_{{\bf h}_1, h'_1} \dots \delta_{{\bf h}_{k-2}, h'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = 0$

with probability ${\gg \delta^{O(1)}}$. Expanding this out, we obtain a representation of the form

$\displaystyle \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2})$

holding with probability ${\gg \delta^{O(1)}}$, where the ${\phi_i: G^{k-2} \rightarrow {\bf R}/{\bf Z}}$ are functions that do not depend on the ${i^{th}}$ coordinate. From (8) we conclude that

$\displaystyle \| \Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i(h_1,\dots,h_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x})} \gg \delta^{O(1)}$

for ${\gg \delta^{O(1)}}$ of the tuples ${(h_1,\dots,h_{k-2}) \in G^{k-2}}$. Thus by Lemma 5(ii)

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}; {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}.$

By repeated application of Lemma 5(iii) we then have

$\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}$

and then by repeated application of Lemma 5(iv)

$\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) \|_{\mathrm{CUT}({\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}$

and then the conclusion (i) follows from Lemma 6. $\Box$

As an application of degree lowering, we give an inverse theorem for the average in Proposition 1(ii), first established by Bourgain-Chang and later reproved by Peluse (by different methods from those given here):

Proposition 9 Let ${G = {\bf Z}/N{\bf Z}}$ be a cyclic group of prime order. Suppose that one has ${1}$-bounded functions ${f_1,f_2,f_3: G \rightarrow {\bf C}}$ such that

$\displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x) f_2(x+h) f_3(x+h^2)| \geq \delta \ \ \ \ \ (10)$

for some ${\delta > 0}$. Then either ${N \ll \delta^{-O(1)}}$, or one has

$\displaystyle |\mathop{\bf E}_{x \in G} f_1(x)|, |\mathop{\bf E}_{x \in G} f_2(x)| \gg \delta^{O(1)}.$

We remark that a modification of the arguments below also gives ${|\mathop{\bf E}_{x \in G} f_3(x)| \gg \delta^{O(1)}}$.

Proof: The left-hand side of (10) can be written as

$\displaystyle |\mathop{\bf E}_{x \in G} F(x) f_3(x)|$

where ${F}$ is the dual function

$\displaystyle F(x) := \mathop{\bf E}_{h \in G} f_1(x-h^2) f_2(x-h^2+h).$

By Cauchy-Schwarz one thus has

$\displaystyle |\mathop{\bf E}_{x \in G} F(x) \overline{F}(x)| \geq \delta^2$

and hence by Proposition 1, we either have ${N \ll \delta^{-O(1)}}$ (in which case we are done) or

$\displaystyle \|F\|_{U^4(G)} \gg \delta^2.$

Writing ${F = \mathop{\bf E}_{h \in G} F_h}$ with ${F_h(x) := f_1(x-h^2) f_2(x-h^2+h)}$, we conclude from Lemma 8 (applied with ${k=4}$) that either ${\|F\|_{U^3(G)} \gg \delta^{O(1)}}$, or that

$\displaystyle |\mathop{\bf E}_{x,h \in G} \Delta_{h_1} \Delta_{h_2} F_h(x) e(\xi x / N )| \gg \delta^{O(1)}$

for some ${h_1,h_2 \in G}$ and non-zero ${\xi \in G}$. The left-hand side can be rewritten as

$\displaystyle |\mathop{\bf E}_{x,h \in G} g_1(x-h^2) g_2(x-h^2+h) e(\xi x/N)|$

where ${g_1 = \Delta_{h_1} \Delta_{h_2} f_1}$ and ${g_2 = \Delta_{h_1} \Delta_{h_2} f_2}$. We can rewrite this in turn as

$\displaystyle |\mathop{\bf E}_{x,y \in G} g_1(x) g_2(y) e(\xi (x + (y-x)^2) / N)|$

which is bounded by

$\displaystyle \| e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}, {\bf y})}$

where ${{\bf x}, {\bf y}}$ are independent random variables drawn uniformly from ${G}$. Applying Lemma 5(v), we conclude that

$\displaystyle \| \Box_{{\bf y}, {\bf y}'} e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}; {\bf y}, {\bf y}')} \gg \delta^{O(1)}.$

However, a routine character sum calculation (after the ${\Box_{{\bf y}, {\bf y}'}}$ operation, the quadratic terms cancel and the phase becomes linear in ${\bf x}$) reveals that the left-hand side is ${O(N^{-c})}$ for some absolute constant ${c>0}$ because ${\xi}$ is non-zero, so that ${N \ll \delta^{-O(1)}}$. The only remaining case to consider is when

$\displaystyle \|F\|_{U^3(G)} \gg \delta^{O(1)}.$

Repeating the above arguments we then conclude that

$\displaystyle \|F\|_{U^2(G)} \gg \delta^{O(1)},$

and then

$\displaystyle \|F\|_{U^1(G)} \gg \delta^{O(1)}.$

The left-hand side can be computed to equal ${|\mathop{\bf E}_{x \in G} f_1(x)| |\mathop{\bf E}_{x \in G} f_2(x)|}$, and the claim follows. $\Box$
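The character sum computation used above to dispose of the non-zero frequency case can be checked numerically: after the ${\Box_{{\bf y},{\bf y}'}}$ operation the phase ${\xi((\mathbf{y}-\mathbf{x})^2 - (\mathbf{y}'-\mathbf{x})^2)/N}$ is linear in ${x}$, so the inner average over ${x}$ vanishes unless ${\mathbf{y} = \mathbf{y}'}$, and the resulting quantity equals exactly ${1/N}$ for prime ${N}$ and non-zero ${\xi}$. A sketch with arbitrary small parameters:

```python
import cmath

N, xi = 31, 7  # an arbitrary small prime and a non-zero frequency
e = lambda t: cmath.exp(2j * cmath.pi * t)

# E_{y,y'} | E_x e( xi ((y-x)^2 - (y'-x)^2) / N ) |
total = 0.0
for y in range(N):
    for yp in range(N):
        inner = sum(e(xi * ((y - x) ** 2 - (yp - x) ** 2) / N)
                    for x in range(N)) / N
        total += abs(inner)
avg = total / N**2

# the inner sum is a linear character in x, which averages to zero
# unless y = y'; hence the whole expression is exactly 1/N
assert abs(avg - 1 / N) < 1e-9
```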

This argument was given for the cyclic group setting, but the argument can also be applied to the integers (see Peluse-Prendiville) and can also be used to establish an analogue over the reals (that was first obtained by Bourgain).

Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions.  Roth’s theorem is the special case when one considers arithmetic progressions of length three.  Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory.  However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem.  It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.

Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better; however, my rewrite ended up being about the same length as the original argument and was probably usable only by myself.  In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention of writing up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than trying to rewrite the whole thing.  In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfan Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof.  We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint.  There are now a number of simplifications to the proof.  Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required.  Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi.  Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup.  Roth’s theorem seeks to locate a length three progression ${(P(1),P(2),P(3)) = (a, a+r, a+2r)}$ in which all three elements lie in a single set.  This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements ${P(1), P(2)}$ of the progression lie in a good set (and some other properties of the family are also required).  This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.

More specifically, Roth’s theorem is now deduced from

Theorem 1.5.  Let ${L}$ be a natural number, and let ${S}$ be a set of integers of upper density at least ${1-1/10L}$.  Then, whenever ${S}$ is partitioned into finitely many colour classes, there exists a colour class ${A}$ and a family ${(P_l(1),P_l(2),P_l(3))_{l=1}^L}$ of 3-term arithmetic progressions with the following properties:

1. For each ${l}$, ${P_l(1)}$ and ${P_l(2)}$ lie in ${A}$.
2. For each ${l}$, ${P_l(3)}$ lies in ${S}$.
3. The ${P_l(3)}$ for ${l=1,\dots,L}$ are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of $A$ are in blue and elements of $S$ are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6.  Let ${L}$ be a natural number, and let ${S}$ be a set of integers of upper density at least ${1-1/10L}$.  Then, whenever ${S}$ is partitioned into finitely many colour classes, there exists a colour class ${A}$ and a family ${(P_l(1),P_l(2),P_l(3))_{l=1}^L}$ of 3-term arithmetic progressions with the following properties:

1. For each ${l}$, ${P_l(1)}$ lies in ${A}$.
2. For each ${l}$, ${P_l(2)}$ and ${P_l(3)}$ lie in ${S}$.
3. The ${P_l(2)}$ for ${l=1,\dots,L}$ are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove.  To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details.  (I would, though, be interested in seeing a shorter proof of Theorem 1.5 that did not go through these arguments, and did not use the more powerful theorems of Roth or Szemerédi.)

Roth’s theorem on arithmetic progressions asserts that every subset of the integers ${{\bf Z}}$ of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:

Theorem 1 (Roth’s theorem) Let ${G = (G,+)}$ be a compact abelian group, with Haar probability measure ${\mu}$, which is ${2}$-divisible (i.e. the map ${x \mapsto 2x}$ is surjective) and let ${A}$ be a measurable subset of ${G}$ with ${\mu(A) \geq \alpha}$ for some ${0 < \alpha < 1}$. Then we have

$\displaystyle \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(x) d\mu(r) \gg_\alpha 1,$

where ${X \gg_\alpha Y}$ denotes the bound ${X \geq c_\alpha Y}$ for some ${c_\alpha > 0}$ depending only on ${\alpha}$.

This theorem is usually formulated in the case that ${G}$ is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group ${G = {\bf Z}/N{\bf Z}}$ of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of ${2}$-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant ${c_\alpha}$ on ${\alpha}$, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the ${2}$-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the ${2r}$ shift in that case.

We can deduce Theorem 1 from the following more general Khintchine-type statement. Let ${\hat G}$ denote the Pontryagin dual of a compact abelian group ${G}$, that is to say the set of all continuous homomorphisms ${\xi: x \mapsto \xi \cdot x}$ from ${G}$ to the (additive) unit circle ${{\bf R}/{\bf Z}}$. Thus ${\hat G}$ is a discrete abelian group, and functions ${f \in L^2(G)}$ have a Fourier transform ${\hat f \in \ell^2(\hat G)}$ defined by

$\displaystyle \hat f(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\ d\mu(x).$

If ${G}$ is ${2}$-divisible, then ${\hat G}$ is ${2}$-torsion-free in the sense that the map ${\xi \mapsto 2 \xi}$ is injective. For any finite set ${S \subset \hat G}$ and any radius ${\rho>0}$, define the Bohr set

$\displaystyle B(S,\rho) := \{ x \in G: \sup_{\xi \in S} \| \xi \cdot x \|_{{\bf R}/{\bf Z}} < \rho \}$

where ${\|\theta\|_{{\bf R}/{\bf Z}}}$ denotes the distance of ${\theta}$ to the nearest integer. We refer to the cardinality ${|S|}$ of ${S}$ as the rank of the Bohr set. We record a simple volume bound on Bohr sets:

Lemma 2 (Volume packing bound) Let ${G}$ be a compact abelian group with Haar probability measure ${\mu}$. For any Bohr set ${B(S,\rho)}$, we have

$\displaystyle \mu( B( S, \rho ) ) \gg_{|S|, \rho} 1.$

Proof: We can cover the torus ${({\bf R}/{\bf Z})^S}$ by ${O_{|S|,\rho}(1)}$ translates ${\theta+Q}$ of the cube ${Q := \{ (\theta_\xi)_{\xi \in S} \in ({\bf R}/{\bf Z})^S: \sup_{\xi \in S} \|\theta_\xi\|_{{\bf R}/{\bf Z}} < \rho/2 \}}$. Then the sets ${\{ x \in G: (\xi \cdot x)_{\xi \in S} \in \theta + Q \}}$ form a cover of ${G}$. But each of these sets lies in a translate of ${B(S,\rho)}$, and the claim then follows from the translation invariance of ${\mu}$. $\Box$
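In the model case ${G = {\bf Z}/N{\bf Z}}$ (with uniform probability measure), the measure of a Bohr set can be computed directly from the definition, which makes Lemma 2 easy to illustrate numerically. The following sketch is not from the original post; the modulus and frequency sets are arbitrary choices:

```python
def bohr_measure(N, S, rho):
    # mu(B(S, rho)) in G = Z/NZ: the fraction of x in G with
    # ||xi * x / N||_{R/Z} < rho for every frequency xi in S.
    def dist_to_nearest_int(t):
        return abs(t - round(t))
    count = sum(1 for x in range(N)
                if all(dist_to_nearest_int(xi * x / N) < rho for xi in S))
    return count / N

# A rank-1 example checkable by hand: in Z/1000Z with S = {1} and rho = 1/4,
# the Bohr set is {x : x < 250 or x > 750}, of measure 499/1000.
# A higher-rank Bohr set is always non-empty (it contains 0), consistent with
# the lower bound mu(B(S, rho)) >>_{|S|, rho} 1 of Lemma 2.
```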

Given any Bohr set ${B(S,\rho)}$, we define a normalised “Lipschitz” cutoff function ${\nu_{B(S,\rho)}: G \rightarrow {\bf R}}$ by the formula

$\displaystyle \nu_{B(S,\rho)}(x) = c_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})_+ \ \ \ \ \ (1)$

where ${c_{B(S,\rho)}}$ is the constant such that

$\displaystyle \int_G \nu_{B(S,\rho)}\ d\mu = 1,$

thus

$\displaystyle c_{B(S,\rho)} = \left( \int_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})\ d\mu(x) \right)^{-1}.$

The function ${\nu_{B(S,\rho)}}$ should be viewed as an ${L^1}$-normalised “tent function” cutoff to ${B(S,\rho)}$. Note from Lemma 2 that

$\displaystyle 1 \ll_{|S|,\rho} c_{B(S,\rho)} \ll_{|S|,\rho} 1. \ \ \ \ \ (2)$

We then have the following sharper version of Theorem 1:

Theorem 3 (Roth-Khintchine theorem) Let ${G = (G,+)}$ be a ${2}$-divisible compact abelian group, with Haar probability measure ${\mu}$, and let ${\epsilon>0}$. Then for any measurable function ${f: G \rightarrow [0,1]}$, there exists a Bohr set ${B(S,\rho)}$ with ${|S| \ll_\epsilon 1}$ and ${\rho \gg_\epsilon 1}$ such that

$\displaystyle \int_G \int_G f(x) f(x+r) f(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \ \ \ \ \ (3)$

$\displaystyle \geq (\int_G f\ d\mu)^3 - O(\epsilon)$

where ${*}$ denotes the convolution operation

$\displaystyle f*g(x) := \int_G f(y) g(x-y)\ d\mu(y).$

A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with ${f := 1_A}$ and ${\epsilon}$ equal to a small multiple of ${\alpha^3}$ to conclude that there is a Bohr set ${B(S,\rho)}$ with ${|S| \ll_\alpha 1}$ and ${\rho \gg_\alpha 1}$ such that

$\displaystyle \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \gg \alpha^3.$

But from (2) we have the pointwise bound ${\nu_{B(S,\rho)}*\nu_{B(S,\rho)} \ll_\alpha 1}$, and Theorem 1 follows.

Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set ${B(S,\rho)}$ to capture all the “large Fourier coefficients” of ${f}$, but such that a certain “dilate” of ${B(S,\rho)}$ does not capture much more Fourier energy of ${f}$ than ${B(S,\rho)}$ itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of ${f}$ into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the ${\nu_{B(S,\rho)}}$ considered above to achieve a similar effect.

We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth’s theorem) Let ${A}$ be a subset of the integers ${{\bf Z}}$ whose upper density

$\displaystyle \overline{\delta}(A) := \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}$

is positive. Then ${A}$ contains infinitely many arithmetic progressions ${a, a+r, a+2r}$ of length three, with ${a \in {\bf Z}}$ and ${r>0}$.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length ${k}$ for any ${k}$.
As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set ${A}$ (or more precisely, the indicator function ${1_A}$ of that set, or of pieces of that set) against linear characters ${n \mapsto e(\alpha n)}$ for various frequencies ${\alpha \in {\bf R}/{\bf Z}}$. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when ${A}$ is aligned almost completely with one of these linear characters, for instance by being a Bohr set of the form

$\displaystyle \{ n \in {\bf Z}: \| \alpha n - \theta \|_{{\bf R}/{\bf Z}} < \varepsilon \}$

or more generally of the form

$\displaystyle \{ n \in {\bf Z}: \alpha n \in U \}$

for some multi-dimensional frequency ${\alpha \in {\bf T}^d}$ and some open set ${U}$. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.
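The Fourier-analytic calculation alluded to here is based (in the cyclic group model ${{\bf Z}/N{\bf Z}}$) on the identity ${\mathop{\bf E}_{x,r} f(x) g(x+r) h(x+2r) = \sum_\xi \hat f(\xi) \hat g(-2\xi) \hat h(\xi)}$, which shows that the count of three-term progressions is controlled by the Fourier coefficients: when all non-trivial coefficients are small, the count is close to its “random” value. A numerical check of this identity (a sketch, not from the original notes; the modulus ${11}$ and random test functions are arbitrary):

```python
import cmath
import random

random.seed(1)
N = 11

def e(t):
    # e(t) = exp(2*pi*i*t)
    return cmath.exp(2j * cmath.pi * t)

def fourier(f):
    # fhat(xi) = E_x f(x) e(-xi * x / N)
    return [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

f, g, h = ([random.uniform(-1, 1) for _ in range(N)] for _ in range(3))

# Direct weighted count of three-term progressions x, x+r, x+2r in Z/NZ.
direct = sum(f[x] * g[(x + r) % N] * h[(x + 2 * r) % N]
             for x in range(N) for r in range(N)) / N ** 2

# The same average expressed through Fourier coefficients.
fh, gh, hh = fourier(f), fourier(g), fourier(h)
via_fourier = sum(fh[xi] * gh[(-2 * xi) % N] * hh[xi] for xi in range(N))
```

The two computations agree to machine precision, and bounding the Fourier-side sum by its largest term (after extracting the main term at ${\xi = 0}$) is exactly how progressions are produced in abundance in the pseudorandom case.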
To handle the general case, one must somehow synthesise the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can basically be classified into two general methods, namely the density increment argument (or ${L^\infty}$ increment argument) and the energy increment argument (or ${L^2}$ increment argument).
The idea behind the density increment argument is to introduce a dichotomy: either the object ${A}$ being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.
The idea behind the energy increment argument is instead to decompose the original object ${A}$ into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with ${A}$, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or ${L^2}$ norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.
In these notes we give two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally to counting other types of patterns, but this extension is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.

In the previous lecture, we studied the recurrence properties of compact systems, which are systems in which all measurable functions exhibit almost periodicity – they return almost completely to themselves after repeated shifting. Now, we consider the opposite extreme of mixing systems – those in which all measurable functions (of mean zero) exhibit mixing – they become orthogonal to themselves after repeated shifting. (Actually, there are two different types of mixing, strong mixing and weak mixing, depending on whether the orthogonality occurs individually or on the average; it is the latter concept which is of more importance to the task of establishing the Furstenberg recurrence theorem.)

We shall see that for weakly mixing systems, averages such as $\frac{1}{N} \sum_{n=0}^{N-1} T^n f \ldots T^{(k-1)n} f$ can be computed very explicitly (in fact, this average converges to the constant $(\int_X f\ d\mu)^{k-1}$). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the van der Corput lemma.

As one application of this theory, we will be able to establish Roth’s theorem (the k=3 case of Szemerédi’s theorem).

Earlier this month, in the previous incarnation of this page, I posed a question which I thought was unsolved, and obtained the answer (in fact, it was solved 25 years ago) within a week. Now that this new version of the page has better feedback capability, I am now tempted to try again, since I have a large number of such questions which I would like to publicise. (Actually, I even have a secret web page full of these somewhere near my home page, though it will take a non-trivial amount of effort to find it!)

Perhaps my favourite open question is the problem on the maximal size of a cap set – a subset of ${\Bbb F}^n_3$ (${\Bbb F}_3$ being the finite field of three elements) which contains no lines, or equivalently no non-trivial arithmetic progressions of length three. As an upper bound, one can easily modify the proof of Roth’s theorem to show that cap sets must have size $O(3^n/n)$ (see e.g. this paper of Meshulam). This of course is better than the trivial bound of $3^n$ once n is large. In the converse direction, the trivial example $\{0,1\}^n$ shows that cap sets can be as large as $2^n$; the current world record is $(2.2174\ldots)^n$, held by Edel. The gap between these two bounds is rather enormous; I would be very interested in either an improvement of the upper bound to $o(3^n/n)$, or an improvement of the lower bound to $(3-o(1))^n$. (I believe both improvements are true, though a good friend of mine disagrees about the improvement to the lower bound.)
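For very small dimensions the maximal cap set size can be found by exhaustive search (one obtains 2 and 4 for ${n=1,2}$; the known maxima for ${n=3,4}$ are 9 and 20). The following brute-force sketch – which of course has no bearing on the asymptotic bounds discussed above – uses the fact that three distinct points ${x,y,z \in {\Bbb F}^n_3}$ form a line precisely when ${x+y+z=0}$:

```python
from itertools import combinations, product

def is_cap(points):
    # In F_3^n, distinct x, y, z form a 3-term progression (a line) iff
    # x + y + z = 0 coordinatewise mod 3 (since 2 = -1 in F_3).
    return not any(all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z))
                   for x, y, z in combinations(points, 3))

def max_cap_size(n):
    # Exhaustive search over all subsets of F_3^n; feasible only for tiny n.
    pts = list(product(range(3), repeat=n))
    best = 0
    for mask in range(1 << len(pts)):
        subset = [p for i, p in enumerate(pts) if mask >> i & 1]
        if len(subset) > best and is_cap(subset):
            best = len(subset)
    return best
```

For ${n=2}$ this search recovers the value 4, attained for instance by the square ${\{0,1\}^2}$; already at ${n=3}$ one would want a smarter search than this ${2^{27}}$-subset enumeration.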