A nonstandard analysis proof of Szemeredi’s theorem

20 July, 2015 in expository, math.CO, math.DS | Tags: Gowers uniformity norms, nonstandard analysis, Szemeredi's theorem, uniform almost periodicity | by Terence Tao

Szemerédi’s theorem asserts that any subset of the integers of positive upper density contains arbitrarily long arithmetic progressions. Here is an equivalent quantitative form of this theorem:

Theorem 1 (Szemerédi’s theorem) Let ${N}$ be a positive integer, and let ${f: {\bf Z}/N{\bf Z} \rightarrow [0,1]}$ be a function with ${{\bf E}_{x \in {\bf Z}/N{\bf Z}} f(x) \geq \delta}$ for some ${\delta>0}$, where we use the averaging notation ${{\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)}$, ${{\bf E}_{x,r \in A} f(x,r) := \frac{1}{|A|^2} \sum_{x, r \in A} f(x,r)}$, etc. Then for ${k \geq 3}$ we have

$\displaystyle {\bf E}_{x,r \in {\bf Z}/N{\bf Z}} f(x) f(x+r) \dots f(x+(k-1)r) \geq c(k,\delta)$

for some ${c(k,\delta)>0}$ depending only on ${k,\delta}$ .
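As a small illustration of the averaging notation in Theorem 1, the following sketch evaluates the progression-counting average exactly on a standard cyclic group. The helper name `ap_average` and the two test functions are hypothetical choices made for this example; nothing here is part of the proof.

```python
from fractions import Fraction
from itertools import product

def ap_average(f, N, k):
    """Return E_{x,r in Z/NZ} f(x) f(x+r) ... f(x+(k-1)r) as an exact fraction."""
    total = Fraction(0)
    for x, r in product(range(N), repeat=2):
        term = Fraction(1)
        for j in range(k):
            term *= f[(x + j * r) % N]
        total += term
    return total / N**2

N, k = 12, 3
ones = [Fraction(1)] * N                                       # f identically 1
evens = [Fraction(1 if x % 2 == 0 else 0) for x in range(N)]   # indicator of the even residues

print(ap_average(ones, N, k))    # 1
print(ap_average(evens, N, k))   # 1/4
```

The indicator of the even residues has density ${1/2}$, and its average is ${1/4}$ because a progression ${x, x+r, x+2r}$ stays inside the even residues exactly when ${x}$ and ${r}$ are both even. Note that the degenerate progressions with ${r=0}$ are included in the average.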

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases ${k=1,2}$ as they are trivial and somewhat degenerate.

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for the sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (specifically, the notion of a standard part of a bounded real number, and the Loeb measure construction); see for instance this previous blog post for a discussion.

By routine “compactness and contradiction” arguments (as discussed in this previous post), Theorem 1 can be deduced from the following nonstandard variant:

Theorem 2 Let ${N}$ be a nonstandard positive integer, let ${X}$ be the nonstandard cyclic group ${{}^*{\bf Z}/N{}^*{\bf Z}}$, and let ${f: X \rightarrow {}^*[0,1]}$ be an internal function with ${\hbox{st} \mathop{\bf E}_{x \in X} f(x) > 0}$. Then for any standard ${k \geq 3}$,

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) > 0.$

Here of course the averaging notation is interpreted internally.

Indeed, if Theorem 1 failed, one could create a sequence of functions ${f_n: {\bf Z}/N_n{\bf Z} \rightarrow [0,1]}$ of density at least ${\delta>0}$ for some fixed ${\delta}$ , and a fixed ${k}$ such that

$\displaystyle {\bf E}_{x,r \in {\bf Z}/N_n{\bf Z}} f_n(x) f_n(x+r) \dots f_n(x+(k-1)r) \rightarrow 0;$

taking ultralimits one can then soon obtain a counterexample to Theorem 2.

It remains to prove Theorem 2. Henceforth ${N}$ is a fixed nonstandard positive integer, and ${X := {}^* {\bf Z}/N{}^* {\bf Z}}$ . By the Loeb measure construction (discussed in this previous blog post), one can give ${X}$ the structure of a probability space ${(X, {\mathcal L}_X, \mu)}$ (the Loeb space of ${X}$ ), such that every internal subset ${A}$ of ${X}$ is (Loeb) measurable with

$\displaystyle \mu(A) = \hbox{st} \frac{|A|}{|X|},$

which implies that any bounded internal function ${f: X \rightarrow {}^*{\bf R}}$ has standard part ${\hbox{st} f}$ which is (Loeb) measurable with

$\displaystyle \int_X \hbox{st} f\ d\mu = \hbox{st} {\bf E}_{x \in X} f(x).$

Conversely, a countable saturation argument shows that any function in ${L^\infty(X)}$ is equal almost everywhere to the standard part of a bounded internal function.

From Hölder’s inequality we see that the ${k}$ -linear form

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} f_0(x) f_1(x+r) \dots f_{k-1}(x+(k-1)r)$

vanishes if one of the ${f_i}$ has standard part vanishing almost everywhere. As such, we can (by abuse of notation) extend this ${k}$ -linear form to functions ${f_0,\dots,f_{k-1}}$ that are elements of ${L^\infty(X)}$ , rather than bounded internal functions. With this convention, we see that Theorem 2 is equivalent to the following assertion.

Theorem 3 For any non-negative ${f \in L^\infty(X,\mu)}$ with ${\int_X f\ d\mu > 0}$ , one has for any standard ${k \geq 3}$ ,

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) > 0.$

The next step is to introduce the Gowers-Host-Kra uniformity seminorms ${\|f\|_{U^d(X)}}$ , defined for ${f \in L^\infty(X)}$ by the formula

$\displaystyle \|f \|_{U^d(X)}^{2^d} := \hbox{st} {\bf E}_{x,h_1,\dots,h_d \in X} \prod_{\omega \in \{0,1\}^d} F(x+\omega_1 h_1 + \dots + \omega_d h_d)$

where ${F}$ is any bounded internal function whose standard part equals ${f}$ almost everywhere. From Hölder’s inequality one can see that the exact choice of ${F}$ does not matter, so that this seminorm is well-defined. (It is indeed a seminorm, but we will not need this fact here.)
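For intuition, the defining formula can be evaluated by brute force on a small standard cyclic group (a toy finite analogue, not the nonstandard ${X}$). The function name `gowers_power` and the choice ${N=6}$ are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

def gowers_power(F, N, d):
    """Return ||F||_{U^d}^{2^d} = E_{x,h_1..h_d} prod_{omega in {0,1}^d} F(x + omega.h), exactly."""
    total = Fraction(0)
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = Fraction(1)
            for omega in product((0, 1), repeat=d):
                shift = sum(w * hi for w, hi in zip(omega, h))
                term *= F[(x + shift) % N]
            total += term
    return total / N ** (d + 1)

N = 6
constant = [Fraction(1)] * N
alternating = [Fraction((-1) ** x) for x in range(N)]  # well defined mod N since N is even

print(gowers_power(constant, N, 2))      # 1
print(gowers_power(alternating, N, 1))   # 0
print(gowers_power(alternating, N, 2))   # 1
```

The alternating function has mean zero, so the ${d=1}$ average (which is just the square of the mean) vanishes, while the ${d=2}$ average detects it fully. This reflects the general fact that the higher seminorms are sensitive to more structure.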

We have the following application of the van der Corput inequality:

Theorem 4 (Generalised von Neumann theorem) Let ${k \geq 3}$ be standard. For any ${f_0,\dots,f_{k-1} \in L^\infty(X,\mu)}$ with ${\|f_i\|_{U^{k-1}(X)}=0}$ for some ${i=0,\dots,k-1}$ , one has

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} f_0(x) f_1(x+r) \dots f_{k-1}(x+(k-1)r) = 0.$

This estimate is proven in numerous places in the literature (e.g. Lemma 11.4 of my book with Van Vu, or Exercise 23 of this blog post) and will not be repeated here. In particular, from multilinearity we see that

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) = \hbox{st} \mathop{\bf E}_{x,r \in X} g(x) g(x+r) \dots g(x+(k-1)r) \ \ \ \ \ (1)$

whenever ${f,g \in L^\infty(X)}$ with ${\|f-g\|_{U^{k-1}(X)} = 0}$ .

Dual to the Gowers norms ${U^{k-1}(X)}$ are the uniformly almost periodic norms ${UAP^{k-2}(X)}$. Let us first define the internal version of these norms. We define ${{}^* UAP^0(X)}$ to be the space of constant internal functions ${f: x \mapsto c}$, with internal norm ${\|f\|_{{}^* UAP^0(X)} = |c|}$. Once ${{}^* UAP^d(X)}$ is defined for some ${d \geq 0}$, we define ${{}^* UAP^{d+1}(X)}$ to be the internal normed vector space of internal functions ${f: X \rightarrow {}^* {\bf R}}$ for which there exists a nonstandard real number ${M}$, an internally finite non-empty set ${H}$, an internal family ${h \mapsto g_h}$ of internal functions ${g_h: X \rightarrow {}^* {\bf R}}$ bounded in magnitude by one for each ${h \in H}$, and an internal family ${(n,h) \mapsto c_{n,h}}$ of internal functions ${c_{n,h}: X \rightarrow {}^* {\bf R}}$ in the unit ball of ${{}^* UAP^d(X)}$ such that one has the representation

$\displaystyle T^n f = M {\bf E}_{h \in H} c_{n,h} g_h$

for all ${n \in X}$, where ${T^n f(x) := f(x+n)}$ is the shift of ${f}$ by ${n}$. The internal infimum of all such ${M}$ is then the ${{}^* UAP^{d+1}(X)}$ norm of ${f}$. This gives each of the ${{}^* UAP^d(X)}$ the structure of an internal shift-invariant Banach algebra; see Section 5 of . The ${{}^* UAP^d(X)}$ norms also control the supremum norm:

$\displaystyle \sup_{x \in X} |f(x)| \leq \|f\|_{{}^* UAP^d(X)}.$

In particular, if we write ${UAP^d(X)}$ for the space of standard parts of internal functions of bounded norm in ${{}^* UAP^d(X)}$, then ${UAP^d(X)}$ is an (external) Banach algebra contained (as a real vector space) in ${L^\infty(X)}$. For ${d \geq 0}$, we can then define a factor ${Z^{d}(X)}$ of ${X}$ to be the probability space ${Z^{d}(X) = (X, {\mathcal Z}^{d}(X), \mu)}$, where ${{\mathcal Z}^{d}(X)}$ is the subalgebra of ${{\mathcal L}_X}$ consisting of those sets ${E}$ such that ${1_E}$ lies in the ${L^2}$ closure of ${UAP^d(X)}$. This is easily seen to be a shift-invariant ${\sigma}$-algebra, and so ${Z^{d}(X)}$ is a factor.

We have the following key characteristic factor relationship:

Theorem 5 Let ${f \in L^\infty(X)}$ with ${{\bf E}(f|Z^{k-2}(X))=0}$ . Then ${\|f\|_{U^{k-1}(X)} = 0}$ .

In fact the converse implication is true also (making ${Z^{k-2}(X)}$ the universal characteristic factor for the ${U^{k-1}(X)}$ seminorm), but we will not need this direction of the implication.

Proof: Suppose for contradiction that ${\|f\|_{U^{k-1}(X)} > 0}$; we can normalise ${\|f\|_{L^\infty(X)} \leq 1}$. Writing ${f = \hbox{st} F}$ for some bounded internal ${F}$, we then see that ${f}$ has a non-zero inner product with ${\hbox{st} {\mathcal D}_{k-1} F}$, where the dual function ${{\mathcal D}_d F: X \rightarrow {}^* {\bf R}}$ for ${d \geq 0}$ is the bounded internal function

$\displaystyle {\mathcal D}_d F(x) := {\bf E}_{h_1,\dots,h_d \in X} \prod_{\omega \in \{0,1\}^d \backslash \{0\}^d} F(x+\omega_1 h_1 + \dots + \omega_d h_d).$

From the easily verified identity

$\displaystyle T^n {\mathcal D}_d F = {\bf E}_{h \in X} {\mathcal D}_{d-1}(T^n F T^h F) T^h F$

and a routine induction, we see that ${{\mathcal D}_d F}$ lies in the unit ball of ${{}^* UAP^{d-1}(X)}$ for every ${d \geq 1}$, and so ${\hbox{st} {\mathcal D}_{k-1} F}$ is measurable with respect to ${Z^{k-2}(X)}$. By hypothesis this implies that ${f}$ is orthogonal to ${\hbox{st} {\mathcal D}_{k-1} F}$, a contradiction. $\Box$
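The shift identity for the dual function used in the above proof can be checked with exact arithmetic on a small standard cyclic group. This is only a sanity check in a finite toy model; the helper names `shift`, `dual` and `shift_identity_rhs`, and the sample function on ${{\bf Z}/5{\bf Z}}$, are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

def shift(F, n):
    """T^n F(x) = F(x + n) on Z/NZ."""
    N = len(F)
    return [F[(x + n) % N] for x in range(N)]

def dual(F, d):
    """D_d F(x) = E_{h_1..h_d} prod over nonzero omega in {0,1}^d of F(x + omega.h)."""
    N = len(F)
    out = []
    for x in range(N):
        total = Fraction(0)
        for h in product(range(N), repeat=d):
            term = Fraction(1)
            for omega in product((0, 1), repeat=d):
                if any(omega):  # skip omega = (0, ..., 0)
                    s = sum(w * hi for w, hi in zip(omega, h))
                    term *= F[(x + s) % N]
            total += term
        out.append(total / N ** d)
    return out

def shift_identity_rhs(F, d, n):
    """E_h D_{d-1}(T^n F * T^h F) * T^h F, as a function of x."""
    N = len(F)
    TnF = shift(F, n)
    acc = [Fraction(0)] * N
    for h in range(N):
        ThF = shift(F, h)
        inner = dual([a * b for a, b in zip(TnF, ThF)], d - 1)
        acc = [acc[x] + inner[x] * ThF[x] for x in range(N)]
    return [v / N for v in acc]

F = [Fraction(v, 4) for v in (1, 3, 0, 2, 4)]  # arbitrary [0,1]-valued function on Z/5Z
lhs = shift(dual(F, 2), 3)
rhs = shift_identity_rhs(F, 2, 3)
print(lhs == rhs)  # True
```

The identity comes from splitting off the last variable ${h_d}$ and substituting ${h = n + h_d}$, which is a bijection of the cyclic group.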

In view of the above theorem and (1), we may replace ${f}$ by ${{\bf E}(f|Z^{k-2}(X))}$ without affecting the average in Theorem 3. Thus that theorem is equivalent to the following.

Theorem 6 Let ${d \geq 0}$ and ${k \geq 1}$ be standard. Then for any non-negative ${f \in L^\infty(Z^d(X))}$ with ${\int_X f\ d\mu > 0}$ , one has

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) > 0. \ \ \ \ \ (2)$

We only apply this theorem in the case ${k \geq 3}$ and ${d = k-2}$ , but for inductive purposes it is convenient to decouple the two parameters.

We prove Theorem 6 by induction on ${d}$ (allowing ${k}$ to be arbitrary). When ${d=0}$ , the claim is obvious for any ${k}$ because all functions in ${L^\infty(Z^0(X))}$ are essentially constant. Now suppose that ${d \geq 1}$ and that the claim has already been proven for ${d-1}$ .

Let ${f \in L^\infty(Z^d(X))}$ be a nonnegative function whose mean ${\delta := \int_X f\ d\mu}$ is positive; we may normalise ${f}$ to take values in ${[0,1]}$ . Let ${k \geq 1}$ be standard, and let ${\varepsilon>0}$ be a sufficiently small standard quantity depending on ${k,\delta}$ to be chosen later (one could for instance take ${\varepsilon := \frac{\delta^{10k}}{10 k^{10}}}$ , but we will not attempt to optimise in ${\varepsilon}$ ). As ${f}$ is ${Z^d(X)}$ -measurable, one can find an internal function ${F: X \rightarrow {}^* {\bf R}}$ with ${0 \leq F \leq 1}$ and bounded ${{}^* UAP^d(X)}$ norm such that ${\|f-\hbox{st} F\|_{L^1(X)} \leq \varepsilon}$ . (Note though that while the ${{}^* UAP^d(X)}$ norm of ${F}$ is bounded, this bound could be extremely large compared to ${k}$ , ${1/\delta}$ , ${1/\varepsilon}$ .)

Set ${Y := Z^{d-1}(X)}$ . We define the relative inner product ${\langle f, g \rangle_{L^2(X|Y)} \in L^\infty(Y)}$ for ${f,g \in L^\infty(X)}$ by the formula

$\displaystyle \langle f,g \rangle_{L^2(X|Y)} := {\bf E}(f g|Y)$

and the relative norm

$\displaystyle \|f\|_{L^2(X|Y)} := \langle f,f \rangle_{L^2(X|Y)}^{1/2}.$

This gives ${L^\infty(X)}$ the structure of a (pre-)Hilbert module over ${L^\infty(Y)}$ , as discussed in this previous blog post.
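In a finite toy model, where ${X}$ is a finite set with uniform measure and the factor ${Y}$ is generated by a partition of ${X}$ into cells, the relative inner product is simply a cell-by-cell average of ${fg}$. The following sketch (with hypothetical names `conditional_expectation` and `relative_inner`, and made-up data) illustrates this and checks the relative Cauchy-Schwarz inequality pointwise.

```python
from fractions import Fraction

def conditional_expectation(f, cells):
    """E(f|Y) on a finite uniform probability space, with Y generated by the partition `cells`."""
    out = [Fraction(0)] * len(f)
    for cell in cells:
        avg = sum(f[x] for x in cell) / Fraction(len(cell))
        for x in cell:
            out[x] = avg  # E(f|Y) is constant on each cell
    return out

def relative_inner(f, g, cells):
    """<f, g>_{L^2(X|Y)} = E(fg|Y)."""
    return conditional_expectation([a * b for a, b in zip(f, g)], cells)

cells = [[0, 1, 2], [3, 4, 5]]
f = [Fraction(v) for v in (1, 0, 1, 0, 1, 0)]
g = [Fraction(v) for v in (1, 1, 0, 2, 2, 2)]

ip = relative_inner(f, g, cells)
ff = relative_inner(f, f, cells)
gg = relative_inner(g, g, cells)
print(ip)
print(all(ip[x] ** 2 <= ff[x] * gg[x] for x in range(6)))  # True
```

Here the inner product takes values in functions that are constant on cells, which is the finite counterpart of taking values in ${L^\infty(Y)}$.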

A crucial point is that the function ${F}$ is relatively almost periodic over the previous characteristic factor ${Y}$ , in the following sense.

Proposition 7 (Relative almost periodicity) There exists a standard natural number ${m}$ and functions ${q_1,\dots,q_m}$ in the unit ball of ${L^\infty(X)}$ with the following “relative total boundedness” property: for any ${n \in X}$ , there exists a ${Y}$ -measurable function ${i: X \rightarrow \{1,\dots,m\}}$ such that ${\| \hbox{st} T^n F - q_i \|_{L^2(X|Y)} \leq \varepsilon}$ almost everywhere (where ${q_i(x)}$ is short-hand for ${\sum_{j=1}^m q_j(x) 1_{i(x)=j}}$ ).

Proof: This will be a relative version of the standard analysis fact that integral operators on finite measure spaces with bounded kernel are in the Hilbert-Schmidt class, and thus compact.

By construction, there exists an internally finite non-empty set ${H}$ , an internal collection ${h \mapsto g_h}$ of internal functions ${g_h: X \rightarrow {}^* {\bf R}}$ that are uniformly bounded in ${h}$ , and an internal collection ${(n,h) \mapsto c_{n,h}}$ of internal functions ${c_{n,h}: X \rightarrow {}^* {\bf R}}$ that are uniformly bounded in ${{}^* UAP^{d-1}(X)}$ , such that

$\displaystyle T^n F = {\bf E}_{h \in H} c_{n,h} g_h \ \ \ \ \ (3)$

for all ${n \in X}$ . Note in particular that the ${\hbox{st} c_{n,h}}$ all lie in a bounded subset of ${L^\infty(Y)}$ , and the ${g_h}$ all lie in a bounded subset of ${L^\infty(X)}$ .

We give ${Y \times H}$ the ${\sigma}$ -algebra generated from the standard parts of bounded internal functions ${(x,h) \mapsto f(x,h)}$ such that the standard parts of ${x \mapsto f(x,h)}$ all lie in a bounded subset of ${L^\infty(Y)}$ ; this gives a probability space that extends the product measure of ${Y}$ and ${H}$ . We define an operator ${S: L^\infty(Y \times H) \rightarrow L^\infty(X)}$ as follows. If ${b \in L^\infty(Y \times H)}$ , then ${b}$ is the standard part of some bounded internal function ${B: Y \times H \rightarrow {}^* {\bf R}}$ . We then define ${Sb}$ by the formula

$\displaystyle Sb(x) := \hbox{st} {\bf E}_{h \in H} B(x,h) g_h(x).$

This can easily be seen to not depend on the choice of ${B}$, and ${S}$ defines a ${L^\infty(Y)}$-linear operator (embedding ${L^\infty(Y)}$ into both ${L^\infty(Y \times H)}$ and ${L^\infty(X)}$ in the obvious fashion). Note from (3) that each ${\hbox{st} T^n F}$ lies in the range of ${S}$ applied to a function in the unit ball of ${L^\infty(Y \times H)}$.

Now we claim that this operator is relatively Hilbert-Schmidt over ${L^\infty(Y)}$ , in the sense that there exists a finite bound ${A}$ such that

$\displaystyle \| ( \sum_{m=1}^M \sum_{n=1}^N \langle S e_m, f_n \rangle_{L^2(X|Y)}^2 )^{1/2} \|_{L^\infty(Y)} \leq A \ \ \ \ \ (4)$

for all finite collections ${e_m \in L^\infty(Y \times H), f_n \in L^\infty(X)}$ of functions that are relatively orthonormal over ${L^\infty(Y)}$ in the sense that

$\displaystyle \langle e_m, e_{m'} \rangle_{L^2(Y\times H|Y)} = 1_{m=m'}$

and

$\displaystyle \langle f_n, f_{n'} \rangle_{L^2(X|Y)} = 1_{n=n'}$

for all ${1 \leq m,m' \leq M}$ and ${1 \leq n,n' \leq N}$ . Indeed, the left-hand side of (4) may be expanded first as

$\displaystyle \int_Y \sum_{m=1}^M \sum_{n=1}^N a_{m,n} \langle S e_m, f_n \rangle$

for some sequence ${a_{m,n}}$ in ${L^\infty(Y)}$ with ${\sum_{m,n} \| a_{m,n} \|_{L^2(Y)}^2 = 1}$ , and then as

$\displaystyle \int_{X \times H} \sum_{m=1}^M \sum_{n=1}^N a_{m,n} e_m f_n g$

where we use Loeb measure on ${X \times H}$ and ${g \in L^\infty(X \times H)}$ is the function ${g(x,h) := \hbox{st} g_h(x)}$ , and ${a_{m,n}, e_m, f_n}$ are lifted up to ${L^\infty(X \times H)}$ in the obvious fashion. By Cauchy-Schwarz and the boundedness of ${g}$ , we can bound this by

$\displaystyle A \| \|\sum_{m=1}^M \sum_{n=1}^N a_{m,n} e_m f_n \|_{L^2(X \times H|Y)} \|_{L^2(Y)}.$

But the ${e_m f_n}$ are relatively orthonormal over ${L^\infty(Y)}$ (this reflects the relative orthogonality of ${Y \times H}$ and ${X}$ over ${Y}$ ), so that

$\displaystyle \|\sum_{m=1}^M \sum_{n=1}^N a_{m,n} e_m f_n \|_{L^2(X \times H|Y)} = (\sum_{m=1}^M \sum_{n=1}^N a_{m,n}^2)^{1/2}$

and the claim follows from the hypotheses on ${a}$ .

Using the relative spectral theorem for relative Hilbert-Schmidt operators (see Corollary 17 of this blog post), we may thus find relatively orthonormal systems ${e_n, f_n}$ in ${L^\infty(Y \times H)}$ and ${L^\infty(X)}$ respectively over ${L^\infty(Y)}$ and a non-increasing sequence of non-negative coefficients ${\sigma_n \in L^\infty(Y)}$ (the relative singular values) with ${\sum_n \sigma_n^2 \leq A^2}$ almost everywhere, such that we have the spectral decomposition

$\displaystyle Sb = \sum_n \sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)} f_n$

with the sum converging in ${L^2(X|Y)}$. (If ${X, Y, Y \times H}$ were standard Borel spaces, one could deduce this theorem from the usual spectral theorem for Hilbert-Schmidt operators using disintegration. Loeb spaces are certainly not standard Borel, but as discussed in the linked blog post above, one can adapt the proof of the spectral theorem to the relative setting without using the device of disintegration.)

Since ${\sum_n \sigma_n^2 \leq A^2}$ and the ${\sigma_n}$ are non-increasing, one can find an ${N}$ such that ${\sigma_n \leq \varepsilon/2}$ almost everywhere for all ${n \geq N}$. For ${b}$ in the unit ball of ${L^\infty(Y \times H)}$, this lets one approximate ${Sb}$ by the finite rank operator ${\sum_{n \leq N} \sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)} f_n}$ to within ${\varepsilon/2}$ almost everywhere in ${L^2(X|Y)}$ norm. If one rounds ${\sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)}}$ to the nearest multiple of ${\varepsilon/2N}$ for each ${y \in Y}$, and lets ${q_1,\dots,q_m}$ be the collection of linear combinations of the form ${\sum_n c_n f_n}$ with ${c_n \in [-1,1]}$ a multiple of ${\varepsilon/2N}$, we obtain the claim. $\Box$
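To spell out the choice of ${N}$ in the last step of the above proof (a sketch, under the assumption that the singular values are indexed from ${n=1}$): since the ${\sigma_n}$ are non-increasing, one has almost everywhere

$\displaystyle N \sigma_N^2 \leq \sum_{n \leq N} \sigma_n^2 \leq A^2,$

so ${\sigma_n \leq \sigma_N \leq A/\sqrt{N}}$ for all ${n \geq N}$, and any standard ${N \geq 4A^2/\varepsilon^2}$ will do. The tail of the spectral decomposition is then controlled using relative orthonormality and the relative Bessel inequality: for ${b}$ in the unit ball of ${L^\infty(Y \times H)}$,

$\displaystyle \| \sum_{n > N} \sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)} f_n \|_{L^2(X|Y)}^2 = \sum_{n>N} \sigma_n^2 \langle b, e_n \rangle_{L^2(Y \times H|Y)}^2 \leq \frac{\varepsilon^2}{4} \|b\|_{L^2(Y \times H|Y)}^2 \leq \frac{\varepsilon^2}{4}.$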

We return to the proof of (2). Since ${\int_X f = \delta}$ and ${\|f-\hbox{st} F \|_{L^1(X)} \leq \varepsilon}$ , we have

$\displaystyle \int_Y \mathop{\bf E}( \hbox{st} F | Y ) = \int_X \hbox{st} F \geq \delta/2$

if ${\varepsilon}$ is small enough. In particular there is a ${Y}$ -measurable set ${E'}$ of measure at least ${\delta/4}$ such that ${\mathop{\bf E}( \hbox{st} F | Y ) \geq \delta/4}$ on ${E'}$ . Since

$\displaystyle \int_Y \mathop{\bf E}( | f - \hbox{st} F | | Y ) = \|f-\hbox{st} F \|_{L^1(X)} \leq \varepsilon,$

we see from Markov’s inequality (for small enough ${\varepsilon}$ ) that there is a ${Y}$ -measurable subset ${E}$ of ${E'}$ of measure at least ${\delta/8}$ such that

$\displaystyle \|f - \hbox{st} F \|_{L^1(X|Y)} = O_\delta(\varepsilon) \ \ \ \ \ (5)$

on ${E}$ , where we write

$\displaystyle \|f\|_{L^1(X|Y)} := \mathop{\bf E}( |f| | Y )$

for the relative ${L^1}$ norm. In particular we have

$\displaystyle \mathop{\bf E}(f|Y) \geq \delta/8 \ \ \ \ \ (6)$

almost everywhere on ${E}$ .
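One possible way to fill in the constants behind (5) and (6) is as follows; the specific threshold ${8\varepsilon/\delta}$ is a choice made for this illustration. By Markov’s inequality,

$\displaystyle \mu( \{ \mathop{\bf E}( |f - \hbox{st} F| | Y ) > 8\varepsilon/\delta \} ) \leq \frac{\varepsilon}{8\varepsilon/\delta} = \frac{\delta}{8},$

so removing this exceptional set from ${E'}$ leaves a ${Y}$-measurable set ${E}$ of measure at least ${\delta/4 - \delta/8 = \delta/8}$ on which ${\|f - \hbox{st} F\|_{L^1(X|Y)} \leq 8\varepsilon/\delta}$, which gives (5). On ${E}$ one then has

$\displaystyle \mathop{\bf E}(f|Y) \geq \mathop{\bf E}(\hbox{st} F|Y) - \mathop{\bf E}( |f - \hbox{st} F| | Y ) \geq \frac{\delta}{4} - \frac{8\varepsilon}{\delta} \geq \frac{\delta}{8}$

as soon as ${\varepsilon \leq \delta^2/64}$, which gives (6).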

Let ${K}$ be a sufficiently large standard natural number (depending on ${\delta,\varepsilon,k}$ and the quantity ${m}$ from Proposition 7; in fact it will essentially be a van der Waerden number of these inputs), to be chosen later. Applying the induction hypothesis (with ${k}$ replaced by ${K}$) to the ${Y}$-measurable function ${1_E}$, we have

$\displaystyle \hbox{st} \mathop{\bf E}_{x,r \in X} 1_E(x) 1_E(x+r) \dots 1_E(x+(K-1)r) > 0.$

In particular, there is a standard ${\kappa > 0}$ , such that for ${r}$ in a subset of ${X}$ of measure at least ${\kappa}$ , we have

$\displaystyle \hbox{st} \mathop{\bf E}_{x \in X} 1_E(x) 1_E(x+r) \dots 1_E(x+(K-1)r) \geq \kappa$

or equivalently that the set

$\displaystyle E_r := E \cap (E-r) \cap \dots \cap (E - (K-1)r)$

has measure at least ${\kappa}$ .

Let ${r}$ be as above, and let ${q_1,\dots,q_m}$ be the functions from Proposition 7. Then for ${j=0,\dots,K-1}$ , we can find a measurable function ${i_j: Y \rightarrow \{1,\dots,m\}}$ such that

$\displaystyle \| \hbox{st} T^{jr} F - q_{i_j} \|_{L^2(X|Y)} \leq \varepsilon$

almost everywhere on ${Y}$ , hence by (5) we have

$\displaystyle \| T^{jr} f - q_{i_j} \|_{L^1(X|Y)} \ll_\delta \varepsilon$

almost everywhere on ${E_r}$ . From this and the relative Hölder inequality, we see that

$\displaystyle \| T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f - q_{i_a} q_{i_{a+b}} \dots q_{i_{a+(k-1)b}} \|_{L^1(X|Y)} \ll_{\delta,k} \varepsilon$

a.e. on ${E_r}$ whenever ${0 \leq a, b \leq K/k}$ .

Now, for ${K}$ large enough, we see from van der Waerden’s theorem that there exist measurable ${a,b: Y \rightarrow \{1,\dots,K/k\}}$ such that

$\displaystyle i_a = i_{a+b} = \dots = i_{a+(k-1)b}$

almost everywhere in ${Y}$ , and hence in ${E_r}$ (this can be seen by partitioning ${Y}$ into finitely many pieces, with each of the ${i_a}$ constant on each of these pieces). For that choice of ${a,b}$ we have

$\displaystyle \| T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f - q_{i_a}^k \|_{L^1(X|Y)} \ll_{\delta,k} \varepsilon$

and

$\displaystyle \| T^{ar} f T^{ar} f \dots T^{ar} f - q_{i_a}^k \|_{L^1(X|Y)} \ll_{\delta,k} \varepsilon$

and thus

$\displaystyle \mathop{\bf E}( T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f | Y ) \geq \mathop{\bf E}( (T^{ar} f)^k | Y) - O_{\delta,k}(\varepsilon)$

almost everywhere on ${E_r}$ . But from (6) one has

$\displaystyle \mathop{\bf E}( T^{ar} f | Y) \geq \delta/8$

a.e. on ${E_r}$ , so from Hölder’s inequality we have (for ${\varepsilon}$ sufficiently small) that

$\displaystyle \mathop{\bf E}( T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f | Y ) \gg_{\delta,k} 1.$
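The van der Waerden step used above, which selects ${a,b}$ with ${i_a = i_{a+b} = \dots = i_{a+(k-1)b}}$, can be illustrated on a single piece of the partition by a brute-force search for a monochromatic progression in a finite colouring. The function names `find_mono_ap` and `every_colouring_has_ap` are hypothetical, indices are zero-based here, and the search makes no attempt at efficiency.

```python
from itertools import product

def find_mono_ap(colors, k):
    """Return (a, b) with b >= 1 and colors[a] == colors[a+b] == ... == colors[a+(k-1)b], or None."""
    n = len(colors)
    for a in range(n):
        for b in range(1, n):
            if a + (k - 1) * b >= n:
                break  # larger b only pushes the progression further out of range
            if all(colors[a + j * b] == colors[a] for j in range(k)):
                return a, b
    return None

def every_colouring_has_ap(n, m, k):
    """Check whether every m-colouring of {0, ..., n-1} contains a monochromatic k-term progression."""
    return all(find_mono_ap(c, k) is not None for c in product(range(m), repeat=n))

print(find_mono_ap((0, 1, 0, 1, 0), 3))   # (0, 2)
print(every_colouring_has_ap(8, 2, 3))    # False
print(every_colouring_has_ap(9, 2, 3))    # True
```

The last two lines recover the classical fact that the van der Waerden number for two colours and progressions of length three is ${9}$. In the argument above the number of colours is ${m}$ and the progression length is ${k}$, so the required ${K}$ is correspondingly larger.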

From non-negativity of ${f}$ , this implies that

$\displaystyle \sum_{1 \leq i,j \leq K/k} \mathop{\bf E}( T^{ir} fT^{(i+j)r}f \dots T^{(i+(k-1)j)r} f|Y) \gg_{\delta,k} 1$

which on integrating in ${Y}$ gives

$\displaystyle \sum_{1 \leq i,j \leq K/k} \hbox{st} {\bf E}_{x \in X} f(x+ir) f(x+(i+j)r) \dots f(x+(i+(k-1)j)r) \gg_{\delta,k} \kappa.$

Averaging in ${r}$ , we conclude that

$\displaystyle \sum_{1 \leq i,j \leq K/k} \hbox{st} {\bf E}_{x,r \in X} f(x+ir) f(x+(i+j)r) \dots f(x+(i+(k-1)j)r) \gg_{\delta,k} \kappa^2.$

Shifting ${x}$ by ${ir}$ , we conclude that

$\displaystyle \sum_{1 \leq j \leq K/k} \hbox{st} {\bf E}_{x,r \in X} f(x) f(x+jr) \dots f(x+(k-1)jr) \gg_{\delta,k,K,\kappa} 1.$

Dilating ${r}$ by ${j}$ (and noting that the map ${r \mapsto jr}$ is at most ${j}$-to-one on ${X}$), we conclude that

$\displaystyle \hbox{st} {\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) \gg_{\delta,k,K,\kappa} 1,$

and (2) follows.
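The dilation step relies on the fact that multiplication by ${j}$ on a cyclic group is at most ${j}$-to-one, so that an average of a non-negative function over multiples of ${j}$ is at most ${j}$ times its full average. A quick finite check of both facts (standard cyclic groups only; the names `max_fibre_size` and `dilated_average` and the sample data are assumptions of this sketch):

```python
from collections import Counter
from math import gcd

def max_fibre_size(N, j):
    """Largest number of r in Z/NZ sharing the same value of j*r mod N."""
    counts = Counter((j * r) % N for r in range(N))
    return max(counts.values())

def dilated_average(G, j):
    """E_{r in Z/NZ} G(j*r) for a function G given as a list of length N."""
    N = len(G)
    return sum(G[(j * r) % N] for r in range(N)) / N

# Every fibre of r -> j*r has size gcd(j, N), which is at most j.
checked = all(
    max_fibre_size(N, j) == gcd(j, N) <= j
    for N in range(1, 30)
    for j in range(1, N + 1)
)
print(checked)  # True

G = [3, 0, 1, 4, 0, 2]  # arbitrary non-negative function on Z/6Z
mean = sum(G) / len(G)
print(all(dilated_average(G, j) <= j * mean for j in range(1, 7)))  # True
```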

32 comments

21 July, 2015 at 7:24 am

Mikhail Katz

Your notation in theorem 1 for the expected value over x, r is a bit ambiguous. I assume the average is still over x rather than r.

[Clarification added – T.]

21 July, 2015 at 7:29 pm

William Gasarch

What is the easiest proof of Sz theorem known? Is it the one in this blog, or the elementary proof of DHJ, or something else? I know this depends on your definition of easy, so I admit I am asking for a discussion of the point.

22 July, 2015 at 1:34 pm

Terence Tao

My personal favorite is the proof using the hypergraph removal lemma, which nowadays has a reasonably simple (and elementary) proof, but the Furstenberg proof (which is not too far in spirit from the one given here) and the Polymath1 proof of density Hales-Jewett are certainly also quite simple and conceptual (though the Furstenberg proof would not be considered elementary, as for instance it uses measure theory and the spectral theorem). I suppose if one had to cover Szemeredi’s theorem (and ONLY this theorem) in as few lectures as possible to a class of beginning graduate students, the Polymath1 proof would be the fastest, though the other proofs are instructive in a number of other ways that would be useful beyond the narrow goal of proving Szemeredi’s theorem, and so there would be pedagogical advantages with using those proofs instead.

22 July, 2015 at 12:29 pm

Miklos Abert

I am confused. Can you clarify in what way this is significantly different from Balazs Szegedy’s approach to HOF? I am assuming you know his work.

22 July, 2015 at 12:36 pm

Terence Tao

They are similar in that they both use the Gowers norms and nonstandard analysis, but Szegedy’s approach is closer to Host-Kra’s work on characteristic factors and to my work with Green and Ziegler on the inverse conjecture for the Gowers norms, whereas the argument here is closer to the original arguments of Furstenberg (and Furstenberg-Katznelson-Ornstein) proving Szemeredi’s theorem. At a more technical level, the analysis here only uses the relatively easy fact that the $k^{th}$ Host-Kra factor is a compact extension of the $(k-1)^{st}$ factor, whereas the analysis of Host-Kra and Szegedy requires a much finer analysis of that extension (roughly, that it is an abelian extension by a cocycle obeying a certain “type equation”, which must then be solved for, or at least (in Szegedy’s approach) used to construct a topological nilspace which one then tries to classify). These are all overkill for the purposes of just proving Szemeredi’s theorem; the arguments of Szegedy or of Host-Kra (or of the more recent work of Gutman-Manners-Varju repairing and building upon Szegedy’s work) are significantly lengthier than those provided here, though they also provide much more precise information. (To give just one example, Furstenberg’s argument shows in a dynamical system that the averages $\frac{1}{N} \sum_{n=1}^N \int_X f T^n f \dots T^{(k-1)n} f\ d\mu$ are bounded below, but not that these averages converge to a limit; this latter fact was first proven in the above-mentioned work of Host and Kra using the advanced structure theory of characteristic factors that Szegedy later found nonstandard analogues of.)

23 July, 2015 at 7:08 am

Miklos Abert

Thanks, Terry! Your response clarifies a lot, but at the same time, it confuses me even more, for two reasons – I may be wrong in both of course.

1) Your theorems and notions above seem to be mostly byproducts of Szegedy’s already published work. You yourself hint at this when you call his work an overkill for the topic of this post. Simplifying and modifying others’ proofs can of course involve high level mathematics and can be very productive. Clearly, the community will benefit from your post. What confuses me here is that you fail to mention Szegedy’s name in it.

2) You write about the very recent and unpublished work of Gutman-Manners-Varju as something repairing Szegedy’s work. That is a strong word as it suggests that there are significant errors or gaps in Szegedy’s work. Are you aware of such? If yes, can you please clarify what they are? My understanding is that while the entry cost to Szegedy’s work on HOF is quite high, no one has found a concrete significant mistake or gap in it yet. Is that wrong?

Regards,

Miklos

23 July, 2015 at 12:18 pm

Terence Tao

As stated in my post, the proof here is essentially the nonstandard analysis translation of my proof of Szemeredi’s theorem from 2004. At the time, I had not learned nonstandard analysis, which I did at around 2007; I only got around to writing down a nonstandard translation of the argument because of some recent conversations with colleagues about the possibility of doing so. (This is not the first time that nonstandard translations of combinatorial arguments have been presented in this area; for instance, in 2007, Towsner gave a nonstandard translation of an ergodic convergence result of myself from earlier that year. I think the use of nonstandard methods in ergodic theory can be traced back to the 1982 paper of Kamae giving a nonstandard proof of the ergodic theorem via the Loeb measure construction.)

Of course, from about 2009 onwards, Szegedy also started to use nonstandard methods to attack related questions, such as the inverse conjecture for the Gowers norms. However, the methods described in this blog post go back much further than that, and can be largely traced back to the 1977 paper of Furstenberg (though with some simplifications using the Gowers-Host-Kra uniformity seminorms introduced by Gowers in 2001 and Host-Kra in 2005, as well as the uniform almost periodicity norms introduced in my 2004 paper).

Regarding the issue (involving a subtle distinction between the measurable and continuous nilspace categories) with the 2010 arXiv version of the paper with Camarena, this was confirmed to me as a problem by Balazs back in Nov 2010, and it should be fixable, but I don’t know if an updated version of this paper is available (perhaps Balazs himself could comment on the latest status).

24 July, 2015 at 11:02 am

Balazs Szegedy

Let me reply to the correctness issue first.

In 2010 November, after I announced my version of the inverse theorem for the Gowers norms at the Szemeredi 70 conference, you and I had an e-mail conversation on my paper with Camarena on nilspaces – this is the one that deals with the algebraic part of my theory. I indeed wrote you that I plan to add an extra chapter further clarifying the continuity issue related to cocycles, mainly because I wasn’t quite happy with the presentation myself. However, I never hinted at an error in the paper, because I was not aware of one. That is still the case. One can easily check on the later arxiv version (2012) that no major changes were done, just a more detailed writeup.

It is true that even the 2012 version is a very dense and abstract paper and would deserve an even more detailed exposition, but to my knowledge, no substantial mistake has been found yet. Let me spell it out: I am not aware of such mistakes and no one has contacted me about such. As with many long papers, there are a number of small mistakes and typos and weaknesses in the explanation.

Back to the issue of ultrafilters.

Of course Furstenberg was the sole major source of this line of work. However, in my opinion it was a rather non-trivial step to identify the right framework that allowed for a clean way of interpreting the classical notions of Furstenberg’s theory in higher order Fourier analysis. This framework is the ultra-product characteristic factor language that I introduced based on another non-trivial work by myself and Elek on hypergraph-regularity.

The identification of the unique characteristic factors on ultraproduct groups opened up a perspective that I could use e.g. to prove inverse theorems for the Gowers norms on arbitrary compact abelian groups. The existence of these unique sigma-algebras appears first in my H.O.F paper from 2009.

Following your logic, the hypergraph removal lemma that I proved with Elek and which follows easily from the Lebesgue density theorem in the nonstandard setting is just a translation. The problem with this point of view is that often the real mathematical work is done exactly in these translations and the nonstandard setting requires very new ways to prove the same kind of theorem. The same phenomenon happens in HOF.

24 July, 2015 at 12:15 pm

Terence Tao

To remind you of the issue with the paper which I previously sent to you (in my email of Nov 8, 2010), using the section numbering of the 2012 version of your paper with Camarena:

In the first paragraph of Section 3.10 on Page 48, you write that “by abusing the notation” you redefine the space $Trans(N)$ to denote the space of continuous translations; previously, they were simply the space of translations without continuity hypotheses. Then, at the end of the proof of Lemma 3.23, you assert that “Lemma 3.19 and Lemma 2.1 finish the proof”. However, Lemma 2.1 is stated and proved for the category of nilspaces, not of continuous nilspaces. As such, it is not clear why the object $\beta$ produced by Lemma 3.23 is continuous, as it ought to be to be consistent with the “abuse of notation”. Indeed, given that the section $S$ is merely Borel measurable rather than continuous, it is likely that the $\beta$ produced by this argument is also only measurable instead of continuous. It does appear that you have added a result (Theorem 2) in the 2012 version that may help address this issue, but at a bare minimum one needs a version of Lemma 2.1 in the measurable category.

Regarding the ultraproduct setting, it is true that you were one of the first authors to clean up combinatorial arguments of Szemeredi type by working in the nonstandard setting, starting with your 2007 paper with Elek and then your later 2009 papers on higher order Fourier analysis. But there was parallel work by other authors; see for instance this 2007 paper by Towsner translating a combinatorial argument of my own into the nonstandard setting. In 2009, Ben Green, Tamar Ziegler and I gave a combinatorial proof of the U^4 inverse theorem, and announced that the inverse theorem for higher U^k norms (which appeared in 2010) would use nonstandard analysis (see page 2 of the U^4 paper).

Without nonstandard analysis, the arguments are significantly less clean, but still doable. For instance the combinatorial analogue of characteristic factors was first introduced in my 2004 paper with Ben Green (see Footnote 8), then developed further in another 2004 paper of mine (see Remark 3.7). These combinatorial characteristic factors are not mere “analogues” of their nonstandard counterparts; they are logically equivalent through the transfer principle (roughly speaking, the nonstandard characteristic factor is generated by ultralimits of functions in the combinatorial characteristic factor). Similarly for the combinatorial and nonstandard versions of relative almost periodicity, the former of which appears in my 2004 paper, and the latter of which appears here and (essentially) in your HOF paper (although your formulation does not explicitly use the UAP norms that were one of the innovations of my 2004 paper, and which I continue to use in this blog post).

Your HOF paper did show that much of the Host-Kra machinery (e.g. the Furstenberg-Weiss reduction to abelian extensions and the Host-Kra type equation for cocycles, and the appearance of Host-Kra strong parallelepiped structures (which you call nilspaces)) could be extended to the nonstandard setting, and thus in principle to the combinatorial setting, and this had not been previously done in the literature. However, this machinery is not needed for the narrow goal of proving Szemeredi’s theorem, which is the focus of this blog post (and of the 2004 paper that it is based on).

23 July, 2015 at 9:53 am

Balazs Szegedy

Hi Terry, following earlier comments I think I have a clarifying question here about this post.

The main idea in my approach to higher order Fourier analysis was to carry out the following program: 1) Take the ultraproduct of finite abelian groups. 2) Identify the characteristic factors for the Gowers (and Host-Kra) seminorms. 3) Give a number of equivalent but increasingly stronger descriptions for these characteristic factors, leading to a structure theorem.

The cocycle theorem that you mention in your reply to Abert is one of these equivalent characterisations (of intermediate difficulty); however, I have a number of easier ones expressing relative compactness. For example, I prove that if you take the sigma algebra F_k and the Hilbert space of L^2 functions in it, then it is linearly generated by finite rank shift invariant modules over the algebra of L^\infty functions in F_{k-1}. The proof of this is tricky but not hard. My main question is: is this equivalent to your “relative almost periodicity”, or maybe I misunderstand something?

Balazs

23 July, 2015 at 12:06 pm

Terence Tao

Yes, relative almost periodicity (in the sense of being expressible in terms of relatively Hilbert-Schmidt operators, or equivalently being relatively orthogonal to all relatively weakly mixing functions) is essentially the same concept (after applying the relative spectral theorem) as being approximable by finite rank modules over the base. This goes back (in the ergodic theory setting) to at least the original paper of Furstenberg, who called functions in the latter modules generalised eigenfunctions; see for instance Lemma 7.2 and Theorem 7.4 of Furstenberg’s 1977 paper; the concept is discussed in more detail in Furstenberg’s book, but I don’t have references to that handy at present. Using the Gowers-Host-Kra seminorms (and their duals) in place of the notion of relative weak mixing (and their duals) gives a slightly different tower of factors than Furstenberg’s factors (which is a tower of maximal compact extensions) but at least on the level of the analysis relating to relative almost periodicity, the structure is basically identical.

23 July, 2015 at 12:41 pm

Balazs Szegedy

Ok, this clarifies a lot. I think you have to agree that these classical things have to be re-proved in the ultraproduct setting since this is a new framework (and you did that here). I just wanted to say that I also did that in my sequence of papers on higher order Fourier analysis. I was less familiar with classical ergodic theory so I had less background, but regardless of this I developed it since I needed it for higher order Fourier analysis. For example, your Theorem 5 and Proposition 7 appear explicitly in my work. If you are not familiar with it then I think it may be relevant to this post and is worth sharing.

23 July, 2015 at 12:52 pm

Terence Tao

Yes, these sorts of results come up in most work on this area. For instance, Theorem 5 of this post is more or less Lemma 5.11 of my 2004 paper translated from standard analysis into nonstandard analysis (and the analogue of Lemma 4.3 of Host-Kra’s 2005 paper), and Proposition 7 is similarly a translated version of Lemma 9.3 of my 2004 paper (and, as mentioned previously, is the analogue of Lemma 7.2 and Theorem 7.4 in Furstenberg’s 1977 paper).

25 July, 2015 at 3:16 am

Balazs Szegedy

Hi Terry,

1.) Thank you for bringing up this concrete question about my nilspace paper with Camarena. The answer is quite short:

It is proved in the paper (see Theorem 2 in section 3.5) that measurable morphisms between nilspaces are continuous. Since translations are automorphisms of nilspaces it follows that measurable translations are continuous.

Such conversations can help in removing opinions that our paper with Camarena “had to be repaired” or contains fundamental flaws.

2.) You are right that many earlier finite statements are basically equivalent with statements proved in the non-standard setting later, but then one might ask “what is the advantage of doing the proofs in the non-standard setting?”.

In my opinion the answer to this question is not just getting rid of epsilon management, but also identifying precise structures in an idealistic infinite setting that appear only in an approximative way in the finite setting. The advantage of this is that it is much easier to work with precise structures than with approximative structures.

I only wanted to say that the program started in my 2009 HOF paper was to identify the approximative structures related to Gowers norms as precise (sometimes algebraic) structures on the appropriate ultraproduct space.

This, of course, does not decrease the value and originality of your post. I also have to admit that I was not familiar with your 2004 paper. I will of course cite it in my nonstandard papers to explain more of the connection between my infinite notions (such as Fourier sigma algebras) and the finite language.

Best,

Balazs

25 July, 2015 at 9:16 pm

Terence Tao

Balazs, in your proof of Theorem 2 on page 39 can you expand on why “The compact nilspace structure on $M$ guarantees that $f$ depends continuously on the system $\{ f_v \}_{0 \neq v \in \{0,1\}^{k+1}}$”? It is clear that this is true in a pointwise sense (in that for any $x \in N$ and $c \in C_x^{k+1}(N)$ , that $f(x)(c)$ depends continuously on the $f_v(x)(c)$ ); however, the continuity required here is not pointwise but rather in the rather complicated topology of ${\mathcal L}(C^{k+1}(N),M)$ , and to verify continuity in this setting appears to require an additional argument, given that nonlinear continuous functions are usually discontinuous in weak topologies. A model question appears to be to establish that if $\{ \mu_y \}_{y \in Y}$ is a CSM on a factor map $\pi: X \to Y$ and $A$ is a compact abelian group, then the group law on ${\mathcal L}(X,A)$ is continuous. Given your analogy between the topology on ${\mathcal L}(X,A)$ and the strong L^1 topology, this is plausible, but of course an analogy is not sufficient by itself to constitute a rigorous proof.

(Roughly speaking, it seems one needs a relative version of the observation that weak convergence in L^2 plus convergence of norm implies strong convergence in L^2, which you mentioned on page 32 to justify your analogy between the ${\mathcal L}(S,T)$ topology and the strong L^1 topology. If one only has the weak convergence that one starts with in the definition of the ${\mathcal L}(S,T)$ topology, then it is not obvious that even such a “manifestly continuous” operation as squaring $f \mapsto f^2$ is actually continuous even when the domain and range of $f$ are compact; note for instance on $[0,1]$ that the functions $\sin(nx)$ converge weakly to 0, but $\sin(nx)^2$ does not converge weakly to 0.)
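To make the last example concrete, here is a short worked computation (an illustrative sketch; the test function $G$ is notation introduced here and is not taken from the paper). For any continuous $G: [0,1] \rightarrow {\bf R}$ , the identity $\sin(nx)^2 = \frac{1}{2}(1 - \cos(2nx))$ and the Riemann-Lebesgue lemma give

$\displaystyle \int_0^1 \sin(nx)^2 G(x)\ dx = \frac{1}{2} \int_0^1 G(x)\ dx - \frac{1}{2} \int_0^1 \cos(2nx) G(x)\ dx \rightarrow \frac{1}{2} \int_0^1 G(x)\ dx$

as $n \rightarrow \infty$ , so $\sin(nx)^2$ converges weakly to the constant $\frac{1}{2}$ rather than to the square of the weak limit of $\sin(nx)$ , which is $0$ .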

26 July, 2015 at 6:16 am

Balazs Szegedy

Hi Terry,

If I understand your question well, the answer relies on a fact about the $\mathcal{L}$ construction. This is that if you have a continuous function of the form $G:A\rightarrow B$ then the function $G':\mathcal{L}(X,A)\rightarrow\mathcal{L}(X,B)$ defined by composition with $G$ is continuous. This follows directly from the definition. (Take the basis of continuous functions defining the topology on $\mathcal{L}(X,B)$ and compose them with $G'$ . These compositions are by definition continuous on $\mathcal{L}(X,A)$ .)

To spell out why this is enough for the proof I break it down into 4 parts (It seems that your problem was in part 3):

1.) The functions $f_v:N\rightarrow\mathcal{L}(C^{k+1}(N),M)$ are continuous if $0\neq v\in\{0,1\}^{k+1}$ . Based on your question I think that you accepted this part.

2.) Let $f':N\rightarrow\mathcal{L}(C^{k+1}(N),M^{\{0,1\}^{k+1}\setminus\{0\}})$ be the function with the property that for every $x$ we have that $f'(x)$ is the function on $C_x^{k+1}(N)$ whose coordinate functions are $\{f_v(x)\}_{v\neq 0}$ .
It is clear from the previous statement that $f'$ is continuous.

3.) Let $S$ denote the set of corners of $k+1$ dimensional cubes in $M^{\{0,1\}^{k+1}\setminus\{0\}}$ . For every $x$ the function $f'(x)$ on $C_x^{k+1}(N)$ takes its values in the closed set $S$ . Now for every $x$ we compose $f'(x)$ pointwise (!) with the continuous function $h:S\rightarrow M$ that gives the unique completion of a corner. This way we get the function $f:N\rightarrow\mathcal{L}(C^{k+1}(N),M)$ that appears in the proof and has the property that $f(x)$ is the constant $\phi(x)$ function on $C_x(N)$ .

4.) By the observation I started with we have that these constant functions depend continuously on $x$ meaning that the constants also depend continuously.

26 July, 2015 at 6:42 am

Terence Tao

Actually, in your breakdown, the problem is with Step 2. The continuity of the individual $f_v$ does not “clearly” imply the continuity of $f'$ because it is not “clear” that the topology on ${\mathcal L}( C^{k+1}(N), M^{\{0,1\}^{k+1}\backslash \{0\}})$ is the product of the topologies on the individual ${\mathcal L}(C^{k+1}(N), M )$ . (There are test functions on $M^{\{0,1\}^{k+1} \backslash \{0\}}$ that are not linear combinations of functions of a single component $M$ , but instead depend jointly on all components in a “high-rank” fashion that cannot be easily resolved into one-component functions without taking tensor products, which is an operation which is not obviously continuous with regards to this topology.)

To repeat my previous remark, to resolve this issue it appears that one needs a relative version of the classical fact that if a sequence is weakly convergent in L^2 and its norms converge to the norm of the limit, then it is strongly convergent in L^2. But in your setting the measures are varying, and the simple expedient in the classical setting of subtracting a function from its limit is not easily available here.
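For reference, the classical fact in question follows from expanding the square (a standard computation, recorded here only as a sketch in a single fixed Hilbert space; the names $f_n, f$ are illustrative). If $f_n$ converges weakly to $f$ in $L^2$ and $\|f_n\|_{L^2} \rightarrow \|f\|_{L^2}$ , then

$\displaystyle \|f_n - f\|_{L^2}^2 = \|f_n\|_{L^2}^2 - 2 \hbox{Re} \langle f_n, f \rangle + \|f\|_{L^2}^2 \rightarrow \|f\|_{L^2}^2 - 2 \|f\|_{L^2}^2 + \|f\|_{L^2}^2 = 0.$

The subtraction $f_n - f$ is exactly the step that has no obvious counterpart when $f_n$ and $f$ live on different fibres with different measures.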

26 July, 2015 at 1:51 pm

Balazs Szegedy

Hi Terry, I think the following calculation resolves this issue:

Proposition Let $X,Y,Z$ be compact spaces. Let $\pi:X\rightarrow Y$ be a CSM. Assume that $\{f_i\in\mathcal{L}(X,Z)\}_{i=1}^\infty$ converges to $f$ and $\{g_i\in\mathcal{L}(X,Z)\}_{i=1}^\infty$ converges to $g$ such that $f_i$ and $g_i$ are defined on $\pi^{-1}(y_i)$ and $f,g$ are defined on $\pi^{-1}(y)$ . Then the functions $t\mapsto (f_i(t),g_i(t))$ converge to $t\mapsto (f(t),g(t))$ .

Proof: We have to show that $\lim_{i\to\infty}\int_{\pi^{-1}(y_i)}F(f_i,g_i)G=\int_{\pi^{-1}(y)}F(f,g)G$ holds for any pair of continuous functions $F:Z\times Z\to\mathbb{C}$ and $G:X\rightarrow\mathbb{C}$ .

First we prove it in the special case when $F(a,b)=F_1(a)F_2(b)$ for some continuous functions $F_1,F_2:Z\rightarrow\mathbb{C}$ .
For any fixed $\epsilon>0$ we can construct a $\mathbb{C}$ -valued continuous function $q$ on $X$ with $\|q\|_\infty\leq\|F_2\|_\infty$ and such that the restriction $q'$ of $q$ to $\pi^{-1}(y)$ has the property that $\|F_2(g)-q'\|_2\leq\epsilon$ .
Let $q_i$ denote the restriction of $q$ to $\pi^{-1}(y_i)$ . We have by the convergence of $\{g_i\}_{i=1}^\infty$ that $\|F_2(g_i) - q_i\|_2^2=\int_{\pi^{-1}(y_i)}|F_2\circ g_i - q_i|^2$ converges to $\|F_2(g) -q' \|_2^2$ .
Now writing $F(f_i,g_i)$ as $F_1(f_i)(q_i+(F_2(g_i)-q_i))$ we have that $F(f_i,g_i)=F_1(f_i)q_i+H_i$ and $F(f,g)=F_1(f)q'+H$ where, if $i$ is big enough, the $L^2$ norm of each $H_i$ and of $H$ is at most $c(\epsilon)=\|F_1\|_\infty(2\|F_2\|_\infty\epsilon)^{1/2}$ . We have by convergence of $\{f_i\}$ that

$\displaystyle \lim_{i\to\infty} \int_{\pi^{-1}(y_i)}F_1(f_i)q_iG=\int_{\pi^{-1}(y)}F_1(f)q'G.$

It follows from the previous estimates that

$\displaystyle \limsup_{i\to\infty} |\int_{\pi^{-1}(y_i)} F(f_i,g_i)G-\int_{\pi^{-1}(y)} F(f,g)G|\leq 2c(\epsilon)\|G\|_\infty.$

Letting $\epsilon$ tend to $0$ we obtain the statement in this special case.

Using Stone-Weierstrass we can uniformly approximate every continuous function on $Z\times Z$ by linear combinations of functions of the form $F(a,b)=F_1(a)F_2(b)$ . This completes the proof.

27 July, 2015 at 7:44 am

Terence Tao

Balazs, thanks for this argument. It seems that this, together with the expanded argument provided previously for the proof of Theorem 2, as well as suitable clarification of the proof of Lemma 3.23 (in particular explaining why the translation $\beta$ is first Borel measurable, and then continuous) does indeed finally repair the issue I raised with you in my Nov 8 2010 email. (Incidentally, there is a further confusing typo in the proof of Lemma 3.23: it appears that you reference Lemma 2.1 when you should instead be referencing the rather different Proposition 2.1. This issue may also occur elsewhere in the paper, it is worth checking.)

I have some other questions about your 2010 paper with Camarena, relating more to the relationship between your paper and various concepts and results introduced in previous literature, including the 2005 paper [HK2005] of Host and Kra (reference [10] in your paper), the 2006 paper [HK2006] of Host and Kra (reference [11] in your paper), and a 2009 paper [HKM2009] of Host, Kra, and Maass (which you might not be aware of, as it is not referenced in your paper; see also the precursor [HM2006] to [HKM2009]).

You state on page 3 that your axiom system for nilspaces is a “variant” of the axiom system for strong parallelepiped systems from [HK2006]. But it seems to me that the connection is far tighter than that, namely that the axiom systems are in fact logically equivalent: every k-step nilspace is a k-step strong parallelepiped system and vice versa. (Most of [HK2006] is devoted to the cases k=1,2, but the higher k case is discussed in Section 7, and it is clear how to extend the axiom set for 1 and 2-step strong parallelepiped systems to higher k.) [HK2006] has a weaker composition axiom and compensates for this with a gluing axiom, but it seems straightforward to derive one set of axioms from the other (e.g. you derive the gluing axiom in Lemma 2.2 of your paper). There are also other parallels between [HK2006] and your 2010 paper that are worth mentioning, for instance the equivalence relation $\sim_k$ in your Definition 2.3 appears to be the same concept (for k=2 at least) as in Proposition 3 of [HK2006]; the abelian group in your Corollary 2.4 and Theorem 1 appears to be the same as the fibre group constructed in Section 5.2 of [HK2006]; the equivalence between liftability of a translation and a splitting condition (or equivalently, that a certain cocycle is a coboundary) appears both as Lemma 2.20 in your paper and Theorem 2 of [HK2006]; and the importance of the transitivity of the translation group (which you established in the connected setting as Corollary 3.3) is the main result of [HK2006] (see Theorem 1). You did note in page 24 that the translation groups were previously introduced and shown to be nilpotent in [HK2005,HK2006], but I think the connection between these works goes beyond that one result, and should be more strongly emphasised. 
Anyway, my question is if you could confirm that the axiom systems for nilspaces and k-step strong parallelepiped structures are actually logically equivalent, and not just variants of each other, and if you know of any further connections between the individual lemmas and propositions in [HK2006] and those in your paper.
The paper [HK2005] is set in the ergodic-theory setting rather than the compact nilspace setting, and so is ostensibly discussing different mathematical objects, yet there is a remarkably strong parallel between the structure of your 2010 paper and that of [HK2005]. We have already discussed the nilpotent translation groups in both papers, but for instance there is also the reduction to abelian extensions which occurs in Lemma 6.2 of [HK2005] and Corollary 2.4 and Theorem 1 of your paper; the cocycles introduced in your Definition 2.14 appear to play a closely analogous role to the type k cocycles introduced in Definition 7.1 of [HK2005] (compare for instance Proposition 7.6 of [HK2005] with the discussion at the end of your Section 2.10); the lifting of translations criterion in your Proposition 2.1 is quite analogous to Lemma 10.6 of [HK2005]; the liftability of short translations appears as Lemma 3.23 in your paper and as Proposition 10.10 in [HK2005]; and the reduction to the finite rank case occurs in Section 3.8 of your paper and Section 10.1 of [HK2005]. You did acknowledge using an argument from Appendix A of [HK2005] to obtain the Lie nature of the translation group in your Theorem 6, but the connections with [HK2005] seem to go far beyond that one fact, and should probably be emphasised more in your paper. Anyway, my question is whether you know of any deeper explanation for the striking parallels between the ergodic theory context and the compact nilspace context (perhaps this issue is addressed in some of your other work?). See also the work in [HKM2009] which ties together the topological dynamics context with both the compact nilspace context and the ergodic theory context.
The paper [HKM2009] associates a compact k-step strong parallelepiped structure (or in your language, a compact nilspace) to each topological dynamical system, and introduces a number of concepts which may have parallels in your 2010 paper, for instance the regionally proximal relation discussed in [HKM2009] looks similar to the equivalence relations in your 2010 paper. There is also extensive discussion on the relation between the topological and measure theoretic categories in that paper which may relate to some of your other papers. Anyway, I was wondering if you could comment on any connections between the concepts studied in [HKM2009] and your paper; it would in particular be interesting to have an interpretation of the regionally proximal relation in the nilspace language.

27 July, 2015 at 11:14 am

Terence Tao

p.s. there may also be a link between the rigidity result in your Theorem 2 (that measurable morphisms are continuous) and the property testing literature, for instance this 2004 paper of Alon, Kaufman, Krivelevich, Litsyn, and Ron, in which they show that the property of being a polynomial of given degree over ${\mathbb F}_2$ is locally testable with one-sided error (the linear case being a famous paper of Blum, Luby, and Rubinfeld). In particular, the arguments in Section 4 of that paper remind me of the combinatorial manipulations on the three-cube in your paper, and the key property of a polynomial used is that its value at one corner of a parallelepiped is determined by its value at all other corners. (The technical details are of course a little different, for instance the alternating signs in the sums are not explicit in the AKKLR paper because they work in characteristic two. Still there seems to be some connection; it may be for instance that taking an ultraproduct of a sequence of instances of functions that locally test to be polynomial in the AKKLR sense leads to a measurable morphism to which your Theorem 2 may be applied. This would be in analogy with how property testing for graph properties interacts well with graph limits, as noted for instance in Lovasz’s book.) This is of course also closely related to the topic of the inverse conjecture for Gowers norms over finite fields.
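To spell out the parallelepiped property in the simplest setting (a sketch; the notation $P, d, h_i, \omega$ is introduced here for illustration and is not taken from the AKKLR paper): if $P: {\mathbb F}_2^n \rightarrow {\mathbb F}_2$ is a polynomial of degree at most $d$ , then every $(d+1)$ -fold discrete derivative of $P$ vanishes, which in characteristic two reads

$\displaystyle \sum_{\omega \in \{0,1\}^{d+1}} P( x + \omega_1 h_1 + \dots + \omega_{d+1} h_{d+1} ) = 0$

for all $x, h_1, \dots, h_{d+1} \in {\mathbb F}_2^n$ . Solving for any one term expresses the value of $P$ at that corner as the sum of its values at the other $2^{d+1}-1$ corners.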

26 July, 2015 at 5:03 am

Ben Green

Balazs, I think this conversation highlights the frustration felt by those of us who spent time trying to read your papers. The argument is incomplete from the standpoint of a human reader of the papers, by which I mean that there is too much to be filled in between the lines. At least, I found this to be the case. That’s not to suggest that it’s not a subset of a complete argument, quite possibly an everywhere dense one. Indeed the results (some known by other means) as well as the methods are quite natural.

The problem is compounded by the fact that the notation and the number of definitions and names make the paper extremely heavy reading, and by the fact that from 2009-2012 there were a large number of rather different new versions of the work posted on the arxiv. It looks as if things have stabilised with the 2012 version, which addresses the second issue. As regards the first, I think the community would universally appreciate a greatly expanded version of the complete argument, quite possibly illustrated in the 2-step case throughout. As I understand it, that case contains essentially all of the ideas whilst being much lighter on general notation.

Best wishes, Ben

27 July, 2015 at 8:18 am

Balazs Szegedy

Hi Ben,

Thank you for your comment.

First, the part that I mostly agree with. I take your phrase “everywhere dense argument” as a positive critique. I think what you mean is that the results are correctly broken down into lemmas, but the expositions of the proofs of the individual lemmas are of varying quality. I was definitely more busy with (and interested in) breaking through the fundamental difficulties and finding the right inherent language than perfecting the exposition. I am taking responsibility for that.

A particular weakness in the exposition of the nilspace paper is that it builds on the subject of “continuous systems of measures” (CSMs). Quite surprisingly, this fundamental subject was not developed enough for our purposes. So with Omar we had to do a lot of ground work in it. Terry’s question was also basically about the behavior of CSMs. I think that the best idea is to publish a separate paper that cleans up the CSM subject, thus making the nilspace paper more understandable by freeing it from this burden. We actually plan to write such a paper with Pablo Candela.

As for the history of my papers on the arXiv, I partially disagree with you. When I wrote my first paper in 2009 I observed that many nonstandard arguments that we developed for hypergraphs with Elek in 2007 can also be applied to the Gowers norms, and I wanted to explore how far one can go with these ideas. So I wrote a sequence of 3 papers, mainly built on my paper with Elek and of course on the fundamental work of Host and Kra. Only at the end of the third paper did it become clear to me that this leads to a theory of additive structures in compact abelian groups. My main goal was not to prove the inverse theorem. I look at it as a byproduct of a more general clean structure theory of characteristic factors on ultraproduct groups. I think that already my work in 2010 clearly demonstrates that I had all the fundamental ideas for such a structure theory. It was not a 2 page sketch, the math is there. You are right that it stabilized in a new version in 2012 when I changed some arguments to new ones that I felt were more aesthetic.

I will try my best to come up with a version that is more pleasant to read, and I am glad to take any advice you give on how to do it.

Best,

Balazs

27 July, 2015 at 8:58 am

Ben Green

Balazs – yes, the expositions are of varying quality. I look forward to a fully fleshed out version of the arguments in due course. Best, Ben.

30 July, 2015 at 3:08 pm

A Phrase Is Born

Ben, that phrase “everywhere dense subset of a complete argument” is a gem that deserves wider circulation. Did you invent it?

27 July, 2015 at 4:07 am

Victor

@Balazs

I am not familiar with this area but I was curious. Since it’s much easier to work with precise structures in the infinite setting, are there any new results in the finite setting that one can obtain using these precise structures that were not known before using the approximate structures?

27 July, 2015 at 5:13 am

Balazs Szegedy

Hi Victor, there is a standard machinery, including Furstenberg’s correspondence principle and the transference principle for ultraproducts, that allows you to translate back anything that you do in the idealistic infinite universe to the finite setting. This is, for example, how Furstenberg proved Szemeredi’s theorem (which can be regarded as a finite statement) using infinite arguments. There are many finite statements that have much easier proofs in the infinite setting, and of course statements that had their first proofs in the infinite setting.

27 July, 2015 at 8:23 am

Victor

@all

Again, to satisfy my curiosity I would like to see mentioned a couple of examples of statements that had their first proofs in the infinite setting.

29 July, 2015 at 5:15 am

Anonymous

Density Hales-Jewett is a not-completely-bad example.

29 July, 2015 at 11:30 am

Victor

That’s a good example, thanks! I don’t understand why people took my question so negatively…

28 July, 2015 at 12:56 am

Balazs Szegedy

It is very much true that my whole work (in every aspect) can be regarded as a continuation of the work of Host and Kra, starting with my 2009 paper, which is built almost exclusively on their famous Annals paper on characteristic factors and on my paper with Elek. I regard the nilspace paper as just a continuation of the Host-Kra paper on parallelepiped structures, and we say so very clearly in the abstract, but you are right that we should do a better job with citations throughout the paper. In a new version I will fix this. The main difficulty that we treat with Camarena is the topologization of the language of Host and Kra. The real novelty in the paper is in the methods that we developed for this topologization. (In my application in higher order Fourier analysis I needed a topological version.)

Personally I am a big fan of the work of Host and Kra as they had the right vision from very early on and I am glad that I could contribute to it.

30 July, 2015 at 11:33 am

Yonatan

Dear Terry,

I would like to make some brief comments regarding the questions you posed above:

Q2 One possible explanation for the “striking parallels between the ergodic theory context and the compact nilspace context” is contained in some recent unpublished work of mine building heavily on the work of Host-Kra and Camarena-Szegedy. Essentially, one can show directly (i.e., without using the structure theory of [HK2005]) that the Host-Kra characteristic factors have (compact) topological (dynamical) models which carry compatible (with the dynamics) compact nilspace structure. This allows the structure theorem of [HK2005] to be *deduced* from a dynamical variant of the Camarena-Szegedy structure theorem for nilspaces due to Freddie Manners, Péter Varjú and myself.

Q3 Let $G$ be an arbitrary topological group and let $(G,X)$ be a topological dynamical system. Define a “dynamical” cubespace by ( $n\in \mathbb{N}$ )
$C_{G}^{n}(X)=\overline{\{HK^{n}(G)(x,x,\ldots,x)|\,x\in X\}}\subset X^{[n]}$ ( $HK^{n}(G)$ is the Host-Kra cubegroup). For $G$ abelian this is $C_{G}^{n}(X)=\overline{\{\big((\sum_{j=1}^{n}\epsilon_{j}g_{j})x\big)_{\vec{\epsilon}\in\{0,1\}^{n}}|\,x\in X,\,\vec{g}\in G^{n}\}}$ . Following Camarena-Szegedy define the relation $x\sim_k y$ if $(x,x,\ldots,y)\in C_{G}^{k+1}(X)$

If $G$ is abelian and $(G,X)$ is minimal then it follows from the works of Host-Kra-Maass and Shao-Ye that $\sim_k$ is an equivalence relation.
By a result of Freddie Manners, Péter Varjú and myself, if $G$ is arbitrary and $(G,X)$ is minimal distal then $C_{G}$ is a prenilspace (cubespace with $k$ -completion for all $k$ ) and therefore $\sim_k$ is an equivalence relation.

Using the dynamical variant of the Camarena-Szegedy structure theorem alluded to above, one can conclude that in all of these cases $(G,X/\sim_k)$ is the maximal pronilfactor of $(G,X)$ of degree $k$ (for $G=\mathbb{Z}$ it was already known from Host-Kra-Maass).

Eli Glasner and myself constructed a $\mathbb{Z}$ minimal example (a proximal extension of a rotation) for which $C_{G}$ is not a prenilspace.

Best wishes,

Yonatan

6 September, 2017 at 6:28 am

espressonator

I wish I could work with you, Professor Tao, on some nonstandard mathematics aimed at solving some (hard) open problems. You see things so quickly; it’s nice to read your blog posts and watch your YouTube presentations.


32 comments
