Szemerédi’s theorem asserts that any subset of the integers of positive upper density contains arbitrarily large arithmetic progressions. Here is an equivalent quantitative form of this theorem:

Theorem 1 (Szemerédi’s theorem) Let {N} be a positive integer, and let {f: {\bf Z}/N{\bf Z} \rightarrow [0,1]} be a function with {{\bf E}_{x \in {\bf Z}/N{\bf Z}} f(x) \geq \delta} for some {\delta>0}, where we use the averaging notation {{\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)}, {{\bf E}_{x,r \in A} f(x) := \frac{1}{|A|^2} \sum_{x, r \in A} f(x)}, etc.. Then for {k \geq 3} we have

\displaystyle  {\bf E}_{x,r \in {\bf Z}/N{\bf Z}} f(x) f(x+r) \dots f(x+(k-1)r) \geq c(k,\delta)

for some {c(k,\delta)>0} depending only on {k,\delta}.

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases {k=1,2} as they are trivial and somewhat degenerate.

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (in particular, the notion of a standard part of a bounded real number, and the Loeb measure construction), see for instance this previous blog post for a discussion.

By routine “compactness and contradiction” arguments (as discussed in this previous post), Theorem 1 can be deduced from the following nonstandard variant:

Theorem 2 Let {N} be a nonstandard positive integer, let {X} be the nonstandard cyclic group {{}^*{\bf Z}/N^*{\bf Z}}, and let {f: X \rightarrow {}^*[0,1]} be an internal function with {\hbox{st} \mathop{\bf E}_{x \in X} f(x) > 0}. Then for any standard {k \geq 3},

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) > 0.

Here of course the averaging notation is interpreted internally.

Indeed, if Theorem 1 failed, one could create a sequence of functions {f_n: {\bf Z}/N_n{\bf Z} \rightarrow [0,1]} of density at least {\delta>0} for some fixed {\delta}, and a fixed {k} such that

\displaystyle  {\bf E}_{x,r \in {\bf Z}/N_n{\bf Z}} f_n(x) f_n(x+r) \dots f_n(x+(k-1)r) \rightarrow 0;

taking ultralimits one can then soon obtain a counterexample to Theorem 2.

It remains to prove Theorem 2. Henceforth {N} is a fixed nonstandard positive integer, and {X := {}^* {\bf Z}/N{}^* {\bf Z}}. By the Loeb measure construction (discussed in this previous blog post), one can give {X} the structure of a probability space {(X, {\mathcal L}_X, \mu)} (the Loeb space of {X}), such that every internal subset {A} of {X} is (Loeb) measurable with

\displaystyle  \mu(A) = \hbox{st} \frac{|A|}{|X|},

which implies that any bounded internal function {f: X \rightarrow {}^*{\bf R}} has standard part {\hbox{st} f} which is (Loeb) measurable with

\displaystyle  \int_X f\ d\mu = \hbox{st} {\bf E}_{x \in X} f(x)\ dx.

Conversely, a countable saturation argument shows that any function in {L^\infty(X)} is equal almost everywhere to the standard part of a bounded internal function.

From Hölder’s inequality we see that the {k}-linear form

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} f_0(x) f_1(x+r) \dots f_{k-1}(x+(k-1)r)

vanishes if one of the {f_i} has standard part vanishing almost everywhere. As such, we can (by abuse of notation) extend this {k}-linear form to functions {f_0,\dots,f_{k-1}} that are elements of {L^\infty(X)}, rather than bounded internal functions. With this convention, we see that Theorem 2 is equivalent to the following assertion.

Theorem 3 For any non-negative {f \in L^\infty(X,\mu)} with {\int_X f\ d\mu > 0}, one has for any standard {k \geq 3},

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) > 0.

The next step is to introduce the Gowers-Host-Kra uniformity seminorms {\|f\|_{U^d(X)}}, defined for {f \in L^\infty(X)} by the formula

\displaystyle  \|f \|_{U^d(X)}^{2^d} := \hbox{st} {\bf E}_{x,h_1,\dots,h_d \in X} \prod_{\omega \in \{0,1\}^d} F(x+\omega_1 h_1 + \dots + \omega_d h_d)

where {F} is any bounded internal function whose standard part equals {f} almost everywhere. From Hölder’s inequality one can see that the exact choice of {F} does not matter, so that this seminorm is well-defined. (It is indeed a seminorm, but we will not need this fact here.)

We have the following application of the van der Corput inequality:

Theorem 4 (Generalised von Neumann theorem) Let {k \geq 3} be standard. For any {f_0,\dots,f_{k-1} \in L^\infty(X,\mu)} with {\|f_i\|_{U^{k-1}(X)}=0} for some {i=0,\dots,k-1}, one has

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} f_0(x) f_1(x+r) \dots f_k(x+(k-1)r) = 0.

This estimate is proven in numerous places in the literature (e.g. Lemma 11.4 of my book with Van Vu, or Exercise 23 of this blog post) and will not be repeated here. In particular, from multilinearity we see that

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) = \hbox{st} \mathop{\bf E}_{x,r \in X} g(x) g(x+r) \dots g(x+(k-1)r) \ \ \ \ \ (1)

whenever {f,g \in L^\infty(X)} with {\|f-g\|_{U^{k-1}(X)} = 0}.

Dual to the Gowers norms {U^{k-1}(X)} are the uniformly almost periodic norms {UAP^{k-2}(X)}. Let us first define the internal version of these norms. We define {{}^* UAP^0(X)} to be the space of constant internal functions {f: x \mapsto c}, with internal norm {\|f\|_{{}^* UAP^0(X)} = |c|}. Once {{}^* UAP^d(X)} is defined for some {d \geq 0}, we define {{}^* UAP^{d+1}(X)} to be the internal normed vector space of internal functions {f: X \rightarrow {}^* {\bf R}} for which there exists a nonstandard real number {M}, an internally finite non-empty set {H}, an internal family {h \mapsto g_h} of internal functions {g_h: X \rightarrow {}^* {\bf R}} bounded in magnitude by one for each {h \in H}, and an internal family {(n,h) \mapsto c_{n,h}} of internal functions {c_{n,h}: X \rightarrow {}^* {\bf R}} in the unit ball of {{}^* UAP^d(X)} such that one had the representation

\displaystyle  T^n f = M {\bf E}_{h \in H} c_{n,h} g_h

for all {n \in X}, where {T^n f(x) := f(x+n)} is the shift of {f} by {n}. The internal infimum of all such {M} is then the {{}^* UAP^{d+1}(X)} norm of {f}. This gives each of the {{}^* UAP^d(X)} the structure of an internal shift-invariant Banach algebra; see Section 5 of . The {{}^* UAP^d(X)} norms also controlled the supremum norm:

\displaystyle  \sup_{x \in X} |f(x)| \leq \|f\|_{{}^* UAP^d(X)}.

In particular, if we write {UAP^d(X)} for the space of standard parts of internal functions of bounded norm in {{}^* UAP^d(X)}, then {UAP^d(X)} is an (external) Banach algebra contained (as a real vector space) in {L^\infty(X)}. For {d \geq 1}, we can then define a factor {Z^{d-1}(X)} of {X} to be the probability space {Z^{d-1}(X) = (X, {\mathcal Z}^{d-1}(X), \mu)}, where {{\mathcal Z}^{d-1}(X)} is the subalgebra of {{\mathcal L}_X} consisting of those sets {E} such that {1_E} lies in the {L^2} closure of {UAP^d(X)}. This is easily seen to be a shift-invariant {\sigma}-algebra, and so {Z^{d-1}(X)} is a factor.

We have the following key characteristic factor relationship:

Theorem 5 Let {f \in L^\infty(X)} with {{\bf E}(f|Z^{k-2}(X))=0}. Then {\|f\|_{U^{k-1}(X)} = 0}.

In fact the converse implication is true also (making {Z^{k-2}(X)} the universal characteristic factor for the {U^{k-1}(X)} seminorm), but we will not need this direction of the implication.

Proof: Suppose for contradiction that {\|f\|_{U^{k-1}(X)} > 0}; we can normalise {\|f\|_{L^\infty(X)} \leq 1}. Writing {f = \hbox{st} F} for some bounded internal {F}, we then see that {f} has a non-zero inner product with {\hbox{st} {\mathcal D}_{k-2} F}, where the dual function {{\mathcal D}_d F: X \rightarrow {}^* {\bf R}} for {d \geq 0} is the bounded internal function

\displaystyle  {\mathcal D}_d F(x) := {\bf E}_{h_1,\dots,h_d \in X} \prod_{\omega \in \{0,1\}^d \backslash \{0\}^d} F(x+\omega_1 h_1 + \dots + \omega_d h_d).

From the easily verified identity

\displaystyle  T^n {\mathcal D}_d F(x) = {\bf E}_{h \in X} {\mathcal D}_{d-1}(T^n f T^h f) T^h f

and a routine induction, we see that {{\mathcal D}_d F} lies in the unit ball of {{}^* UAP^d(X)}, and so {\hbox{st} {\mathcal D}_d F} is measurable with respect to {Z^{k-2}(X)}. By hypothesis this implies that {f} is orthogonal to {\hbox{st} {\mathcal D}_d F}, a contradiction. \Box

In view of the above theorem and (1), we may replace {f} by {{\bf E}(f|Z^{k-2}(X))} without affecting the average in Theorem 3. Thus that theorem is equivalent to the following.

Theorem 6 Let {d \geq 0} and {k \geq 1} be standard. Then for any non-negative {f \in L^\infty(Z^d(X))} with {\int_X f\ d\mu > 0}, one has

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) > 0. \ \ \ \ \ (2)

We only apply this theorem in the case {k \geq 3} and {d = k-2}, but for inductive purposes it is convenient to decouple the two parameters.

We prove Theorem 6 by induction on {d} (allowing {k} to be arbitrary). When {d=0}, the claim is obvious for any {k} because all functions in {L^\infty(Z^0(X))} are essentially constant. Now suppose that {d \geq 1} and that the claim has already been proven for {d-1}.

Let {f \in L^\infty(Z^d(X))} be a nonnegative function whose mean {\delta := \int_X f\ d\mu} is positive; we may normalise {f} to take values in {[0,1]}. Let {k \geq 1} be standard, and let {\varepsilon>0} be a sufficiently small standard quantity depending on {k,\delta} to be chosen later (one could for instance take {\varepsilon := \frac{\delta^{10k}}{10 k^{10}}}, but we will not attempt to optimise in {\varepsilon}). As {f} is {Z^d(X)}-measurable, one can find an internal function {F: X \rightarrow {}^* {\bf R}} with {0 \leq F \leq 1} and bounded {{}^* UAP^d(X)} norm such that {\|f-\hbox{st} F\|_{L^1(X)} \leq \varepsilon}. (Note though that while the {{}^* UAP^d(X)} norm of {F} is bounded, this bound could be extremely large compared to {k}, {1/\delta}, {1/\varepsilon}.)

Set {Y := Z^{d-1}(X)}. We define the relative inner product {\langle f, g \rangle_{L^2(X|Y)} \in L^\infty(Y)} for {f,g \in L^\infty(X)} by the formula

\displaystyle  \langle f,g \rangle_{L^2(X|Y)} := {\bf E}(f g|Y)

and the relative norm

\displaystyle  \|f\|_{L^2(X|Y)} := \langle f,f \rangle_{L^2(X|Y)}^{1/2}.

This gives {L^\infty(X)} the structure of a (pre-)Hilbert module over {L^\infty(Y)}, as discussed in this previous blog post.

A crucial point is that the function {F} is relatively almost periodic over the previous characteristic factor {Y}, in the following sense.

Proposition 7 (Relative almost periodicity) There exists a standard natural number {m} and functions {q_1,\dots,q_m} in the unit ball of {L^\infty(X)} with the following “relative total boundedness” property: for any {n \in X}, there exists a {Y}-measurable function {i: X \rightarrow \{1,\dots,m\}} such that {\| \hbox{st} T^n F - q_i \|_{L^2(X|Y)} \leq \varepsilon} almost everywhere (where {q_i(x)} is short-hand for {\sum_{j=1}^m q_j(x) 1_{i(x)=j}}).

Proof: This will be a relative version of the standard analysis fact that integral operators on finite measure spaces with bounded kernel are in the Hilbert-Schmidt class, and thus compact.

By construction, there exists an internally finite non-empty set {H}, an internal collection {h \mapsto g_h} of internal functions {g_h: X \rightarrow {}^* {\bf R}} that are uniformly bounded in {h}, and an internal collection {(n,h) \mapsto c_{n,h}} of internal functions {c_{n,h}: X \rightarrow {}^* {\bf R}} that are uniformly bounded in {{}^* UAP^{d-1}(X)}, such that

\displaystyle  T^n F = {\bf E}_{h \in H} c_{n,h} g_h \ \ \ \ \ (3)

for all {n \in X}. Note in particular that the {\hbox{st} c_{n,h}} all lie in a bounded subset of {L^\infty(Y)}, and the {g_h} all lie in a bounded subset of {L^\infty(X)}.

We give {Y \times H} the {\sigma}-algebra generated from the standard parts of bounded internal functions {(x,h) \mapsto f(x,h)} such that the standard parts of {x \mapsto f(x,h)} all lie in a bounded subset of {L^\infty(Y)}; this gives a probability space that extends the product measure of {Y} and {H}. We define an operator {S: L^\infty(Y \times H) \rightarrow L^\infty(X)} as follows. If {b \in L^\infty(Y \times H)}, then {b} is the standard part of some bounded internal function {B: Y \times H \rightarrow {}^* {\bf R}}. We then define {Sb} by the formula

\displaystyle  Sb(x) := \hbox{st} {\bf E}_{h \in H} B(x,h) g_h(x).

This can easily be seen to not depend on the choice of {B}, and {S} defines a {L^\infty(Y)}-linear operator (embedding {L^\infty(Y)} into both {L^\infty(Y \times H)} and {L^\infty(X)} in the obvious fashion). Note that {T^n f} lies in the range of {S} applied to a function in the unit ball of {L^\infty(Y \times H)}.

Now we claim that this operator is relatively Hilbert-Schmidt over {L^\infty(Y)}, in the sense that there exists a finite bound {A} such that

\displaystyle  \| ( \sum_{m=1}^M \sum_{n=1}^N \langle S e_m, f_n \rangle_{L^2(X|Y)}^2 )^{1/2} \|_{L^\infty(Y)} \leq A \ \ \ \ \ (4)

for all finite collections {e_m \in L^\infty(Y \times H), f_n \in L^\infty(X)} of functions that are relatively orthonormal over {L^\infty(Y)} in the sense that

\displaystyle  \langle e_m, e_{m'} \rangle_{L^2(Y\times H|Y)} = 1_{m=m'}

and

\displaystyle  \langle f_n, f_{n'} \rangle_{L^2(X|Y)} = 1_{n=n'}

for all {1 \leq m,m' \leq M} and {1 \leq n,n' \leq N}. Indeed, the left-hand side of (4) may be expanded first as

\displaystyle  \int_Y \sum_{m=1}^M \sum_{n=1}^N a_{m,n} \langle S e_m, f_n \rangle

for some sequence {a_{m,n}} in {L^\infty(Y)} with {\sum_{m,n} \| a_{m,n} \|_{L^2(Y)}^2 = 1}, and then as

\displaystyle  \int_{X \times H} \sum_{m=1}^M \sum_{n=1}^N a_{m,n} e_m f_n g

where we use Loeb measure on {X \times H} and {g \in L^\infty(X \times H)} is the function {g(x,h) := \hbox{st} g_h(x)}, and {a_{m,n}, e_m, f_n} are lifted up to {L^\infty(X \times H)} in the obvious fashion. By Cauchy-Schwarz and the boundedness of {g}, we can bound this by

\displaystyle  A \| \|\sum_{m=1}^M \sum_{n=1}^N a_{m,n} e_m f_n \|_{L^2(X \times H|Y)} \|_{L^2(Y)},

But the {e_m f_n} are relatively orthonormal over {L^\infty(Y)} (this reflects the relative orthogonality of {Y \times H} and {X} over {Y}), so that

\displaystyle  \|\sum_{m=1}^M \sum_{n=1}^N a_{m,n} e_m f_n \|_{L^2(X \times H|Y)} = (\sum_{m=1}^M \sum_{n=1}^N a_{m,n}^2)^{1/2}

and the claim follows from the hypotheses on {a}.

Using the relative spectral theorem for relative Hilbert-Schmidt operators (see Corollary 17 of this blog post), we may thus find relatively orthonormal systems {e_n, f_n} in {L^\infty(Y \times H)} and {L^\infty(X)} respectively over {L^\infty(Y)} and a non-increasing sequence of non-negative coefficients {\sigma_n \in L^\infty(Y)} (the relative singular values) with {\sum_n \sigma_n^2 \leq A^2} almost everywhere, such that we have the spectral decomposition

\displaystyle  Sb = \sum_n \sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)} f_n

wiht the sum converging in {L^2(X|Y)}. (If {X, Y, Y \times H} were standard Borel spaces, one could deduce this theorem from the usual spectral theorem for Hilbert-Schmidt operators using disintegration. Loeb spaces are certainly not standard Borel, but as discussed in the linked blog post above, one can adapt the proof of the spectral theorem to the relative setting without using the device of disintegration.

Since {\sum_n \sigma_n^2 \leq A^2} and the {\sigma_n} are decreasing, one can find an {N} such that {\sigma_n \leq \varepsilon/2} almost everywhere for all {n \geq N}. For {b} in the unit ball of {L^\infty(Y \times H)}, this lets one approximate {Sb} by the finite rank operator {\sum_{n \leq N} \sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)} f_n} to within {\varepsilon/2} almost everywhere in {L^2(X|Y)} norm. If one rounds {\sigma_n \langle b, e_n \rangle_{L^2(Y \times H|Y)}} to the nearest multiple of {\varepsilon/2N} for each {y \in Y}, and lets {q_1,\dots,q_m} be the collection of linear combinations of the form {\sum_n c_n f_n} with {c_n \in [-1,1]} a multiple of {\varepsilon/2N}, we obtain the claim. \Box

We return to the proof of (2). Since {\int_X f = \delta} and {\|f-\hbox{st} F \|_{L^1(X)} \leq \varepsilon}, we have

\displaystyle  \int_Y \mathop{\bf E}( \hbox{st} F | Y ) = \int_X \hbox{st} F \geq \delta/2

if {\varepsilon} is small enough. In particular there is a {Y}-measurable set {E'} of measure at least {\delta/4} such that {\mathop{\bf E}( \hbox{st} F | Y ) \geq \delta/4} on {E'}. Since

\displaystyle  \int_Y \mathop{\bf E}( | f - \hbox{st} F | | Y ) = \|f-\hbox{st} F \|_{L^1(X)} \leq \varepsilon,

we see from Markov’s inequality (for small enough {\varepsilon}) that there is a {Y}-measurable subset {E} of {E'} of measure at least {\delta/8} such that

\displaystyle  \|f - \hbox{st} F \|_{L^1(X|Y)} = O_\delta(\varepsilon) \ \ \ \ \ (5)

on {E}, where we write

\displaystyle  \|f\|_{L^1(X|Y)} := \mathop{\bf E}( |f| | Y )

for the relative {L^1} norm. In particular we have

\displaystyle  \mathop{\bf E}(f|Y) \geq \delta/8 \ \ \ \ \ (6)

almost everywhere on {E}.

Let {K} be a sufficiently large standard natural number (depending on {\delta,\varepsilon,k} and the quantity {m} from Proposition 7), in fact it will essentially be a van der Waerden number of these inputs) to be chosen later. Applying the induction hypothesis, we have

\displaystyle  \hbox{st} \mathop{\bf E}_{x,r \in X} 1_E(x) 1_E(x+r) \dots 1_E(x+(K-1)r) > 0.

In particular, there is a standard {\kappa > 0}, such that for {r} in a subset of {X} of measure at least {\kappa}, we have

\displaystyle  \hbox{st} \mathop{\bf E}_{x \in X} 1_E(x) 1_E(x+r) \dots 1_E(x+(K-1)r) \geq \kappa

or equivalently that the set

\displaystyle  E_r := E \cap (E-r) \cap \dots \cap (E - (K-1)r)

has measure at least {\kappa}.

Let {r} be as above, and let {q_1,\dots,q_m} be the functions from Proposition 7. Then for {j=0,\dots,K-1}, we can find a measurable function {i_j: Y \rightarrow \{1,\dots,m\}} such that

\displaystyle  \| \hbox{st} T^{jr} F - q_{i_j} \|_{L^2(X|Y)} \leq \varepsilon

almost everywhere on {Y}, hence by (5) we have

\displaystyle  \| T^{jr} f - q_{i_j} \|_{L^1(X|Y)} \ll_\delta \varepsilon

almost everywhere on {E_r}. From this and the relative Hölder inequality, we see that

\displaystyle  \| T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f - q_{i_a} q_{i_{a+b}} \dots q_{i_{a+(k-1)b}} \|_{L^1(X|Y)} \ll_{\delta,k} \varepsilon

a.e. on {E_r} whenever {0 \leq a, b \leq K/k}.

Now, for {K} large enough, we see from van der Warden’s theorem that there exist measurable {a,b: Y \rightarrow \{1,\dots,K/k\}} such that

\displaystyle  i_a = i_{a+b} = \dots = i_{a+(k-1)b}

almost everywhere in {Y}, and hence in {E_r} (this can be seen by partitioning {Y} into finitely many pieces, with each of the {i_a} constant on each of these pieces). For that choice of {a,b} we have

\displaystyle  \| T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f - q_{i_a}^k \|_{L^1(X|Y)} \ll_{\delta,k} \varepsilon

and

\displaystyle  \| T^{ar} f T^{ar} f \dots T^{ar} f - q_{i_a}^k \|_{L^1(X|Y)} \ll_{\delta,k} \varepsilon

and thus

\displaystyle  \mathop{\bf E}( T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f | Y ) \geq \mathop{\bf E}( (T^{ar} f)^k | Y) - O_{\delta,k}(\varepsilon)

almost everywhere on {E_r}. But from (6) one has

\displaystyle  \mathop{\bf E}( T^{ar} f | Y) \geq \delta/8

a.e. on {E_r}, so from Hölder’s inequality we have (for {\varepsilon} sufficiently small) that

\displaystyle  \mathop{\bf E}( T^{ar} f T^{(a+b)r} f \dots T^{(a+(k-1)b)r} f | Y ) \gg_{\delta,k} 1.

From non-negativity of {f}, this implies that

\displaystyle  \sum_{1 \leq i,j \leq K/k} \mathop{\bf E}( T^{ir} fT^{(i+j)r}f \dots T^{(i+(k-1)j)r} f|Y) \gg_{\delta,k} 1

which on integrating in {Y} gives

\displaystyle  \sum_{1 \leq i,j \leq K/k} \hbox{st} {\bf E}_{x \in X} f(x+ir) f(x+(i+j)r) \dots f(x+(i+(k-1)j)r)

\displaystyle \gg_{\delta,k} \kappa.

Averaging in {r}, we conclude that

\displaystyle  \sum_{1 \leq i,j \leq K/k} \hbox{st} {\bf E}_{x,r \in X} f(x+ir) f(x+(i+j)r) \dots f(x+(i+(k-1)j)r)

\displaystyle  	\gg_{\delta,k} \kappa^2.

Shifting {x} by {ir}, we conclude that

\displaystyle  \sum_{1 \leq j \leq K/k} \hbox{st} {\bf E}_{x,r \in X} f(x) f(x+jr) \dots f(x+(k-1)jr) \gg_{\delta,k,K,\kappa} 1.

Dilating {r} by {j} (and noting that the map {x \mapsto jx} is at most {j}-to-one on {X}), we conclude that

\displaystyle  \hbox{st} {\bf E}_{x,r \in X} f(x) f(x+r) \dots f(x+(k-1)r) \gg_{\delta,k,K,\kappa} 1,

and (2) follows.