We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth’s theorem) Let {A} be a subset of the integers {{\bf Z}} whose upper density

\displaystyle  \overline{\delta}(A) := \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}

is positive. Then {A} contains infinitely many arithmetic progressions {a, a+r, a+2r} of length three, with {a \in {\bf Z}} and {r>0}.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length {k} for any {k}.
As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set {A} (or more precisely, the indicator function {1_A} of that set, or of pieces of that set) against linear characters {n \mapsto e(\alpha n)} for various frequencies {\alpha \in {\bf R}/{\bf Z}}. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when {A} is aligned up almost completely with one of these linear characters, for instance by being a Bohr set of the form

\displaystyle  \{ n \in {\bf Z}: \| \alpha n - \theta \|_{{\bf R}/{\bf Z}} < \varepsilon \}

or more generally of the form

\displaystyle  \{ n \in {\bf Z}: \alpha n \in U \}

for some multi-dimensional frequency {\alpha \in {\bf T}^d} and some open set {U}. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.
To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or {L^\infty} increment argument) and the energy increment argument (or {L^2} increment argument).
The idea behind the density increment argument is to introduce a dichotomy: either the object {A} being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.
The idea behind the energy increment argument is instead to decompose the original object {A} into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with {A}, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or {L^2} norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.
In these notes we give both two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally in counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.

— 1. The density increment argument —

We begin with the density increment argument. We first rephrase Roth’s theorem in a finitary form:

Theorem 2 (Roth’s theorem, again) For every {\delta > 0}, there exists an {N_0 = N_0(\delta) > 0}, such that for every {N \geq N_0}, and every {A \subset [N]} with {|A| \geq \delta N}, {A} contains an arithmetic progression of length three.

Exercise 3 Show that Theorem 1 and Theorem 2 are equivalent.

We prove Theorem 2 by a downward induction on the density parameter {\delta}. Let {P(\delta)} denote the proposition that Theorem 2 holds for that value of {\delta} (i.e. for sufficiently large {N} and all {A \subset [N]} with {|A| \geq \delta N}, {A} contains an arithmetic progression of length three). Our objective is to show that {P(\delta)} holds for all {\delta > 0}.
Clearly, {P(\delta)} is (vacuously) true for {\delta > 1} (and trivially true for {\delta \geq 1}). It is also monotone in the sense that if {P(\delta)} holds for some {\delta}, then {P(\delta')} holds for all {\delta'>\delta}. To downwardly induct on {\delta}, we will prove the following dichotomy:

Proposition 4 (Lack of progressions implies density increment) Let {\delta > 0}, let {N} be sufficiently large depending on {\delta}, and let {A \subset [N]} be such that {|A| \geq \delta N}. Then one of the following holds:

  • {A} contains an arithmetic progression of length three; or
  • there exists a subprogression {P} of {[N]} of length at least {N'} such that {|A \cap P| \geq (\delta + c(\delta)) |P|}, where {N' = N'(N)} goes to infinity as {N \rightarrow \infty}, and {c(\delta) > 0} is bounded away from zero whenever {\delta} is bounded away from zero.

Let us see why Proposition 4 implies Theorem 2. It is slightly more convenient to use a “well-ordering principle” argument rather than an induction argument, though of course the two approaches are equivalent. Let {\delta_*} be the infimum of all {\delta} for which {P(\delta)} holds, thus {0 \leq \delta_* \leq 1}. If {\delta_*=0} then we are done, so suppose that {\delta_*} is non-zero. Then for any {\varepsilon > 0}, {P(\delta_*-\varepsilon)} is false, thus there exist arbitrarily large {N} and {A \subset [N]} with {|A| \geq (\delta_*-\varepsilon)N} with no progressions of length three. By Proposition 4, we can thus find a subprogression {P} of {N} of length at least {N'} with {|A \cap P| \geq (\delta_*-\varepsilon + c(\delta_*-\varepsilon))|P|}; if {\varepsilon} is small enough, this implies that {|A \cap P| \geq (\delta_*+\varepsilon) |P|}. We then use an affine transformation to map {P} to {[N']} (noting crucially that the property of having no arithmetic progressions of a given length is preserved by affine transformations). As {N} can be arbitrarily large, {N'} can be arbitrarily large also. Since {P(\delta_*+\varepsilon)} is true, we see that {A \cap P} contains an arithmetic progression of length three, hence {A} does also; which gives the desired contradiction.
It remains to prove Proposition 4. There are two main steps. The first relies heavily on the fact that the progressions only have length three, and is proven via Fourier analysis:

Proposition 5 (Lack of progressions implies correlation with a linear phase) Let {\delta > 0}, let {N} be sufficiently large depending on {\delta}, let {A \subset [N]} be such that {|A| = \delta' N} for some {\delta' \geq \delta}, with {A} containing no arithmetic progressions of length three. Then there exists {\alpha \in {\bf R}/{\bf Z}} such that {|\mathop{\bf E}_{n \in [N]} (1_A(n) - \delta') e(-\alpha n)| \gg \delta^2}.

Proof: In order to use Fourier analysis, it will be convenient to embed {[N]} inside a cyclic group {{\bf Z}/N'{\bf Z}}, where {N'} is equal to (say) {2N+1}; the exact choice here is only of minor importance, though it will be convenient to take {N'} to be odd. We introduce the trilinear form

\displaystyle  \Lambda(f,g,h) := \mathop{\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} f(n) g(n+r) h(n+2r)

for any functions {f,g,h: {\bf Z}/N'{\bf Z} \rightarrow {\bf C}}; we then observe that the quantity

\displaystyle  \Lambda(1_A,1_A,1_A) = \mathop{\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} 1_A(n) 1_A(n+r) 1_A(n+2r)

(extending {1_A} by zero outside of {[N]}) is equal to the number of arithmetic progressions {n,n+r,n+2r} in {A} (counting the degenerate progressions in which {r=0}, and also allowing for {r} to be negative), divided by the normalising factor of {(N')^2}. On the other hand, by hypothesis, {A} contains no non-degenerate arithmetic progressions of length three, and clearly has {|A| \leq N} degenerate progressions; thus we have

\displaystyle  \Lambda(1_A,1_A,1_A) \ll 1/N. \ \ \ \ \ (1)

On the other hand, from the Fourier inversion formula on the cyclic group {{\bf Z}/N'{\bf Z}} we may write

\displaystyle  f(n) = \sum_{\alpha \in \frac{1}{N'}{\bf Z} / {\bf Z}} \hat f(\alpha) e(\alpha n)

for any function {f: {\bf Z}/N'{\bf Z} \rightarrow {\bf C}}, where {\hat f(\alpha)} are the Fourier coefficients

\displaystyle  \hat f(\alpha) := \mathop{\bf E}_{n \in {\bf Z}/N'{\bf Z}} f(n) e(-\alpha n).

We may thus write {\Lambda(f,g,h)} as

\displaystyle  \sum_{\alpha_1,\alpha_2,\alpha_3 \in \frac{1}{N'}{\bf Z}/{\bf Z}} \hat f(\alpha_1) \hat g(\alpha_2) \hat h(\alpha_3)

\displaystyle  \mathop{\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} e( \alpha_1 n + \alpha_2 (n+r) + \alpha_3 (n+2r) ). \ \ \ \ \ (2)

Now observe that we have the identity

\displaystyle  \alpha n - 2 \alpha(n+r) + \alpha(n+2r)=0,

so the phase {\alpha_1 n + \alpha_2 (n+r) + \alpha_3 (n+2r)} is trivial whenever {(\alpha_1,\alpha_2,\alpha_3)} is of the form {(\alpha,-2\alpha,\alpha)}, and so the expectation in (2) is equal to {1}. Conversely, if {(\alpha_1,\alpha_2,\alpha_3)} is not of this form, then the phase is non-trivial, and from Fourier analysis we conclude that the expectation in (2) vanishes. We conclude that the left-hand side of (2) can be expressed as

\displaystyle  \sum_{\alpha \in \frac{1}{N'}{\bf Z} / {\bf Z}} \hat f(\alpha) \hat g(-2\alpha) \hat h(\alpha).

Now using Plancherel’s theorem we have

\displaystyle  \sum_{\alpha \in \frac{1}{N'}{\bf Z} / {\bf Z}} |\hat f(\alpha)|^2 = \| f\|_{L^2({\bf Z}/N'{\bf Z})}^2

(using normalised counting measure). Using this and Hölder’s inequality (and the fact that {N'} is odd), we obtain the bounds

\displaystyle  |\Lambda(f,g,h)| \leq \|f\|_{L^2({\bf Z}/N'{\bf Z})} \|g\|_{L^2({\bf Z}/N'{\bf Z})} \sup_{\xi \in {\bf Z}/N'{\bf Z}} |\hat h(\xi)| \ \ \ \ \ (3)

and similarly for permutations of {f,g,h} on the right-hand side.
We could apply this directly to {\Lambda(1_A,1_A,1_A)}, but this is not useful, since we seek a lower bound on this quantity rather than an upper bound. To get such a lower bound, we split {1_A = \delta' 1_{[N]} + f}, where {f := 1_A - \delta' 1_{[N]}} is the mean zero portion of {1_A}, and use trilinearity to split {\Lambda(1_A,1_A,1_A)} into a main term {\Lambda(\delta' 1_{[N]}, \delta' 1_{[N]}, \delta' 1_{[N]})}, plus seven other error terms involving {1_A = \delta' 1_{[N]}} and {f}, with each error term involving at least one copy of {f}. The main term can be computed explicitly as

\displaystyle  \Lambda(\delta' 1_{[N]}, \delta' 1_{[N]}, \delta' 1_{[N]}) \gg \delta^3.

Comparing this with (1), we conclude that one of the error terms must have magnitude {\gg \delta^3} also. For sake of concreteness, let us say that

\displaystyle  |\Lambda(f, \delta' 1_{[N]}, f)| \gg \delta^3;

the other cases are similar.
From the triangle inequality we see that {f, \delta' 1_{[N]}} have an {L^2({\bf Z}/N'{\bf Z})} norm of {O( \delta^{1/2} )}, and so from (3) one has

\displaystyle  |\Lambda(f, \delta' 1_{[N]}, f)| \ll \delta \sup_{\xi \in {\bf Z}/N'{\bf Z}} |\hat f(\xi)|,

and so we conclude that

\displaystyle  |\hat f(\xi)| \gg \delta^2

for some {\xi \in {\bf Z}/N'{\bf Z}}. (Similarly for other error terms, though sometimes onewill need a permutation of (3) instead of (3) itself.) The claim follows. \Box

Remark 6 The above argument relied heavily on the fact that there was only a one-parameter family of linear relations between {n, n+r, n+2r}. The same argument does not work directly for finding arithmetic progressions of length longer than three; we return to this point in later notes.

The second step converts correlation with a linear character into a density increment on a subprogression:

Proposition 7 (Fragmenting a linear character into progressions) Let {N \geq 1}, let {\varepsilon > 0}, and let {\phi(n) := e(\alpha n)} be a linear phase. Then there exists {N' = N'(N,\varepsilon)} which goes to infinity as {N \rightarrow \infty} for fixed {\varepsilon}, and a partition

\displaystyle  [N] = \bigcup_{j=1}^J P_j \cup E

of {[N]} into arithmetic progressions {P_j} of length at least {N'}, together with an error term {E} of cardinality at most {O(\varepsilon N)}, such that {\phi} fluctuates by at most {O(\varepsilon)} on each progression {P_j} (i.e. {|\phi(x)-\phi(y)| \ll \varepsilon} whenever {x,y \in P_j}).

Proof: We may assume that {N} is sufficiently large depending on {\varepsilon}, as the claim is trivial otherwise (just set {N'=1}).
Fix {\varepsilon}, and let {N'} be a slowly growing function of {N} to be chosen later. By using recurrence for the linear phase {n \mapsto \alpha n}, we can find a shift {h \geq 1} of size {h = O_{N',\varepsilon}(1)} such that {\| \alpha h \|_{{\bf R}/{\bf Z}} \leq \varepsilon/N'}. We then partition {[N]} into {h} arithmetic progressions of spacing {h}, and then partition each of those progressions in turn into subprogressions {P_j} of spacing {h} and length {N'}, plus an error of cardinality at most {N'}, leading to an error set {E} of cardinality at most {h N' = O_{N',\varepsilon}(1)}. On each of the {P_j}, {\alpha n} fluctuates by at most {\varepsilon}. The claim then follows by choosing {N'} to be a sufficiently slowly growing function of {N}. \Box
Now we can prove Proposition 4 (and thus Roth’s theorem). Let {N, \delta, \delta', A} be as in Proposition 4. By Proposition 5 (if {N} is large enough), we can find {\alpha} for which

\displaystyle |\mathop{\bf E}_{n \in [N]} (1_A(n) - \delta') e(-\alpha n)| \gg \delta^2.

We now let {\varepsilon > 0} be a small quantity depending on {\delta} to be chosen later (actually it turns out that we can take {\varepsilon} to be a small multiple of {\delta^2}) and apply Proposition 7 to decompose {[N]} into progressions {P_1,\ldots,P_J} and an error term {E} with the stated properties. Then we have

\displaystyle  \mathop{\bf E}_{n \in [N]} (1_A(n) - \delta') e(-\alpha n) = \frac{1}{N} ( \sum_{j=1}^J \sum_{n \in P_j} (1_A(n) - \delta') e(-\alpha n) ) + O(\varepsilon).

Since {e(-\alpha n)} fluctuates by at most {\varepsilon} on {P_j}, we can apply the triangle inequality and conclude that

\displaystyle  |\mathop{\bf E}_{n \in [N]} (1_A(n) - \delta') e(-\alpha n)| \leq \frac{1}{N} |\sum_{j=1}^J \sum_{n \in P_j} (1_A(n) - \delta')| + O(\varepsilon).

If {\varepsilon} is sufficiently small depending on {\delta}, we conclude that

\displaystyle  \sum_{j=1}^J |\sum_{n \in P_j} (1_A(n) - \delta')| \gg \delta^2 N. \ \ \ \ \ (4)

On the other hand, as {\delta'} is the mean of {1_A} on {[N]}, we have

\displaystyle  \sum_{n \in [N]} (1_A(n) - \delta') = 0

and thus

\displaystyle  \sum_{j=1}^J \sum_{n \in P_j} (1_A(n) - \delta') = O(\varepsilon).

Adding this to (4) and noting that {|x|+x = 2\max(x,0)} for real {x}, we conclude (for {\varepsilon} small enough) that

\displaystyle  \sum_{j=1}^J \max( \sum_{n \in P_j} (1_A(n) - \delta'), 0) \gg \delta^2 N

and hence by the pigeonhole principle we can find {j} such that

\displaystyle  \max( \sum_{n \in P_j} (1_A(n) - \delta'), 0) \gg \delta^2 |P_j|

or in other words

\displaystyle  |A \cap P_j| / |P_j| \geq \delta' + c \delta^2

for some absolute constant {c>0}, and Proposition 4 follows.
It is possible to rewrite the above argument in the ultralimit setting, though it only makes the argument slightly shorter as a consequence. We sketch this alternate formulation below.

Exercise 8 Let {\delta_*} be as above.

  • Show that if {N} is an unbounded limit natural number, and {A \subset [N]} is a limit subset whose density {st(|A|/N)} is strictly greater than {\delta_*}, then {A} contains a (limit) arithmetic progression {n,n+r,n+2r} of length three (with {r \neq 0}).
  • Show that there exists an unbounded limit natural number {N} and a limit subset {A \subset [N]} of density {st(|A|/N)=\delta_*}, which does not contain any arithmetic progressions of length three.

Exercise 9 Show that if {N} is an unbounded limit natural number, and {A \subset [N]} is a limit subset of positive density {st(|A|/N) = \delta' >0} with no arithmetic progressions of length three, then there exists a limit real {\alpha} such that {|\mathop{\bf E}_{n \in [N]} (1_A(n) - \delta') e(-\alpha n)| \gg 1}.

Exercise 10 If {N} is an unbounded limit natural number, and {\alpha} is a limit real, show that one can partition {[N] = \bigcup_{j=1}^J P_j \cup E}, where {J} is a limit natural number, the {P_j} are limit arithmetic subprogressions of {[N]} of unbounded length (with the map {j \mapsto P_j} being a limit function), such that {n \mapsto e(\alpha n)} fluctuates by {o(1)} on each {P_j} (uniformly in {j}), and {|E| = o(N)}.

Exercise 11 Use the previous three exercises to reprove Roth’s theorem.

Exercise 12 (Roth’s theorem in bounded characteristic) Let {F} be a finite field, let {\delta > 0}, and let {V} be a finite vector space. Show that if the dimension of {V} is sufficiently large depending on {F,\delta}, and if {A \subset V} is such that {|A| \geq \delta |V|}, then there exists {a,r \in V} with {r \neq 0} such that {a,a+r,a+2r \in A}. (Hint: Mimic the above arguments (either finitarily, or with ultralimits), using hyperplanes as a substitute for subprogressions.)

Exercise 13 (Roth’s theorem in finite abelian groups) Let {G} be a finite abelian group, and let {\delta > 0}. Show that if {|G|} is sufficiently large depending on {\delta}, and {A \subset G} is such that {|A| \geq \delta |G|}, then there exists {a, r \in V} with {r \neq 0} such that {a, a+r, a+2r \in A}. (Hint: if there is an element of {G} of large order, one can use Theorem 2 and the pigeonhole principle. If all elements have bounded order, one can instead use Exercise 12.) This result (as well as the special case in the preceding exercise) was first established by Meshulam.

— 2. The energy increment argument —

Now we turn to the energy increment approach. This approach requires a bit more machinery to set up, but ends up being quite flexible and powerful (for instance, it is the starting point for my theorem with Ben Green establishing arbitrarily long progressions in the primes, which we do not know how to establish via density increment arguments).
Instead of passing from {[N]} to a subprogression, we now instead coarsen {[N]} to some partition (or factor) of {[N]}, as follows. Define a factor of {[N]} to be a {\sigma}-algebra of subsets {{\mathcal B}} of {[N]}, or equivalently a partition of {[N]} into disjoint atoms or cells (with the elements of {{\mathcal B}} then being the arbitary unions of atoms). Given a function {f: [N] \rightarrow {\bf C}} and a factor {{\mathcal B}}, we define the conditional expectation {\mathop{\bf E}(f|{\mathcal B}): [N] \rightarrow {\bf C}} to be the function whose value at a given point {x \in [N]} is given by the formula

\displaystyle  \mathop{\bf E}(f|{\mathcal B})(x) := \frac{1}{|{\mathcal B}(x)|} \sum_{y \in {\mathcal B}(x)} f(y),

where {{\mathcal B}(x)} is the unique atom of {{\mathcal B}} that contains {x}. One can view the map {f \mapsto \mathop{\bf E}(f|{\mathcal B})} as the orthogonal projection from {L^2([N])} to {L^2({\mathcal B})}, where {L^2([N])} is the space of functions {f: [N] \rightarrow {\bf C}} with the inner product

\displaystyle  \langle f, g \rangle_{L^2([N])} := \mathop{\bf E}_{n \in [N]} f(n) \overline{g(n)}

and {L^2({\mathcal B})} is the subspace of functions in {L^2([N])} which are measurable with respect to {{\mathcal B}}, or equivalently are constant on each atom of {{\mathcal B}}.
We say that one factor {{\mathcal B}'} refines another {{\mathcal B}} if {{\mathcal B} \subset {\mathcal B}'}, or equivalently if every atom of {{\mathcal B}} is a union of atoms of {{\mathcal B}'}, or if every atom of {{\mathcal B'}} is contained in an atom of {{\mathcal B}'}, or equivalently again if {L^2({\mathcal B}) \subset L^2({\mathcal B}')}. Given two factors {{\mathcal B}}, {{\mathcal B}'}, one can define their join {{\mathcal B} \vee {\mathcal B}'} to be their least common refinement, thus the atoms in {{\mathcal B} \vee {\mathcal B}'} are the non-empty intersections of atoms in {{\mathcal B}} with atoms in {{\mathcal B}'}.
The idea is to split a given function {f} in {L^2([N])} (and specifically, an indicator function {1_A}) into a projection {\mathop{\bf E}(f|{\mathcal B})} onto a “structured factor” {{\mathcal B}} to obtain a “structured component” {\mathop{\bf E}(f|{\mathcal B})}, together with a “pseudorandom component” {f - \mathop{\bf E}(f|{\mathcal B})} that is essentially orthogonal to all structured functions. This decomposition is related to the classical decomposition of a vector in a Hilbert space into its orthogonal projection onto a closed subspace {V}, plus the complementary projection to the orthogonal complement {V^\perp}; we will see the relationship between the two decompositions later when we pass to the ultralimit.
We need to make the notion of “structured” more precise. We begin with some definitions. We say that a function {f: [N] \rightarrow {\bf C}} has Fourier complexity at most {M} if it can be expressed as

\displaystyle  f(n) = \sum_{m=1}^{M'} c_m e(\alpha_m n)

for some {M' \leq M} and some complex numbers {c_1,\ldots,c_{M'}} of magnitude at most {1}, and some real numbers {\alpha_1,\ldots,\alpha_{M'}}. Note that from the Fourier inversion formula that every function will have some finite Fourier complexity, but typically one expects the complexity to grow with {N}; only a few special functions will have complexity bounded uniformly in {N}. Also note that if {f, g} have Fourier complexity {M} then {f+g, f-g, \overline{f}}, or {fg} all have Fourier complexity at most {O_M(1)}; informally, the space of bounded complexity functions forms an algebra. (We will be able to formalise this statement after we take ultralimits.)
Ideally, we would like to take “functions of bounded Fourier complexity” as our class of structured functions. For technical reasons (related to our desire to use indicator functions as structured functions), we need to take an {L^1} closure and work with the wider class of Fourier measurable functions as our structured class.

Definition 14 (Measurability) Let {{\mathcal F}: {\bf R}^+ \rightarrow {\bf R}^+} be a function. We say that a function {f: [N] \rightarrow {\bf C}} is Fourier measurable with growth function {{\mathcal F}} if, for every {K > 1}, one can find a function {f_K: [N] \rightarrow {\bf C}} of Fourier complexity at most {{\mathcal F}(K)} such that {\mathop{\bf E}_{n \in [N]} |f(n) - f_K(n)| \leq 1/K}.
A subset {A} of {[N]} is Fourier measurable with growth function {{\mathcal F}} if {1_A} is Fourier measurable with this growth function.

Exercise 15 Show that every interval {[a,b]} in {[N]} is Fourier measurable with some growth function {{\mathcal F}} independent of {N}. (Hint: apply Fejér summation to {1_{[a,b]}}.)

Exercise 16 Let {f} be a Fourier-measurable function with some growth function {{\mathcal F}}, which is bounded in magnitude by {A}. Show that for every {K > 1}, one can find a function {\tilde f_K: [N] \rightarrow {\bf C}} which also is bounded in magnitude by {A}, and of Fourier complexity {O_{A,{\mathcal F}(K)}(1)}, such that {\mathop{\bf E}_{n \in [N]} |f(n) - \tilde f_K(n)| \ll 1/K}. (Hint: start with the approximating function {f_K} from Definition 14, which is already bounded in magnitude by {{\mathcal F}(K)}, and then set {\tilde f_K := P( f_K, \overline{f_K})} where {P(z,\overline{z})} is a polynomial bounded in magnitude by {A} on the ball of radius {{\mathcal F}(K)} which is close to the identity function on the ball of radius {A} (such a function can be constructed via the Stone-Weierstrass theorem).)

Exercise 17 Show that if {f, g: [N] \rightarrow {\bf C}} are bounded in magnitude by {A}, and are Fourier measurable with growth functions {{\mathcal F}}, then {f+g}, {\overline{f}}, and {fg} are Fourier measurable with some growth function {{\mathcal F}'} depending only on {A} and {{\mathcal F}}.
Conclude that if {E, F \subset [N]} are Fourier-measurable with growth function {{\mathcal F}}, then {[N] \backslash E}, {E \cup F}, and {E \cap F} are Fourier-measurable with some growth function {{\mathcal F}'} depending only on {{\mathcal F}}.

We thus see that Fourier-measurable sets morally form a Boolean algebra. (Again, we can formalise this assertion once we pass to the ultralimit.)
Now we make a key observation (cf. this paper by Reingold, Trevisan, Tulsiani, and Vadhan).

Lemma 18 (Correlation with a Fourier character implies correlation with a Fourier-measurable set) Let {f: [N] \rightarrow {\bf C}} be bounded in magnitude by {1}, and suppose that {|\mathop{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \geq \delta} for some {\delta > 0}. Then there exists a Fourier-measurable set {E \subset [N]} with some growth function {{\mathcal F}} depending on {\delta}, such that {|\mathop{\bf E}_{n \in [N]} f(n) 1_E(n)| \gg \delta}.

Proof: By splitting {f} into real and imaginary parts, we may assume without loss of generality that {f} is real. Rotating {e(-\alpha n)}, we may find a real number {\theta} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} f(n) \hbox{Re} e(-\alpha n + \theta) \geq \delta.

We then express

\displaystyle  \hbox{Re} e(-\alpha n + \theta) = 1-\int_{-1}^1 1_{E_t}(n)\ dt

where

\displaystyle  E_t := \{ n \in [N]: \hbox{Re} e(-\alpha n + \theta) \leq t \}.

By Minkowski’s inequality, we thus have either

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n)| \geq \delta/2

or

\displaystyle  \int_{-1}^1 |\mathop{\bf E}_{n \in [N]} f(n) 1_{E_t}(n)|\ dt \geq \delta/2.

In the former case we are done (setting {E=[N]}), so suppose that the latter holds. If all the {E_t} were uniformly Fourier-measurable, we would now be done in this case also by the pigeonhole principle. This is not quite true; however, it turns out that most {E_t} are uniformly measurable, and this will be enough. More precisely, let {\varepsilon > 0} be a small parameter to be chosen later, and say that {t} is good if one has

\displaystyle  |E_{t+r} \backslash E_{t-r}| \leq 2\varepsilon^{-1} r N

for all {r>0}. Let {\Omega \subset [-1,1]} be the set of all bad {t}. Observe that for each bad {t}, we have {M\mu(t) \geq \varepsilon^{-1}}, where {\mu} is the probability measure

\displaystyle  \mu(S) := \frac{1}{N} |\{ n \in [N]: \hbox{Re} e(-\alpha n + \theta) \in S \}|

and {M} is the Hardy-Littlewood maximal function

\displaystyle  M\mu(t) := \sup_{r>0} \frac{1}{2r} \mu( [t-r,t+r] ).

Applying the Hardy-Littlewood maximal inequality, we conclude that {|\Omega| \ll \varepsilon}. In particular, if {\varepsilon} is small enough compared with {\delta}, we have

\displaystyle  \int_{[-1,1] \backslash \Omega} |\mathop{\bf E}_{n \in [N]} f(n) 1_{E_t}(n)|\ dt \gg \delta

and so by the pigeonhole principle, there exists a good {t} such that

\displaystyle  |\mathop{\bf E}_{n \in [N]} f(n) 1_{E_t}(n)| \gg \delta.

It remains to verify that {E_t} is good. For any {K > 0}, we have (as {t} is good) that

\displaystyle  \mathop{\bf E}_{n \in [N]} 1_{E_{t+1/K}} - 1_{E_{t-1/K}} \ll_\delta 1/K.

Applying Urysohn’s lemma, we can thus find a smooth function {\eta: {\bf R} \rightarrow {\bf R}^+} with {\eta(t') = 1} for {t' < t-1/K} and {\eta(t')=0} for {t' > t+1/K} such that

\displaystyle  \mathop{\bf E}_{n \in [N]} |1_{E_t}(n) - \eta( \hbox{Re} e(-\alpha n + \theta) )| \ll_\delta 1/K.

Using the Weierstrass approximation theorem, one can then approximate {\eta} uniformly by {O(1/K)} on {[-1,1]} by a polynomial of degree {O_K(1)} and coefficients {O_K(1)}. This allows one to approximate {1_{E_t}} in {L^1} norm to an accuracy of {O_\delta(1/K)} by a function of Fourier complexity {O_K(1)}, and the claim follows. \Box

Corollary 19 (Correlation implies energy increment) Let {f: [N] \rightarrow [0,1]}, and let {{\mathcal B}} be a factor generated by at most {M} atoms, each of which is Fourier-measurable with growth function {{\mathcal F}}. Suppose that we have the correlation

\displaystyle  |\langle f - \mathop{\bf E}(f|{\mathcal B}), e(\alpha \cdot) \rangle_{L^2([N])}| \geq \delta

for some {\delta > 0} and {\alpha \in {\bf R}}. Then there exists a refinement {{\mathcal B}'} generated by at most {2M} atoms, each of which is Fourier-measurable with a growth function {{\mathcal F}'} depending only on {\delta, {\mathcal F}}, such that

\displaystyle  \| \mathop{\bf E}(f|{\mathcal B}')\|_{L^2([N])}^2 - \| \mathop{\bf E}(f|{\mathcal B})\|_{L^2([N])}^2 \gg \delta^2. \ \ \ \ \ (5)

Proof: By Lemma 18, we can find a Fourier-measurable set {E} with some growth function {{\mathcal F}''} depending on {\delta}, such that

\displaystyle  |\langle f - \mathop{\bf E}(f|{\mathcal B}), 1_E \rangle_{L^2([N])}| \gg \delta.

We let {{\mathcal B}'} be the factor generated by {{\mathcal B}} and {E}. As {1_E} is measurable with respect to {{\mathcal B}'}, we may project onto {L^2({\mathcal B}')} and conclude that

\displaystyle  |\langle \mathop{\bf E}(f|{\mathcal B}') - \mathop{\bf E}(f|{\mathcal B}), 1_E \rangle_{L^2([N])}| \gg \delta.

By Cauchy-Schwarz, we thus have

\displaystyle  \| \mathop{\bf E}(f|{\mathcal B}') - \mathop{\bf E}(f|{\mathcal B}) \|_{L^2([N])} \gg \delta.

Squaring and using Pythagoras’ theorem, we obtain (5). The remaining claims in the corollary follow from Exercise 17. \Box
We can then iterate this corollary via an energy increment argument to obtain

Proposition 20 (Weak arithmetic regularity lemma) Let {f: [N] \rightarrow [0,1]}, and let {{\mathcal B}} be a factor generated by at most {M} atoms, each of which is Fourier-measurable with growth function {{\mathcal F}}. Let {\delta > 0}. Then there exists an extension {{\mathcal B}'} of {{\mathcal B}} generated by {O_{M,\delta}(1)} atoms, each of which is Fourier-measurable with growth function {{\mathcal F}'} depending on {{\mathcal F}, \delta}, such that

\displaystyle  |\langle f - \mathop{\bf E}(f|{\mathcal B}'), e(\alpha \cdot) \rangle_{L^2([N])}| < \delta \ \ \ \ \ (6)

for all {\alpha \in {\bf R}}.

Proof: We initially set {{\mathcal B}'} equal to {{\mathcal B}}. If (6) already holds, then we are done; otherwise, we invoke Corollary 19 to increase the “energy” {\| \mathop{\bf E}(f|{\mathcal B}') \|_{L^2}^2} by {\gg \delta^2}, at the cost of possibly doubling the number of atoms in {{\mathcal B}'}, and also altering the growth function somewhat. We iterate this procedure; as the energy {\| \mathop{\bf E}(f|{\mathcal B}') \|_{L^2}^2} is bounded between zero and one, and increases by {\gg \delta^2} at each step, the procedure must terminate in {O(1/\delta^2)} steps, at which point the claim follows. \Box
It turns out that the power of this lemma is amplified if we iterate one more time, to obtain

Theorem 21 (Strong arithmetic regularity lemma) Let {f: [N] \rightarrow [0,1]}, let {\varepsilon > 0}, and let {F: {\bf R}^+ \rightarrow {\bf R}^+} be an arbitrary function. Then we can decompose {f = f_{str} + f_{sml} + f_{psd}} and find {1 \leq M = O_{\varepsilon,F}(1)} such that

  • (Nonnegativity) {f_{str}, f_{str}+f_{sml}} take values in {[0,1]}, and {f_{sml}, f_{psd}} have mean zero;
  • (Structure) {f_{str}} is Fourier-measurable with a growth function {{\mathcal F}_M} that depends only on {M};
  • (Smallness) {f_{sml}} has an {L^2} norm of at most {\varepsilon}; and
  • (Pseudorandomness) One has {|\mathop{\bf E}_{n \in [N]} f_{psd}(n) e(-\alpha n)| \leq 1/F(M)} for all {\alpha \in {\bf R}}.

Proof: We recursively define a sequence {M_1 < M_2 < \ldots} by setting {M_1 := 1} and {M_{k+1} := M_k + F(M_k) + 1} (say). Applying Proposition 20 (starting with the trivial factor {{\mathcal B}_1}), one can then find a nested sequence of refinements {{\mathcal B}_1 \subset {\mathcal B}_2 \subset \ldots}, such that

\displaystyle  |\langle f - \mathop{\bf E}(f|{\mathcal B}_{k}), e(\alpha \cdot) \rangle_{L^2([N])}| < 1/M_k

for all {k \geq 1} and {\alpha \in {\bf R}}, and such that each {{\mathcal B}_k} consists of {O_k(1)} atoms that are Fourier-measurable with some growth function depending on {M_k} (note that this quantity dominates {k} and {M_1,\ldots,M_{k-1}} by construction). By Pythagoras’ theorem, the energies {\| \mathop{\bf E}(f|{\mathcal B}_k)\|_{L^2([N])}^2} are monotone increasing between {0} and {1}, so by the pigeonhole principle there exists {k = O(1/\varepsilon^2)} such that

\displaystyle  \| \mathop{\bf E}(f|{\mathcal B}_{k+1})\|_{L^2([N])}^2 - \| \mathop{\bf E}(f|{\mathcal B}_{k})\|_{L^2([N])}^2 \leq \varepsilon^2

and hence by Pythagoras

\displaystyle  \| \mathop{\bf E}(f|{\mathcal B}_{k+1}) - \mathop{\bf E}(f|{\mathcal B}_{k})\|_{L^2([N])}^2 \leq \varepsilon^2.

Setting {f_{str} := \mathop{\bf E}(f|{\mathcal B}_{k})}, {f_{sml} := \mathop{\bf E}(f|{\mathcal B}_{k+1}) - \mathop{\bf E}(f|{\mathcal B}_{k})}, {f_{psd} := f - \mathop{\bf E}(f|{\mathcal B}_{k+1})}, we obtain the claim. \Box

Remark 22 This result is essentially due to Green (though not quite in this language). Earlier related decompositions are due to Bourgain and to Green-Konyagin. The Szemerédi regularity lemma in graph theory can be viewed as the graph-theoretic analogue of this Fourier-analytic result; see this paper of mine (or my FOCS paper) for further discussion. The double iteration required to prove Theorem 21 means that the bounds here are quite poor (of tower-exponential type, in fact, when {F} is exponential, which is typical in applications), much as in the graph theory case; thus the use of this lemma, while technically quantitative in nature, gives bounds that are usually quite inferior to what is known or suspected to be true.
As with the Ratner-type theorems from the previous notes, it is crucial that the uniformity {1/F(M)} for the pseudorandom component {f_{psd}} is of an arbitrarily higher quality than the measurability of the structured component {f_{str}}.

Much as the Ratner-type theorems from the previous notes could be used to prove multiple recurrence theorems, the arithmetic regularity lemma can be used (among other things) to give a proof of Roth’s theorem. We do so as follows. Let {N} be a large integer, and let {A} be a subset of {[N]} with {|A| \geq \delta N} for some {\delta > 0}. We consider the expression {\Lambda(1_A,1_A,1_A)}, where {\Lambda} is the trilinear form

\displaystyle  \Lambda(f,g,h) := \frac{1}{N^2} \sum_{n \in [N]} \sum_{r \in [-N,N]} f(n) g(n+r) h(n+2r).

We will show that

\displaystyle  \Lambda(1_A, 1_A, 1_A) \gg_\delta 1, \ \ \ \ \ (7)

which implies that the number of all three-term arithmetic progressions in {A} (including the degenerate ones with {r=0}) is {\gg_\delta N^2}. For {N} sufficiently large depending on {\delta}, this number is larger than the number {N} of degenerate progressions, giving the theorem.
It remains to establish (7). We apply Theorem 21 with parameters {\varepsilon > 0}, {F} to be chosen later (they will depend on {\delta}) to obtain a quantity {M} and a decomposition

\displaystyle  1_A = f_{str} + f_{sml} + f_{psd}

with the stated properties. This splits the left-hand side of (7) into {27} terms. But we can eliminate several of these terms:

Exercise 23 Show that all of the terms in (7) which involve at least one copy of {f_{psd}} are of size {O(1/F(M))}. (Hint: Modify the proof of Proposition 5.)

From this exercise we see that

\displaystyle  \Lambda(1_A, 1_A, 1_A) = \Lambda(f_{str}+f_{sml}, f_{str}+f_{sml}, f_{str}+f_{sml}) + O(1/F(M)). \ \ \ \ \ (8)

Now we need to deal with {f_{str}+f_{sml}}. A key point is the almost periodicity of {f_{str}+f_{sml}}:

Lemma 24 (Almost periodicity) For {\gg_{\delta,M} N} values of {r \in [-\varepsilon N,\varepsilon N]}, one has

\displaystyle  \mathop{\bf E}_{n \in [N]} |(f_{str}+f_{sml})(n+r) - (f_{str}+f_{sml})(n)| \ll \varepsilon

(where we extend {f_{str}, f_{sml}} by zero outside of {[N]}).

Proof: As {f_{str}} is Fourier-measurable, we can approximate it to an error of {O(\varepsilon)} in {L^1[N]} norm by a function

\displaystyle  g = \sum_{j=1}^J c_j e(\alpha_j n) \ \ \ \ \ (9)

of Fourier complexity {J \leq O_{M,\varepsilon}(1)}. From the smallness of {f_{sml}}, we then have

\displaystyle  \mathop{\bf E}_{n \in [N]} |(f_{str}+f_{sml})(n+r) - (f_{str}+f_{sml})(n)|

\displaystyle \leq \mathop{\bf E}_{n \in [N]} |g(n+r) - g(n)| + O(\varepsilon)

(where we extend {g} using (9) rather than by zero, with the error being {O(\varepsilon)} when {|r| \leq \varepsilon N}). We can use (9) and the triangle inequality to bound

\displaystyle  \mathop{\bf E}_{n \in [N]} |g(n+r) - g(n)| \leq \sum_{j=1}^J |e(\alpha_j r) - 1|.

Using multiple recurrence, we can find {\gg_{J,\varepsilon} N} values of {r \in [-\varepsilon N, \varepsilon N]} such that {\| \alpha_j r \|_{{\bf R}/{\bf Z}} \leq \varepsilon/J} for all {1 \leq j \leq J}. The claim follows. \Box
Now we can finish the proof of Roth’s theorem. As {f_{str}+f_{sml}} has the same mean as {f}, we have

\displaystyle  \mathop{\bf E}_{n \in [N]} (f_{str}+f_{sml})(n) \geq \delta

and hence by Hölder’s inequality (and the non-negativity of {f_{str}+f_{sml}})

\displaystyle  \mathop{\bf E}_{n \in [N]} (f_{str}+f_{sml})(n)^3 \geq \delta^3.

Now if {r} is one of the periods in the above lemma, we have

\displaystyle  \mathop{\bf E}_{n \in [N]} |(f_{str}+f_{sml})(n+r) - (f_{str}+f_{sml})(n)| \ll \varepsilon

and thus by shifting

\displaystyle  \mathop{\bf E}_{n \in [N]} |(f_{str}+f_{sml})(n+2r) - (f_{str}+f_{sml})(n+r)| \ll \varepsilon

and so by the triangle inequality

\displaystyle  \mathop{\bf E}_{n \in [N]} |(f_{str}+f_{sml})(n+2r) - (f_{str}+f_{sml})(n)| \ll \varepsilon.

Putting all this together using the triangle and Hölder inequalities, we obtain

\displaystyle  \mathop{\bf E}_{n \in [N]} (f_{str}+f_{sml})(n) (f_{str}+f_{sml})(n+r) (f_{str}+f_{sml})(n+2r)

\displaystyle  \geq \delta^3 - O(\varepsilon).

Thus, if {\varepsilon} is sufficiently small depending on {\delta}, we have

\displaystyle  \mathop{\bf E}_{n \in [N]} (f_{str}+f_{sml})(n) (f_{str}+f_{sml})(n+r) (f_{str}+f_{sml})(n+2r) \gg \delta^3

for {\gg_{J,\varepsilon} N} values of {r}, and thus

\displaystyle  \Lambda(f_{str}+f_{sml},f_{str}+f_{sml},f_{str}+f_{sml}) \gg_{\delta,M} 1;

if we then set {F} to be a sufficiently rapidly growing function (depending on {\delta}), we obtain the claim from (8). This concludes the proof of Roth’s theorem.

Exercise 25 Use the energy increment method to establish a different proof of Exercise 13. (Hint: For the multiple recurrence step, use a pigeonhole principle argument rather than an appeal to equidistribution theory.)

We now briefly indicate how to translate the above arguments into the ultralimit setting. We first need to construct an important measure on limit sets, namely Loeb measure.

Exercise 26 (Construction of Loeb measure) Let {N} be an unbounded natural number. Define the Loeb measure {\mu(A)} of a limit subset {A} of {[N]} to be the quantity {st(|A|/N)}, thus for instance a set of cardinality {o(N)} will have Loeb measure zero.

  • Show that if a limit subset {A} of {[N]} is partitioned into countably many disjoint limit subsets {A_n}, that all but finitely many of the {A_n} are empty, and so {\mu(A) = \mu(A_1)+\ldots+\mu(A_n)}.
  • Define the outer measure {\mu_*(A)} of a subset {A} of {[N]} (not necessarily a limit subset) to be the infimum of {\sum_n \mu(A_n)}, where {A_1, A_2, \ldots} is a countable family of limit subsets of {[N]} that cover {A}, and call a subset of {[N]} null if it has zero outer measure. Call a subset Loeb measurable if it differs from a limit set by a null set. Show that there is a unique extension of Loeb measure {\mu} from limit sets to Loeb measurable sets that is a countably additive probability measure on {[N]}. (Hint: use the Carathéodory extension theorem, see e.g. my 254B notes 0 or notes 0a.)
  • If {f: [N] \rightarrow {\bf C}} is a limit function bounded in magnitude by some standard real {M}, show that {st(f)} is a Loeb measurable function in {L^\infty(\mu)}, with norm at most {M}.
  • Show that there exists a unique trilinear form {\Lambda: L^\infty(\mu) \times L^\infty(\mu) \times L^\infty(\mu) \rightarrow {\bf C}}, jointly continuous in the {L^3(\mu)} topology for all three inputs, such that

    \displaystyle  \Lambda(st(f),st(g),st(h))

    \displaystyle  = st( \frac{1}{N^2} \sum_{n \in [N]} \sum_{r \in [-N,N]} f(n) g(n+r) h(n+2r) )

    for all bounded limit functions {f,g,h}.

  • Show that Roth’s theorem is equivalent to the assertion that {\Lambda(f,f,f) > 0} whenever {f \in L^\infty(\mu)} is a bounded non-negative function with {\int_{[N]} f\ d\mu > 0}.

Next, we develop the ultralimit analogue of Fourier measurability, which we will rename Kronecker measurability due to the close analogy with the Kronecker factor in ergodic theory.

Exercise 27 (Construction of the Kronecker factor) Let {N} be an unbounded natural number. We define a Fourier character to be a function in {L^\infty([N])} of the form {n \mapsto st(e(\alpha n))} for some limit real number {\alpha}. We define a trigonometric polynomial to be any finite linear combination (over the standard complex numbers) of Fourier characters. Let {{\mathcal Z}^1} be the {\sigma}-algebra of Loeb measurable sets generated by the Fourier characters; we refer to {{\mathcal Z}^1} as the Kronecker factor, and functions or sets measurable in this factor as Kronecker measurable functions and sets. Thus for instance all trigonometric polynomials are Kronecker measurable. We let {\mathop{\bf E}(f|{\mathcal Z}^1)} denote the orthogonal projection from {f} to {L^2({\mathcal Z^1})}, i.e. the conditional expectation to the Kronecker factor.

  • Show that if {f \in L^\infty({\mathcal Z}^1)} is bounded in magnitude by {M} and {\varepsilon > 0} is a standard real, then there exists a trigonometric polynomial {P \in L^\infty({\mathcal Z}^1)} which is also bounded in magnitude by {M} and is within {\varepsilon} of {f} in {L^1} norm.
  • Show that if {f \in L^\infty({\mathcal Z}^1)} and {\varepsilon > 0}, then there exists a limit subset {R} of {[-\varepsilon N, \varepsilon N]} of cardinality {\gg N} such that {\| f(\cdot) - f(\cdot+r) \|_{L^1([N])} \leq \varepsilon} for all {r \in R} (extending {f} by zero).
  • Show that if {f \in L^\infty({\mathcal Z}^1)} is non-negative with {\int_{[N]} f\ d\mu > 0}, then {\Lambda(f,f,f) > 0}.
  • Show that if {f_1,f_2,f_3 \in L^\infty([N])} and {\mathop{\bf E}(f_i|{\mathcal Z}^1)=0} for at least one {i=1,2,3}, then {\Lambda(f_1,f_2,f_3)=0}.
  • Conclude the proof of Roth’s theorem using ultralimits.

Remark 28 Note how the (finitary) arithmetic regularity lemma has been replaced by the more familiar (infinitary) theory of conditional expectation to a factor, and the finitary notion of measurability has been replaced by a notion from the traditional (countably additive) infinitary theory of measurability. This is one of the key advantages of the ultralimit approach, namely that it allows one to exploit already established theories of infinitary mathematics (e.g. measure theory, ergodic theory, Hilbert space geometry, etc.) to prove a finitary result.

Exercise 29 Use the ultralimit energy increment method to establish yet another proof of Exercise 13.

— 3. More quantitative bounds (optional) —

The above proofs of Roth’s theorem (as formulated in, say, Theorem 2) were qualitative in the sense that they did not explicitly give a bound for {N_0} in terms of {\delta}. Nevertheless, by analysing the finitary arguments more carefully, a bound can be extracted:

Exercise 30 Show that in Proposition 7, one can take {N' \gg \varepsilon^{O(1)} N^{1/2}}. Using this and the density increment argument, show that one can take {N_0 \ll \exp(\exp(O(1/\delta)))} in Theorem 2. (To put it another way, subsets of {[N]} of density much larger than {1/\log \log N} will contain progressions of length three.)

Exercise 31 Show that in the energy increment proof of Roth’s theorem, one can take the growth functions {{\mathcal F}} involved to be polynomial in {K} (but with the exponent growing exponentially with each refinement of the factor), and {F} can be taken to be an iterated exponential; thus ultimately allows one to take {N_0} to be a tower exponential of height {O(\delta^{-O(1)})}. (To put it another way, subsets of {[N]} of density much larger than {1/\log_*^c N} for some {c>0} will contain progressions of length three.) Thus we see that the energy increment argument, in the form presented here, provides much worse bounds than the density increment argument; but see below.

For the ultralimit arguments, it is significantly harder to extract a quantitative bound from the argument (basically one has to painstakingly “finitise” the argument first, essentially reaching the finitary counterparts of these arguments presented above). Thus we see that there is a tradeoff when using ultralimit analysis; the arguments become slightly cleaner (and one can deploy infinitary methods), but one tends to lose sight of what quantitative bounds the method establishes. (This is particularly the case if one begins to rely heavily on the axiom of choice (or on large cardinal axioms) once one takes ultralimits, although these axioms are not used in the examples above.)
It is possible to run the density increment argument more efficiently by combining it with some aspects of the energy increment argument. As described above, the density increment argument proceeds by locating a single large Fourier coefficient {\hat 1_A(\alpha)} of {A}, and uses this to obtain a density increment on a relatively short subprogression of {[N]} (of length comparable to {\sqrt{N}}, ignoring factors of {\delta}). One then has to iterate this about {1/\delta} times before one obtains a truly significant density increment (e.g. from {\delta} to {2\delta}). It is this repeated passage from {N} to {\sqrt{N}} which is ultimately responsible for the double exponential bound for {N_0} at the end of the day.
In an unpublished work, Endre Szemerédi observed that one can run this argument more efficiently by collecting several large Fourier coefficients of {1_A} simultaneously (somewhat in the spirit of the energy increment argument), and only then passing to a subprogression on which all of the relevant Fourier characters are simultaneously close to constant. The subprogression obtained is smaller as a consequence, but the density increment is much more substantial. Using this strategy, Endre was able to improve the original Roth bound of {N_0 \ll \exp(\exp(O(1/\delta)))} to the somewhat better {N_0 \ll \exp(\exp(O(\log^2(1/\delta))))} (or equivalently, he was able to establish length three progressions in any subset of {[N]} of density much larger than {\exp(-c\sqrt{\log \log N})} for some {c>0}). By carefully optimising the choice of threshold for selecting the “large Fourier coefficients”, Szemerédi (unpublished) and Heath-Brown independently improved this method further to obtain {N_0 \ll \exp(\delta^{-O(1)})}, or equivalently obtaining length three progressions in sets of density much larger than {\log^{-c} N}. (This result was extended to arbitrary finite abelian groups by Meshulam.)
The next advance was by Bourgain, who realised that rather than pass to short subprogressions, it was more efficient to work on the significantly larger (but messier) Bohr sets {\{ n: \alpha n \hbox{ mod } 1 \in I \}}, after ensuring that such Bohr sets were regular (this condition is closely related to the Fourier measurability condition used in the energy increment argument). With this modification to the original Roth argument, the bound was lowered to {N_0 \ll \delta^{-O(1/\delta^2)}}, or equivalently obtaining length three progressions in sets of density much larger than {\sqrt{\log\log N/\log N}}. Even more recently, this argument was combined with the Szemerédi-Heath-Brown argument by Bourgain (not yet published) to obtain the further improvement of {N_0 \ll \exp( O( \delta^{-3/2} \log^3( 1/\delta) ) )}, or equivalently obtaining length three progressions in sets of density much larger than {(\log \log N)^2 / \log^{2/3} N}. This is getting close to the {k=3} case of an old conjecture of Erdös that asserts that any subset of the natural numbers whose sums of reciprocals diverge should have infinitely many arithmetic progressions of length {k} for any {k}. To establish the {k=3} case from quantitative versions of Roth’s theorem, one would basically need a bound of the form {N_0 \ll \exp( \delta^{-1+c} )} for some {c>0} (or the ability to obtain progressions in sets of density {1/\log^{1+c} N}).
On the other hand, there is an old counterexample of Behrend (based ultimately on the observation that a sphere in a high-dimensional lattice {{\bf Z}^d} does not contain any arithmetic progressions of length three) which shows that {N_0} must be at least {\gg \exp( \log^2(1/\delta) )} (in particular, it must be super-polynomial in {\delta}); equivalently, it is known that there are subsets of {[N]} of density about {\exp(-c\sqrt{\log N})} with no arithmetic progressions of length three. For the sharpest results in this direction, see this paper of Elkin (and a later exposition of Green and Wolf).
The question of refining the bounds is an important one, as it tends to improve the technological understanding of current methods, as well as shed light on their relative strengths and weaknesses. However, this comes at the cost of making the arguments somewhat more technical, and so we shall not focus on the sharpest quantitative results in these notes.