You are currently browsing the tag archive for the ‘density increment argument’ tag.

The following result is due independently to Furstenberg and to Sarkozy:

Theorem 1 (Furstenberg-Sarkozy theorem) Let ${\delta > 0}$, and suppose that ${N}$ is sufficiently large depending on ${\delta}$. Then every subset ${A}$ of ${[N] := \{1,\ldots,N\}}$ of density ${|A|/N}$ at least ${\delta}$ contains a pair ${n, n+r^2}$ for some natural numbers ${n, r}$ with ${r \neq 0}$.

This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern ${n,n+r^2}$ is replaced by ${n,n+r,n+2r}$ or ${n,n+r,\ldots,n+(k-1)r}$ for some fixed ${k}$ respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.

A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift ${r^2}$. As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.

The first step is to use the density increment argument that goes back to Roth. For any ${\delta > 0}$, let ${P(\delta)}$ denote the assertion that for ${N}$ sufficiently large, all sets ${A \subset [N]}$ of density at least ${\delta}$ contain a pair ${n,n+r^2}$ with ${r}$ non-zero. Note that ${P(\delta)}$ is vacuously true for ${\delta > 1}$. We will show that for any ${0 < \delta_0 \leq 1}$, one has the implication

$\displaystyle P(\delta_0 + c \delta_0^3) \implies P(\delta_0) \ \ \ \ \ (1)$

for some absolute constant ${c>0}$. This implies that ${P(\delta)}$ is true for any ${\delta>0}$ (as can be seen by considering the infimum of all ${\delta>0}$ for which ${P(\delta)}$ holds), which gives Theorem 1.

It remains to establish the implication (1). Suppose for sake of contradiction that we can find ${0 < \delta_0 \leq 1}$ for which ${P(\delta_0+c\delta^3_0)}$ holds (for some sufficiently small absolute constant ${c>0}$), but ${P(\delta_0)}$ fails. Thus, we can find arbitrarily large ${N}$, and subsets ${A}$ of ${[N]}$ of density at least ${\delta_0}$, such that ${A}$ contains no patterns of the form ${n,n+r^2}$ with ${r}$ non-zero. In particular, we have

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) 1_A(n+(r+h)^2) = 0.$

(The exact ranges of ${r}$ and ${h}$ are not too important here, and could be replaced by various other small powers of ${N}$ if desired.)

Let ${\delta := |A|/N}$ be the density of ${A}$, so that ${\delta_0 \leq \delta \leq 1}$. Observe that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})$

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})$

and

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) 1_A(n+(r+h)^2) = \delta^2 + O( N^{-1/3} ).$

If we thus set ${f := 1_A - \delta 1_{[N]}}$, then

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} f(n) f(n+(r+h)^2) = -\delta^2 + O( N^{-1/3} ).$

In particular, for ${N}$ large enough,

$\displaystyle \mathop{\bf E}_{n \in [N]} |f(n)| \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)| \gg \delta^2.$

On the other hand, one easily sees that

$\displaystyle \mathop{\bf E}_{n \in [N]} |f(n)|^2 = O(\delta)$

and hence by the Cauchy-Schwarz inequality

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)|^2 \gg \delta^3$

which we can rearrange as

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n+(r+h)^2) f(n+(r+h')^2)| \gg \delta^3.$

Shifting ${n}$ by ${(r+h)^2}$ we obtain (again for ${N}$ large enough)

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3.$

In particular, by the pigeonhole principle (and deleting the diagonal case ${h=h'}$, which we can do for ${N}$ large enough) we can find distinct ${h,h' \in [N^{1/100}]}$ such that

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3,$

so in particular

$\displaystyle \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+(h'-h)(2r+h'+h))| \gg \delta^3.$

If we set ${d := 2(h'-h)}$ and shift ${n}$ by ${(h'-h) (h'+h)}$, we can simplify this (again for ${N}$ large enough) as

$\displaystyle \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr)| \gg \delta^3. \ \ \ \ \ (2)$

On the other hand, since

$\displaystyle \mathop{\bf E}_{n \in [N]} f(n) = 0$

we have

$\displaystyle \mathop{\bf E}_{n \in [N]} f(n+dr) = O( N^{-2/3+1/100})$

for any ${r \in [N^{1/3}]}$, and thus

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) = O( N^{-2/3+1/100}).$

Averaging this with (2) we conclude that

$\displaystyle \mathop{\bf E}_{n \in [N]} \max( \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr), 0 ) \gg \delta^3.$

In particular, by the pigeonhole principle we can find ${n \in [N]}$ such that

$\displaystyle \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) \gg \delta^3,$

or equivalently ${A}$ has density at least ${\delta+c'\delta^3}$ on the arithmetic progression ${\{ n+dr: r \in [N^{1/3}]\}}$, which has length ${\lfloor N^{1/3}\rfloor }$ and spacing ${d}$, for some absolute constant ${c'>0}$. By partitioning this progression into subprogressions of spacing ${d^2}$ and length ${\lfloor N^{1/4}\rfloor}$ (plus an error set of size ${O(N^{1/4})}$, we see from the pigeonhole principle that we can find a progression ${\{ n' + d^2 r': r' \in [N^{1/4}]\}}$ of length ${\lfloor N^{1/4}\rfloor}$ and spacing ${d^2}$ on which ${A}$ has density at least ${\delta + c\delta^3}$ (and hence at least ${\delta_0+c\delta_0^3}$) for some absolute constant ${c>0}$. If we then apply the induction hypothesis to the set

$\displaystyle A' := \{ r' \in [N^{1/4}]: n' + d^2 r' \in A \}$

we conclude (for ${N}$ large enough) that ${A'}$ contains a pair ${m, m+s^2}$ for some natural numbers ${m,s}$ with ${s}$ non-zero. This implies that ${(n'+d^2 m), (n'+d^2 m) + (|d|s)^2}$ lie in ${A}$, a contradiction, establishing the implication (1).

A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for ${N \geq 100}$ (say), any subset of ${[N]}$ of density at least ${C/(\log\log N)^{1/2}}$ for some sufficiently large absolute constant ${C}$ contains a pair ${n,n+r^2}$ with ${r}$ non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as ${C / (\log N)^{\frac{1}{4} \log\log\log\log N}}$. On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least ${C/(\log\log N)^{1/11}}$ (although the original argument of Sarkozy, which is a little more intricate, works up to ${C (\log\log N)^{2/3}/(\log N)^{1/3}}$). In the other direction, a construction of Rusza gives a set of density ${\frac{1}{65} N^{-0.267}}$ without any pairs ${n,n+r^2}$.

Remark 1 A similar argument also applies with ${n,n+r^2}$ replaced by ${n,n+r^k}$ for fixed ${k}$, because this sort of pattern is preserved by affine dilations ${r' \mapsto n'+d^k r'}$ into arithmetic progressions whose spacing ${d^k}$ is a ${k^{th}}$ power. By re-introducing Fourier analysis, one can also perform an argument of this type for ${n,n+d,n+2d}$ where ${d}$ is the sum of two squares; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form ${n,n+P(r)}$ for polynomials ${P}$ that consist of more than a single monomial (and with the normalisation ${P(0)=0}$, to avoid local obstructions), because one no longer has this preservation property.

We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth’s theorem) Let ${A}$ be a subset of the integers ${{\bf Z}}$ whose upper density

$\displaystyle \overline{\delta}(A) := \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}$

is positive. Then ${A}$ contains infinitely many arithmetic progressions ${a, a+r, a+2r}$ of length three, with ${a \in {\bf Z}}$ and ${r>0}$.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length ${k}$ for any ${k}$.

As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set ${A}$ (or more precisely, the indicator function ${1_A}$ of that set, or of pieces of that set) against linear characters ${n \mapsto e(\alpha n)}$ for various frequencies ${\alpha \in {\bf R}/{\bf Z}}$. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when ${A}$ is aligned up almost completely with one of these linear characters, for instance by being a Bohr set of the form

$\displaystyle \{ n \in {\bf Z}: \| \alpha n - \theta \|_{{\bf R}/{\bf Z}} < \epsilon \}$

or more generally of the form

$\displaystyle \{ n \in {\bf Z}: \alpha n \in U \}$

for some multi-dimensional frequency ${\alpha \in {\bf T}^d}$ and some open set ${U}$. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.

To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or ${L^\infty}$ increment argument) and the energy increment argument (or ${L^2}$ increment argument).

The idea behind the density increment argument is to introduce a dichotomy: either the object ${A}$ being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.

The idea behind the energy increment argument is instead to decompose the original object ${A}$ into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with ${A}$, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or ${L^2}$ norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.

In these notes we give both two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally in counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.