The following result is due independently to Furstenberg and to Sarkozy:

Theorem 1 (Furstenberg-Sarkozy theorem) Let {\delta > 0}, and suppose that {N} is sufficiently large depending on {\delta}. Then every subset {A} of {[N] := \{1,\ldots,N\}} of density {|A|/N} at least {\delta} contains a pair {n, n+r^2} for some natural numbers {n, r} with {r \neq 0}.

This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern {n,n+r^2} is replaced by {n,n+r,n+2r} or {n,n+r,\ldots,n+(k-1)r} for some fixed {k} respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.

A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift {r^2}. As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.

The first step is to use the density increment argument that goes back to Roth. For any {\delta > 0}, let {P(\delta)} denote the assertion that for {N} sufficiently large, all sets {A \subset [N]} of density at least {\delta} contain a pair {n,n+r^2} with {r} non-zero. Note that {P(\delta)} is vacuously true for {\delta > 1}. We will show that for any {0 < \delta_0 \leq 1}, one has the implication

\displaystyle  P(\delta_0 + c \delta_0^3) \implies P(\delta_0) \ \ \ \ \ (1)

for some absolute constant {c>0}. This implies that {P(\delta)} is true for any {\delta>0} (as can be seen by considering the infimum of all {\delta>0} for which {P(\delta)} holds), which gives Theorem 1.

It remains to establish the implication (1). Suppose for sake of contradiction that we can find {0 < \delta_0 \leq 1} for which {P(\delta_0+c\delta^3_0)} holds (for some sufficiently small absolute constant {c>0}), but {P(\delta_0)} fails. Thus, we can find arbitrarily large {N}, and subsets {A} of {[N]} of density at least {\delta_0}, such that {A} contains no patterns of the form {n,n+r^2} with {r} non-zero. In particular, we have

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) 1_A(n+(r+h)^2) = 0.

(The exact ranges of {r} and {h} are not too important here, and could be replaced by various other small powers of {N} if desired.)

Let {\delta := |A|/N} be the density of {A}, so that {\delta_0 \leq \delta \leq 1}. Observe that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})

and

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) 1_A(n+(r+h)^2) = \delta^2 + O( N^{-1/3} ).

If we thus set {f := 1_A - \delta 1_{[N]}}, then

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} f(n) f(n+(r+h)^2) = -\delta^2 + O( N^{-1/3} ).

In particular, for {N} large enough,

\displaystyle  \mathop{\bf E}_{n \in [N]} |f(n)| \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)| \gg \delta^2.

On the other hand, one easily sees that

\displaystyle  \mathop{\bf E}_{n \in [N]} |f(n)|^2 = O(\delta)

and hence by the Cauchy-Schwarz inequality

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)|^2 \gg \delta^3

which we can rearrange as

\displaystyle  |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n+(r+h)^2) f(n+(r+h')^2)| \gg \delta^3.

Shifting {n} by {(r+h)^2} we obtain (again for {N} large enough)

\displaystyle  |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3.

In particular, by the pigeonhole principle (and deleting the diagonal case {h=h'}, which we can do for {N} large enough) we can find distinct {h,h' \in [N^{1/100}]} such that

\displaystyle  |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3,

so in particular

\displaystyle  \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+(h'-h)(2r+h'+h))| \gg \delta^3.

If we set {d := 2(h'-h)} and shift {n} by {(h'-h) (h'+h)}, we can simplify this (again for {N} large enough) as

\displaystyle  \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr)| \gg \delta^3. \ \ \ \ \ (2)

On the other hand, since

\displaystyle  \mathop{\bf E}_{n \in [N]} f(n) = 0

we have

\displaystyle  \mathop{\bf E}_{n \in [N]} f(n+dr) = O( N^{-2/3+1/100})

for any {r \in [N^{1/3}]}, and thus

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) = O( N^{-2/3+1/100}).

Averaging this with (2) we conclude that

\displaystyle  \mathop{\bf E}_{n \in [N]} \max( \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr), 0 ) \gg \delta^3.

In particular, by the pigeonhole principle we can find {n \in [N]} such that

\displaystyle  \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) \gg \delta^3,

or equivalently {A} has density at least {\delta+c'\delta^3} on the arithmetic progression {\{ n+dr: r \in [N^{1/3}]\}}, which has length {\lfloor N^{1/3}\rfloor } and spacing {d}, for some absolute constant {c'>0}. By partitioning this progression into subprogressions of spacing {d^2} and length {\lfloor N^{1/4}\rfloor} (plus an error set of size {O(N^{1/4})}, we see from the pigeonhole principle that we can find a progression {\{ n' + d^2 r': r' \in [N^{1/4}]\}} of length {\lfloor N^{1/4}\rfloor} and spacing {d^2} on which {A} has density at least {\delta + c\delta^3} (and hence at least {\delta_0+c\delta_0^3}) for some absolute constant {c>0}. If we then apply the induction hypothesis to the set

\displaystyle  A' := \{ r' \in [N^{1/4}]: n' + d^2 r' \in A \}

we conclude (for {N} large enough) that {A'} contains a pair {m, m+s^2} for some natural numbers {m,s} with {s} non-zero. This implies that {(n'+d^2 m), (n'+d^2 m) + (|d|s)^2} lie in {A}, a contradiction, establishing the implication (1).

A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for {N \geq 100} (say), any subset of {[N]} of density at least {C/(\log\log N)^{1/2}} for some sufficiently large absolute constant {C} contains a pair {n,n+r^2} with {r} non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as {C / (\log N)^{\frac{1}{4} \log\log\log\log N}}. On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least {C/(\log\log N)^{1/11}} (although the original argument of Sarkozy, which is a little more intricate, works up to {C (\log\log N)^{2/3}/(\log N)^{1/3}}). In the other direction, a construction of Rusza gives a set of density {\frac{1}{65} N^{-0.267}} without any pairs {n,n+r^2}.

Remark 1 A similar argument also applies with {n,n+r^2} replaced by {n,n+r^k} for fixed {k}, because this sort of pattern is preserved by affine dilations {r' \mapsto n'+d^k r'} into arithmetic progressions whose spacing {d^k} is a {k^{th}} power. By re-introducing Fourier analysis, one can also perform an argument of this type for {n,n+d,n+2d} where {d} is the sum of two squares; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form {n,n+P(r)} for polynomials {P} that consist of more than a single monomial (and with the normalisation {P(0)=0}, to avoid local obstructions), because one no longer has this preservation property.