The following result is due independently to Furstenberg and to Sarkozy:

Theorem 1 (Furstenberg-Sarkozy theorem)Let , and suppose that is sufficiently large depending on . Then every subset of of density at least contains a pair for some natural numbers with .

This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern is replaced by or for some fixed respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.

A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift . As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.

The first step is to use the density increment argument that goes back to Roth. For any , let denote the assertion that for sufficiently large, all sets of density at least contain a pair with non-zero. Note that is vacuously true for . We will show that for any , one has the implication

for some absolute constant . This implies that is true for any (as can be seen by considering the infimum of all for which holds), which gives Theorem 1.

It remains to establish the implication (1). Suppose for sake of contradiction that we can find for which holds (for some sufficiently small absolute constant ), but fails. Thus, we can find arbitrarily large , and subsets of of density at least , such that contains no patterns of the form with non-zero. In particular, we have

(The exact ranges of and are not too important here, and could be replaced by various other small powers of if desired.)

Let be the density of , so that . Observe that

and

If we thus set , then

In particular, for large enough,

On the other hand, one easily sees that

and hence by the Cauchy-Schwarz inequality

which we can rearrange as

Shifting by we obtain (again for large enough)

In particular, by the pigeonhole principle (and deleting the diagonal case , which we can do for large enough) we can find distinct such that

so in particular

If we set and shift by , we can simplify this (again for large enough) as

we have

for any , and thus

Averaging this with (2) we conclude that

In particular, by the pigeonhole principle we can find such that

or equivalently has density at least on the arithmetic progression , which has length and spacing , for some absolute constant . By partitioning this progression into subprogressions of spacing and length (plus an error set of size , we see from the pigeonhole principle that we can find a progression of length and spacing on which has density at least (and hence at least ) for some absolute constant . If we then apply the induction hypothesis to the set

we conclude (for large enough) that contains a pair for some natural numbers with non-zero. This implies that lie in , a contradiction, establishing the implication (1).

A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for (say), any subset of of density at least for some sufficiently large absolute constant contains a pair with non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as . On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least (although the original argument of Sarkozy, which is a little more intricate, works up to ). In the other direction, a construction of Rusza gives a set of density without any pairs .

Remark 1A similar argument also applies with replaced by for fixed , because this sort of pattern is preserved by affine dilations into arithmetic progressions whose spacing is a power. By re-introducing Fourier analysis, one can also perform an argument of this type for where is the sum of two squares; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form for polynomials that consist of more than a single monomial (and with the normalisation , to avoid local obstructions), because one no longer has this preservation property.

## 22 comments

Comments feed for this article

28 February, 2013 at 11:16 pm

HosseinWOW!

1 March, 2013 at 1:30 am

anonTypo: “Suppose for sake of contradiction that … holds” – should this be ?

[Corrected, thanks. Actually I found just now that one could actually get back the gain in density by being a bit more careful with the Cauchy-Schwarz, and updated the post accordingly. -T]1 March, 2013 at 4:09 am

MichaelIn the three equations before equation (2) I think r+h+h’ should be 2r+h+h’

[Corrected, thanks – T.]1 March, 2013 at 6:32 am

AnonymousThank you for the nice post! Actually, through shifting by (r+h)^2 one obtains 2r instead of r, and later 2rd but this is a minor thing.

1 March, 2013 at 6:38 am

Gil KalaiTerry, What are the upper and lower bounds on the required density in this case?

[Thanks for the question! I just added a link to the best known lower bound, by Ruzsa, at the end of this post. The best known upper bound is due to Pintz, Steiger, and Szemeredi and is also given at the end of the post. -T.]2 March, 2013 at 3:03 am

Ben GreenTerry, I don’t think my arguments from 2002 give , , – in the paper you refer to, the common difference has to be a sum of two squares, a much weaker result. Also, the original Fourier -analytic argument of S\’ark\”ozy, which is not too complicated, gives a bound of . The idea of this argument is to make sure your large Fourier coefficient is near a low-complexity rational, so you pass to a subprogression with small common difference.

[Corrected, thanks – T.]2 March, 2013 at 10:45 am

Gil KalaiThanks for the lower bound, Terry, That’s quite a gap :)

I have another question: For Roth we have 2 finite alphabet analogs: The first is DHJ(3) (the density Hales Jewett for 3-letters alphabet) and the second is the cupset problem (a set of (0,1,n}^n without an affine line). (Of course, there are intermediate problems, etc.)

For Furstenberg-Sarkozy we have a DHJ analog (which is the first case of a polynomial DHJ that Tim proposed to study; it is over a 2-letters alphabet, the ground set has the structure of edges of a complete graph, and asks for a Sperner theorem where the difference correspond to a complete graph).

http://gowers.wordpress.com/2009/11/14/the-first-unknown-case-of-polynomial-dhj/

My question is about an analog analogous to the the cupset analog. Maybe you have to exclude a difference is in a quadric? Do you know?

(If the forbidden things will includes those of the polynomial DHJ this would be nice but not necessary)

2 March, 2013 at 11:48 am

Terence TaoWell, the most direct capset analogue of the Furstenberg-Sarkozy theorem is that every dense subset of the polynomial ring contains a pair of polynomials of the form for non-zero. Here density is in the sense that the density in the space of polynomials of degree less than (which is isomorphic as a vector space to ) is bounded from below as . Pretty much any of the known proofs of the Furstenberg-Sarkozy theorem will adapt to this finite field setting (and in fact the arguments will usually become a little bit simpler) But this is not a purely alphabetical statement, because the operation of squaring a polynomial becomes quite complicated if one attempts to interpret it purely in terms of combinatorial operations on the coefficients (it is convolution rather than pointwise addition).

2 March, 2013 at 11:58 am

Gil KalaiThat’s fine, but why F_3 and not F_2?

2 March, 2013 at 4:18 pm

Terence TaoThe squaring function behaves quite differently in characteristic 2 as compared to odd (or zero) characteristic; in particular, in this characteristic the squaring function becomes linear (), and the result here collapses to something more analogous to the Ajtai-Szemeredi corners theorem than the Furstenberg-Sarkozy theorem.

14 March, 2013 at 1:19 pm

Gil KalaiRegarding as representing polynomials in one variable of degree less than n is very natural, but we can have other versions, e.g., polynomials of degree 2 in some m variables (or “bipartite” polynomials of degree 2) , which may make sense over as well. In particular, it will be nice to have cupset versions related to versions of polynomial HJT that Tim Gowers considered here http://gowers.wordpress.com/2009/11/14/the-first-unknown-case-of-polynomial-dhj/ .

3 March, 2013 at 1:33 pm

Neil LyallTerry, thanks for the very nice post!

Just wanted to point out that I believe the argument you present can be adapted in a standard way, using Lucier’s increment strategy (with the polynomials changing with each iteration), to obtain the analogous result for full class of so-called intersective polynomials. A nice exposition of Lucier’s iterative procedure can be found at the beginning of Chapter 6 (pp 40-42) of Alex Rice’s Ph.D. thesis, see http://www.math.uga.edu/~arice/AlexThesis.pdf

A quick calculation seems to give a bound of the order

for intersective polynomials of degree .

10 March, 2013 at 6:45 am

JosephThank you for one more great post. You work inspire me, Dr. Tao.

Can anyone help me with a very basic question? How does (1) guarantee that the infimus of the set of all \delta>0 for which P(\delta) holds is zero? And how does analyzing this leads to the fact that P holds for all \delta>0?

Sorry for the basic question, but I got stuck.

10 March, 2013 at 10:22 am

anonDefine the set . If then for any we have , simply because the theorem is stronger for smaller . So the set is some interval (closed or open ) and trivially it contains 1. Let’s call infimum of . may or may not be in but anything greater than it definitely is in .

If , then we are done, since this means contains any positive value, which proves the theorem. So let’s assume and derive a contradiction.

Let . Note that is (strictly) increasing and continuous. From (1) we have . Equivalently, .

Take any number . Since we have , therefore . But since we have . So we have a number which is in and smaller than , contradiction.

10 March, 2013 at 12:14 pm

JosephDear anon,

Thank you for your nice explanation!

I now got the idea.

Sincerely, Joseph

11 March, 2013 at 6:20 am

JosephDear anon,

Thank you again. But just a quick question: as we define f(x) = x+cx^3, can we consider the same value of c for all x? I mean, the beginning of Tao’s proof says that for any 0<delta00. Can we then fix c when defining f(x) (for example, by taking a c that satisfies (1) for any delta, if such one exists)?

Otherwise, if we state f(x) = x+c(x)x^3, with c now a function, can we still guarantee that f is invertible?

Sorry if the question is too basic… This is a new field for me.

Regards,

Joseph.

[It appears that you used some < and > signs in your answer that became interpreted as HTML and thus disappeared. If you use < and > instead then this should avoid the problem. In any case, is indeed the fixed absolute constant from (1). -T.]13 March, 2013 at 6:45 pm

JosephNever mind, I got it… Once you assume that the infimum is delta0 > 0, you define f(x) = x+cx^3, with c the value that satisfies (1) for that particular delta0 value (so, yes, a unique value for c). Than f is of course increasing and continuous, and everything else follows… I was thinking in a different way, but your explanation was perfect. Thanks!

23 March, 2013 at 7:50 am

solver6Hello, i don’t understand wy holds. Can someone help me.

23 March, 2013 at 9:21 am

Terence TaoFor all but about of the choices of n (the choices that are closest to the boundary of , the inner average is equal to .

19 April, 2013 at 6:32 am

AnonymousHi, I don’t understand the move from the inequality depending only on $h,n,r$ to the one depending also on $h’$.

Any help?

19 April, 2013 at 8:21 am

Terence Taowhenever is real-valued.

21 April, 2013 at 9:33 am

AnonymousMaybe the left side should be squared?

[Corrected, thanks – T.],