In the modern theory of additive combinatorics, a large role is played by the Gowers uniformity norms $\|f\|_{U^k(G)}$, where $k \geq 1$, $G = (G,+)$ is a finite abelian group, and $f: G \to \mathbb{C}$ is a function (one can also consider these norms in finite approximate groups such as $\{1,\dots,N\}$ instead of finite groups, but we will focus on the group case here for simplicity). These norms can be defined by the formula

$\displaystyle \|f\|_{U^k(G)} := \left( \mathbb{E}_{x, h_1, \dots, h_k \in G} \Delta_{h_1} \cdots \Delta_{h_k} f(x) \right)^{1/2^k},$

where we use the averaging notation

$\displaystyle \mathbb{E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)$

for any non-empty finite set $A$ (with $|A|$ denoting the cardinality of $A$), and $\Delta_h$ is the multiplicative discrete derivative operator

$\displaystyle \Delta_h f(x) := f(x+h) \overline{f(x)}.$
One reason why these norms play an important role is that they control various multilinear averages. We give two sample examples here:
We establish these claims a little later in this post.
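To make the definitions above concrete, here is a small numerical sketch (illustrative, and not tied to any computation in the post): it evaluates the $U^2$ norm of a function on $\mathbb{Z}/N\mathbb{Z}$ directly from the averaged-derivative formula, and checks it against the standard Fourier identity $\|f\|_{U^2(G)}^4 = \sum_\xi |\hat f(\xi)|^4$. The modulus $N$ and the random test function are arbitrary choices.

```python
import numpy as np

def gowers_u2(f):
    """U^2 norm of f : Z_N -> R via the derivative formula
    ||f||_{U^2}^4 = E_{x,h1,h2} Delta_{h1} Delta_{h2} f(x),
    where Delta_h f(x) = f(x+h) * conj(f(x))."""
    N = len(f)
    total = 0.0
    for h1 in range(N):
        for h2 in range(N):
            for x in range(N):
                total += (f[(x + h1 + h2) % N] * np.conj(f[(x + h1) % N])
                          * np.conj(f[(x + h2) % N]) * f[x])
    return (total.real / N ** 3) ** 0.25

def u2_via_fourier(f):
    """Independent check: ||f||_{U^2}^4 = sum_xi |fhat(xi)|^4,
    with fhat(xi) = E_x f(x) e(-2 pi i xi x / N)."""
    fhat = np.fft.fft(f) / len(f)
    return np.sum(np.abs(fhat) ** 4) ** 0.25

rng = np.random.default_rng(0)
f = rng.uniform(-1, 1, size=7)   # a random 1-bounded test function on Z_7
print(gowers_u2(f), u2_via_fourier(f))   # the two values agree
```

Note in particular that a $1$-bounded function always has $U^2$ norm at most $1$, consistent with the averaged product of four $1$-bounded factors.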
In some more recent literature (e.g., this paper of Conlon, Fox, and Zhao), the role of Gowers norms has been replaced by (generalisations of) the cut norm, a concept originating from graph theory. In this blog post, it will be convenient to define these cut norms in the language of probability theory (using boldface to denote random variables).
Definition 2 (Cut norm) Let $\mathbf{x}_1, \dots, \mathbf{x}_m, \mathbf{y}_1, \dots, \mathbf{y}_n$ be independent random variables with $m, n \geq 0$; to avoid minor technicalities we assume that these random variables are discrete and take values in a finite set. Given a random variable $F = F(\mathbf{x}_1,\dots,\mathbf{x}_m,\mathbf{y}_1,\dots,\mathbf{y}_n)$ of these independent random variables, we define the cut norm

$\displaystyle \|F\|_{\mathrm{CUT}(\mathbf{x}_1,\dots,\mathbf{x}_m; \mathbf{y}_1,\dots,\mathbf{y}_n)} := \sup \left| \mathbb{E}\, F \prod_{i=1}^m B_i \right|,$

where the supremum ranges over all choices $B_1, \dots, B_m$ of random variables $B_i = B_i(\mathbf{x}_1,\dots,\mathbf{x}_m,\mathbf{y}_1,\dots,\mathbf{y}_n)$ that are $1$-bounded (thus $|B_i| \leq 1$ surely), and such that each $B_i$ does not depend on $\mathbf{x}_i$.

If $n = 0$, we abbreviate $\|F\|_{\mathrm{CUT}(\mathbf{x}_1,\dots,\mathbf{x}_m;)}$ as $\|F\|_{\mathrm{CUT}(\mathbf{x}_1,\dots,\mathbf{x}_m)}$.

Strictly speaking, the cut norm is only a cut semi-norm in general, but we will abuse notation by referring to it as a norm nevertheless.
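To make Definition 2 concrete in the case of two uniform variables and no spectator variables (the matrix case), note that for a real-valued $F$ the expectation is linear in each multiplier separately, so the supremum may be restricted to $\{-1,+1\}$-valued multipliers and tiny examples can be brute-forced. The matrix below is an arbitrary illustrative choice.

```python
import itertools
import numpy as np

def cut_norm(F):
    """Cut norm of F(x, y) with x, y independent and uniform on the
    rows/columns of F: the sup over 1-bounded multipliers b(y), c(x)
    of |E[F(x, y) b(y) c(x)]|.  For real F the supremum is attained
    at {-1,+1}-valued multipliers, so tiny cases can be brute-forced."""
    n, m = F.shape
    best = 0.0
    for b in itertools.product([-1, 1], repeat=m):
        for c in itertools.product([-1, 1], repeat=n):
            val = abs(np.asarray(c) @ F @ np.asarray(b)) / (n * m)
            best = max(best, val)
    return best

F = np.array([[1.0, -1.0, 0.5],
              [-0.5, 1.0, -1.0]])
print(cut_norm(F))
```

On such examples one can also check the trivial bounds $|\mathbb{E} F| \leq \|F\|_{\mathrm{CUT}} \leq \mathbb{E}|F|$.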
Example 3 If $G = (V_1, V_2, E)$ is a bipartite graph, and $\mathbf{x}$, $\mathbf{y}$ are independent random variables chosen uniformly from $V_1, V_2$ respectively, then

$\displaystyle \| 1_E(\mathbf{x}, \mathbf{y}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{y})} = \sup \left| \mathbb{E}_{x \in V_1, y \in V_2} 1_E(x,y) b(x) c(y) \right|,$

where the supremum ranges over all $1$-bounded functions $b: V_1 \to \mathbb{C}$, $c: V_2 \to \mathbb{C}$. The right hand side is essentially the cut norm of the graph $G$, as defined for instance by Frieze and Kannan.
The cut norm is basically an expectation when $m \leq 1$:
Example 4 If $m = 0$, we see from the definition that

$\displaystyle \|F\|_{\mathrm{CUT}(; \mathbf{y}_1,\dots,\mathbf{y}_n)} = |\mathbb{E} F|.$

If $m = 1$, one easily checks that

$\displaystyle \|F\|_{\mathrm{CUT}(\mathbf{x}_1; \mathbf{y}_1,\dots,\mathbf{y}_n)} = \mathbb{E} \left| \mathbb{E}(F \mid \mathbf{y}_1,\dots,\mathbf{y}_n) \right|,$

where $\mathbb{E}(F \mid \mathbf{y}_1,\dots,\mathbf{y}_n)$ is the conditional expectation of $F$ to the $\sigma$-algebra generated by all the variables other than $\mathbf{x}_1$, i.e., the $\sigma$-algebra generated by $\mathbf{y}_1,\dots,\mathbf{y}_n$. In particular, if $\mathbf{x}, \mathbf{y}$ are independent random variables drawn uniformly from finite sets $V_1, V_2$ respectively, then

$\displaystyle \|F(\mathbf{x},\mathbf{y})\|_{\mathrm{CUT}(\mathbf{x}; \mathbf{y})} = \mathbb{E}_{y \in V_2} \left| \mathbb{E}_{x \in V_1} F(x,y) \right|.$
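The single-multiplier computation in Example 4 can be verified numerically in a simple discrete model: the multiplier paired with $\mathbf{x}$ may not depend on $\mathbf{x}$, and the supremum is attained by taking it to be the sign of the conditional expectation. The sample spaces and the random $F$ below are illustrative choices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
F = rng.uniform(-1, 1, size=(4, 3))   # F(x, y); x, y independent and uniform

# Direct supremum over {-1,+1}-valued multipliers b(y): the multiplier
# paired with x may not depend on x, hence depends only on y.
brute = max(abs(np.mean(F * np.asarray(b)))
            for b in itertools.product([-1, 1], repeat=3))

# The conditional-expectation formula: E_y | E_x F(x, y) |.
formula = np.mean(np.abs(F.mean(axis=0)))
print(brute, formula)   # the two values agree
```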
Here are some basic properties of the cut norm:
Lemma 5 (Basic properties of cut norm) Let $\mathbf{x}_1,\dots,\mathbf{x}_m,\mathbf{y}_1,\dots,\mathbf{y}_n$ be independent discrete random variables, and let $F = F(\mathbf{x}_1,\dots,\mathbf{x}_m,\mathbf{y}_1,\dots,\mathbf{y}_n)$ be a function of these variables.
- (i) (Permutation invariance) The cut norm $\|F\|_{\mathrm{CUT}(\mathbf{x}_1,\dots,\mathbf{x}_m;\mathbf{y}_1,\dots,\mathbf{y}_n)}$ is invariant with respect to permutations of the $\mathbf{x}_1,\dots,\mathbf{x}_m$, or permutations of the $\mathbf{y}_1,\dots,\mathbf{y}_n$.
- (ii) (Conditioning) One has
where on the right-hand side we view, for each realisation
of
,
as a function
of the random variables
alone, thus the right-hand side may be expanded as
- (iii) (Monotonicity) If
, we have
- (iv) (Multiplicative invariances) If
is a
-bounded function that does not depend on one of the
, then
In particular, if we additionally assume
, then
- (v) (Cauchy-Schwarz) If
, one has
where
is a copy of
that is independent of
and
is the random variable
- (vi) (Averaging) If
and
, where
is another random variable independent of
, and
is a random variable depending on both
and
, then
Proof: The claims (i), (ii) are clear from expanding out all the definitions. The claim (iii) also easily follows from the definitions (the left-hand side involves a supremum over a more general class of multipliers , while the right-hand side omits the
multiplier), as does (iv) (the multiplier
can be absorbed into one of the multipliers in the definition of the cut norm). The claim (vi) follows by expanding out the definitions, and observing that all of the terms in the supremum appearing in the left-hand side also appear as terms in the supremum on the right-hand side. It remains to prove (v). By definition, the left-hand side is the supremum over all quantities of the form
where the are
-bounded functions of
that do not depend on
. We average out in the
direction (that is, we condition out the variables
), and pull out the factor
(which does not depend on
), to write this as
which by Cauchy-Schwarz is bounded by
which can be expanded using the copy as
Expanding
and noting that each is
-bounded and independent of
for
, we obtain the claim.
Now we can relate the cut norm to Gowers uniformity norms:
Lemma 6 Let
be a finite abelian group, let
be independent random variables uniformly drawn from
for some
, and let
. Then
If
is additionally assumed to be
-bounded, we have the converse inequalities
Proof: Applying Lemma 5(v) times, we can bound
where are independent copies of
that are also independent of
. The expression inside the norm can also be written as
so by Example 4 one can write (6) as
which after some change of variables simplifies to
which by Cauchy-Schwarz is bounded by
which one can rearrange as
giving (2). A similar argument bounds
by
which gives (3).
For (4), we can reverse the above steps and expand as
which we can write as
for some -bounded function
. This can in turn be expanded as
for some -bounded functions
that do not depend on
. By Example 4, this can be written as
which by several applications of Lemma 5(iii) and then Lemma 5(iv) can be bounded by
giving (4). A similar argument gives (5).
Now we can prove Proposition 1. We begin with part (i). By permutation we may assume , then by translation we may assume
. Replacing
by
and
by
, we can write the left-hand side of (1) as
where
is a -bounded function that does not depend on
. Taking
to be independent random variables drawn uniformly from
, the left-hand side of (1) can then be written as
which by Example 4 is bounded in magnitude by
After many applications of Lemma 5(iii), (iv), this is bounded by
By Lemma 5(ii) we may drop the variable, and then the claim follows from Lemma 6.
For part (ii), we replace by
and
by
to write the left-hand side as
the point here is that the first factor does not involve , the second factor does not involve
, and the third factor has no quadratic terms in
. Letting
be independent variables drawn uniformly from
, we can use Example 4 to bound this in magnitude by
which by Lemma 5(i),(iii),(iv) is bounded by
and then by Lemma 5(v) we may bound this by
which by Example 4 is
Now the expression inside the expectation is the product of four factors, each of which is or
applied to an affine form
where
depends on
and
is one of
,
,
,
. With probability
, the four different values of
are distinct, and then by part (i) we have
When they are not distinct, we can instead bound this quantity by . Taking expectations in
, we obtain the claim.
The analogue of the inverse theorem for cut norms is the following claim (which I learned from Ben Green):
Lemma 7 (
-type inverse theorem) Let
be independent random variables drawn from a finite abelian group
, and let
be
-bounded. Then we have
where
is the group of homomorphisms
is a homomorphism from
to
, and
.
Proof: Suppose first that for some
, then by definition
for some -bounded
. By Fourier expansion, the left-hand side is also
where . From Plancherel’s theorem we have
hence by Hölder’s inequality one has for some
, and hence
Conversely, suppose (7) holds. Then there is such that
which on substitution and Example 4 implies
The term splits into the product of a factor
not depending on
, and a factor
not depending on
. Applying Lemma 5(iii), (iv) we conclude that
The claim follows.
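The Plancherel and Hölder steps used in the proof of Lemma 7 can be sanity-checked numerically in the model case $G = \mathbb{Z}/N\mathbb{Z}$; the choice of $N$ and the random $1$-bounded function below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
f = np.exp(2j * np.pi * rng.uniform(size=N))   # a 1-bounded function on Z_N
fhat = np.fft.fft(f) / N   # fhat(xi) = E_x f(x) e(-2 pi i xi x / N)

# Plancherel: sum_xi |fhat(xi)|^2 = E_x |f(x)|^2 (here equal to 1).
plancherel_lhs = np.sum(np.abs(fhat) ** 2)
plancherel_rhs = np.mean(np.abs(f) ** 2)

# The Hoelder step: sum |fhat|^3 <= (sup |fhat|) * sum |fhat|^2 <= sup |fhat|.
holder_lhs = np.sum(np.abs(fhat) ** 3)
holder_rhs = np.max(np.abs(fhat))
print(plancherel_lhs, plancherel_rhs, holder_lhs, holder_rhs)
```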
The higher order inverse theorems are much less trivial (and the optimal quantitative bounds are not currently known). However, there is a useful degree lowering argument, due to Peluse and Prendiville, that can allow one to lower the order of a uniformity norm in some cases. We give a simple version of this argument here:
Lemma 8 (Degree lowering argument, special case) Let
be a finite abelian group, let
be a non-empty finite set, and let
be a function of the form
for some
-bounded functions
indexed by
. Suppose that
for some
and
. Then one of the following claims hold (with implied constants allowed to depend on
):
- (i) (Degree lowering) one has
.
- (ii) (Non-zero frequency) There exist
and non-zero
such that
There are more sophisticated versions of this argument in which the frequency is “minor arc” rather than “zero frequency”, and then the Gowers norms are localised to suitable large arithmetic progressions; this is implicit in the above-mentioned paper of Peluse and Prendiville.
Proof: One can write
and hence we conclude that
for a set of tuples
of density
. Applying Lemma 6 and Lemma 7, we see that for each such tuple, there exists
such that
where is drawn uniformly from
.
Let us adopt the convention that vanishes for
not in
, then from Lemma 5(ii) we have
where are independent random variables drawn uniformly from
and also independent of
. By repeated application of Lemma 5(iii) we then have
Expanding out and using Lemma 5(iv) repeatedly we conclude that
From definition of we then have
By Lemma 5(vi), we see that the left-hand side is less than
where is drawn uniformly from
, independently of
. By repeated application of Lemma 5(i), (v), we conclude that
where are independent copies of
that are also independent of
,
. By Lemma 5(ii) and Example 4 we conclude that
with probability .
The left-hand side can be rewritten as
where is the additive version of
, thus
Translating , we can simplify this a little to
If the frequency is ever non-vanishing in the event (9) then conclusion (ii) applies. We conclude that
with probability . In particular, by the pigeonhole principle, there exist
such that
with probability . Expanding this out, we obtain a representation of the form
holding with probability , where the
are functions that do not depend on the
coordinate. From (8) we conclude that
for of the tuples
. Thus by Lemma 5(ii)
By repeated application of Lemma 5(iii) we then have
and then by repeated application of Lemma 5(iv)
and then the conclusion (i) follows from Lemma 6.
As an application of degree lowering, we give an inverse theorem for the average in Proposition 1(ii), first established by Bourgain-Chang and later reproved by Peluse (by different methods from those given here):
Proposition 9 Let
be a cyclic group of prime order. Suppose that one has
-bounded functions
such that
for some
. Then either
, or one has
We remark that a modification of the arguments below also gives
Proof: The left-hand side of (10) can be written as
where is the dual function
By Cauchy-Schwarz one thus has
and hence by Proposition 1, we either have (in which case we are done) or
Writing with
, we conclude that either
, or that
for some and non-zero
. The left-hand side can be rewritten as
where and
. We can rewrite this in turn as
which is bounded by
where are independent random variables drawn uniformly from
. Applying Lemma 5(v), we conclude that
However, a routine Gauss sum calculation reveals that the left-hand side is for some absolute constant
because
is non-zero, so that
. The only remaining case to consider is when
Repeating the above arguments we then conclude that
and then
The left-hand side can be computed to equal , and the claim follows.
This argument was given in the cyclic group setting, but it can also be applied to the integers (see Peluse-Prendiville) and can be used to establish an analogue over the reals (first obtained by Bourgain).
Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions. Roth’s theorem is the special case when one considers arithmetic progressions of length three. Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory. However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem. It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.
Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better; however, my rewrite ended up being about the same length as the original argument and was probably only usable by myself. In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention of writing up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post rather than trying to rewrite the whole thing. In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but I was unable to achieve any substantial simplifications by passing to the nonstandard world.
A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfan Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof. We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint. There are now a number of simplifications to the proof. Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required. Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi. Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.
The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup. Roth’s theorem seeks to locate a length three progression in which all three elements lie in a single set. This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements
of the progression lie in a good set (and some other properties of the family are also required). This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.
More specifically, Roth’s theorem is now deduced from
Theorem 1.5. Let
be a natural number, and let
be a set of integers of upper density at least
. Then, whenever
is partitioned into finitely many colour classes, there exists a colour class
and a family
of 3-term arithmetic progressions with the following properties:
- For each
,
and
lie in
.
- For each
,
lie in
.
- The
for
are in arithmetic progression.
The situation in this theorem is depicted by the following diagram, in which elements of are in blue and elements of
are in grey:
Theorem 1.5 is deduced in turn from the following easier variant:
Theorem 1.6. Let
be a natural number, and let
be a set of integers of upper density at least
. Then, whenever
is partitioned into finitely many colour classes, there exists a colour class
and a family
of 3-term arithmetic progressions with the following properties:
- For each
,
lie in
.
- For each
,
and
lie in
.
- The
for
are in arithmetic progression.
The situation here is described by the figure below.
Theorem 1.6 is easy to prove. To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details. (I would be interested, though, in seeing a shorter proof of Theorem 1.5 that did not go through these arguments and did not use the more powerful theorems of Roth or Szemerédi.)
Roth’s theorem on arithmetic progressions asserts that every subset of the integers of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:
Theorem 1 (Roth’s theorem) Let $G = (G,+)$ be a compact abelian group, with Haar probability measure $\mu$, which is $2$-divisible (i.e. the map $x \mapsto 2x$ is surjective), and let $A$ be a measurable subset of $G$ with $\mu(A) \geq \delta$ for some $0 < \delta < 1$. Then we have

$\displaystyle \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(r) d\mu(x) \gg_\delta 1, \qquad (1)$

where $\gg_\delta 1$ denotes the bound $\geq c(\delta)$ for some $c(\delta) > 0$ depending only on $\delta$.
This theorem is usually formulated in the case that $G$ is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group $G = \mathbb{Z}/N\mathbb{Z}$ of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of $2$-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant $c(\delta)$ on $\delta$, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the $2$-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the $2r$ shift in that case.
We can deduce Theorem 1 from the following more general Khintchine-type statement. Let $\hat G$ denote the Pontryagin dual of a compact abelian group $G$, that is to say the set of all continuous homomorphisms $\xi: x \mapsto \xi \cdot x$ from $G$ to the (additive) unit circle $\mathbb{R}/\mathbb{Z}$. Thus $\hat G$ is a discrete abelian group, and functions $f \in L^2(G)$ have a Fourier transform $\hat f: \hat G \to \mathbb{C}$ defined by

$\displaystyle \hat f(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\ d\mu(x).$

If $G$ is $2$-divisible, then $\hat G$ is $2$-torsion-free in the sense that the map $\xi \mapsto 2\xi$ is injective. For any finite set $S \subset \hat G$ and any radius $\rho > 0$, define the Bohr set

$\displaystyle B(S,\rho) := \left\{ x \in G : \sup_{\xi \in S} \|\xi \cdot x\|_{\mathbb{R}/\mathbb{Z}} < \rho \right\},$

where $\|\theta\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance of $\theta$ to the nearest integer. We refer to the cardinality $|S|$ of $S$ as the rank of the Bohr set. We record a simple volume bound on Bohr sets:
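In the model case $G = \mathbb{Z}/N\mathbb{Z}$ (where $\hat G$ can be identified with $\mathbb{Z}/N\mathbb{Z}$, with $\xi \cdot x$ equal to $\xi x / N$ mod $1$), Bohr sets can be enumerated directly. The sketch below uses illustrative parameters of my own choosing.

```python
import numpy as np

def bohr_set(N, S, rho):
    """B(S, rho) = {x in Z_N : ||xi * x / N|| < rho for all xi in S},
    where ||t|| denotes the distance from t to the nearest integer."""
    xs = np.arange(N)
    members = np.ones(N, dtype=bool)
    for xi in S:
        t = (xi * xs / N) % 1.0
        members &= np.minimum(t, 1.0 - t) < rho
    return xs[members]

N, S, rho = 101, [3, 17], 0.2
B = bohr_set(N, S, rho)
print(len(B), len(B) / N)   # the rank of this Bohr set is |S| = 2
```

Note that $0$ always lies in a Bohr set, and that the density of the set stays bounded below in terms of the rank and radius only, in the spirit of the volume bound recorded next.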
Lemma 2 (Volume packing bound) Let $G$ be a compact abelian group with Haar probability measure $\mu$. For any Bohr set $B(S,\rho)$, we have

$\displaystyle \mu(B(S,\rho)) \gg_{|S|,\rho} 1.$

Proof: We can cover the torus $(\mathbb{R}/\mathbb{Z})^S$ by $O_{|S|,\rho}(1)$ translates $\theta + Q$ of the cube $Q := \{ (\theta_\xi)_{\xi \in S} : \sup_{\xi \in S} \|\theta_\xi\|_{\mathbb{R}/\mathbb{Z}} < \rho/2 \}$. Then the sets $\{ x \in G : (\xi \cdot x)_{\xi \in S} \in \theta + Q \}$ form a cover of $G$. But all of these sets lie in a translate of $B(S,\rho)$, and the claim then follows from the translation invariance of $\mu$.
Given any Bohr set $B(S,\rho)$, we define a normalised “Lipschitz” cutoff function $\nu_{B(S,\rho)}: G \to \mathbb{R}$ by the formula

$\displaystyle \nu_{B(S,\rho)}(x) := c \left( 1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{\mathbb{R}/\mathbb{Z}} \right)_+,$

where $c$ is the constant such that

$\displaystyle \int_G \nu_{B(S,\rho)}\ d\mu = 1.$

The function $\nu_{B(S,\rho)}$ should be viewed as an $L^1$-normalised “tent function” cutoff to $B(S,\rho)$. Note from Lemma 2 that

$\displaystyle \nu_{B(S,\rho)} \ll_{|S|,\rho} 1. \qquad (2)$
We then have the following sharper version of Theorem 1:
Theorem 3 (Roth-Khintchine theorem) Let $G = (G,+)$ be a $2$-divisible compact abelian group, with Haar probability measure $\mu$, and let $\epsilon > 0$. Then for any measurable function $f: G \to [0,1]$, there exists a Bohr set $B(S,\rho)$ with $|S| \ll_\epsilon 1$ and $\rho \gg_\epsilon 1$ such that

$\displaystyle \int_G \int_G f(x) f(x+r) f(x+2r)\ \nu_{B(S,\rho)} * \nu_{B(S,\rho)}(r)\ d\mu(r) d\mu(x) \geq \left( \int_G f\ d\mu \right)^3 - O(\epsilon), \qquad (3)$

where $*$ denotes the convolution operation

$\displaystyle f * g(x) := \int_G f(y) g(x-y)\ d\mu(y).$
A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with $f := 1_A$ and $\epsilon$ equal to a small multiple of $\delta^3$ to conclude that there is a Bohr set $B(S,\rho)$ with $|S| \ll_\delta 1$ and $\rho \gg_\delta 1$ such that

$\displaystyle \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ \nu_{B(S,\rho)} * \nu_{B(S,\rho)}(r)\ d\mu(r) d\mu(x) \gg_\delta 1.$

But from (2) we have the pointwise bound $\nu_{B(S,\rho)} * \nu_{B(S,\rho)} \ll_\delta 1$, and Theorem 1 follows.
Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set $B(S,\rho)$ to capture all the “large Fourier coefficients” of $f$, but such that a certain “dilate” of $B(S,\rho)$ does not capture much more Fourier energy of $f$ than $B(S,\rho)$ itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of $f$ into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the cutoffs $\nu_{B(S,\rho)}$ considered above to achieve a similar effect.
We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:
Theorem 1 (Roth’s theorem) Let $A$ be a subset of the integers $\mathbb{Z}$ whose upper density

$\displaystyle \overline{\delta}(A) := \limsup_{N \to \infty} \frac{|A \cap [-N,N]|}{2N+1}$

is positive. Then $A$ contains infinitely many arithmetic progressions $a, a+r, a+2r$ of length three, with $a \in \mathbb{Z}$ and $r > 0$.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length $k$ for any $k \geq 3$.
As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set $A$ (or more precisely, the indicator function $1_A$ of that set, or of pieces of that set) against linear characters $n \mapsto e(\alpha n)$ for various frequencies $\alpha \in \mathbb{R}/\mathbb{Z}$. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when $A$ is aligned up almost completely with one of these linear characters, for instance by being a Bohr set of the form

$\displaystyle \{ n \in \mathbb{Z} : \|\alpha n\|_{\mathbb{R}/\mathbb{Z}} < \rho \}$

or more generally of the form

$\displaystyle \{ n \in \mathbb{Z} : \alpha n \in U \}$

for some multi-dimensional frequency $\alpha \in \mathbb{T}^d$ and some open set $U \subset \mathbb{T}^d$. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.
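The Fourier-analytic calculation alluded to here can be illustrated in the model setting of $\mathbb{Z}/N\mathbb{Z}$: the normalised count of progressions $x, x+r, x+2r$ with all three elements in a set $A$ (including the degenerate progressions with $r = 0$) is given exactly by a sum over the Fourier coefficients of $1_A$, so a single large coefficient is all that can separate this count from its “random” value. The set $A$ and the modulus $N$ below are arbitrary illustrative choices.

```python
import numpy as np

N = 31
A = np.zeros(N)
A[[1, 4, 6, 10, 15, 22, 28]] = 1.0   # indicator of an arbitrary subset of Z_31

# Direct normalised count of pairs (x, r) with x, x+r, x+2r all in A
# (the degenerate progressions with r = 0 are included).
direct = sum(A[x] * A[(x + r) % N] * A[(x + 2 * r) % N]
             for x in range(N) for r in range(N)) / N ** 2

# Fourier identity:
#   E_{x,r} 1_A(x) 1_A(x+r) 1_A(x+2r) = sum_xi Ahat(xi)^2 Ahat(-2 xi),
# with Ahat(xi) = E_x 1_A(x) e(-2 pi i xi x / N).
Ahat = np.fft.fft(A) / N
fourier = sum(Ahat[xi] ** 2 * Ahat[(-2 * xi) % N] for xi in range(N)).real
print(direct, fourier)   # the two values agree
```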
To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or $L^\infty$ increment argument) and the energy increment argument (or $L^2$ increment argument).
The idea behind the density increment argument is to introduce a dichotomy: either the object being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.
The idea behind the energy increment argument is instead to decompose the original object into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with the original object, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or $L^2$ norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured component. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.
In these notes we give two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument and the other via the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally to counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.
In the previous lecture, we studied the recurrence properties of compact systems, which are systems in which all measurable functions exhibit almost periodicity – they almost return completely to themselves after repeated shifting. Now, we consider the opposite extreme of mixing systems – those in which all measurable functions (of mean zero) exhibit mixing – they become orthogonal to themselves after repeated shifting. (Actually, there are two different types of mixing, strong mixing and weak mixing, depending on whether the orthogonality occurs individually or on the average; it is the latter concept which is of more importance to the task of establishing the Furstenberg recurrence theorem.)
We shall see that for weakly mixing systems, averages such as $\frac{1}{N} \sum_{n=1}^N \int_X f \cdot T^n g\ d\mu$ can be computed very explicitly (in fact, this average converges to the constant $(\int_X f\ d\mu)(\int_X g\ d\mu)$). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the van der Corput lemma.
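For the reader's convenience, here is one standard Hilbert-space formulation of the van der Corput lemma (the precise form used in the lecture may differ in minor ways):

```latex
% Van der Corput lemma (Hilbert space form): if (u_n)_{n \geq 1} is a
% bounded sequence in a Hilbert space \mathcal{H}, then
\limsup_{N \to \infty} \left\| \frac{1}{N} \sum_{n=1}^N u_n \right\|^2
\;\leq\; \limsup_{H \to \infty} \frac{1}{H} \sum_{h=1}^{H}
\limsup_{N \to \infty} \left| \frac{1}{N} \sum_{n=1}^N
\langle u_{n+h}, u_n \rangle \right|.
```

In applications one takes $u_n := T^n f$ for a shift $T$ and a function $f$ of mean zero, so that the inner products become correlation averages of the type controlled by weak mixing.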
As one application of this theory, we will be able to establish Roth’s theorem (the k=3 case of Szemerédi’s theorem).
Earlier this month, in the previous incarnation of this page, I posed a question which I thought was unsolved, and obtained the answer (in fact, it was solved 25 years ago) within a week. Now that this new version of the page has better feedback capability, I am now tempted to try again, since I have a large number of such questions which I would like to publicise. (Actually, I even have a secret web page full of these somewhere near my home page, though it will take a non-trivial amount of effort to find it!)
Perhaps my favourite open question is the problem on the maximal size of a cap set – a subset of $\mathbb{F}_3^n$ ($\mathbb{F}_3$ being the finite field of three elements) which contains no lines, or equivalently no non-trivial arithmetic progressions of length three. As an upper bound, one can easily modify the proof of Roth’s theorem to show that cap sets must have size $O(3^n/n)$ (see e.g. this paper of Meshulam). This of course is better than the trivial bound of $3^n$ once $n$ is large. In the converse direction, the trivial example $\{0,1\}^n$ shows that cap sets can be as large as $2^n$; the current world record is $(2.2174\dots)^n$, held by Edel. The gap between these two bounds is rather enormous; I would be very interested in either an improvement of the upper bound to $o(3^n/n)$, or an improvement of the lower bound to $(3-o(1))^n$. (I believe both improvements are true, though a good friend of mine disagrees about the improvement to the lower bound.)
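As a small illustration, the maximal cap set size can be found by exhaustive search for very small $n$; the sketch below confirms the known value of $4$ for $n = 2$. (Exhaustive search becomes infeasible very quickly as $n$ grows.)

```python
import itertools

def is_cap(points):
    """A cap set in F_3^n contains no line: no three distinct points
    x, y, z with x + y + z = 0 coordinatewise mod 3 (equivalently, no
    non-trivial three-term arithmetic progression)."""
    for x, y in itertools.combinations(points, 2):
        z = tuple((-a - b) % 3 for a, b in zip(x, y))
        if z != x and z != y and z in points:
            return False
    return True

n = 2
space = list(itertools.product(range(3), repeat=n))
best = max(len(s) for r in range(len(space) + 1)
           for s in itertools.combinations(space, r)
           if is_cap(set(s)))
print(best)   # maximal cap set size in F_3^2
```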