In the modern theory of higher order Fourier analysis, a key role is played by the Gowers uniformity norms $U^k$ for $k = 1, 2, \dots$. For finitely supported functions $f: {\bf Z} \to {\bf C}$, one can define the (non-normalised) Gowers norm $\|f\|_{U^k({\bf Z})}$ by the formula
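In the degree two case, for instance, this takes the shape $\|f\|_{U^2({\bf Z})}^4 = \sum_{x,h_1,h_2 \in {\bf Z}} f(x)\, \overline{f(x+h_1)}\, \overline{f(x+h_2)}\, f(x+h_1+h_2)$. The following sketch (hypothetical illustrative code, not taken from the post) evaluates this sum by brute force for a finitely supported function represented as a Python dictionary:

```python
def gowers_u2(f):
    """Brute-force (non-normalised) U^2 norm of a finitely supported f: Z -> C.

    f is a dict mapping integers to complex values; integers outside the
    support are treated as 0.  Returns the fourth root of
    sum_{x,h1,h2} f(x) * conj(f(x+h1)) * conj(f(x+h2)) * f(x+h1+h2).
    """
    val = lambda n: f.get(n, 0)
    support = list(f.keys())
    total = 0
    # Terms vanish unless x, x+h1, x+h2 all lie in the support, so it suffices
    # to loop over those three points and read off h1, h2.
    for x in support:
        for y1 in support:          # y1 = x + h1
            for y2 in support:      # y2 = x + h2
                total += (val(x)
                          * complex(val(y1)).conjugate()
                          * complex(val(y2)).conjugate()
                          * val(y1 + y2 - x))
    return abs(total) ** 0.25

# Example: the indicator function of an interval (which has large U^2 norm).
print(gowers_u2({n: 1.0 for n in range(10)}))
```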
The significance of the Gowers norms is that they control other multilinear forms that show up in additive combinatorics. Given any polynomials $P_1,\dots,P_m$ and functions $f_1,\dots,f_m$, we define the multilinear form
-
and
have true complexity
;
-
has true complexity
;
-
has true complexity
;
- The form
(which among other things could be used to count twin primes) has infinite true complexity (which is quite unfortunate for applications).
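As one concrete illustration of such a multilinear form (using the classical form counting three-term progressions, rather than any of the specific examples itemised above), the average $\sum_{n,r} f_1(n) f_2(n+r) f_3(n+2r)$ can be evaluated directly for finitely supported functions; the snippet below is a hypothetical sketch of this computation:

```python
def three_ap_form(f1, f2, f3):
    """Evaluate sum_{n,r} f1(n) f2(n+r) f3(n+2r) for finitely supported dicts."""
    total = 0
    for n, v1 in f1.items():
        for m, v2 in f2.items():
            r = m - n
            total += v1 * v2 * f3.get(n + 2 * r, 0)
    return total

# Counting (possibly trivial, r = 0) three-term progressions in a small set:
A = {0, 1, 2, 4, 5, 6}
ind = {n: 1 for n in A}
print(three_ap_form(ind, ind, ind))
```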
Gowers and Wolf formulated a conjecture on what this complexity should be, at least for linear polynomials; Ben Green and I thought we had resolved this conjecture back in 2010, though it turned out there was a subtle gap in our arguments and we were only able to resolve the conjecture in a partial range of cases. However, the full conjecture was recently resolved by Daniel Altman.
The $U^1$ (semi-)norm is so weak that it barely controls any averages at all. For instance the average
Because of this, I propose inserting an additional norm in the Gowers uniformity norm hierarchy between the $U^1$ and $U^2$ norms, which I will call the “profinite $U^1$” norm:
The norm recently appeared implicitly in work of Peluse and Prendiville, who showed that the form
had true complexity
in this notation (with polynomially strong bounds). [Actually, strictly speaking this control was only shown for the third function
; for the first two functions
one needs to localize the
norm to intervals of length
. But I will ignore this technical point to keep the exposition simple.] The weaker claim that
has true complexity
is substantially easier to prove (one can apply the circle method together with Gauss sum estimates).
The well known inverse theorem for the $U^2$ norm tells us that if a $1$-bounded function $f$ has $U^2$ norm at least $\eta$ for some $\eta > 0$, then there is a Fourier phase $n \mapsto e(\alpha n)$ such that
For the $U^1$ norm one has a trivial inverse theorem; by definition, the $U^1$ norm of $f$ is at least $\eta$ if and only if
For the profinite $U^1$ norm one has the intermediate situation in which the frequency $\alpha$ is not taken to be zero, but is instead major arc. Indeed, suppose that $f$ is $1$-bounded with large profinite $U^1$ norm, thus
Here is a diagram showing some of the control relationships between various Gowers norms, multilinear forms, and duals of classes of functions (where each class of functions induces a dual norm):
Here I have included the three classes of functions that one can choose from for the inverse theorem, namely degree two nilsequences, bracket quadratic phases, and local quadratic phases, as well as the more narrow class of globally quadratic phases.
The Gowers norms have counterparts for measure-preserving systems $(X, T, \mu)$, known as Host-Kra seminorms. The $U^k$ seminorm can be defined for $f \in L^\infty(X)$ as
Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for $r_4(N)$“, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).
For any natural number $N$, define $r_4(N)$ to be the largest cardinality of a subset $A$ of $\{1,\dots,N\}$ which does not contain any non-trivial arithmetic progressions $a, a+r, a+2r, a+3r$ of length four (where “non-trivial” means that $r$ is non-zero). Trivially we have $r_4(N) \leq N$. In 1969, Szemerédi showed that $r_4(N) = o(N)$
. However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) was quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that $r_4(N) \ll N (\log\log N)^{-c}$ for some absolute constant $c > 0$. In the second paper in the above-mentioned series, we managed to improve this bound to $r_4(N) \ll N e^{-c\sqrt{\log\log N}}$. In this paper, we improve the bound further to $r_4(N) \ll N (\log N)^{-c}$, which seems to be the limit of the methods. (We remark that if we could take $c$ to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for progressions of length three is nearly settled, as it is known that for the analogous bound on $r_3(N)$ one can take any $c$ less than one.)
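As a toy illustration of the quantity $r_4(N)$ (a hypothetical brute-force computation, feasible only for very small $N$ since it inspects every subset):

```python
from itertools import combinations

def has_4ap(s):
    """Does the set s contain a non-trivial four-term arithmetic progression?"""
    s = set(s)
    for a in s:
        for b in s:
            r = b - a
            if r > 0 and a + 2 * r in s and a + 3 * r in s:
                return True
    return False

def r4(N):
    """Largest size of a subset of {1,...,N} free of non-trivial 4-term APs."""
    for k in range(N, 0, -1):
        if any(not has_4ap(c) for c in combinations(range(1, N + 1), k)):
            return k
    return 0

print([r4(N) for N in range(1, 9)])   # exponential-time; small N only
```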
Most of the previous work on bounding $r_4(N)$ relied in some form or another on the density increment argument introduced by Roth back in 1953; roughly speaking, if a dense subset $A$ of $\{1,\dots,N\}$ fails to contain arithmetic progressions of length four, one seeks to locate a long subprogression of $\{1,\dots,N\}$ on which $A$ has increased density. This was the basic method for instance underlying our previous bound $r_4(N) \ll N e^{-c\sqrt{\log\log N}}$, as well as a finite field analogue of the bound $r_4(N) \ll N (\log N)^{-c}$; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing a quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.
One way to phrase the latter recurrence theorem is as follows. Suppose that $A \subset \{1,\dots,N\}$ has density $\delta$. Then one would expect a “randomly” selected arithmetic progression ${\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r}$ in $\{1,\dots,N\}$ (using the convention that random variables will be in boldface) to be contained in $A$ with probability about $\delta^4$. This is not true in general; however, it was shown by Ben and myself that for any $\epsilon > 0$, there was a set of shifts $r$ of cardinality $\gg_{\epsilon,\delta} N$, such that for any such $r$ one had $${\bf P}\big( {\bf a}, {\bf a}+r, {\bf a}+2r, {\bf a}+3r \in A \big) \geq \delta^4 - \epsilon$$ if ${\bf a}$ was chosen uniformly at random from $\{1,\dots,N\}$. This easily implies that $r_4(N) = o(N)$, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound $\gg_{\epsilon,\delta} N$ is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take $N$ to be extremely large compared to $1/\epsilon$ and $1/\delta$ to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift $r = 0$.
We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to couple together the two parameters ${\bf a}$ and ${\bf r}$ of the length four progression. Namely, with $N$, $\delta$, $A$ as above, we are able to show that there exist random variables ${\bf a}, {\bf r}$, not necessarily independent, such that
and such that we have the non-degeneracy bound
This then easily implies the main theorem.
The energy increment method is then deployed to locate a good pair of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function $1_A$ “behaves like” a globally quadratic function such as $n \mapsto F(\alpha n^2)$, for some irrational $\alpha$ and some smooth periodic function $F$ of mean $\delta$. If one then takes ${\bf a}$ and ${\bf r}$
to be uniformly distributed in
and
respectively for some small
, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form
where the integral is with respect to the probability Haar measure, and the constraint ultimately arises from the algebraic constraint
However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least , which (morally at least) gives (1) in this case.
Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which is partitioned into some number of structured pieces
(think of these as arithmetic progressions, or as “Bohr sets”), and on each piece
,
behaves like a locally quadratic function such as
, where
now varies with
, and the mean of
will be approximately
on the average after averaging in
(weighted by the size of the pieces
). Now one should select
and
in the following coupled manner: first one chooses
uniformly from
, then one defines
to be the label
such that
, and then selects
uniformly from a set
which is related to
in much the same way that
is related to
. If one does this correctly, the analogue of (2) becomes
and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.
The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of which involves a decomposition of
into structured pieces
, and a quadratic approximation to
on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm
of the error is small enough) to model the count in (1) (for random variables
determined by the above partition of
into pieces
), and if the frequencies (such as
) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm
. A significant fraction of the paper is then devoted to establishing a quantitative inverse theorem for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to
in a manner that significantly increases its “energy” (basically an
norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.
There are existing inverse theorems for type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “
-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “
-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse pseudorandom subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this
-structured homomorphism on pseudorandom subsets of Bohr sets to a
-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a
-form on
that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the
-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.
The lonely runner conjecture is the following open problem:
Conjecture 1 Suppose one has $n \geq 1$ runners on the unit circle ${\bf R}/{\bf Z}$, all starting at the origin and moving at different speeds. Then for each runner, there is at least one time $t$ for which that runner is “lonely” in the sense that it is separated by a distance at least $1/n$ from all other runners.
One can normalise the speed of the lonely runner to be zero, at which point the conjecture can be reformulated (after replacing $n$ by $n+1$) as follows:
Conjecture 2 Let $v_1,\dots,v_n$ be non-zero real numbers for some $n \geq 1$. Then there exists a real number $t$ such that the numbers $v_1 t, \dots, v_n t$ are all a distance at least $\frac{1}{n+1}$ from the integers, thus $\|v_i t\|_{{\bf R}/{\bf Z}} \geq \frac{1}{n+1}$ for all $i = 1,\dots,n$, where $\|x\|_{{\bf R}/{\bf Z}}$ denotes the distance of $x$ to the nearest integer.
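As a quick numerical sanity check of Conjecture 2 (not part of the original argument), one can search over rational times for a given tuple of integer speeds; exact rational arithmetic avoids floating-point issues at the extremal configuration $v_i = i$:

```python
from fractions import Fraction

def dist_to_int(x):
    """Exact distance from a rational x to the nearest integer."""
    frac = x - (x // 1)          # fractional part in [0, 1)
    return min(frac, 1 - frac)

def lonely_time(speeds, denominator=1000):
    """Search over times t = k/denominator in [0,1) for one at which every
    integer speed v satisfies ||v t|| >= 1/(n+1), as in Conjecture 2."""
    n = len(speeds)
    target = Fraction(1, n + 1)
    for k in range(denominator):
        t = Fraction(k, denominator)
        if all(dist_to_int(v * t) >= target for v in speeds):
            return t
    return None

print(lonely_time([1, 2, 3, 4]))   # returns 1/5 for the extremal speeds 1,2,3,4
```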
This conjecture has been proven for $n \leq 6$, but remains open for larger $n$. The bound $\frac{1}{n+1}$ is optimal, as can be seen by looking at the case $v_i = i$ for $i = 1,\dots,n$
and applying the Dirichlet approximation theorem. Note that for each non-zero
, the set
has (Banach) density
for any
, and from this and the union bound we can easily find
for which
for any , but it has proven to be quite challenging to remove the factor of
to increase $\frac{1}{2n}$ to $\frac{1}{n+1}$. (As far as I know, even improving $\frac{1}{2n}$ to $\frac{1+c}{2n}$ for some absolute constant $c > 0$ and sufficiently large $n$ remains open.)
The speeds in the above conjecture are arbitrary non-zero reals, but it has been known for some time that one can reduce without loss of generality to the case when the $v_i$ are rationals, or equivalently (by scaling) to the case where they are integers; see e.g. Section 4 of this paper of Bohman, Holzman, and Kleitman.
In this post I would like to remark on a slight refinement of this reduction, in which the speeds are integers of bounded size, where the bound depends on $n$. More precisely:
Proposition 3 In order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that the $v_i$ are integers of size at most $n^{Cn^2}$, where $C$ is an (explicitly computable) absolute constant. (More precisely: if this restricted version of the lonely runner conjecture is true for all $n \leq n_0$, then the original version of the conjecture is also true for all $n \leq n_0$.)
In principle, this proposition allows one to verify the lonely runner conjecture for a given $n$ in finite time; however the number of cases to check with this proposition grows faster than exponentially in $n$, and so this is unfortunately not a feasible approach to verifying the lonely runner conjecture for more values of $n$ than currently known.
One of the key tools needed to prove this proposition is the following additive combinatorics result. Recall that a generalised arithmetic progression (or GAP) in the reals ${\bf R}$ is a set of the form $$P = \{ a_1 n_1 + \dots + a_d n_d : n_i \in {\bf Z}, |n_i| \leq N_i \hbox{ for } 1 \leq i \leq d \}$$ for some $a_1,\dots,a_d \in {\bf R}$ and $N_1,\dots,N_d > 0$; the quantity $d$ is called the rank of the progression. If $t \geq 1$, the progression $P$ is said to be $t$-proper if the sums $a_1 n_1 + \dots + a_d n_d$ with $n_i \in {\bf Z}$ and $|n_i| \leq t N_i$ for $1 \leq i \leq d$ are all distinct. We have
Lemma 4 (Progressions lie inside proper progressions) Let $P$ be a GAP of rank $d$ in the reals, and let $t \geq 1$. Then $P$ is contained in a $t$-proper GAP $Q$ of rank at most $d$, with
Proof: See Theorem 2.1 of this paper of Bilu. (Very similar results can also be found in Theorem 3.40 of my book with Van Vu, or Theorem 1.10 of this paper of mine with Van Vu.)
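To make the notions of rank and $t$-properness concrete, here is a small illustrative sketch; it uses one common normalisation of a GAP (non-negative coefficients), which may differ slightly from the convention above:

```python
from itertools import product

def gap_elements(base, steps, dims):
    """Elements of the GAP {base + n_1 v_1 + ... + n_r v_r : 0 <= n_i < N_i}."""
    return [base + sum(n * v for n, v in zip(ns, steps))
            for ns in product(*(range(N) for N in dims))]

def is_t_proper(base, steps, dims, t):
    """Check that the sums with 0 <= n_i < t*N_i are pairwise distinct."""
    sums = [base + sum(n * v for n, v in zip(ns, steps))
            for ns in product(*(range(int(t * N)) for N in dims))]
    return len(sums) == len(set(sums))

# A rank-2 progression with steps 1 and 10 and dimensions (5, 5) is proper,
# while steps 1 and 3 with the same dimensions produce colliding sums.
print(is_t_proper(0, (1, 10), (5, 5), 1), is_t_proper(0, (1, 3), (5, 5), 1))
```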
Now let $n \geq 2$, and assume inductively that the lonely runner conjecture has been proven for all smaller values of $n$, as well as for the current value of $n$ in the case that the $v_1,\dots,v_n$ are integers of size at most $n^{Cn^2}$ for some sufficiently large $C$. We will show that the lonely runner conjecture holds in general for this choice of $n$.
Let $v_1,\dots,v_n$ be non-zero real numbers. Let
be a large absolute constant to be chosen later. From the above lemma applied to the GAP
, one can find a
-proper GAP
of rank at most
containing
such that
in particular if
is large enough depending on
.
We write
for some ,
, and
. We thus have
for
, where
is the linear map
and
are non-zero and lie in the box
.
We now need an elementary lemma that allows us to create a “collision” between two of the via a linear projection, without making any of the
collide with the origin:
Lemma 5 Let
be non-zero vectors that are not all collinear with the origin. Then, after replacing one or more of the
with their negatives
if necessary, there exists a pair
such that
, and such that none of the
is a scalar multiple of
.
Proof: We may assume that , since the
case is vacuous. Applying a generic linear projection to
(which does not affect collinearity, or the property that a given
is a scalar multiple of
), we may then reduce to the case
.
By a rotation and relabeling, we may assume that lies on the negative
-axis; by flipping signs as necessary we may then assume that all of the
lie in the closed right half-plane. As the
are not all collinear with the origin, one of the
lies off of the
-axis; by relabeling, we may assume that
lies off of the
axis and makes a minimal angle with the
-axis. Then the angle of
with the
-axis is non-zero but smaller than any non-zero angle that any of the
make with this axis, and so none of the
are a scalar multiple of
, and the claim follows.
We now return to the proof of the proposition. If the are all collinear with the origin, then
lie in a one-dimensional arithmetic progression
, and then by rescaling we may take the
to be integers of magnitude at most
, at which point we are done by hypothesis. Thus, we may assume that the
are not all collinear with the origin, and so by the above lemma and relabeling we may assume that
is non-zero, and that none of the
are scalar multiples of
.
with for
; by relabeling we may assume without loss of generality that
is non-zero, and furthermore that
where is a natural number and
have no common factor.
We now define a variant of
by the map
where the are real numbers that are linearly independent over
, whose precise value will not be of importance in our argument. This is a linear map with the property that
, so that
consists of at most
distinct real numbers, which are non-zero since none of the
are scalar multiples of
, and the
are linearly independent over
. As we are assuming inductively that the lonely runner conjecture holds for
, we conclude (after deleting duplicates) that there exists at least one real number
such that
We would like to “approximate” by
to then conclude that there is at least one real number
such that
It turns out that we can do this by a Fourier-analytic argument taking advantage of the -proper nature of
. Firstly, we see from the Dirichlet approximation theorem that one has
for a set of reals of (Banach) density
. Thus, by the triangle inequality, we have
for a set of reals of density
.
Applying a smooth Fourier multiplier of Littlewood-Paley type, one can find a trigonometric polynomial
which takes values in , is
for
, and is no larger than
for
. We then have
where denotes the mean value of a quasiperiodic function
on the reals
. We expand the left-hand side out as
From the genericity of , we see that the constraint
occurs if and only if is a scalar multiple of
, or equivalently (by (1), (2)) an integer multiple of
. Thus
and is the Dirichlet series
By Fourier expansion and writing , we may write (4) as
The support of the implies that
. Because of the
-properness of
, we see (for
large enough) that the equation
and conversely that (7) implies that (6) holds for some with
. From (3) we thus have
In particular, there exists a such that
Since is bounded in magnitude by
, and
is bounded by
, we thus have
for each , which by the size properties of
implies that
for all
, giving the lonely runner conjecture for
.
In the previous set of notes, we saw how zero-density theorems for the Riemann zeta function, when combined with the zero-free region of Vinogradov and Korobov, could be used to obtain prime number theorems in short intervals. It turns out that a more sophisticated version of this type of argument also works to obtain prime number theorems in arithmetic progressions, in particular establishing the celebrated theorem of Linnik:
Theorem 1 (Linnik’s theorem) Let $a\ (q)$ be a primitive residue class. Then $a\ (q)$ contains a prime $p$ with $p \ll q^{O(1)}$.
In fact it is known that one can find a prime with
, a result of Xylouris. For sake of comparison, recall from Exercise 65 of Notes 2 that the Siegel-Walfisz theorem gives this theorem with a bound of
, and from Exercise 48 of Notes 2 one can obtain a bound of the form
if one assumes the generalised Riemann hypothesis. The probabilistic random models from Supplement 4 suggest that one should in fact be able to take
.
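As a small empirical illustration of Linnik's theorem (hypothetical code using naive trial division, not from the notes), one can compute the least prime in each primitive residue class modulo $q$ and compare it against a fixed power of $q$:

```python
def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def least_prime_in_ap(a, q):
    """Smallest prime p with p = a (mod q); assumes gcd(a, q) = 1."""
    p = a if a > 1 else a + q
    while not is_prime(p):
        p += q
    return p

# Linnik: the least such prime is O(q^C) for an absolute constant C
# (Xylouris's refinement gives an exponent of roughly 5).
q = 97
print(max(least_prime_in_ap(a, q) for a in range(1, q)), q ** 5)
```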
We will not aim to obtain the optimal exponents for Linnik’s theorem here, and follow the treatment in Chapter 18 of Iwaniec and Kowalski. We will in fact establish the following more quantitative result (a special case of a more powerful theorem of Gallagher), which splits into two cases, depending on whether there is an exceptional zero or not:
Theorem 2 (Quantitative Linnik theorem) Let $a\ (q)$ be a primitive residue class for some $q \geq 2$. For any $x \geq 1$, let $\psi(x;q,a)$ denote the quantity $$\psi(x;q,a) := \sum_{n \leq x: n = a\ (q)} \Lambda(n).$$ Assume that $x \geq q^C$ for some sufficiently large $C$.
- (i) (No exceptional zero) If all the real zeroes $\beta$ of $L$-functions $L(\cdot,\chi)$ of real characters $\chi$ of modulus $q$ are such that
, then
for all
and some absolute constant
.
- (ii) (Exceptional zero) If there is a zero $\beta$ of an $L$-function $L(\cdot,\chi)$ of a real character $\chi$ of modulus $q$ with $\beta \geq 1 - \frac{\epsilon}{\log q}$ for some sufficiently small $\epsilon > 0$, then
for all
and some absolute constant
.
The implied constants here are effective.
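As a concrete check on the quantity $\psi(x;q,a)$ appearing in Theorem 2, the following brute-force sketch (illustrative code, not part of the notes) computes it directly from the von Mangoldt function and compares it with the expected main term $x/\phi(q)$:

```python
from math import log

def von_mangoldt(n):
    """Lambda(n): log p if n = p^k is a prime power, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, n + 1):
        if n % p == 0:            # p is the smallest prime factor of n
            m = n
            while m % p == 0:
                m //= p
            return log(p) if m == 1 else 0.0
    return 0.0

def psi(x, q, a):
    """Sum of Lambda(n) over n <= x with n = a (mod q)."""
    return sum(von_mangoldt(n) for n in range(1, int(x) + 1) if n % q == a % q)

q, a, x = 7, 3, 10 ** 4
print(psi(x, q, a), x / 6)        # main term x / phi(q), with phi(7) = 6
```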
Note from the Landau-Page theorem (Exercise 54 from Notes 2) that at most one exceptional zero exists (if $\epsilon$ is small enough). A key point here is that the error term
in the exceptional zero case is an improvement over the error term when no exceptional zero is present; this compensates for the potential reduction in the main term coming from the
term. The splitting into cases depending on whether an exceptional zero exists or not turns out to be an essential technique in many advanced results in analytic number theory (though presumably such a splitting will one day become unnecessary, once the possibility of exceptional zeroes is finally eliminated for good).
Exercise 3 Assuming Theorem 2, and assuming $x \geq q^C$ for some sufficiently large absolute constant $C$, establish the lower bound
when there is no exceptional zero, and
when there is an exceptional zero
. Conclude that Theorem 2 implies Theorem 1, regardless of whether an exceptional zero exists or not.
Remark 4 The Brun-Titchmarsh theorem (Exercise 33 from Notes 4), in the sharp form of Montgomery and Vaughan, gives that $$\pi(x;q,a) \leq 2 \frac{x}{\phi(q) \log(x/q)}$$ for any primitive residue class $a\ (q)$ and any $x > q$. This is (barely) consistent with the estimate (1). Any lowering of the coefficient $2$ in the Brun-Titchmarsh inequality (with reasonable error terms), in the regime when $x$ is a large power of $q$, would then lead to at least some elimination of the exceptional zero case. However, this has not led to any progress on the Landau-Siegel zero problem (and may well be just a reformulation of that problem). (When $x$ is a relatively small power of $q$, some improvements to Brun-Titchmarsh are possible that are not in contradiction with the presence of an exceptional zero; see this paper of Maynard for more discussion.)
Theorem 2 is deduced in turn from facts about the distribution of zeroes of $L$-functions. We first need a version of the truncated explicit formula that does not lose unnecessary logarithms:
Exercise 5 (Log-free truncated explicit formula) With the hypotheses as above, show that
for any non-principal character
of modulus
, where we assume
for some large
; for the principal character establish the same formula with an additional term of
on the right-hand side. (Hint: this is almost immediate from Exercise 45(iv) and Theorem 21 of Notes 2 with (say)
, except that there is a factor of
in the error term instead of
when
is extremely large compared to
. However, a closer inspection of the proof (particularly with regards to the truncated Perron formula in Proposition 12 of Notes 2) shows that the
factor can be replaced fairly easily by
. To get rid of the final factor of
, note that the proof of Proposition 12 used the rather crude bound
. If one replaces this crude bound by more sophisticated tools such as the Brun-Titchmarsh inequality, one will be able to remove the factor of
.
Using the Fourier inversion formula
(see Theorem 69 of Notes 1), we thus have
and so it suffices by the triangle inequality (bounding very crudely by
, as the contribution of the low-lying zeroes already turns out to be quite dominant) to show that
when no exceptional zero is present, and
when an exceptional zero is present.
To handle the former case (2), one uses two facts about zeroes. The first is the classical zero-free region (Proposition 51 from Notes 2), which we reproduce in our context here:
Proposition 6 (Classical zero-free region) Let
. Apart from a potential exceptional zero
, all zeroes
of $L$-functions
with
of modulus
and
are such that
for some absolute constant
.
Using this zero-free region, we have
whenever contributes to the sum in (2), and so the left-hand side of (2) is bounded by
where we recall that is the number of zeroes
of any $L$-function of a character
of modulus
with
and
(here we use conjugation symmetry to make
non-negative, accepting a multiplicative factor of two).
In Exercise 25 of Notes 6, the grand density estimate
is proven. If one inserts this bound into the above expression, one obtains a bound for (2) which is of the form
Unfortunately this is off from what we need by a factor of (and would lead to a weak form of Linnik’s theorem in which
was bounded by
rather than by
). In the analogous problem for prime number theorems in short intervals, we could use the Vinogradov-Korobov zero-free region to compensate for this loss, but that region does not help here for the contribution of the low-lying zeroes with
, which as mentioned before give the dominant contribution. Fortunately, it is possible to remove this logarithmic loss from the zero-density side of things:
Theorem 7 (Log-free grand density estimate) For any
and
, one has
The implied constants are effective.
We prove this estimate below the fold. The proof follows the methods of the previous section, but one inserts various sieve weights to restrict sums over natural numbers to essentially become sums over “almost primes”, as this turns out to remove the logarithmic losses. (More generally, the trick of restricting to almost primes by inserting suitable sieve weights is quite useful for avoiding any unnecessary losses of logarithmic factors in analytic number theory estimates.)
Now we turn to the case when there is an exceptional zero (3). The argument used to prove (2) applies here also, but does not gain the factor of in the exponent. To achieve this, we need an additional tool, a version of the Deuring-Heilbronn repulsion phenomenon due to Linnik:
Theorem 9 (Deuring-Heilbronn repulsion phenomenon) Suppose
is such that there is an exceptional zero
with
small. Then all other zeroes
of $L$-functions of modulus
are such that
In other words, the exceptional zero enlarges the classical zero-free region by a factor of
. The implied constants are effective.
Exercise 10 Use Theorem 7 and Theorem 9 to complete the proof of (3), and thus Linnik’s theorem.
Exercise 11 Use Theorem 9 to give an alternate proof of (Tatuzawa’s version of) Siegel’s theorem (Theorem 62 of Notes 2). (Hint: if two characters have different moduli, then they can be made to have the same modulus by multiplying by suitable principal characters.)
Theorem 9 is proven by similar methods to that of Theorem 7, the basic idea being to insert a further weight of $1 * \chi$ (in addition to the sieve weights), the point being that the exceptional zero causes this weight to be quite small on the average. There is a strengthening of Theorem 9 due to Bombieri that is along the lines of Theorem 7, obtaining the improvement
with effective implied constants for any and
in the presence of an exceptional zero, where the prime in
means that the exceptional zero
is omitted (thus
if
). Note that the upper bound on
falls below one when
for a sufficiently small
, thus recovering Theorem 9. Bombieri’s theorem can be established by the methods in this set of notes, and will be given as an exercise to the reader.
Remark 12 There are a number of alternate ways to derive the results in this set of notes, for instance using the Turan power sums method which is based on studying derivatives such as
for
and large
, and performing various sorts of averaging in
to attenuate the contribution of many of the zeroes
. We will not develop this method here, but see for instance Chapter 9 of Montgomery’s book. See the text of Friedlander and Iwaniec for yet another approach based primarily on sieve-theoretic ideas.
Remark 13 When one optimises all the exponents, it turns out that the exponent in Linnik’s theorem is extremely good in the presence of an exceptional zero – indeed Friedlander and Iwaniec showed that one can even get a bound of the form
for some
, which is even stronger than one can obtain from GRH! There are other places in which exceptional zeroes can be used to obtain results stronger than what one can obtain even on the Riemann hypothesis; for instance, Heath-Brown used the hypothesis of an infinite sequence of Siegel zeroes to obtain the twin prime conjecture.
Kevin Ford, Ben Green, Sergei Konyagin, and myself have just posted to the arXiv our preprint “Large gaps between consecutive prime numbers“. This paper concerns the “opposite” problem to that considered by the recently concluded Polymath8 project, which was concerned with very small values of the prime gap $p_{n+1} - p_n$. Here, we wish to consider the largest prime gap $G(X)$ that one can find in the interval $[1,X]$ as $X$ goes to infinity.
Finding lower bounds on $G(X)$ is more or less equivalent to locating long strings of consecutive composite numbers that are not too large compared to the length of the string. A classic (and quite well known) construction here starts with the observation that for any natural number $n$, the consecutive numbers $n!+2, n!+3, \dots, n!+n$ are all composite, because each $n!+i$, $2 \leq i \leq n$, is divisible by some prime $p \leq i$, while being strictly larger than that prime $p$. From this and Stirling’s formula, it is not difficult to obtain the bound $$G(X) \gg \frac{\log X}{\log\log X}. \ \ \ \ \ (1)$$
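This construction is easy to verify directly; a small illustrative sketch (hypothetical code):

```python
from math import factorial

def composite_run(n):
    """The n - 1 consecutive integers n! + 2, ..., n! + n; each n! + j is
    divisible by j (with 2 <= j <= n) and strictly larger than j, hence composite."""
    return [factorial(n) + j for j in range(2, n + 1)]

run = composite_run(7)            # 5042, 5043, ..., 5047
print(run)
print(all(any(m % d == 0 for d in range(2, m)) for m in run))   # all composite
```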
A more efficient bound comes from the prime number theorem: there are only $(1+o(1)) \frac{X}{\log X}$ primes up to $X$, so just from the pigeonhole principle one can locate a string of consecutive composite numbers up to $X$ of length at least $(1-o(1)) \log X$, thus
where we use or
as shorthand for
or
.
What about upper bounds? The Cramér random model predicts that the primes up to $X$ are distributed like a random subset of $\{1,\dots,X\}$ of density $\frac{1}{\log n}$. Using this model, Cramér arrived at the conjecture $$G(X) = O(\log^2 X).$$ In fact, if one makes the extremely optimistic assumption that the random model perfectly describes the behaviour of the primes, one would arrive at the even more precise prediction $$\limsup_{X \to \infty} \frac{G(X)}{\log^2 X} = 1.$$
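One can simulate the Cramér model directly and observe gaps of roughly this size (a hypothetical Monte Carlo sketch, not from the paper):

```python
import math
import random

def cramer_max_gap(N, seed=0):
    """Largest gap between consecutive elements of a random subset of {3,...,N}
    in which n is included independently with probability 1/log n."""
    random.seed(seed)
    last, max_gap = 2, 0
    for n in range(3, N + 1):
        if random.random() < 1.0 / math.log(n):
            max_gap = max(max_gap, n - last)
            last = n
    return max_gap

N = 10 ** 6
print(cramer_max_gap(N), math.log(N) ** 2)   # gaps of size about log^2 N
```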
However, it is no longer widely believed that this optimistic version of the conjecture is true, due to some additional irregularities in the primes coming from the basic fact that large primes cannot be divisible by very small primes. Using the Maier matrix method to capture some of this irregularity, Granville was led to the conjecture that $$\limsup_{X \to \infty} \frac{G(X)}{\log^2 X} \geq 2e^{-\gamma} = 1.1229\dots$$ (note that $2e^{-\gamma}$ is slightly larger than $1$). For comparison, the known upper bounds on $G(X)$ are quite weak; unconditionally one has $G(X) \ll X^{0.525}$ by the work of Baker, Harman, and Pintz, and even on the Riemann hypothesis one only gets down to $G(X) \ll X^{1/2} \log X$, as shown by Cramér (a slight improvement is also possible if one additionally assumes the pair correlation conjecture; see this article of Heath-Brown and the references therein).
This conjecture remains out of reach of current methods. In 1931, Westzynthius managed to improve the bound (2) slightly to
which Erdös in 1935 improved to
and Rankin in 1938 improved slightly further to $$G(X) \geq (c+o(1)) \frac{\log X \log\log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (3)$$ with $c = 1/3$. Remarkably, this rather strange bound then proved extremely difficult to advance further on; until recently, the only improvements were to the constant $c$, which was raised to
in 1963 by Schönhage, to
in 1963 by Rankin, to
by Maier and Pomerance, and finally to
in 1997 by Pintz.
Erdös listed the problem of making $c$ arbitrarily large one of his favourite open problems, even offering (“somewhat rashly”, in his words) a cash prize for the solution. Our main result answers this question in the affirmative:
Theorem 1 The bound (3) holds for arbitrarily large $c$.
In principle, we thus have a bound of the form $$G(X) \geq f(X) \frac{\log X \log\log X \log\log\log\log X}{(\log\log\log X)^2}$$ for some $f(X)$ that grows to infinity. Unfortunately, due to various sources of ineffectivity in our methods, we cannot provide any explicit rate of growth on $f(X)$ at all.
We decided to announce this result the old-fashioned way, as part of a research lecture; more precisely, Ben Green announced the result in his ICM lecture this Tuesday. (The ICM staff have very efficiently put up video of his talks (and most of the other plenary and prize talks) online; Ben’s talk is here, with the announcement beginning at about 0:48. Note a slight typo in his slides, in that the exponent of in the denominator is
instead of
.) Ben’s lecture slides may be found here.
By coincidence, an independent proof of this theorem has also been obtained very recently by James Maynard.
I discuss our proof method below the fold.
Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity $r_4(F^n)$ for a vector space $F^n$ over a finite field $F$ of characteristic greater than $3$, where $r_4(F^n)$ is defined as the cardinality of the largest subset of $F^n$ that does not contain an arithmetic progression of length $4$. In our earlier paper, we gave two arguments that bounded $r_4(F^n)$ in the regime when the field $F$ was fixed and $n$ was large. The first “cheap” argument gave the bound
and the more complicated “expensive” argument gave the improvement
for some constant depending only on
.
Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset $A$ of $F^n$ that has no arithmetic progressions of length $4$, and seeks to locate a subspace on which $A$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse $U^3$ theorem of Ben and myself (and also independently by Samorodnitsky), one approximates $1_A$ by a “quadratically structured” function $f$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because $A$ has no progressions of length $4$, the count of progressions of length $4$ weighted by $f$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which $f$ has increased density.
The error in the paper was to conclude from this that the original function $1_A$ also had increased density on the same subspace; it turns out that the manner in which $f$ approximates $1_A$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on $r_4(F^n)$ one gets at the end of the day to be worse than the cheap argument.)
After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of for the exponent
in (1). In fact, it gives the following stronger result:
Theorem 1 Let $A$ be a subset of $F^n$ of density at least $\delta$, and let $\epsilon > 0$. Then there is a subspace $W$ of $F^n$ of codimension
such that the number of (possibly degenerate) progressions
in
is at least
.
The bound (1) is an easy consequence of this theorem after choosing $\epsilon$ appropriately and removing the degenerate progressions from the conclusion of the theorem.
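For concreteness, here is a hypothetical brute-force count of the (possibly degenerate) four-term progressions of the kind appearing in Theorem 1, taking $F = {\bf F}_5$ and a toy structured set for illustration (neither choice is taken from the paper):

```python
from itertools import product

def count_4aps(A, p, n):
    """Count progressions x, x+r, x+2r, x+3r (r = 0 allowed) inside A,
    where A is a set of vectors in F_p^n represented as tuples mod p."""
    A = set(A)
    count = 0
    for x in A:
        for r in product(range(p), repeat=n):
            pts = [tuple((xi + k * ri) % p for xi, ri in zip(x, r))
                   for k in range(1, 4)]
            if all(pt in A for pt in pts):
                count += 1
    return count

p, n = 5, 2
A = {(x, y) for x, y in product(range(p), repeat=2) if (x + 2 * y) % p != 0}
delta = len(A) / p ** n
# Compare with the count delta^4 * p^(2n) one would expect from a random set
# of the same density.
print(count_4aps(A, p, n), delta ** 4 * p ** (2 * n))
```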
The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to $1_A$ with a significantly stronger local approximation to $1_A$ on a subspace $W$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse $U^3$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace $W$ on which $A$ is quite dense (of density $\gg \delta$) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.
In this, the final lecture notes of this course, we discuss one of the motivating applications of the theory developed thus far, namely to count solutions to linear equations in primes (or in dense subsets $A$ of the primes ${\mathcal P}$). Unfortunately, the most famous linear equations in primes – the twin prime equation $p_2 - p_1 = 2$ and the even Goldbach equation $p_1 + p_2 = N$ – remain out of reach of this technology (because the relevant affine linear forms involved are commensurate, and thus have infinite complexity with respect to the Gowers norms), but most other systems of equations, in particular that of arithmetic progressions $p_i = a + ir$ for $i = 0, \dots, k-1$ (or equivalently, $p_{i+1} - 2p_i + p_{i-1} = 0$ for $i = 1, \dots, k-2$), as well as the odd Goldbach equation $p_1 + p_2 + p_3 = N$, are tractable.
To illustrate the main ideas, we will focus on the following result of Green:
Theorem 1 (Roth’s theorem in the primes) Let $A$ be a subset of primes whose upper density $\limsup_{N \to \infty} |A \cap [1,N]| / |{\mathcal P} \cap [1,N]|$ is positive. Then $A$ contains infinitely many arithmetic progressions of length three.
This should be compared with Roth’s theorem in the integers (Notes 2), which is the same statement but with the primes replaced by the integers ${\bf Z}$ (or natural numbers ${\bf N}$). Indeed, Roth’s theorem for the primes is proven by transferring Roth’s theorem for the integers to the prime setting; the latter theorem is used as a “black box”. The key difficulty here in performing this transference is that the primes have zero density inside the integers; indeed, from the prime number theorem we have $|{\mathcal P} \cap [1,N]| = (1+o(1)) \frac{N}{\log N} = o(N)$.
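Despite this sparsity, three-term progressions within the primes are already plentiful at small heights, as a quick brute-force enumeration (illustrative code) shows:

```python
def primes_up_to(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, N + 1, p):
                sieve[m] = False
    return [n for n in range(2, N + 1) if sieve[n]]

def three_term_aps_in_primes(N):
    """Non-trivial progressions p, p+r, p+2r consisting of primes up to N."""
    P = primes_up_to(N)
    S = set(P)
    return [(p, p + r, p + 2 * r)
            for p in P
            for r in range(1, (N - p) // 2 + 1)
            if p + r in S and p + 2 * r in S]

print(len(three_term_aps_in_primes(1000)))
```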
There are a number of generalisations of this transference technique. In a paper of Green and myself, we extended the above theorem to progressions of longer length (thus transferring Szemerédi’s theorem to the primes). In a series of papers (culminating in a paper to appear shortly) of Green, myself, and also Ziegler, related methods are also used to obtain an asymptotic for the number of solutions in the primes to any system of linear equations of bounded complexity. This latter result uses the full power of higher order Fourier analysis, in particular relying heavily on the inverse conjecture for the Gowers norms; in contrast, Roth’s theorem and Szemerédi’s theorem in the primes are “softer” results that do not need this conjecture.
To transfer results from the integers to the primes, there are three basic steps:
- A general transference principle, that transfers certain types of additive combinatorial results from dense subsets of the integers to dense subsets of a suitably “pseudorandom set” of integers (or more precisely, to the integers weighted by a suitably “pseudorandom measure”);
- An application of sieve theory to show that the primes (or more precisely, an affine modification of the primes) lie inside a suitably pseudorandom set of integers (or more precisely, have significant mass with respect to a suitably pseudorandom measure).
- If one is seeking asymptotics for patterns in the primes, and not simply lower bounds, one also needs to control correlations between the primes (or proxies for the primes, such as the Möbius function) with various objects that arise from higher order Fourier analysis, such as nilsequences.
The first step can be accomplished in a number of ways. For progressions of length three (and more generally, for controlling linear patterns of complexity at most one), transference can be accomplished by Fourier-analytic methods. For more complicated patterns, one can use techniques inspired by ergodic theory; more recently, simplified and more efficient methods based on duality (the Hahn-Banach theorem) have also been used. No number theory is used in this step. (In the case of transference to genuinely random sets, rather than pseudorandom sets, similar ideas appeared earlier in the graph theory setting; see this paper of Kohayakawa, Luczak, and Rodl.)
The second step is accomplished by fairly standard sieve theory methods (e.g. the Selberg sieve, or the slight variants of this sieve used by Goldston and Yildirim). Remarkably, very little of the formidable apparatus of modern analytic number theory is needed for this step; for instance, the only fact about the Riemann zeta function that is truly needed is that it has a simple pole at $s = 1$, and no knowledge of L-functions is needed.
The third step does draw more significantly on analytic number theory techniques and results (most notably, the method of Vinogradov to compute oscillatory sums over the primes, and also the Siegel-Walfisz theorem that gives a good error term on the prime number theorem in arithmetic progressions). As these techniques are somewhat orthogonal to the main topic of this course, we shall only touch briefly on this aspect of the transference strategy.
This week I am at Penn State University, giving this year’s Marker lectures. My chosen theme for my four lectures here is “recent developments in additive prime number theory”. My first lecture, “Long arithmetic progressions in primes”, is similar to my AMS lecture on the same topic and so I am not reposting it here. The second lecture, the notes for which begin after the fold, is on “Linear equations in primes”. These two lectures focus primarily on work of myself and Ben Green. The third and fourth lectures, entitled “Small gaps between primes” and “Sieving for almost primes and expander graphs”, will instead be focused on the work of Goldston-Yildirim-Pintz and Bourgain-Gamburd-Sarnak respectively.
This week I am in San Diego for the annual joint mathematics meeting of the American Mathematical Society and the Mathematical Association of America. I am giving two talks here. One is a lecture (for the AMS “Current Events” Bulletin) on recent developments (by Martel-Merle, Merle-Raphael, and others) on stability of solitons; I will post on that lecture at some point in the near future, once the survey paper associated to that lecture is finalised.
The other, which I am presenting here, is an address on “structure and randomness in the prime numbers“. Of course, I’ve talked about this general topic many times before (e.g. at my Simons lecture at MIT, my Milliman lecture at U. Washington, and my Science Research Colloquium at UCLA), and I have given similar talks to the one here – which focuses on my original 2004 paper with Ben Green on long arithmetic progressions in the primes – about a dozen or so times. As such, this particular talk has probably run its course, and so I am “retiring” it by posting it here.
p.s. At this meeting, Endre Szemerédi was awarded the 2008 Steele prize for a seminal contribution to research, for his landmark paper establishing what is now known as Szemerédi’s theorem, which underlies the result I discuss in this talk. This prize is richly deserved – congratulations Endre! [The AMS and MAA also awarded prizes to several dozen other mathematicians, including many mentioned previously on this blog; rather than list them all here, let me just point you to their prize booklet.]
This week I am in Boston, giving this year’s Simons lectures at MIT together with David Donoho. (These lectures, incidentally, are endowed by Jim Simons, who was mentioned in some earlier discussion here.) While preparing these lectures, it occurred to me that I may as well post my lecture notes on this blog, since this medium is essentially just an asynchronous version of a traditional lecture series, and the hypertext capability is in some ways more convenient and informal than, say, slides.
I am giving three lectures, each expounding on some aspects of the theme “the dichotomy between structure and randomness”, which I also spoke about (and wrote about) for the ICM last August. This theme seems to pervade many of the areas of mathematics that I work in, and my lectures aim to explore how this theme manifests itself in several of these. In this, the first lecture, I describe the dichotomy as it appears in Fourier analysis and in number theory. (In the second, I discuss the dichotomy in ergodic theory and graph theory, while in the third, I discuss PDE.)