Kevin Ford, Ben Green, Sergei Konyagin, and myself have just posted to the arXiv our preprint “Large gaps between consecutive prime numbers”. This paper concerns the “opposite” problem to that considered by the recently concluded Polymath8 project, which was concerned with very small values of the prime gap {p_{n+1}-p_n}. Here, we wish to consider the largest prime gap {G(X) = p_{n+1}-p_n} that one can find in the interval {[X] = \{1,\dots,X\}} as {X} goes to infinity.

Finding lower bounds on {G(X)} is more or less equivalent to locating long strings of consecutive composite numbers that are not too large compared to the length of the string. A classic (and quite well known) construction here starts with the observation that for any natural number {n}, the consecutive numbers {n!+2, n!+3,\dots,n!+n} are all composite, because each {n!+i}, {i=2,\dots,n} is divisible by some prime {p \leq n}, while being strictly larger than that prime {p}. From this and Stirling’s formula, it is not difficult to obtain the bound

\displaystyle G(X) \gg \frac{\log X}{\log\log X}. \ \ \ \ \ (1)
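The divisibility property behind this construction is easy to check directly; here is a quick sketch (my own illustration, not from the paper) in Python:

```python
from math import factorial

def factorial_run(n):
    """The n-1 consecutive integers n!+2, ..., n!+n from the classic construction."""
    f = factorial(n)
    return [f + i for i in range(2, n + 1)]

# Each n!+i is divisible by i (hence by some prime p <= n) and exceeds i,
# so the whole run consists of composite numbers.
for n in range(3, 10):
    for i, m in zip(range(2, n + 1), factorial_run(n)):
        assert m % i == 0 and m > i
```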

A more efficient bound comes from the prime number theorem: there are only {(1+o(1)) \frac{X}{\log X}} primes up to {X}, so just from the pigeonhole principle one can locate a string of consecutive composite numbers up to {X} of length at least {(1-o(1)) \log X}, thus

\displaystyle G(X) \gtrsim \log X \ \ \ \ \ (2)

where we use {X \gtrsim Y} or {Y \lesssim X} as shorthand for {X \geq (1-o(1)) Y} or {Y \leq (1+o(1)) X}.
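Both bounds are easy to see numerically; the following sketch (stdlib Python, my own illustration) computes {G(X)} by a direct sieve and compares it with {\log X}:

```python
from math import log

def primes_up_to(X):
    """Sieve of Eratosthenes returning all primes <= X."""
    sieve = bytearray([1]) * (X + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(X**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, X + 1, p)))
    return [n for n in range(X + 1) if sieve[n]]

def G(X):
    """Largest gap between consecutive primes up to X."""
    ps = primes_up_to(X)
    return max(q - p for p, q in zip(ps, ps[1:]))

# the ratio G(X)/log X creeps upward, consistent with the bounds discussed above
for X in (10**3, 10**4, 10**5):
    print(X, G(X), round(G(X) / log(X), 2))
```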

What about upper bounds? The Cramér random model predicts that the primes up to {X} are distributed like a random subset of {\{1,\dots,X\}} of density {1/\log X}. Using this model, Cramér arrived at the conjecture

\displaystyle G(X) \ll \log^2 X.

In fact, if one makes the extremely optimistic assumption that the random model perfectly describes the behaviour of the primes, one would arrive at the even more precise prediction

\displaystyle G(X) \sim \log^2 X.

However, it is no longer widely believed that this optimistic version of the conjecture is true, due to some additional irregularities in the primes coming from the basic fact that large primes cannot be divisible by very small primes. Using the Maier matrix method to capture some of this irregularity, Granville was led to the conjecture that

\displaystyle G(X) \gtrsim 2e^{-\gamma} \log^2 X

(note that {2e^{-\gamma} = 1.1229\dots} is slightly larger than {1}). For comparison, the known upper bounds on {G(X)} are quite weak; unconditionally one has {G(X) \ll X^{0.525}} by the work of Baker, Harman, and Pintz, and even on the Riemann hypothesis one only gets down to {G(X) \ll X^{1/2} \log X}, as shown by Cramér (a slight improvement is also possible if one additionally assumes the pair correlation conjecture; see this article of Heath-Brown and the references therein).

This conjecture remains out of reach of current methods. In 1931, Westzynthius managed to improve the bound (2) slightly to

\displaystyle G(X) \gg \frac{\log\log\log X}{\log\log\log\log X} \log X ,

which Erdös in 1935 improved to

\displaystyle G(X) \gg \frac{\log\log X}{(\log\log\log X)^2} \log X

and Rankin in 1938 improved slightly further to

\displaystyle G(X) \gtrsim c \frac{(\log\log X) \log\log\log\log X}{(\log\log\log X)^2} \log X \ \ \ \ \ (3)

with {c=1/3}. Remarkably, this rather strange bound then proved extremely difficult to improve upon; until recently, the only improvements were to the constant {c}, which was raised to {c=\frac{1}{2} e^\gamma} in 1963 by Schönhage, to {c= e^\gamma} in 1963 by Rankin, to {c = 1.31256 e^\gamma} by Maier and Pomerance, and finally to {c = 2e^\gamma} in 1997 by Pintz.

Erdös listed the problem of making {c} arbitrarily large as one of his favourite open problems, even offering (“somewhat rashly”, in his words) a cash prize for the solution. Our main result answers this question in the affirmative:

Theorem 1 The bound (3) holds for arbitrarily large {c>0}.

In principle, we thus have a bound of the form

\displaystyle G(X) \geq f(X) \frac{(\log\log X) \log\log\log\log X}{(\log\log\log X)^2} \log X

for some {f(X)} that grows to infinity. Unfortunately, due to various sources of ineffectivity in our methods, we cannot provide any explicit rate of growth on {f(X)} at all.

We decided to announce this result the old-fashioned way, as part of a research lecture; more precisely, Ben Green announced the result in his ICM lecture this Tuesday. (The ICM staff have very efficiently put up video of his talk (and most of the other plenary and prize talks) online; Ben’s talk is here, with the announcement beginning at about 0:48. Note a slight typo in his slides, in that the exponent of {\log\log\log X} in the denominator is {3} instead of {2}.) Ben’s lecture slides may be found here.

By coincidence, an independent proof of this theorem has also been obtained very recently by James Maynard.

I discuss our proof method below the fold.

— 1. Sketch of proof —

Our method is a modification of Rankin’s method, combined with some work of myself and Ben Green (and of Tamar Ziegler) on counting various linear patterns in the primes. To explain this, let us first revisit Rankin’s argument, presented in a fashion that allows for comparison with our own methods. Consider first the easy bound (1) that came from using the consecutive string {n!+2,\dots,n!+n} of composite numbers. This bound was inferior to the prime number theorem bound (2); however, this is easily remedied by replacing {n!} with the somewhat smaller primorial {P(n)}, defined as the product of the primes up to and including {n}. It is still easy to see that {P(n)+2,\dots,P(n)+n} are all composite, with each {P(n)+i, i \leq n} divisible by some prime {p \leq n} while being larger than that prime. On the other hand, the prime number theorem tells us that {P(n) = \exp( (1+o(1)) n)}. From this, one can recover an alternate proof of (2) (perhaps not so surprising, since the prime number theorem is a key ingredient in both proofs).
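The primorial variant is just as easy to check numerically; a small sketch (my own illustration, stdlib Python):

```python
from math import log, prod

def primorial(n):
    """P(n): the product of all primes up to and including n."""
    return prod(p for p in range(2, n + 1)
                if all(p % d for d in range(2, int(p**0.5) + 1)))

n = 30
P = primorial(n)
# each P(n)+i with 2 <= i <= n is divisible by some p <= n yet larger than p
for i in range(2, n + 1):
    assert any((P + i) % p == 0 and P + i > p for p in range(2, n + 1))
# log P(n) / n tends to 1 as n grows (prime number theorem); convergence is slow
print(log(P) / n)
```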

This gives hope that further modification of this construction can be used to go beyond (2). If one looks carefully at the above proof, we see that the key fact used here is that the discrete interval of integers {\{2,\dots,n\}} is completely sieved out by the residue classes {0 \hbox{ mod } p} for primes {p \leq n}, in the sense that each element in this interval is contained in at least one of these residue classes. More generally (and shifting the interval by {1} for a more convenient normalisation), suppose we can find an interval {[y] := \{ n \in {\bf N}: n \leq y \}} which is completely sieved out by one residue class {a_p \hbox{ mod } p} for each {p \leq x}, for some {x} and {y}. Then the string of consecutive numbers {m+1,\dots,m+\lfloor y\rfloor} will be composite, whenever {m} is an integer larger than or equal to {x} with {m = - a_p \hbox{ mod } p} for each prime {p \leq x}, since each of the {m+i, i \leq y} will be divisible by some prime {p \leq x} while being larger than that prime. From the Chinese remainder theorem, one can find such an {m} that is of size at most {x+P(x)}. From this and the prime number theorem, one can obtain lower bounds on {G(X)} if one can get lower bounds on {y} in terms of {x}. In particular, if for any large {x} one can completely sieve out {[y]} with a residue class {a_p \hbox{ mod } p} for each {p \leq x}, and

\displaystyle y \sim c \frac{(\log x) \log\log\log x}{(\log\log x)^2} x, \ \ \ \ \ (4)

then one can establish the bound (3). (The largest {y} one can take for a given {x} is known as the Jacobsthal function of the primorial {P(x)}.) So the task is basically to find a smarter set of congruence classes {a_p \hbox{ mod } p} than just the zero congruence classes {0 \hbox{ mod } p}, so as to sieve out an interval {[y]} with {y} significantly larger than {x}. (Unfortunately, this approach by itself is unlikely to reach the Cramér conjecture; it was shown by Iwaniec using the large sieve that {y} is necessarily of size {O(x^2)} (which somewhat coincidentally matches the Cramér bound), but Maier and Pomerance conjecture that in fact one must have {y \leq x \log^{2+o(1)} x}, which would mean that the limit of this method would be to establish a bound of the form {G(X) \geq (\log\log X)^{2+o(1)} \log X}.)
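To make the general recipe concrete, here is a toy example of my own (not from the paper): take {x=13}, choose the classes {a_p = -1 \hbox{ mod } p} (which completely sieve out {[12]}), and solve for {m} by the Chinese remainder theorem.

```python
from math import prod

def is_composite(n):
    return n > 1 and any(n % d == 0 for d in range(2, int(n**0.5) + 1))

x = 13
primes = [2, 3, 5, 7, 11, 13]
y = x - 1
classes = {p: (-1) % p for p in primes}   # a_p = -1 mod p

# the classes completely sieve out [y]: every i <= y lies in some class a_p mod p
# (equivalently, every i+1 in {2,...,13} has a prime factor <= 13)
assert all(any(i % p == classes[p] for p in primes) for i in range(1, y + 1))

# CRT step: we need m >= x with m = -a_p = 1 (mod p) for every p; simplest choice:
M = prod(primes)   # the primorial P(13) = 30030
m = 1 + M
# then m+1, ..., m+y is a run of y consecutive composites
assert all(is_composite(m + i) for i in range(1, y + 1))
```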

So, how can one do better than just using the “Eratosthenes” sieve {0 \hbox{ mod } p}? We will divide the sieving into different stages, depending on the size of {p}. It turns out that a reasonably optimal division of primes {p} up to {x} will be into the following four classes:

  • Stage 1 primes: primes {p} that are either tiny (less than {\log x}) or medium size (between {z} and {x/4}), where {z} is a parameter to be chosen later.
  • Stage 2 primes: primes that are small (between {\log x} and {z}).
  • Stage 3 primes: primes that are very large (between {x/2} and {x}).
  • Stage 4 primes: primes that are fairly large (between {x/4} and {x/2}).

We will take an interval {[y]}, where {y} is given by (4), and sieve out first by Stage 1 primes, then Stage 2 primes, then Stage 3 primes, then Stage 4 primes, until none of the elements of {[y]} are left.

Let’s first discuss the final sieving step, which is rather trivial. Suppose that our sieving by the first three sieving stages is so efficient that the number of surviving elements of {[y]} is less than or equal to the number of Stage 4 primes (by the prime number theorem, this will for instance be the case for sufficiently large {x} if there are fewer than {(\frac{1}{5}+o(1)) \frac{x}{\log x}} survivors). Then one can finish off the remaining survivors simply by using each of the Stage 4 primes {p} to remove one of the surviving integers in {[y]} by an appropriate choice of residue class {a_p \hbox{ mod } p}. So we can recast our problem as an approximate sieving problem rather than a perfect sieving problem; we now only need to eliminate most of the elements of {[y]} rather than all of them, at the cost of only using primes from the Stages 1-3, rather than 1-4. Note though that for {y} given by (4), the Stage 1-3 sieving has to be reasonably efficient, in that the proportion of survivors cannot be too much larger than {1/\log^2 x} (ignoring factors of {\log\log x} etc.).
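The bookkeeping in this final step can be sketched in a few lines (a toy model with made-up data, just to illustrate the assignment of one survivor per prime):

```python
def stage4_classes(survivors, stage4_primes):
    """Toy bookkeeping: give each surviving integer its own Stage 4 prime p
    and choose the class a_p = s mod p, which removes that survivor s."""
    assert len(survivors) <= len(stage4_primes)
    return {p: s % p for s, p in zip(survivors, stage4_primes)}

# made-up toy data: three survivors, four available primes
classes = stage4_classes([101, 205, 319], [251, 257, 263, 269])
assert classes == {251: 101, 257: 205, 263: 56}   # 319 mod 263 = 56
```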

Next, we discuss the Stage 1 sieving process. Here, we will simply copy the classic construction and use the Eratosthenes sieve {0 \hbox{ mod } p} for these primes. The elements of {[y]} that survive this process are those elements that are not divisible by any Stage 1 primes: that is to say, they are either only divisible by small (Stage 2) primes, or else contain at least one prime factor larger than {x/4} and no prime factors less than {\log x}. In the latter case, the survivor has no choice but to be a prime in {(x/4,y]} (since from (4) we have {y = o(x \log x)}, so the cofactor of any prime factor larger than {x/4} is less than {\log x}, and hence must equal {1}). In the former case, the survivor is a {z}-smooth number – a number with no prime factors greater than {z}. How many such survivors are there? Here we can use a somewhat crude upper bound of Rankin:

Lemma 2 Let {1 < z \leq y} be large quantities, and write {u := \frac{\log y}{\log z}}. Suppose that

\displaystyle \log u = o( \log z ).

Then the number of {z}-smooth numbers in {[y]} is at most {e^{-u\log u + O(u)} y \log z}.

Proof: We use a Dirichlet series method commonly known as “Rankin’s trick”. Let {0<\rho<1} be a quantity to be optimised later, and abbreviate “{z}-smooth” as “smooth”. Observe that if {n} is a smooth number less than {y}, then

\displaystyle 1 \leq \frac{y}{y^\rho} \frac{1}{n^{1-\rho}}. \ \ \ \ \ (5)

Thus, the number of smooth numbers in {[y]} is at most

\displaystyle \frac{y}{y^\rho} \sum_{n \hbox{ smooth}} \frac{1}{n^{1-\rho}}

where we have simply discarded the constraint {n \leq y}. The point of doing this is that the above expression factors into a tractable Euler product

\displaystyle \frac{y}{y^\rho} \prod_{p \leq z} (1 + \frac{1}{p^{1-\rho}} + \frac{1}{p^{2(1-\rho)}} + \dots).

We will choose

\displaystyle \rho := \frac{u \log u}{\log y} = \frac{\log u}{\log z} \ \ \ \ \ (6)

so that {y^\rho = e^{u \log u}} and {\rho=o(1)}. Then the above expression simplifies to

\displaystyle \ll \frac{y}{e^{u \log u}} \exp( \sum_{p \leq z} \frac{1}{p^{1-\rho}} ).

To compute the sum here, we first observe from Mertens’ theorem (discussed in this previous blog post) that

\displaystyle \sum_{p \leq z} \frac{1}{p} = \log \log z + O(1),

so we may bound the previous expression by

\displaystyle \ll \frac{y \log z}{e^{u \log u}} \exp( \sum_{p \leq z} \frac{1}{p^{1-\rho}} - \frac{1}{p} )

which we rewrite using (6) as

\displaystyle \ll \frac{y \log z}{e^{u \log u}} \exp( \sum_{p \leq z} \frac{\exp( \frac{\log p}{\log z} \log u ) - 1}{p} ).

Next, we use the convexity inequality

\displaystyle \exp( ct ) - 1 \leq (\exp(c) - 1 ) t

for {c > 0} and {0 \leq t \leq 1}, applied with {c := \log u} and {t := \frac{\log p}{\log z}}, to conclude that

\displaystyle \exp( \frac{\log p}{\log z} \log u ) - 1 \leq u \frac{\log p}{\log z}.

Finally, from Mertens’ theorem we have {\sum_{p \leq z} \frac{\log p}{p \log z} \ll 1}, so the sum inside the previous exponential is {O(u)}. The bound follows. \Box

Remark 1 One can basically eliminate the {\log z} factor here (at the cost of worsening the {O(u)} error slightly to {O(u\log\log(3u))}) by a more refined version of the Rankin trick, based on replacing the crude bound (5) by the more sophisticated inequality

\displaystyle 1 = \frac{\log \frac{y}{n}}{\log y} + \sum_{p^\nu || n; p \leq z} \frac{\nu \log p}{\log y}

\displaystyle \ll \frac{y}{y^\rho \log y} \frac{1}{n^{1-\rho}} + \sum_{p^\nu || n; p \leq z} \frac{\nu \log p}{\log y},

where {p^\nu||n} denotes the assertion that {p} divides {n} exactly {\nu} times. (Thanks to Kevin Ford for pointing out this observation to me.) In fact, the number of {z}-smooth numbers in {[y]} is known to be asymptotically {e^{-u\log u + O( u \log\log(3u) )} y} in the range {\log^3 y \leq z \leq y}, a result of de Bruijn.
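One can get a feel for the strength of Lemma 2 numerically; the sketch below (my own illustration) counts {z}-smooth numbers in {[y]} exactly via a smallest-prime-factor table and prints the main term {e^{-u\log u} y \log z} alongside.

```python
from math import exp, log

def smallest_prime_factor_table(y):
    """spf[n] = smallest prime factor of n (spf[n] == n iff n is prime or n < 2)."""
    spf = list(range(y + 1))
    for p in range(2, int(y**0.5) + 1):
        if spf[p] == p:                      # p is prime
            for n in range(p * p, y + 1, p):
                if spf[n] == n:
                    spf[n] = p
    return spf

def count_smooth(y, z):
    """Exact count of z-smooth integers in {1, ..., y}."""
    spf = smallest_prime_factor_table(y)
    def smooth(n):
        while n > 1:
            p = spf[n]
            if p > z:
                return False
            while n % p == 0:
                n //= p
        return True
    return sum(smooth(n) for n in range(1, y + 1))

y, z = 10**5, 20
u = log(y) / log(z)
print("exact count:", count_smooth(y, z))
print("main term e^{-u log u} y log z:", exp(-u * log(u)) * y * log(z))
```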

In view of the error term permitted by the Stage 4 process, we would like to take {z} as large as possible while still leaving only {o(x/\log x)} smooth numbers in {[y]}. A somewhat efficient choice of {z} here is

\displaystyle z := \exp( \frac{\log\log\log x}{4 \log\log x} \log x ),

so that {u \sim 4 \frac{\log\log x}{\log\log\log x}} and {u\log u \sim 4 \log\log x}, and then one can check that the above lemma does indeed show that there are {o(x/\log x)} smooth numbers in {[y]}. (If we use the sharper bound in the remark, we can reduce the {4} here to a {3}, although this makes little difference to the final bound.) If we let {{\mathcal Q}} denote all the primes in {(x/4,y]}, the remaining task is then to sieve out all but {(\frac{1}{5}+o(1)) \frac{x}{\log x}} of the primes in {{\mathcal Q}} by using one congruence class from each of the Stage 2 and Stage 3 primes.

Note that {{\mathcal Q}} is still quite large compared to the error that the Stage 4 primes can handle – it is of size about {y/\log x}, whereas we need to get down to a bit less than {x/\log x}. Still, this is some progress (the remaining sparsification needed is of the order of {1/\log^{1+o(1)} x} rather than {1/\log^{2+o(1)} x}).

For the Stage 2 sieve, we will just use a random construction, choosing {a_p \hbox{ mod } p} uniformly at random for each Stage 2 prime. This sieve is expected to sparsify the set {{\mathcal Q}} of survivors by a factor

\displaystyle \gamma := \prod_{p \hbox{ Stage 2 prime}} (1-\frac{1}{p}),

which by Mertens’ theorem is of size

\displaystyle \gamma \sim \frac{\log \log x}{\log z} \sim 4 \frac{(\log \log x)^2}{\log x \log\log\log x}.

In particular, if {y} is given by (4), then all the strange logarithmic factors cancel out and

\displaystyle \gamma y \sim 4c x.

In particular, we expect {{\mathcal Q}} to be cut down to a random set (which we have called {{\mathcal Q}({\bf a})} in our paper) of size about {4c \frac{x}{\log x}}. This would already finish the job for very small {c} (e.g. {c \leq 1/20}), and indeed Rankin’s original argument proceeds more or less along these lines. But now we want to take {c} to be large.
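The Mertens computation above is easy to sanity-check numerically (my own illustration, with stand-in endpoints {w} and {z} playing the roles of {\log x} and {z}): the sieving factor for primes in {(w,z]} is asymptotically {\log w/\log z}.

```python
from math import log, prod

def primes_in(lo, hi):
    """Primes p with lo < p <= hi, by trial division (fine at this scale)."""
    return [n for n in range(max(2, lo + 1), hi + 1)
            if all(n % d for d in range(2, int(n**0.5) + 1))]

w, z = 10, 10**4
sieving_factor = prod(1 - 1 / p for p in primes_in(w, z))
prediction = log(w) / log(z)      # ratio form of Mertens' third theorem
print(sieving_factor, prediction)
```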

Fortunately, we still have the Stage 3 primes to play with. But the number of Stage 3 primes is about {\frac{1}{2} \frac{x}{\log x}}, which is a bit smaller than the number of surviving primes {{\mathcal Q}({\bf a})}, which is about {4c \frac{x}{\log x}}. So to make this work, most of the Stage 3 congruence classes {a_p \hbox{ mod } p} need to sieve out many primes from {{\mathcal Q}({\bf a})}, rather than just one or two. (Rankin’s original argument is based on sieving out one prime per congruence class; the subsequent work of Maier-Pomerance and Pintz is basically based on sieving out two primes per congruence class.)

Here, one has to take some care because the set {{\mathcal Q}({\bf a})} is already quite sparse inside {[y]} (its density is about {1/\log^{2+o(1)} x}). So a randomly chosen {a_p \hbox{ mod } p} would in fact most likely catch none of the primes in {{\mathcal Q}({\bf a})} at all. So we need to restrict attention to congruence classes {a_p \hbox{ mod } p} which already catch a large number of primes in {{\mathcal Q}}, so that even after the Stage 2 sieving one can hope to be left with many congruence classes that also catch a large number of primes in {{\mathcal Q}({\bf a})}.

Here’s where my work with Ben came in. Suppose one has an arithmetic progression {a, a+d, \dots, a+(r-1)d} of length {r} consisting entirely of primes in {{\mathcal Q}}, with {d} a multiple of {p}; then the congruence class {a \hbox{ mod } p} is guaranteed to pick up at least {r} primes in {{\mathcal Q}}. My first theorem with Ben shows that no matter how large {r} is, the set {{\mathcal Q}} does indeed contain some arithmetic progressions {a,a+d,\dots,a+(r-1)d} of length {r}. This result is not quite suitable for our applications here, because (a) we need the spacing {d} to also be divisible by a Stage 3 prime {p} (in our paper, we take {d = r! p} for concreteness, although other choices are certainly possible), and (b) for technical reasons, it is insufficient to simply have a large number of arithmetic progressions of primes strewn around {{\mathcal Q}}; they have to be “evenly distributed” in some sense in order to be able to still cover most of {{\mathcal Q}({\bf a})} after throwing out any progression that is partly or completely sieved out by the Stage 2 primes. Fortunately, though, these distributional results for linear equations in primes were established by a subsequent paper of Ben and myself, contingent on two conjectures (the Möbius–nilsequences conjecture and the inverse conjecture for the Gowers norms) which we also proved (the latter with Tamar Ziegler) in some further papers. (Actually, strictly speaking our work does not quite cover the case needed here, because the progressions are a little “narrow”; we need progressions of primes in {[y]} whose spacing {d} is comparable to {x} instead of {y}, whereas our paper only considered the situation in which the spacing was comparable to the elements of the progression. It turns out, though, that the arguments can be modified (somewhat tediously) to extend to this case.)
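A toy example of the mechanism (my own, not from the paper): the progression {5, 11, 17, 23, 29} consists of primes with spacing {d = 6}, a multiple of {p = 3}, so all five terms lie in the single class {2 \hbox{ mod } 3}, and that one congruence class removes them all.

```python
p = 3
prog = [5, 11, 17, 23, 29]   # primes in arithmetic progression, spacing d = 6 = 2p
d = prog[1] - prog[0]
assert d % p == 0
# since p divides d, all r = 5 terms fall in the single residue class a mod p, a = 5
assert {q % p for q in prog} == {5 % p}
```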