In the third Marker lecture, I would like to discuss the recent progress, particularly by Goldston, Pintz, and Yıldırım, on finding small gaps p_{n+1}-p_n between consecutive primes.  (See also the surveys by Goldston-Pintz-Yıldırım, by Green, and by Soundararajan on the subject; the material here is based to some extent on these prior surveys.)

The twin prime conjecture can be rephrased as the assertion that p_{n+1}-p_n attains the value of 2 infinitely often, where p_1=2, p_2=3, p_3=5, \ldots are the primes.  As discussed in previous lectures, this conjecture remains out of reach at present, at least with the techniques centred around counting solutions to linear equations in primes.  However, there is another direction to pursue towards the twin prime conjecture which has shown significant progress recently (though, again, there appears to be a significant difficulty in pushing it all the way to the full conjecture).  This is to try to show that p_{n+1}-p_n is unexpectedly small for many n.  Let us make this a bit more quantitative by posing the following question:

Question.  Let N be a large number.  What is the smallest value of p_{n+1}-p_n, where p_n is a prime between N and 2N?

The twin prime conjecture (or more precisely, the quantitative form of this conjecture coming from the prime tuples conjecture) would assert that the answer to this question is 2 for all sufficiently large N.

There are various ways to get upper bounds for this quantity.  For instance, from Bertrand’s postulate (which can be proven by elementary means) we know that p_{n+1}-p_n = O(N) for all N \leq p_n \leq 2N.  The prime number theorem asserts that p_n = (1+o(1)) n \log n, which gives p_{n+1}-p_n = o(N); using various non-trivial facts known about the zeroes of the zeta function, one can improve this to O(N^c) for various c < 1 (the best value of c known unconditionally is 0.525, a result of Baker, Harman, and Pintz).  The Riemann hypothesis gives a significantly more precise asymptotic formula for p_n, which ultimately leads to the bound p_{n+1}-p_n = O(\sqrt{N} \log N).  These bounds hold for all n in the given range, and so in fact bound the largest value of p_{n+1}-p_n, not just the smallest.  As far as I know, the O(\sqrt{N} \log N) bound for the largest prime gap has not been improved, even assuming the Riemann hypothesis, though the largest gap is generally expected to be much smaller than this.  (In particular, the old conjecture that there always exists at least one prime between two consecutive square numbers remains open, even assuming RH.  Cramér conjectured a bound of (1+o(1)) \log^2 N, though it is possible that the constant 1 here may need to be revised upwards to 2e^{-\gamma} \approx 1.1229, where \gamma is the Euler-Mascheroni constant; see this paper of Granville for details.)  In the opposite direction, that is for lower bounds on the largest prime gap, the best result is due to Rankin, who showed the somewhat unusual bound

p_{n+1}-p_n \geq c \log N \frac{(\log \log N) (\log \log \log \log N)}{(\log \log \log N)^2}

for some n and some absolute constant c>0 (in fact one can take c arbitrarily close to 2e^\gamma).  Remarkably, this type of right-hand side appears to be a genuine limit of what current methods can achieve (Paul Erdős in fact offered $10,000 to anyone who could improve the rate of growth of the right-hand side in N).
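As a rough numerical aside (using a simple unoptimised sieve, and with the values of N kept small enough to compute quickly), one can compare the largest gap among primes in [N,2N] with the conjectural order \log^2 N and with the Riemann hypothesis bound \sqrt{N} \log N; at these modest scales the largest gap already sits far below \sqrt{N} \log N and somewhat below \log^2 N, though of course such small values of N say nothing about the asymptotics.

```python
import math

def primes_up_to(limit):
    """Simple sieve of Eratosthenes; returns a list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

for N in (10**5, 10**6):
    ps = [p for p in primes_up_to(2 * N) if p >= N]
    largest_gap = max(q - p for p, q in zip(ps, ps[1:]))
    print(f"N = {N}: largest gap = {largest_gap}, "
          f"log^2 N = {math.log(N) ** 2:.0f}, "
          f"sqrt(N) log N = {N ** 0.5 * math.log(N):.0f}")
```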

But for the smallest value of p_{n+1}-p_n, much more is known.  The prime number theorem already tells us that there are (1+o(1)) N/\log N primes between N and 2N, so from the pigeonhole principle we have

p_{n+1}-p_n \leq (1+o(1)) \log N (1)

for some n.
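For concreteness, here is a small computational sketch (with N = 10^6 chosen arbitrarily and an unoptimised sieve) which counts the primes between N and 2N, compares the average spacing with \log N, and records the smallest gap, which at such scales is already 2 thanks to the many twin primes in the range.

```python
import math

def primes_up_to(limit):
    """Simple sieve of Eratosthenes; returns a list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

N = 10**6
ps = [p for p in primes_up_to(2 * N) if p >= N]
gaps = [q - p for p, q in zip(ps, ps[1:])]
print(f"primes in [N,2N]: {len(ps)}   (N / log N is about {N / math.log(N):.0f})")
print(f"average gap: {sum(gaps) / len(gaps):.2f}   (log N = {math.log(N):.2f})")
print(f"smallest gap: {min(gaps)}")
```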

This bound should not be sharp, since this would imply that the primes are almost equally spaced by \log N, which is suspiciously regular behaviour for a sequence as irregular as the primes.  To get some intuition as to what to expect, we turn to random models of the primes.  In particular, we begin with Cramér’s random model for the primes, which asserts that the primes between N and 2N behave as if each integer in this range had an independent chance of about 1/\log N of being prime.  Standard probability theory then shows that the primes are distributed like a Poisson process of intensity 1/\log N.  In particular, if one takes a random interval I_\lambda in [N,2N] of length \lambda \log N for some \lambda > 0, the number of primes |I_\lambda \cap {\mathcal P}| that I_\lambda captures is expected to behave like a Poisson random variable of mean \lambda; in other words, we expect

{\Bbb P}( |I_\lambda \cap {\mathcal P}| = k ) \approx \frac{e^{-\lambda} \lambda^k}{k!} (2)

for k=0,1,2,\ldots.  (In contrast, the prime number theorem only gives the much weaker statement {\Bbb E} |I_\lambda \cap {\mathcal P}| = \lambda + o(1).)

Now, as discussed in previous lectures, Cramér’s random model is not a completely accurate model for the primes, because it does not reflect the fact that primes very strongly favour the odd numbers, the numbers coprime to 3, and so forth.  However, it turns out that even after one corrects for these local irregularities, the predicted Poisson random variable behaviour (2) does not change significantly for any fixed \lambda (e.g. a Poisson process of intensity 2/\log N on the odd numbers looks much the same as a Poisson process of intensity 1/\log N on the natural numbers, when viewed at scales comparable to \log N).  This computation was worked out fully by Gallagher as a rigorous consequence of the Hardy-Littlewood prime tuples conjecture.  (On the other hand, these corrections to the Cramér model do disrupt (2) for very large values of \lambda; see this paper of Soundararajan for more discussion.)
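To get a feel for how well (2) matches the actual primes, here is a quick experiment (with the parameters N = 10^6, \lambda = 2 and the number of samples chosen arbitrarily, and with a crude sieve standing in for anything more clever) that samples random intervals of length \lambda \log N inside [N,2N] and compares the empirical distribution of prime counts against the Poisson probabilities e^{-\lambda} \lambda^k / k!; the agreement at this scale is only rough, as the asymptotics have barely begun to take hold.

```python
import math, random

def prime_flags_up_to(limit):
    """Sieve of Eratosthenes; flags[n] = 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

N, lam, trials = 10**6, 2.0, 20000
flags = prime_flags_up_to(2 * N)
length = int(lam * math.log(N))   # interval length lambda * log N
random.seed(0)

counts = {}
for _ in range(trials):
    start = random.randrange(N, 2 * N - length)
    k = sum(flags[start : start + length])   # number of primes captured by the interval
    counts[k] = counts.get(k, 0) + 1

for k in range(8):
    empirical = counts.get(k, 0) / trials
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    print(f"k = {k}: empirical {empirical:.3f}   Poisson {poisson:.3f}")
```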

Applying (2) for small values of \lambda (but still independent of N), we see that intervals of length \lambda \log N are still expected to contain two or more primes with non-zero probability, which in particular would imply that p_{n+1}-p_n \leq \lambda \log N for at least one value of n.  So one path to creating small gaps between primes is to show that |I_\lambda \cap {\mathcal P}| can exceed 1 for as small a value of \lambda as one can manage.

One approach to this proceeds by controlling the second moment

{\Bbb E} |I_\lambda \cap {\mathcal P}|^2; (3)

the heuristic (2) predicts that this second moment should be \lambda^2 + \lambda + o(1) (reflecting the fact that the Poisson distribution has both mean and variance equal to \lambda).  On the other hand, the prime number theorem gives the first moment estimate {\Bbb E} |I_\lambda \cap {\mathcal P}| = \lambda + o(1).  Also, if |I_\lambda \cap {\mathcal P}| never exceeds 1, then the first and second moments are equal.  Since \lambda^2 + \lambda strictly exceeds \lambda for any fixed \lambda > 0, the predicted second moment asymptotic would force |I_\lambda \cap {\mathcal P}| to be at least 2 with positive probability; thus if one could get the right bound for the second moment, one would be able to show that p_{n+1}-p_n \leq \lambda \log N is possible for arbitrarily small \lambda.
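The moment predictions can be tested in the same empirical spirit; the following sketch (with \lambda = 1 and the other parameters again chosen arbitrarily) estimates the first and second moments of |I_\lambda \cap {\mathcal P}| over random intervals and compares them with \lambda and \lambda^2 + \lambda respectively; as before, the agreement at this scale is rough at best.

```python
import math, random

def prime_flags_up_to(limit):
    """Sieve of Eratosthenes; flags[n] = 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

N, lam, trials = 10**6, 1.0, 50000
flags = prime_flags_up_to(2 * N)
length = int(lam * math.log(N))
random.seed(0)

first = second = 0.0
for _ in range(trials):
    start = random.randrange(N, 2 * N - length)
    k = sum(flags[start : start + length])
    first += k
    second += k * k

first /= trials
second /= trials
print(f"first moment:  {first:.3f}   (lambda = {lam})")
print(f"second moment: {second:.3f}   (lambda^2 + lambda = {lam ** 2 + lam})")
```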

Second moments such as (3) are very amenable to tools from Fourier analysis or complex analysis; applying such tools, we soon see that (3) can easily be re-expressed in terms of zeroes of the Riemann zeta function, and one can use various standard facts (or hypotheses) about these zeroes to gain enough control on (3) to obtain non-trivial improvements to (1).  This approach was pursued by many mathematicians, including Hardy-Littlewood, Rankin, Erdős, and Bombieri-Davenport, leading to non-trivial unconditional results for any \lambda \geq 1/2; but it seems difficult to push the method much beyond this.  It was later shown by Goldston and Montgomery that the correct asymptotic for (3) is essentially equivalent to the Riemann hypothesis combined with a certain statement on pair correlations between zeroes, and is thus well out of reach of current technology.

Another method, introduced by Maier, is based on finding some (rare) intervals I_\lambda of length \lambda \log N, for some moderately large \lambda, which contain significantly more primes than the expected number \lambda; if for instance one can find such an interval with over (k+1) \lambda primes in it, then from the pigeonhole principle one must be able to find a prime gap of size at most \frac{1}{k} \log N.  The ability to do this stems from the remarkable and unintuitive fact that the regularly distributed nature of primes in long arithmetic progressions, together with the tendency of primes to avoid certain residue classes, forces the primes to be irregularly distributed in short intervals.  This phenomenon (which has now been systematically studied as an “uncertainty principle” for equidistribution by Granville and Soundararajan) is related to the following curious failure of naive probabilistic heuristics to correctly predict the prime number theorem.  Indeed, consider the question of how likely a randomly chosen number n between N and 2N is to be prime.  Well, n will have about a 1-\frac{1}{2} chance of being coprime to 2, a 1 - \frac{1}{3} chance of being coprime to 3, and so forth; the Chinese remainder theorem also suggests that these events behave independently.  Thus one might expect that the probability would be something like

\prod_{p < N} (1 - \frac{1}{p}).

We then invoke Mertens’ theorem, which provides the asymptotic

\prod_{p < N} (1 - \frac{1}{p}) = (e^{-\gamma} + o(1)) \frac{1}{\log N}. (4)
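As a quick sanity check of (4) (a small illustrative computation, with N = 10^6 chosen arbitrarily), one can compare the product on the left-hand side both with e^{-\gamma}/\log N and with the observed density of primes in [N,2N]:

```python
import math

def primes_up_to(limit):
    """Simple sieve of Eratosthenes; returns a list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

N = 10**6
gamma = 0.5772156649015329   # Euler-Mascheroni constant
ps = primes_up_to(2 * N)

naive = 1.0
for p in ps:
    if p >= N:
        break
    naive *= 1.0 - 1.0 / p   # the naive "probability" prod_{p < N} (1 - 1/p)

density = sum(1 for p in ps if p >= N) / N   # actual density of primes in [N,2N]
print(f"naive product prod_(p<N) (1 - 1/p):  {naive:.6f}")
print(f"Mertens prediction e^-gamma / log N: {math.exp(-gamma) / math.log(N):.6f}")
print(f"actual density of primes in [N,2N]:  {density:.6f}")
print(f"ratio of actual density to naive product: {density / naive:.3f}   (e^gamma is about {math.exp(gamma):.3f})")
```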

But this is off by a factor of e^{\gamma} from what the prime number theorem says the true probability of being prime is, which is (1+o(1)) \frac{1}{\log N}.  This discrepancy reflects the difficulty in cutting off the product over primes in (4) at the right place (for instance, the sieve of Eratosthenes suggests that one might want to cut off at \sqrt{N} instead) and I might discuss this topic further in a future blog post.  At any rate, this e^\gamma discrepancy can be exploited to find intervals with an above-average number of primes by a “first moment” argument (known as the Maier matrix method) that we sketch as follows.  Let w be a moderately large number, and let W be the product of all the primes less than w.  If we pick a random number n between N and 2N, then as mentioned before, the prime number theorem says that this number will be prime with probability about \frac{1}{\log N}.  But if in addition we know that the number is coprime to W, then by the prime number theorem in arithmetic progressions this information boosts the probability of being prime to about \prod_{p < w} (1-\frac{1}{p})^{-1} \frac{1}{\log N}, which by Mertens’ theorem (i.e. (4) with N replaced by w) is about e^\gamma \frac{\log w}{\log N}.

Now we restrict attention to numbers n which are equal to a mod W for some w < a \leq 2w.  By the prime number theorem, about \frac{w}{\log w} of these a are prime; such a are automatically coprime to W, since they exceed every prime factor of W.  (The remaining a in this range are composite numbers of size at most 2w, hence divisible by some prime less than w, and so share a common factor with W; the corresponding n then have no chance of being prime.)  Combining this with the previous discussion, we see that the total probability that such a number n is prime is about \frac{1}{\log w} \times e^\gamma \frac{\log w}{\log N} = e^\gamma \frac{1}{\log N}.

On the other hand, the set of numbers n which are equal to a mod W for some w < a \leq 2w is given by a sequence of intervals of length w.  By the pigeonhole principle, we must therefore have an interval of length w on which the density of primes is at least e^\gamma \frac{1}{\log N}, which is greater than the expected density by a factor of e^\gamma.
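Here is a crude numerical rendition of this heuristic (with w = 19 and N = 5 \times 10^7 chosen to keep the computation manageable; the sieve below needs on the order of a hundred megabytes of memory and a few seconds to run).  For such a small w the predicted gain over the average density is modest, somewhere around fifteen to twenty percent for these parameters rather than the asymptotic factor e^\gamma \approx 1.78, and the number of primes involved is small enough that the observed ratio fluctuates noticeably, so this is only meant to convey the flavour of the phenomenon.

```python
import math

def prime_flags_up_to(limit):
    """Sieve of Eratosthenes; flags[n] = 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

w = 19
W = 2 * 3 * 5 * 7 * 11 * 13 * 17   # the product of the primes less than w
N = 5 * 10**7
flags = prime_flags_up_to(2 * N)

in_classes = primes_in_classes = 0
for k in range(N // W, 2 * N // W + 1):
    for a in range(w + 1, 2 * w + 1):       # residue classes a mod W with w < a <= 2w
        n = k * W + a
        if N <= n < 2 * N:
            in_classes += 1
            primes_in_classes += flags[n]

overall = sum(flags[N : 2 * N]) / N         # density of primes in [N,2N]
boosted = primes_in_classes / in_classes    # density of primes in the selected residue classes
print(f"overall density of primes in [N,2N]:     {overall:.5f}")
print(f"density in the selected residue classes: {boosted:.5f}")
print(f"ratio: {boosted / overall:.3f}   (e^gamma is about {math.exp(0.5772156649):.3f})")
```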

One can make the above arguments rigorous for certain ranges of interval length, and by combining this with the pigeonhole principle one can eventually improve (1) by a factor of e^{\gamma}:

p_{n+1}-p_n \leq (e^{-\gamma}+o(1)) \log N.

This is actually slightly weaker than the bound of (\frac{1}{2}+o(1)) \log N obtained by the second moment method; furthermore, the latter method establishes that a positive proportion of primes have small gaps, whereas Maier’s method, by its very nature, is restricted to a very sparse set of primes (note that w is much smaller than W).  On the other hand, Maier’s argument is quite flexible, and the e^{-\gamma} gain it provides can be combined with other techniques (such as the second moment method) to improve the numerical constants further.

In a series of papers, Goldston and Yıldırım improved the numerical constants in these results by a variety of methods including those mentioned above, as well as replacing some of the reliance on information on zeroes of the zeta function with tools from sieve theory instead.   To oversimplify substantially, the latter idea is to try to control the set of primes {\mathcal P} in terms of a larger set {\mathcal AP} of almost primes – numbers with few prime factors.  (Strictly speaking, one does not actually work with a set of almost primes, but rather with a weight function or sieve which is large on almost primes and small for non-almost primes, but let us ignore this important technical detail to simplify the exposition.)  For instance, to control the second moment {\Bbb E} |I_\lambda \cap {\mathcal P}|^2, one can take advantage of the Cauchy-Schwarz inequality

\displaystyle {\Bbb E} |I_\lambda \cap {\mathcal P}|^2 \geq \frac{({\Bbb E} |I_\lambda \cap {\mathcal P}| |I_\lambda \cap {\mathcal AP}|)^2}{{\Bbb E} |I_\lambda \cap {\mathcal AP}|^2}.

The denominator on the right-hand side involves only almost primes and can be computed easily by sieve theory methods.  The numerator involves primes, but only one prime at a time; note that this quantity is roughly counting the set of pairs p, q where p is prime, q is almost prime, and p and q differ by at most \lambda \log N.  This is in contrast to the left-hand side, which is counting pairs p, q that are both prime and differ by at most \lambda \log N.  We do not know how to use sieve theory to count the latter type of pattern (involving more than one prime); but sieve theory is perfectly capable of counting the former type of pattern, so long as we understand the distribution of primes in relatively sparse arithmetic progressions.  The standard tool for this is the Bombieri-Vinogradov theorem, which roughly speaking asserts that the primes from N to 2N are well distributed among the residue classes modulo q for “most” moduli q, as long as q stays significantly smaller than \sqrt{N}; it can be viewed as an averaged version of the generalised Riemann hypothesis that can be proven unconditionally.  (The key point here is that while it is possible for the primes to be irregular with respect to a few such small moduli, the “orthogonality” of these moduli with respect to each other makes it impossible for the primes to be simultaneously irregular with respect to many of these moduli at once.  I may comment more on this topic in a future post.)

Using such tools, various improvements to (1) were established.  Finally, in 2005, in a series of papers by Goldston, Pintz, and Yıldırım, it was shown that

p_{n+1} - p_n \leq \lambda \log N (5)

held for some n and any \lambda > 0 (provided N was large enough depending on \lambda), or equivalently that p_{n+1}-p_n = o(\log p_n) for infinitely many n; this was later improved further, to p_{n+1} - p_n = O( \sqrt{\log N} (\log \log N)^2 ) (again for infinitely many n).  Assuming a strong version of the Bombieri-Vinogradov theorem in which the moduli q are now allowed to get close to N rather than to \sqrt{N} (this strengthening is known as the Elliott-Halberstam conjecture), this was improved further to the striking result

p_{n+1} - p_n \leq 16,

thus there are infinitely many pairs of primes which differ by at most 16.  This is a remarkable “near miss” to the twin prime conjecture, though it seems clear that substantial new ideas would be needed to reduce 16 all the way to 2.

Let’s now discuss some of the ideas involved.  As with the previous arguments, the key idea is to find groups of integers which tend to contain more primes than average.  Suppose for instance one could find a certain random distribution of integers n where n, n+2, and n+6 each had a probability strictly greater than 1/3 of being prime.  (We choose these separations for our discussion because it is not possible to make three large prime numbers bunch up any closer than this; for instance, n, n+2, n+4 cannot all be prime for n>3, since at least one of these numbers must be divisible by 3.)  By linearity of expectation, we thus see that the expected number of primes in the set \{n,n+2,n+6\} exceeds 1; thus, with positive probability, there will be at least two primes in this set, which then necessarily differ by at most 6.
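The divisibility obstruction just described is easy to check mechanically.  The following small routine (an illustrative sketch; the “admissibility” terminology is standard in this subject, though not used above) verifies that the shifts \{0,2,4\} occupy every residue class modulo 3, whereas \{0,2,6\} leave at least one class free modulo every prime, which is the necessary condition for n, n+2, n+6 to have any chance of being simultaneously prime infinitely often.

```python
def is_admissible(offsets):
    """A tuple of shifts is admissible if, for every prime p, the shifts avoid at least
    one residue class mod p.  Only primes p <= len(offsets) need checking, since a
    larger prime cannot have all of its residue classes covered by so few shifts."""
    small_primes = [p for p in range(2, len(offsets) + 1)
                    if all(p % q for q in range(2, p))]
    for p in small_primes:
        if len({h % p for h in offsets}) == p:
            return False   # every class mod p is hit, so p divides one of n+h_1,...,n+h_k for every n
    return True

print(is_admissible((0, 2, 4)))   # False: modulo 3 the shifts land in classes 0, 2, 1, covering everything
print(is_admissible((0, 2, 6)))   # True: modulo 3 the shifts land in classes 0, 2, 0, leaving class 1 free
```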

Now, of course, the prime number theorem tells us that for n chosen uniformly at random from N to 2N, the probability that n, n+2, or n+6 is prime is only about 1/\log N.  So for this type of strategy to work, one would have to pick a highly non-uniform distribution for n, in which n, n+2, and n+6 are already close to being prime.  The extreme choice would be to pick n uniformly among all choices for which n, n+2, and n+6 are simultaneously prime, but we of course don’t even know that any such n exist (this is strictly harder than the twin prime conjecture!)  But what we can do instead is pick n uniformly among all choices such that n, n+2, and n+6 are almost prime (where we shall be vague for now about what “almost prime” means).  Thanks to sieve theory, we can assert the existence of many numbers n of this form, and get a good count as to how many there are.  Also, since the primes have positive density inside the almost primes, it is quite reasonable to expect that the conditional probabilities

{\Bbb P}( n \hbox{ prime } | n, n+2, n+6 \hbox{ almost prime} ),

{\Bbb P}( n+2 \hbox{ prime } | n, n+2, n+6 \hbox{ almost prime} ),

{\Bbb P}( n+6 \hbox{ prime } | n, n+2, n+6 \hbox{ almost prime} ),

are large.  Indeed, using sieve theory techniques (and the Bombieri-Vinogradov theorem), we can bound each of these probabilities from below by a positive constant (plus an o(1) error).  Unfortunately, even if we optimise the sieve that produces the almost primes, this constant is too small (typically one gets numbers of the order of 1/20 or so, rather than 1/3).  Assuming the Elliott-Halberstam conjecture (which allows us to raise the level of sieving substantially) yields a significant improvement, but one that still falls short of the desired goal.  One can of course also add more numbers to the mix than just n, n+2, n+6, e.g. looking at those n for which n+h_1,\ldots,n+h_k are simultaneously almost prime, for some suitably chosen h_1,\ldots,h_k; on the one hand this lowers the probability threshold that one needs to attain (from 1/3 to 1/k), but unfortunately this is more than cancelled out by the multidimensional sieving one needs to do when restricting all of these numbers to be almost prime.
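As an aside, one can check numerically that the primes really do have substantial density inside the almost primes, at least for a crude stand-in notion of almost primality (below it is taken to mean “no prime factor less than 100”, which is emphatically not the notion used by Goldston, Pintz, and Yıldırım); the difficulty discussed above is not whether these conditional probabilities are large in truth, but how large a lower bound for them one can rigorously extract from sieve theory.

```python
import math

def prime_flags_up_to(limit):
    """Sieve of Eratosthenes; flags[n] = 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

N, z = 10**6, 100        # arbitrary parameters; "almost prime" here means "no prime factor below z"
limit = 2 * N + 6
flags = prime_flags_up_to(limit)

rough = bytearray([1]) * (limit + 1)     # rough[m] = 1 when m has no prime factor below z
for p in range(2, z):
    if flags[p]:
        rough[p::p] = bytearray(len(range(p, limit + 1, p)))

triples = 0
prime_counts = [0, 0, 0]                 # how often n, n+2, n+6 are prime among surviving triples
for n in range(N, 2 * N):
    if rough[n] and rough[n + 2] and rough[n + 6]:
        triples += 1
        for i, h in enumerate((0, 2, 6)):
            prime_counts[i] += flags[n + h]

print(f"values of n with n, n+2, n+6 all {z}-rough: {triples}")
for i, h in enumerate((0, 2, 6)):
    print(f"conditional probability that n+{h} is prime: {prime_counts[i] / triples:.3f}")
print(f"unconditional prime density near N: {1 / math.log(N):.3f}")
```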

To get around this, a new idea was introduced: instead of requiring numbers such as n, n+2, and n+6 to separately be almost prime, ask instead for the product n(n+2)(n+6) to be almost prime (for a somewhat more relaxed notion of “almost prime”).  This turns out to be more efficient, as it lowers the number of summations involved in the sieve.  One has to carefully select how one defines almost prime here (it is roughly like asking for the product (n+h_1) \ldots (n+h_k) to have at most 2k+o(k) prime factors, with the o(k) term being remarkably crucial to the delicate analysis); but to cut a long story short, one can establish probability bounds of the form

{\Bbb P}( n+h_j \hbox{ prime } | (n+h_1) \ldots (n+h_k) \hbox{ almost prime} ) \geq \frac{c-o(1)}{k} (6)

for all 1 \leq j \leq k and some absolute constant c > 0.  [More formally, one needs to compute sums such as \sum_{n=1}^N \Lambda(n+h_j) \Lambda_R( (n+h_1) \ldots (n+h_k) )^2, where \Lambda is the von Mangoldt function and \Lambda_R is a Selberg sieve-type approximation to that function.]
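To convey the general shape of such weights, here is a toy implementation of the basic truncated divisor sum \Lambda_R(n) = \sum_{d | n, d \leq R} \mu(d) \log(R/d), which agrees with \Lambda(n) for 1 < n \leq R and equals \log R at every prime n > R, while assigning smaller (but often still positive) weight to almost primes; the weight actually used by Goldston, Pintz, and Yıldırım is a more elaborate variant of this (applied to the product (n+h_1) \ldots (n+h_k), with a higher power of the logarithm), so the snippet below is only meant as an illustration.

```python
import math

def mobius_up_to(limit):
    """Compute the Mobius function mu(n) for 1 <= n <= limit by sieving over primes."""
    mu = [1] * (limit + 1)
    is_comp = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_comp[p]:                       # p is prime
            for m in range(p, limit + 1, p):
                if m > p:
                    is_comp[m] = True
                mu[m] = -mu[m]                   # one more prime factor p
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0                        # p^2 divides m, so mu(m) = 0
    return mu

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def lambda_R(n, R, mu):
    """Truncated divisor sum: sum of mu(d) log(R/d) over divisors d <= R of n."""
    return sum(mu[d] * math.log(R / d)
               for d in range(1, min(n, R) + 1)
               if n % d == 0 and mu[d] != 0)

R = 30
mu = mobius_up_to(R)
for n in (7, 12, 30, 97, 101, 143):
    print(f"n = {n:4d}: Lambda(n) = {von_mangoldt(n):6.3f}   Lambda_R(n) = {lambda_R(n, R, mu):6.3f}")
```

For instance, the almost prime 143 = 11 x 13 (whose prime factors both exceed R^{1/2}) receives a positive weight even though \Lambda(143)=0, which is the sense in which \Lambda_R is spread out over almost primes rather than concentrated on the primes alone.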

As soon as one assumes any non-trivial portion of the Elliott-Halberstam conjecture, the quantity c in inequality (6) can be made to exceed 1 (for k large enough), leading to the conclusion that p_{n+1}-p_n = O(1) for infinitely many n, i.e. there exist infinitely many bounded prime gaps; pushing the machinery to its limit (taking \{h_1,\ldots,h_k\} = \{7,11,13,17,19,23\} to be the first six primes larger than 6), one obtains the bound of 16.  But without this conjecture, using just the Bombieri-Vinogradov theorem, one can (after optimising everything in sight) get c arbitrarily close to 1, but not quite exceeding 1.  To compensate for this, the authors also started looking at the nearby numbers n+h where h was not equal to any of h_1,\ldots,h_k.  Here, of course, there is no particularly good reason for n+h to be prime, since it does not appear as a factor in the almost prime quantity (n+h_1)\ldots(n+h_k); but one can show that, for generic values of h, one has

{\Bbb P}( n+h \hbox{ prime } | (n+h_1) \ldots (n+h_k) \hbox{ almost prime} ) \geq \frac{c'+o(1)}{\log N}

for some c'>0 (there is an additional singular series factor involving the prime factors of the differences h-h_j, which can be dealt with easily and which I am suppressing here).  Thus, for any \lambda > 0, linearity of expectation suggests that, conditioning on (n+h_1)\ldots(n+h_k) being almost prime, the expected number of h = O(\lambda \log N) (including the k values h_1,\ldots,h_k) for which n+h is prime is at least

k \cdot \frac{c-o(1)}{k} + \lambda \log N \cdot \frac{c'+o(1)}{\log N} = c + c' \lambda + o(1).

Since we can make c arbitrarily close to 1, the extra term c' \lambda can push this expectation to exceed 1 for any choice of \lambda, and this leads to the desired bound (5) for any \lambda > 0.  (The subsequent improvement to (5) proceeds by enlarging k substantially, and by a preliminary sieving of the small primes, but we will not discuss these technical details here.)
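Returning briefly to the tuple \{7,11,13,17,19,23\} that was used above to obtain the bound of 16 under the Elliott-Halberstam conjecture: as a quick check of the arithmetic (an illustrative snippet using the standard admissibility criterion), these shifts form an admissible 6-tuple of diameter 23-7=16, so that whenever two or more of n+7, n+11, \ldots, n+23 are prime, some pair of primes differs by at most 16.

```python
def is_admissible(offsets):
    """Check that for every prime p <= len(offsets) the shifts miss some residue class
    mod p (larger primes cannot cover all their classes with so few shifts)."""
    small_primes = [p for p in range(2, len(offsets) + 1)
                    if all(p % q for q in range(2, p))]
    return all(len({h % p for h in offsets}) < p for p in small_primes)

# Taking the shifts to be primes larger than k = 6 guarantees admissibility: no shift is
# divisible by any prime p <= 6, so the residue class 0 mod p is always left free.
H = (7, 11, 13, 17, 19, 23)
print("admissible:", is_admissible(H))
print("diameter:", max(H) - min(H))   # 16: any two of the shifted numbers differ by at most this
```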

It is tempting to continue to optimise these methods to improve the various constants, and I would imagine that the bound of 16, in particular, can be lowered somewhat (still assuming the Elliott-Halberstam conjecture).  But there seems to be a significant obstacle to pushing things all the way to 2.  Indeed, the parity problem tells us that for any reasonable definition of “almost prime” which is amenable to sieve theory, the primes themselves can have density at most 1/2 in these almost primes.  Since we need the density to exceed 1/k in order for the above argument to work, it is necessary to play with at least three numbers (e.g. n, n+2, n+6), which forces the bound on the prime gaps to be at least 6.  (Indeed, this bound has been obtained for semiprimes – which are subject to the same parity problem restriction as primes, but are slightly better distributed – by Goldston, Graham, Pintz, and Yıldırım.)  But it may be that a combination of these techniques with some substantially new ideas may push things even further.