
William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models”. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes ${{\mathcal P} := \{2,3,5,\dots\}}$, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If ${x}$ is a large number, then the largest prime gap ${G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n}$ in ${[1,x]}$ is of size ${\asymp \log^2 x}$. (Granville refines this conjecture to ${\gtrsim \xi \log^2 x}$, where ${\xi := 2e^{-\gamma} = 1.1229\dots}$. Here we use the asymptotic notation ${X \gtrsim Y}$ for ${X \geq (1-o(1)) Y}$, ${X \sim Y}$ for ${X \gtrsim Y \gtrsim X}$, ${X \gg Y}$ for ${X \geq C^{-1} Y}$, and ${X \asymp Y}$ for ${X \gg Y \gg X}$.)

Conjecture 2 (Hardy-Littlewood conjecture) If ${\mathcal{H} := \{h_1,\dots,h_k\}}$ are fixed distinct integers, then the number of numbers ${n \in [1,x]}$ with ${n+h_1,\dots,n+h_k}$ all prime is ${({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}}$ as ${x \rightarrow \infty}$, where the singular series ${{\mathfrak S}(\mathcal{H})}$ is defined by the formula

$\displaystyle {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.$

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps ${p_{n+1}-p_n}$ with ${n \leq x}$ were asymptotically distributed according to an exponential distribution of mean ${\log x}$, in the sense that

$\displaystyle | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$ for any fixed ${\lambda \geq 0}$. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of ${\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}}$ for fixed ${k,\lambda}$, where ${p_n}$ ranges over the primes in ${[1,x]}$. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve”), which can be formulated as the assertion that

$\displaystyle 1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}$

when ${K}$ is even and

$\displaystyle 1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}$

when ${K}$ is odd, for any natural number ${N}$; setting ${N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}$ and taking means, one then gets upper and lower bounds for the probability that the interval ${(p_n, p_n + \lambda \log x)}$ is free of primes. The most difficult step is to control the mean values of the singular series ${{\mathfrak S}(\mathcal{H})}$ as ${{\mathcal H}}$ ranges over ${k}$-tuples in a fixed interval such as ${[0, \lambda \log x]}$.
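As a quick sanity check of the Bonferroni inequalities (this is not taken from the paper), one can verify them by brute force for small ${N}$ and ${K}$. The helper names `bonferroni_partial_sum` and `check_bonferroni` below are hypothetical and are introduced only for this illustration.

```python
from math import comb


def bonferroni_partial_sum(N, K):
    """Return sum_{k=0}^{K} (-1)^k * binom(N, k)."""
    return sum((-1) ** k * comb(N, k) for k in range(K + 1))


def check_bonferroni(max_N=30, max_K=12):
    """Check 1_{N=0} <= partial sum for even K, and >= for odd K."""
    for N in range(max_N + 1):
        indicator = 1 if N == 0 else 0
        for K in range(max_K + 1):
            s = bonferroni_partial_sum(N, K)
            if K % 2 == 0 and indicator > s:
                return False
            if K % 2 == 1 and indicator < s:
                return False
    return True
```

Calling `check_bonferroni()` should return `True`; the alternating pattern reflects the identity that for ${N \geq 1}$ the partial sum equals ${(-1)^K \binom{N-1}{K}}$.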

Heuristically, if one extrapolates the asymptotic (1) to the regime ${\lambda \asymp \log x}$, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below ${1}$ when ${\lambda}$ is significantly larger than ${\log x}$. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of ${\lambda}$, which is only enough to establish the far weaker bound ${G_{\mathcal P}(x) / \log x \rightarrow \infty}$, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on ${G_{\mathcal P}(x)}$). An inspection of the argument shows that if one wished to extend (1) to parameter choices ${\lambda}$ that were allowed to grow with ${x}$, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length ${k}$ of the tuple ${{\mathcal H} = (h_1,\dots,h_k)}$, as well as the magnitudes of the shifts ${h_1,\dots,h_k}$, were also allowed to grow with ${x}$. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramér’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

• If the Hardy-Littlewood conjecture is uniformly true for ${k}$-tuples of length ${k \ll \frac{\log x}{\log\log x}}$, and with shifts ${h_1,\dots,h_k}$ of size ${O( \log^2 x )}$, with a power savings in the error term, then ${G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}$.
• If the Hardy-Littlewood conjecture is “true on average” for ${k}$-tuples of length ${k \ll \frac{y}{\log x}}$ and shifts ${h_1,\dots,h_k}$ of size ${y}$ for all ${\log x \leq y \leq \log^2 x \log\log x}$, with a power savings in the error term, then ${G_{\mathcal P}(x) \gg \log^2 x}$.

In particular, we can recover Cramér’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with ${k}$ allowed to grow slowly with ${x}$. Again, the main difficulty is to accurately estimate average values of the singular series ${{\mathfrak S}({\mathcal H})}$. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

$\displaystyle V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)$

of the singular series, for a suitable cutoff ${z}$; it turns out that when studying prime tuples of size ${t}$, the most convenient cutoff ${z(t)}$ is the “Pólya magic cutoff”, defined as the largest prime for which

$\displaystyle \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)$

(this is well defined for ${t \geq e^2}$); by Mertens’ theorem, we have ${z(t) \sim t^{1/e^\gamma}}$. One can interpret ${V_{\mathcal H}(z)}$ probabilistically as

$\displaystyle V_{\mathcal H}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )$

where ${\mathcal{S}_z \subset {\bf Z}}$ is the randomly sifted set of integers formed by removing one residue class ${a_p \hbox{ mod } p}$ uniformly at random for each prime ${p \leq z}$. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes ${{\mathcal P}}$ behave in some approximate statistical sense like the random sifted set ${\mathcal{S}_z}$, and one can prove the above theorem by using the Bonferroni inequalities both for the primes ${{\mathcal P}}$ and for the random sifted set, and comparing the two (using an even ${K}$ for the sifted set and an odd ${K}$ for the primes in order to be able to combine the two together to get a useful bound).
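To make the probabilistic interpretation of the truncated product concrete, here is a minimal Python sketch (not from the paper) that computes the product over ${p \leq z}$ exactly and compares it against a Monte Carlo estimate of the probability that a tuple ${{\mathcal H}}$ survives the random sifting. The function names `primes_up_to`, `V` and `estimate_V` are hypothetical, chosen only for this illustration.

```python
import math
import random
from fractions import Fraction


def primes_up_to(z):
    """Sieve of Eratosthenes: all primes p <= z."""
    if z < 2:
        return []
    is_prime = [True] * (z + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(z) + 1):
        if is_prime[i]:
            for j in range(i * i, z + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def V(H, z):
    """Exact value of prod_{p <= z} (1 - |H mod p| / p)."""
    result = Fraction(1)
    for p in primes_up_to(z):
        result *= 1 - Fraction(len({h % p for h in H}), p)
    return result


def estimate_V(H, z, trials, seed=0):
    """Monte Carlo estimate of P(H is contained in the sifted set)."""
    rng = random.Random(seed)
    residues = [(p, {h % p for h in H}) for p in primes_up_to(z)]
    hits = 0
    for _ in range(trials):
        # H survives iff, for every p <= z, the removed class a_p
        # avoids all residues occupied by H modulo p.
        if all(rng.randrange(p) not in occupied for p, occupied in residues):
            hits += 1
    return hits / trials
```

For instance, with ${{\mathcal H} = \{0,2\}}$ and ${z = 5}$ the exact product is ${\frac{1}{2} \cdot \frac{1}{3} \cdot \frac{3}{5} = \frac{1}{10}}$, and the simulated frequency should be close to this.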

The proof of Theorem 3 ended up not using any properties of the set of primes ${{\mathcal P}}$ other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of ${{\mathcal P}}$ which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model ${{\mathcal C}}$ of the primes, in which each natural number ${n \geq 3}$ is selected for membership in ${{\mathcal C}}$ with an independent probability of ${1/\log n}$. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

$\displaystyle | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})$

and Cramér also showed that the largest gap ${G_{\mathcal C}(x)}$ was almost surely ${\sim \log^2 x}$. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series ${{\mathfrak S}({\mathcal H})}$ is absent.
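The Cramér model is easy to simulate at small scales. The following Python sketch (an illustration only, with hypothetical function names, and with ${x}$ far too small to say anything about asymptotics) samples the random set, compares its size with a crude numerical approximation to ${\int_2^x \frac{dt}{\log t}}$, and reports the largest gap.

```python
import math
import random


def sample_cramer_model(x, seed=0):
    """Include each n in [3, x] independently with probability 1/log n."""
    rng = random.Random(seed)
    return [n for n in range(3, x + 1) if rng.random() < 1 / math.log(n)]


def largest_gap(sorted_set):
    """Largest difference between consecutive elements (0 if fewer than 2)."""
    return max((b - a for a, b in zip(sorted_set, sorted_set[1:])), default=0)


def logarithmic_integral(x):
    """Crude midpoint-rule approximation of the integral of 1/log t on [2, x]."""
    return sum(1 / math.log(t + 0.5) for t in range(2, x))
```

One can then compare `largest_gap(sample_cramer_model(x))` with `math.log(x) ** 2` for various ${x}$ and seeds; at accessible scales the ratio is noticeably below ${1}$, since the lower order terms are still significant.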

Granville proposed a refinement ${{\mathcal G}}$ to Cramér’s random model ${{\mathcal C}}$ in which one first sieves out (in each dyadic interval ${[x,2x]}$) all residue classes ${0 \hbox{ mod } p}$ for ${p \leq A}$ for a certain threshold ${A = \log^{1-o(1)} x = o(\log x)}$, and then places each surviving natural number ${n}$ in ${{\mathcal G}}$ with an independent probability ${\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}$. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap ${G_{\mathcal G}(x)}$ in this model was almost surely ${\gtrsim \xi \log^2 x}$, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to ${10^{18}}$, for instance, have shown that ${G_{\mathcal P}(x)}$ never exceeds ${0.9206 \log^2 x}$ in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)
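For readers who want to experiment with the numerics at very small scales, here is a short Python sketch (an illustration only, with hypothetical function names) that computes Granville's constant ${\xi = 2e^{-\gamma}}$ and the ratio ${G_{\mathcal P}(x)/\log^2 x}$ using a basic sieve. It is of course nowhere near the range ${10^{18}}$ mentioned above.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant
XI = 2 * math.exp(-EULER_GAMMA)    # Granville's constant, about 1.1229


def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            for j in range(i * i, x + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def largest_prime_gap(x):
    """Largest p_{n+1} - p_n with both primes at most x."""
    primes = primes_up_to(x)
    return max((b - a for a, b in zip(primes, primes[1:])), default=0)


def cramer_ratio(x):
    """The ratio G_P(x) / log^2 x."""
    return largest_prime_gap(x) / math.log(x) ** 2
```

At ${x = 10^6}$ the ratio computed this way is well below both ${0.9206}$ and ${\xi}$, consistent with the remark that the extreme gaps predicted by Granville's analysis live in a very sparse region.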

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff ${A}$. After some experimentation, we were able to produce a tractable random model ${{\mathcal R}}$ for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of ${\gtrsim \xi \log^2 x}$ (we also get an upper bound of ${\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}}$ for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class ${a_p \hbox{ mod } p}$ uniformly at random for each prime ${p}$, and as before we let ${\mathcal{S}_z}$ be the sifted set of integers formed by deleting the residue classes ${a_p \hbox{ mod } p}$ with ${p \leq z}$. We then set

$\displaystyle {\mathcal R} := \{ n \geq e^2: n \in \mathcal{S}_{z(n)}\}$

with ${z(n)}$ given by Pólya’s magic cutoff (2) at ${t = n}$ (this is the cutoff that gives ${{\mathcal R}}$ a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

$\displaystyle \xi \log^2 x \lesssim G_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)$

and that the Hardy-Littlewood conjectures hold with power savings for ${k}$ up to ${\log^c x}$ for any fixed ${c < 1}$ and for shifts ${h_1,\dots,h_k}$ of size ${O(\log^c x)}$. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint ${c=1}$), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model ${{\mathcal R}}$ (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for ${G_{\mathcal R}(x)}$ that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo ${p}$ from an interval ${[0,y]}$ for all primes ${p}$ up to a given threshold. The lower bound in (3) relates to the case of deleting the ${0 \hbox{ mod } p}$ residue classes from ${[0,y]}$; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.
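The model ${{\mathcal R}}$ and the Pólya cutoff (2) can be sampled directly for small ${x}$. The sketch below is only an illustration under simplifying assumptions: the names `polya_cutoff` and `sample_R` are hypothetical, primality is tested by naive trial division, and the cutoff is recomputed from scratch for each ${n}$, which is wasteful but keeps the code short.

```python
import math
import random


def is_prime(p):
    """Naive trial-division primality test (adequate for small p)."""
    return p >= 2 and all(p % q for q in range(2, math.isqrt(p) + 1))


def polya_cutoff(t):
    """Largest prime z with prod_{p <= z} (1 - 1/p) >= 1/log t."""
    if t < math.e ** 2:
        raise ValueError("the cutoff is only defined for t >= e^2")
    target = 1 / math.log(t)
    product, cutoff, p = 1.0, None, 2
    while True:
        if is_prime(p):
            product *= 1 - 1 / p
            if product < target:
                break
            cutoff = p
        p += 1
    if cutoff is None:
        raise ValueError("the cutoff is only defined for t >= e^2")
    return cutoff


def sample_R(x, seed=0):
    """Sample the model up to x; returns (R, a) with a[p] the removed class."""
    rng = random.Random(seed)
    top = polya_cutoff(x)  # the cutoff is nondecreasing, so this covers all n <= x
    a = {p: rng.randrange(p) for p in range(2, top + 1) if is_prime(p)}
    R = []
    for n in range(8, x + 1):  # integers n >= e^2 (about 7.39)
        z = polya_cutoff(n)
        if all(n % p != a[p] for p in a if p <= z):
            R.append(n)
    return R, a
```

For example, `polya_cutoff(100)` returns `7`, since the product over ${p \leq 7}$ is ${48/210 \approx 0.2286 \geq 1/\log 100 \approx 0.2171}$ while including ${p = 11}$ drops it to about ${0.2078}$.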

The probabilistic analysis of ${{\mathcal R}}$ is somewhat more complicated than of ${{\mathcal C}}$ or ${{\mathcal G}}$ as there is now non-trivial coupling between the events ${n \in {\mathcal R}}$ as ${n}$ varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval ${[0,y]}$ and randomly deletes a residue class ${a_p \hbox{ mod } p}$ for each prime ${p}$ up to a given threshold. For very small ${p}$ this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized ${p}$, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes ${p}$, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements ${n \in [0,y]}$ to primes ${p}$ if ${n}$ falls in the random residue class ${a_p \hbox{ mod } p}$) turns out to give the best results.

Kevin Ford, Sergei Konyagin, James Maynard, Carl Pomerance, and I have uploaded to the arXiv our paper “Long gaps in sieved sets”, submitted to J. Europ. Math. Soc.

This paper originated from the MSRI program in analytic number theory last year, and was centred around variants of the question of finding large gaps between primes. As discussed for instance in this previous post, it is now known that within the set of primes ${{\mathcal P} = \{2,3,5,\dots\}}$, one can find infinitely many adjacent elements ${a,b}$ whose gap ${b-a}$ obeys a lower bound of the form

$\displaystyle b-a \gg \log a \frac{\log_2 a \log_4 a}{\log_3 a}$

where ${\log_k}$ denotes the ${k}$-fold iterated logarithm. This compares with the trivial bound of ${b-a \gg \log a}$ that one can obtain from the prime number theorem and the pigeonhole principle. Several years ago, Pomerance posed the question of whether analogous improvements to the trivial bound can be obtained for such sets as

$\displaystyle {\mathcal P}_2 = \{ n \in {\bf N}: n^2+1 \hbox{ prime} \}.$

Here there is the obvious initial issue that this set is not even known to be infinite (this is the fourth Landau problem), but let us assume for the sake of discussion that this set is indeed infinite, so that we have an infinite number of gaps to speak of. Standard sieve theory techniques give upper bounds for the density of ${{\mathcal P}_2}$ that are comparable (up to an absolute constant) to the prime number theorem bounds for ${{\mathcal P}}$, so again we can obtain a trivial bound of ${b-a \gg \log a}$ for the gaps of ${{\mathcal P}_2}$. In this paper we improve this to

$\displaystyle b-a \gg \log a \log^c_2 a$

for an absolute constant ${c>0}$; this is not as strong as the corresponding bound for ${{\mathcal P}}$, but still improves over the trivial bound. In fact we can handle more general “sifted sets” than just ${{\mathcal P}_2}$. Recall from the sieve of Eratosthenes that the elements of ${{\mathcal P}}$ in, say, the interval ${[x/2, x]}$ can be obtained by removing from ${[x/2, x]}$ one residue class modulo ${p}$ for each prime up to ${\sqrt{x}}$, namely the class ${0}$ mod ${p}$. In a similar vein, the elements of ${{\mathcal P}_2}$ in ${[x/2,x]}$ can be obtained by removing for each prime ${p}$ up to ${x}$ zero, one, or two residue classes modulo ${p}$, depending on whether ${-1}$ is a quadratic residue modulo ${p}$. On the average, one residue class will be removed (this is a very basic case of the Chebotarev density theorem), so this sieving system is “one-dimensional on the average”. Roughly speaking, our arguments apply to any other set of numbers arising from a sieving system that is one-dimensional on average. (One can consider other dimensions also, but unfortunately our methods seem to give results that are worse than a trivial bound when the dimension is less than or greater than one.)
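The sieving system for ${{\mathcal P}_2}$ described above can be tabulated directly: for each prime ${p}$ one lists the residues ${a}$ with ${a^2+1 \equiv 0 \hbox{ mod } p}$. The following Python sketch (with hypothetical function names, and brute force in place of any cleverer root finding) also computes the average number of classes removed, which should be close to one.

```python
import math


def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            for j in range(i * i, x + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def removed_classes(p):
    """Residues a mod p with a^2 + 1 divisible by p (found by brute force)."""
    return [a for a in range(p) if (a * a + 1) % p == 0]


def average_classes_removed(limit):
    """Average of len(removed_classes(p)) over primes p <= limit."""
    primes = primes_up_to(limit)
    return sum(len(removed_classes(p)) for p in primes) / len(primes)
```

One class is removed for ${p=2}$, two classes for ${p \equiv 1 \hbox{ mod } 4}$ (for instance ${2}$ and ${3}$ modulo ${5}$), and none for ${p \equiv 3 \hbox{ mod } 4}$.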

The standard “Erdős-Rankin” method for constructing long gaps between primes proceeds by trying to line up some residue classes modulo small primes ${p}$ so that they collectively occupy a long interval. Key tools in doing so are the smooth number estimates of de Bruijn and others, which among other things assert that if one removes from an interval such as ${[1,x]}$ all the residue classes ${0}$ mod ${p}$ for ${p}$ between ${x^{1/u}}$ and ${x}$ for some fixed ${u>1}$, then the set of survivors has exceptionally small density (roughly of the order of ${u^{-u}}$, with the precise density given by the Dickman function), in marked contrast to the situation in which one randomly removes one residue class for each such prime ${p}$, in which the density is more like ${1/u}$. One generally exploits this phenomenon to sieve out almost all the elements of a long interval using some of the primes available, and then uses the remaining primes to cover up the remaining elements that have not already been sifted out. In the more recent work on this problem, advanced combinatorial tools such as hypergraph covering lemmas are used for the latter task.
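The contrast between the two densities can be seen numerically even at small scales. The following Python sketch (illustrative only, with hypothetical function names) counts the survivors in ${[1,x]}$ after removing ${0}$ mod ${p}$ for all primes ${p}$ in ${(x^{1/u}, x]}$, and compares this with the expected density ${\prod_{x^{1/u} < p \leq x} (1-\frac{1}{p})}$ for randomly chosen classes.

```python
def largest_prime_factors(x):
    """largest[n] = largest prime factor of n for 2 <= n <= x (0 for n < 2)."""
    largest = [0] * (x + 1)
    for p in range(2, x + 1):
        if largest[p] == 0:  # p is prime: not yet marked by a smaller prime
            for m in range(p, x + 1, p):
                largest[m] = p  # later (larger) primes overwrite earlier ones
    return largest


def smooth_survivor_density(x, u):
    """Density in [1, x] of integers with no prime factor in (x^(1/u), x]."""
    y = x ** (1 / u)
    largest = largest_prime_factors(x)
    return sum(1 for n in range(1, x + 1) if largest[n] <= y) / x


def random_survivor_density(x, u):
    """Expected density if a random class is removed for each such prime."""
    y = x ** (1 / u)
    largest = largest_prime_factors(x)
    density = 1.0
    for p in range(2, x + 1):
        if largest[p] == p and p > y:  # p is a prime in (x^(1/u), x]
            density *= 1 - 1 / p
    return density
```

With ${x = 10^4}$ and ${u = 2}$ the first density is roughly ${0.3}$ (the Dickman function at ${2}$ is ${1 - \log 2 \approx 0.307}$) while the second is roughly ${0.5}$; the discrepancy becomes far more dramatic as ${u}$ grows.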

In the case of ${{\mathcal P}_2}$, there does not appear to be any analogue of smooth numbers, in the sense that there is no obvious way to arrange the residue classes so that they have significantly fewer survivors than a random arrangement. Instead we adopt the following semi-random strategy to cover an interval ${[1,y]}$ by residue classes. Firstly, we randomly remove residue classes for primes ${p}$ up to some intermediate threshold ${z}$ (smaller than ${y}$ by a logarithmic factor), leaving behind a preliminary sifted set ${S_{[2,z]}}$. Then, for each prime ${p}$ between ${z}$ and another intermediate threshold ${x/2}$, we remove a residue class mod ${p}$ that maximises (or nearly maximises) its intersection with ${S_{[2,z]}}$. This ends up reducing the number of survivors to be significantly below what one would achieve if one selects residue classes randomly, particularly if one also uses the hypergraph covering lemma from our previous paper. Finally, we cover each of the remaining survivors by a residue class from a remaining available prime.
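A toy version of the first two stages of this strategy can be sketched as follows. This is a heavily simplified illustration and not the construction in the paper: it removes a single class for every prime, it greedily maximises the intersection with the current survivors (rather than with the preliminary sifted set), it omits the hypergraph covering lemma, and the function name `sieve_interval` and the thresholds `z` and `w` are hypothetical.

```python
import math
import random
from collections import Counter


def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            for j in range(i * i, x + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def sieve_interval(y, z, w, greedy, seed=0):
    """Sift [1, y] by one class mod p for each prime p <= w.

    Primes p <= z always use a uniformly random class.  Primes in (z, w]
    use the class containing the most current survivors if `greedy` is
    true, and a uniformly random class otherwise.
    """
    rng = random.Random(seed)
    survivors = set(range(1, y + 1))
    for p in primes_up_to(w):
        if not survivors:
            break
        if greedy and p > z:
            a = Counter(n % p for n in survivors).most_common(1)[0][0]
        else:
            a = rng.randrange(p)
        survivors = {n for n in survivors if n % p != a}
    return survivors
```

Comparing `len(sieve_interval(y, z, w, True))` with `len(sieve_interval(y, z, w, False))` for the same seed shows how much the greedy stage helps; the final stage then needs one further unused prime per remaining survivor.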

Kevin Ford, James Maynard, and I have uploaded to the arXiv our preprint “Chains of large gaps between primes”. This paper was announced in our previous paper with Konyagin and Green, which was concerned with the largest gap

$\displaystyle G_1(X) := \max_{p_n, p_{n+1} \leq X} (p_{n+1} - p_n)$

between consecutive primes up to ${X}$, in which we improved the Rankin bound of

$\displaystyle G_1(X) \gg \log X \frac{\log_2 X \log_4 X}{(\log_3 X)^2}$

to

$\displaystyle G_1(X) \gg \log X \frac{\log_2 X \log_4 X}{\log_3 X}$

for large ${X}$ (where we use the abbreviations ${\log_2 X := \log\log X}$, ${\log_3 X := \log\log\log X}$, and ${\log_4 X := \log\log\log\log X}$). Here, we obtain an analogous result for the quantity

$\displaystyle G_k(X) := \max_{p_n, \dots, p_{n+k} \leq X} \min( p_{n+1} - p_n, p_{n+2}-p_{n+1}, \dots, p_{n+k} - p_{n+k-1} )$

which measures how far apart the gaps between chains of ${k}$ consecutive primes can be. Our main result is

$\displaystyle G_k(X) \gg \frac{1}{k^2} \log X \frac{\log_2 X \log_4 X}{\log_3 X}$

whenever ${X}$ is sufficiently large depending on ${k}$, with the implied constant here absolute (and effective). The factor of ${1/k^2}$ is inherent to the method, and related to the basic probabilistic fact that if one selects ${k}$ numbers at random from the unit interval ${[0,1]}$, then one expects the minimum gap between adjacent numbers to be about ${1/k^2}$ (i.e. smaller than the mean spacing of ${1/k}$ by an additional factor of ${1/k}$).
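The basic probabilistic fact behind the factor of ${1/k^2}$ is easy to check by simulation. The following Python sketch (with a hypothetical function name) estimates the mean of the minimum gap between adjacent points when ${k}$ points are selected uniformly at random from ${[0,1]}$.

```python
import random


def mean_min_gap(k, trials=2000, seed=0):
    """Monte Carlo estimate of E[min gap] for k uniform points in [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        points = sorted(rng.random() for _ in range(k))
        total += min(b - a for a, b in zip(points, points[1:]))
    return total / trials
```

Multiplying the output by ${k^2}$ should give a quantity close to ${1}$ for moderately large ${k}$, rather than something growing like ${k}$ as one would see if the minimum gap were comparable to the mean spacing.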

Our arguments combine those from the previous paper with the matrix method of Maier, who (in our notation) showed that

$\displaystyle G_k(X) \gg_k \log X \frac{\log_2 X \log_4 X}{(\log_3 X)^2}$

for an infinite sequence of ${X}$ going to infinity. (Maier needed to restrict to an infinite sequence to avoid Siegel zeroes, but we are able to resolve this issue by the now standard technique of simply eliminating a prime factor of an exceptional conductor from the sieve-theoretic portion of the argument. As a byproduct, this also makes all of the estimates in our paper effective.)
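To make the quantity ${G_k(X)}$ concrete, here is a brute-force Python sketch that evaluates it from the definition for small ${X}$. The function names are hypothetical, and this has nothing to do with how the lower bounds are actually proven.

```python
import math


def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            for j in range(i * i, x + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def G(k, X):
    """Max over n of the minimum of the k consecutive gaps p_{n+1}-p_n, ...,
    p_{n+k}-p_{n+k-1}, with all primes involved at most X (0 if none)."""
    primes = primes_up_to(X)
    gaps = [b - a for a, b in zip(primes, primes[1:])]
    if len(gaps) < k:
        return 0
    return max(min(gaps[i:i + k]) for i in range(len(gaps) - k + 1))
```

For instance `G(1, 100)` is `8` (from the primes ${89, 97}$), while `G(2, 100)` is `6` (from the chain ${47, 53, 59}$).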

As its name suggests, the Maier matrix method is usually presented by imagining a matrix of numbers, and using information about the distribution of primes in the columns of this matrix to deduce information about the primes in at least one of the rows of the matrix. We found it convenient to interpret this method in an equivalent probabilistic form as follows. Suppose one wants to find an interval ${n+1,\dots,n+y}$ which contains a block of at least ${k}$ primes, each separated from each other by at least ${g}$ (ultimately, ${y}$ will be something like ${\log X \frac{\log_2 X \log_4 X}{\log_3 X}}$ and ${g}$ something like ${y/k^2}$). One can do this by the probabilistic method: pick ${n}$ to be a random large natural number ${{\mathbf n}}$ (with the precise distribution to be chosen later), and try to lower bound the probability that the interval ${{\mathbf n}+1,\dots,{\mathbf n}+y}$ contains at least ${k}$ primes, no two of which are within ${g}$ of each other.

By carefully choosing the residue class of ${{\mathbf n}}$ with respect to small primes, one can eliminate several of the ${{\mathbf n}+j}$ from consideration of being prime immediately. For instance, if ${{\mathbf n}}$ is chosen to be large and even, then the ${{\mathbf n}+j}$ with ${j}$ even have no chance of being prime and can thus be eliminated; similarly if ${{\mathbf n}}$ is large and odd, then ${{\mathbf n}+j}$ cannot be prime for any odd ${j}$. Using the methods of our previous paper, we can find a residue class ${m \hbox{ mod } P}$ (where ${P}$ is a product of a large number of primes) such that, if one chooses ${{\mathbf n}}$ to be a large random element of ${m \hbox{ mod } P}$ (that is, ${{\mathbf n} = {\mathbf z} P + m}$ for some large random integer ${{\mathbf z}}$), then the set ${{\mathcal T}}$ of shifts ${j \in \{1,\dots,y\}}$ for which ${{\mathbf n}+j}$ still has a chance of being prime has size comparable to something like ${k \log X / \log_2 X}$; furthermore this set ${{\mathcal T}}$ is fairly well distributed in ${\{1,\dots,y\}}$ in the sense that it does not concentrate too strongly in any short subinterval of ${\{1,\dots,y\}}$. The main new difficulty, not present in the previous paper, is to get lower bounds on the size of ${{\mathcal T}}$ in addition to upper bounds, but this turns out to be achievable by a suitable modification of the arguments.
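The set ${{\mathcal T}}$ of surviving shifts is simple to compute once the residue class ${m \hbox{ mod } P}$ is given. The Python sketch below (with a hypothetical function name) does this for an arbitrary choice of ${m}$ and ${P}$; the substance of the argument lies in choosing ${m}$ cleverly, which this toy example does not attempt.

```python
from math import gcd


def surviving_shifts(m, P, y):
    """Shifts j in {1, ..., y} with gcd(m + j, P) = 1.

    If n = m mod P and n is larger than every prime factor of P, these are
    exactly the shifts for which n + j is not divisible by any prime
    dividing P, so that n + j still has a chance of being prime.
    """
    return [j for j in range(1, y + 1) if gcd(m + j, P) == 1]
```

With ${P = 2}$ this recovers the parity example above: `surviving_shifts(0, 2, 6)` is `[1, 3, 5]` and `surviving_shifts(1, 2, 6)` is `[2, 4, 6]`.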

Using a version of the prime number theorem in arithmetic progressions due to Gallagher, one can show that for each remaining shift ${j \in {\mathcal T}}$, ${{\mathbf n}+j}$ is going to be prime with probability comparable to ${\log_2 X / \log X}$, so one expects about ${k}$ primes in the set ${\{{\mathbf n} + j: j \in {\mathcal T}\}}$. An upper bound sieve (e.g. the Selberg sieve) also shows that for any distinct ${j,j' \in {\mathcal T}}$, the probability that ${{\mathbf n}+j}$ and ${{\mathbf n}+j'}$ are both prime is ${O( (\log_2 X / \log X)^2 )}$. Using this and some routine second moment calculations, one can then show that with large probability, the set ${\{{\mathbf n} + j: j \in {\mathcal T}\}}$ will indeed contain about ${k}$ primes, no two of which are closer than ${g}$ to each other; with no other numbers in this interval being prime, this gives a lower bound on ${G_k(X)}$.

Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes”. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

$\displaystyle G(X) := \sup_{p_n, p_{n+1} \leq X} p_{n+1}-p_n$

between primes up to ${X}$ exhibited a lower bound of the shape

$\displaystyle G(X) \geq f(X) \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (1)$

for some function ${f(X)}$ that went to infinity as ${X \rightarrow \infty}$; this improved upon previous work of Rankin and other authors, who established the same bound but with ${f(X)}$ replaced by a constant. (Again, see the previous post for a more detailed discussion.)

In our previous papers, we did not specify a particular growth rate for ${f(X)}$. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results have ineffective constants, rendering our growth function ${f(X)}$ similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking ${k}$-tuples of fairly large size (about ${\log^c x}$ for some small ${c}$). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate ${f(X)}$ of shape

$\displaystyle f(X) \asymp \frac{\log \log \log X}{\log\log\log\log X}, \ \ \ \ \ (2)$

leading to a bound of the form

$\displaystyle G(X) \gg \log X \frac{\log \log X}{\log\log\log X}$

on the gaps between primes for large ${X}$; this is an unpublished calculation of James’.

In this paper we make a further refinement of this calculation to obtain a growth rate

$\displaystyle f(X) \asymp \log \log \log X \ \ \ \ \ (3)$

leading to a bound of the form

$\displaystyle G(X) \geq c \log X \frac{\log \log X \log\log\log\log X}{\log\log\log X} \ \ \ \ \ (4)$

for large ${X}$ and some small constant ${c}$. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramér’s conjecture that ${G(X)}$ is comparable to ${\log^2 X}$); in the spirit of Erdős’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant ${c}$ here can be replaced by an arbitrarily large constant ${C}$.

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in ${[x,y]}$ by removing one residue class modulo ${p}$ for all primes ${p}$ in (say) ${[x/4,x/2]}$? Very roughly speaking, if one can solve this problem with ${y = g(x) x}$, then one can obtain a growth rate on ${f(X)}$ of the shape ${f(X) \sim g(\log X)}$. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in ${[x,y]}$, but never mind this detail for now.)

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of ${k}$-tuples in ${[x,y]}$ which contain at least ${\gg \log k}$ primes, for ${k}$ as large as ${\log^c x}$ or so (so that ${\log k}$ is about ${\log\log x}$). By considering ${k}$-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime ${p}$ in ${[x/4,x/2]}$ that capture about ${\log\log x}$ primes. In principle, this means that the union of all these residue classes can cover about ${\frac{x}{\log x} \log\log x}$ primes, allowing one to take ${g(x)}$ as large as ${\log\log x}$, which corresponds to (3). However, there is a catch: the residue classes for different primes ${p}$ may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in the expected number of times each prime would be sifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the ${\log\log\log\log X}$ loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of ${n}$ vertices using as few “edges” from a given hypergraph ${H}$ as possible. If each edge has ${m}$ vertices, then one certainly needs at least ${n/m}$ edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph ${H}$. As before, random methods tend to require something like ${\frac{n}{m} \log r}$ edges before one expects to cover, say ${1-1/r}$ of the vertices.
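The logarithmic loss in purely random covering is easy to observe in a toy simulation. In the following Python sketch (illustrative only; the "hypergraph" here is simply the collection of all ${m}$-element subsets of the vertices, and the function name is hypothetical) one covers ${n}$ vertices with uniformly random edges of size ${m}$ and measures the fraction left uncovered.

```python
import math
import random


def uncovered_fraction(n, m, num_edges, seed=0):
    """Fraction of n vertices missed by num_edges random m-element edges."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(num_edges):
        covered.update(rng.sample(range(n), m))
    return 1 - len(covered) / n
```

With only ${n/m}$ random edges about a proportion ${1/e}$ of the vertices remain uncovered, and one needs about ${\frac{n}{m} \log r}$ edges to bring this down to ${1/r}$, since each vertex is missed with probability ${(1 - \frac{m}{n})^{E} \approx e^{-Em/n}}$ after ${E}$ edges.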

However, it turns out to be possible (under reasonable hypotheses on ${H}$) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from ${H}$ that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph ${H}$, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.
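A toy version of the nibbling process can be sketched in a few lines of Python. This is only a caricature under stated assumptions: the hypergraph is a pool of uniformly random ${m}$-element edges, collisions inside a single nibble are simply skipped rather than controlled, none of the uniformity analysis of Pippenger and Spencer is reproduced, and the name `nibble_cover` is hypothetical.

```python
import random


def nibble_cover(n, m, pool_size, nibble_size, seed=0):
    """Repeatedly take small random batches of edges, discarding every
    remaining edge that meets the vertices covered so far.

    Returns (chosen, covered): the pairwise disjoint chosen edges and the
    set of vertices they cover.
    """
    rng = random.Random(seed)
    pool = [frozenset(rng.sample(range(n), m)) for _ in range(pool_size)]
    covered, chosen = set(), []
    while pool:
        # One "nibble": a small random batch of the surviving edges.
        batch = rng.sample(pool, min(nibble_size, len(pool)))
        for edge in batch:
            if covered.isdisjoint(edge):  # skip collisions within the batch
                chosen.append(edge)
                covered |= edge
        # Discard all edges that now overlap the covered vertices.
        pool = [edge for edge in pool if covered.isdisjoint(edge)]
    return chosen, covered
```

Since the chosen edges are pairwise disjoint, no edge is wasted: covering a proportion ${\theta}$ of the vertices uses exactly ${\theta n/m}$ edges, in contrast to the ${\frac{n}{m} \log r}$ edges needed by purely random selection.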

In our setup, the vertices are the primes in ${[x,y]}$, and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime ${p}$, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly the more useful ones for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).

Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than ${\log\log x}$ primes of size ${x}$ (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdős-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)