You are currently browsing the category archive for the ‘paper’ category.

I’ve just uploaded to the arXiv my paper “Failure of the {L^1} pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map {T: X \rightarrow X} on a probability space {X = (X,\mu)}, then for any {f \in L^1(X)}, the averages {\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}} converge pointwise almost everywhere. (In the important case when the shift map {T} is ergodic, the pointwise limit is simply the mean {\int_X f\ d\mu} of the original function {f}.)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group {F_2} on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for {F_2}-actions {(T_g)_{g \in F_2}} on probability spaces {(X,\mu)}. For instance, for the spherical averaging operators

\displaystyle  {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}

(where {|g|} denotes the length of the reduced word that forms {g}), they showed that {{\mathcal A}_{2n} f} converged pointwise almost everywhere provided that {f} was in {L^p(X)} for some {p>1}. (The need to restrict to spheres of even radius can be seen by considering the action of {F_2} on the two-element set {\{0,1\}} in which both generators of {F_2} act by interchanging the elements, in which case {{\mathcal A}_n} is determined by the parity of {n}.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition {f \in L^p(X)} to the weaker condition {f \in L \log L(X)}.

The question remained open as to whether the pointwise ergodic theorem for {F_2}-actions held if one only assumed that {f} was in {L^1(X)}. Nevo and Stein were able to establish this for the Cesáro averages {\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}, but not for {{\mathcal A}_n} itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on {\ell^1(F_2)}, but due to the non-amenability of {F_2}, this inequality did not transfer to {L^1(X)} and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the {L^1} pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our {\ell^1(F_2)} maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an {L^1} ergodic theorem for iterates {P^n} of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the {F_2} setting, thus settling the problem in the negative:

Theorem 1 (Failure of {L^1} pointwise ergodic theorem) There exists a measure-preserving {F_2}-action on a probability space {X} and a non-negative function {f \in L^1(X)} such that {\sup_n {\mathcal A}_{2n} f(x) = +\infty} for almost every {x}.

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} such that {\sup_n P^n f(x) = +\infty} for almost every {x}. By some standard manipulations, it suffices to show that for any given {\alpha > 0} and {\varepsilon>0}, there exists a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} with {\|f\|_{L^1(X)} \leq \alpha}, such that {\sup_n P^n f \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Actually, it will be convenient to replace the Markov chain {(P^n f)_{n \geq 0}} with an ancient Markov chain {(f_n)_{n \in {\bf Z}}} – that is to say, a sequence of non-negative functions {f_n} for both positive and negative {f}, such that {f_{n+1} = P f_n} for all {n \in {\bf Z}}. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the {F_2} version of the argument can be run using infinitely ancient chains.)

For any {\alpha>0}, let {P(\alpha)} denote the claim that for any {\varepsilon>0}, there exists an ancient Markov chain {(f_n)_{n \in {\bf Z}}} with {\|f_n\|_{L^1(X)} = \alpha} such that {\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Clearly {P(1)} holds since we can just take {f_n=1} for all {n}. Our objective is to show that {P(\alpha)} holds for arbitrarily small {\alpha}. The heart of Ornstein’s argument is then the implication

\displaystyle  P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)

for any {0 < \alpha \leq 1}, which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain {(f_n)_{n \in {\bf Z}}} on some probability space {X} of total mass {\|f_n\|_{L^1(X)} = \alpha}, such that {\sup_n f_n} attains the value of {1} or greater almost everywhere. Assuming that the Markov process is irreducible, the {f_n} will eventually converge as {n \rightarrow \infty} to the constant value of {\|f_n\|_{L^1(X)}}, in particular its final state will essentially stay above {\alpha} (up to small errors).

Now suppose we duplicate the Markov process by replacing {X} with a double copy {X \times \{1,2\}} (giving {\{1,2\}} the uniform probability measure), and using the disjoint sum of the Markov operators on {X \times \{1\}} and {X \times \{2\}} as the propagator, so that there is no interaction between the two components of this new system. Then the functions {f'_n(x,i) := f_n(x) 1_{i=1}} form an ancient Markov chain of mass at most {\alpha/2} that lives solely in the first half {X \times \{1\}} of this copy, and {\sup_n f'_n} attains the value of {1} or greater on almost all of the first half {X \times \{1\}}, but is zero on the second half. The final state of {f'_n} will be to stay above {\alpha} in the first half {X \times \{1\}}, but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves {X \times \{1\}}, {X \times \{2\}} of the system (I mentally think of {X \times \{1\}} and {X \times \{2\}} as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain {(f'_n)_{n \in {\bf Z}}} in the previous example gets replaced by a slightly different ancient Markov chain {(f''_n)_{n \in {\bf Z}}} which is more or less identical with {f'_n} for negative times {n}, or for bounded positive times {n}, but for very large values of {n} the final state is now constant across the entire state space {X \times \{1,2\}}, and will stay above {\alpha/2} on this space.

Finally, we consider an ancient Markov chain {F_n} which is basically of the form

\displaystyle  F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}

for some large parameter {M} and for all {n \leq M} (the approximation becomes increasingly inaccurate for {n} much larger than {M}, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces {X \times \{1\}, X \times \{2\}}, but with the second copy delayed by a large time delay {M}, and also attenuated in amplitude by a factor of {1-\frac{\alpha}{2}}. The total mass of this process is now {\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}. Because of the {f''_n} component of {F_n}, we see that {\sup_n F_n} basically attains the value of {1} or greater on the first half {X \times \{1\}}. On the second half {X \times \{2\}}, we work with times {n} close to {M}. If {M} is large enough, {f''_n} would have averaged out to about {\alpha/2} at such times, but the {(1 - \frac{\alpha}{2}) f_{n-M}(x)} component can get as large as {1-\alpha/2} here. Summing (and continuing to ignore various epsilon losses), we see that {\sup_n F_n} can get as large as {1} on almost all of the second half of {X \times \{2\}}. This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages {{\mathcal A}_n} for a free group action can be lifted up to become powers {P^n} of a Markov operator, basically by randomly assigning a “velocity vector” {s \in \{a,b,a^{-1},b^{-1}\}} to one’s base point {x} and then applying the Markov process that moves {x} along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from {s} to {s^{-1}}). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of {F_2} systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with {F_2}-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case {P(1)}) comes from basically considering the action of {F_2} on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time {n=-\infty}, and only reaches the boundary of this ball at the time {n=0}.

Hoi Nguyen, Van Vu, and myself have just uploaded to the arXiv our paper “Random matrices: tail bounds for gaps between eigenvalues“. This is a followup paper to my recent paper with Van in which we showed that random matrices {M_n} of Wigner type (such as the adjacency matrix of an Erdös-Renyi graph) asymptotically almost surely had simple spectrum. In the current paper, we push the method further to show that the eigenvalues are not only distinct, but are (with high probability) separated from each other by some negative power {n^{-A}} of {n}. This follows the now standard technique of replacing any appearance of discrete Littlewood-Offord theory (a key ingredient in our previous paper) with its continuous analogue (inverse theorems for small ball probability). For general Wigner-type matrices {M_n} (in which the matrix entries are not normalised to have mean zero), we can use the inverse Littlewood-Offord theorem of Nguyen and Vu to obtain (under mild conditions on {M_n}) a result of the form

\displaystyle  {\bf P} (\lambda_{i+1}(M_n) - \lambda_i(M_n) \leq n^{-A} ) \leq n^{-B}

for any {B} and {i}, if {A} is sufficiently large depending on {B} (in a linear fashion), and {n} is sufficiently large depending on {B}. The point here is that {B} can be made arbitrarily large, and also that no continuity or smoothness hypothesis is made on the distribution of the entries. (In the continuous case, one can use the machinery of Wegner estimates to obtain results of this type, as was done in a paper of Erdös, Schlein, and Yau.)

In the mean zero case, it becomes more efficient to use an inverse Littlewood-Offord theorem of Rudelson and Vershynin to obtain (with the normalisation that the entries of {M_n} have unit variance, so that the eigenvalues of {M_n} are {O(\sqrt{n})} with high probability), giving the bound

\displaystyle  {\bf P} (\lambda_{i+1}(M_n) - \lambda_i(M_n) \leq \delta / \sqrt{n} ) \ll \delta \ \ \ \ \ (1)

for {\delta \geq n^{-O(1)}} (one also has good results of this type for smaller values of {\delta}). This is only optimal in the regime {\delta \sim 1}; we expect to establish some eigenvalue repulsion, improving the RHS to {\delta^2} for real matrices and {\delta^3} for complex matrices, but this appears to be a more difficult task (possibly requiring some quadratic inverse Littlewood-Offord theory, rather than just linear inverse Littlewood-Offord theory). However, we can get some repulsion if one works with larger gaps, getting a result roughly of the form

\displaystyle  {\bf P} (\lambda_{i+k}(M_n) - \lambda_i(M_n) \leq \delta / \sqrt{n} ) \ll \delta^{ck^2}

for any fixed {k \geq 1} and some absolute constant {c>0} (which we can asymptotically make to be {1/3} for large {k}, though it ought to be as large as {1}), by using a higher-dimensional version of the Rudelson-Vershynin inverse Littlewood-Offord theorem.

In the case of Erdös-Renyi graphs, we don’t have mean zero and the Rudelson-Vershynin Littlewood-Offord theorem isn’t quite applicable, but by working carefully through the approach based on the Nguyen-Vu theorem we can almost recover (1), except for a loss of {n^{o(1)}} on the RHS.

As a sample applications of the eigenvalue separation results, we can now obtain some information about eigenvectors; for instance, we can show that the components of the eigenvectors all have magnitude at least {n^{-A}} for some {A} with high probability. (Eigenvectors become much more stable, and able to be studied in isolation, once their associated eigenvalue is well separated from the other eigenvalues; see this previous blog post for more discussion.)

Kaisa Matomaki, Maksym Radziwill, and I have just uploaded to the arXiv our paper “An averaged form of Chowla’s conjecture“. This paper concerns a weaker variant of the famous conjecture of Chowla (discussed for instance in this previous post) that

\displaystyle  \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(X)

as {X \rightarrow \infty} for any distinct natural numbers {h_1,\dots,h_k}, where {\lambda} denotes the Liouville function. (One could also replace the Liouville function here by the Möbius function {\mu} and obtain a morally equivalent conjecture.) This conjecture remains open for any {k \geq 2}; for instance the assertion

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+2) = o(X)

is a variant of the twin prime conjecture (though possibly a tiny bit easier to prove), and is subject to the notorious parity barrier (as discussed in this previous post).

Our main result asserts, roughly speaking, that Chowla’s conjecture can be established unconditionally provided one has non-trivial averaging in the {h_1,\dots,h_k} parameters. More precisely, one has

Theorem 1 (Chowla on the average) Suppose {H = H(X) \leq X} is a quantity that goes to infinity as {X \rightarrow \infty} (but it can go to infinity arbitrarily slowly). Then for any fixed {k \geq 1}, we have

\displaystyle  \sum_{h_1,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k)| = o( H^k X ).

In fact, we can remove one of the averaging parameters and obtain

\displaystyle  \sum_{h_2,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n) \lambda(n+h_2) \dots \lambda(n+h_k)| = o( H^{k-1} X ).

Actually we can make the decay rate a bit more quantitative, gaining about {\frac{\log\log H}{\log H}} over the trivial bound. The key case is {k=2}; while the unaveraged Chowla conjecture becomes more difficult as {k} increases, the averaged Chowla conjecture does not increase in difficulty due to the increasing amount of averaging for larger {k}, and we end up deducing the higher {k} case of the conjecture from the {k=2} case by an elementary argument.

The proof of the theorem proceeds as follows. By exploiting the Fourier-analytic identity

\displaystyle  \int_{{\mathbf T}} (\int_{\mathbf R} |\sum_{x \leq n \leq x+H} f(n) e(\alpha n)|^2 dx)^2\ d\alpha

\displaystyle = \sum_{|h| \leq H} (H-|h|)^2 |\sum_n f(n) \overline{f}(n+h)|^2

(related to a standard Fourier-analytic identity for the Gowers {U^2} norm) it turns out that the {k=2} case of the above theorem can basically be derived from an estimate of the form

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o( H X )

uniformly for all {\alpha \in {\mathbf T}}. For “major arc” {\alpha}, close to a rational {a/q} for small {q}, we can establish this bound from a generalisation of a recent result of Matomaki and Radziwill (discussed in this previous post) on averages of multiplicative functions in short intervals. For “minor arc” {\alpha}, we can proceed instead from an argument of Katai and Bourgain-Sarnak-Ziegler (discussed in this previous post).

The argument also extends to other bounded multiplicative functions than the Liouville function. Chowla’s conjecture was generalised by Elliott, who roughly speaking conjectured that the {k} copies of {\lambda} in Chowla’s conjecture could be replaced by arbitrary bounded multiplicative functions {g_1,\dots,g_k} as long as these functions were far from a twisted Dirichlet character {n \mapsto \chi(n) n^{it}} in the sense that

\displaystyle  \sum_p \frac{1 - \hbox{Re} g(p) \overline{\chi(p) p^{it}}}{p} = +\infty. \ \ \ \ \ (1)

(This type of distance is incidentally now a fundamental notion in the Granville-Soundararajan “pretentious” approach to multiplicative number theory.) During our work on this project, we found that Elliott’s conjecture is not quite true as stated due to a technicality: one can cook up a bounded multiplicative function {g} which behaves like {n^{it_j}} on scales {n \sim N_j} for some {N_j} going to infinity and some slowly varying {t_j}, and such a function will be far from any fixed Dirichlet character whilst still having many large correlations (e.g. the pair correlations {\sum_{n \leq N_j} g(n+1) \overline{g(n)}} will be large). In our paper we propose a technical “fix” to Elliott’s conjecture (replacing (1) by a truncated variant), and show that this repaired version of Elliott’s conjecture is true on the average in much the same way that Chowla’s conjecture is. (If one restricts attention to real-valued multiplicative functions, then this technical issue does not show up, basically because one can assume without loss of generality that {t=0} in this case; we discuss this fact in an appendix to the paper.)

Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes“. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

\displaystyle  G(X) := \sup_{p_n, p_{n+1} \leq X} p_{n+1}-p_n

between primes up to {X} exhibited a lower bound of the shape

\displaystyle  G(X) \geq f(X) \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (1)

for some function {f(X)} that went to infinity as {X \rightarrow \infty}; this improved upon previous work of Rankin and other authors, who established the same bound but with {f(X)} replaced by a constant. (Again, see the previous post for a more detailed discussion.)

In our previous papers, we did not specify a particular growth rate for {f(X)}. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results both have ineffective constants, rendering our growth function {f(X)} similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking {k}-tuples of fairly large size (about {\log^c x} for some small {c}). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate {f(X)} of shape

\displaystyle  f(X) \asymp \frac{\log \log \log X}{\log\log\log\log X}, \ \ \ \ \ (2)

thus leading to a bound

\displaystyle  G(X) \gg \log X \frac{\log \log X}{\log\log\log X}

on the gaps between primes for large {X}; this is an unpublished calculation of James’.

In this paper we make a further refinement of this calculation to obtain a growth rate

\displaystyle  f(X) \asymp \log \log \log X \ \ \ \ \ (3)

leading to a bound of the form

\displaystyle  G(X) \geq c \log X \frac{\log \log X \log\log\log\log X}{\log\log\log X} \ \ \ \ \ (4)

for large {X} and some small constant {c}. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramer’s conjecture that {G(X)} is comparable to {\log^2 X}); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant {c} here can be replaced by an arbitrarily large constant {C}.

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in {[x,y]} by removing one residue class modulo {p} for all primes {p} in (say) {[x/4,x/2]}? Very roughly speaking, if one can solve this problem with {y = g(x) x}, then one can obtain a growth rate on {f(X)} of the shape {f(X) \sim g(\log X)}. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in {[x,y]}, but never mind this detail for now.)

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of {k}-tuples in {[x,y]} which contain at least {\gg \log k} primes, for {k} as large as {\log^c x} or so (so that {\log k} is about {\log\log x}). By considering {k}-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime {p} in {[x/4,x/2]} that capture about {\log\log x} primes. In principle, this means that union of all these residue classes can cover about {\frac{x}{\log x} \log\log x} primes, allowing one to take {g(x)} as large as {\log\log x}, which corresponds to (3). However, there is a catch: the residue classes for different primes {p} may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in expected number of times each prime would be shifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the {\log\log\log\log X} loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of {n} vertices using as few “edges” from a given hypergraph {H} as possible. If each edge has {m} vertices, then one certainly needs at least {n/m} edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph {H}. As before, random methods tend to require something like {\frac{n}{m} \log r} edges before one expects to cover, say {1-1/r} of the vertices.

However, it turns out (under reasonable hypotheses on {H}) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from {H} that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph {H}, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.

In our setup, the vertices are the primes in {[x,y]}, and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime {p}, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly the more useful ones for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).

Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than {\log\log x} primes of size {x} (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices have simple spectrum“. Recall that an {n \times n} Hermitian matrix is said to have simple eigenvalues if all of its {n} eigenvalues are distinct. This is a very typical property of matrices to have: for instance, as discussed in this previous post, in the space of all {n \times n} Hermitian matrices, the space of matrices without all eigenvalues simple has codimension three, and for real symmetric cases this space has codimension two. In particular, given any random matrix ensemble of Hermitian or real symmetric matrices with an absolutely continuous distribution, we conclude that random matrices drawn from this ensemble will almost surely have simple eigenvalues.

For discrete random matrix ensembles, though, the above argument breaks down, even though general universality heuristics predict that the statistics of discrete ensembles should behave similarly to those of continuous ensembles. A model case here is the adjacency matrix {M_n} of an Erdös-Rényi graph – a graph on {n} vertices in which any pair of vertices has an independent probability {p} of being in the graph. For the purposes of this paper one should view {p} as fixed, e.g. {p=1/2}, while {n} is an asymptotic parameter going to infinity. In this context, our main result is the following (answering a question of Babai):

Theorem 1 With probability {1-o(1)}, {M_n} has simple eigenvalues.

Our argument works for more general Wigner-type matrix ensembles, but for sake of illustration we will stick with the Erdös-Renyi case. Previous work on local universality for such matrix models (e.g. the work of Erdos, Knowles, Yau, and Yin) was able to show that any individual eigenvalue gap {\lambda_{i+1}(M)-\lambda_i(M)} did not vanish with probability {1-o(1)} (in fact {1-O(n^{-c})} for some absolute constant {c>0}), but because there are {n} different gaps that one has to simultaneously ensure to be non-zero, this did not give Theorem 1 as one is forced to apply the union bound.

Our argument in fact gives simplicity of the spectrum with probability {1-O(n^{-A})} for any fixed {A}; in a subsequent paper we also show that it gives a quantitative lower bound on the eigenvalue gaps (analogous to how many results on the singularity probability of random matrices can be upgraded to a bound on the least singular value).

The basic idea of argument can be sketched as follows. Suppose that {M_n} has a repeated eigenvalue {\lambda}. We split

\displaystyle M_n = \begin{pmatrix} M_{n-1} & X \\ X^T & 0 \end{pmatrix}

for a random {n-1 \times n-1} minor {M_{n-1}} and a random sign vector {X}; crucially, {X} and {M_{n-1}} are independent. If {M_n} has a repeated eigenvalue {\lambda}, then by the Cauchy interlacing law, {M_{n-1}} also has an eigenvalue {\lambda}. We now write down the eigenvector equation for {M_n} at {\lambda}:

\displaystyle \begin{pmatrix} M_{n-1} & X \\ X^T & 0 \end{pmatrix} \begin{pmatrix} v \\ a \end{pmatrix} = \lambda \begin{pmatrix} v \\ a \end{pmatrix}.

Extracting the top {n-1} coefficients, we obtain

\displaystyle (M_{n-1} - \lambda) v + a X = 0.

If we let {w} be the {\lambda}-eigenvector of {M_{n-1}}, then by taking inner products with {w} we conclude that

\displaystyle a (w \cdot X) = 0;

we typically expect {a} to be non-zero, in which case we arrive at

\displaystyle w \cdot X = 0.

In other words, in order for {M_n} to have a repeated eigenvalue, the top right column {X} of {M_n} has to be orthogonal to an eigenvector {w} of the minor {M_{n-1}}. Note that {X} and {w} are going to be independent (once we specify which eigenvector of {M_{n-1}} to take as {w}). On the other hand, thanks to inverse Littlewood-Offord theory (specifically, we use an inverse Littlewood-Offord theorem of Nguyen and Vu), we know that the vector {X} is unlikely to be orthogonal to any given vector {w} independent of {X}, unless the coefficients of {w} are extremely special (specifically, that most of them lie in a generalised arithmetic progression). The main remaining difficulty is then to show that eigenvectors of a random matrix are typically not of this special form, and this relies on a conditioning argument originally used by Komlós to bound the singularity probability of a random sign matrix. (Basically, if an eigenvector has this special form, then one can use a fraction of the rows and columns of the random matrix to determine the eigenvector completely, while still preserving enough randomness in the remaining portion of the matrix so that this vector will in fact not be an eigenvector with high probability.)

I’ve just uploaded to the arXiv my paper “The Elliott-Halberstam conjecture implies the Vinogradov least quadratic nonresidue conjecture“. As the title suggests, this paper links together the Elliott-Halberstam conjecture from sieve theory with the conjecture of Vinogradov concerning the least quadratic nonresidue {n(p)} of a prime {p}. Vinogradov established the bound

\displaystyle  n(p) \ll p^{\frac{1}{2\sqrt{e}}} \log^2 p \ \ \ \ \ (1)

and conjectured that

\displaystyle  n(p) \ll p^\varepsilon \ \ \ \ \ (2)

for any fixed {\varepsilon>0}. Unconditionally, the best result so far (up to logarithmic factors) that holds for all primes {p} is by Burgess, who showed that

\displaystyle  n(p) \ll p^{\frac{1}{4\sqrt{e}}+\varepsilon} \ \ \ \ \ (3)

for any fixed {\varepsilon>0}. See this previous post for a proof of these bounds.

In this paper, we show that the Vinogradov conjecture is a consequence of the Elliott-Halberstam conjecture. Using a variant of the argument, we also show that the “Type II” estimates established by Zhang and numerically improved by the Polymath8a project can be used to improve a little on the Vinogradov bound (1), to

\displaystyle  n(p) \ll p^{(\frac{1}{2}-\frac{1}{34})\frac{1}{\sqrt{e}} + \varepsilon},

although this falls well short of the Burgess bound. However, the method is somewhat different (although in both cases it is the Weil exponential sum bounds that are the source of the gain over (1)) and it is conceivable that a numerically stronger version of the Type II estimates could obtain results that are more competitive with the Burgess bound. At any rate, this demonstrates that the equidistribution estimates introduced by Zhang may have further applications beyond the family of results related to bounded gaps between primes.

The connection between the least quadratic nonresidue problem and Elliott-Halberstam is follows. Suppose for contradiction we can find a prime {q} with {n(q)} unusually large. Letting {\chi} be the quadratic character modulo {q}, this implies that the sums {\sum_{n \leq x} \chi(n)} are also unusually large for a significant range of {x} (e.g. {x < n(q)}), although the sum is also quite small for large {x} (e.g. {x > q}), due to the cancellation present in {\chi}. It turns out (by a sort of “uncertainty principle” for multiplicative functions, as per this paper of Granville and Soundararajan) that these facts force {\sum_{n\leq x} \chi(n) \Lambda(n)} to be unusually large in magnitude for some large {x} (with {q^C \leq x \leq q^{C'}} for two large absolute constants {C,C'}). By the periodicity of {\chi}, this means that

\displaystyle  \sum_{n\leq x} \chi(n) \Lambda(n+q)

must be unusually large also. However, because {n(q)} is large, one can factorise {\chi} as {f * 1} for a fairly sparsely supported function {f = \chi * \mu}. The Elliott-Halberstam conjecture, which controls the distribution of {\Lambda} in arithmetic progressions on the average can then be used to show that {\sum_{n \leq x} (f*1)(n) \Lambda(n+q)} is small, giving the required contradiction.

The implication involving Type II estimates is proven by a variant of the argument. If {n(q)} is large, then a character sum {\sum_{N\leq n \leq 2N} \chi(n)} is unusually large for a certain {N}. By multiplicativity of {\chi}, this shows that {\chi} correlates with {\chi * 1_{[N,2N]}}, and then by periodicity of {\chi}, this shows that {\chi(n)} correlates with {\chi * 1_{[N,2N]}(n+jq)} for various small {q}. By the Cauchy-Schwarz inequality (cf. this previous blog post), this implies that {\chi * 1_{[N,2N]}(n+jq)} correlates with {\chi * 1_{[N,2N]}(n+j'q)} for some distinct {j,j'}. But this can be ruled out by using Type II estimates.

I’ll record here a well-known observation concerning potential counterexamples to any improvement to the Burgess bound, that is to say an infinite sequence of primes {p} with {n(p) = p^{\frac{1}{4\sqrt{e}} + o(1)}}. Suppose we let {a(t)} be the asymptotic mean value of the quadratic character {\chi} at {p^t} and {b(t)} the mean value of {\chi \Lambda}; these quantities are defined precisely in my paper, but roughly speaking one can think of

\displaystyle  a(t) = \lim_{p \rightarrow \infty} \frac{1}{p^t} \sum_{n \leq p^t} \chi(n)


\displaystyle  b(t) = \lim_{p \rightarrow \infty} \frac{1}{p^t} \sum_{n \leq p^t} \chi(n) \Lambda(n).

Thanks to the basic Dirichlet convolution identity {\chi(n) \log(n) = \chi * \chi\Lambda(n)}, one can establish the Wirsing integral equation

\displaystyle  t a(t) = \int_0^t a(u) b(t-u)\ du

for all {t \geq 0}; see my paper for details (actually far sharper claims than this appear in previous work of Wirsing and Granville-Soundararajan). If we have an infinite sequence of counterexamples to any improvement to the Burgess bound, then we have

\displaystyle  a(t)=b(t) = 1 \hbox{ for } t < \frac{1}{4\sqrt{e}}

while from the Burgess exponential sum estimates we have

\displaystyle  a(t) = 0 \hbox{ for } t > \frac{1}{4}.

These two constraints, together with the Wirsing integral equation, end up determining {a} and {b} completely. It turns out that we must have

\displaystyle  b(t) = -1 \hbox{ for } \frac{1}{4\sqrt{e}} \leq t \leq \frac{1}{4}


\displaystyle  a(t) = 1 - 2 \log(4 \sqrt{e} t) \hbox{ for } \frac{1}{4\sqrt{e}} \leq t \leq \frac{1}{4}

and then for {t > \frac{1}{4}}, {b} evolves by the integral equation

\displaystyle  b(t) = \int_{1/4\sqrt{e}}^{1/4} b(t-u) \frac{2du}{u}.

For instance

\displaystyle  b(t) = 1 \hbox{ for } \frac{1}{4} < t \leq \frac{1}{2\sqrt{e}}

and then {b} oscillates in a somewhat strange fashion towards zero as {t \rightarrow \infty}. This very odd behaviour of {\sum_n \chi(n) \Lambda(n)} is surely impossible, but it seems remarkably hard to exclude it without invoking a strong hypothesis, such as GRH or the Elliott-Halberstam conjecture (or weaker versions thereof).

Tamar Ziegler and I have just uploaded to the arXiv our paper “Narrow progressions in the primes“, submitted to the special issue “Analytic Number Theory” in honor of the 60th birthday of Helmut Maier. The results here are vaguely reminiscent of the recent progress on bounded gaps in the primes, but use different methods.

About a decade ago, Ben Green and I showed that the primes contained arbitrarily long arithmetic progressions: given any {k}, one could find a progression {n, n+r, \dots, n+(k-1)r} with {r>0} consisting entirely of primes. In fact we showed the same statement was true if the primes were replaced by any subset of the primes of positive relative density.

A little while later, Tamar Ziegler and I obtained the following generalisation: given any {k} and any polynomials {P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}} with {P_1(0)=\dots=P_k(0)}, one could find a “polynomial progression” {n+P_1(r),\dots,n+P_k(r)} with {r>0} consisting entirely of primes. Furthermore, we could make this progression somewhat “narrow” by taking {r = n^{o(1)}} (where {o(1)} denotes a quantity that goes to zero as {n} goes to infinity). Again, the same statement also applies if the primes were replaced by a subset of positive relative density. My previous result with Ben corresponds to the linear case {P_i(r) = (i-1)r}.

In this paper we were able to make the progressions a bit narrower still: given any {k} and any polynomials {P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}} with {P_1(0)=\dots=P_k(0)}, one could find a “polynomial progression” {n+P_1(r),\dots,n+P_k(r)} with {r>0} consisting entirely of primes, and such that {r \leq \log^L n}, where {L} depends only on {k} and {P_1,\dots,P_k} (in fact it depends only on {k} and the degrees of {P_1,\dots,P_k}). The result is still true if the primes are replaced by a subset of positive density {\delta}, but unfortunately in our arguments we must then let {L} depend on {\delta}. However, in the linear case {P_i(r) = (i-1)r}, we were able to make {L} independent of {\delta} (although it is still somewhat large, of the order of {k 2^k}).

The polylogarithmic factor is somewhat necessary: using an upper bound sieve, one can easily construct a subset of the primes of density, say, {90\%}, whose arithmetic progressions {n,n+r,\dots,n+(k-1)r} of length {k} all obey the lower bound {r \gg \log^{k-1} n}. On the other hand, the prime tuples conjecture predicts that if one works with the actual primes rather than dense subsets of the primes, then one should have infinitely many length {k} arithmetic progressions of bounded width for any fixed {k}. The {k=2} case of this is precisely the celebrated theorem of Yitang Zhang that was the focus of the recently concluded Polymath8 project here. The higher {k} case is conjecturally true, but appears to be out of reach of known methods. (Using the multidimensional Selberg sieve of Maynard, one can get {m} primes inside an interval of length {O( \exp(O(m)) )}, but this is such a sparse set of primes that one would not expect to find even a progression of length three within such an interval.)

The argument in the previous paper was unable to obtain a polylogarithmic bound on the width of the progressions, due to the reliance on a certain technical “correlation condition” on a certain Selberg sieve weight {\nu}. This correlation condition required one to control arbitrarily long correlations of {\nu}, which was not compatible with a bounded value of {L} (particularly if one wanted to keep {L} independent of {\delta}).

However, thanks to recent advances in this area by Conlon, Fox, and Zhao (who introduced a very nice “densification” technique), it is now possible (in principle, at least) to delete this correlation condition from the arguments. Conlon-Fox-Zhao did this for my original theorem with Ben; and in the current paper we apply the densification method to our previous argument to similarly remove the correlation condition. This method does not fully eliminate the need to control arbitrarily long correlations, but allows most of the factors in such a long correlation to be bounded, rather than merely controlled by an unbounded weight such as {\nu}. This turns out to be significantly easier to control, although in the non-linear case we still unfortunately had to make {L} large compared to {\delta} due to a certain “clearing denominators” step arising from the complicated nature of the Gowers-type uniformity norms that we were using to control polynomial averages. We believe though that this an artefact of our method, and one should be able to prove our theorem with an {L} that is uniform in {\delta}.

Here is a simple instance of the densification trick in action. Suppose that one wishes to establish an estimate of the form

\displaystyle  {\bf E}_n {\bf E}_r f(n) g(n+r) h(n+r^2) = o(1) \ \ \ \ \ (1)

for some real-valued functions {f,g,h} which are bounded in magnitude by a weight function {\nu}, but which are not expected to be bounded; this average will naturally arise when trying to locate the pattern {(n,n+r,n+r^2)} in a set such as the primes. Here I will be vague as to exactly what range the parameters {n,r} are being averaged over. Suppose that the factor {g} (say) has enough uniformity that one can already show a smallness bound

\displaystyle  {\bf E}_n {\bf E}_r F(n) g(n+r) H(n+r^2) = o(1) \ \ \ \ \ (2)

whenever {F, H} are bounded functions. (One should think of {F,H} as being like the indicator functions of “dense” sets, in contrast to {f,h} which are like the normalised indicator functions of “sparse” sets). The bound (2) cannot be directly applied to control (1) because of the unbounded (or “sparse”) nature of {f} and {h}. However one can “densify” {f} and {h} as follows. Since {f} is bounded in magnitude by {\nu}, we can bound the left-hand side of (1) as

\displaystyle  {\bf E}_n \nu(n) | {\bf E}_r g(n+r) h(n+r^2) |.

The weight function {\nu} will be normalised so that {{\bf E}_n \nu(n) = O(1)}, so by the Cauchy-Schwarz inequality it suffices to show that

\displaystyle  {\bf E}_n \nu(n) | {\bf E}_r g(n+r) h(n+r^2) |^2 = o(1).

The left-hand side expands as

\displaystyle  {\bf E}_n {\bf E}_r {\bf E}_s \nu(n) g(n+r) h(n+r^2) g(n+s) h(n+s^2).

Now, it turns out that after an enormous (but finite) number of applications of the Cauchy-Schwarz inequality to steadily eliminate the {g,h} factors, as well as a certain “polynomial forms condition” hypothesis on {\nu}, one can show that

\displaystyle  {\bf E}_n {\bf E}_r {\bf E}_s (\nu-1)(n) g(n+r) h(n+r^2) g(n+s) h(n+s^2) = o(1).

(Because of the polynomial shifts, this requires a method known as “PET induction”, but let me skip over this point here.) In view of this estimate, we now just need to show that

\displaystyle  {\bf E}_n {\bf E}_r {\bf E}_s g(n+r) h(n+r^2) g(n+s) h(n+s^2) = o(1).

Now we can reverse the previous steps. First, we collapse back to

\displaystyle  {\bf E}_n | {\bf E}_r g(n+r) h(n+r^2) |^2 = o(1).

One can bound {|{\bf E}_r g(n+r) h(n+r^2)|} by {{\bf E}_r \nu(n+r) \nu(n+r^2)}, which can be shown to be “bounded on average” in a suitable sense (e.g. bounded {L^4} norm) via the aforementioned polynomial forms condition. Because of this and the Hölder inequality, the above estimate is equivalent to

\displaystyle  {\bf E}_n | {\bf E}_r g(n+r) h(n+r^2) | = o(1).

By setting {F} to be the signum of {{\bf E}_r g(n+r) h(n+r^2)}, this is equivalent to

\displaystyle  {\bf E}_n {\bf E}_r F(n) g(n+r) h(n+r^2) = o(1).

This is halfway between (1) and (2); the sparsely supported function {f} has been replaced by its “densification” {F}, but we have not yet densified {h} to {H}. However, one can shift {n} by {r^2} and repeat the above arguments to achieve a similar densificiation of {h}, at which point one has reduced (1) to (2).

Kevin Ford, Ben Green, Sergei Konyagin, and myself have just posted to the arXiv our preprint “Large gaps between consecutive prime numbers“. This paper concerns the “opposite” problem to that considered by the recently concluded Polymath8 project, which was concerned with very small values of the prime gap {p_{n+1}-p_n}. Here, we wish to consider the largest prime gap {G(X) = p_{n+1}-p_n} that one can find in the interval {[X] = \{1,\dots,X\}} as {X} goes to infinity.

Finding lower bounds on {G(X)} is more or less equivalent to locating long strings of consecutive composite numbers that are not too large compared to the length of the string. A classic (and quite well known) construction here starts with the observation that for any natural number {n}, the consecutive numbers {n!+2, n!+3,\dots,n!+n} are all composite, because each {n!+i}, {i=2,\dots,n} is divisible by some prime {p \leq n}, while being strictly larger than that prime {p}. From this and Stirling’s formula, it is not difficult to obtain the bound

\displaystyle G(X) \gg \frac{\log X}{\log\log X}. \ \ \ \ \ (1)


A more efficient bound comes from the prime number theorem: there are only {(1+o(1)) \frac{X}{\log X}} primes up to {X}, so just from the pigeonhole principle one can locate a string of consecutive composite numbers up to {X} of length at least {(1-o(1)) \log X}, thus

\displaystyle G(X) \gtrsim \log X \ \ \ \ \ (2)


where we use {X \gtrsim Y} or {Y \lesssim X} as shorthand for {X \geq (1-o(1)) Y} or {Y \leq (1+o(1)) X}.

What about upper bounds? The Cramér random model predicts that the primes up to {X} are distributed like a random subset {\{1,\dots,X\}} of density {1/\log X}. Using this model, Cramér arrived at the conjecture

\displaystyle G(X) \ll \log^2 X.

In fact, if one makes the extremely optimistic assumption that the random model perfectly describes the behaviour of the primes, one would arrive at the even more precise prediction

\displaystyle G(X) \sim \log^2 X.

However, it is no longer widely believed that this optimistic version of the conjecture is true, due to some additional irregularities in the primes coming from the basic fact that large primes cannot be divisible by very small primes. Using the Maier matrix method to capture some of this irregularity, Granville was led to the conjecture that

\displaystyle G(X) \gtrsim 2e^{-\gamma} \log^2 X

(note that {2e^{-\gamma} = 1.1229\dots} is slightly larger than {1}). For comparison, the known upper bounds on {G(X)} are quite weak; unconditionally one has {G(X) \ll X^{0.525}} by the work of Baker, Harman, and Pintz, and even on the Riemann hypothesis one only gets down to {G(X) \ll X^{1/2} \log X}, as shown by Cramér (a slight improvement is also possible if one additionally assumes the pair correlation conjecture; see this article of Heath-Brown and the references therein).

This conjecture remains out of reach of current methods. In 1931, Westzynthius managed to improve the bound (2) slightly to

\displaystyle G(X) \gg \frac{\log\log\log X}{\log\log\log\log X} \log X ,

which Erdös in 1935 improved to

\displaystyle G(X) \gg \frac{\log\log X}{(\log\log\log X)^2} \log X

and Rankin in 1938 improved slightly further to

\displaystyle G(X) \gtrsim c \frac{\log\log X (\log\log\log\log X)}{(\log\log\log X)^2} \log X \ \ \ \ \ (3)


with {c=1/3}. Remarkably, this rather strange bound then proved extremely difficult to advance further on; until recently, the only improvements were to the constant {c}, which was raised to {c=\frac{1}{2} e^\gamma} in 1963 by Schönhage, to {c= e^\gamma} in 1963 by Rankin, to {c = 1.31256 e^\gamma} by Maier and Pomerance, and finally to {c = 2e^\gamma} in 1997 by Pintz.

Erdös listed the problem of making {c} arbitrarily large one of his favourite open problems, even offering (“somewhat rashly”, in his words) a cash prize for the solution. Our main result answers this question in the affirmative:

Theorem 1 The bound (3) holds for arbitrarily large {c>0}.

In principle, we thus have a bound of the form

\displaystyle G(X) \geq f(X) \frac{\log\log X (\log\log\log\log X)}{(\log\log\log X)^2} \log X

for some {f(X)} that grows to infinity. Unfortunately, due to various sources of ineffectivity in our methods, we cannot provide any explicit rate of growth on {f(X)} at all.

We decided to announce this result the old-fashioned way, as part of a research lecture; more precisely, Ben Green announced the result in his ICM lecture this Tuesday. (The ICM staff have very efficiently put up video of his talks (and most of the other plenary and prize talks) online; Ben’s talk is here, with the announcement beginning at about 0:48. Note a slight typo in his slides, in that the exponent of {\log\log\log X} in the denominator is {3} instead of {2}.) Ben’s lecture slides may be found here.

By coincidence, an independent proof of this theorem has also been obtained very recently by James Maynard.

I discuss our proof method below the fold.

Read the rest of this entry »

I’ve just uploaded to the arXiv the D.H.J. Polymath paper “Variants of the Selberg sieve, and bounded intervals containing many primes“, which is the second paper to be produced from the Polymath8 project (the first one being discussed here). We’ll refer to this latter paper here as the Polymath8b paper, and the former as the Polymath8a paper. As with Polymath8a, the Polymath8b paper is concerned with the smallest asymptotic prime gap

\displaystyle H_1 := \liminf_{n \rightarrow \infty}(p_{n+1}-p_n),

where {p_n} denotes the {n^{th}} prime, as well as the more general quantities

\displaystyle H_m := \liminf_{n \rightarrow \infty}(p_{n+m}-p_n).

In the breakthrough paper of Goldston, Pintz, and Yildirim, the bound {H_1 \leq 16} was obtained under the strong hypothesis of the Elliott-Halberstam conjecture. An unconditional bound on {H_1}, however, remained elusive until the celebrated work of Zhang last year, who showed that

\displaystyle H_1 \leq 70{,}000{,}000.

The Polymath8a paper then improved this to {H_1 \leq 4{,}680}. After that, Maynard introduced a new multidimensional Selberg sieve argument that gave the substantial improvement

\displaystyle H_1 \leq 600

unconditionally, and {H_1 \leq 12} on the Elliott-Halberstam conjecture; furthermore, bounds on {H_m} for higher {m} were obtained for the first time, and specifically that {H_m \ll m^3 e^{4m}} for all {m \geq 1}, with the improvements {H_2 \leq 600} and {H_m \ll m^3 e^{2m}} on the Elliott-Halberstam conjecture. (I had independently discovered the multidimensional sieve idea, although I did not obtain Maynard’s specific numerical results, and my asymptotic bounds were a bit weaker.)

In Polymath8b, we obtain some further improvements. Unconditionally, we have {H_1 \leq 246} and {H_m \ll m e^{(4 - \frac{28}{157}) m}}, together with some explicit bounds on {H_2,H_3,H_4,H_5}; on the Elliott-Halberstam conjecture we have {H_m \ll m e^{2m}} and some numerical improvements to the {H_2,H_3,H_4,H_5} bounds; and assuming the generalised Elliott-Halberstam conjecture we have the bound {H_1 \leq 6}, which is best possible from sieve-theoretic methods thanks to the parity problem obstruction.

There were a variety of methods used to establish these results. Maynard’s paper obtained a criterion for bounding {H_m} which reduced to finding a good solution to a certain multidimensional variational problem. When the dimension parameter {k} was relatively small (e.g. {k \leq 100}), we were able to obtain good numerical solutions both by continuing the method of Maynard (using a basis of symmetric polynomials), or by using a Krylov iteration scheme. For large {k}, we refined the asymptotics and obtained near-optimal solutions of the variational problem. For the {H_1} bounds, we extended the reach of the multidimensional Selberg sieve (particularly under the assumption of the generalised Elliott-Halberstam conjecture) by allowing the function {F} in the multidimensional variational problem to extend to a larger region of space than was previously admissible, albeit with some tricky new constraints on {F} (and penalties in the variational problem). This required some unusual sieve-theoretic manipulations, notably an “epsilon trick”, ultimately relying on the elementary inequality {(a+b)^2 \geq a^2 + 2ab}, that allowed one to get non-trivial lower bounds for sums such as {\sum_n (a(n)+b(n))^2} even if the sum {\sum_n b(n)^2} had no non-trivial estimates available; and a way to estimate divisor sums such as {\sum_{n\leq x} \sum_{d|n} \lambda_d} even if {d} was permitted to be comparable to or even exceed {x}, by using the fundamental theorem of arithmetic to factorise {n} (after restricting to the case when {n} is almost prime). I hope that these sieve-theoretic tricks will be useful in future work in the subject.

With this paper, the Polymath8 project is almost complete; there is still a little bit of scope to push our methods further and get some modest improvement for instance to the {H_1 \leq 246} bound, but this would require a substantial amount of effort, and it is probably best to instead wait for some new breakthrough in the subject to come along. One final task we are performing is to write up a retrospective article on both the 8a and 8b experiences, an incomplete writeup of which can be found here. If anyone wishes to contribute some commentary on these projects (whether you were an active contributor, an occasional contributor, or a silent “lurker” in the online discussion), please feel free to do so in the comments to this post.

There are multiple purposes to this blog post.

The first purpose is to announce the uploading of the paper “New equidistribution estimates of Zhang type, and bounded gaps between primes” by D.H.J. Polymath, which is the main output of the Polymath8a project on bounded gaps between primes, to the arXiv, and to describe the main results of this paper below the fold.

The second purpose is to roll over the previous thread on all remaining Polymath8a-related matters (e.g. updates on the submission status of the paper) to a fresh thread. (Discussion of the ongoing Polymath8b project is however being kept on a separate thread, to try to reduce confusion.)

The final purpose of this post is to coordinate the writing of a retrospective article on the Polymath8 experience, which has been solicited for the Newsletter of the European Mathematical Society. I suppose that this could encompass both the Polymath8a and Polymath8b projects, even though the second one is still ongoing (but I think we will soon be entering the endgame there). I think there would be two main purposes of such a retrospective article. The first one would be to tell a story about the process of conducting mathematical research, rather than just describe the outcome of such research; this is an important aspect of the subject which is given almost no attention in most mathematical writing, and it would be good to be able to capture some sense of this process while memories are still relatively fresh. The other would be to draw some tentative conclusions with regards to what the strengths and weaknesses of a Polymath project are, and how appropriate such a format would be for other mathematical problems than bounded gaps between primes. In my opinion, the bounded gaps problem had some fairly unique features that made it particularly amenable to a Polymath project, such as (a) a high level of interest amongst the mathematical community in the problem; (b) a very focused objective (“improve {H}!”), which naturally provided an obvious metric to measure progress; (c) the modular nature of the project, which allowed for people to focus on one aspect of the problem only, and still make contributions to the final goal; and (d) a very reasonable level of ambition (for instance, we did not attempt to prove the twin prime conjecture, which in my opinion would make a terrible Polymath project at our current level of mathematical technology). This is not an exhaustive list of helpful features of the problem; I would welcome other diagnoses of the project by other participants.

With these two objectives in mind, I propose a format for the retrospective article consisting of a brief introduction to the polymath concept in general and the polymath8 project in particular, followed by a collection of essentially independent contributions by different participants on their own experiences and thoughts. Finally we could have a conclusion section in which we make some general remarks on the polymath project (such as the remarks above). I’ve started a dropbox subfolder for this article (currently in a very skeletal outline form only), and will begin writing a section on my own experiences; other participants are of course encouraged to add their own sections (it is probably best to create separate files for these, and then input them into the main file retrospective.tex, to reduce edit conflicts. If there are participants who wish to contribute but do not currently have access to the Dropbox folder, please email me and I will try to have you added (or else you can supply your thoughts by email, or in the comments to this post; we may have a section for shorter miscellaneous comments from more casual participants, for people who don’t wish to write a lengthy essay on the subject).

As for deadlines, the EMS Newsletter would like a submitted article by mid-April in order to make the June issue, but in the worst case, it will just be held over until the issue after that.

Read the rest of this entry »


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

Get every new post delivered to your Inbox.

Join 4,580 other followers