You are currently browsing the category archive for the ‘paper’ category.

I’ve just uploaded to the arXiv my paper “Cancellation for the multilinear Hilbert transform“, submitted to Collectanea Mathematica. This paper uses methods from additive combinatorics (and more specifically, the arithmetic regularity and counting lemmas from this paper of Ben Green and myself) to obtain a slight amount of progress towards the open problem of obtaining bounds for the trilinear and higher Hilbert transforms (as discussed in this previous blog post). For instance, the trilinear Hilbert transform

is not known to be bounded for any to , although it is conjectured to do so when and . (For well below , one can use additive combinatorics constructions to demonstrate unboundedness; see this paper of Demeter.) One can approach this problem by considering the truncated trilinear Hilbert transforms

for . It is not difficult to show that the boundedness of is equivalent to the boundedness of with bounds that are uniform in and . On the other hand, from Minkowski’s inequality and Hölder’s inequality one can easily obtain the *non-uniform* bound of for . The main result of this paper is a slight improvement of this trivial bound to as . Roughly speaking, the way this gain is established is as follows. First there are some standard time-frequency type reductions to reduce to the task of obtaining some non-trivial cancellation on a single “tree”. Using a “generalised von Neumann theorem”, we show that such cancellation will happen if (a discretised version of) one or more of the functions (or a dual function that it is convenient to test against) is small in the Gowers norm. However, the arithmetic regularity lemma alluded to earlier allows one to represent an arbitrary function , up to a small error, as the sum of such a “Gowers uniform” function, plus a structured function (or more precisely, an *irrational virtual nilsequence*). This effectively reduces the problem to that of establishing some cancellation in a single tree in the case when all functions involved are irrational virtual nilsequences. At this point, the contribution of each component of the tree can be estimated using the “counting lemma” from my paper with Ben. The main term in the asymptotics is a certain integral over a nilmanifold, but because the kernel in the trilinear Hilbert transform is odd, it turns out that this integral vanishes, giving the required cancellation.

The same argument works for higher order Hilbert transforms (and one can also replace the coefficients in these transforms with other rational constants). However, because the quantitative bounds in the arithmetic regularity and counting lemmas are so poor, it does not seem likely that one can use these methods to remove the logarithmic growth in entirely, and some additional ideas will be needed to resolve the full conjecture.

I’ve just uploaded to the arXiv my paper “Failure of the pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map on a probability space , then for any , the averages converge pointwise almost everywhere. (In the important case when the shift map is ergodic, the pointwise limit is simply the mean of the original function .)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for -actions on probability spaces . For instance, for the spherical averaging operators

(where denotes the length of the reduced word that forms ), they showed that converged pointwise almost everywhere provided that was in for some . (The need to restrict to spheres of even radius can be seen by considering the action of on the two-element set in which both generators of act by interchanging the elements, in which case is determined by the parity of .) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition to the weaker condition .

The question remained open as to whether the pointwise ergodic theorem for -actions held if one only assumed that was in . Nevo and Stein were able to establish this for the Cesáro averages , but not for itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on , but due to the non-amenability of , this inequality did not transfer to and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an ergodic theorem for iterates of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the setting, thus settling the problem in the negative:

Theorem 1 (Failure of pointwise ergodic theorem)There exists a measure-preserving -action on a probability space and a non-negative function such that for almost every .

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator on a probability space and a non-negative such that for almost every . By some standard manipulations, it suffices to show that for any given and , there exists a self-adjoint Markov operator on a probability space and a non-negative with , such that on a set of measure at least . Actually, it will be convenient to replace the Markov chain with an *ancient Markov chain* – that is to say, a sequence of non-negative functions for both positive and negative , such that for all . The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the version of the argument can be run using infinitely ancient chains.)

For any , let denote the claim that for any , there exists an ancient Markov chain with such that on a set of measure at least . Clearly holds since we can just take for all . Our objective is to show that holds for arbitrarily small . The heart of Ornstein’s argument is then the implication

for any , which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain on some probability space of total mass , such that attains the value of or greater almost everywhere. Assuming that the Markov process is irreducible, the will eventually converge as to the constant value of , in particular its final state will essentially stay above (up to small errors).

Now suppose we duplicate the Markov process by replacing with a double copy (giving the uniform probability measure), and using the disjoint sum of the Markov operators on and as the propagator, so that there is no interaction between the two components of this new system. Then the functions form an ancient Markov chain of mass at most that lives solely in the first half of this copy, and attains the value of or greater on almost all of the first half , but is zero on the second half. The final state of will be to stay above in the first half , but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves , of the system (I mentally think of and as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain in the previous example gets replaced by a slightly different ancient Markov chain which is more or less identical with for negative times , or for bounded positive times , but for very large values of the final state is now constant across the entire state space , and will stay above on this space.

Finally, we consider an ancient Markov chain which is basically of the form

for some large parameter and for all (the approximation becomes increasingly inaccurate for much larger than , but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces , but with the second copy delayed by a large time delay , and also attenuated in amplitude by a factor of . The total mass of this process is now . Because of the component of , we see that basically attains the value of or greater on the first half . On the second half , we work with times close to . If is large enough, would have averaged out to about at such times, but the component can get as large as here. Summing (and continuing to ignore various epsilon losses), we see that can get as large as on almost all of the second half of . This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages for a free group action can be lifted up to become powers of a Markov operator, basically by randomly assigning a “velocity vector” to one’s base point and then applying the Markov process that moves along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from to ). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with -measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case ) comes from basically considering the action of on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time , and only reaches the boundary of this ball at the time .

Hoi Nguyen, Van Vu, and myself have just uploaded to the arXiv our paper “Random matrices: tail bounds for gaps between eigenvalues“. This is a followup paper to my recent paper with Van in which we showed that random matrices of Wigner type (such as the adjacency matrix of an Erdös-Renyi graph) asymptotically almost surely had simple spectrum. In the current paper, we push the method further to show that the eigenvalues are not only distinct, but are (with high probability) separated from each other by some negative power of . This follows the now standard technique of replacing any appearance of discrete Littlewood-Offord theory (a key ingredient in our previous paper) with its continuous analogue (inverse theorems for small ball probability). For general Wigner-type matrices (in which the matrix entries are not normalised to have mean zero), we can use the inverse Littlewood-Offord theorem of Nguyen and Vu to obtain (under mild conditions on ) a result of the form

for any and , if is sufficiently large depending on (in a linear fashion), and is sufficiently large depending on . The point here is that can be made arbitrarily large, and also that no continuity or smoothness hypothesis is made on the distribution of the entries. (In the continuous case, one can use the machinery of Wegner estimates to obtain results of this type, as was done in a paper of Erdös, Schlein, and Yau.)

In the mean zero case, it becomes more efficient to use an inverse Littlewood-Offord theorem of Rudelson and Vershynin to obtain (with the normalisation that the entries of have unit variance, so that the eigenvalues of are with high probability), giving the bound

for (one also has good results of this type for smaller values of ). This is only optimal in the regime ; we expect to establish some eigenvalue repulsion, improving the RHS to for real matrices and for complex matrices, but this appears to be a more difficult task (possibly requiring some *quadratic* inverse Littlewood-Offord theory, rather than just *linear* inverse Littlewood-Offord theory). However, we can get some repulsion if one works with larger gaps, getting a result roughly of the form

for any fixed and some absolute constant (which we can asymptotically make to be for large , though it ought to be as large as ), by using a higher-dimensional version of the Rudelson-Vershynin inverse Littlewood-Offord theorem.

In the case of Erdös-Renyi graphs, we don’t have mean zero and the Rudelson-Vershynin Littlewood-Offord theorem isn’t quite applicable, but by working carefully through the approach based on the Nguyen-Vu theorem we can almost recover (1), except for a loss of on the RHS.

As a sample applications of the eigenvalue separation results, we can now obtain some information about *eigenvectors*; for instance, we can show that the components of the eigenvectors all have magnitude at least for some with high probability. (Eigenvectors become much more stable, and able to be studied in isolation, once their associated eigenvalue is well separated from the other eigenvalues; see this previous blog post for more discussion.)

Kaisa Matomaki, Maksym Radziwill, and I have just uploaded to the arXiv our paper “An averaged form of Chowla’s conjecture“. This paper concerns a weaker variant of the famous conjecture of Chowla (discussed for instance in this previous post) that

as for any distinct natural numbers , where denotes the Liouville function. (One could also replace the Liouville function here by the Möbius function and obtain a morally equivalent conjecture.) This conjecture remains open for any ; for instance the assertion

is a variant of the twin prime conjecture (though possibly a tiny bit easier to prove), and is subject to the notorious parity barrier (as discussed in this previous post).

Our main result asserts, roughly speaking, that Chowla’s conjecture can be established unconditionally provided one has non-trivial averaging in the parameters. More precisely, one has

Theorem 1 (Chowla on the average)Suppose is a quantity that goes to infinity as (but it can go to infinity arbitrarily slowly). Then for any fixed , we haveIn fact, we can remove one of the averaging parameters and obtain

Actually we can make the decay rate a bit more quantitative, gaining about over the trivial bound. The key case is ; while the unaveraged Chowla conjecture becomes more difficult as increases, the averaged Chowla conjecture does not increase in difficulty due to the increasing amount of averaging for larger , and we end up deducing the higher case of the conjecture from the case by an elementary argument.

The proof of the theorem proceeds as follows. By exploiting the Fourier-analytic identity

(related to a standard Fourier-analytic identity for the Gowers norm) it turns out that the case of the above theorem can basically be derived from an estimate of the form

uniformly for all . For “major arc” , close to a rational for small , we can establish this bound from a generalisation of a recent result of Matomaki and Radziwill (discussed in this previous post) on averages of multiplicative functions in short intervals. For “minor arc” , we can proceed instead from an argument of Katai and Bourgain-Sarnak-Ziegler (discussed in this previous post).

The argument also extends to other bounded multiplicative functions than the Liouville function. Chowla’s conjecture was generalised by Elliott, who roughly speaking conjectured that the copies of in Chowla’s conjecture could be replaced by arbitrary bounded multiplicative functions as long as these functions were far from a twisted Dirichlet character in the sense that

(This type of distance is incidentally now a fundamental notion in the Granville-Soundararajan “pretentious” approach to multiplicative number theory.) During our work on this project, we found that Elliott’s conjecture is not quite true as stated due to a technicality: one can cook up a bounded multiplicative function which behaves like on scales for some going to infinity and some slowly varying , and such a function will be far from any fixed Dirichlet character whilst still having many large correlations (e.g. the pair correlations will be large). In our paper we propose a technical “fix” to Elliott’s conjecture (replacing (1) by a truncated variant), and show that this repaired version of Elliott’s conjecture is true on the average in much the same way that Chowla’s conjecture is. (If one restricts attention to real-valued multiplicative functions, then this technical issue does not show up, basically because one can assume without loss of generality that in this case; we discuss this fact in an appendix to the paper.)

Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes“. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

between primes up to exhibited a lower bound of the shape

for some function that went to infinity as ; this improved upon previous work of Rankin and other authors, who established the same bound but with replaced by a constant. (Again, see the previous post for a more detailed discussion.)

In our previous papers, we did not specify a particular growth rate for . In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results both have ineffective constants, rendering our growth function similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking -tuples of fairly large size (about for some small ). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate of shape

on the gaps between primes for large ; this is an unpublished calculation of James’.

In this paper we make a further refinement of this calculation to obtain a growth rate

leading to a bound of the form

for large and some small constant . Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramer’s conjecture that is comparable to ); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant here can be replaced by an arbitrarily large constant .

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in by removing one residue class modulo for all primes in (say) ? Very roughly speaking, if one can solve this problem with , then one can obtain a growth rate on of the shape . (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in , but never mind this detail for now.)

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of -tuples in which contain at least primes, for as large as or so (so that is about ). By considering -tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime in that capture about primes. In principle, this means that union of all these residue classes can cover about primes, allowing one to take as large as , which corresponds to (3). However, there is a catch: the residue classes for different primes may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in expected number of times each prime would be shifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of vertices using as few “edges” from a given hypergraph as possible. If each edge has vertices, then one certainly needs at least edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph . As before, random methods tend to require something like edges before one expects to cover, say of the vertices.

However, it turns out (under reasonable hypotheses on ) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph , one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.

In our setup, the vertices are the primes in , and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime , rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would *in principle* eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly the more useful ones for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the *analysis* of that method).

Our solution to this is to iteratively *reweight* the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than primes of size (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices have simple spectrum“. Recall that an Hermitian matrix is said to have simple eigenvalues if all of its eigenvalues are distinct. This is a very typical property of matrices to have: for instance, as discussed in this previous post, in the space of all Hermitian matrices, the space of matrices without all eigenvalues simple has codimension three, and for real symmetric cases this space has codimension two. In particular, given any random matrix ensemble of Hermitian or real symmetric matrices with an absolutely continuous distribution, we conclude that random matrices drawn from this ensemble will almost surely have simple eigenvalues.

For discrete random matrix ensembles, though, the above argument breaks down, even though general universality heuristics predict that the statistics of discrete ensembles should behave similarly to those of continuous ensembles. A model case here is the adjacency matrix of an Erdös-Rényi graph – a graph on vertices in which any pair of vertices has an independent probability of being in the graph. For the purposes of this paper one should view as fixed, e.g. , while is an asymptotic parameter going to infinity. In this context, our main result is the following (answering a question of Babai):

Our argument works for more general Wigner-type matrix ensembles, but for sake of illustration we will stick with the Erdös-Renyi case. Previous work on local universality for such matrix models (e.g. the work of Erdos, Knowles, Yau, and Yin) was able to show that any individual eigenvalue gap did not vanish with probability (in fact for some absolute constant ), but because there are different gaps that one has to simultaneously ensure to be non-zero, this did not give Theorem 1 as one is forced to apply the union bound.

Our argument in fact gives simplicity of the spectrum with probability for any fixed ; in a subsequent paper we also show that it gives a quantitative lower bound on the eigenvalue gaps (analogous to how many results on the singularity probability of random matrices can be upgraded to a bound on the least singular value).

The basic idea of argument can be sketched as follows. Suppose that has a repeated eigenvalue . We split

for a random minor and a random sign vector ; crucially, and are independent. If has a repeated eigenvalue , then by the Cauchy interlacing law, also has an eigenvalue . We now write down the eigenvector equation for at :

Extracting the top coefficients, we obtain

If we let be the -eigenvector of , then by taking inner products with we conclude that

we typically expect to be non-zero, in which case we arrive at

In other words, in order for to have a repeated eigenvalue, the top right column of has to be orthogonal to an eigenvector of the minor . Note that and are going to be independent (once we specify which eigenvector of to take as ). On the other hand, thanks to inverse Littlewood-Offord theory (specifically, we use an inverse Littlewood-Offord theorem of Nguyen and Vu), we know that the vector is unlikely to be orthogonal to any given vector independent of , unless the coefficients of are extremely special (specifically, that most of them lie in a generalised arithmetic progression). The main remaining difficulty is then to show that eigenvectors of a random matrix are typically not of this special form, and this relies on a conditioning argument originally used by Komlós to bound the singularity probability of a random sign matrix. (Basically, if an eigenvector has this special form, then one can use a fraction of the rows and columns of the random matrix to determine the eigenvector completely, while still preserving enough randomness in the remaining portion of the matrix so that this vector will in fact not be an eigenvector with high probability.)

I’ve just uploaded to the arXiv my paper “The Elliott-Halberstam conjecture implies the Vinogradov least quadratic nonresidue conjecture“. As the title suggests, this paper links together the Elliott-Halberstam conjecture from sieve theory with the conjecture of Vinogradov concerning the least quadratic nonresidue of a prime . Vinogradov established the bound

for any fixed . Unconditionally, the best result so far (up to logarithmic factors) that holds for all primes is by Burgess, who showed that

for any fixed . See this previous post for a proof of these bounds.

In this paper, we show that the Vinogradov conjecture is a consequence of the Elliott-Halberstam conjecture. Using a variant of the argument, we also show that the “Type II” estimates established by Zhang and numerically improved by the Polymath8a project can be used to improve a little on the Vinogradov bound (1), to

although this falls well short of the Burgess bound. However, the method is somewhat different (although in both cases it is the Weil exponential sum bounds that are the source of the gain over (1)) and it is conceivable that a numerically stronger version of the Type II estimates could obtain results that are more competitive with the Burgess bound. At any rate, this demonstrates that the equidistribution estimates introduced by Zhang may have further applications beyond the family of results related to bounded gaps between primes.

The connection between the least quadratic nonresidue problem and Elliott-Halberstam is follows. Suppose for contradiction we can find a prime with unusually large. Letting be the quadratic character modulo , this implies that the sums are also unusually large for a significant range of (e.g. ), although the sum is also quite small for large (e.g. ), due to the cancellation present in . It turns out (by a sort of “uncertainty principle” for multiplicative functions, as per this paper of Granville and Soundararajan) that these facts force to be unusually large in magnitude for some large (with for two large absolute constants ). By the periodicity of , this means that

must be unusually large also. However, because is large, one can factorise as for a fairly sparsely supported function . The Elliott-Halberstam conjecture, which controls the distribution of in arithmetic progressions on the average can then be used to show that is small, giving the required contradiction.

The implication involving Type II estimates is proven by a variant of the argument. If is large, then a character sum is unusually large for a certain . By multiplicativity of , this shows that correlates with , and then by periodicity of , this shows that correlates with for various small . By the Cauchy-Schwarz inequality (cf. this previous blog post), this implies that correlates with for some distinct . But this can be ruled out by using Type II estimates.

I’ll record here a well-known observation concerning potential counterexamples to any improvement to the Burgess bound, that is to say an infinite sequence of primes with . Suppose we let be the asymptotic mean value of the quadratic character at and the mean value of ; these quantities are defined precisely in my paper, but roughly speaking one can think of

and

Thanks to the basic Dirichlet convolution identity , one can establish the *Wirsing integral equation*

for all ; see my paper for details (actually far sharper claims than this appear in previous work of Wirsing and Granville-Soundararajan). If we have an infinite sequence of counterexamples to any improvement to the Burgess bound, then we have

while from the Burgess exponential sum estimates we have

These two constraints, together with the Wirsing integral equation, end up determining and completely. It turns out that we must have

and

and then for , evolves by the integral equation

For instance

and then oscillates in a somewhat strange fashion towards zero as . This very odd behaviour of is surely impossible, but it seems remarkably hard to exclude it without invoking a strong hypothesis, such as GRH or the Elliott-Halberstam conjecture (or weaker versions thereof).

Tamar Ziegler and I have just uploaded to the arXiv our paper “Narrow progressions in the primes“, submitted to the special issue “Analytic Number Theory” in honor of the 60th birthday of Helmut Maier. The results here are vaguely reminiscent of the recent progress on bounded gaps in the primes, but use different methods.

About a decade ago, Ben Green and I showed that the primes contained arbitrarily long arithmetic progressions: given any , one could find a progression with consisting entirely of primes. In fact we showed the same statement was true if the primes were replaced by any subset of the primes of positive relative density.

A little while later, Tamar Ziegler and I obtained the following generalisation: given any and any polynomials with , one could find a “polynomial progression” with consisting entirely of primes. Furthermore, we could make this progression somewhat “narrow” by taking (where denotes a quantity that goes to zero as goes to infinity). Again, the same statement also applies if the primes were replaced by a subset of positive relative density. My previous result with Ben corresponds to the linear case .

In this paper we were able to make the progressions a bit narrower still: given any and any polynomials with , one could find a “polynomial progression” with consisting entirely of primes, and such that , where depends only on and (in fact it depends only on and the degrees of ). The result is still true if the primes are replaced by a subset of positive density , but unfortunately in our arguments we must then let depend on . However, in the linear case , we were able to make independent of (although it is still somewhat large, of the order of ).

The polylogarithmic factor is somewhat necessary: using an upper bound sieve, one can easily construct a subset of the primes of density, say, , whose arithmetic progressions of length all obey the lower bound . On the other hand, the prime tuples conjecture predicts that if one works with the actual primes rather than dense subsets of the primes, then one should have infinitely many length arithmetic progressions of bounded width for any fixed . The case of this is precisely the celebrated theorem of Yitang Zhang that was the focus of the recently concluded Polymath8 project here. The higher case is conjecturally true, but appears to be out of reach of known methods. (Using the multidimensional Selberg sieve of Maynard, one can get primes inside an interval of length , but this is such a sparse set of primes that one would not expect to find even a progression of length three within such an interval.)

The argument in the previous paper was unable to obtain a polylogarithmic bound on the width of the progressions, due to the reliance on a certain technical “correlation condition” on a certain Selberg sieve weight . This correlation condition required one to control arbitrarily long correlations of , which was not compatible with a bounded value of (particularly if one wanted to keep independent of ).

However, thanks to recent advances in this area by Conlon, Fox, and Zhao (who introduced a very nice “densification” technique), it is now possible (in principle, at least) to delete this correlation condition from the arguments. Conlon-Fox-Zhao did this for my original theorem with Ben; and in the current paper we apply the densification method to our previous argument to similarly remove the correlation condition. This method does not fully eliminate the need to control arbitrarily long correlations, but allows most of the factors in such a long correlation to be *bounded*, rather than merely controlled by an unbounded weight such as . This turns out to be significantly easier to control, although in the non-linear case we still unfortunately had to make large compared to due to a certain “clearing denominators” step arising from the complicated nature of the Gowers-type uniformity norms that we were using to control polynomial averages. We believe though that this an artefact of our method, and one should be able to prove our theorem with an that is uniform in .

Here is a simple instance of the densification trick in action. Suppose that one wishes to establish an estimate of the form

for some real-valued functions which are bounded in magnitude by a weight function , but which are not expected to be bounded; this average will naturally arise when trying to locate the pattern in a set such as the primes. Here I will be vague as to exactly what range the parameters are being averaged over. Suppose that the factor (say) has enough uniformity that one can already show a smallness bound

whenever are bounded functions. (One should think of as being like the indicator functions of “dense” sets, in contrast to which are like the normalised indicator functions of “sparse” sets). The bound (2) cannot be directly applied to control (1) because of the unbounded (or “sparse”) nature of and . However one can “densify” and as follows. Since is bounded in magnitude by , we can bound the left-hand side of (1) as

The weight function will be normalised so that , so by the Cauchy-Schwarz inequality it suffices to show that

The left-hand side expands as

Now, it turns out that after an enormous (but finite) number of applications of the Cauchy-Schwarz inequality to steadily eliminate the factors, as well as a certain “polynomial forms condition” hypothesis on , one can show that

(Because of the polynomial shifts, this requires a method known as “PET induction”, but let me skip over this point here.) In view of this estimate, we now just need to show that

Now we can reverse the previous steps. First, we collapse back to

One can bound by , which can be shown to be “bounded on average” in a suitable sense (e.g. bounded norm) via the aforementioned polynomial forms condition. Because of this and the Hölder inequality, the above estimate is equivalent to

By setting to be the signum of , this is equivalent to

This is halfway between (1) and (2); the sparsely supported function has been replaced by its “densification” , but we have not yet densified to . However, one can shift by and repeat the above arguments to achieve a similar densificiation of , at which point one has reduced (1) to (2).

Kevin Ford, Ben Green, Sergei Konyagin, and myself have just posted to the arXiv our preprint “Large gaps between consecutive prime numbers“. This paper concerns the “opposite” problem to that considered by the recently concluded Polymath8 project, which was concerned with very small values of the prime gap . Here, we wish to consider the *largest* prime gap that one can find in the interval as goes to infinity.

Finding lower bounds on is more or less equivalent to locating long strings of consecutive composite numbers that are not too large compared to the length of the string. A classic (and quite well known) construction here starts with the observation that for any natural number , the consecutive numbers are all composite, because each , is divisible by some prime , while being strictly larger than that prime . From this and Stirling’s formula, it is not difficult to obtain the bound

A more efficient bound comes from the prime number theorem: there are only primes up to , so just from the pigeonhole principle one can locate a string of consecutive composite numbers up to of length at least , thus

where we use or as shorthand for or .

What about upper bounds? The *Cramér random model* predicts that the primes up to are distributed like a random subset of density . Using this model, Cramér arrived at the conjecture

In fact, if one makes the extremely optimistic assumption that the random model perfectly describes the behaviour of the primes, one would arrive at the even more precise prediction

However, it is no longer widely believed that this optimistic version of the conjecture is true, due to some additional irregularities in the primes coming from the basic fact that large primes cannot be divisible by very small primes. Using the Maier matrix method to capture some of this irregularity, Granville was led to the conjecture that

(note that is slightly larger than ). For comparison, the known upper bounds on are quite weak; unconditionally one has by the work of Baker, Harman, and Pintz, and even on the Riemann hypothesis one only gets down to , as shown by Cramér (a slight improvement is also possible if one additionally assumes the pair correlation conjecture; see this article of Heath-Brown and the references therein).

This conjecture remains out of reach of current methods. In 1931, Westzynthius managed to improve the bound (2) slightly to

which Erdös in 1935 improved to

and Rankin in 1938 improved slightly further to

with . Remarkably, this rather strange bound then proved extremely difficult to advance further on; until recently, the only improvements were to the constant , which was raised to in 1963 by Schönhage, to in 1963 by Rankin, to by Maier and Pomerance, and finally to in 1997 by Pintz.

Erdös listed the problem of making arbitrarily large one of his favourite open problems, even offering (“somewhat rashly”, in his words) a cash prize for the solution. Our main result answers this question in the affirmative:

Theorem 1The bound (3) holds for arbitrarily large .

In principle, we thus have a bound of the form

for some that grows to infinity. Unfortunately, due to various sources of ineffectivity in our methods, we cannot provide any explicit rate of growth on at all.

We decided to announce this result the old-fashioned way, as part of a research lecture; more precisely, Ben Green announced the result in his ICM lecture this Tuesday. (The ICM staff have very efficiently put up video of his talks (and most of the other plenary and prize talks) online; Ben’s talk is here, with the announcement beginning at about 0:48. Note a slight typo in his slides, in that the exponent of in the denominator is instead of .) Ben’s lecture slides may be found here.

By coincidence, an independent proof of this theorem has also been obtained very recently by James Maynard.

I discuss our proof method below the fold.

I’ve just uploaded to the arXiv the D.H.J. Polymath paper “Variants of the Selberg sieve, and bounded intervals containing many primes“, which is the second paper to be produced from the Polymath8 project (the first one being discussed here). We’ll refer to this latter paper here as the Polymath8b paper, and the former as the Polymath8a paper. As with Polymath8a, the Polymath8b paper is concerned with the smallest asymptotic prime gap

where denotes the prime, as well as the more general quantities

In the breakthrough paper of Goldston, Pintz, and Yildirim, the bound was obtained under the strong hypothesis of the Elliott-Halberstam conjecture. An unconditional bound on , however, remained elusive until the celebrated work of Zhang last year, who showed that

The Polymath8a paper then improved this to . After that, Maynard introduced a new multidimensional Selberg sieve argument that gave the substantial improvement

unconditionally, and on the Elliott-Halberstam conjecture; furthermore, bounds on for higher were obtained for the first time, and specifically that for all , with the improvements and on the Elliott-Halberstam conjecture. (I had independently discovered the multidimensional sieve idea, although I did not obtain Maynard’s specific numerical results, and my asymptotic bounds were a bit weaker.)

In Polymath8b, we obtain some further improvements. Unconditionally, we have and , together with some explicit bounds on ; on the Elliott-Halberstam conjecture we have and some numerical improvements to the bounds; and assuming the generalised Elliott-Halberstam conjecture we have the bound , which is best possible from sieve-theoretic methods thanks to the parity problem obstruction.

There were a variety of methods used to establish these results. Maynard’s paper obtained a criterion for bounding which reduced to finding a good solution to a certain multidimensional variational problem. When the dimension parameter was relatively small (e.g. ), we were able to obtain good numerical solutions both by continuing the method of Maynard (using a basis of symmetric polynomials), or by using a Krylov iteration scheme. For large , we refined the asymptotics and obtained near-optimal solutions of the variational problem. For the bounds, we extended the reach of the multidimensional Selberg sieve (particularly under the assumption of the generalised Elliott-Halberstam conjecture) by allowing the function in the multidimensional variational problem to extend to a larger region of space than was previously admissible, albeit with some tricky new constraints on (and penalties in the variational problem). This required some unusual sieve-theoretic manipulations, notably an “epsilon trick”, ultimately relying on the elementary inequality , that allowed one to get non-trivial lower bounds for sums such as even if the sum had no non-trivial estimates available; and a way to estimate divisor sums such as even if was permitted to be comparable to or even exceed , by using the fundamental theorem of arithmetic to factorise (after restricting to the case when is almost prime). I hope that these sieve-theoretic tricks will be useful in future work in the subject.

With this paper, the Polymath8 project is almost complete; there is still a little bit of scope to push our methods further and get some modest improvement for instance to the bound, but this would require a substantial amount of effort, and it is probably best to instead wait for some new breakthrough in the subject to come along. One final task we are performing is to write up a retrospective article on both the 8a and 8b experiences, an incomplete writeup of which can be found here. If anyone wishes to contribute some commentary on these projects (whether you were an active contributor, an occasional contributor, or a silent “lurker” in the online discussion), please feel free to do so in the comments to this post.

## Recent Comments