
I’ve just uploaded to the arXiv my paper Finite time blowup for a supercritical defocusing nonlinear wave system, submitted to Analysis and PDE. This paper was inspired by a question asked of me by Sergiu Klainerman recently, regarding whether there were any analogues of my blowup example for Navier-Stokes type equations in the setting of nonlinear wave equations.

Recall that the defocusing nonlinear wave (NLW) equation reads

\displaystyle \Box u = |u|^{p-1} u \ \ \ \ \ (1)


where {u: {\bf R}^{1+d} \rightarrow {\bf R}} is the unknown scalar field, {\Box = -\partial_t^2 + \Delta} is the d’Alembertian operator, and {p>1} is an exponent. We can generalise this equation to the defocusing nonlinear wave system

\displaystyle \Box u = (\nabla F)(u) \ \ \ \ \ (2)


where {u: {\bf R}^{1+d} \rightarrow {\bf R}^m} is now a system of scalar fields, and {F: {\bf R}^m \rightarrow {\bf R}} is a potential which is homogeneous of degree {p+1} and strictly positive away from the origin; the scalar equation corresponds to the case where {m=1} and {F(u) = \frac{1}{p+1} |u|^{p+1}}. We will be interested in smooth solutions {u} to (2). It is only natural to restrict to the smooth category when the potential {F} is also smooth; unfortunately, if one requires {F} to be homogeneous of order {p+1} all the way down to the origin, then {F} cannot be smooth unless it is identically zero or {p+1} is an even integer. This is too restrictive for us, so we will only require that {F} be homogeneous away from the origin (e.g. outside the unit ball). In any event it is the behaviour of {F(u)} for large {u} which will be decisive in understanding regularity or blowup for the equation (2).

Formally, solutions to the equation (2) enjoy a conserved energy

\displaystyle E[u] = \int_{{\bf R}^d} \frac{1}{2} \|\partial_t u \|^2 + \frac{1}{2} \| \nabla_x u \|^2 + F(u)\ dx.

Using this conserved energy, it is possible to establish global regularity for the Cauchy problem (2) in the energy-subcritical case when {d \leq 2}, or when {d \geq 3} and {p < 1+\frac{4}{d-2}}. This means that for any smooth initial position {u_0: {\bf R}^d \rightarrow {\bf R}^m} and initial velocity {u_1: {\bf R}^d \rightarrow {\bf R}^m}, there exists a (unique) smooth global solution {u: {\bf R}^{1+d} \rightarrow {\bf R}^m} to the equation (2) with {u(0,x) = u_0(x)} and {\partial_t u(0,x) = u_1(x)}. These classical global regularity results (essentially due to Jörgens) were famously extended to the energy-critical case when {d \geq 3} and {p = 1 + \frac{4}{d-2}} by Grillakis, Struwe, and Shatah-Struwe (though for various technical reasons, the global regularity component of these results was limited to the range {3 \leq d \leq 7}). A key tool used in the energy-critical theory is the Morawetz estimate

\displaystyle \int_0^T \int_{{\bf R}^d} \frac{|u(t,x)|^{p+1}}{|x|}\ dx dt \lesssim E[u]

which can be proven by manipulating the properties of the stress-energy tensor

\displaystyle T_{\alpha \beta} = \langle \partial_\alpha u, \partial_\beta u \rangle - \frac{1}{2} \eta_{\alpha \beta} (\langle \partial^\gamma u, \partial_\gamma u \rangle + F(u))

(with the usual summation conventions involving the Minkowski metric {\eta_{\alpha \beta} dx^\alpha dx^\beta = -dt^2 + |dx|^2}) and in particular exploiting the divergence-free nature of this tensor: {\partial^\beta T_{\alpha \beta} = 0}. See for instance the text of Shatah-Struwe, or my own PDE book, for more details. The energy-critical regularity results have also been extended to slightly supercritical settings in which the potential grows by a logarithmic factor or so faster than the critical rate; see the results of myself and of Roy.

This leaves the question of global regularity for the energy supercritical case when {d \geq 3} and {p > 1+\frac{4}{d-2}}. On the one hand, global smooth solutions are known for small data (if {F} vanishes to sufficiently high order at the origin, see e.g. the work of Lindblad and Sogge), and global weak solutions for large data were constructed long ago by Segal. On the other hand, the solution map, if it exists, is known to be extremely unstable, particularly at high frequencies; see for instance this paper of Lebeau, this paper of Christ, Colliander, and myself, this paper of Brenner and Kumlin, or this paper of Ibrahim, Majdoub, and Masmoudi for various formulations of this instability. In the case of the focusing NLW {-\partial_{tt} u + \Delta u = - |u|^{p-1} u}, one can easily create solutions that blow up in finite time by ODE constructions, for instance one can take {u(t,x) = c (1-t)^{-\frac{2}{p-1}}} with {c = (\frac{2(p+1)}{(p-1)^2})^{\frac{1}{p-1}}}, which blows up as {t} approaches {1}. However the situation in the defocusing supercritical case is less clear. The strongest positive results are of Kenig-Merle and Killip-Visan, which show (under some additional technical hypotheses) that global regularity for such equations holds under the additional assumption that the critical Sobolev norm of the solution stays bounded. Roughly speaking, this shows that “Type II blowup” cannot occur for (2).
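As a quick sanity check on the focusing ODE example above (this is only an illustrative sketch, not something taken from the paper), one can verify numerically that {u(t) = c(1-t)^{-\frac{2}{p-1}}} with the stated constant {c} solves {-\partial_{tt} u = -|u|^{p-1} u}; since {u} does not depend on {x}, the Laplacian term vanishes. The function names `blowup_profile` and `focusing_residual` are hypothetical helpers introduced here.

```python
def blowup_profile(p):
    """Return (c, u, u_tt) for u(t) = c (1-t)^(-2/(p-1)), the x-independent profile."""
    a = 2.0 / (p - 1.0)
    c = (2.0 * (p + 1.0) / (p - 1.0) ** 2) ** (1.0 / (p - 1.0))

    def u(t):
        return c * (1.0 - t) ** (-a)

    def u_tt(t):
        # exact second time derivative of c (1-t)^(-a)
        return c * a * (a + 1.0) * (1.0 - t) ** (-a - 2.0)

    return c, u, u_tt


def focusing_residual(p, t):
    """Residual of -u_tt + Delta u + |u|^(p-1) u for the profile (Delta u = 0 here)."""
    _, u, u_tt = blowup_profile(p)
    return -u_tt(t) + abs(u(t)) ** (p - 1.0) * u(t)


if __name__ == "__main__":
    for p in (3.0, 5.0, 7.0):
        _, u, _ = blowup_profile(p)
        print(p, focusing_residual(p, 0.5), u(0.9), u(0.999))
```

The residual should vanish up to rounding error for any {p>1} and {t<1}, while the printed values of {u} grow without bound as {t} approaches {1}.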

Our main result is that finite time blowup can in fact occur, at least for three-dimensional systems where the number {m} of degrees of freedom is sufficiently large:

Theorem 1 Let {d=3}, {p > 5}, and {m \geq 76}. Then there exists a smooth potential {F: {\bf R}^m \rightarrow {\bf R}}, positive and homogeneous of degree {p+1} away from the origin, and a solution to (2) with smooth initial data that develops a singularity in finite time.

The rather large lower bound of {76} on {m} here is primarily due to our use of the Nash embedding theorem (which is the first time I have actually had to use this theorem in an application!). It can certainly be lowered, but unfortunately our methods do not seem to be able to bring {m} all the way down to {1}, so we do not directly exhibit finite time blowup for the scalar supercritical defocusing NLW. Nevertheless, this result presents a barrier to any attempt to prove global regularity for that equation, in that it must somehow use a property of the scalar equation which is not available for systems. It is likely that the methods can be adapted to higher dimensions than three, but we take advantage of some special structure to the equations in three dimensions (related to the strong Huygens principle) which does not seem to be available in higher dimensions.

The blowup will in fact be of discrete self-similar type in a backwards light cone, thus {u} will obey a relation of the form

\displaystyle u(e^S t, e^S x) = e^{-\frac{2}{p-1} S} u(t,x)

for some fixed {S>0} (the exponent {-\frac{2}{p-1}} is mandated by dimensional analysis considerations). It would be natural to consider continuously self-similar solutions (in which the above relation holds for all {S}, not just one {S}), and rough self-similar solutions have been constructed in the literature by perturbative methods (see this paper of Planchon, or this paper of Ribaud and Youssfi). However, it turns out that continuously self-similar solutions to a defocusing equation have to obey an additional monotonicity formula which causes them to not exist in three spatial dimensions; this argument is given in my paper. So we have to work just with discretely self-similar solutions.

Because of the discrete self-similarity, the finite time blowup solution will be “locally Type II” in the sense that scale-invariant norms inside the backwards light cone stay bounded as one approaches the singularity. But it will not be “globally Type II” in the sense that scale-invariant norms also stay bounded outside the light cone; indeed energy will leak from the light cone at every scale. This is consistent with the results of Kenig-Merle and Killip-Visan which preclude “globally Type II” blowup solutions to these equations in many cases.

We now sketch the arguments used to prove this theorem. Usually when studying the NLW, we think of the potential {F} (and the initial data {u_0,u_1}) as being given in advance, and then try to solve for {u} as an unknown field. However, in this problem we have the freedom to select {F}. So we can look at this problem from a “backwards” direction: we first choose the field {u}, and then fit the potential {F} (and the initial data) to match that field.

Now, one cannot write down a completely arbitrary field {u} and hope to find a potential {F} obeying (2), as there are some constraints coming from the homogeneity of {F}. Namely, from the Euler identity

\displaystyle \langle u, (\nabla F)(u) \rangle = (p+1) F(u)

we see that {F(u)} can be recovered from (2) by the formula

\displaystyle F(u) = \frac{1}{p+1} \langle u, \Box u \rangle \ \ \ \ \ (3)


so the defocusing nature of {F} imposes a constraint

\displaystyle \langle u, \Box u \rangle > 0.

Furthermore, taking a derivative of (3) we obtain another constraining equation

\displaystyle \langle \partial_\alpha u, \Box u \rangle = \frac{1}{p+1} \partial_\alpha \langle u, \Box u \rangle

that does not explicitly involve the potential {F}. Actually, one can write this equation in the more familiar form

\displaystyle \partial^\beta T_{\alpha \beta} = 0

where {T_{\alpha \beta}} is the stress-energy tensor

\displaystyle T_{\alpha \beta} = \langle \partial_\alpha u, \partial_\beta u \rangle - \frac{1}{2} \eta_{\alpha \beta} (\langle \partial^\gamma u, \partial_\gamma u \rangle + \frac{1}{p+1} \langle u, \Box u \rangle),

now written in a manner that does not explicitly involve {F}.

This reformulation suggests a strategy for locating {u}: first one selects a stress-energy tensor {T_{\alpha \beta}} that is divergence-free and obeys suitable positive definiteness and self-similarity properties, and then locates a self-similar map {u} from the backwards light cone to {{\bf R}^m} that has that stress-energy tensor (one also needs the map {u} (or more precisely the direction component {u/\|u\|} of that map) to be injective up to the discrete self-similarity, in order to define {F(u)} consistently). If the stress-energy tensor were replaced by the simpler “energy tensor”

\displaystyle E_{\alpha \beta} = \langle \partial_\alpha u, \partial_\beta u \rangle

then the question of constructing an (injective) map {u} with the specified energy tensor is precisely the embedding problem that was famously solved by Nash (viewing {E_{\alpha \beta}} as a Riemannian metric on the domain of {u}, which in this case is a backwards light cone quotiented by a discrete self-similarity to make it compact). It turns out that one can adapt the Nash embedding theorem to also work with the stress-energy tensor (as long as one also specifies the mass density {M = \|u\|^2}, and as long as a certain positive definiteness property, related to the positive semi-definiteness of Gram matrices, is obeyed). Here is where the dimension {76} shows up:

Proposition 2 Let {M} be a smooth compact Riemannian {4}-manifold, and let {m \geq 76}. Then {M} smoothly isometrically embeds into the sphere {S^{m-1}}.

Proof: The Nash embedding theorem (in the form given in this ICM lecture of Gunther) shows that {M} can be smoothly isometrically embedded into {{\bf R}^{19}}, and thus in {[-R,R]^{19}} for some large {R}. Using an irrational slope, the interval {[-R,R]} can be smoothly isometrically embedded into the {2}-torus {\frac{1}{\sqrt{38}} (S^1 \times S^1)}, and so {[-R,R]^{19}} and hence {M} can be smoothly isometrically embedded in {\frac{1}{\sqrt{38}} (S^1)^{38}}. But from Pythagoras’ theorem, {\frac{1}{\sqrt{38}} (S^1)^{38}} can be identified with a subset of {S^{m-1}} for any {m \geq 76}, and the claim follows. \Box
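The last two steps of this proof can be made concrete with a small numerical sketch. The specific slope {\sqrt{37}} below is an assumption made for illustration (any irrational slope normalised to give unit speed would do), and the names `wrap_interval` and `embed_in_sphere` are hypothetical. Each coordinate {t} of a point in {{\bf R}^{19}} is wound around a scaled {2}-torus by {t \mapsto \frac{1}{\sqrt{38}}(\cos t, \sin t, \cos \sqrt{37} t, \sin \sqrt{37} t)}; this curve has unit speed because {1 + 37 = 38}, and is injective because {\sqrt{37}} is irrational.

```python
import math

DIM = 19                            # dimension of the Euclidean space containing M
SCALE = 1.0 / math.sqrt(2 * DIM)    # = 1/sqrt(38)
SLOPE = math.sqrt(2 * DIM - 1)      # sqrt(37): irrational, chosen so that 1 + SLOPE^2 = 38


def wrap_interval(t):
    """Map a real number t to a point of (1/sqrt(38)) (S^1 x S^1), viewed inside R^4."""
    return [SCALE * math.cos(t), SCALE * math.sin(t),
            SCALE * math.cos(SLOPE * t), SCALE * math.sin(SLOPE * t)]


def embed_in_sphere(x):
    """Map x in R^19 to R^76 coordinate by coordinate; the image has Euclidean norm 1."""
    if len(x) != DIM:
        raise ValueError("expected a point of R^19")
    out = []
    for t in x:
        out.extend(wrap_interval(t))
    return out


if __name__ == "__main__":
    point = [0.3 * i - 2.0 for i in range(DIM)]
    image = embed_in_sphere(point)
    print(len(image), sum(c * c for c in image))
```

Each of the {19} blocks contributes {\frac{2}{38}} to the squared norm, which is the Pythagoras step placing the image on the unit sphere of {{\bf R}^{76}}.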

One can presumably improve upon the bound {76} by being more efficient with the embeddings (e.g. by modifying the proof of Nash embedding to embed directly into a round sphere), but I did not try to optimise the bound here.

The remaining task is to construct the stress-energy tensor {T_{\alpha \beta}}. One can reduce to tensors that are invariant with respect to rotations around the spatial origin, but this still leaves a fair amount of degrees of freedom (it turns out that there are four fields that need to be specified, which are denoted {M, E_{tt}, E_{tr}, E_{rr}} in my paper). However a small miracle occurs in three spatial dimensions, in that the divergence-free condition involves only two of the four degrees of freedom (or three out of four, depending on whether one considers a function that is even or odd in {r} to only be half a degree of freedom). This is easiest to illustrate with the scalar NLW (1). Assuming spherical symmetry, this equation becomes

\displaystyle - \partial_{tt} u + \partial_{rr} u + \frac{2}{r} \partial_r u = |u|^{p-1} u.

Making the substitution {\phi := ru}, we can eliminate the lower order term {\frac{2}{r} \partial_r} completely to obtain

\displaystyle - \partial_{tt} \phi + \partial_{rr} \phi= \frac{1}{r^{p-1}} |\phi|^{p-1} \phi.

(This can be compared with the situation in higher dimensions, in which, after the analogous substitution {\phi := r^{\frac{d-1}{2}} u}, an undesirable zeroth order term {\frac{(d-1)(d-3)}{4r^2} \phi} shows up.) In particular, if one introduces the null energy density

\displaystyle e_+ := \frac{1}{2} |\partial_t \phi + \partial_r \phi|^2

and the potential energy density

\displaystyle V := \frac{|\phi|^{p+1}}{(p+1) r^{p-1}}

then one can verify the equation

\displaystyle (\partial_t - \partial_r) e_+ + (\partial_t + \partial_r) V = - \frac{p-1}{r} V

which can be viewed as a transport equation for {e_+} with forcing term depending on {V} (or vice versa), and is thus quite easy to solve explicitly by choosing one of these fields and then solving for the other. As it turns out, once one is in the supercritical regime {p>5}, one can solve this equation while giving {e_+} and {V} the right homogeneity (they have to be homogeneous of order {-\frac{4}{p-1}}, which is greater than {-1} in the supercritical case) and positivity properties, and from this it is possible to prescribe all the other fields one needs to satisfy the conclusions of the main theorem. (It turns out that {e_+} and {V} will be concentrated near the boundary of the light cone, so this is how the solution {u} will concentrate also.)
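The transport identity above can be checked numerically without having a solution in hand. The following sketch is an illustration only (the test field `phi` and the exponent `P = 7` are arbitrary choices, not from the paper): for any smooth field {\phi}, a direct computation gives {(\partial_t - \partial_r) e_+ + (\partial_t + \partial_r) V + \frac{p-1}{r} V = -(\partial_t \phi + \partial_r \phi) R}, where {R := -\partial_{tt} \phi + \partial_{rr} \phi - \frac{1}{r^{p-1}} |\phi|^{p-1} \phi} is the residual of the equation for {\phi}. In particular the identity holds whenever {R = 0}.

```python
import math

P = 7.0  # a sample supercritical exponent p > 5 (assumption for illustration)


def phi(t, r):
    # arbitrary smooth positive test field; it is NOT a solution of the equation
    return 2.0 + math.exp(-t) * math.cos(r)


def phi_t(t, r):
    return -math.exp(-t) * math.cos(r)


def phi_r(t, r):
    return -math.exp(-t) * math.sin(r)


def phi_tt(t, r):
    return math.exp(-t) * math.cos(r)


def phi_rr(t, r):
    return -math.exp(-t) * math.cos(r)


def e_plus(t, r):
    """Null energy density (1/2) |phi_t + phi_r|^2."""
    return 0.5 * (phi_t(t, r) + phi_r(t, r)) ** 2


def potential(t, r):
    """Potential energy density V = |phi|^(p+1) / ((p+1) r^(p-1))."""
    return abs(phi(t, r)) ** (P + 1.0) / ((P + 1.0) * r ** (P - 1.0))


def equation_residual(t, r):
    """R = -phi_tt + phi_rr - |phi|^(p-1) phi / r^(p-1); zero exactly for solutions."""
    f = phi(t, r)
    return -phi_tt(t, r) + phi_rr(t, r) - abs(f) ** (P - 1.0) * f / r ** (P - 1.0)


def transport_defect(t, r, h=1e-5):
    """(d_t - d_r) e_+ + (d_t + d_r) V + (p-1) V / r, using central differences."""
    d_minus_e = (e_plus(t + h, r - h) - e_plus(t - h, r + h)) / (2.0 * h)
    d_plus_v = (potential(t + h, r + h) - potential(t - h, r - h)) / (2.0 * h)
    return d_minus_e + d_plus_v + (P - 1.0) * potential(t, r) / r


if __name__ == "__main__":
    t, r = 0.1, 1.0
    print(transport_defect(t, r), -(phi_t(t, r) + phi_r(t, r)) * equation_residual(t, r))
```

The two printed numbers should agree up to finite-difference error, which confirms the algebra behind the transport equation for this sample field.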

Kevin Ford, James Maynard, and I have uploaded to the arXiv our preprint “Chains of large gaps between primes”. This paper was announced in our previous paper with Konyagin and Green, which was concerned with the largest gap

\displaystyle  G_1(X) := \max_{p_n, p_{n+1} \leq X} (p_{n+1} - p_n)

between consecutive primes up to {X}, in which we improved the Rankin bound of

\displaystyle  G_1(X) \gg \log X \frac{\log_2 X \log_4 X}{(\log_3 X)^2}

to

\displaystyle  G_1(X) \gg \log X \frac{\log_2 X \log_4 X}{\log_3 X}

for large {X} (where we use the abbreviations {\log_2 X := \log\log X}, {\log_3 X := \log\log\log X}, and {\log_4 X := \log\log\log\log X}). Here, we obtain an analogous result for the quantity

\displaystyle  G_k(X) := \max_{p_n, \dots, p_{n+k} \leq X} \min( p_{n+1} - p_n, p_{n+2}-p_{n+1}, \dots, p_{n+k} - p_{n+k-1} )

which measures how far apart the gaps between chains of {k} consecutive primes can be. Our main result is

\displaystyle  G_k(X) \gg \frac{1}{k^2} \log X \frac{\log_2 X \log_4 X}{\log_3 X}

whenever {X} is sufficiently large depending on {k}, with the implied constant here absolute (and effective). The factor of {1/k^2} is inherent to the method, and related to the basic probabilistic fact that if one selects {k} numbers at random from the unit interval {[0,1]}, then one expects the minimum gap between adjacent numbers to be about {1/k^2} (i.e. smaller than the mean spacing of {1/k} by an additional factor of {1/k}).
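The probabilistic heuristic behind the factor {1/k^2} is easy to test by simulation. The sketch below is purely illustrative (the helper names, trial count and seed are arbitrary choices): it draws {k} uniform points in {[0,1]}, records the smallest gap between adjacent points, and averages over many trials. A standard computation with uniform spacings gives the exact mean {\frac{1}{k^2-1}}, so the rescaled quantity {k^2 \times} (mean minimum gap) should be close to {1}.

```python
import random


def min_adjacent_gap(k, rng):
    """Smallest gap between adjacent points among k uniform samples from [0, 1]."""
    pts = sorted(rng.random() for _ in range(k))
    return min(b - a for a, b in zip(pts, pts[1:]))


def mean_min_gap(k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected minimum adjacent gap."""
    rng = random.Random(seed)
    return sum(min_adjacent_gap(k, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for k in (10, 20, 40):
        print(k, k * k * mean_min_gap(k))
```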

Our arguments combine those from the previous paper with the matrix method of Maier, who (in our notation) showed that

\displaystyle  G_k(X) \gg_k  \log X \frac{\log_2 X \log_4 X}{(\log_3 X)^2}

for an infinite sequence of {X} going to infinity. (Maier needed to restrict to an infinite sequence to avoid Siegel zeroes, but we are able to resolve this issue by the now standard technique of simply eliminating a prime factor of an exceptional conductor from the sieve-theoretic portion of the argument. As a byproduct, this also makes all of the estimates in our paper effective.)

As its name suggests, the Maier matrix method is usually presented by imagining a matrix of numbers, and using information about the distribution of primes in the columns of this matrix to deduce information about the primes in at least one of the rows of the matrix. We found it convenient to interpret this method in an equivalent probabilistic form as follows. Suppose one wants to find an interval {n+1,\dots,n+y} which contains a block of at least {k} primes, each separated from each other by at least {g} (ultimately, {y} will be something like {\log X \frac{\log_2 X \log_4 X}{\log_3 X}} and {g} something like {y/k^2}). One can do this by the probabilistic method: pick {n} to be a random large natural number {{\mathbf n}} (with the precise distribution to be chosen later), and try to lower bound the probability that the interval {{\mathbf n}+1,\dots,{\mathbf n}+y} contains at least {k} primes, no two of which are within {g} of each other.

By carefully choosing the residue class of {{\mathbf n}} with respect to small primes, one can eliminate several of the {{\mathbf n}+j} from consideration of being prime immediately. For instance, if {{\mathbf n}} is chosen to be large and even, then the {{\mathbf n}+j} with {j} even have no chance of being prime and can thus be eliminated; similarly if {{\mathbf n}} is large and odd, then {{\mathbf n}+j} cannot be prime for any odd {j}. Using the methods of our previous paper, we can find a residue class {m \hbox{ mod } P} (where {P} is a product of a large number of primes) such that, if one chooses {{\mathbf n}} to be a large random element of {m \hbox{ mod } P} (that is, {{\mathbf n} = {\mathbf z} P + m} for some large random integer {{\mathbf z}}), then the set {{\mathcal T}} of shifts {j \in \{1,\dots,y\}} for which {{\mathbf n}+j} still has a chance of being prime has size comparable to something like {k \log X / \log_2 X}; furthermore this set {{\mathcal T}} is fairly well distributed in {\{1,\dots,y\}} in the sense that it does not concentrate too strongly in any short subinterval of {\{1,\dots,y\}}. The main new difficulty, not present in the previous paper, is to get lower bounds on the size of {{\mathcal T}} in addition to upper bounds, but this turns out to be achievable by a suitable modification of the arguments.
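To make the role of the residue class concrete, here is a toy sketch of how a choice of {{\mathbf n}} modulo small primes determines the surviving set {{\mathcal T}}. The function `surviving_shifts` simply lists the shifts {j} for which {{\mathbf n}+j} avoids every sieving prime. The function `greedy_residues` picks residues by a naive greedy rule; this is an assumption made for illustration and is not the (much more refined) construction of the paper.

```python
def surviving_shifts(y, residues):
    """Shifts j in {1,...,y} with n + j not divisible by any sieving prime,
    where residues[p] is the residue class of n modulo the prime p."""
    return [j for j in range(1, y + 1)
            if all((a + j) % p != 0 for p, a in residues.items())]


def greedy_residues(y, primes):
    """Choose n mod p for each prime in turn so as to eliminate as many of the
    currently surviving shifts as possible (a naive heuristic)."""
    residues = {}
    alive = set(range(1, y + 1))
    for p in primes:
        # pick the class a = n mod p that forces p | n + j for the most survivors
        best = max(range(p), key=lambda a: sum(1 for j in alive if (a + j) % p == 0))
        residues[p] = best
        alive = {j for j in alive if (best + j) % p != 0}
    return residues, sorted(alive)


if __name__ == "__main__":
    print(surviving_shifts(10, {2: 0}))   # n even: only odd shifts survive
    print(surviving_shifts(10, {2: 1}))   # n odd: only even shifts survive
    print(greedy_residues(60, [2, 3, 5, 7]))
```

For large {{\mathbf n}} in the chosen residue class, every eliminated {{\mathbf n}+j} has a small prime factor and so cannot be prime, which is the elimination step described above.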

Using a version of the prime number theorem in arithmetic progressions due to Gallagher, one can show that for each remaining shift {j \in {\mathcal T}}, {{\mathbf n}+j} is going to be prime with probability comparable to {\log_2 X / \log X}, so one expects about {k} primes in the set {\{{\mathbf n} + j: j \in {\mathcal T}\}}. An upper bound sieve (e.g. the Selberg sieve) also shows that for any distinct {j,j' \in {\mathcal T}}, the probability that {{\mathbf n}+j} and {{\mathbf n}+j'} are both prime is {O( (\log_2 X / \log X)^2 )}. Using this and some routine second moment calculations, one can then show that with large probability, the set {\{{\mathbf n} + j: j \in {\mathcal T}\}} will indeed contain about {k} primes, no two of which are closer than {g} to each other; with no other numbers in this interval being prime, this gives a lower bound on {G_k(X)}.

I’ve just uploaded two related papers to the arXiv:

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case {k=2} of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let {a_1,a_2} be natural numbers, and let {b_1,b_2} be integers such that {a_1 b_2 - a_2 b_1 \neq 0}. Let {1 \leq \omega(x) \leq x} be a quantity depending on {x} that goes to infinity as {x \rightarrow \infty}. Let {\lambda} denote the Liouville function. Then one has

\displaystyle  \sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n+b_2)}{n} = o( \log \omega(x) ) \ \ \ \ \ (1)

as {x \rightarrow \infty}.

Thus for instance one has

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)

For comparison, the non-averaged Chowla conjecture would imply that

\displaystyle  \sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x) \ \ \ \ \ (3)

which is a strictly stronger estimate than (2), and remains open.
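For readers who want to experiment, the two normalised sums in (2) and (3) are easy to compute for moderate {x}. The following is a minimal sketch (the names `liouville_table` and `correlation_averages` are hypothetical helpers): it tabulates {\lambda} with a smallest-prime-factor sieve and returns the left-hand side of (2) divided by {\log x} together with the left-hand side of (3) divided by {x}. Numerics of this kind are of course no substitute for the asymptotic statements.

```python
import math


def liouville_table(limit):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= limit, via a smallest-prime-factor sieve."""
    spf = list(range(limit + 1))
    for i in range(2, math.isqrt(limit) + 1):
        if spf[i] == i:
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # complete multiplicativity: lambda(n) = -lambda(n / p) for any prime p | n
        lam[n] = -lam[n // spf[n]]
    return lam


def correlation_averages(x):
    """Return (log-averaged, plain) normalised correlations of lambda(n) lambda(n+1)."""
    lam = liouville_table(x + 1)
    log_sum = sum(lam[n] * lam[n + 1] / n for n in range(1, x + 1))
    plain_sum = sum(lam[n] * lam[n + 1] for n in range(1, x + 1))
    return log_sum / math.log(x), plain_sum / x


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, correlation_averages(x))
```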

The arguments also extend to other completely multiplicative functions than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdős discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdős discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdős discrepancy problem, in which the sequence {f(1), f(2), \dots} takes values in the unit sphere of an arbitrary Hilbert space, rather than in {\{-1,+1\}}.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity {\lambda(pn) = -\lambda(n)}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n}

is also large and positive for all primes {p} that are not too large; note here how the logarithmic averaging allows us to leave the constraint {n \leq x} unchanged. Summing in {p}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{ \sum_{p \in {\mathcal P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n}

is large and positive for any given set {{\mathcal P}} of medium-sized primes. By a standard averaging argument, this implies that

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+p+j) 1_{p|n+j} \ \ \ \ \ (4)

is large for many choices of {n}, where {H} is a medium-sized parameter at our disposal to choose, and we take {{\mathcal P}} to be some set of primes that are somewhat smaller than {H}. (A similar approach was taken in this recent paper of Matomäki, Radziwiłł, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view {n} as a random variable, in which case (4) is essentially a bilinear sum of the random sequence {(\lambda(n+1),\dots,\lambda(n+H))} along a random graph {G_{n,H}} on {\{1,\dots,H\}}, in which two vertices {j, j+p} are connected if they differ by a prime {p} in {{\mathcal P}} that divides {n+j}. A key difficulty in controlling this sum is that for randomly chosen {n}, the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter {H} (in some suitable range {[H_-,H_+]}) for which the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).
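The dilation step used at the start of this argument, in which {\lambda(pn) = -\lambda(n)} converts the correlation {\lambda(n)\lambda(n+1)} into {\lambda(n)\lambda(n+p) 1_{p|n}}, rests on an exact identity that can be checked in rational arithmetic. In the sketch below (helper names are hypothetical), the dilated sum is taken over {m \leq px}; the difference between the ranges {m \leq px} and {m \leq x} is at most about {\frac{\log p}{p}}, which is negligible against {\log x}, and this is the sense in which logarithmic averaging lets one keep the constraint {n \leq x}.

```python
from fractions import Fraction


def liouville(n):
    """(-1)^Omega(n) by trial division (adequate for the small n used here)."""
    sign, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            sign = -sign
        d += 1
    return -sign if n > 1 else sign


def log_sum_shift_one(x):
    """Sum over n <= x of lambda(n) lambda(n+1) / n, as an exact fraction."""
    return sum(Fraction(liouville(n) * liouville(n + 1), n) for n in range(1, x + 1))


def log_sum_dilated(x, p):
    """Sum over m <= p*x with p | m of lambda(m) lambda(m+p) / m, as an exact fraction."""
    return sum(Fraction(liouville(m) * liouville(m + p), m)
               for m in range(p, p * x + 1, p))


if __name__ == "__main__":
    x = 200
    for p in (2, 3, 5, 7):
        print(p, p * log_sum_dilated(x, p) == log_sum_shift_one(x))
```

Writing {m = pn}, each term {\lambda(pn)\lambda(pn+p)/(pn)} equals {\frac{1}{p} \lambda(n)\lambda(n+1)/n}, so the comparison printed above should be an exact equality for every prime {p}.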

Informally, the entropy decrement argument goes like this: if the sequence {(\lambda(n+1),\dots,\lambda(n+H))} has significant mutual information with {G_{n,H}}, then the entropy of the sequence {(\lambda(n+1),\dots,\lambda(n+H'))} for {H' > H} will grow a little slower than linearly, due to the fact that the graph {G_{n,H}} has zero entropy (knowledge of {G_{n,H}} more or less completely determines the shifts {G_{n+kH,H}} of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing {H} as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}}. Curiously, for the application it is not enough to have a purely qualitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than {\log H} on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series {\sum_{j \geq 2} \frac{1}{j \log j \log\log j}} diverges, which is only barely true).
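The quantity being driven down in this argument is the mutual information {I(X:Y) = H(X) + H(Y) - H(X,Y)} between two discrete random variables. As a reminder of what this measures, here is a minimal sketch computing it for an empirical distribution of pairs; the function names are hypothetical, and this toy computation is not the estimate used in the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution proportional to the given counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def mutual_information(pairs):
    """I(X:Y) = H(X) + H(Y) - H(X,Y) in bits, for the empirical law of (x, y) pairs."""
    pairs = list(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return entropy(left) + entropy(right) - entropy(joint)


if __name__ == "__main__":
    independent = [(0, 0), (0, 1), (1, 0), (1, 1)]   # two independent fair bits
    identical = [(0, 0), (1, 1)]                     # Y is a copy of X
    print(mutual_information(independent), mutual_information(identical))
```

Independent variables share no information, while a perfect copy shares the full entropy of the original; the entropy decrement argument locates a scale {H} at which the sequence and the graph are close to the first situation.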

Once one locates a scale {H} with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j)}{p}. \ \ \ \ \ (5)

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

\displaystyle  \sup_\alpha \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (6)

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwiłł and myself.
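To illustrate why the exponential strength of Hoeffding's inequality matters in the concentration step above, here is a small sketch comparing the bound with simulation in the simplest case of {n} independent random signs. The setting and the helper names are assumptions chosen for illustration; the paper applies the inequality to a different (but analogous) family of bounded independent variables. For independent mean-zero variables taking values in {[-1,1]}, Hoeffding's inequality gives {{\bf P}(|X_1+\dots+X_n| \geq t) \leq 2 \exp(-t^2/2n)}.

```python
import math
import random


def hoeffding_bound(n, t):
    """Hoeffding's bound 2 exp(-t^2 / (2n)) for n independent mean-zero terms in [-1, 1]."""
    return 2.0 * math.exp(-t * t / (2.0 * n))


def empirical_tail(n, t, trials=5000, seed=0):
    """Fraction of trials in which a sum of n independent random signs has size >= t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # n fair coin flips at once: the number of set bits is the number of +1 signs
        heads = bin(rng.getrandbits(n)).count("1")
        if abs(2 * heads - n) >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 100
    for t in (10, 20, 30, 40):
        print(t, empirical_tail(n, t), hoeffding_bound(n, t))
```

The bound decays like a Gaussian in {t/\sqrt{n}}, so the failure probability can be made smaller than any fixed power of a logarithm at modest cost.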

When one uses this method to study more general sums such as

\displaystyle  \sum_{n \leq x} \frac{g_1(n) g_2(n+1)}{n},

one ends up having to consider expressions such as

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} c_p \frac{g_1(n+j) g_2(n+p+j)}{p},

where {c_p} is the coefficient {c_p := \overline{g_1}(p) \overline{g_2}(p)}. When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

\displaystyle  S(\alpha) := \sum_{p \in {\mathcal P}} \frac{c_p}{p} e^{2\pi i \alpha p}.

In many cases (such as in the application to the Erdős discrepancy problem), the coefficient {c_p} is identically {1}, and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, {S(\alpha)} is large when {\alpha} lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions {g_1,g_2}, the coefficients {c_p} are more or less arbitrary; the large values of {S(\alpha)} are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations {\alpha} (up to the uncertainty mandated by the Heisenberg uncertainty principle) where {S(\alpha)} is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the {L^4} restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)
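The exponential sum {S(\alpha)} is simple to evaluate numerically when {c_p = 1}, which gives some feel for the major arc behaviour. In the sketch below, the range of primes {[100,1000]} and the helper names are arbitrary illustrative choices. At {\alpha = 0} every term has the same phase, and at {\alpha = 1/2} every odd prime contributes the phase {-1}, so {S(1/2) = -S(0)}; at other frequencies the triangle inequality gives {|S(\alpha)| \leq S(0)}.

```python
import cmath
import math


def primes_between(lo, hi):
    """Primes p with lo <= p <= hi, by a plain sieve of Eratosthenes."""
    sieve = [True] * (hi + 1)
    sieve[0:2] = [False, False]
    for i in range(2, math.isqrt(hi) + 1):
        if sieve[i]:
            for j in range(i * i, hi + 1, i):
                sieve[j] = False
    return [p for p in range(max(lo, 2), hi + 1) if sieve[p]]


def prime_exponential_sum(alpha, primes, coeff=lambda p: 1.0):
    """S(alpha) = sum over p of c_p / p * e(alpha p), with c_p = coeff(p)."""
    return sum(coeff(p) / p * cmath.exp(2j * math.pi * alpha * p) for p in primes)


if __name__ == "__main__":
    primes = primes_between(100, 1000)
    for alpha in (0.0, 0.5, 1.0 / 3.0, math.sqrt(2.0)):
        print(alpha, abs(prime_exponential_sum(alpha, primes)))
```

Passing a different `coeff` function allows one to experiment with non-constant coefficients {c_p}, for which large values of {|S(\alpha)|} need not sit at rationals with small denominator.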

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x).

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j) \lambda(n+2p+j)}{p}.

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

\displaystyle  \frac{1}{X} \int_X^{2X} \sup_\alpha |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (7)

where {X} goes to infinity and {H} is a very slowly growing function of {X}. This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the {k=3} case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher {k} than {k=3}, the same line of analysis requires one to replace the linear phase {e(\alpha n)} by more complicated phases, such as quadratic phases {e(\alpha n^2 + \beta n)} or even {k-2}-step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all {k}, which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale {H} that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

Kaisa Matomäki, Maksym Radziwiłł, and I have just uploaded to the arXiv our paper “Sign patterns of the Liouville and Möbius functions”. This paper is somewhat similar to our previous paper in that it is using the recent breakthrough of Matomäki and Radziwiłł on mean values of multiplicative functions to obtain partial results towards the Chowla conjecture. This conjecture can be phrased, roughly speaking, as follows: if {k} is a fixed natural number and {n} is selected at random from a large interval {[1,x]}, then the sign pattern {(\lambda(n), \lambda(n+1),\dots,\lambda(n+k-1)) \in \{-1,+1\}^k} becomes asymptotically equidistributed in {\{-1,+1\}^k} in the limit {x \rightarrow \infty}. This remains open for {k \geq 2}. In fact even the significantly weaker statement that each of the sign patterns in {\{-1,+1\}^k} is attained infinitely often is open for {k \geq 4}. However, in 1986, Hildebrand showed that for {k \leq 3} all sign patterns are indeed attained infinitely often. Our first result is a strengthening of Hildebrand’s, moving a little bit closer to Chowla’s conjecture:

Theorem 1 Let {k \leq 3}. Then each of the sign patterns in {\{-1,+1\}^k} is attained by the Liouville function for a set of natural numbers {n} of positive lower density.

Thus for instance one has {\lambda(n)=\lambda(n+1)=\lambda(n+2)} for a set of {n} of positive lower density. The {k \leq 2} case of this theorem already appears in the original paper of Matomäki and Radziwiłł (and the significantly simpler case of the sign patterns {++} and {--} was treated previously by Harman, Pintz, and Wolke).
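As a purely numerical illustration (not part of the paper), one can tabulate how often each sign pattern of length {k} occurs among the first few thousand integers. The sketch below computes {\lambda} with a smallest-prime-factor sieve; the function names and the cutoff are choices made here for illustration only, and of course a finite count says nothing rigorous about lower density.

```python
from collections import Counter

def liouville_sieve(limit):
    """Return a list lam with lam[n] = (-1)^Omega(n) for 1 <= n <= limit."""
    # spf[n] will hold the smallest prime factor of n
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # complete multiplicativity: removing one prime factor flips the sign
        lam[n] = -lam[n // spf[n]]
    return lam

def sign_pattern_counts(limit, k):
    """Count the patterns (lam(n), ..., lam(n+k-1)) over 1 <= n <= limit."""
    lam = liouville_sieve(limit + k - 1)
    return Counter(tuple(lam[n + j] for j in range(k))
                   for n in range(1, limit + 1))

if __name__ == "__main__":
    for pattern, count in sorted(sign_pattern_counts(10_000, 3).items()):
        print(pattern, count)
```

For {k=3} this prints one line for each of the eight patterns, so one can compare the observed frequencies by eye.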

The basic strategy in all of these arguments is to assume for sake of contradiction that a certain sign pattern occurs extremely rarely, and then exploit the complete multiplicativity of {\lambda} (which implies in particular that {\lambda(2n) = -\lambda(n)}, {\lambda(3n) = -\lambda(n)}, and {\lambda(5n) = -\lambda(n)} for all {n}) together with some combinatorial arguments (vaguely analogous to solving a Sudoku puzzle!) to establish more complex sign patterns for the Liouville function, that are either inconsistent with each other, or with results such as the Matomäki-Radziwiłł result. To illustrate this, let us give some {k=2} examples, arguing a little informally to emphasise the combinatorial aspects of the argument. First suppose that the sign pattern {(\lambda(n),\lambda(n+1)) = (+1,+1)} almost never occurs. The prime number theorem tells us that {\lambda(n)} and {\lambda(n+1)} are each equal to {+1} about half of the time, which by inclusion-exclusion implies that the sign pattern {(\lambda(n),\lambda(n+1))=(-1,-1)} almost never occurs. In other words, we have {\lambda(n+1) = -\lambda(n)} for almost all {n}. But from the multiplicativity property {\lambda(2n)=-\lambda(n)} this implies that one should have

\displaystyle \lambda(2n+2) = -\lambda(2n)

\displaystyle \lambda(2n+1) = -\lambda(2n)


\displaystyle \lambda(2n+2) = -\lambda(2n+1)

for almost all {n}. But the above three statements are contradictory, and the claim follows.

Similarly, if we assume that the sign pattern {(\lambda(n),\lambda(n+1)) = (+1,-1)} almost never occurs, then a similar argument to the above shows that for any fixed {h}, one has {\lambda(n)=\lambda(n+1)=\dots=\lambda(n+h)} for almost all {n}. But this means that the mean {\frac{1}{h} \sum_{j=1}^h \lambda(n+j)} is abnormally large for most {n}, which (for {h} large enough) contradicts the results of Matomäki and Radziwiłł. Here we see that the “enemy” to defeat is the scenario in which {\lambda} only changes sign very rarely, in which case one rarely sees the pattern {(+1,-1)}.

It turns out that similar (but more combinatorially intricate) arguments work for sign patterns of length three (but are unlikely to work for most sign patterns of length four or greater). We give here one fragment of such an argument (due to Hildebrand) which hopefully conveys the Sudoku-type flavour of the combinatorics. Suppose for instance that the sign pattern {(\lambda(n),\lambda(n+1),\lambda(n+2)) = (+1,+1,+1)} almost never occurs. Now suppose {n} is a typical number with {\lambda(15n-1)=\lambda(15n+1)=+1}. Since we almost never have the sign pattern {(+1,+1,+1)}, we must (almost always) then have {\lambda(15n) = -1}. By multiplicativity this implies that

\displaystyle (\lambda(60n-4), \lambda(60n), \lambda(60n+4)) = (+1,-1,+1).

We claim that this (almost always) forces {\lambda(60n+5)=-1}. For if {\lambda(60n+5)=+1}, then by the lack of the sign pattern {(+1,+1,+1)}, this (almost always) forces {\lambda(60n+3)=\lambda(60n+6)=-1}, which by multiplicativity forces {\lambda(20n+1)=\lambda(20n+2)=+1}, which by lack of {(+1,+1,+1)} (almost always) forces {\lambda(20n)=-1}, which by multiplicativity contradicts {\lambda(60n)=-1}. Thus we have {\lambda(60n+5)=-1}; a similar argument gives {\lambda(60n-5)=-1} almost always, which by multiplicativity gives {\lambda(12n-1)=\lambda(12n)=\lambda(12n+1)=+1}, a contradiction. Thus we almost never have {\lambda(15n-1)=\lambda(15n+1)=+1}, which by the inclusion-exclusion argument mentioned previously shows that {\lambda(15n+1) = - \lambda(15n-1)} for almost all {n}.

One can continue these Sudoku-type arguments and conclude eventually that {\lambda(3n-1)=-\lambda(3n+1)=\lambda(3n+2)} for almost all {n}. To put it another way, if {\chi_3} denotes the non-principal Dirichlet character of modulus {3}, then {\lambda \chi_3} is almost always constant away from the multiples of {3}. (Conversely, if {\lambda \chi_3} changed sign very rarely outside of the multiples of three, then the sign pattern {(+1,+1,+1)} would never occur.) Fortunately, the main result of Matomäki and Radziwiłł shows that this scenario cannot occur, which establishes that the sign pattern {(+1,+1,+1)} must occur rather frequently. The other sign patterns are handled by variants of these arguments.

Excluding a sign pattern of length three leads to useful implications like “if {\lambda(n-1)=\lambda(n)=+1}, then {\lambda(n+1)=-1}” which turn out to be just barely strong enough to quite rigidly constrain the Liouville function using Sudoku-like arguments. In contrast, excluding a sign pattern of length four only gives rise to implications like “if {\lambda(n-2)=\lambda(n-1)=\lambda(n)=+1}, then {\lambda(n+1)=-1}”, and these seem to be much weaker for this purpose (the hypothesis in these implications just isn’t satisfied nearly often enough). So a different idea seems to be needed if one wishes to extend the above theorem to larger values of {k}.

Our second theorem gives an analogous result for the Möbius function {\mu} (which takes values in {\{-1,0,+1\}} rather than {\{-1,1\}}), but the analysis turns out to be remarkably difficult and we are only able to get up to {k=2}:

Theorem 2 Let {k \leq 2}. Then each of the sign patterns in {\{-1,0,+1\}^k} is attained by the Möbius function for a set of natural numbers {n} of positive lower density.

It turns out that the prime number theorem and elementary sieve theory can be used to handle the {k=1} case and all the {k=2} cases that involve at least one {0}, leaving only the four sign patterns {(\pm 1, \pm 1)} to handle. It is here that the zeroes of the Möbius function cause a significant new obstacle. Suppose for instance that the sign pattern {(+1, -1)} almost never occurs for the Möbius function. The same arguments that were used in the Liouville case then show that {\mu(n)} will be almost always equal to {\mu(n+1)}, provided that {n,n+1} are both square-free. One can try to chain this together as before to create a long string {\mu(n)=\dots=\mu(n+h) \in \{-1,+1\}} where the Möbius function is constant, but this cannot work for any {h} larger than three, because the Möbius function vanishes at every multiple of four.

The constraints we assume on the Möbius function can be depicted using a graph on the squarefree natural numbers, in which any two adjacent squarefree natural numbers are connected by an edge. The main difficulty is then that this graph is highly disconnected due to the multiples of four not being squarefree.
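To see concretely how disconnected this graph is, here is a small sketch (an illustration added here, with hypothetical function names) that lists the connected components of the adjacency graph on the squarefree numbers up to a cutoff. Each component is a maximal run of consecutive squarefree numbers, and since every fourth integer is excluded, no run can have more than three elements.

```python
def squarefree_flags(limit):
    """flags[n] is True exactly when 1 <= n <= limit and n is squarefree."""
    flags = [True] * (limit + 1)
    flags[0] = False
    d = 2
    while d * d <= limit:
        # strike out all multiples of d^2
        for m in range(d * d, limit + 1, d * d):
            flags[m] = False
        d += 1
    return flags

def adjacency_components(limit):
    """Maximal runs of consecutive squarefree numbers in [1, limit]."""
    flags = squarefree_flags(limit)
    runs, current = [], []
    for n in range(1, limit + 1):
        if flags[n]:
            current.append(n)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

if __name__ == "__main__":
    runs = adjacency_components(100)
    print(runs[:5])
    print("longest run:", max(len(r) for r in runs))
```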

To get around this, we need to enlarge the graph. Note from multiplicativity that if {\mu(n)} is almost always equal to {\mu(n+1)} when {n,n+1} are squarefree, then {\mu(n)} is almost always equal to {\mu(n+p)} when {n,n+p} are squarefree and {n} is divisible by {p}. We can then form a graph on the squarefree natural numbers by connecting {n} to {n+p} whenever {n,n+p} are squarefree and {n} is divisible by {p}. If this graph is “locally connected” in some sense, then {\mu} will be constant on almost all of the squarefree numbers in a large interval, which turns out to be incompatible with the results of Matomäki and Radziwiłł. Because of this, matters are reduced to establishing the connectedness of a certain graph. More precisely, it turns out to be sufficient to establish the following claim:

Theorem 3 For each prime {p}, let {a_p \hbox{ mod } p^2} be a residue class chosen uniformly at random. Let {G} be the random graph whose vertices {V} consist of those integers {n} not equal to {a_p \hbox{ mod } p^2} for any {p}, and whose edges consist of pairs {n,n+p} in {V} with {n = a_p \hbox{ mod } p}. Then with probability {1}, the graph {G} is connected.
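One can experiment with a finite truncation of this random graph. In the sketch below, the restriction to integers in {[0, n_{max})} and to primes {p \leq p_{max}} is an assumption made purely so that the computation is finite, and the names `sample_truncated_graph` and `count_components` are hypothetical. Theorem 3 concerns the full infinite graph, so a truncated sample need not be connected; the code simply reports how many components it sees.

```python
import random

def primes_up_to(limit):
    """Sieve of Eratosthenes (assumes limit >= 2)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

def sample_truncated_graph(n_max, p_max, seed=0):
    """Sample a_p mod p^2 for p <= p_max and build the graph on [0, n_max)."""
    rng = random.Random(seed)
    residues = {p: rng.randrange(p * p) for p in primes_up_to(p_max)}
    # vertices: integers avoiding the class a_p mod p^2 for every sampled p
    vertices = {n for n in range(n_max)
                if all(n % (p * p) != a for p, a in residues.items())}
    # edges: pairs (n, n+p) of vertices with n = a_p mod p
    edges = [(n, n + p) for p, a in residues.items() for n in vertices
             if n % p == a % p and (n + p) in vertices]
    return residues, vertices, edges

def count_components(vertices, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)
    return len({find(v) for v in vertices})

if __name__ == "__main__":
    residues, vertices, edges = sample_truncated_graph(2000, 50, seed=1)
    print(len(vertices), "vertices,", len(edges), "edges,",
          count_components(vertices, edges), "components")
```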

We were able to show the connectedness of this graph, though it turned out to be remarkably tricky to do so. Roughly speaking (and suppressing a number of technicalities), the main steps in the argument were as follows.

  • (Early stage) Pick a large number {X} (in our paper we take {X} to be odd, but I’ll ignore this technicality here). Using a moment method to explore neighbourhoods of a single point in {V}, one can show that a vertex {v} in {V} is almost always connected to at least {\log^{10} X} numbers in {[v,v+X^{1/100}]}, using relatively short paths of short diameter. (This is the most computationally intensive portion of the argument.)
  • (Middle stage) Let {X'} be a typical number in {[X/40,X/20]}, and let {R} be a scale somewhere between {X^{1/40}} and {X'}. By using paths {n, n+p_1, n+p_1-p_2, n+p_1-p_2+p_3} involving three primes, and using a variant of Vinogradov’s theorem and some routine second moment computations, one can show that with quite high probability, any “good” vertex in {[v+X'-R, v+X'-0.99R]} is connected to a “good” vertex in {[v+X'-0.01R, v+X'-0.0099 R]} by paths of length three, where the definition of “good” is somewhat technical but encompasses almost all of the vertices in {V}.
  • (Late stage) Combining the two previous results together, we can show that most vertices {v} will be connected to a vertex in {[v+X'-X^{1/40}, v+X']} for any {X'} in {[X/40,X/20]}. In particular, {v} will be connected to a set of {\gg X^{9/10}} vertices in {[v,v+X/20]}. By tracking everything carefully, one can control the length and diameter of the paths used to connect {v} to this set, and one can also control the parity of the elements in this set.
  • (Final stage) Now suppose we have two vertices {v, w} at a distance {X} apart. By the previous item, one can connect {v} to a large set {A} of vertices in {[v,v+X/20]}, and one can similarly connect {w} to a large set {B} of vertices in {[w,w+X/20]}. Now, by using a Vinogradov-type theorem and second moment calculations again (and ensuring that the elements of {A} and {B} have opposite parity), one can connect many of the vertices in {A} to many of the vertices in {B} by paths of length three, which then connects {v} to {w}, and gives the claim.

It seems of interest to understand random graphs like {G} further. In particular, the graph {G'} on the integers formed by connecting {n} to {n+p} for all {n} in a randomly selected residue class mod {p} for each prime {p} is particularly interesting (it is to the Liouville function as {G} is to the Möbius function); if one could show some “local expander” properties of this graph {G'}, then one would have a chance of modifying the above methods to attack the first unsolved case of the Chowla conjecture, namely that {\lambda(n)\lambda(n+1)} has asymptotic density zero (perhaps working with logarithmic density instead of natural density to avoid some technicalities).

I’ve just uploaded to the arXiv my paper “Inverse theorems for sets and measures of polynomial growth”. This paper was motivated by two related questions. The first question was to obtain a qualitatively precise description of the sets of polynomial growth that arise in Gromov’s theorem, in much the same way that Freiman’s theorem (and its generalisations) provide a qualitatively precise description of sets of small doubling. The other question was to obtain a non-abelian analogue of inverse Littlewood-Offord theory.

Let me discuss the former question first. Gromov’s theorem tells us that if a finite subset {A} of a group {G} exhibits polynomial growth in the sense that {|A^n|} grows polynomially in {n}, then the group generated by {A} is virtually nilpotent (the converse direction is also true, and is relatively easy to establish). This theorem has been strengthened a number of times over the years. For instance, a few years ago, I proved with Shalom that the condition that {|A^n|} grew polynomially in {n} could be replaced by {|A^n| \leq C n^d} for a single {n}, as long as {n} was sufficiently large depending on {C,d} (in fact we gave a fairly explicit quantitative bound on how large {n} needed to be). A little more recently, with Breuillard and Green, the condition {|A^n| \leq C n^d} was weakened to {|A^n| \leq n^d |A|}, that is to say it sufficed to have polynomial relative growth at a finite scale. In fact, the latter paper gave more information on {A} in this case: roughly speaking, it showed (at least in the case when {A} was a symmetric neighbourhood of the identity) that {A^n} was “commensurate” with a very structured object known as a coset nilprogression. This can then be used to establish further control on {A}. For instance, it was recently shown by Breuillard and Tointon (again in the symmetric case) that if {|A^n| \leq n^d |A|} for a single {n} that was sufficiently large depending on {d}, then all the {A^{n'}} for {n' \geq n} have a doubling constant bounded by a bound {C_d} depending only on {d}, thus {|A^{2n'}| \leq C_d |A^{n'}|} for all {n' \geq n}.

In this paper we are able to refine this analysis a bit further; under the same hypotheses, we can show an estimate of the form

\displaystyle  \log |A^{n'}| = \log |A^n| + f( \log n' - \log n ) + O_d(1)

for all {n' \geq n} and some piecewise linear, continuous, non-decreasing function {f: [0,+\infty) \rightarrow [0,+\infty)} with {f(0)=0}, where the error {O_d(1)} is bounded by a constant depending only on {d}, and where {f} has at most {O_d(1)} pieces, each of which has a slope that is a natural number of size {O_d(1)}. To put it another way, the function {n' \mapsto |A^{n'}|} for {n' \geq n} behaves (up to multiplicative constants) like a piecewise polynomial function, where the degree of the function and number of pieces is bounded by a constant depending on {d}.

One could ask whether the function {f} has any convexity or concavity properties. It turns out that it can exhibit either convex or concave behaviour (or a combination of both). For instance, if {A} is contained in a large finite group, then {n \mapsto |A^n|} will eventually plateau to a constant, exhibiting concave behaviour. On the other hand, in nilpotent groups one can see convex behaviour; for instance, in the Heisenberg group {\begin{pmatrix} 1 & {\mathbf Z} & {\mathbf Z} \\ 0 & 1 & {\mathbf Z} \\ 0 & 0 & 1 \end{pmatrix}}, if one sets {A} to be a set of matrices of the form {\begin{pmatrix} 1 & O(N) & O(N^3) \\ 0 & 1 & O(N) \\ 0 & 0 & 1 \end{pmatrix}} for some large {N} (abusing the {O()} notation somewhat), then {n \mapsto |A^n|} grows cubically for {n \leq N} but then grows quartically for {n > N}.
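For readers who want to play with the Heisenberg example, here is a brute-force sketch of the product sets. It encodes a matrix by its three off-diagonal entries {(x,y,z)} (top-middle, middle-right, top-right) and, as an assumption made here to pin down the {O()} notation, takes {A} to be the box {|x|,|y| \leq N}, {|z| \leq N^3}. Brute force is only feasible for very small {N} and {n}, so this illustrates the group law and the definition of {A^n} rather than the cubic-to-quartic transition itself.

```python
def heisenberg_mul(g, h):
    """Multiply two upper unitriangular 3x3 integer matrices.

    A matrix [[1, x, z], [0, 1, y], [0, 0, 1]] is encoded as (x, y, z).
    """
    x1, y1, z1 = g
    x2, y2, z2 = h
    return (x1 + x2, y1 + y2, z1 + z2 + x1 * y2)

def box(N):
    """The set A: |x|, |y| <= N and |z| <= N^3 (an illustrative choice)."""
    return {(x, y, z)
            for x in range(-N, N + 1)
            for y in range(-N, N + 1)
            for z in range(-N ** 3, N ** 3 + 1)}

def power_sizes(A, n_max):
    """Return [|A^1|, |A^2|, ..., |A^n_max|] by brute-force multiplication."""
    sizes, current = [], {(0, 0, 0)}
    for _ in range(n_max):
        current = {heisenberg_mul(g, a) for g in current for a in A}
        sizes.append(len(current))
    return sizes

if __name__ == "__main__":
    print(power_sizes(box(2), 3))
```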

To prove this proposition, it turns out (after using a somewhat difficult inverse theorem proven previously by Breuillard, Green, and myself) that one has to analyse the volume growth {n \mapsto |P^n|} of nilprogressions {P}. In the “infinitely proper” case where there are no unexpected relations between the generators of the nilprogression, one can lift everything to a simply connected Lie group (where one can take logarithms and exploit the Baker-Campbell-Hausdorff formula heavily), eventually describing {P^n} with fair accuracy by a certain convex polytope with vertices depending polynomially on {n}, which implies that {|P^n|} depends polynomially on {n} up to constants. If one is not in the “infinitely proper” case, then at some point {n_0} the nilprogression {P^{n_0}} develops a “collision”, but then one can use this collision to show (after some work) that the dimension of the “Lie model” of {P^{n_0}} has dropped by at least one from the dimension of {P} (the notion of a Lie model being developed in the previously mentioned paper of Breuillard, Green, and myself), so that this sort of collision can only occur a bounded number of times, with essentially polynomial volume growth behaviour between these collisions.

The arguments also give a precise description of the location of a set {A} for which {A^n} grows polynomially in {n}. In the symmetric case, what ends up happening is that {A^n} becomes commensurate to a “coset nilprogression” {HP} of bounded rank and nilpotency class, whilst {A} is “virtually” contained in a scaled down version {HP^{1/n}} of that nilprogression. What “virtually” means is a little complicated; roughly speaking, it means that there is a set {X} of bounded cardinality such that {aXHP^{1/n} \approx XHP^{1/n}} for all {a \in A}. Conversely, if {A} is virtually contained in {HP^{1/n}}, then {A^n} is commensurate to {HP} (and more generally, {A^{mn}} is commensurate to {HP^m} for any natural number {m}), giving quite a (qualitatively) precise description of {A} in terms of coset nilprogressions.

The main tool used to prove these results is the structure theorem for approximate groups established by Breuillard, Green, and myself, which roughly speaking asserts that approximate groups are always commensurate with coset nilprogressions. A key additional trick is a pigeonholing argument of Sanders, which in this context is the assertion that if {A^n} is comparable to {A^{2n}}, then there is an {n'} between {n} and {2n} such that {A \cdot A^{n'}} is very close in size to {A^{n'}} (up to a relative error of {1/n}). It is this fact, together with the comparability of {A^{n'}} to a coset nilprogression {HP}, that allows us (after some combinatorial argument) to virtually place {A} inside {HP^{1/n}}.

Similar arguments apply when discussing iterated convolutions {\mu^{*n}} of (symmetric) probability measures on a (discrete) group {G}, rather than combinatorial powers {A^n} of a finite set. Here, the analogue of volume {|A^n|} is given by the negative power {\| \mu^{*n} \|_{\ell^2}^{-2}} of the {\ell^2} norm of {\mu^{*n}} (thought of as a non-negative function on {G} of total mass 1). One can also work with other norms here than {\ell^2}, but this norm has some minor technical conveniences (and other measures of the “spread” of {\mu^{*n}} end up being more or less equivalent for our purposes). There is an analogous structure theorem that asserts that if {\mu^{*n}} spreads at most polynomially in {n}, then {\mu^{*n}} is “commensurate” with the uniform probability distribution on a coset progression {HP}, and {\mu} itself is largely concentrated near {HP^{1/\sqrt{n}}}. The factor of {\sqrt{n}} here is the familiar scaling factor in random walks that arises for instance in the central limit theorem. The proof of (the precise version of) this statement proceeds similarly to the combinatorial case, using pigeonholing to locate a scale {n'} where {\mu * \mu^{*n'}} has almost the same {\ell^2} norm as {\mu^{*n'}}.
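As a toy illustration of the quantity {\| \mu^{*n} \|_{\ell^2}^{-2}} (an example chosen here, not taken from the paper), take {G = {\bf Z}} and let {\mu} be the simple random walk step, giving mass {1/2} to each of {\pm 1}. Then {\mu^{*n}} is a shifted binomial distribution, and the identity {\sum_k \binom{n}{k}^2 = \binom{2n}{n}} gives {\| \mu^{*n} \|_{\ell^2}^{-2} = 4^n / \binom{2n}{n}}, which grows like {\sqrt{\pi n}}, consistent with the {\sqrt{n}} scaling. The sketch below checks this with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def convolve(mu, nu):
    """Convolution of two finitely supported measures on Z (dicts point -> mass)."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a + b] = out.get(a + b, 0) + p * q
    return out

def spread(mu, n):
    """Return ||mu^{*n}||_{l^2}^{-2} for a finitely supported measure mu on Z."""
    power = {0: Fraction(1)}  # the identity for convolution: a point mass at 0
    for _ in range(n):
        power = convolve(power, mu)
    return 1 / sum(p * p for p in power.values())

if __name__ == "__main__":
    step = {1: Fraction(1, 2), -1: Fraction(1, 2)}
    for n in (1, 4, 16, 64):
        print(n, float(spread(step, n)))
```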

A special case of this theory occurs when {\mu} is the uniform probability measure on {n} elements {v_1,\dots,v_n} of {G} and their inverses. The probability measure {\mu^{*n}} is then the distribution of a random product {w_1 \dots w_n}, where each {w_i} is equal to one of {v_{j_i}} or its inverse {v_{j_i}^{-1}}, selected at random with {j_i} drawn uniformly from {\{1,\dots,n\}} with replacement. This is very close to the Littlewood-Offord situation of random products {u_1 \dots u_n} where each {u_i} is equal to {v_i} or {v_i^{-1}} selected independently at random (thus {j_i} is now fixed to equal {i} rather than being randomly drawn from {\{1,\dots,n\}}). In the case when {G} is abelian, it turns out that a little bit of Fourier analysis shows that these two random walks have “comparable” distributions in a certain {\ell^2} sense. As a consequence, the results in this paper can be used to recover an essentially optimal abelian inverse Littlewood-Offord theorem of Nguyen and Vu. In the nonabelian case, the only Littlewood-Offord theorem I am aware of is a recent result of Tiep and Vu for matrix groups, but in this case I do not know how to relate the above two random walks to each other, and so we can only obtain an analogue of the Tiep-Vu results for the symmetrised random walk {w_1 \dots w_n} instead of the ordered random walk {u_1 \dots u_n}.

I’ve just uploaded to the arXiv my paper “Cancellation for the multilinear Hilbert transform”, submitted to Collectanea Mathematica. This paper uses methods from additive combinatorics (and more specifically, the arithmetic regularity and counting lemmas from this paper of Ben Green and myself) to obtain a slight amount of progress towards the open problem of obtaining {L^p} bounds for the trilinear and higher Hilbert transforms (as discussed in this previous blog post). For instance, the trilinear Hilbert transform

\displaystyle  H_3( f_1, f_2, f_3 )(x) := p.v. \int_{\bf R} f_1(x+t) f_2(x+2t) f_3(x+3t)\ \frac{dt}{t}

is not known to be bounded from any {L^{p_1}({\bf R}) \times L^{p_2}({\bf R}) \times L^{p_3}({\bf R})} to {L^p({\bf R})}, although it is conjectured to be so when {1/p =1/p_1 +1/p_2+1/p_3} and {1 < p_1,p_2,p_3,p < \infty}. (For {p} well below {1}, one can use additive combinatorics constructions to demonstrate unboundedness; see this paper of Demeter.) One can approach this problem by considering the truncated trilinear Hilbert transforms

\displaystyle  H_{3,r,R}( f_1, f_2, f_3 )(x) := \int_{r \leq |t| \leq R} f_1(x+t) f_2(x+2t) f_3(x+3t)\ \frac{dt}{t}

for {0 < r < R}. It is not difficult to show that the boundedness of {H_3} is equivalent to the boundedness of {H_{3,r,R}} with bounds that are uniform in {R} and {r}. On the other hand, from Minkowski’s inequality and Hölder’s inequality one can easily obtain the non-uniform bound of {2 \log \frac{R}{r}} for {H_{3,r,R}}. The main result of this paper is a slight improvement of this trivial bound to {o( \log \frac{R}{r})} as {R/r \rightarrow \infty}. Roughly speaking, the way this gain is established is as follows. First there are some standard time-frequency type reductions to reduce to the task of obtaining some non-trivial cancellation on a single “tree”. Using a “generalised von Neumann theorem”, we show that such cancellation will happen if (a discretised version of) one or more of the functions {f_1,f_2,f_3} (or a dual function {f_0} that it is convenient to test against) is small in the Gowers {U^3} norm. However, the arithmetic regularity lemma alluded to earlier allows one to represent an arbitrary function {f_i}, up to a small error, as the sum of such a “Gowers uniform” function, plus a structured function (or more precisely, an irrational virtual nilsequence). This effectively reduces the problem to that of establishing some cancellation in a single tree in the case when all functions {f_0,f_1,f_2,f_3} involved are irrational virtual nilsequences. At this point, the contribution of each component of the tree can be estimated using the “counting lemma” from my paper with Ben. The main term in the asymptotics is a certain integral over a nilmanifold, but because the kernel {\frac{dt}{t}} in the trilinear Hilbert transform is odd, it turns out that this integral vanishes, giving the required cancellation.
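For completeness, the trivial bound of {2 \log \frac{R}{r}} mentioned above can be spelled out as follows. This is a sketch under the assumption {p \geq 1} (so that Minkowski’s integral inequality applies) and {1/p = 1/p_1+1/p_2+1/p_3}; the first inequality is Minkowski, the second is Hölder together with the translation invariance of the {L^{p_i}} norms, and the final step is the evaluation {\int_{r \leq |t| \leq R} \frac{dt}{|t|} = 2 \log \frac{R}{r}}.

```latex
\| H_{3,r,R}(f_1,f_2,f_3) \|_{L^p({\bf R})}
  \leq \int_{r \leq |t| \leq R}
       \| f_1(\cdot+t)\, f_2(\cdot+2t)\, f_3(\cdot+3t) \|_{L^p({\bf R})}\ \frac{dt}{|t|}
  \leq \|f_1\|_{L^{p_1}({\bf R})} \|f_2\|_{L^{p_2}({\bf R})} \|f_3\|_{L^{p_3}({\bf R})}
       \int_{r \leq |t| \leq R} \frac{dt}{|t|}
  = 2 \log \frac{R}{r}\
    \|f_1\|_{L^{p_1}({\bf R})} \|f_2\|_{L^{p_2}({\bf R})} \|f_3\|_{L^{p_3}({\bf R})}.
```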

The same argument works for higher order Hilbert transforms (and one can also replace the coefficients in these transforms with other rational constants). However, because the quantitative bounds in the arithmetic regularity and counting lemmas are so poor, it does not seem likely that one can use these methods to remove the logarithmic growth in {R/r} entirely, and some additional ideas will be needed to resolve the full conjecture.

I’ve just uploaded to the arXiv my paper “Failure of the {L^1} pointwise and maximal ergodic theorems for the free group”, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map {T: X \rightarrow X} on a probability space {X = (X,\mu)}, then for any {f \in L^1(X)}, the averages {\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}} converge pointwise almost everywhere. (In the important case when the shift map {T} is ergodic, the pointwise limit is simply the mean {\int_X f\ d\mu} of the original function {f}.)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group {F_2} on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for {F_2}-actions {(T_g)_{g \in F_2}} on probability spaces {(X,\mu)}. For instance, for the spherical averaging operators

\displaystyle  {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}

(where {|g|} denotes the length of the reduced word that forms {g}), they showed that {{\mathcal A}_{2n} f} converged pointwise almost everywhere provided that {f} was in {L^p(X)} for some {p>1}. (The need to restrict to spheres of even radius can be seen by considering the action of {F_2} on the two-element set {\{0,1\}} in which both generators of {F_2} act by interchanging the elements, in which case {{\mathcal A}_n} is determined by the parity of {n}.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition {f \in L^p(X)} to the weaker condition {f \in L \log L(X)}.
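The two-element example can be checked by direct enumeration. The sketch below (an illustration with hypothetical names, writing `A` and `B` for the inverses of the generators `a` and `b`) lists the reduced words of length {n}, confirms that there are {4 \times 3^{n-1}} of them, and evaluates {{\mathcal A}_n f} for the action on {\{0,1\}} in which every generator swaps the two points.

```python
GENERATORS = ["a", "A", "b", "B"]  # A, B stand for the inverses of a, b
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}

def reduced_words(n):
    """All reduced words of length n in the free group on a, b."""
    words = [""]
    for _ in range(n):
        # append any generator that does not cancel the previous letter
        words = [w + s for w in words for s in GENERATORS
                 if not w or INVERSE[w[-1]] != s]
    return words

def spherical_average(f, x, n):
    """A_n f(x) for the action on {0, 1} in which each generator swaps 0 and 1.

    Every letter of g flips the point, so T_g^{-1} x depends only on the
    parity of the word length |g|.
    """
    words = reduced_words(n)
    return sum(f[(x + len(w)) % 2] for w in words) / len(words)

if __name__ == "__main__":
    f = [1.0, 0.0]  # the indicator function of the point 0
    for n in range(1, 7):
        print(n, len(reduced_words(n)), spherical_average(f, 0, n))
```

With {f} the indicator of the point {0}, the averages at {x=0} alternate between {0} (odd {n}) and {1} (even {n}), so {{\mathcal A}_n f} cannot converge along all {n}.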

The question remained open as to whether the pointwise ergodic theorem for {F_2}-actions held if one only assumed that {f} was in {L^1(X)}. Nevo and Stein were able to establish this for the Cesàro averages {\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}, but not for {{\mathcal A}_n} itself. About six years ago, Assaf Naor and I tried our hand at this problem, and were able to show an associated maximal inequality on {\ell^1(F_2)}, but due to the non-amenability of {F_2}, this inequality did not transfer to {L^1(X)} and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the {L^1} pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our {\ell^1(F_2)} maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an {L^1} ergodic theorem for iterates {P^n} of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the {F_2} setting, thus settling the problem in the negative:

Theorem 1 (Failure of {L^1} pointwise ergodic theorem) There exists a measure-preserving {F_2}-action on a probability space {X} and a non-negative function {f \in L^1(X)} such that {\sup_n {\mathcal A}_{2n} f(x) = +\infty} for almost every {x}.

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} such that {\sup_n P^n f(x) = +\infty} for almost every {x}. By some standard manipulations, it suffices to show that for any given {\alpha > 0} and {\varepsilon>0}, there exists a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} with {\|f\|_{L^1(X)} \leq \alpha}, such that {\sup_n P^n f \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Actually, it will be convenient to replace the Markov chain {(P^n f)_{n \geq 0}} with an ancient Markov chain {(f_n)_{n \in {\bf Z}}} – that is to say, a sequence of non-negative functions {f_n} for both positive and negative {n}, such that {f_{n+1} = P f_n} for all {n \in {\bf Z}}. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the {F_2} version of the argument can be run using infinitely ancient chains.)

For any {\alpha>0}, let {P(\alpha)} denote the claim that for any {\varepsilon>0}, there exists an ancient Markov chain {(f_n)_{n \in {\bf Z}}} with {\|f_n\|_{L^1(X)} = \alpha} such that {\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Clearly {P(1)} holds since we can just take {f_n=1} for all {n}. Our objective is to show that {P(\alpha)} holds for arbitrarily small {\alpha}. The heart of Ornstein’s argument is then the implication

\displaystyle  P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)

for any {0 < \alpha \leq 1}, which upon iteration quickly gives the desired claim.
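To see why iteration gives the claim, note that if {\alpha_0 = 1} and {\alpha_{k+1} = \alpha_k (1 - \frac{\alpha_k}{4})}, then {\frac{1}{\alpha_{k+1}} \geq \frac{1}{\alpha_k} (1 + \frac{\alpha_k}{4}) = \frac{1}{\alpha_k} + \frac{1}{4}}, so that {\alpha_k \leq \frac{4}{k+4}} tends to zero. A few lines of code (a sketch with a hypothetical function name) make the rate of decay visible:

```python
def ornstein_iterates(threshold, alpha=1.0):
    """Iterate alpha -> alpha * (1 - alpha / 4) until alpha < threshold.

    Returns the full list of iterates, starting with the initial alpha.
    Assumes threshold > 0 and 0 < alpha <= 1, so that the loop terminates.
    """
    values = [alpha]
    while values[-1] >= threshold:
        a = values[-1]
        values.append(a * (1 - a / 4))
    return values

if __name__ == "__main__":
    values = ornstein_iterates(0.01)
    print("first iterates:", values[:4])
    print("steps needed to drop below 0.01:", len(values) - 1)
```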

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain {(f_n)_{n \in {\bf Z}}} on some probability space {X} of total mass {\|f_n\|_{L^1(X)} = \alpha}, such that {\sup_n f_n} attains the value of {1} or greater almost everywhere. Assuming that the Markov process is irreducible, the {f_n} will eventually converge as {n \rightarrow \infty} to the constant value of {\|f_n\|_{L^1(X)}}, in particular its final state will essentially stay above {\alpha} (up to small errors).

Now suppose we duplicate the Markov process by replacing {X} with a double copy {X \times \{1,2\}} (giving {\{1,2\}} the uniform probability measure), and using the disjoint sum of the Markov operators on {X \times \{1\}} and {X \times \{2\}} as the propagator, so that there is no interaction between the two components of this new system. Then the functions {f'_n(x,i) := f_n(x) 1_{i=1}} form an ancient Markov chain of mass at most {\alpha/2} that lives solely in the first half {X \times \{1\}} of this copy, and {\sup_n f'_n} attains the value of {1} or greater on almost all of the first half {X \times \{1\}}, but is zero on the second half. The final state of {f'_n} will be to stay above {\alpha} in the first half {X \times \{1\}}, but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves {X \times \{1\}}, {X \times \{2\}} of the system (I mentally think of {X \times \{1\}} and {X \times \{2\}} as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain {(f'_n)_{n \in {\bf Z}}} in the previous example gets replaced by a slightly different ancient Markov chain {(f''_n)_{n \in {\bf Z}}} which is more or less identical with {f'_n} for negative times {n}, or for bounded positive times {n}, but for very large values of {n} the final state is now constant across the entire state space {X \times \{1,2\}}, and will stay above {\alpha/2} on this space.

Finally, we consider an ancient Markov chain {F_n} which is basically of the form

\displaystyle  F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}

for some large parameter {M} and for all {n \leq M} (the approximation becomes increasingly inaccurate for {n} much larger than {M}, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces {X \times \{1\}, X \times \{2\}}, but with the second copy delayed by a large time delay {M}, and also attenuated in amplitude by a factor of {1-\frac{\alpha}{2}}. The total mass of this process is now {\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}. Because of the {f''_n} component of {F_n}, we see that {\sup_n F_n} basically attains the value of {1} or greater on the first half {X \times \{1\}}. On the second half {X \times \{2\}}, we work with times {n} close to {M}. If {M} is large enough, {f''_n} would have averaged out to about {\alpha/2} at such times, but the {(1 - \frac{\alpha}{2}) f_{n-M}(x)} component can get as large as {1-\alpha/2} here. Summing (and continuing to ignore various epsilon losses), we see that {\sup_n F_n} can get as large as {1} on almost all of the second half {X \times \{2\}}. This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages {{\mathcal A}_n} for a free group action can be lifted up to become powers {P^n} of a Markov operator, basically by randomly assigning a “velocity vector” {s \in \{a,b,a^{-1},b^{-1}\}} to one’s base point {x} and then applying the Markov process that moves {x} along that velocity vector (and then randomly changing the velocity vector at each time step, subject to the “reduced word” condition that the velocity never flips from {s} to {s^{-1}}). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of {F_2} systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with {F_2}-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case {P(1)}) comes from basically considering the action of {F_2} on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time {n=-\infty}, and only reaches the boundary of this ball at the time {n=0}.
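To make the “velocity vector” bookkeeping concrete, here is a minimal Python sketch that tracks only the word in {F_2} traced out by the process, not the action on the measure space (so it is an illustration of the reduced word condition rather than Bufetov's actual construction). The names `GENERATORS`, `allowed_next`, `reduced_words` and `random_reduced_word` are hypothetical, and the capital letters `A`, `B` are shorthand for {a^{-1}}, {b^{-1}}.

```python
import random

# Letters of the free group F_2; "A" and "B" stand for a^{-1} and b^{-1}.
GENERATORS = ["a", "b", "A", "B"]
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}


def allowed_next(s):
    """Velocities that may follow s: anything except the inverse of s."""
    return [t for t in GENERATORS if t != INVERSE[s]]


def reduced_words(n):
    """All reduced words of length n (the sphere of radius n in F_2)."""
    if n == 0:
        return [""]
    words = list(GENERATORS)
    for _ in range(n - 1):
        words = [w + t for w in words for t in allowed_next(w[-1])]
    return words


def random_reduced_word(n, rng):
    """Run the velocity Markov chain for n steps.

    The first velocity is uniform over the four letters, and each later
    velocity is uniform over the three letters allowed by the reduced
    word condition, so every reduced word of length n is equally likely.
    """
    if n == 0:
        return ""
    word = [rng.choice(GENERATORS)]
    for _ in range(n - 1):
        word.append(rng.choice(allowed_next(word[-1])))
    return "".join(word)


if __name__ == "__main__":
    print(len(reduced_words(3)))          # size of the sphere of radius 3
    print(random_reduced_word(6, random.Random(0)))
```

Since the chain produces each of the {4 \times 3^{n-1}} reduced words of length {n} with equal probability, averaging a function over the endpoint of the chain after {n} steps reproduces the spherical average over words of length {n}, which is the heuristic reason the lift to a Markov operator is plausible.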

Hoi Nguyen, Van Vu, and I have just uploaded to the arXiv our paper “Random matrices: tail bounds for gaps between eigenvalues”. This is a followup paper to my recent paper with Van in which we showed that random matrices {M_n} of Wigner type (such as the adjacency matrix of an Erdös-Renyi graph) asymptotically almost surely had simple spectrum. In the current paper, we push the method further to show that the eigenvalues are not only distinct, but are (with high probability) separated from each other by some negative power {n^{-A}} of {n}. This follows the now standard technique of replacing any appearance of discrete Littlewood-Offord theory (a key ingredient in our previous paper) with its continuous analogue (inverse theorems for small ball probability). For general Wigner-type matrices {M_n} (in which the matrix entries are not normalised to have mean zero), we can use the inverse Littlewood-Offord theorem of Nguyen and Vu to obtain (under mild conditions on {M_n}) a result of the form

\displaystyle  {\bf P} (\lambda_{i+1}(M_n) - \lambda_i(M_n) \leq n^{-A} ) \leq n^{-B}

for any {B} and {i}, if {A} is sufficiently large depending on {B} (in a linear fashion), and {n} is sufficiently large depending on {B}. The point here is that {B} can be made arbitrarily large, and also that no continuity or smoothness hypothesis is made on the distribution of the entries. (In the continuous case, one can use the machinery of Wegner estimates to obtain results of this type, as was done in a paper of Erdös, Schlein, and Yau.)

In the mean zero case, it becomes more efficient to use an inverse Littlewood-Offord theorem of Rudelson and Vershynin to obtain (with the normalisation that the entries of {M_n} have unit variance, so that the eigenvalues of {M_n} are {O(\sqrt{n})} with high probability) the bound

\displaystyle  {\bf P} (\lambda_{i+1}(M_n) - \lambda_i(M_n) \leq \delta / \sqrt{n} ) \ll \delta \ \ \ \ \ (1)

for {\delta \geq n^{-O(1)}} (one also has good results of this type for smaller values of {\delta}). This is only optimal in the regime {\delta \sim 1}; we expect to establish some eigenvalue repulsion, improving the RHS to {\delta^2} for real matrices and {\delta^3} for complex matrices, but this appears to be a more difficult task (possibly requiring some quadratic inverse Littlewood-Offord theory, rather than just linear inverse Littlewood-Offord theory). However, we can get some repulsion if one works with larger gaps, getting a result roughly of the form

\displaystyle  {\bf P} (\lambda_{i+k}(M_n) - \lambda_i(M_n) \leq \delta / \sqrt{n} ) \ll \delta^{ck^2}

for any fixed {k \geq 1} and some absolute constant {c>0} (which we can asymptotically make to be {1/3} for large {k}, though it ought to be as large as {1}), by using a higher-dimensional version of the Rudelson-Vershynin inverse Littlewood-Offord theorem.
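For readers who want to experiment with bounds of the shape (1), here is a small Monte Carlo sketch in pure Python. It is only a toy under stated assumptions: the entries are independent random signs (mean zero, unit variance), the matrix size is tiny, and the eigenvalues are computed with a textbook cyclic Jacobi iteration; the names `jacobi_eigenvalues`, `random_sign_wigner` and `gap_tail_estimate` are hypothetical and the index `i` is zero-based. Nothing at this scale says anything rigorous about the asymptotic regime in the paper.

```python
import math
import random


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Sorted eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # multiply by the rotation on the right
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # multiply by its transpose on the left
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def random_sign_wigner(n, rng):
    """Symmetric n x n matrix with independent +-1 entries on and above the diagonal."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.choice((-1.0, 1.0))
    return m


def gap_tail_estimate(n, i, delta, trials, seed=0):
    """Empirical frequency of the event lambda_{i+1} - lambda_i <= delta / sqrt(n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lam = jacobi_eigenvalues(random_sign_wigner(n, rng))
        if lam[i + 1] - lam[i] <= delta / math.sqrt(n):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for delta in (0.1, 0.3, 1.0):
        print(delta, gap_tail_estimate(10, 4, delta, trials=200))
```

Fixing the seed means the same matrices are reused for each {\delta}, so the empirical frequencies are automatically monotone in {\delta} and can be compared directly.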

In the case of Erdös-Renyi graphs, we don’t have mean zero and the Rudelson-Vershynin Littlewood-Offord theorem isn’t quite applicable, but by working carefully through the approach based on the Nguyen-Vu theorem we can almost recover (1), except for a loss of {n^{o(1)}} on the RHS.

As a sample application of the eigenvalue separation results, we can now obtain some information about eigenvectors; for instance, we can show that the components of the eigenvectors all have magnitude at least {n^{-A}} for some {A} with high probability. (Eigenvectors become much more stable, and able to be studied in isolation, once their associated eigenvalue is well separated from the other eigenvalues; see this previous blog post for more discussion.)

Kaisa Matomaki, Maksym Radziwill, and I have just uploaded to the arXiv our paper “An averaged form of Chowla’s conjecture”. This paper concerns a weaker variant of the famous conjecture of Chowla (discussed for instance in this previous post) that

\displaystyle  \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(X)

as {X \rightarrow \infty} for any distinct natural numbers {h_1,\dots,h_k}, where {\lambda} denotes the Liouville function. (One could also replace the Liouville function here by the Möbius function {\mu} and obtain a morally equivalent conjecture.) This conjecture remains open for any {k \geq 2}; for instance the assertion

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+2) = o(X)

is a variant of the twin prime conjecture (though possibly a tiny bit easier to prove), and is subject to the notorious parity barrier (as discussed in this previous post).

Our main result asserts, roughly speaking, that Chowla’s conjecture can be established unconditionally provided one has non-trivial averaging in the {h_1,\dots,h_k} parameters. More precisely, one has

Theorem 1 (Chowla on the average) Suppose {H = H(X) \leq X} is a quantity that goes to infinity as {X \rightarrow \infty} (but it can go to infinity arbitrarily slowly). Then for any fixed {k \geq 1}, we have

\displaystyle  \sum_{h_1,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k)| = o( H^k X ).

In fact, we can remove one of the averaging parameters and obtain

\displaystyle  \sum_{h_2,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n) \lambda(n+h_2) \dots \lambda(n+h_k)| = o( H^{k-1} X ).

Actually we can make the decay rate a bit more quantitative, gaining about {\frac{\log\log H}{\log H}} over the trivial bound. The key case is {k=2}; while the unaveraged Chowla conjecture becomes more difficult as {k} increases, the averaged Chowla conjecture does not increase in difficulty due to the increasing amount of averaging for larger {k}, and we end up deducing the higher {k} case of the conjecture from the {k=2} case by an elementary argument.
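The quantities in Chowla's conjecture and in Theorem 1 are easy to tabulate for small {X}. The following Python sketch computes the Liouville function with a smallest-prime-factor sieve and then evaluates the unaveraged correlation sum and the {k=2} averaged sum (in the second form, with {h_1 = 0}), normalised by {HX}. The function names are hypothetical, and of course numerics at this scale are only a sanity check and say nothing about the asymptotic statements.

```python
def liouville_table(limit):
    """Return a list lam with lam[n] = lambda(n) for 1 <= n <= limit (limit >= 1)."""
    spf = list(range(limit + 1))  # spf[n] will hold the smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # Removing one prime factor flips the sign of lambda.
        lam[n] = -lam[n // spf[n]]
    return lam


def correlation(lam, X, shifts):
    """The sum over n <= X of lambda(n + h_1) ... lambda(n + h_k)."""
    total = 0
    for n in range(1, X + 1):
        prod = 1
        for h in shifts:
            prod *= lam[n + h]
        total += prod
    return total


def averaged_pair_correlation(X, H):
    """(1 / (H X)) times the sum over 1 <= h <= H of |sum_{n <= X} lambda(n) lambda(n + h)|."""
    lam = liouville_table(X + H)
    total = sum(abs(correlation(lam, X, (0, h))) for h in range(1, H + 1))
    return total / (H * X)


if __name__ == "__main__":
    X = 100000
    lam = liouville_table(X + 2)
    print(correlation(lam, X, (0, 2)) / X)    # normalised pair correlation
    print(averaged_pair_correlation(X, 20))   # normalised averaged sum
```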

The proof of the theorem proceeds as follows. By exploiting the Fourier-analytic identity

\displaystyle  \int_{{\mathbf T}} (\int_{\mathbf R} |\sum_{x \leq n \leq x+H} f(n) e(\alpha n)|^2 dx)^2\ d\alpha

\displaystyle = \sum_{|h| \leq H} (H-|h|)^2 |\sum_n f(n) \overline{f}(n+h)|^2

(related to a standard Fourier-analytic identity for the Gowers {U^2} norm) it turns out that the {k=2} case of the above theorem can basically be derived from an estimate of the form

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o( H X )

uniformly for all {\alpha \in {\mathbf T}}. For “major arc” {\alpha}, close to a rational {a/q} for small {q}, we can establish this bound from a generalisation of a recent result of Matomaki and Radziwill (discussed in this previous post) on averages of multiplicative functions in short intervals. For “minor arc” {\alpha}, we can proceed instead from an argument of Katai and Bourgain-Sarnak-Ziegler (discussed in this previous post).
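The Fourier-analytic identity used above can be checked numerically for finitely supported {f}. The sketch below makes two simplifying assumptions that are mine rather than the paper's: {H} is a positive integer, and {f} is supported on {\{1,\dots,L\}}. For non-integer {x} the window {[x,x+H]} then contains exactly {H} consecutive integers, so the {x} integral becomes a sum over such windows, and since the integrand in {\alpha} is a trigonometric polynomial of degree less than {2H}, a Riemann sum with {4H} equally spaced points evaluates the {\alpha} integral exactly (up to rounding). The names `lhs` and `rhs` are hypothetical.

```python
import cmath


def e(theta):
    """The standard character e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)


def lhs(f, H):
    """Left-hand side, with f given as a list: f[0] = f(1), ..., f[L-1] = f(L)."""
    L = len(f)
    N = 4 * H  # number of sample points in alpha; exact since the degree is < 2H
    total = 0.0
    for k in range(N):
        alpha = k / N
        inner = 0.0
        # Windows {j+1, ..., j+H} that meet the support {1, ..., L}.
        for j in range(1 - H, L):
            lo, hi = max(j + 1, 1), min(j + H, L)
            s = sum(f[n - 1] * e(alpha * n) for n in range(lo, hi + 1))
            inner += abs(s) ** 2
        total += inner ** 2
    return total / N


def rhs(f, H):
    """Right-hand side: sum over |h| <= H of (H - |h|)^2 |sum_n f(n) conj(f(n+h))|^2."""
    L = len(f)
    total = 0.0
    for h in range(-(H - 1), H):  # the terms h = +-H carry weight zero
        corr = sum(
            f[n - 1] * complex(f[n + h - 1]).conjugate()
            for n in range(1, L + 1)
            if 1 <= n + h <= L
        )
        total += (H - abs(h)) ** 2 * abs(corr) ** 2
    return total


if __name__ == "__main__":
    f = [1, -1, -1, 1, -1, 1, -1, -1, 1, 1]  # lambda(1), ..., lambda(10)
    print(lhs(f, 3), rhs(f, 3))
```

The weight {(H-|h|)^2} arises because a pair {n, n+h} lies in exactly {H-|h|} windows of {H} consecutive integers, and Parseval's identity in {\alpha} then squares this weight.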

The argument also extends to other bounded multiplicative functions than the Liouville function. Chowla’s conjecture was generalised by Elliott, who roughly speaking conjectured that the {k} copies of {\lambda} in Chowla’s conjecture could be replaced by arbitrary bounded multiplicative functions {g_1,\dots,g_k} as long as these functions were far from a twisted Dirichlet character {n \mapsto \chi(n) n^{it}} in the sense that

\displaystyle  \sum_p \frac{1 - \hbox{Re} g(p) \overline{\chi(p) p^{it}}}{p} = +\infty. \ \ \ \ \ (1)

(This type of distance is incidentally now a fundamental notion in the Granville-Soundararajan “pretentious” approach to multiplicative number theory.) During our work on this project, we found that Elliott’s conjecture is not quite true as stated due to a technicality: one can cook up a bounded multiplicative function {g} which behaves like {n^{it_j}} on scales {n \sim N_j} for some {N_j} going to infinity and some slowly varying {t_j}, and such a function will be far from any fixed Dirichlet character whilst still having many large correlations (e.g. the pair correlations {\sum_{n \leq N_j} g(n+1) \overline{g(n)}} will be large). In our paper we propose a technical “fix” to Elliott’s conjecture (replacing (1) by a truncated variant), and show that this repaired version of Elliott’s conjecture is true on the average in much the same way that Chowla’s conjecture is. (If one restricts attention to real-valued multiplicative functions, then this technical issue does not show up, basically because one can assume without loss of generality that {t=0} in this case; we discuss this fact in an appendix to the paper.)

Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes”. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

\displaystyle  G(X) := \sup_{p_n, p_{n+1} \leq X} p_{n+1}-p_n

between primes up to {X} exhibited a lower bound of the shape

\displaystyle  G(X) \geq f(X) \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (1)

for some function {f(X)} that went to infinity as {X \rightarrow \infty}; this improved upon previous work of Rankin and other authors, who established the same bound but with {f(X)} replaced by a constant. (Again, see the previous post for a more detailed discussion.)
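For orientation, the quantity {G(X)} is easy to compute for small {X} with a sieve of Eratosthenes. The Python sketch below (with hypothetical names `primes_up_to` and `max_prime_gap`) does this and prints the ratio of {G(X)} to {\log X}. Note that the iterated logarithms in (1) only become meaningful for astronomically large {X}, so numerics of this kind cannot distinguish between the growth rates discussed here.

```python
import math


def primes_up_to(limit):
    """List of primes p <= limit, by the sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Strike out the multiples p*p, p*p + p, ... of p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def max_prime_gap(X):
    """G(X): the largest gap p_{n+1} - p_n with both primes at most X."""
    primes = primes_up_to(X)
    return max((q - p for p, q in zip(primes, primes[1:])), default=0)


if __name__ == "__main__":
    for X in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        G = max_prime_gap(X)
        print(X, G, G / math.log(X))
```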

In our previous papers, we did not specify a particular growth rate for {f(X)}. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results have ineffective constants, rendering our growth function {f(X)} similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking {k}-tuples of fairly large size (about {\log^c x} for some small {c}). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate {f(X)} of shape

\displaystyle  f(X) \asymp \frac{\log \log \log X}{\log\log\log\log X}, \ \ \ \ \ (2)

thus leading to a bound

\displaystyle  G(X) \gg \log X \frac{\log \log X}{\log\log\log X}

on the gaps between primes for large {X}; this is an unpublished calculation of James’.
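To spell out the arithmetic: inserting the growth rate (2) into the lower bound (1), the factor {\log\log\log\log X} cancels, as does one of the two factors of {\log\log\log X} in the denominator, giving

\displaystyle  \frac{\log \log \log X}{\log\log\log\log X} \cdot \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} = \log X \frac{\log \log X}{\log\log\log X}.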

In this paper we make a further refinement of this calculation to obtain a growth rate

\displaystyle  f(X) \asymp \log \log \log X \ \ \ \ \ (3)

leading to a bound of the form

\displaystyle  G(X) \geq c \log X \frac{\log \log X \log\log\log\log X}{\log\log\log X} \ \ \ \ \ (4)

for large {X} and some small constant {c}. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramer’s conjecture that {G(X)} is comparable to {\log^2 X}); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant {c} here can be replaced by an arbitrarily large constant {C}.

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in {[x,y]} by removing one residue class modulo {p} for all primes {p} in (say) {[x/4,x/2]}? Very roughly speaking, if one can solve this problem with {y = g(x) x}, then one can obtain a growth rate on {f(X)} of the shape {f(X) \sim g(\log X)}. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in {[x,y]}, but never mind this detail for now.)
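To get a feel for this sieving problem, one can play with it at toy scales. The Python sketch below uses a naive greedy rule, which is my own stand-in and not the method of the paper: for each prime {p} in {[x/4,x/2]} in turn, it removes the residue class modulo {p} that contains the most surviving primes of {[x,y]}. The names `is_prime` and `greedy_sieve` are hypothetical.

```python
import math


def is_prime(n):
    """Trial division; adequate for the toy ranges used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def greedy_sieve(x, y):
    """Greedily sieve the primes in [x, y] by one residue class per prime in [x/4, x/2].

    Returns the sorted list of primes in [x, y] that survive.
    """
    survivors = {q for q in range(x, y + 1) if is_prime(q)}
    sieving_primes = [p for p in range(math.ceil(x / 4), x // 2 + 1) if is_prime(p)]
    for p in sieving_primes:
        if not survivors:
            break
        # Count how many surviving primes fall in each residue class mod p.
        counts = {}
        for q in survivors:
            counts[q % p] = counts.get(q % p, 0) + 1
        best = max(counts, key=counts.get)
        survivors = {q for q in survivors if q % p != best}
    return sorted(survivors)


if __name__ == "__main__":
    x = 400
    for multiple in (2, 3, 4):
        y = multiple * x
        total = sum(1 for q in range(x, y + 1) if is_prime(q))
        print(y, total, len(greedy_sieve(x, y)))
```

Increasing {y} relative to {x} until survivors appear gives a crude empirical analogue of the function {g(x)} in the discussion above, though only for this particular greedy rule.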

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of {k}-tuples in {[x,y]} which contain at least {\gg \log k} primes, for {k} as large as {\log^c x} or so (so that {\log k} is about {\log\log x}). By considering {k}-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime {p} in {[x/4,x/2]} that capture about {\log\log x} primes. In principle, this means that the union of all these residue classes can cover about {\frac{x}{\log x} \log\log x} primes, allowing one to take {g(x)} as large as {\log\log x}, which corresponds to (3). However, there is a catch: the residue classes for different primes {p} may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in the expected number of times each prime would be sifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the {\log\log\log\log X} loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of {n} vertices using as few “edges” from a given hypergraph {H} as possible. If each edge has {m} vertices, then one certainly needs at least {n/m} edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph {H}. As before, random methods tend to require something like {\frac{n}{m} \log r} edges before one expects to cover, say {1-1/r} of the vertices.
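To see heuristically where the logarithm comes from, suppose (as a simplifying assumption) that one selects {N} edges at random and that each of them covers a given vertex independently with probability {m/n}. Then the probability that this vertex remains uncovered is

\displaystyle  (1 - \frac{m}{n})^N \approx \exp( - \frac{N m}{n} ),

which is equal to {1/r} precisely when {N = \frac{n}{m} \log r}. Thus one expects to cover all but a proportion {1/r} of the vertices only after spending a factor of {\log r} more edges than the trivial lower bound {n/m}.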

However, it turns out to be possible (under reasonable hypotheses on {H}) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from {H} that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph {H}, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.
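The nibbling loop described above is simple to sketch in Python. The following is a toy version only: the selection probability `nibble_prob` and the number of rounds are free parameters chosen by hand, the test hypergraph consists of random {m}-element subsets, and none of the careful choices (or the analysis) of Pippenger and Spencer are reproduced. The names `random_hypergraph` and `nibble_cover` are hypothetical.

```python
import random


def random_hypergraph(n, m, num_edges, rng):
    """A toy m-uniform hypergraph on vertices 0..n-1 with randomly chosen edges."""
    return [frozenset(rng.sample(range(n), m)) for _ in range(num_edges)]


def nibble_cover(edges, nibble_prob, rounds, rng):
    """Semi-random covering: take small random nibbles, discarding spoiled edges.

    Returns (chosen, covered): the edges selected and the set of vertices they cover.
    """
    remaining = [frozenset(e) for e in edges]
    chosen, covered = [], set()
    for _ in range(rounds):
        # Nibble: keep each remaining edge independently with small probability,
        # so that edges inside one nibble rarely overlap each other.
        nibble = [e for e in remaining if rng.random() < nibble_prob]
        chosen.extend(nibble)
        for e in nibble:
            covered |= e
        # Discard every edge that meets a vertex that is already covered.
        remaining = [e for e in remaining if covered.isdisjoint(e)]
        if not remaining:
            break
    return chosen, covered


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 300, 3
    H = random_hypergraph(n, m, 5000, rng)
    chosen, covered = nibble_cover(H, 0.01, 200, rng)
    # Efficiency: vertices covered per vertex-slot used (1.0 means no overlaps).
    print(len(covered) / n, len(covered) / (m * max(len(chosen), 1)))
```

The second printed ratio measures how much is wasted on overlaps; the smaller the nibbles, the closer it should be to {1}, at the cost of needing more rounds.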

In our setup, the vertices are the primes in {[x,y]}, and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime {p}, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly more useful for the covering problem than the small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).

Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than {\log\log x} primes of size {x} (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)