You are currently browsing the category archive for the ‘paper’ category.

I’ve just uploaded two related papers to the arXiv:

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case {k=2} of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let {a_1,a_2} be natural numbers, and let {b_1,b_2} be integers such that {a_1 b_2 - a_2 b_1 \neq 0}. Let {1 \leq \omega(x) \leq x} be a quantity depending on {x} that goes to infinity as {x \rightarrow \infty}. Let {\lambda} denote the Liouville function. Then one has

\displaystyle  \sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n+b_2)}{n} = o( \log \omega(x) ) \ \ \ \ \ (1)

as {x \rightarrow \infty}.

Thus for instance one has

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)

For comparison, the non-averaged Chowla conjecture would imply that

\displaystyle  \sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x) \ \ \ \ \ (3)

which is a strictly stronger estimate than (2), and remains open.

The arguments also extend to other completely multiplicative functions than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdos discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdos discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdos discrepancy problem, in which the sequence {f(1), f(2), \dots} takes values in the unit sphere of an arbitrary Hilbert space, rather than in {\{-1,+1\}}.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity {\lambda(pn) = -\lambda(n)}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n}

is also large and positive for all primes {p} that are not too large; note here how the logarithmic averaging allows us to leave the constraint {n \leq x} unchanged. Summing in {p}, we conclude that

\displaystyle  \sum_{n \leq x} \frac{ \sum_{p \in {\mathcal P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n}

is large and positive for any given set {{\mathcal P}} of medium-sized primes. By a standard averaging argument, this implies that

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+p+j) 1_{p|n+j} \ \ \ \ \ (4)

is large for many choices of {n}, where {H} is a medium-sized parameter at our disposal to choose, and we take {{\mathcal P}} to be some set of primes that are somewhat smaller than {H}. (A similar approach was taken in this recent paper of Matomaki, Radziwill, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view {n} as a random variable, in which case (4) is essentially a bilinear sum of the random sequence {(\lambda(n+1),\dots,\lambda(n+H))} along a random graph {G_{n,H}} on {\{1,\dots,H\}}, in which two vertices {j, j+p} are connected if they differ by a prime {p} in {{\mathcal P}} that divides {n+j}. A key difficulty in controlling this sum is that for randomly chosen {n}, the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter {H} (in some suitable range {[H_-,H_+]}) for which the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}} exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence {(\lambda(n+1),\dots,\lambda(n+H))} has significant mutual information with {G_{n,H}}, then the entropy of the sequence {(\lambda(n+1),\dots,\lambda(n+H'))} for {H' > H} will grow a little slower than linearly, due to the fact that the graph {G_{n,H}} has zero entropy (knowledge of {G_{n,H}} more or less completely determines the shifts {G_{n+kH,H}} of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing {H} as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence {(\lambda(n+1),\dots,\lambda(n+H))} and the graph {G_{n,H}}. Curiously, for the application it is not enough to have a purely quantitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than {\log H} on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series {\sum_{j \geq 2} \frac{1}{j \log j \log\log j}} diverges, which is only barely true).

Once one locates a scale {H} with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j)}{p}. \ \ \ \ \ (5)

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

\displaystyle  \sup_\alpha \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (6)

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwill and myself.

When one uses this method to study more general sums such as

\displaystyle  \sum_{n \leq x} \frac{g_1(n) g_2(n+1)}{n},

one ends up having to consider expressions such as

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} c_p \frac{g_1(n+j) g_2(n+p+j)}{p}.

where {c_p} is the coefficient {c_p := \overline{g_1}(p) \overline{g_2}(p)}. When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

\displaystyle  S(\alpha) := \sum_{p \in {\mathcal P}} \frac{c_p}{p} e^{2\pi i \alpha p}.

In many cases (such as in the application to the Erdös discrepancy problem), the coefficient {c_p} is identically {1}, and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, {S(\alpha)} is large when {\alpha} lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions {g_1,g_2}, the coefficients {c_p} are more or less arbitrary; the large values of {S(\alpha)} are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations {\alpha} (up to the uncertainty mandated by the Heisenberg uncertainty principle) where {S(\alpha)} is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the {L^4} restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

\displaystyle  \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x).

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

\displaystyle  \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j) \lambda(n+p+2j)}{p}.

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

\displaystyle  \frac{1}{X} \int_X^{2X} \sup_\alpha |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (7)

where {X} goes to infinity and {H} is a very slowly growing function of {X}. This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the {k=3} case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher {k} than {k=3}, the same line of analysis requires one to replace the linear phase {e(\alpha n)} by more complicated phases, such as quadratic phases {e(\alpha n^2 + \beta n)} or even {k-2}-step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all {k}, which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale {H} that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

Kaisa Matomäki, Maksym Radziwiłł, and I have just uploaded to the arXiv our paper “Sign patterns of the Liouville and Möbius functions“. This paper is somewhat similar to our previous paper in that it is using the recent breakthrough of Matomäki and Radziwiłł on mean values of multiplicative functions to obtain partial results towards the Chowla conjecture. This conjecture can be phrased, roughly speaking, as follows: if {k} is a fixed natural number and {n} is selected at random from a large interval {[1,x]}, then the sign pattern {(\lambda(n), \lambda(n+1),\dots,\lambda(n+k-1)) \in \{-1,+1\}^k} becomes asymptotically equidistributed in {\{-1,+1\}^k} in the limit {x \rightarrow \infty}. This remains open for {k \geq 2}. In fact even the significantly weaker statement that each of the sign patterns in {\{-1,+1\}^k} is attained infinitely often is open for {k \geq 4}. However, in 1986, Hildebrand showed that for {k \leq 3} all sign patterns are indeed attained infinitely often. Our first result is a strengthening of Hildebrand’s, moving a little bit closer to Chowla’s conjecture:

Theorem 1 Let {k \leq 3}. Then each of the sign patterns in {\{-1,+1\}^k} is attained by the Liouville function for a set of natural numbers {n} of positive lower density.

Thus for instance one has {\lambda(n)=\lambda(n+1)=\lambda(n+2)} for a set of {n} of positive lower density. The {k \leq 2} case of this theorem already appears in the original paper of Matomäki and Radziwiłł (and the significantly simpler case of the sign patterns {++} and {--} was treated previously by Harman, Pintz, and Wolke).

The basic strategy in all of these arguments is to assume for sake of contradiction that a certain sign pattern occurs extremely rarely, and then exploit the complete multiplicativity of {\lambda} (which implies in particular that {\lambda(2n) = -\lambda(n)}, {\lambda(3n) = -\lambda(n)}, and {\lambda(5n) = -\lambda(n)} for all {n}) together with some combinatorial arguments (vaguely analogous to solving a Sudoku puzzle!) to establish more complex sign patterns for the Liouville function, that are either inconsistent with each other, or with results such as the Matomäki-Radziwiłł result. To illustrate this, let us give some {k=2} examples, arguing a little informally to emphasise the combinatorial aspects of the argument. First suppose that the sign pattern {(\lambda(n),\lambda(n+1)) = (+1,+1)} almost never occurs. The prime number theorem tells us that {\lambda(n)} and {\lambda(n+1)} are each equal to {+1} about half of the time, which by inclusion-exclusion implies that the sign pattern {(\lambda(n),\lambda(n+1))=(-1,-1)} almost never occurs. In other words, we have {\lambda(n+1) = -\lambda(n)} for almost all {n}. But from the multiplicativity property {\lambda(2n)=-\lambda(n)} this implies that one should have

\displaystyle \lambda(2n+2) = -\lambda(2n)

\displaystyle \lambda(2n+1) = -\lambda(2n)


\displaystyle \lambda(2n+2) = -\lambda(2n+1)

for almost all {n}. But the above three statements are contradictory, and the claim follows.

Similarly, if we assume that the sign pattern {(\lambda(n),\lambda(n+1)) = (+1,-1)} almost never occurs, then a similar argument to the above shows that for any fixed {h}, one has {\lambda(n)=\lambda(n+1)=\dots=\lambda(n+h)} for almost all {n}. But this means that the mean {\frac{1}{h} \sum_{j=1}^h \lambda(n+j)} is abnormally large for most {n}, which (for {h} large enough) contradicts the results of Matomäki and Radziwiłł. Here we see that the “enemy” to defeat is the scenario in which {\lambda} only changes sign very rarely, in which case one rarely sees the pattern {(+1,-1)}.

It turns out that similar (but more combinatorially intricate) arguments work for sign patterns of length three (but are unlikely to work for most sign patterns of length four or greater). We give here one fragment of such an argument (due to Hildebrand) which hopefully conveys the Sudoku-type flavour of the combinatorics. Suppose for instance that the sign pattern {(\lambda(n),\lambda(n+1),\lambda(n+2)) = (+1,+1,+1)} almost never occurs. Now suppose {n} is a typical number with {\lambda(15n-1)=\lambda(15n+1)=+1}. Since we almost never have the sign pattern {(+1,+1,+1)}, we must (almost always) then have {\lambda(15n) = -1}. By multiplicativity this implies that

\displaystyle (\lambda(60n-4), \lambda(60n), \lambda(60n+4)) = (+1,-1,+1).

We claim that this (almost always) forces {\lambda(60n+5)=-1}. For if {\lambda(60n+5)=+1}, then by the lack of the sign pattern {(+1,+1,+1)}, this (almost always) forces {\lambda(60n+3)=\lambda(60n+6)=-1}, which by multiplicativity forces {\lambda(20n+1)=\lambda(20n+2)=+1}, which by lack of {(+1,+1,+1)} (almost always) forces {\lambda(20n)=-1}, which by multiplicativity contradicts {\lambda(60n)=-1}. Thus we have {\lambda(60n+5)=-1}; a similar argument gives {\lambda(60n-5)=-1} almost always, which by multiplicativity gives {\lambda(12n-1)=\lambda(12n)=\lambda(12n+1)=+1}, a contradiction. Thus we almost never have {\lambda(15n-1)=\lambda(15n+1)=+1}, which by the inclusion-exclusion argument mentioned previously shows that {\lambda(15n+1) = - \lambda(15n-1)} for almost all {n}.

One can continue these Sudoku-type arguments and conclude eventually that {\lambda(3n-1)=-\lambda(3n+1)=\lambda(3n+2)} for almost all {n}. To put it another way, if {\chi_3} denotes the non-principal Dirichlet character of modulus {3}, then {\lambda \chi_3} is almost always constant away from the multiples of {3}. (Conversely, if {\lambda \chi_3} changed sign very rarely outside of the multiples of three, then the sign pattern {(+1,+1,+1)} would never occur.) Fortunately, the main result of Matomäki and Radziwiłł shows that this scenario cannot occur, which establishes that the sign pattern {(+1,+1,+1)} must occur rather frequently. The other sign patterns are handled by variants of these arguments.

Excluding a sign pattern of length three leads to useful implications like “if {\lambda(n-1)=\lambda(n)=+1}, then {\lambda(n+1)=-1}” which turn out are just barely strong enough to quite rigidly constrain the Liouville function using Sudoku-like arguments. In contrast, excluding a sign pattern of length four only gives rise to implications like “`if {\lambda(n-2)=\lambda(n-1)=\lambda(n)=+1}, then {\lambda(n+1)=-1}“, and these seem to be much weaker for this purpose (the hypothesis in these implications just isn’t satisfied nearly often enough). So a different idea seems to be needed if one wishes to extend the above theorem to larger values of {k}.

Our second theorem gives an analogous result for the Möbius function {\mu} (which takes values in {\{-1,0,+1\}} rather than {\{-1,1\}}), but the analysis turns out to be remarkably difficult and we are only able to get up to {k=2}:

Theorem 2 Let {k \leq 2}. Then each of the sign patterns in {\{-1,0,+1\}^k} is attained by the Möbius function for a set {n} of positive lower density.

It turns out that the prime number theorem and elementary sieve theory can be used to handle the {k=1} case and all the {k=2} cases that involve at least one {0}, leaving only the four sign patterns {(\pm 1, \pm 1)} to handle. It is here that the zeroes of the Möbius function cause a significant new obstacle. Suppose for instance that the sign pattern {(+1, -1)} almost never occurs for the Möbius function. The same arguments that were used in the Liouville case then show that {\mu(n)} will be almost always equal to {\mu(n+1)}, provided that {n,n+1} are both square-free. One can try to chain this together as before to create a long string {\mu(n)=\dots=\mu(n+h) \in \{-1,+1\}} where the Möbius function is constant, but this cannot work for any {h} larger than three, because the Möbius function vanishes at every multiple of four.

The constraints we assume on the Möbius function can be depicted using a graph on the squarefree natural numbers, in which any two adjacent squarefree natural numbers are connected by an edge. The main difficulty is then that this graph is highly disconnected due to the multiples of four not being squarefree.

To get around this, we need to enlarge the graph. Note from multiplicativity that if {\mu(n)} is almost always equal to {\mu(n+1)} when {n,n+1} are squarefree, then {\mu(n)} is almost always equal to {\mu(n+p)} when {n,n+p} are squarefree and {n} is divisible by {p}. We can then form a graph on the squarefree natural numbers by connecting {n} to {n+p} whenever {n,n+p} are squarefree and {n} is divisible by {p}. If this graph is “locally connected” in some sense, then {\mu} will be constant on almost all of the squarefree numbers in a large interval, which turns out to be incompatible with the results of Matomäki and Radziwiłł. Because of this, matters are reduced to establishing the connectedness of a certain graph. More precisely, it turns out to be sufficient to establish the following claim:

Theorem 3 For each prime {p}, let {a_p \hbox{ mod } p^2} be a residue class chosen uniformly at random. Let {G} be the random graph whose vertices {V} consist of those integers {n} not equal to {a_p \hbox{ mod } p^2} for any {p}, and whose edges consist of pairs {n,n+p} in {V} with {n = a_p \hbox{ mod } p}. Then with probability {1}, the graph {G} is connected.

We were able to show the connectedness of this graph, though it turned out to be remarkably tricky to do so. Roughly speaking (and suppressing a number of technicalities), the main steps in the argument were as follows.

  • (Early stage) Pick a large number {X} (in our paper we take {X} to be odd, but I’ll ignore this technicality here). Using a moment method to explore neighbourhoods of a single point in {V}, one can show that a vertex {v} in {V} is almost always connected to at least {\log^{10} X} numbers in {[v,v+X^{1/100}]}, using relatively short paths of short diameter. (This is the most computationally intensive portion of the argument.)
  • (Middle stage) Let {X'} be a typical number in {[X/40,X/20]}, and let {R} be a scale somewhere between {X^{1/40}} and {X'}. By using paths {n, n+p_1, n+p_1-p_2, n+p_1-p_2+p_3} involving three primes, and using a variant of Vinogradov’s theorem and some routine second moment computations, one can show that with quite high probability, any “good” vertex in {[v+X'-R, v+X'-0.99R]} is connected to a “good” vertex in {[v+X'-0.01R, v+X-0.0099 R]} by paths of length three, where the definition of “good” is somewhat technical but encompasses almost all of the vertices in {V}.
  • (Late stage) Combining the two previous results together, we can show that most vertices {v} will be connected to a vertex in {[v+X'-X^{1/40}, v+X']} for any {X'} in {[X/40,X/20]}. In particular, {v} will be connected to a set of {\gg X^{9/10}} vertices in {[v,v+X/20]}. By tracking everything carefully, one can control the length and diameter of the paths used to connect {v} to this set, and one can also control the parity of the elements in this set.
  • (Final stage) Now if we have two vertices {v, w} at a distance {X} apart. By the previous item, one can connect {v} to a large set {A} of vertices in {[v,v+X/20]}, and one can similarly connect {w} to a large set {B} of vertices in {[w,w+X/20]}. Now, by using a Vinogradov-type theorem and second moment calculations again (and ensuring that the elements of {A} and {B} have opposite parity), one can connect many of the vertices in {A} to many of the vertices {B} by paths of length three, which then connects {v} to {w}, and gives the claim.

It seems of interest to understand random graphs like {G} further. In particular, the graph {G'} on the integers formed by connecting {n} to {n+p} for all {n} in a randomly selected residue class mod {p} for each prime {p} is particularly interesting (it is to the Liouville function as {G} is to the Möbius function); if one could show some “local expander” properties of this graph {G'}, then one would have a chance of modifying the above methods to attack the first unsolved case of the Chowla conjecture, namely that {\lambda(n)\lambda(n+1)} has asymptotic density zero (perhaps working with logarithmic density instead of natural density to avoids some technicalities).

I’ve just uploaded to the arXiv my paper “Inverse theorems for sets and measures of polynomial growth“. This paper was motivated by two related questions. The first question was to obtain a qualitatively precise description of the sets of polynomial growth that arise in Gromov’s theorem, in much the same way that Freiman’s theorem (and its generalisations) provide a qualitatively precise description of sets of small doubling. The other question was to obtain a non-abelian analogue of inverse Littlewood-Offord theory.

Let me discuss the former question first. Gromov’s theorem tells us that if a finite subset {A} of a group {G} exhibits polynomial growth in the sense that {|A^n|} grows polynomially in {n}, then the group generated by {A} is virtually nilpotent (the converse direction also true, and is relatively easy to establish). This theorem has been strengthened a number of times over the years. For instance, a few years ago, I proved with Shalom that the condition that {|A^n|} grew polynomially in {n} could be replaced by {|A^n| \leq C n^d} for a single {n}, as long as {n} was sufficiently large depending on {C,d} (in fact we gave a fairly explicit quantitative bound on how large {n} needed to be). A little more recently, with Breuillard and Green, the condition {|A^n| \leq C n^d} was weakened to {|A^n| \leq n^d |A|}, that is to say it sufficed to have polynomial relative growth at a finite scale. In fact, the latter paper gave more information on {A} in this case, roughly speaking it showed (at least in the case when {A} was a symmetric neighbourhood of the identity) that {A^n} was “commensurate” with a very structured object known as a coset nilprogression. This can then be used to establish further control on {A}. For instance, it was recently shown by Breuillard and Tointon (again in the symmetric case) that if {|A^n| \leq n^d |A|} for a single {n} that was sufficiently large depending on {d}, then all the {A^{n'}} for {n' \geq n} have a doubling constant bounded by a bound {C_d} depending only on {d}, thus {|A^{2n'}| \leq C_d |A^{n'}|} for all {n' \geq n}.

In this paper we are able to refine this analysis a bit further; under the same hypotheses, we can show an estimate of the form

\displaystyle  \log |A^{n'}| = \log |A^n| + f( \log n' - \log n ) + O_d(1)

for all {n' \geq n} and some piecewise linear, continuous, non-decreasing function {f: [0,+\infty) \rightarrow [0,+\infty)} with {f(0)=0}, where the error {O_d(1)} is bounded by a constant depending only on {d}, and where {f} has at most {O_d(1)} pieces, each of which has a slope that is a natural number of size {O_d(1)}. To put it another way, the function {n' \mapsto |A^{n'}|} for {n' \geq n} behaves (up to multiplicative constants) like a piecewise polynomial function, where the degree of the function and number of pieces is bounded by a constant depending on {d}.

One could ask whether the function {f} has any convexity or concavity properties. It turns out that it can exhibit either convex or concave behaviour (or a combination of both). For instance, if {A} is contained in a large finite group, then {n \mapsto |A^n|} will eventually plateau to a constant, exhibiting concave behaviour. On the other hand, in nilpotent groups one can see convex behaviour; for instance, in the Heisenberg group {\begin{pmatrix}{} {1} {\mathbf Z} {\mathbf Z} \\ {0} {1} {\mathbf Z} \\ {0} {1} \end{pmatrix}}, if one sets {A} to be a set of matrices of the form {\begin{pmatrix} 1 & O(N) & O(N^3) \\ 0 & 1 & O(N) \\ 0 & 0 & 1 \end{pmatrix}} for some large {N} (abusing the {O()} notation somewhat), then {n \mapsto A^n} grows cubically for {n \leq N} but then grows quartically for {n > N}.

To prove this proposition, it turns out (after using a somewhat difficult inverse theorem proven previously by Breuillard, Green, and myself) that one has to analyse the volume growth {n \mapsto |P^n|} of nilprogressions {P}. In the “infinitely proper” case where there are no unexpected relations between the generators of the nilprogression, one can lift everything to a simply connected Lie group (where one can take logarithms and exploit the Baker-Campbell-Hausdorff formula heavily), eventually describing {P^n} with fair accuracy by a certain convex polytope with vertices depending polynomially on {n}, which implies that {|P^n|} depends polynomially on {n} up to constants. If one is not in the “infinitely proper” case, then at some point {n_0} the nilprogression {P^{n_0}} develops a “collision”, but then one can use this collision to show (after some work) that the dimension of the “Lie model” of {P^{n_0}} has dropped by at least one from the dimension of {P} (the notion of a Lie model being developed in the previously mentioned paper of Breuillard, Greenm, and myself), so that this sort of collision can only occur a bounded number of times, with essentially polynomial volume growth behaviour between these collisions.

The arguments also give a precise description of the location of a set {A} for which {A^n} grows polynomially in {n}. In the symmetric case, what ends up happening is that {A^n} becomes commensurate to a “coset nilprogression” {HP} of bounded rank and nilpotency class, whilst {A} is “virtually” contained in a scaled down version {HP^{1/n}} of that nilprogression. What “virtually” means is a little complicated; roughly speaking, it means that there is a set {X} of bounded cardinality such that {aXHP^{1/n} \approx XHP^{1/n}} for all {a \in A}. Conversely, if {A} is virtually contained in {HP^{1/n}}, then {A^n} is commensurate to {HP} (and more generally, {A^{mn}} is commensurate to {HP^m} for any natural number {m}), giving quite a (qualitatively) precise description of {A} in terms of coset nilprogressions.

The main tool used to prove these results is the structure theorem for approximate groups established by Breuillard, Green, and myself, which roughly speaking asserts that approximate groups are always commensurate with coset nilprogressions. A key additional trick is a pigeonholing argument of Sanders, which in this context is the assertion that if {A^n} is comparable to {A^{2n}}, then there is an {n'} between {n} and {2n} such that {A \cdot A^{n'}} is very close in size to {A^{n'}} (up to a relative error of {1/n}). It is this fact, together with the comparability of {A^{n'}} to a coset nilprogression {HP}, that allows us (after some combinatorial argument) to virtually place {A} inside {HP^{1/n}}.

Similar arguments apply when discussing iterated convolutions {\mu^{*n}} of (symmetric) probability measures on a (discrete) group {G}, rather than combinatorial powers {A^n} of a finite set. Here, the analogue of volume {A^n} is given by the negative power {\| \mu^{*n} \|_{\ell^2}^{-2}} of the {\ell^2} norm of {\mu^{*n}} (thought of as a non-negative function on {G} of total mass 1). One can also work with other norms here than {\ell^2}, but this norm has some minor technical conveniences (and other measures of the “spread” of {\mu^{*n}} end up being more or less equivalent for our purposes). There is an analogous structure theorem that asserts that if {\mu^{*n}} spreads at most polynomially in {n}, then {\mu^{*n}} is “commensurate” with the uniform probability distribution on a coset progression {HP}, and {\mu} itself is largely concentrated near {HP^{1/\sqrt{n}}}. The factor of {\sqrt{n}} here is the familiar scaling factor in random walks that arises for instance in the central limit theorem. The proof of (the precise version of) this statement proceeds similarly to the combinatorial case, using pigeonholing to locate a scale {n'} where {\mu *\mu^{n'}} has almost the same {\ell^2} norm as {\mu^{n'}}.

A special case of this theory occurs when {\mu} is the uniform probability measure on {n} elements {v_1,\dots,v_n} of {G} and their inverses. The probability measure {\mu^{*n}} is then the distribution of a random product {w_1 \dots w_n}, where each {w_i} is equal to one of {v_{j_i}} or its inverse {v_{j_i}^{-1}}, selected at random with {j_i} drawn uniformly from {\{1,\dots,n\}} with replacement. This is very close to the Littlewood-Offord situation of random products {u_1 \dots u_n} where each {u_i} is equal to {v_i} or {v_i^{-1}} selected independently at random (thus {j_i} is now fixed to equal {i} rather than being randomly drawn from {\{1,\dots,n\}}. In the case when {G} is abelian, it turns out that a little bit of Fourier analysis shows that these two random walks have “comparable” distributions in a certain {\ell^2} sense. As a consequence, the results in this paper can be used to recover an essentially optimal abelian inverse Littlewood-Offord theorem of Nguyen and Vu. In the nonabelian case, the only Littlewood-Offord theorem I am aware of is a recent result of Tiep and Vu for matrix groups, but in this case I do not know how to relate the above two random walks to each other, and so we can only obtain an analogue of the Tiep-Vu results for the symmetrised random walk {w_1 \dots w_n} instead of the ordered random walk {u_1 \dots u_n}.

I’ve just uploaded to the arXiv my paper “Cancellation for the multilinear Hilbert transform“, submitted to Collectanea Mathematica. This paper uses methods from additive combinatorics (and more specifically, the arithmetic regularity and counting lemmas from this paper of Ben Green and myself) to obtain a slight amount of progress towards the open problem of obtaining {L^p} bounds for the trilinear and higher Hilbert transforms (as discussed in this previous blog post). For instance, the trilinear Hilbert transform

\displaystyle  H_3( f_1, f_2, f_3 )(x) := p.v. \int_{\bf R} f_1(x+t) f_2(x+2t) f_3(x+3t)\ \frac{dt}{t}

is not known to be bounded for any {L^{p_1}({\bf R}) \times L^{p_2}({\bf R}) \times L^{p_3}({\bf R})} to {L^p({\bf R})}, although it is conjectured to do so when {1/p =1/p_1 +1/p_2+1/p_3} and {1 < p_1,p_2,p_3,p < \infty}. (For {p} well below {1}, one can use additive combinatorics constructions to demonstrate unboundedness; see this paper of Demeter.) One can approach this problem by considering the truncated trilinear Hilbert transforms

\displaystyle  H_{3,r,R}( f_1, f_2, f_3 )(x) := \int_{r \leq |t| \leq R} f_1(x+t) f_2(x+2t) f_3(x+3t)\ \frac{dt}{t}

for {0 < r < R}. It is not difficult to show that the boundedness of {H_3} is equivalent to the boundedness of {H_{3,r,R}} with bounds that are uniform in {R} and {r}. On the other hand, from Minkowski’s inequality and Hölder’s inequality one can easily obtain the non-uniform bound of {2 \log \frac{R}{r}} for {H_{3,r,R}}. The main result of this paper is a slight improvement of this trivial bound to {o( \log \frac{R}{r})} as {R/r \rightarrow \infty}. Roughly speaking, the way this gain is established is as follows. First there are some standard time-frequency type reductions to reduce to the task of obtaining some non-trivial cancellation on a single “tree”. Using a “generalised von Neumann theorem”, we show that such cancellation will happen if (a discretised version of) one or more of the functions {f_1,f_2,f_3} (or a dual function {f_0} that it is convenient to test against) is small in the Gowers {U^3} norm. However, the arithmetic regularity lemma alluded to earlier allows one to represent an arbitrary function {f_i}, up to a small error, as the sum of such a “Gowers uniform” function, plus a structured function (or more precisely, an irrational virtual nilsequence). This effectively reduces the problem to that of establishing some cancellation in a single tree in the case when all functions {f_0,f_1,f_2,f_3} involved are irrational virtual nilsequences. At this point, the contribution of each component of the tree can be estimated using the “counting lemma” from my paper with Ben. The main term in the asymptotics is a certain integral over a nilmanifold, but because the kernel {\frac{dt}{t}} in the trilinear Hilbert transform is odd, it turns out that this integral vanishes, giving the required cancellation.

The same argument works for higher order Hilbert transforms (and one can also replace the coefficients in these transforms with other rational constants). However, because the quantitative bounds in the arithmetic regularity and counting lemmas are so poor, it does not seem likely that one can use these methods to remove the logarithmic growth in {R/r} entirely, and some additional ideas will be needed to resolve the full conjecture.

I’ve just uploaded to the arXiv my paper “Failure of the {L^1} pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map {T: X \rightarrow X} on a probability space {X = (X,\mu)}, then for any {f \in L^1(X)}, the averages {\frac{1}{N} \sum_{n=1}^N f \circ T^{-n}} converge pointwise almost everywhere. (In the important case when the shift map {T} is ergodic, the pointwise limit is simply the mean {\int_X f\ d\mu} of the original function {f}.)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group {F_2} on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for {F_2}-actions {(T_g)_{g \in F_2}} on probability spaces {(X,\mu)}. For instance, for the spherical averaging operators

\displaystyle  {\mathcal A}_n f := \frac{1}{4 \times 3^{n-1}} \sum_{g \in F_2: |g| = n} f \circ T_g^{-1}

(where {|g|} denotes the length of the reduced word that forms {g}), they showed that {{\mathcal A}_{2n} f} converged pointwise almost everywhere provided that {f} was in {L^p(X)} for some {p>1}. (The need to restrict to spheres of even radius can be seen by considering the action of {F_2} on the two-element set {\{0,1\}} in which both generators of {F_2} act by interchanging the elements, in which case {{\mathcal A}_n} is determined by the parity of {n}.) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition {f \in L^p(X)} to the weaker condition {f \in L \log L(X)}.

The question remained open as to whether the pointwise ergodic theorem for {F_2}-actions held if one only assumed that {f} was in {L^1(X)}. Nevo and Stein were able to establish this for the Cesáro averages {\frac{1}{N} \sum_{n=1}^N {\mathcal A}_n}, but not for {{\mathcal A}_n} itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on {\ell^1(F_2)}, but due to the non-amenability of {F_2}, this inequality did not transfer to {L^1(X)} and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the {L^1} pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our {\ell^1(F_2)} maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an {L^1} ergodic theorem for iterates {P^n} of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the {F_2} setting, thus settling the problem in the negative:

Theorem 1 (Failure of {L^1} pointwise ergodic theorem) There exists a measure-preserving {F_2}-action on a probability space {X} and a non-negative function {f \in L^1(X)} such that {\sup_n {\mathcal A}_{2n} f(x) = +\infty} for almost every {x}.

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} such that {\sup_n P^n f(x) = +\infty} for almost every {x}. By some standard manipulations, it suffices to show that for any given {\alpha > 0} and {\varepsilon>0}, there exists a self-adjoint Markov operator {P} on a probability space {X} and a non-negative {f \in L^1(X)} with {\|f\|_{L^1(X)} \leq \alpha}, such that {\sup_n P^n f \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Actually, it will be convenient to replace the Markov chain {(P^n f)_{n \geq 0}} with an ancient Markov chain {(f_n)_{n \in {\bf Z}}} – that is to say, a sequence of non-negative functions {f_n} for both positive and negative {f}, such that {f_{n+1} = P f_n} for all {n \in {\bf Z}}. The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the {F_2} version of the argument can be run using infinitely ancient chains.)

For any {\alpha>0}, let {P(\alpha)} denote the claim that for any {\varepsilon>0}, there exists an ancient Markov chain {(f_n)_{n \in {\bf Z}}} with {\|f_n\|_{L^1(X)} = \alpha} such that {\sup_{n \in {\bf Z}} f_n \geq 1-\varepsilon} on a set of measure at least {1-\varepsilon}. Clearly {P(1)} holds since we can just take {f_n=1} for all {n}. Our objective is to show that {P(\alpha)} holds for arbitrarily small {\alpha}. The heart of Ornstein’s argument is then the implication

\displaystyle  P(\alpha) \implies P( \alpha (1 - \frac{\alpha}{4}) ) \ \ \ \ \ (1)

for any {0 < \alpha \leq 1}, which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain {(f_n)_{n \in {\bf Z}}} on some probability space {X} of total mass {\|f_n\|_{L^1(X)} = \alpha}, such that {\sup_n f_n} attains the value of {1} or greater almost everywhere. Assuming that the Markov process is irreducible, the {f_n} will eventually converge as {n \rightarrow \infty} to the constant value of {\|f_n\|_{L^1(X)}}, in particular its final state will essentially stay above {\alpha} (up to small errors).

Now suppose we duplicate the Markov process by replacing {X} with a double copy {X \times \{1,2\}} (giving {\{1,2\}} the uniform probability measure), and using the disjoint sum of the Markov operators on {X \times \{1\}} and {X \times \{2\}} as the propagator, so that there is no interaction between the two components of this new system. Then the functions {f'_n(x,i) := f_n(x) 1_{i=1}} form an ancient Markov chain of mass at most {\alpha/2} that lives solely in the first half {X \times \{1\}} of this copy, and {\sup_n f'_n} attains the value of {1} or greater on almost all of the first half {X \times \{1\}}, but is zero on the second half. The final state of {f'_n} will be to stay above {\alpha} in the first half {X \times \{1\}}, but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves {X \times \{1\}}, {X \times \{2\}} of the system (I mentally think of {X \times \{1\}} and {X \times \{2\}} as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain {(f'_n)_{n \in {\bf Z}}} in the previous example gets replaced by a slightly different ancient Markov chain {(f''_n)_{n \in {\bf Z}}} which is more or less identical with {f'_n} for negative times {n}, or for bounded positive times {n}, but for very large values of {n} the final state is now constant across the entire state space {X \times \{1,2\}}, and will stay above {\alpha/2} on this space.

Finally, we consider an ancient Markov chain {F_n} which is basically of the form

\displaystyle  F_n(x,i) \approx f''_n(x,i) + (1 - \frac{\alpha}{2}) f_{n-M}(x) 1_{i=2}

for some large parameter {M} and for all {n \leq M} (the approximation becomes increasingly inaccurate for {n} much larger than {M}, but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces {X \times \{1\}, X \times \{2\}}, but with the second copy delayed by a large time delay {M}, and also attenuated in amplitude by a factor of {1-\frac{\alpha}{2}}. The total mass of this process is now {\frac{\alpha}{2} + \frac{\alpha}{2} (1 -\frac{\alpha}{2}) = \alpha (1 - \alpha/4)}. Because of the {f''_n} component of {F_n}, we see that {\sup_n F_n} basically attains the value of {1} or greater on the first half {X \times \{1\}}. On the second half {X \times \{2\}}, we work with times {n} close to {M}. If {M} is large enough, {f''_n} would have averaged out to about {\alpha/2} at such times, but the {(1 - \frac{\alpha}{2}) f_{n-M}(x)} component can get as large as {1-\alpha/2} here. Summing (and continuing to ignore various epsilon losses), we see that {\sup_n F_n} can get as large as {1} on almost all of the second half of {X \times \{2\}}. This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages {{\mathcal A}_n} for a free group action can be lifted up to become powers {P^n} of a Markov operator, basically by randomly assigning a “velocity vector” {s \in \{a,b,a^{-1},b^{-1}\}} to one’s base point {x} and then applying the Markov process that moves {x} along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from {s} to {s^{-1}}). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of {F_2} systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with {F_2}-measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case {P(1)}) comes from basically considering the action of {F_2} on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time {n=-\infty}, and only reaches the boundary of this ball at the time {n=0}.

Hoi Nguyen, Van Vu, and myself have just uploaded to the arXiv our paper “Random matrices: tail bounds for gaps between eigenvalues“. This is a followup paper to my recent paper with Van in which we showed that random matrices {M_n} of Wigner type (such as the adjacency matrix of an Erdös-Renyi graph) asymptotically almost surely had simple spectrum. In the current paper, we push the method further to show that the eigenvalues are not only distinct, but are (with high probability) separated from each other by some negative power {n^{-A}} of {n}. This follows the now standard technique of replacing any appearance of discrete Littlewood-Offord theory (a key ingredient in our previous paper) with its continuous analogue (inverse theorems for small ball probability). For general Wigner-type matrices {M_n} (in which the matrix entries are not normalised to have mean zero), we can use the inverse Littlewood-Offord theorem of Nguyen and Vu to obtain (under mild conditions on {M_n}) a result of the form

\displaystyle  {\bf P} (\lambda_{i+1}(M_n) - \lambda_i(M_n) \leq n^{-A} ) \leq n^{-B}

for any {B} and {i}, if {A} is sufficiently large depending on {B} (in a linear fashion), and {n} is sufficiently large depending on {B}. The point here is that {B} can be made arbitrarily large, and also that no continuity or smoothness hypothesis is made on the distribution of the entries. (In the continuous case, one can use the machinery of Wegner estimates to obtain results of this type, as was done in a paper of Erdös, Schlein, and Yau.)

In the mean zero case, it becomes more efficient to use an inverse Littlewood-Offord theorem of Rudelson and Vershynin to obtain (with the normalisation that the entries of {M_n} have unit variance, so that the eigenvalues of {M_n} are {O(\sqrt{n})} with high probability), giving the bound

\displaystyle  {\bf P} (\lambda_{i+1}(M_n) - \lambda_i(M_n) \leq \delta / \sqrt{n} ) \ll \delta \ \ \ \ \ (1)

for {\delta \geq n^{-O(1)}} (one also has good results of this type for smaller values of {\delta}). This is only optimal in the regime {\delta \sim 1}; we expect to establish some eigenvalue repulsion, improving the RHS to {\delta^2} for real matrices and {\delta^3} for complex matrices, but this appears to be a more difficult task (possibly requiring some quadratic inverse Littlewood-Offord theory, rather than just linear inverse Littlewood-Offord theory). However, we can get some repulsion if one works with larger gaps, getting a result roughly of the form

\displaystyle  {\bf P} (\lambda_{i+k}(M_n) - \lambda_i(M_n) \leq \delta / \sqrt{n} ) \ll \delta^{ck^2}

for any fixed {k \geq 1} and some absolute constant {c>0} (which we can asymptotically make to be {1/3} for large {k}, though it ought to be as large as {1}), by using a higher-dimensional version of the Rudelson-Vershynin inverse Littlewood-Offord theorem.

In the case of Erdös-Renyi graphs, we don’t have mean zero and the Rudelson-Vershynin Littlewood-Offord theorem isn’t quite applicable, but by working carefully through the approach based on the Nguyen-Vu theorem we can almost recover (1), except for a loss of {n^{o(1)}} on the RHS.

As a sample applications of the eigenvalue separation results, we can now obtain some information about eigenvectors; for instance, we can show that the components of the eigenvectors all have magnitude at least {n^{-A}} for some {A} with high probability. (Eigenvectors become much more stable, and able to be studied in isolation, once their associated eigenvalue is well separated from the other eigenvalues; see this previous blog post for more discussion.)

Kaisa Matomaki, Maksym Radziwill, and I have just uploaded to the arXiv our paper “An averaged form of Chowla’s conjecture“. This paper concerns a weaker variant of the famous conjecture of Chowla (discussed for instance in this previous post) that

\displaystyle  \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(X)

as {X \rightarrow \infty} for any distinct natural numbers {h_1,\dots,h_k}, where {\lambda} denotes the Liouville function. (One could also replace the Liouville function here by the Möbius function {\mu} and obtain a morally equivalent conjecture.) This conjecture remains open for any {k \geq 2}; for instance the assertion

\displaystyle  \sum_{n \leq X} \lambda(n) \lambda(n+2) = o(X)

is a variant of the twin prime conjecture (though possibly a tiny bit easier to prove), and is subject to the notorious parity barrier (as discussed in this previous post).

Our main result asserts, roughly speaking, that Chowla’s conjecture can be established unconditionally provided one has non-trivial averaging in the {h_1,\dots,h_k} parameters. More precisely, one has

Theorem 1 (Chowla on the average) Suppose {H = H(X) \leq X} is a quantity that goes to infinity as {X \rightarrow \infty} (but it can go to infinity arbitrarily slowly). Then for any fixed {k \geq 1}, we have

\displaystyle  \sum_{h_1,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k)| = o( H^k X ).

In fact, we can remove one of the averaging parameters and obtain

\displaystyle  \sum_{h_2,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n) \lambda(n+h_2) \dots \lambda(n+h_k)| = o( H^{k-1} X ).

Actually we can make the decay rate a bit more quantitative, gaining about {\frac{\log\log H}{\log H}} over the trivial bound. The key case is {k=2}; while the unaveraged Chowla conjecture becomes more difficult as {k} increases, the averaged Chowla conjecture does not increase in difficulty due to the increasing amount of averaging for larger {k}, and we end up deducing the higher {k} case of the conjecture from the {k=2} case by an elementary argument.

The proof of the theorem proceeds as follows. By exploiting the Fourier-analytic identity

\displaystyle  \int_{{\mathbf T}} (\int_{\mathbf R} |\sum_{x \leq n \leq x+H} f(n) e(\alpha n)|^2 dx)^2\ d\alpha

\displaystyle = \sum_{|h| \leq H} (H-|h|)^2 |\sum_n f(n) \overline{f}(n+h)|^2

(related to a standard Fourier-analytic identity for the Gowers {U^2} norm) it turns out that the {k=2} case of the above theorem can basically be derived from an estimate of the form

\displaystyle  \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o( H X )

uniformly for all {\alpha \in {\mathbf T}}. For “major arc” {\alpha}, close to a rational {a/q} for small {q}, we can establish this bound from a generalisation of a recent result of Matomaki and Radziwill (discussed in this previous post) on averages of multiplicative functions in short intervals. For “minor arc” {\alpha}, we can proceed instead from an argument of Katai and Bourgain-Sarnak-Ziegler (discussed in this previous post).

The argument also extends to other bounded multiplicative functions than the Liouville function. Chowla’s conjecture was generalised by Elliott, who roughly speaking conjectured that the {k} copies of {\lambda} in Chowla’s conjecture could be replaced by arbitrary bounded multiplicative functions {g_1,\dots,g_k} as long as these functions were far from a twisted Dirichlet character {n \mapsto \chi(n) n^{it}} in the sense that

\displaystyle  \sum_p \frac{1 - \hbox{Re} g(p) \overline{\chi(p) p^{it}}}{p} = +\infty. \ \ \ \ \ (1)

(This type of distance is incidentally now a fundamental notion in the Granville-Soundararajan “pretentious” approach to multiplicative number theory.) During our work on this project, we found that Elliott’s conjecture is not quite true as stated due to a technicality: one can cook up a bounded multiplicative function {g} which behaves like {n^{it_j}} on scales {n \sim N_j} for some {N_j} going to infinity and some slowly varying {t_j}, and such a function will be far from any fixed Dirichlet character whilst still having many large correlations (e.g. the pair correlations {\sum_{n \leq N_j} g(n+1) \overline{g(n)}} will be large). In our paper we propose a technical “fix” to Elliott’s conjecture (replacing (1) by a truncated variant), and show that this repaired version of Elliott’s conjecture is true on the average in much the same way that Chowla’s conjecture is. (If one restricts attention to real-valued multiplicative functions, then this technical issue does not show up, basically because one can assume without loss of generality that {t=0} in this case; we discuss this fact in an appendix to the paper.)

Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes“. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

\displaystyle  G(X) := \sup_{p_n, p_{n+1} \leq X} p_{n+1}-p_n

between primes up to {X} exhibited a lower bound of the shape

\displaystyle  G(X) \geq f(X) \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (1)

for some function {f(X)} that went to infinity as {X \rightarrow \infty}; this improved upon previous work of Rankin and other authors, who established the same bound but with {f(X)} replaced by a constant. (Again, see the previous post for a more detailed discussion.)

In our previous papers, we did not specify a particular growth rate for {f(X)}. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results both have ineffective constants, rendering our growth function {f(X)} similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking {k}-tuples of fairly large size (about {\log^c x} for some small {c}). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate {f(X)} of shape

\displaystyle  f(X) \asymp \frac{\log \log \log X}{\log\log\log\log X}, \ \ \ \ \ (2)

thus leading to a bound

\displaystyle  G(X) \gg \log X \frac{\log \log X}{\log\log\log X}

on the gaps between primes for large {X}; this is an unpublished calculation of James’.

In this paper we make a further refinement of this calculation to obtain a growth rate

\displaystyle  f(X) \asymp \log \log \log X \ \ \ \ \ (3)

leading to a bound of the form

\displaystyle  G(X) \geq c \log X \frac{\log \log X \log\log\log\log X}{\log\log\log X} \ \ \ \ \ (4)

for large {X} and some small constant {c}. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramer’s conjecture that {G(X)} is comparable to {\log^2 X}); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant {c} here can be replaced by an arbitrarily large constant {C}.

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in {[x,y]} by removing one residue class modulo {p} for all primes {p} in (say) {[x/4,x/2]}? Very roughly speaking, if one can solve this problem with {y = g(x) x}, then one can obtain a growth rate on {f(X)} of the shape {f(X) \sim g(\log X)}. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in {[x,y]}, but never mind this detail for now.)

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of {k}-tuples in {[x,y]} which contain at least {\gg \log k} primes, for {k} as large as {\log^c x} or so (so that {\log k} is about {\log\log x}). By considering {k}-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime {p} in {[x/4,x/2]} that capture about {\log\log x} primes. In principle, this means that union of all these residue classes can cover about {\frac{x}{\log x} \log\log x} primes, allowing one to take {g(x)} as large as {\log\log x}, which corresponds to (3). However, there is a catch: the residue classes for different primes {p} may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in expected number of times each prime would be shifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the {\log\log\log\log X} loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of {n} vertices using as few “edges” from a given hypergraph {H} as possible. If each edge has {m} vertices, then one certainly needs at least {n/m} edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph {H}. As before, random methods tend to require something like {\frac{n}{m} \log r} edges before one expects to cover, say {1-1/r} of the vertices.

However, it turns out (under reasonable hypotheses on {H}) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from {H} that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph {H}, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.

In our setup, the vertices are the primes in {[x,y]}, and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime {p}, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly the more useful ones for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).

Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than {\log\log x} primes of size {x} (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)

Van Vu and I have just uploaded to the arXiv our paper “Random matrices have simple spectrum“. Recall that an {n \times n} Hermitian matrix is said to have simple eigenvalues if all of its {n} eigenvalues are distinct. This is a very typical property of matrices to have: for instance, as discussed in this previous post, in the space of all {n \times n} Hermitian matrices, the space of matrices without all eigenvalues simple has codimension three, and for real symmetric cases this space has codimension two. In particular, given any random matrix ensemble of Hermitian or real symmetric matrices with an absolutely continuous distribution, we conclude that random matrices drawn from this ensemble will almost surely have simple eigenvalues.

For discrete random matrix ensembles, though, the above argument breaks down, even though general universality heuristics predict that the statistics of discrete ensembles should behave similarly to those of continuous ensembles. A model case here is the adjacency matrix {M_n} of an Erdös-Rényi graph – a graph on {n} vertices in which any pair of vertices has an independent probability {p} of being in the graph. For the purposes of this paper one should view {p} as fixed, e.g. {p=1/2}, while {n} is an asymptotic parameter going to infinity. In this context, our main result is the following (answering a question of Babai):

Theorem 1 With probability {1-o(1)}, {M_n} has simple eigenvalues.

Our argument works for more general Wigner-type matrix ensembles, but for sake of illustration we will stick with the Erdös-Renyi case. Previous work on local universality for such matrix models (e.g. the work of Erdos, Knowles, Yau, and Yin) was able to show that any individual eigenvalue gap {\lambda_{i+1}(M)-\lambda_i(M)} did not vanish with probability {1-o(1)} (in fact {1-O(n^{-c})} for some absolute constant {c>0}), but because there are {n} different gaps that one has to simultaneously ensure to be non-zero, this did not give Theorem 1 as one is forced to apply the union bound.

Our argument in fact gives simplicity of the spectrum with probability {1-O(n^{-A})} for any fixed {A}; in a subsequent paper we also show that it gives a quantitative lower bound on the eigenvalue gaps (analogous to how many results on the singularity probability of random matrices can be upgraded to a bound on the least singular value).

The basic idea of argument can be sketched as follows. Suppose that {M_n} has a repeated eigenvalue {\lambda}. We split

\displaystyle M_n = \begin{pmatrix} M_{n-1} & X \\ X^T & 0 \end{pmatrix}

for a random {n-1 \times n-1} minor {M_{n-1}} and a random sign vector {X}; crucially, {X} and {M_{n-1}} are independent. If {M_n} has a repeated eigenvalue {\lambda}, then by the Cauchy interlacing law, {M_{n-1}} also has an eigenvalue {\lambda}. We now write down the eigenvector equation for {M_n} at {\lambda}:

\displaystyle \begin{pmatrix} M_{n-1} & X \\ X^T & 0 \end{pmatrix} \begin{pmatrix} v \\ a \end{pmatrix} = \lambda \begin{pmatrix} v \\ a \end{pmatrix}.

Extracting the top {n-1} coefficients, we obtain

\displaystyle (M_{n-1} - \lambda) v + a X = 0.

If we let {w} be the {\lambda}-eigenvector of {M_{n-1}}, then by taking inner products with {w} we conclude that

\displaystyle a (w \cdot X) = 0;

we typically expect {a} to be non-zero, in which case we arrive at

\displaystyle w \cdot X = 0.

In other words, in order for {M_n} to have a repeated eigenvalue, the top right column {X} of {M_n} has to be orthogonal to an eigenvector {w} of the minor {M_{n-1}}. Note that {X} and {w} are going to be independent (once we specify which eigenvector of {M_{n-1}} to take as {w}). On the other hand, thanks to inverse Littlewood-Offord theory (specifically, we use an inverse Littlewood-Offord theorem of Nguyen and Vu), we know that the vector {X} is unlikely to be orthogonal to any given vector {w} independent of {X}, unless the coefficients of {w} are extremely special (specifically, that most of them lie in a generalised arithmetic progression). The main remaining difficulty is then to show that eigenvectors of a random matrix are typically not of this special form, and this relies on a conditioning argument originally used by Komlós to bound the singularity probability of a random sign matrix. (Basically, if an eigenvector has this special form, then one can use a fraction of the rows and columns of the random matrix to determine the eigenvector completely, while still preserving enough randomness in the remaining portion of the matrix so that this vector will in fact not be an eigenvector with high probability.)

I’ve just uploaded to the arXiv my paper “The Elliott-Halberstam conjecture implies the Vinogradov least quadratic nonresidue conjecture“. As the title suggests, this paper links together the Elliott-Halberstam conjecture from sieve theory with the conjecture of Vinogradov concerning the least quadratic nonresidue {n(p)} of a prime {p}. Vinogradov established the bound

\displaystyle  n(p) \ll p^{\frac{1}{2\sqrt{e}}} \log^2 p \ \ \ \ \ (1)

and conjectured that

\displaystyle  n(p) \ll p^\varepsilon \ \ \ \ \ (2)

for any fixed {\varepsilon>0}. Unconditionally, the best result so far (up to logarithmic factors) that holds for all primes {p} is by Burgess, who showed that

\displaystyle  n(p) \ll p^{\frac{1}{4\sqrt{e}}+\varepsilon} \ \ \ \ \ (3)

for any fixed {\varepsilon>0}. See this previous post for a proof of these bounds.

In this paper, we show that the Vinogradov conjecture is a consequence of the Elliott-Halberstam conjecture. Using a variant of the argument, we also show that the “Type II” estimates established by Zhang and numerically improved by the Polymath8a project can be used to improve a little on the Vinogradov bound (1), to

\displaystyle  n(p) \ll p^{(\frac{1}{2}-\frac{1}{34})\frac{1}{\sqrt{e}} + \varepsilon},

although this falls well short of the Burgess bound. However, the method is somewhat different (although in both cases it is the Weil exponential sum bounds that are the source of the gain over (1)) and it is conceivable that a numerically stronger version of the Type II estimates could obtain results that are more competitive with the Burgess bound. At any rate, this demonstrates that the equidistribution estimates introduced by Zhang may have further applications beyond the family of results related to bounded gaps between primes.

The connection between the least quadratic nonresidue problem and Elliott-Halberstam is follows. Suppose for contradiction we can find a prime {q} with {n(q)} unusually large. Letting {\chi} be the quadratic character modulo {q}, this implies that the sums {\sum_{n \leq x} \chi(n)} are also unusually large for a significant range of {x} (e.g. {x < n(q)}), although the sum is also quite small for large {x} (e.g. {x > q}), due to the cancellation present in {\chi}. It turns out (by a sort of “uncertainty principle” for multiplicative functions, as per this paper of Granville and Soundararajan) that these facts force {\sum_{n\leq x} \chi(n) \Lambda(n)} to be unusually large in magnitude for some large {x} (with {q^C \leq x \leq q^{C'}} for two large absolute constants {C,C'}). By the periodicity of {\chi}, this means that

\displaystyle  \sum_{n\leq x} \chi(n) \Lambda(n+q)

must be unusually large also. However, because {n(q)} is large, one can factorise {\chi} as {f * 1} for a fairly sparsely supported function {f = \chi * \mu}. The Elliott-Halberstam conjecture, which controls the distribution of {\Lambda} in arithmetic progressions on the average can then be used to show that {\sum_{n \leq x} (f*1)(n) \Lambda(n+q)} is small, giving the required contradiction.

The implication involving Type II estimates is proven by a variant of the argument. If {n(q)} is large, then a character sum {\sum_{N\leq n \leq 2N} \chi(n)} is unusually large for a certain {N}. By multiplicativity of {\chi}, this shows that {\chi} correlates with {\chi * 1_{[N,2N]}}, and then by periodicity of {\chi}, this shows that {\chi(n)} correlates with {\chi * 1_{[N,2N]}(n+jq)} for various small {q}. By the Cauchy-Schwarz inequality (cf. this previous blog post), this implies that {\chi * 1_{[N,2N]}(n+jq)} correlates with {\chi * 1_{[N,2N]}(n+j'q)} for some distinct {j,j'}. But this can be ruled out by using Type II estimates.

I’ll record here a well-known observation concerning potential counterexamples to any improvement to the Burgess bound, that is to say an infinite sequence of primes {p} with {n(p) = p^{\frac{1}{4\sqrt{e}} + o(1)}}. Suppose we let {a(t)} be the asymptotic mean value of the quadratic character {\chi} at {p^t} and {b(t)} the mean value of {\chi \Lambda}; these quantities are defined precisely in my paper, but roughly speaking one can think of

\displaystyle  a(t) = \lim_{p \rightarrow \infty} \frac{1}{p^t} \sum_{n \leq p^t} \chi(n)


\displaystyle  b(t) = \lim_{p \rightarrow \infty} \frac{1}{p^t} \sum_{n \leq p^t} \chi(n) \Lambda(n).

Thanks to the basic Dirichlet convolution identity {\chi(n) \log(n) = \chi * \chi\Lambda(n)}, one can establish the Wirsing integral equation

\displaystyle  t a(t) = \int_0^t a(u) b(t-u)\ du

for all {t \geq 0}; see my paper for details (actually far sharper claims than this appear in previous work of Wirsing and Granville-Soundararajan). If we have an infinite sequence of counterexamples to any improvement to the Burgess bound, then we have

\displaystyle  a(t)=b(t) = 1 \hbox{ for } t < \frac{1}{4\sqrt{e}}

while from the Burgess exponential sum estimates we have

\displaystyle  a(t) = 0 \hbox{ for } t > \frac{1}{4}.

These two constraints, together with the Wirsing integral equation, end up determining {a} and {b} completely. It turns out that we must have

\displaystyle  b(t) = -1 \hbox{ for } \frac{1}{4\sqrt{e}} \leq t \leq \frac{1}{4}


\displaystyle  a(t) = 1 - 2 \log(4 \sqrt{e} t) \hbox{ for } \frac{1}{4\sqrt{e}} \leq t \leq \frac{1}{4}

and then for {t > \frac{1}{4}}, {b} evolves by the integral equation

\displaystyle  b(t) = \int_{1/4\sqrt{e}}^{1/4} b(t-u) \frac{2du}{u}.

For instance

\displaystyle  b(t) = 1 \hbox{ for } \frac{1}{4} < t \leq \frac{1}{2\sqrt{e}}

and then {b} oscillates in a somewhat strange fashion towards zero as {t \rightarrow \infty}. This very odd behaviour of {\sum_n \chi(n) \Lambda(n)} is surely impossible, but it seems remarkably hard to exclude it without invoking a strong hypothesis, such as GRH or the Elliott-Halberstam conjecture (or weaker versions thereof).


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

Get every new post delivered to your Inbox.

Join 5,188 other followers