I’ve just uploaded two related papers to the arXiv:

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case ${k=2}$ of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let ${a_1,a_2}$ be natural numbers, and let ${b_1,b_2}$ be integers such that ${a_1 b_2 - a_2 b_1 \neq 0}$. Let ${1 \leq \omega(x) \leq x}$ be a quantity depending on ${x}$ that goes to infinity as ${x \rightarrow \infty}$. Let ${\lambda}$ denote the Liouville function. Then one has

$\displaystyle \sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n+b_2)}{n} = o( \log \omega(x) ) \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$.

Thus for instance one has

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)$

For comparison, the non-averaged Chowla conjecture would imply that

$\displaystyle \sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x) \ \ \ \ \ (3)$

which is a strictly stronger estimate than (2), and remains open.
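To get a feel for the statement (2) (and for the complete multiplicativity ${\lambda(pn) = -\lambda(n)}$ used repeatedly below), here is a quick numerical sanity check. It is of course no substitute for a proof, and the function names are purely illustrative:

```python
import math

def liouville(n):
    """Liouville function lambda(n) = (-1)^Omega(n), where Omega(n)
    counts the prime factors of n with multiplicity."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

# complete multiplicativity: lambda(p n) = -lambda(n) for a prime p
assert all(liouville(7 * n) == -liouville(n) for n in range(1, 200))

# the logarithmically averaged two-point sum in (2), normalised by log x;
# (2) predicts that this ratio tends to zero as x grows
x = 10**5
s = sum(liouville(n) * liouville(n + 1) / n for n in range(1, x + 1))
print(s / math.log(x))
```

The normalised ratio comes out small at this modest range, consistent with (though of course not proving) the cancellation asserted in (2).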

The arguments also extend to completely multiplicative functions other than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdős discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdős discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdős discrepancy problem, in which the sequence ${f(1), f(2), \dots}$ takes values in the unit sphere of an arbitrary Hilbert space, rather than in ${\{-1,+1\}}$.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) were large and (say) positive. Using the complete multiplicativity ${\lambda(pn) = -\lambda(n)}$, we conclude that

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n}$

is also large and positive for all primes ${p}$ that are not too large; note here how the logarithmic averaging allows us to leave the constraint ${n \leq x}$ essentially unchanged. Summing in ${p}$, we conclude that

$\displaystyle \sum_{n \leq x} \frac{ \sum_{p \in {\mathcal P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n}$

is large and positive for any given set ${{\mathcal P}}$ of medium-sized primes. By a standard averaging argument, this implies that

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+p+j) 1_{p|n+j} \ \ \ \ \ (4)$

is large for many choices of ${n}$, where ${H}$ is a medium-sized parameter at our disposal to choose, and we take ${{\mathcal P}}$ to be some set of primes that are somewhat smaller than ${H}$. (A similar approach was taken in this recent paper of Matomäki, Radziwiłł, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view ${n}$ as a random variable, in which case (4) is essentially a bilinear sum of the random sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ along a random graph ${G_{n,H}}$ on ${\{1,\dots,H\}}$, in which two vertices ${j, j+p}$ are connected if they differ by a prime ${p}$ in ${{\mathcal P}}$ that divides ${n+j}$. A key difficulty in controlling this sum is that for randomly chosen ${n}$, the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ and the graph ${G_{n,H}}$ need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in measure-preserving dynamical systems; it allows one to select a scale parameter ${H}$ (in some suitable range ${[H_-,H_+]}$) for which the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ and the graph ${G_{n,H}}$ exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ has significant mutual information with ${G_{n,H}}$, then the entropy of the sequence ${(\lambda(n+1),\dots,\lambda(n+H'))}$ for ${H' > H}$ will grow a little slower than linearly, due to the fact that the graph ${G_{n,H}}$ has zero entropy (knowledge of ${G_{n,H}}$ more or less completely determines the shifts ${G_{n+kH,H}}$ of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing ${H}$ as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ and the graph ${G_{n,H}}$. Curiously, for the application it is not enough to have a purely qualitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than ${\log H}$ on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series ${\sum_{j \geq 2} \frac{1}{j \log j \log\log j}}$ diverges, which is only barely true).
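The Shannon inequalities invoked here are elementary to verify numerically. The following toy computation (not part of the paper's argument; all names are illustrative) checks the non-negativity of conditional mutual information, ${I(X;Y|Z) = H(X,Z)+H(Y,Z)-H(X,Y,Z)-H(Z) \geq 0}$, for a randomly chosen joint distribution of three bits:

```python
import itertools
import math
import random

def H(p):
    """Shannon entropy (in bits) of a distribution {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginal(joint, axes):
    """Marginalise a joint distribution {(x, y, z): prob} onto the given axes."""
    out = {}
    for key, q in joint.items():
        k = tuple(key[i] for i in axes)
        out[k] = out.get(k, 0.0) + q
    return out

# a (seeded) random joint distribution on three binary variables X, Y, Z
random.seed(1)
weights = {k: random.random() for k in itertools.product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {k: w / total for k, w in weights.items()}

# I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z): always non-negative,
# which is the Shannon inequality driving the entropy decrement argument
I_cond = (H(marginal(joint, (0, 2))) + H(marginal(joint, (1, 2)))
          - H(joint) - H(marginal(joint, (2,))))
print(I_cond)
```

The same marginalisation routine also confirms subadditivity ${H(X,Y) \leq H(X) + H(Y)}$, the inequality behind the Fekete-type argument mentioned above.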

Once one locates a scale ${H}$ with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j)}{p}. \ \ \ \ \ (5)$

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

$\displaystyle \sup_\alpha \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (6)$

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwiłł, and myself.
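The exponentially strong tail bounds from Hoeffding's inequality mentioned above can be illustrated with a quick simulation (again a toy check, not the argument of the paper): for ${n}$ i.i.d. signs, one has ${\mathbf{P}(|S_n/n| \geq t) \leq 2e^{-nt^2/2}}$.

```python
import math
import random

random.seed(0)

n = 100        # number of i.i.d. +-1 summands
t = 0.3        # deviation threshold for the sample mean
trials = 5000

# Hoeffding's inequality for i.i.d. signs: P(|S_n/n| >= t) <= 2 exp(-n t^2 / 2)
bound = 2 * math.exp(-n * t * t / 2)

exceed = 0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    if abs(s / n) >= t:
        exceed += 1
empirical = exceed / trials
print(empirical, "<=", bound)
```

Even at this small scale the bound decays exponentially in ${nt^2}$, which is what lets it absorb the logarithmic losses coming from the entropy inequalities.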

When one uses this method to study more general sums such as

$\displaystyle \sum_{n \leq x} \frac{g_1(n) g_2(n+1)}{n},$

one ends up having to consider expressions such as

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} c_p \frac{g_1(n+j) g_2(n+p+j)}{p},$

where ${c_p}$ is the coefficient ${c_p := \overline{g_1}(p) \overline{g_2}(p)}$. When attacking this sum with the circle method, one soon finds oneself wanting to locate the large values of the exponential sum

$\displaystyle S(\alpha) := \sum_{p \in {\mathcal P}} \frac{c_p}{p} e^{2\pi i \alpha p}.$

In many cases (such as in the application to the Erdős discrepancy problem), the coefficient ${c_p}$ is identically ${1}$, and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, ${S(\alpha)}$ is large when ${\alpha}$ lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions ${g_1,g_2}$, the coefficients ${c_p}$ are more or less arbitrary; the large values of ${S(\alpha)}$ are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations ${\alpha}$ (up to the uncertainty mandated by the Heisenberg uncertainty principle) where ${S(\alpha)}$ is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the ${L^4}$ restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)
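The major/minor arc dichotomy for ${S(\alpha)}$ in the case ${c_p = 1}$ is easy to observe numerically. The following sketch (the choice of prime range is arbitrary and purely illustrative) compares ${|S(\alpha)|}$ at a major-arc point with a generic ${\alpha}$:

```python
import cmath
import math

def primes_up_to(N):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for i in range(2, math.isqrt(N) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [p for p in range(2, N + 1) if sieve[p]]

# a stand-in set of "medium-sized" primes
P = [p for p in primes_up_to(1000) if p > 10]

def S(alpha, c=lambda p: 1.0):
    """The exponential sum S(alpha) = sum_{p in P} (c_p / p) e(alpha p)."""
    return sum(c(p) / p * cmath.exp(2j * math.pi * alpha * p) for p in P)

# with c_p = 1, |S(alpha)| is large on major arcs (alpha near a rational
# with small denominator) and small for generic alpha
major = abs(S(0.0))                 # alpha = 0/1
minor = abs(S((5 ** 0.5 - 1) / 2))  # a generic irrational alpha
print(major, minor)
```

For general coefficients ${c_p}$ one can feed an arbitrary function `c` into `S`; the point of the restriction theorem is that, whatever `c` is, ${|S(\alpha)|}$ can only be large at a bounded number of locations.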

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x).$
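This three-point estimate remains conjectural, but the expected cancellation can at least be glimpsed numerically; the following toy computation (illustrative only) evaluates the normalised three-point logarithmic sum over a modest range:

```python
import math

def liouville(n):
    """Liouville function: (-1)^(number of prime factors of n with multiplicity)."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return (-1) ** count

# the conjectured estimate predicts this normalised ratio tends to zero
x = 10**5
s3 = sum(liouville(n) * liouville(n + 1) * liouville(n + 2) / n
         for n in range(1, x + 1))
print(s3 / math.log(x))
```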

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j) \lambda(n+2p+j)}{p}.$

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

$\displaystyle \frac{1}{X} \int_X^{2X} \sup_\alpha |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (7)$

where ${X}$ goes to infinity and ${H}$ is a very slowly growing function of ${X}$. This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. Nevertheless, (7) seems worth attacking further, as this estimate should have many nice applications (beyond just the ${k=3}$ case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For ${k}$ larger than ${3}$, the same line of analysis requires one to replace the linear phase ${e(\alpha n)}$ by more complicated phases, such as quadratic phases ${e(\alpha n^2 + \beta n)}$ or even ${(k-2)}$-step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all ${k}$, which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale ${H}$ that one cannot specify in advance; this is not a problem for logarithmic averages (which are quite stable with respect to dilations), but it is problematic for ordinary averages. Perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, in order to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated, as one loses some independence properties (coming from the Chinese remainder theorem) that allowed one to conclude just from the Hoeffding inequality.
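The independence coming from the Chinese remainder theorem alluded to here is exact over a full period: for distinct primes ${p,q}$, the events ${p|n}$ and ${q|n}$ are independent when ${n}$ is drawn uniformly from ${\{0,\dots,pq-1\}}$. A toy verification (the specific primes are arbitrary):

```python
# For distinct primes p and q, the Chinese remainder theorem makes the
# events "p divides n" and "q divides n" exactly independent when n is
# drawn uniformly from a full period {0, 1, ..., pq - 1}.
p, q = 7, 11
period = p * q
div_p = sum(1 for n in range(period) if n % p == 0)
div_q = sum(1 for n in range(period) if n % q == 0)
both = sum(1 for n in range(period) if n % p == 0 and n % q == 0)

# density of the joint event equals the product of the individual densities
print(both / period, (div_p / period) * (div_q / period))
```

Exploiting multiplicativity at products of primes, as suggested above, destroys exactly this kind of product structure among the relevant divisibility events, which is why the Hoeffding step no longer applies directly.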