You are currently browsing the tag archive for the ‘Kaisa Matomaki’ tag.

Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function ${\mu}$, the von Mangoldt function ${\Lambda}$, and the generalised divisor functions ${d_k}$) in short intervals ${(X,X+H]}$, where ${X}$ is large and ${H}$ lies in the range ${X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}}$ for a fixed constant ${0 < \theta < 1}$ (that one would like to be as small as possible). If we let ${f}$ denote one of the functions ${\mu, \Lambda, d_k}$, then there is extensive literature on the estimation of short sums

$\displaystyle \sum_{X < n \leq X+H} f(n)$

and some literature also on the estimation of exponential sums such as

$\displaystyle \sum_{X < n \leq X+H} f(n) e(-\alpha n)$

for a real frequency ${\alpha}$, where ${e(\theta) := e^{2\pi i \theta}}$. For applications in the additive combinatorics of such functions ${f}$, it is also necessary to consider more general correlations, such as polynomial correlations

$\displaystyle \sum_{X < n \leq X+H} f(n) e(-P(n))$

where ${P: {\bf Z} \rightarrow {\bf R}}$ is a polynomial of some fixed degree, or more generally

$\displaystyle \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)$

where ${G/\Gamma}$ is a nilmanifold of fixed degree and dimension (and with some control on structure constants), ${g: {\bf Z} \rightarrow G}$ is a polynomial map, and ${F: G/\Gamma \rightarrow {\bf C}}$ is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of ${f}$ (possibly after subtracting off some renormalising factor) on such short intervals ${(X,X+H]}$, which can in turn be used to control other multilinear correlations involving such functions.

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as ${\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)}$ would be approximated in terms of a main term that vanished (or is negligible) if ${\alpha}$ is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if ${\alpha}$ was “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant ${f^\sharp}$ from each of the arithmetic functions ${f}$ and then getting upper bounds on remainder correlations such as

$\displaystyle |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)$

(actually for technical reasons we also allow the ${n}$ variable to be restricted further to a subprogression of ${(X,X+H]}$, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

• For the Möbius function ${\mu}$, we simply set ${\mu^\sharp = 0}$, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
• For the von Mangoldt function ${\Lambda}$, we eventually went with the Cramér-Granville approximant ${\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}$, where ${W = \prod_{p < R} p}$ and ${R = \exp(\log^{1/10} X)}$.
• For the divisor functions ${d_k}$, we used a somewhat complicated-looking approximant ${d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)}$ for some explicit polynomials ${P_m}$, chosen so that ${d_k^\sharp}$ and ${d_k}$ have almost exactly the same sums along arithmetic progressions (see the paper for details).

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For ${\mu}$ and ${\Lambda}$, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of ${O_A(\log^{-A} X)}$ over the trivial bound for any ${A>0}$; for ${d_k}$, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of ${X^{-c_k}}$ over the trivial bound for some ${c_k>0}$. In the case of the Möbius function ${\mu}$, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent ${\theta}$ somewhat at the cost of only obtaining “weakly logarithmic savings” of shape ${\log^{-c} X}$ for some small ${c>0}$.

Our main estimates on sums of the form (1) work in the following ranges:

• For ${\theta=5/8}$, one can obtain strongly logarithmic savings on (1) for ${f=\mu,\Lambda}$, and power savings for ${f=d_k}$.
• For ${\theta=3/5}$, one can obtain weakly logarithmic savings for ${f = \mu, d_k}$.
• For ${\theta=5/9}$, one can obtain power savings for ${f=d_3}$.
• For ${\theta=1/3}$, one can obtain power savings for ${f=d_2}$.

Conjecturally, one should be able to obtain power savings in all cases, and lower ${\theta}$ down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The ${\theta=5/8}$ result for correlation against Fourier phases ${e(\alpha n)}$ was established previously by Zhan, and the ${\theta=3/5}$ result for such phases and ${f=\mu}$ was established previously by by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

• Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
• We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form ${(X,X+H]}$ rather than ${[1,X]}$.
• We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals ${(X,X+H]}$ rather than long intervals such as ${[1,X]}$.

We now briefly discuss some of the ingredients of proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the ${\theta=3/5}$ result) the Ramaré identity) to decompose ${\mu(n), \Lambda(n), d_k(n)}$ into more tractable sums of the following types:

• Type ${I}$ sums, which are basically of the form ${\sum_{m \leq A:m|n} \alpha(m)}$ for some weights ${\alpha(m)}$ of controlled size and some cutoff ${A}$ that is not too large;
• Type ${II}$ sums, which are basically of the form ${\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)}$ for some weights ${\alpha(m)}$, ${\beta(n)}$ of controlled size and some cutoffs ${A_-, A_+}$ that are not too close to ${1}$ or to ${X}$;
• Type ${I_2}$ sums, which are basically of the form ${\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)}$ for some weights ${\alpha(m)}$ of controlled size and some cutoff ${A}$ that is not too large.

The precise ranges of the cutoffs ${A, A_-, A_+}$ depend on the choice of ${\theta}$; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents ${\theta}$ being what they are in our main results.

The Type ${I}$ sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type ${II}$ and Type ${I_2}$ sums.

For the Type ${II}$ sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence ${F(g(n)\Gamma)}$ is basically of the form ${e(P(n))}$, and the “non-abelian” case in which ${G}$ is non-abelian and ${F}$ exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomaki and Shao, which uses Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless ${e(P(n))}$ is “major arc” in the sense that it resembles (or “pretends to be”) ${\chi(n) n^{it}}$ for some Dirichlet character ${\chi}$ and some frequency ${t}$, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

$\displaystyle (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm'\Gamma))$

as ${n,m,n',m'}$ range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the type ${I_2}$ sum, a model sum to study is

$\displaystyle \sum_{X < n \leq X+H} d_2(n) e(\alpha n)$

which one can expand as

$\displaystyle \sum_{n,m: X < nm \leq X+H} e(\alpha nm).$

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region ${\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}}$ into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as ${e(\alpha nm)}$ on the majority of these progressions. As it turns out, this strategy works well in the regime ${H > X^{1/3+\varepsilon}}$ unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the ${\theta}$ exponent for our ${d_2}$ result can be as low as ${1/3}$.

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals ${(x,x+H]}$ with ${x}$ in the range ${[X,2X]}$, in which we will be able to lower ${\theta}$ all the way to ${0}$.

Kaisa Matomäki, Maksym Radziwill, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Singmaster’s conjecture in the interior of Pascal’s triangle“. This paper leverages the theory of exponential sums over primes to make progress on a well known conjecture of Singmaster which asserts that any natural number larger than ${1}$ appears at most a bounded number of times in Pascal’s triangle. That is to say, for any integer ${t \geq 2}$, there are at most ${O(1)}$ solutions to the equation

$\displaystyle \binom{n}{m} = t \ \ \ \ \ (1)$

with ${1 \leq m < n}$. Currently, the largest number of solutions that is known to be attainable is eight, with ${t}$ equal to

$\displaystyle 3003 = \binom{3003}{1} = \binom{78}{2} = \binom{15}{5} = \binom{14}{6} = \binom{14}{8} = \binom{15}{10}$

$\displaystyle = \binom{78}{76} = \binom{3003}{3002}.$

Because of the symmetry ${\binom{n}{m} = \binom{n}{n-m}}$ of Pascal’s triangle it is natural to restrict attention to the left half ${1 \leq m \leq n/2}$ of the triangle.

Our main result settles this conjecture in the “interior” region of the triangle:

Theorem 1 (Singmaster’s conjecture in the interior of the triangle) If ${0 < \varepsilon < 1}$ and ${t}$ is sufficiently large depending on ${\varepsilon}$, there are at most two solutions to (1) in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/2 \ \ \ \ \ (2)$

and hence at most four in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n - \exp( \log^{2/3+\varepsilon} n ).$

Also, there is at most one solution in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m \leq n/\exp(\log^{1-\varepsilon} n ).$

To verify Singmaster’s conjecture in full, it thus suffices in view of this result to verify the conjecture in the boundary region

$\displaystyle 2 \leq m < \exp(\log^{2/3+\varepsilon} n) \ \ \ \ \ (3)$

(or equivalently ${n - \exp(\log^{2/3+\varepsilon} n) < m \leq n}$); we have deleted the ${m=1}$ case as it of course automatically supplies exactly one solution to (1). It is in fact possible that for ${t}$ sufficiently large there are no further collisions ${\binom{n}{m} = \binom{n'}{m'}=t}$ for ${(n,m), (n',m')}$ in the region (3), in which case there would never be more than eight solutions to (1) for sufficiently large ${t}$. This is latter claim known for bounded values of ${m,m'}$ by Beukers, Shorey, and Tildeman, with the main tool used being Siegel’s theorem on integral points.

The upper bound of two here for the number of solutions in the region (2) is best possible, due to the infinite family of solutions to the equation

$\displaystyle \binom{n+1}{m+1} = \binom{n}{m+2} \ \ \ \ \ (4)$

coming from ${n = F_{2j+2} F_{2j+3}-1}$, ${m = F_{2j} F_{2j+3}-1}$ and ${F_j}$ is the ${j^{th}}$ Fibonacci number.

The appearance of the quantity ${\exp( \log^{2/3+\varepsilon} n )}$ in Theorem 1 may be familiar to readers that are acquainted with Vinogradov’s bounds on exponential sums, which ends up being the main new ingredient in our arguments. In principle this threshold could be lowered if we had stronger bounds on exponential sums.

To try to control solutions to (1) we use a combination of “Archimedean” and “non-Archimedean” approaches. In the “Archimedean” approach (following earlier work of Kane on this problem) we view ${n,m}$ primarily as real numbers rather than integers, and express (1) in terms of the Gamma function as

$\displaystyle \frac{\Gamma(n+1)}{\Gamma(m+1) \Gamma(n-m+1)} = t.$

One can use this equation to solve for ${n}$ in terms of ${m,t}$ as

$\displaystyle n = f_t(m)$

for a certain real analytic function ${f_t}$ whose asymptotics are easily computable (for instance one has the asymptotic ${f_t(m) \asymp m t^{1/m}}$). One can then view the problem as one of trying to control the number of lattice points on the graph ${\{ (m,f_t(m)): m \in {\bf R} \}}$. Here we can take advantage of the fact that in the regime ${m \leq f_t(m)/2}$ (which corresponds to working in the left half ${m \leq n/2}$ of Pascal’s triangle), the function ${f_t}$ can be shown to be convex, but not too convex, in the sense that one has both upper and lower bounds on the second derivative of ${f_t}$ (in fact one can show that ${f''_t(m) \asymp f_t(m) (\log t/m^2)^2}$). This can be used to preclude the possibility of having a cluster of three or more nearby lattice points on the graph ${\{ (m,f_t(m)): m \in {\bf R} \}}$, basically because the area subtended by the triangle connecting three of these points would lie between ${0}$ and ${1/2}$, contradicting Pick’s theorem. Developing these ideas, we were able to show

Proposition 2 Let ${\varepsilon>0}$, and suppose ${t}$ is sufficiently large depending on ${\varepsilon}$. If ${(m,n)}$ is a solution to (1) in the left half ${m \leq n/2}$ of Pascal’s triangle, then there is at most one other solution ${(m',n')}$ to this equation in the left half with

$\displaystyle |m-m'| + |n-n'| \ll \exp( (\log\log t)^{1-\varepsilon} ).$

Again, the example of (4) shows that a cluster of two solutions is certainly possible; the convexity argument only kicks in once one has a cluster of three or more solutions.

To finish the proof of Theorem 1, one has to show that any two solutions ${(m,n), (m',n')}$ to (1) in the region of interest must be close enough for the above proposition to apply. Here we switch to the “non-Archimedean” approach, in which we look at the ${p}$-adic valuations ${\nu_p( \binom{n}{m} )}$ of the binomial coefficients, defined as the number of times a prime ${p}$ divides ${\binom{n}{m}}$. From the fundamental theorem of arithmetic, a collision

$\displaystyle \binom{n}{m} = \binom{n'}{m'}$

between binomial coefficients occurs if and only if one has agreement of valuations

$\displaystyle \nu_p( \binom{n}{m} ) = \nu_p( \binom{n'}{m'} ). \ \ \ \ \ (5)$

From the Legendre formula

$\displaystyle \nu_p(n!) = \sum_{j=1}^\infty \lfloor \frac{n}{p^j} \rfloor$

we can rewrite this latter identity (5) as

$\displaystyle \sum_{j=1}^\infty \{ \frac{m}{p^j} \} + \{ \frac{n-m}{p^j} \} - \{ \frac{n}{p^j} \} = \sum_{j=1}^\infty \{ \frac{m'}{p^j} \} + \{ \frac{n'-m'}{p^j} \} - \{ \frac{n'}{p^j} \}, \ \ \ \ \ (6)$

where ${\{x\} := x - \lfloor x\rfloor}$ denotes the fractional part of ${x}$. (These sums are not truly infinite, because the summands vanish once ${p^j}$ is larger than ${\max(n,n')}$.)

A key idea in our approach is to view this condition (6) statistically, for instance by viewing ${p}$ as a prime drawn randomly from an interval such as ${[P, P + P \log^{-100} P]}$ for some suitably chosen scale parameter ${P}$, so that the two sides of (6) now become random variables. It then becomes advantageous to compare correlations between these two random variables and some additional test random variable. For instance, if ${n}$ and ${n'}$ are far apart from each other, then one would expect the left-hand side of (6) to have a higher correlation with the fractional part ${\{ \frac{n}{p}\}}$, since this term shows up in the summation on the left-hand side but not the right. Similarly if ${m}$ and ${m'}$ are far apart from each other (although there are some annoying cases one has to treat separately when there is some “unexpected commensurability”, for instance if ${n'-m'}$ is a rational multiple of ${m}$ where the rational has bounded numerator and denominator). In order to execute this strategy, it turns out (after some standard Fourier expansion) that one needs to get good control on exponential sums such as

$\displaystyle \sum_{P \leq p \leq P + P\log^{-100} P} e( \frac{N}{p} + \frac{M}{p^j} )$

for various choices of parameters ${P, N, M, j}$, where ${e(\theta) := e^{2\pi i \theta}}$. Fortunately, the methods of Vinogradov (which more generally can handle sums such as ${\sum_{n \in I} e(f(n))}$ and ${\sum_{p \in I} e(f(p))}$ for various analytic functions ${f}$) can give useful bounds on such sums as long as ${N}$ and ${M}$ are not too large compared to ${P}$; more specifically, Vinogradov’s estimates are non-trivial in the regime ${N,M \ll \exp( \log^{3/2-\varepsilon} P )}$, and this ultimately leads to a distance bound

$\displaystyle m' - m \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )$

between any colliding pair ${(n,m), (n',m')}$ in the left half of Pascal’s triangle, as well as the variant bound

$\displaystyle n' - n \ll_\varepsilon \exp( \log^{2/3 +\varepsilon}(n+n') )$

under the additional assumption

$\displaystyle m', m \geq \exp( \log^{2/3 +\varepsilon}(n+n') ).$

Comparing these bounds with Proposition 2 and using some basic estimates about the function ${f_t}$, we can conclude Theorem 1.

A modification of the arguments also gives similar results for the equation

$\displaystyle (n)_m = t \ \ \ \ \ (7)$

where ${(n)_m := n (n-1) \dots (n-m+1)}$ is the falling factorial:

Theorem 3 If ${0 < \varepsilon < 1}$ and ${t}$ is sufficiently large depending on ${\varepsilon}$, there are at most two solutions to (7) in the region

$\displaystyle \exp( \log^{2/3+\varepsilon} n ) \leq m < n. \ \ \ \ \ (8)$

Again the upper bound of two is best possible, thanks to identities such as

$\displaystyle (a^2-a)_{a^2-2a} = (a^2-a-1)_{a^2-2a+1}.$

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the local Fourier uniformity conjecture for bounded multiplicative functions such as the Liouville function ${\lambda}$. One form of this conjecture is the assertion that

$\displaystyle \int_0^X \| \lambda \|_{U^k([x,x+H])}\ dx = o(X) \ \ \ \ \ (1)$

as ${X \rightarrow \infty}$ for any fixed ${k \geq 0}$ and any ${H = H(X) \leq X}$ that goes to infinity as ${X \rightarrow \infty}$, where ${U^k([x,x+H])}$ is the (normalized) Gowers uniformity norm. Among other things this conjecture implies (logarithmically averaged version of) the Chowla and Sarnak conjectures for the Liouville function (or the Möbius function), see this previous blog post.

The conjecture gets more difficult as ${k}$ increases, and also becomes more difficult the more slowly ${H}$ grows with ${X}$. The ${k=0}$ conjecture is equivalent to the assertion

$\displaystyle \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n)| \ dx = o(HX)$

which was proven (for arbitrarily slowly growing ${H}$) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.

For ${k=1}$, the conjecture is equivalent to the assertion

$\displaystyle \int_0^X \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)| \ dx = o(HX). \ \ \ \ \ (2)$

This remains open for sufficiently slowly growing ${H}$ (and it would be a major breakthrough in particular if one could obtain this bound for ${H}$ as small as ${\log^\varepsilon X}$ for any fixed ${\varepsilon>0}$, particularly if applicable to more general bounded multiplicative functions than ${\lambda}$, as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and myself were able to establish this conjecture in the range ${H \geq X^\varepsilon}$ (in fact we have since worked out in the current paper that we can get ${H}$ as small as ${\exp(\log^{5/8+\varepsilon} X)}$). In our current paper we establish Fourier uniformity conjecture for higher ${k}$ for the same range of ${H}$. This in particular implies local orthogonality to polynomial phases,

$\displaystyle \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) e(-P(n))| \ dx = o(HX) \ \ \ \ \ (3)$

where ${\mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})}$ denotes the polynomials of degree at most ${k-1}$, but the full conjecture is a bit stronger than this, establishing the more general statement

$\displaystyle \int_0^X \sup_{g \in \mathrm{Poly}({\bf R} \rightarrow G)} |\sum_{x \leq n \leq x+H} \lambda(n) \overline{F}(g(n) \Gamma)| \ dx = o(HX) \ \ \ \ \ (4)$

for any degree ${k}$ filtered nilmanifold ${G/\Gamma}$ and Lipschitz function ${F: G/\Gamma \rightarrow {\bf C}}$, where ${g}$ now ranges over polynomial maps from ${{\bf R}}$ to ${G}$. The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.) We quickly sketch first the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate

$\displaystyle |\sum_{x \leq n \leq x+H} \lambda(n) e(-P_x(n))| \gg H$

for many ${x \sim X}$ and some polynomial ${P_x}$ depending on ${x}$. The difficulty here is to understand how ${P_x}$ can depend on ${x}$. We write the above correlation estimate more suggestively as

$\displaystyle \lambda(n) \sim_{[x,x+H]} e(P_x(n)).$

Because of the multiplicativity ${\lambda(np) = -\lambda(p)}$ at small primes ${p}$, one expects to have a relation of the form

$\displaystyle e(P_{x'}(p'n)) \sim_{[x/p,x/p+H/p]} e(P_x(pn)) \ \ \ \ \ (5)$

for many ${x,x'}$ for which ${x/p \approx x'/p'}$ for some small primes ${p,p'}$. (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between ${P_x}$ and ${P_{x'}}$ for “edges” ${x,x'}$ in a rather sparse “graph” connecting the elements of say ${[X/2,X]}$. Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction to a certain technical but important “Chinese remainder theorem” step to modify the ${P_x}$ to eliminate a rather serious “aliasing” issue that was already discussed in this previous post) to obtain functional equations of the form

$\displaystyle P_x(a_x \cdot) \approx P_x(b_x \cdot)$

for some large and close (but not identical) integers ${a_x,b_x}$, where ${\approx}$ should be viewed as a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity) as “differing by a slowly varying polynomial” and the polynomials ${P_x}$ should now be viewed as taking values on the reals rather than the integers. This functional equation can be solved to obtain a relation of the form

$\displaystyle P_x(t) \approx T_x \log t$

for some real number ${T_x}$ of polynomial size, and with further analysis of the relation (5) one can make ${T_x}$ basically independent of ${x}$. This simplifies (3) to something like

$\displaystyle \int_0^X \sup_{P \in \mathrm{Poly}_{\leq k-1}({\bf R} \rightarrow {\bf R})} |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}| \ dx = o(HX)$

and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because ${n \mapsto \lambda(n) n^{-iT}}$ is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian ${G}$ the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence ${g_x \in \mathrm{Poly}({\bf R} \rightarrow G)}$ attached to many ${x \sim X}$, and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

$\displaystyle g_x(a_x \cdot) \Gamma \approx g_x(b_x \cdot) \Gamma \ \ \ \ \ (6)$

where the relation ${\approx}$ is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as

$\displaystyle g_x(t) = \gamma^{\frac{\log(a_x t)}{\log(a_x/b_x)}}$

for some ${\gamma \in \Gamma}$, which do not necessarily lead to multiplicative characters like ${n^{-iT}}$ as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of ${e(\lfloor \alpha \log n \rfloor \beta \log n)}$ as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6)for each ${x}$, but many, leading to dilation invariances

$\displaystyle g_x((1+\theta) t) \Gamma \approx g_x(t) \Gamma$

for a “dense” set of ${\theta}$. From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that ${G}$ is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) ${g_x(t)}$ must behave like ${\gamma_x^{\log t}}$ for some central element ${\gamma_x}$ of ${G}$. This eventually brings one back to the multiplicative characters ${n^{-iT}}$ that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

$\displaystyle s(k) := |\{ (\lambda(n+1),\dots,\lambda(n+k)): n \in {\bf N} \}|$

of length ${k}$ sign patterns in the Liouville function. The Chowla conjecture implies that ${s(k) = 2^k}$, but even the weaker conjecture of Sarnak that ${s(k) \gg (1+\varepsilon)^k}$ for some ${\varepsilon>0}$ remains open. Until recently, the best asymptotic lower bound on ${s(k)}$ was ${s(k) \gg k^2}$, due to McNamara; with our result, we can now show ${s(k) \gg_A k^A}$ for any ${A}$ (in fact we can get ${s(k) \gg_\varepsilon \exp(\log^{8/5-\varepsilon} k)}$ for any ${\varepsilon>0}$). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor ${p 1_{p|n}-1}$). However the observation is that if there are extremely few sign patterns of length ${k}$, then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length ${k}$ sign patterns. If there are not extremely few sign patterns of length ${k}$ then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.

The second application is to obtain cancellation for various polynomial averages involving the Liouville function ${\lambda}$ or von Mangoldt function ${\Lambda}$, such as

$\displaystyle {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \lambda(n+P_2(m)) \dots \lambda(n+P_k(m))$

or

$\displaystyle {\bf E}_{n \leq X} {\bf E}_{m \leq X^{1/d}} \lambda(n+P_1(m)) \Lambda(n+P_2(m)) \dots \Lambda(n+P_k(m))$

where ${P_1,\dots,P_k}$ are polynomials of degree at most ${d}$, no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials ${P}$ had distinct degrees, in which one could use the ${k=0}$ theory of Matomäki and Radziwill; now that higher ${k}$ is available at the scale ${H=X^{1/d}}$ we can now remove this restriction.

Kaisa Matomäki, Maksym Radziwill, and I just uploaded to the arXiv our paper “Fourier uniformity of bounded multiplicative functions in short intervals on average“. This paper is the outcome of our attempts during the MSRI program in analytic number theory last year to attack the local Fourier uniformity conjecture for the Liouville function ${\lambda}$. This conjecture generalises a landmark result of Matomäki and Radziwill, who show (among other things) that one has the asymptotic

$\displaystyle \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o(HX) \ \ \ \ \ (1)$

whenever ${X \rightarrow \infty}$ and ${H = H(X)}$ goes to infinity as ${X \rightarrow \infty}$. Informally, this says that the Liouville function has small mean for almost all short intervals ${[x,x+H]}$. The remarkable thing about this theorem is that there is no lower bound on how ${H}$ goes to infinity with ${X}$; one can take for instance ${H = \log\log\log X}$. This lack of lower bound was crucial when I applied this result (or more precisely, a generalisation of this result to arbitrary non-pretentious bounded multiplicative functions) a few years ago to solve the Erdös discrepancy problem, as well as a logarithmically averaged two-point Chowla conjecture, for instance it implies that

$\displaystyle \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log X).$

The local Fourier uniformity conjecture asserts the stronger asymptotic

$\displaystyle \int_X^{2X} \sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (2)$

under the same hypotheses on ${H}$ and ${X}$. As I worked out in a previous paper, this conjecture would imply a logarithmically averaged three-point Chowla conjecture, implying for instance that

$\displaystyle \sum_{n \leq X} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log X).$

This particular bound also follows from some slightly different arguments of Joni Teräväinen and myself, but the implication would also work for other non-pretentious bounded multiplicative functions, whereas the arguments of Joni and myself rely more heavily on the specific properties of the Liouville function (in particular that ${\lambda(p)=-1}$ for all primes ${p}$).

There is also a higher order version of the local Fourier uniformity conjecture in which the linear phase ${{}e(-\alpha n)}$ is replaced with a polynomial phase such as ${e(-\alpha_d n^d - \dots - \alpha_1 n - \alpha_0)}$, or more generally a nilsequence ${\overline{F(g(n) \Gamma)}}$; as shown in my previous paper, this conjecture implies (and is in fact equivalent to, after logarithmic averaging) a logarithmically averaged version of the full Chowla conjecture (not just the two-point or three-point versions), as well as a logarithmically averaged version of the Sarnak conjecture.

The main result of the current paper is to obtain some cases of the local Fourier uniformity conjecture:

Theorem 1 The asymptotic (2) is true when ${H = X^\theta}$ for a fixed ${\theta > 0}$.

Previously this was known for ${\theta > 5/8}$ by the work of Zhan (who in fact proved the stronger pointwise assertion ${\sup_{\alpha \in {\bf R}} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|= o(H)}$ for ${X \leq x \leq 2X}$ in this case). In a previous paper with Kaisa and Maksym, we also proved a weak version

$\displaystyle \sup_{\alpha \in {\bf R}} \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)$

of (2) for any ${H}$ growing arbitrarily slowly with ${X}$; this is stronger than (1) (and is in fact proven by a variant of the method) but significantly weaker than (2), because in the latter the worst-case ${\alpha}$ is permitted to depend on the ${x}$ parameter, whereas in (3) ${\alpha}$ must remain independent of ${x}$.

Unfortunately, the restriction ${H = X^\theta}$ is not strong enough to give applications to Chowla-type conjectures (one would need something more like ${H = \log^\theta X}$ for this). However, it can still be used to control some sums that had not previously been manageable. For instance, a quick application of the circle method lets one use the above theorem to derive the asymptotic

$\displaystyle \sum_{h \leq H} \sum_{n \leq X} \lambda(n) \Lambda(n+h) \Lambda(n+2h) = o( H X )$

whenever ${H = X^\theta}$ for a fixed ${\theta > 0}$, where ${\Lambda}$ is the von Mangoldt function. Amusingly, the seemingly simpler question of establishing the expected asymptotic for

$\displaystyle \sum_{h \leq H} \sum_{n \leq X} \Lambda(n+h) \Lambda(n+2h)$

is only known in the range ${\theta \geq 1/6}$ (from the work of Zaccagnini). Thus we have a rare example of a number theory sum that becomes easier to control when one inserts a Liouville function!

We now give an informal description of the strategy of proof of the theorem (though for numerous technical reasons, the actual proof deviates in some respects from the description given here). If (2) failed, then for many values of ${x \in [X,2X]}$ we would have the lower bound

$\displaystyle |\sum_{x \leq n \leq x+H} \lambda(n) e(-\alpha_x n)| \gg 1$

for some frequency ${\alpha_x \in{\bf R}}$. We informally describe this correlation between ${\lambda(n)}$ and ${e(\alpha_x n)}$ by writing

$\displaystyle \lambda(n) \approx e(\alpha_x n) \ \ \ \ \ (4)$

for ${n \in [x,x+H]}$ (informally, one should view this as asserting that ${\lambda(n)}$ “behaves like” a constant multiple of ${e(\alpha_x n)}$). For sake of discussion, suppose we have this relationship for all ${x \in [X,2X]}$, not just many.

As mentioned before, the main difficulty here is to understand how ${\alpha_x}$ varies with ${x}$. As it turns out, the multiplicativity properties of the Liouville function place a significant constraint on this dependence. Indeed, if we let ${p}$ be a fairly small prime (e.g. of size ${H^\varepsilon}$ for some ${\varepsilon>0}$), and use the identity ${\lambda(np) = \lambda(n) \lambda(p) = - \lambda(n)}$ for the Liouville function to conclude (at least heuristically) from (4) that

$\displaystyle \lambda(n) \approx e(\alpha_x n p)$

for ${n \in [x/p, x/p + H/p]}$. (In practice, we will have this sort of claim for many primes ${p}$ rather than all primes ${p}$, after using tools such as the Turán-Kubilius inequality, but we ignore this distinction for this informal argument.)

Now let ${x, y \in [X,2X]}$ and ${p,q \sim P}$ be primes comparable to some fixed range ${P = H^\varepsilon}$ such that

$\displaystyle x/p = y/q + O( H/P). \ \ \ \ \ (5)$

Then we have both

$\displaystyle \lambda(n) \approx e(\alpha_x n p)$

and

$\displaystyle \lambda(n) \approx e(\alpha_y n q)$

on essentially the same range of ${n}$ (two nearby intervals of length ${\sim H/P}$). This suggests that the frequencies ${p \alpha_x}$ and ${q \alpha_y}$ should be close to each other modulo ${1}$, in particular one should expect the relationship

$\displaystyle p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } 1. \ \ \ \ \ (6)$

Comparing this with (5) one is led to the expectation that ${\alpha_x}$ should depend inversely on ${x}$ in some sense (for instance one can check that

$\displaystyle \alpha_x = T/x \ \ \ \ \ (7)$

would solve (6) if ${T = O( X / H^2 )}$; by Taylor expansion, this would correspond to a global approximation of the form ${\lambda(n) \approx n^{iT}}$). One now has a problem of an additive combinatorial flavour (or of a “local to global” flavour), namely to leverage the relation (6) to obtain global control on ${\alpha_x}$ that resembles (7).

A key obstacle in solving (6) efficiently is the fact that one only knows that ${p \alpha_x}$ and ${q \alpha_y}$ are close modulo ${1}$, rather than close on the real line. One can start resolving this problem by the Chinese remainder theorem, using the fact that we have the freedom to shift (say) ${\alpha_y}$ by an arbitrary integer. After doing so, one can arrange matters so that one in fact has the relationship

$\displaystyle p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p \ \ \ \ \ (8)$

whenever ${x,y \in [X,2X]}$ and ${p,q \sim P}$ obey (5). (This may force ${\alpha_q}$ to become extremely large, on the order of ${\prod_{p \sim P} p}$, but this will not concern us.)

Now suppose that we have ${y,y' \in [X,2X]}$ and primes ${q,q' \sim P}$ such that

$\displaystyle y/q = y'/q' + O(H/P). \ \ \ \ \ (9)$

For every prime ${p \sim P}$, we can find an ${x}$ such that ${x/p}$ is within ${O(H/P)}$ of both ${y/q}$ and ${y'/q'}$. Applying (8) twice we obtain

$\displaystyle p \alpha_x = q \alpha_y + O( \frac{P}{H} ) \hbox{ mod } p$

and

$\displaystyle p \alpha_x = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p$

and thus by the triangle inequality we have

$\displaystyle q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } p$

for all ${p \sim P}$; hence by the Chinese remainder theorem

$\displaystyle q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \hbox{ mod } \prod_{p \sim P} p.$

In practice, in the regime ${H = X^\theta}$ that we are considering, the modulus ${\prod_{p \sim P} p}$ is so huge we can effectively ignore it (in the spirit of the Lefschetz principle); so let us pretend that we in fact have

$\displaystyle q \alpha_y = q' \alpha_{y'} + O( \frac{P}{H} ) \ \ \ \ \ (10)$

whenever ${y,y' \in [X,2X]}$ and ${q,q' \sim P}$ obey (9).

Now let ${k}$ be an integer to be chosen later, and suppose we have primes ${p_1,\dots,p_k,q_1,\dots,q_k \sim P}$ such that the difference

$\displaystyle q = |p_1 \dots p_k - q_1 \dots q_k|$

is small but non-zero. If ${k}$ is chosen so that

$\displaystyle P^k \approx \frac{X}{H}$

(where one is somewhat loose about what ${\approx}$ means) then one can then find real numbers ${x_1,\dots,x_k \sim X}$ such that

$\displaystyle \frac{x_j}{p_j} = \frac{x_{j+1}}{q_j} + O( \frac{H}{P} )$

for ${j=1,\dots,k}$, with the convention that ${x_{k+1} = x_1}$. We then have

$\displaystyle p_j \alpha_{x_j} = q_j \alpha_{x_{j+1}} + O( \frac{P}{H} )$

which telescopes to

$\displaystyle p_1 \dots p_k \alpha_{x_1} = q_1 \dots q_k \alpha_{x_1} + O( \frac{P^k}{H} )$

and thus

$\displaystyle q \alpha_{x_1} = O( \frac{P^k}{H} )$

and hence

$\displaystyle \alpha_{x_1} = O( \frac{P^k}{H} ) \approx O( \frac{X}{H^2} ).$

In particular, for each ${x \sim X}$, we expect to be able to write

$\displaystyle \alpha_x = \frac{T_x}{x} + O( \frac{1}{H} )$

for some ${T_x = O( \frac{X^2}{H^2} )}$. This quantity ${T_x}$ can vary with ${x}$; but from (10) and a short calculation we see that

$\displaystyle T_y = T_{y'} + O( \frac{X}{H} )$

whenever ${y, y' \in [X,2X]}$ obey (9) for some ${q,q' \sim P}$.

Now imagine a “graph” in which the vertices are elements ${y}$ of ${[X,2X]}$, and two elements ${y,y'}$ are joined by an edge if (9) holds for some ${q,q' \sim P}$. Because of exponential sum estimates on ${\sum_{q \sim P} q^{it}}$, this graph turns out to essentially be an “expander” in the sense that any two vertices ${y,y' \in [X,2X]}$ can be connected (in multiple ways) by fairly short paths in this graph (if one allows one to modify one of ${y}$ or ${y'}$ by ${O(H)}$). As a consequence, we can assume that this quantity ${T_y}$ is essentially constant in ${y}$ (cf. the application of the ergodic theorem in this previous blog post), thus we now have

$\displaystyle \alpha_x = \frac{T}{x} + O(\frac{1}{H} )$

for most ${x \in [X,2X]}$ and some ${T = O(X^2/H^2)}$. By Taylor expansion, this implies that

$\displaystyle \lambda(n) \approx n^{iT}$

on ${[x,x+H]}$ for most ${x}$, thus

$\displaystyle \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) n^{-iT}|\ dx \gg HX.$

But this can be shown to contradict the Matomäki-Radziwill theorem (because the multiplicative function ${n \mapsto \lambda(n) n^{-iT}}$ is known to be non-pretentious).

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges“. This is a sequel of sorts to our previous paper on divisor correlations, though the proof techniques in this paper are rather different. As with the previous paper, our interest is in correlations such as

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) \ \ \ \ \ (1)$

for medium-sized ${h}$ and large ${X}$, where ${k \geq l \geq 1}$ are natural numbers and ${d_k(n) = \sum_{n = m_1 \dots m_k} 1}$ is the ${k^{th}}$ divisor function (actually our methods can also treat a generalisation in which ${k}$ is non-integer, but for simplicity let us stick with the integer case for this discussion). Our methods also allow for one of the divisor function factors to be replaced with a von Mangoldt function, but (in contrast to the previous paper) we cannot treat the case when both factors are von Mangoldt.

As discussed in this previous post, one heuristically expects an asymptotic of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O( X^{1/2+\varepsilon})$

for any fixed ${\varepsilon>0}$, where ${P_{k,l,h}}$ is a certain explicit (but rather complicated) polynomial of degree ${k+l-1}$. Such asymptotics are known when ${l \leq 2}$, but remain open for ${k \geq l \geq 3}$. In the previous paper, we were able to obtain a weaker bound of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O_A( X \log^{-A} X)$

for ${1-O_A(\log^{-A} X)}$ of the shifts ${-H \leq h \leq H}$, whenever the shift range ${H}$ lies between ${X^{8/33+\varepsilon}}$ and ${X^{1-\varepsilon}}$. But the methods become increasingly hard to use as ${H}$ gets smaller. In this paper, we use a rather different method to obtain the even weaker bound

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = (1+o(1)) P_{k,l,h}( \log X ) X$

for ${1-o(1)}$ of the shifts ${-H \leq h \leq H}$, where ${H}$ can now be as short as ${H = \log^{10^4 k \log k} X}$. The constant ${10^4}$ can be improved, but there are serious obstacles to using our method to go below ${\log^{k \log k} X}$ (as the exceptionally large values of ${d_k}$ then begin to dominate). This can be viewed as an analogue to our previous paper on correlations of bounded multiplicative functions on average, in which the functions ${d_k,d_l}$ are now unbounded, and indeed our proof strategy is based in large part on that paper (but with many significant new technical complications).

We now discuss some of the ingredients of the proof. Unsurprisingly, the first step is the circle method, expressing (1) in terms of exponential sums such as

$\displaystyle S(\alpha) := \sum_{n \leq X} d_k(n) e(\alpha).$

Actually, it is convenient to first prune ${d_k}$ slightly by zeroing out this function on “atypical” numbers ${n}$ that have an unusually small or large number of factors in a certain sense, but let us ignore this technicality for this discussion. The contribution of ${S(\alpha)}$ for “major arc” ${\alpha}$ can be treated by standard techniques (and is the source of the main term ${P_{k,l,h}(\log X) X}$; the main difficulty comes from treating the contribution of “minor arc” ${\alpha}$.

In our previous paper on bounded multiplicative functions, we used Plancherel’s theorem to estimate the global ${L^2}$ norm ${\int_{{\bf R}/{\bf Z}} |S(\alpha)|^2\ d\alpha}$, and then also used the Katai-Bourgain-Sarnak-Ziegler orthogonality criterion to control local ${L^2}$ norms ${\int_I |S(\alpha)|^2\ d\alpha}$, where ${I}$ was a minor arc interval of length about ${1/H}$, and these two estimates together were sufficient to get a good bound on correlations by an application of Hölder’s inequality. For ${d_k}$, it is more convenient to use Dirichlet series methods (and Ramaré-type factorisations of such Dirichlet series) to control local ${L^2}$ norms on minor arcs, in the spirit of the proof of the Matomaki-Radziwill theorem; a key point is to develop “log-free” mean value theorems for Dirichlet series associated to functions such as ${d_k}$, so as not to wipe out the (rather small) savings one will get over the trivial bound from this method. On the other hand, the global ${L^2}$ bound will definitely be unusable, because the ${\ell^2}$ sum ${\sum_{n \leq X} d_k(n)^2}$ has too many unwanted factors of ${\log X}$. Fortunately, we can substitute this global ${L^2}$ bound with a “large values” bound that controls expressions such as

$\displaystyle \sum_{i=1}^J \int_{I_i} |S(\alpha)|^2\ d\alpha$

for a moderate number of disjoint intervals ${I_1,\dots,I_J}$, with a bound that is slightly better (for ${J}$ a medium-sized power of ${\log X}$) than what one would have obtained by bounding each integral ${\int_{I_i} |S(\alpha)|^2\ d\alpha}$ separately. (One needs to save more than ${J^{1/2}}$ for the argument to work; we end up saving a factor of about ${J^{3/4}}$.) This large values estimate is probably the most novel contribution of the paper. After taking the Fourier transform, matters basically reduce to getting a good estimate for

$\displaystyle \sum_{i=1}^J (\int_X^{2X} |\sum_{x \leq n \leq x+H} d_k(n) e(\alpha_i n)|^2\ dx)^{1/2},$

where ${\alpha_i}$ is the midpoint of ${I_i}$; thus we need some upper bound on the large local Fourier coefficients of ${d_k}$. These coefficients are difficult to calculate directly, but, in the spirit of a paper of Ben Green and myself, we can try to replace ${d_k}$ by a more tractable and “pseudorandom” majorant ${\tilde d_k}$ for which the local Fourier coefficients are computable (on average). After a standard duality argument, one ends up having to control expressions such as

$\displaystyle |\sum_{x \leq n \leq x+H} \tilde d_k(n) e((\alpha_i -\alpha_{i'}) n)|$

after various averaging in the ${x, i,i'}$ parameters. These local Fourier coefficients of ${\tilde d_k}$ turn out to be small on average unless ${\alpha_i -\alpha_{i'}}$ is “major arc”. One then is left with a mostly combinatorial problem of trying to bound how often this major arc scenario occurs. This is very close to a computation in the previously mentioned paper of Ben and myself; there is a technical wrinkle in that the ${\alpha_i}$ are not as well separated as they were in my paper with Ben, but it turns out that one can modify the arguments in that paper to still obtain a satisfactory estimate in this case (after first grouping nearby frequencies ${\alpha_i}$ together, and modifying the duality argument accordingly).

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges“, submitted to Proceedings of the London Mathematical Society. This paper is concerned with the estimation of correlations such as

$\displaystyle \sum_{n \leq X} \Lambda(n) \Lambda(n+h) \ \ \ \ \ (1)$

for medium-sized ${h}$ and large ${X}$, where ${\Lambda}$ is the von Mangoldt function; we also consider variants of this sum in which one of the von Mangoldt functions is replaced with a (higher order) divisor function, but for sake of discussion let us focus just on the sum (1). Understanding this sum is very closely related to the problem of finding pairs of primes that differ by ${h}$; for instance, if one could establish a lower bound

$\displaystyle \sum_{n \leq X} \Lambda(n) \Lambda(n+2) \gg X$

then this would easily imply the twin prime conjecture.

The (first) Hardy-Littlewood conjecture asserts an asymptotic

$\displaystyle \sum_{n \leq X} \Lambda(n) \Lambda(n+h) = {\mathfrak S}(h) X + o(X) \ \ \ \ \ (2)$

as ${X \rightarrow \infty}$ for any fixed positive ${h}$, where the singular series ${{\mathfrak S}(h)}$ is an arithmetic factor arising from the irregularity of distribution of ${\Lambda}$ at small moduli, defined explicitly by

$\displaystyle {\mathfrak S}(h) := 2 \Pi_2 \prod_{p|h; p>2} \frac{p-2}{p-1}$

when ${h}$ is even, and ${{\mathfrak S}(h)=0}$ when ${h}$ is odd, where

$\displaystyle \Pi_2 := \prod_{p>2} (1-\frac{1}{(p-1)^2}) = 0.66016\dots$

is (half of) the twin prime constant. See for instance this previous blog post for a a heuristic explanation of this conjecture. From the previous discussion we see that (2) for ${h=2}$ would imply the twin prime conjecture. Sieve theoretic methods are only able to provide an upper bound of the form ${ \sum_{n \leq X} \Lambda(n) \Lambda(n+h) \ll {\mathfrak S}(h) X}$.

Needless to say, apart from the trivial case of odd ${h}$, there are no values of ${h}$ for which the Hardy-Littlewood conjecture is known. However there are some results that say that this conjecture holds “on the average”: in particular, if ${H}$ is a quantity depending on ${X}$ that is somewhat large, there are results that show that (2) holds for most (i.e. for ${1-o(1)}$) of the ${h}$ betwen ${0}$ and ${H}$. Ideally one would like to get ${H}$ as small as possible, in particular one can view the full Hardy-Littlewood conjecture as the endpoint case when ${H}$ is bounded.

The first results in this direction were by van der Corput and by Lavrik, who established such a result with ${H = X}$ (with a subsequent refinement by Balog); Wolke lowered ${H}$ to ${X^{5/8+\varepsilon}}$, and Mikawa lowered ${H}$ further to ${X^{1/3+\varepsilon}}$. The main result of this paper is a further lowering of ${H}$ to ${X^{8/33+\varepsilon}}$. In fact (as in the preceding works) we get a better error term than ${o(X)}$, namely an error of the shape ${O_A( X \log^{-A} X)}$ for any ${A}$.

Our arguments initially proceed along standard lines. One can use the Hardy-Littlewood circle method to express the correlation in (2) as an integral involving exponential sums ${S(\alpha) := \sum_{n \leq X} \Lambda(n) e(\alpha n)}$. The contribution of “major arc” ${\alpha}$ is known by a standard computation to recover the main term ${{\mathfrak S}(h) X}$ plus acceptable errors, so it is a matter of controlling the “minor arcs”. After averaging in ${h}$ and using the Plancherel identity, one is basically faced with establishing a bound of the form

$\displaystyle \int_{\beta-1/H}^{\beta+1/H} |S(\alpha)|^2\ d\alpha \ll_A X \log^{-A} X$

for any “minor arc” ${\beta}$. If ${\beta}$ is somewhat close to a low height rational ${a/q}$ (specifically, if it is within ${X^{-1/6-\varepsilon}}$ of such a rational with ${q = O(\log^{O(1)} X)}$), then this type of estimate is roughly of comparable strength (by another application of Plancherel) to the best available prime number theorem in short intervals on the average, namely that the prime number theorem holds for most intervals of the form ${[x, x + x^{1/6+\varepsilon}]}$, and we can handle this case using standard mean value theorems for Dirichlet series. So we can restrict attention to the “strongly minor arc” case where ${\beta}$ is far from such rationals.

The next step (following some ideas we found in a paper of Zhan) is to rewrite this estimate not in terms of the exponential sums ${S(\alpha) := \sum_{n \leq X} \Lambda(n) e(\alpha n)}$, but rather in terms of the Dirichlet polynomial ${F(s) := \sum_{n \sim X} \frac{\Lambda(n)}{n^s}}$. After a certain amount of computation (including some oscillatory integral estimates arising from stationary phase), one is eventually reduced to the task of establishing an estimate of the form

$\displaystyle \int_{t \sim \lambda X} (\sum_{t-\lambda H}^{t+\lambda H} |F(\frac{1}{2}+it')|\ dt')^2\ dt \ll_A \lambda^2 H^2 X \log^{-A} X$

for any ${X^{-1/6-\varepsilon} \ll \lambda \ll \log^{-B} X}$ (with ${B}$ sufficiently large depending on ${A}$).

The next step, which is again standard, is the use of the Heath-Brown identity (as discussed for instance in this previous blog post) to split up ${\Lambda}$ into a number of components that have a Dirichlet convolution structure. Because the exponent ${8/33}$ we are shooting for is less than ${1/4}$, we end up with five types of components that arise, which we call “Type ${d_1}$“, “Type ${d_2}$“, “Type ${d_3}$“, “Type ${d_4}$“, and “Type II”. The “Type II” sums are Dirichlet convolutions involving a factor supported on a range ${[X^\varepsilon, X^{-\varepsilon} H]}$ and is quite easy to deal with; the “Type ${d_j}$” terms are Dirichlet convolutions that resemble (non-degenerate portions of) the ${j^{th}}$ divisor function, formed from convolving together ${j}$ portions of ${1}$. The “Type ${d_1}$” and “Type ${d_2}$” terms can be estimated satisfactorily by standard moment estimates for Dirichlet polynomials; this already recovers the result of Mikawa (and our argument is in fact slightly more elementary in that no Kloosterman sum estimates are required). It is the treatment of the “Type ${d_3}$” and “Type ${d_4}$” sums that require some new analysis, with the Type ${d_3}$ terms turning to be the most delicate. After using an existing moment estimate of Jutila for Dirichlet L-functions, matters reduce to obtaining a family of estimates, a typical one of which (relating to the more difficult Type ${d_3}$ sums) is of the form

$\displaystyle \int_{t - H}^{t+H} |M( \frac{1}{2} + it')|^2\ dt' \ll X^{\varepsilon^2} H \ \ \ \ \ (3)$

for “typical” ordinates ${t}$ of size ${X}$, where ${M}$ is the Dirichlet polynomial ${M(s) := \sum_{n \sim X^{1/3}} \frac{1}{n^s}}$ (a fragment of the Riemann zeta function). The precise definition of “typical” is a little technical (because of the complicated nature of Jutila’s estimate) and will not be detailed here. Such a claim would follow easily from the Lindelof hypothesis (which would imply that ${M(1/2 + it) \ll X^{o(1)}}$) but of course we would like to have an unconditional result.

At this point, having exhausted all the Dirichlet polynomial estimates that are usefully available, we return to “physical space”. Using some further Fourier-analytic and oscillatory integral computations, we can estimate the left-hand side of (3) by an expression that is roughly of the shape

$\displaystyle \frac{H}{X^{1/3}} \sum_{\ell \sim X^{1/3}/H} |\sum_{m \sim X^{1/3}} e( \frac{t}{2\pi} \log \frac{m+\ell}{m-\ell} )|.$

The phase ${\frac{t}{2\pi} \log \frac{m+\ell}{m-\ell}}$ can be Taylor expanded as the sum of ${\frac{t_j \ell}{\pi m}}$ and a lower order term ${\frac{t_j \ell^3}{3\pi m^3}}$, plus negligible errors. If we could discard the lower order term then we would get quite a good bound using the exponential sum estimates of Robert and Sargos, which control averages of exponential sums with purely monomial phases, with the averaging allowing us to exploit the hypothesis that ${t}$ is “typical”. Figuring out how to get rid of this lower order term caused some inefficiency in our arguments; the best we could do (after much experimentation) was to use Fourier analysis to shorten the sums, estimate a one-parameter average exponential sum with a binomial phase by a two-parameter average with a monomial phase, and then use the van der Corput ${B}$ process followed by the estimates of Robert and Sargos. This rather complicated procedure works up to ${H = X^{8/33+\varepsilon}}$ it may be possible that some alternate way to proceed here could improve the exponent somewhat.

In a sequel to this paper, we will use a somewhat different method to reduce ${H}$ to a much smaller value of ${\log^{O(1)} X}$, but only if we replace the correlations ${\sum_{n \leq X} \Lambda(n) \Lambda(n+h)}$ by either ${\sum_{n \leq X} \Lambda(n) d_k(n+h)}$ or ${\sum_{n \leq X} d_k(n) d_l(n+h)}$, and also we now only save a ${o(1)}$ in the error term rather than ${O_A(\log^{-A} X)}$.

Kaisa Matomäki, Maksym Radziwiłł, and I have just uploaded to the arXiv our paper “Sign patterns of the Liouville and Möbius functions“. This paper is somewhat similar to our previous paper in that it is using the recent breakthrough of Matomäki and Radziwiłł on mean values of multiplicative functions to obtain partial results towards the Chowla conjecture. This conjecture can be phrased, roughly speaking, as follows: if ${k}$ is a fixed natural number and ${n}$ is selected at random from a large interval ${[1,x]}$, then the sign pattern ${(\lambda(n), \lambda(n+1),\dots,\lambda(n+k-1)) \in \{-1,+1\}^k}$ becomes asymptotically equidistributed in ${\{-1,+1\}^k}$ in the limit ${x \rightarrow \infty}$. This remains open for ${k \geq 2}$. In fact even the significantly weaker statement that each of the sign patterns in ${\{-1,+1\}^k}$ is attained infinitely often is open for ${k \geq 4}$. However, in 1986, Hildebrand showed that for ${k \leq 3}$ all sign patterns are indeed attained infinitely often. Our first result is a strengthening of Hildebrand’s, moving a little bit closer to Chowla’s conjecture:

Theorem 1 Let ${k \leq 3}$. Then each of the sign patterns in ${\{-1,+1\}^k}$ is attained by the Liouville function for a set of natural numbers ${n}$ of positive lower density.

Thus for instance one has ${\lambda(n)=\lambda(n+1)=\lambda(n+2)}$ for a set of ${n}$ of positive lower density. The ${k \leq 2}$ case of this theorem already appears in the original paper of Matomäki and Radziwiłł (and the significantly simpler case of the sign patterns ${++}$ and ${--}$ was treated previously by Harman, Pintz, and Wolke).

The basic strategy in all of these arguments is to assume for sake of contradiction that a certain sign pattern occurs extremely rarely, and then exploit the complete multiplicativity of ${\lambda}$ (which implies in particular that ${\lambda(2n) = -\lambda(n)}$, ${\lambda(3n) = -\lambda(n)}$, and ${\lambda(5n) = -\lambda(n)}$ for all ${n}$) together with some combinatorial arguments (vaguely analogous to solving a Sudoku puzzle!) to establish more complex sign patterns for the Liouville function, that are either inconsistent with each other, or with results such as the Matomäki-Radziwiłł result. To illustrate this, let us give some ${k=2}$ examples, arguing a little informally to emphasise the combinatorial aspects of the argument. First suppose that the sign pattern ${(\lambda(n),\lambda(n+1)) = (+1,+1)}$ almost never occurs. The prime number theorem tells us that ${\lambda(n)}$ and ${\lambda(n+1)}$ are each equal to ${+1}$ about half of the time, which by inclusion-exclusion implies that the sign pattern ${(\lambda(n),\lambda(n+1))=(-1,-1)}$ almost never occurs. In other words, we have ${\lambda(n+1) = -\lambda(n)}$ for almost all ${n}$. But from the multiplicativity property ${\lambda(2n)=-\lambda(n)}$ this implies that one should have

$\displaystyle \lambda(2n+2) = -\lambda(2n)$

$\displaystyle \lambda(2n+1) = -\lambda(2n)$

and

$\displaystyle \lambda(2n+2) = -\lambda(2n+1)$

for almost all ${n}$. But the above three statements are contradictory, and the claim follows.

Similarly, if we assume that the sign pattern ${(\lambda(n),\lambda(n+1)) = (+1,-1)}$ almost never occurs, then a similar argument to the above shows that for any fixed ${h}$, one has ${\lambda(n)=\lambda(n+1)=\dots=\lambda(n+h)}$ for almost all ${n}$. But this means that the mean ${\frac{1}{h} \sum_{j=1}^h \lambda(n+j)}$ is abnormally large for most ${n}$, which (for ${h}$ large enough) contradicts the results of Matomäki and Radziwiłł. Here we see that the “enemy” to defeat is the scenario in which ${\lambda}$ only changes sign very rarely, in which case one rarely sees the pattern ${(+1,-1)}$.

It turns out that similar (but more combinatorially intricate) arguments work for sign patterns of length three (but are unlikely to work for most sign patterns of length four or greater). We give here one fragment of such an argument (due to Hildebrand) which hopefully conveys the Sudoku-type flavour of the combinatorics. Suppose for instance that the sign pattern ${(\lambda(n),\lambda(n+1),\lambda(n+2)) = (+1,+1,+1)}$ almost never occurs. Now suppose ${n}$ is a typical number with ${\lambda(15n-1)=\lambda(15n+1)=+1}$. Since we almost never have the sign pattern ${(+1,+1,+1)}$, we must (almost always) then have ${\lambda(15n) = -1}$. By multiplicativity this implies that

$\displaystyle (\lambda(60n-4), \lambda(60n), \lambda(60n+4)) = (+1,-1,+1).$

We claim that this (almost always) forces ${\lambda(60n+5)=-1}$. For if ${\lambda(60n+5)=+1}$, then by the lack of the sign pattern ${(+1,+1,+1)}$, this (almost always) forces ${\lambda(60n+3)=\lambda(60n+6)=-1}$, which by multiplicativity forces ${\lambda(20n+1)=\lambda(20n+2)=+1}$, which by lack of ${(+1,+1,+1)}$ (almost always) forces ${\lambda(20n)=-1}$, which by multiplicativity contradicts ${\lambda(60n)=-1}$. Thus we have ${\lambda(60n+5)=-1}$; a similar argument gives ${\lambda(60n-5)=-1}$ almost always, which by multiplicativity gives ${\lambda(12n-1)=\lambda(12n)=\lambda(12n+1)=+1}$, a contradiction. Thus we almost never have ${\lambda(15n-1)=\lambda(15n+1)=+1}$, which by the inclusion-exclusion argument mentioned previously shows that ${\lambda(15n+1) = - \lambda(15n-1)}$ for almost all ${n}$.

One can continue these Sudoku-type arguments and conclude eventually that ${\lambda(3n-1)=-\lambda(3n+1)=\lambda(3n+2)}$ for almost all ${n}$. To put it another way, if ${\chi_3}$ denotes the non-principal Dirichlet character of modulus ${3}$, then ${\lambda \chi_3}$ is almost always constant away from the multiples of ${3}$. (Conversely, if ${\lambda \chi_3}$ changed sign very rarely outside of the multiples of three, then the sign pattern ${(+1,+1,+1)}$ would never occur.) Fortunately, the main result of Matomäki and Radziwiłł shows that this scenario cannot occur, which establishes that the sign pattern ${(+1,+1,+1)}$ must occur rather frequently. The other sign patterns are handled by variants of these arguments.

Excluding a sign pattern of length three leads to useful implications like “if ${\lambda(n-1)=\lambda(n)=+1}$, then ${\lambda(n+1)=-1}$” which turn out are just barely strong enough to quite rigidly constrain the Liouville function using Sudoku-like arguments. In contrast, excluding a sign pattern of length four only gives rise to implications like “`if ${\lambda(n-2)=\lambda(n-1)=\lambda(n)=+1}$, then ${\lambda(n+1)=-1}$“, and these seem to be much weaker for this purpose (the hypothesis in these implications just isn’t satisfied nearly often enough). So a different idea seems to be needed if one wishes to extend the above theorem to larger values of ${k}$.

Our second theorem gives an analogous result for the Möbius function ${\mu}$ (which takes values in ${\{-1,0,+1\}}$ rather than ${\{-1,1\}}$), but the analysis turns out to be remarkably difficult and we are only able to get up to ${k=2}$:

Theorem 2 Let ${k \leq 2}$. Then each of the sign patterns in ${\{-1,0,+1\}^k}$ is attained by the Möbius function for a set ${n}$ of positive lower density.

It turns out that the prime number theorem and elementary sieve theory can be used to handle the ${k=1}$ case and all the ${k=2}$ cases that involve at least one ${0}$, leaving only the four sign patterns ${(\pm 1, \pm 1)}$ to handle. It is here that the zeroes of the Möbius function cause a significant new obstacle. Suppose for instance that the sign pattern ${(+1, -1)}$ almost never occurs for the Möbius function. The same arguments that were used in the Liouville case then show that ${\mu(n)}$ will be almost always equal to ${\mu(n+1)}$, provided that ${n,n+1}$ are both square-free. One can try to chain this together as before to create a long string ${\mu(n)=\dots=\mu(n+h) \in \{-1,+1\}}$ where the Möbius function is constant, but this cannot work for any ${h}$ larger than three, because the Möbius function vanishes at every multiple of four.

The constraints we assume on the Möbius function can be depicted using a graph on the squarefree natural numbers, in which any two adjacent squarefree natural numbers are connected by an edge. The main difficulty is then that this graph is highly disconnected due to the multiples of four not being squarefree.

To get around this, we need to enlarge the graph. Note from multiplicativity that if ${\mu(n)}$ is almost always equal to ${\mu(n+1)}$ when ${n,n+1}$ are squarefree, then ${\mu(n)}$ is almost always equal to ${\mu(n+p)}$ when ${n,n+p}$ are squarefree and ${n}$ is divisible by ${p}$. We can then form a graph on the squarefree natural numbers by connecting ${n}$ to ${n+p}$ whenever ${n,n+p}$ are squarefree and ${n}$ is divisible by ${p}$. If this graph is “locally connected” in some sense, then ${\mu}$ will be constant on almost all of the squarefree numbers in a large interval, which turns out to be incompatible with the results of Matomäki and Radziwiłł. Because of this, matters are reduced to establishing the connectedness of a certain graph. More precisely, it turns out to be sufficient to establish the following claim:

Theorem 3 For each prime ${p}$, let ${a_p \hbox{ mod } p^2}$ be a residue class chosen uniformly at random. Let ${G}$ be the random graph whose vertices ${V}$ consist of those integers ${n}$ not equal to ${a_p \hbox{ mod } p^2}$ for any ${p}$, and whose edges consist of pairs ${n,n+p}$ in ${V}$ with ${n = a_p \hbox{ mod } p}$. Then with probability ${1}$, the graph ${G}$ is connected.

We were able to show the connectedness of this graph, though it turned out to be remarkably tricky to do so. Roughly speaking (and suppressing a number of technicalities), the main steps in the argument were as follows.

• (Early stage) Pick a large number ${X}$ (in our paper we take ${X}$ to be odd, but I’ll ignore this technicality here). Using a moment method to explore neighbourhoods of a single point in ${V}$, one can show that a vertex ${v}$ in ${V}$ is almost always connected to at least ${\log^{10} X}$ numbers in ${[v,v+X^{1/100}]}$, using relatively short paths of short diameter. (This is the most computationally intensive portion of the argument.)
• (Middle stage) Let ${X'}$ be a typical number in ${[X/40,X/20]}$, and let ${R}$ be a scale somewhere between ${X^{1/40}}$ and ${X'}$. By using paths ${n, n+p_1, n+p_1-p_2, n+p_1-p_2+p_3}$ involving three primes, and using a variant of Vinogradov’s theorem and some routine second moment computations, one can show that with quite high probability, any “good” vertex in ${[v+X'-R, v+X'-0.99R]}$ is connected to a “good” vertex in ${[v+X'-0.01R, v+X-0.0099 R]}$ by paths of length three, where the definition of “good” is somewhat technical but encompasses almost all of the vertices in ${V}$.
• (Late stage) Combining the two previous results together, we can show that most vertices ${v}$ will be connected to a vertex in ${[v+X'-X^{1/40}, v+X']}$ for any ${X'}$ in ${[X/40,X/20]}$. In particular, ${v}$ will be connected to a set of ${\gg X^{9/10}}$ vertices in ${[v,v+X/20]}$. By tracking everything carefully, one can control the length and diameter of the paths used to connect ${v}$ to this set, and one can also control the parity of the elements in this set.
• (Final stage) Now if we have two vertices ${v, w}$ at a distance ${X}$ apart. By the previous item, one can connect ${v}$ to a large set ${A}$ of vertices in ${[v,v+X/20]}$, and one can similarly connect ${w}$ to a large set ${B}$ of vertices in ${[w,w+X/20]}$. Now, by using a Vinogradov-type theorem and second moment calculations again (and ensuring that the elements of ${A}$ and ${B}$ have opposite parity), one can connect many of the vertices in ${A}$ to many of the vertices ${B}$ by paths of length three, which then connects ${v}$ to ${w}$, and gives the claim.

It seems of interest to understand random graphs like ${G}$ further. In particular, the graph ${G'}$ on the integers formed by connecting ${n}$ to ${n+p}$ for all ${n}$ in a randomly selected residue class mod ${p}$ for each prime ${p}$ is particularly interesting (it is to the Liouville function as ${G}$ is to the Möbius function); if one could show some “local expander” properties of this graph ${G'}$, then one would have a chance of modifying the above methods to attack the first unsolved case of the Chowla conjecture, namely that ${\lambda(n)\lambda(n+1)}$ has asymptotic density zero (perhaps working with logarithmic density instead of natural density to avoids some technicalities).

Kaisa Matomaki, Maksym Radziwill, and I have just uploaded to the arXiv our paper “An averaged form of Chowla’s conjecture“. This paper concerns a weaker variant of the famous conjecture of Chowla (discussed for instance in this previous post) that

$\displaystyle \sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k) = o(X)$

as ${X \rightarrow \infty}$ for any distinct natural numbers ${h_1,\dots,h_k}$, where ${\lambda}$ denotes the Liouville function. (One could also replace the Liouville function here by the Möbius function ${\mu}$ and obtain a morally equivalent conjecture.) This conjecture remains open for any ${k \geq 2}$; for instance the assertion

$\displaystyle \sum_{n \leq X} \lambda(n) \lambda(n+2) = o(X)$

is a variant of the twin prime conjecture (though possibly a tiny bit easier to prove), and is subject to the notorious parity barrier (as discussed in this previous post).

Our main result asserts, roughly speaking, that Chowla’s conjecture can be established unconditionally provided one has non-trivial averaging in the ${h_1,\dots,h_k}$ parameters. More precisely, one has

Theorem 1 (Chowla on the average) Suppose ${H = H(X) \leq X}$ is a quantity that goes to infinity as ${X \rightarrow \infty}$ (but it can go to infinity arbitrarily slowly). Then for any fixed ${k \geq 1}$, we have

$\displaystyle \sum_{h_1,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n+h_1) \dots \lambda(n+h_k)| = o( H^k X ).$

In fact, we can remove one of the averaging parameters and obtain

$\displaystyle \sum_{h_2,\dots,h_k \leq H} |\sum_{n \leq X} \lambda(n) \lambda(n+h_2) \dots \lambda(n+h_k)| = o( H^{k-1} X ).$

Actually we can make the decay rate a bit more quantitative, gaining about ${\frac{\log\log H}{\log H}}$ over the trivial bound. The key case is ${k=2}$; while the unaveraged Chowla conjecture becomes more difficult as ${k}$ increases, the averaged Chowla conjecture does not increase in difficulty due to the increasing amount of averaging for larger ${k}$, and we end up deducing the higher ${k}$ case of the conjecture from the ${k=2}$ case by an elementary argument.

The proof of the theorem proceeds as follows. By exploiting the Fourier-analytic identity

$\displaystyle \int_{{\mathbf T}} (\int_{\mathbf R} |\sum_{x \leq n \leq x+H} f(n) e(\alpha n)|^2 dx)^2\ d\alpha$

$\displaystyle = \sum_{|h| \leq H} (H-|h|)^2 |\sum_n f(n) \overline{f}(n+h)|^2$

(related to a standard Fourier-analytic identity for the Gowers ${U^2}$ norm) it turns out that the ${k=2}$ case of the above theorem can basically be derived from an estimate of the form

$\displaystyle \int_0^X |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o( H X )$

uniformly for all ${\alpha \in {\mathbf T}}$. For “major arc” ${\alpha}$, close to a rational ${a/q}$ for small ${q}$, we can establish this bound from a generalisation of a recent result of Matomaki and Radziwill (discussed in this previous post) on averages of multiplicative functions in short intervals. For “minor arc” ${\alpha}$, we can proceed instead from an argument of Katai and Bourgain-Sarnak-Ziegler (discussed in this previous post).

The argument also extends to other bounded multiplicative functions than the Liouville function. Chowla’s conjecture was generalised by Elliott, who roughly speaking conjectured that the ${k}$ copies of ${\lambda}$ in Chowla’s conjecture could be replaced by arbitrary bounded multiplicative functions ${g_1,\dots,g_k}$ as long as these functions were far from a twisted Dirichlet character ${n \mapsto \chi(n) n^{it}}$ in the sense that

$\displaystyle \sum_p \frac{1 - \hbox{Re} g(p) \overline{\chi(p) p^{it}}}{p} = +\infty. \ \ \ \ \ (1)$

(This type of distance is incidentally now a fundamental notion in the Granville-Soundararajan “pretentious” approach to multiplicative number theory.) During our work on this project, we found that Elliott’s conjecture is not quite true as stated due to a technicality: one can cook up a bounded multiplicative function ${g}$ which behaves like ${n^{it_j}}$ on scales ${n \sim N_j}$ for some ${N_j}$ going to infinity and some slowly varying ${t_j}$, and such a function will be far from any fixed Dirichlet character whilst still having many large correlations (e.g. the pair correlations ${\sum_{n \leq N_j} g(n+1) \overline{g(n)}}$ will be large). In our paper we propose a technical “fix” to Elliott’s conjecture (replacing (1) by a truncated variant), and show that this repaired version of Elliott’s conjecture is true on the average in much the same way that Chowla’s conjecture is. (If one restricts attention to real-valued multiplicative functions, then this technical issue does not show up, basically because one can assume without loss of generality that ${t=0}$ in this case; we discuss this fact in an appendix to the paper.)

In analytic number theory, it is a well-known phenomenon that for many arithmetic functions ${f: {\bf N} \rightarrow {\bf C}}$ of interest in number theory, it is significantly easier to estimate logarithmic sums such as

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n}$

than it is to estimate summatory functions such as

$\displaystyle \sum_{n \leq x} f(n).$

(Here we are normalising ${f}$ to be roughly constant in size, e.g. ${f(n) = O( n^{o(1)} )}$ as ${n \rightarrow \infty}$.) For instance, when ${f}$ is the von Mangoldt function ${\Lambda}$, the logarithmic sums ${\sum_{n \leq x} \frac{\Lambda(n)}{n}}$ can be adequately estimated by Mertens’ theorem, which can be easily proven by elementary means (see Notes 1); but a satisfactory estimate on the summatory function ${\sum_{n \leq x} \Lambda(n)}$ requires the prime number theorem, which is substantially harder to prove (see Notes 2). (From a complex-analytic or Fourier-analytic viewpoint, the problem is that the logarithmic sums ${\sum_{n \leq x} \frac{f(n)}{n}}$ can usually be controlled just from knowledge of the Dirichlet series ${\sum_n \frac{f(n)}{n^s}}$ for ${s}$ near ${1}$; but the summatory functions require control of the Dirichlet series ${\sum_n \frac{f(n)}{n^s}}$ for ${s}$ on or near a large portion of the line ${\{ 1+it: t \in {\bf R} \}}$. See Notes 2 for further discussion.)

Viewed conversely, whenever one has a difficult estimate on a summatory function such as ${\sum_{n \leq x} f(n)}$, one can look to see if there is a “cheaper” version of that estimate that only controls the logarithmic sums ${\sum_{n \leq x} \frac{f(n)}{n}}$, which is easier to prove than the original, more “expensive” estimate. In this post, we shall do this for two theorems, a classical theorem of Halasz on mean values of multiplicative functions on long intervals, and a much more recent result of Matomaki and Radziwiłł on mean values of multiplicative functions in short intervals. The two are related; the former theorem is an ingredient in the latter (though in the special case of the Matomaki-Radziwiłł theorem considered here, we will not need Halasz’s theorem directly, instead using a key tool in the proof of that theorem).

We begin with Halasz’s theorem. Here is a version of this theorem, due to Montgomery and to Tenenbaum:

Theorem 1 (Halasz-Montgomery-Tenenbaum) Let ${f: {\bf N} \rightarrow {\bf C}}$ be a multiplicative function with ${|f(n)| \leq 1}$ for all ${n}$. Let ${x \geq 3}$ and ${T \geq 1}$, and set

$\displaystyle M := \min_{|t| \leq T} \sum_{p \leq x} \frac{1 - \hbox{Re}( f(p) p^{-it} )}{p}.$

Then one has

$\displaystyle \frac{1}{x} \sum_{n \leq x} f(n) \ll (1+M) e^{-M} + \frac{1}{\sqrt{T}}.$

Informally, this theorem asserts that ${\sum_{n \leq x} f(n)}$ is small compared with ${x}$, unless ${f}$ “pretends” to be like the character ${p \mapsto p^{it}}$ on primes for some small ${y}$. (This is the starting point of the “pretentious” approach of Granville and Soundararajan to analytic number theory, as developed for instance here.) We now give a “cheap” version of this theorem which is significantly weaker (both because it settles for controlling logarithmic sums rather than summatory functions, it requires ${f}$ to be completely multiplicative instead of multiplicative, it requires a strong bound on the analogue of the quantity ${M}$, and because it only gives qualitative decay rather than quantitative estimates), but easier to prove:

Theorem 2 (Cheap Halasz) Let ${x}$ be an asymptotic parameter goingto infinity. Let ${f: {\bf N} \rightarrow {\bf C}}$ be a completely multiplicative function (possibly depending on ${x}$) such that ${|f(n)| \leq 1}$ for all ${n}$, such that

$\displaystyle \sum_{p \leq x} \frac{1 - \hbox{Re}( f(p) )}{p} \gg \log\log x. \ \ \ \ \ (1)$

Then

$\displaystyle \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n} = o(1). \ \ \ \ \ (2)$

Note that now that we are content with estimating exponential sums, we no longer need to preclude the possibility that ${f(p)}$ pretends to be like ${p^{it}}$; see Exercise 11 of Notes 1 for a related observation.

To prove this theorem, we first need a special case of the Turan-Kubilius inequality.

Lemma 3 (Turan-Kubilius) Let ${x}$ be a parameter going to infinity, and let ${1 < P < x}$ be a quantity depending on ${x}$ such that ${P = x^{o(1)}}$ and ${P \rightarrow \infty}$ as ${x \rightarrow \infty}$. Then

$\displaystyle \sum_{n \leq x} \frac{ | \frac{1}{\log \log P} \sum_{p \leq P: p|n} 1 - 1 |}{n} = o( \log x ).$

Informally, this lemma is asserting that

$\displaystyle \sum_{p \leq P: p|n} 1 \approx \log \log P$

for most large numbers ${n}$. Another way of writing this heuristically is in terms of Dirichlet convolutions:

$\displaystyle 1 \approx 1 * \frac{1}{\log\log P} 1_{{\mathcal P} \cap [1,P]}.$

This type of estimate was previously discussed as a tool to establish a criterion of Katai and Bourgain-Sarnak-Ziegler for Möbius orthogonality estimates in this previous blog post. See also Section 5 of Notes 1 for some similar computations.

Proof: By Cauchy-Schwarz it suffices to show that

$\displaystyle \sum_{n \leq x} \frac{ | \frac{1}{\log \log P} \sum_{p \leq P: p|n} 1 - 1 |^2}{n} = o( \log x ).$

Expanding out the square, it suffices to show that

$\displaystyle \sum_{n \leq x} \frac{ (\frac{1}{\log \log P} \sum_{p \leq P: p|n} 1)^j}{n} = \log x + o( \log x )$

for ${j=0,1,2}$.

We just show the ${j=2}$ case, as the ${j=0,1}$ cases are similar (and easier). We rearrange the left-hand side as

$\displaystyle \frac{1}{(\log\log P)^2} \sum_{p_1, p_2 \leq P} \sum_{n \leq x: p_1,p_2|n} \frac{1}{n}.$

We can estimate the inner sum as ${(1+o(1)) \frac{1}{[p_1,p_2]} \log x}$. But a routine application of Mertens’ theorem (handling the diagonal case when ${p_1=p_2}$ separately) shows that

$\displaystyle \sum_{p_1, p_2 \leq P} \frac{1}{[p_1,p_2]} = (1+o(1)) (\log\log P)^2$

and the claim follows. $\Box$

Remark 4 As an alternative to the Turan-Kubilius inequality, one can use the Ramaré identity

$\displaystyle \sum_{p \leq P: p|n} \frac{1}{\# \{ p' \leq P: p'|n\} + 1} - 1 = 1_{(p,n)=1 \hbox{ for all } p \leq P}$

(see e.g. Section 17.3 of Friedlander-Iwaniec). This identity turns out to give superior quantitative results than the Turan-Kubilius inequality in applications; see the paper of Matomaki and Radziwiłł for an instance of this.

We now prove Theorem 2. Let ${Q}$ denote the left-hand side of (2); by the triangle inequality we have ${Q=O(1)}$. By Lemma 3 (for some ${P = x^{o(1)}}$ to be chosen later) and the triangle inequality we have

$\displaystyle \sum_{n \leq x} \frac{\frac{1}{\log \log P} \sum_{p \leq P: p|n} f(n)}{n} = Q \log x + o( \log x ).$

We rearrange the left-hand side as

$\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{f(p)}{p} \sum_{m \leq x/p} \frac{f(m)}{m}.$

We now replace the constraint ${m \leq x/p}$ by ${m \leq x}$. The error incurred in doing so is

$\displaystyle O( \frac{1}{\log\log P} \sum_{p \leq P} \frac{1}{p} \sum_{x/P \leq m \leq x} \frac{1}{m} )$

which by Mertens’ theorem is ${O(\log P) = o( \log x )}$. Thus we have

$\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{f(p)}{p} \sum_{m \leq x} \frac{f(m)}{m} = Q \log x + o( \log x ).$

But by definition of ${Q}$, we have ${\sum_{m \leq x} \frac{f(m)}{m} = Q \log x}$, thus

$\displaystyle [1 - \frac{1}{\log\log P} \sum_{p \leq P} \frac{f(p)}{p}] Q = o(1). \ \ \ \ \ (3)$

From Mertens’ theorem, the expression in brackets can be rewritten as

$\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{1 - f(p)}{p} + o(1)$

and so the real part of this expression is

$\displaystyle \frac{1}{\log\log P} \sum_{p \leq P} \frac{1 - \hbox{Re} f(p)}{p} + o(1).$

By (1), Mertens’ theorem and the hypothesis on ${f}$ we have

$\displaystyle \sum_{p \leq x^\varepsilon} \frac{(1 - \hbox{Re} f(p)) \log p}{p} \gg \log\log x^\varepsilon - O_\varepsilon(1)$

for any ${\varepsilon > 0}$. This implies that we can find ${P = x^{o(1)}}$ going to infinity such that

$\displaystyle \sum_{p \leq P} \frac{(1 - \hbox{Re} f(p)) \log p}{p} \gg (1-o(1))\log\log P$

and thus the expression in brackets has real part ${\gg 1-o(1)}$. The claim follows.

The Turan-Kubilius argument is certainly not the most efficient way to estimate sums such as ${\frac{1}{n} \sum_{n \leq x} f(n)}$. In the exercise below we give a significantly more accurate estimate that works when ${f}$ is non-negative.

Exercise 5 (Granville-Koukoulopoulos-Matomaki)

• (i) If ${g}$ is a completely multiplicative function with ${g(p) \in \{0,1\}}$ for all primes ${p}$, show that

$\displaystyle (e^{-\gamma}-o(1)) \prod_{p \leq x} (1 - \frac{g(p)}{p})^{-1} \leq \sum_{n \leq x} \frac{g(n)}{n} \leq \prod_{p \leq x} (1 - \frac{g(p)}{p})^{-1}.$

as ${x \rightarrow \infty}$. (Hint: for the upper bound, expand out the Euler product. For the lower bound, show that ${\sum_{n \leq x} \frac{g(n)}{n} \times \sum_{n \leq x} \frac{h(n)}{n} \ge \sum_{n \leq x} \frac{1}{n}}$, where ${h}$ is the completely multiplicative function with ${h(p) = 1-g(p)}$ for all primes ${p}$.)

• (ii) If ${g}$ is multiplicative and takes values in ${[0,1]}$, show that

$\displaystyle \sum_{n \leq x} \frac{g(n)}{n} \asymp \prod_{p \leq x} (1 - \frac{g(p)}{p})^{-1}$

$\displaystyle \asymp \exp( \sum_{p \leq x} \frac{g(p)}{p} )$

for all ${x \geq 1}$.

Now we turn to a very recent result of Matomaki and Radziwiłł on mean values of multiplicative functions in short intervals. For sake of illustration we specialise their results to the simpler case of the Liouville function ${\lambda}$, although their arguments actually work (with some additional effort) for arbitrary multiplicative functions of magnitude at most ${1}$ that are real-valued (or more generally, stay far from complex characters ${p \mapsto p^{it}}$). Furthermore, we give a qualitative form of their estimates rather than a quantitative one:

Theorem 6 (Matomaki-Radziwiłł, special case) Let ${X}$ be a parameter going to infinity, and let ${2 \leq h \leq X}$ be a quantity going to infinity as ${X \rightarrow \infty}$. Then for all but ${o(X)}$ of the integers ${x \in [X,2X]}$, one has

$\displaystyle \sum_{x \leq n \leq x+h} \lambda(n) = o( h ).$

Equivalently, one has

$\displaystyle \sum_{X \leq x \leq 2X} |\sum_{x \leq n \leq x+h} \lambda(n)|^2 = o( h^2 X ). \ \ \ \ \ (4)$

A simple sieving argument (see Exercise 18 of Supplement 4) shows that one can replace ${\lambda}$ by the Möbius function ${\mu}$ and obtain the same conclusion. See this recent note of Matomaki and Radziwiłł for a simple proof of their (quantitative) main theorem in this special case.

Of course, (4) improves upon the trivial bound of ${O( h^2 X )}$. Prior to this paper, such estimates were only known (using arguments similar to those in Section 3 of Notes 6) for ${h \geq X^{1/6+\varepsilon}}$ unconditionally, or for ${h \geq \log^A X}$ for some sufficiently large ${A}$ if one assumed the Riemann hypothesis. This theorem also represents some progress towards Chowla’s conjecture (discussed in Supplement 4) that

$\displaystyle \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_k) = o( x )$

as ${x \rightarrow \infty}$ for any fixed distinct ${h_1,\dots,h_k}$; indeed, it implies that this conjecture holds if one performs a small amount of averaging in the ${h_1,\dots,h_k}$.

Below the fold, we give a “cheap” version of the Matomaki-Radziwiłł argument. More precisely, we establish

Theorem 7 (Cheap Matomaki-Radziwiłł) Let ${X}$ be a parameter going to infinity, and let ${1 \leq T \leq X}$. Then

$\displaystyle \int_X^{X^A} \left|\sum_{x \leq n \leq e^{1/T} x} \frac{\lambda(n)}{n}\right|^2\frac{dx}{x} = o\left( \frac{\log X}{T^2} \right), \ \ \ \ \ (5)$

for any fixed ${A>1}$.

Note that (5) improves upon the trivial bound of ${O( \frac{\log X}{T^2} )}$. Again, one can replace ${\lambda}$ with ${\mu}$ if desired. Due to the cheapness of Theorem 7, the proof will require few ingredients; the deepest input is the improved zero-free region for the Riemann zeta function due to Vinogradov and Korobov. Other than that, the main tools are the Turan-Kubilius result established above, and some Fourier (or complex) analysis.