Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and I have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals”. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (specifically, the Möbius function {\mu}, the von Mangoldt function {\Lambda}, and the generalised divisor functions {d_k}) in short intervals {(X,X+H]}, where {X} is large and {H} lies in the range {X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}} for a fixed constant {0 < \theta < 1} (that one would like to be as small as possible). If we let {f} denote one of the functions {\mu, \Lambda, d_k}, then there is an extensive literature on the estimation of short sums

\displaystyle  \sum_{X < n \leq X+H} f(n)

and some literature also on the estimation of exponential sums such as

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-\alpha n)

for a real frequency {\alpha}, where {e(\theta) := e^{2\pi i \theta}}. For applications in the additive combinatorics of such functions {f}, it is also necessary to consider more general correlations, such as polynomial correlations

\displaystyle  \sum_{X < n \leq X+H} f(n) e(-P(n))

where {P: {\bf Z} \rightarrow {\bf R}} is a polynomial of some fixed degree, or more generally

\displaystyle  \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)

where {G/\Gamma} is a nilmanifold of fixed degree and dimension (and with some control on structure constants), {g: {\bf Z} \rightarrow G} is a polynomial map, and {F: G/\Gamma \rightarrow {\bf C}} is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of {f} (possibly after subtracting off some renormalising factor) on such short intervals {(X,X+H]}, which can in turn be used to control other multilinear correlations involving such functions.
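To get a feel for the simplest of these correlations, here is a rough numerical sketch (not taken from the paper) that evaluates the short sums above for {f = \mu} against a constant phase, a linear phase {e(-\alpha n)}, and a quadratic phase {e(-\alpha n^2)}. The values of `X`, `H` and `alpha` are arbitrary illustrative assumptions and are far too small to say anything about the asymptotic regime; nilsequences beyond polynomial phases are not attempted.

```python
import cmath
import math

def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Möbius function of n, for 0 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Each prime factor flips the sign once.
            for multiple in range(p, limit + 1, p):
                if multiple > p:
                    is_prime[multiple] = False
                mu[multiple] = -mu[multiple]
            # Anything divisible by p^2 is not squarefree.
            for multiple in range(p * p, limit + 1, p * p):
                mu[multiple] = 0
    return mu

def e(theta):
    """The standard additive character e(theta) = exp(2 pi i theta)."""
    return cmath.exp(2j * cmath.pi * theta)

def short_phase_sum(f, X, H, phase):
    """Sum of f[n] * e(-phase(n)) over X < n <= X + H."""
    return sum(f[n] * e(-phase(n)) for n in range(X + 1, X + H + 1))

# Illustrative parameters only (assumptions, not values from the paper).
X, H = 10**5, 2000
alpha = math.sqrt(2)
mu = mobius_sieve(X + H)

plain = short_phase_sum(mu, X, H, lambda n: 0.0)
linear = short_phase_sum(mu, X, H, lambda n: alpha * n)
quadratic = short_phase_sum(mu, X, H, lambda n: alpha * n * n)

# The trivial bound for each sum is H, so we report the normalised size.
for name, value in [("plain", plain), ("linear", linear), ("quadratic", quadratic)]:
    print(f"{name:>10}: |sum| / H = {abs(value) / H:.4f}")
```

Any ratio noticeably below {1} reflects cancellation beyond the trivial bound, although at this scale the numbers are only suggestive.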

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as {\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)} would be approximated by a main term that vanishes (or is negligible) if {\alpha} is “minor arc”, but is expressible in terms of something like a Ramanujan sum if {\alpha} is “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant {f^\sharp} from each of the arithmetic functions {f} and then getting upper bounds on remainder correlations such as

\displaystyle  |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)

(actually for technical reasons we also allow the {n} variable to be restricted further to a subprogression of {(X,X+H]}, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function {\mu}, we simply set {\mu^\sharp = 0}, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function {\Lambda}, we eventually went with the Cramér-Granville approximant {\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}, where {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)}.
  • For the divisor functions {d_k}, we used a somewhat complicated-looking approximant {d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}: m|n} P_m(\log n)} for some explicit polynomials {P_m}, chosen so that {d_k^\sharp} and {d_k} have almost exactly the same sums along arithmetic progressions (see the paper for details).
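The Cramér-Granville approximant is simple enough to write down directly. The following sketch (an illustration, not code from the paper) builds {\Lambda^\sharp} from the formulas {W = \prod_{p < R} p} and {R = \exp(\log^{1/10} X)} stated above and compares the sums of {\Lambda} and {\Lambda^\sharp} over one short interval. The choices of `X` and `H` are assumptions made for illustration; at such a small scale {R} is tiny, so {W} has very few prime factors.

```python
import math

def primes_below(R):
    """All primes p < R, by trial division (R is tiny in this sketch)."""
    return [p for p in range(2, math.ceil(R))
            if all(p % q for q in range(2, math.isqrt(p) + 1))]

def cramer_granville(X):
    """Return (W, Lambda_sharp) with W the product of primes below R = exp(log^{1/10} X)."""
    R = math.exp(math.log(X) ** 0.1)
    small_primes = primes_below(R)
    W = math.prod(small_primes)
    phi_W = math.prod(p - 1 for p in small_primes)  # Euler phi of the squarefree W

    def lambda_sharp(n):
        # (W / phi(W)) times the indicator that n is coprime to W.
        return W / phi_W if math.gcd(n, W) == 1 else 0.0

    return W, lambda_sharp

def von_mangoldt(n):
    """log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)  # no factor up to sqrt(n), so n is prime

# Illustrative parameters only (assumptions, not values from the paper).
X, H = 10**6, 5000
W, lambda_sharp = cramer_granville(X)
true_sum = sum(von_mangoldt(n) for n in range(X + 1, X + H + 1))
model_sum = sum(lambda_sharp(n) for n in range(X + 1, X + H + 1))
print(f"W = {W}, sum of Lambda = {true_sum:.1f}, sum of Lambda_sharp = {model_sum:.1f}, H = {H}")
```

By construction {\Lambda^\sharp} has mean exactly {1} over any full period of length {W}, which matches the expected mean of {\Lambda}.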

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For {\mu} and {\Lambda}, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of {O_A(\log^{-A} X)} over the trivial bound for any {A>0}; for {d_k}, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of {X^{-c_k}} over the trivial bound for some {c_k>0}. In the case of the Möbius function {\mu}, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent {\theta} somewhat at the cost of only obtaining “weakly logarithmic savings” of shape {\log^{-c} X} for some small {c>0}.

Our main estimates on sums of the form (1) work in the following ranges:

  • For {\theta=5/8}, one can obtain strongly logarithmic savings on (1) for {f=\mu,\Lambda}, and power savings for {f=d_k}.
  • For {\theta=3/5}, one can obtain weakly logarithmic savings for {f = \mu}.
  • For {\theta=5/9}, one can obtain power savings for {f=d_3}.
  • For {\theta=1/3}, one can obtain power savings for {f=d_2}.

Conjecturally, one should be able to obtain power savings in all cases, and lower {\theta} down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The {\theta=5/8} result for correlation against Fourier phases {e(\alpha n)} was established previously by Zhan, and the {\theta=3/5} result for such phases and {f=\mu} was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form {(X,X+H]} rather than {[1,X]}.
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals {(X,X+H]} rather than long intervals such as {[1,X]}.

We now briefly discuss some of the ingredients of the proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and, for the {\theta=3/5} result, the Ramaré identity) to decompose {\mu(n), \Lambda(n), d_k(n)} into more tractable sums of the following types:

  • Type {I} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large;
  • Type {II} sums, which are basically of the form {\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)} for some weights {\alpha(m)}, {\beta(n)} of controlled size and some cutoffs {A_-, A_+} that are not too close to {1} or to {X};
  • Type {I_2} sums, which are basically of the form {\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)} for some weights {\alpha(m)} of controlled size and some cutoff {A} that is not too large.

The precise ranges of the cutoffs {A, A_-, A_+} depend on the choice of {\theta}; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents {\theta} being what they are in our main results.
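To make the shapes of these sums concrete, here is a toy sketch that transcribes the Type {I} and Type {II} templates literally and checks them against two classical identities: {d_2(n) = \sum_{m|n} 1} and {\Lambda(n) = \sum_{m|n} \mu(m) \log(n/m)}. These are untruncated prototypes chosen for illustration only; the Heath-Brown and Ramaré identities used in the paper are more elaborate, impose genuine restrictions on the cutoffs, and are not reproduced here. The helper names `type_I` and `type_II` are hypothetical.

```python
import math

def factorize(n):
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    """Möbius function: 0 if n has a square factor, else (-1)^(number of prime factors)."""
    f = factorize(n)
    if any(exp > 1 for exp in f.values()):
        return 0
    return -1 if len(f) % 2 else 1

def von_mangoldt(n):
    """log p if n is a power of the prime p, and 0 otherwise."""
    f = factorize(n)
    return math.log(next(iter(f))) if len(f) == 1 else 0.0

def type_I(n, A, alpha):
    """Sum of alpha(m) over divisors m of n with m <= A."""
    return sum(alpha(m) for m in range(1, min(A, n) + 1) if n % m == 0)

def type_II(n, A_minus, A_plus, alpha, beta):
    """Sum of alpha(m) * beta(n / m) over divisors m of n with A_minus <= m <= A_plus."""
    return sum(alpha(m) * beta(n // m)
               for m in range(max(1, A_minus), min(A_plus, n) + 1) if n % m == 0)

# With alpha = 1 and no effective cutoff, the Type I template is just d_2(n).
print([type_I(n, n, lambda m: 1) for n in range(1, 13)])

# With alpha = mu, beta = log and the full range of m, the bilinear template recovers Lambda(n).
worst = max(abs(type_II(n, 1, n, mobius, math.log) - von_mangoldt(n)) for n in range(1, 500))
print(f"largest discrepancy for n < 500: {worst:.2e}")
```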

The Type {I} sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type {II} and Type {I_2} sums.

For the Type {II} sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence {F(g(n)\Gamma)} is basically of the form {e(P(n))}, and the “non-abelian” case in which {G} is non-abelian and {F} exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless {e(P(n))} is “major arc” in the sense that it resembles (or “pretends to be”) {\chi(n) n^{it}} for some Dirichlet character {\chi} and some frequency {t}; in the latter case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

\displaystyle  (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm')\Gamma)

as {n,m,n',m'} range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the Type {I_2} sum, a model sum to study is

\displaystyle  \sum_{X < n \leq X+H} d_2(n) e(\alpha n)

which one can expand as

\displaystyle  \sum_{n,m: X < nm \leq X+H} e(\alpha nm).

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region {\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}} into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as {e(\alpha nm)} on the majority of these progressions. As it turns out, this strategy works well in the regime {H > X^{1/3+\varepsilon}} unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the {\theta} exponent for our {d_2} result can be as low as {1/3}.

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals {(x,x+H]} with {x} in the range {[X,2X]}, in which we will be able to lower {\theta} all the way to {0}.