Let us call an arithmetic function {f: {\bf N} \rightarrow {\bf C}} {1}-bounded if we have {|f(n)| \leq 1} for all {n \in {\bf N}}. In this section we focus on the asymptotic behaviour of {1}-bounded multiplicative functions. Some key examples of such functions include:

  • The Möbius function {\mu};
  • The Liouville function {\lambda};
  • “Archimedean” characters {n \mapsto n^{it}} (which I call Archimedean because they are pullbacks of a Fourier character {x \mapsto x^{it}} on the multiplicative group {{\bf R}^+}, which has the Archimedean property);
  • Dirichlet characters (or “non-Archimedean” characters) {\chi} (which are essentially pullbacks of Fourier characters on a multiplicative cyclic group {({\bf Z}/q{\bf Z})^\times} with the discrete (non-Archimedean) metric);
  • Hybrid characters {n \mapsto \chi(n) n^{it}}.

The space of {1}-bounded multiplicative functions is also closed under multiplication and complex conjugation.
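As a quick numerical aside (the code below, including its ad hoc function names, is my own illustration and not part of the notes), one can verify {1}-boundedness and multiplicativity for all of the examples above, as well as the closure of this class under products:

```python
# Numerical sanity check (illustrative only): the example functions above
# are 1-bounded and multiplicative, and so are their products.
import cmath
from math import gcd

def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def mobius(n):
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def liouville(n):
    return (-1) ** sum(factorize(n).values())

def chi4(n):
    """The non-principal Dirichlet character mod 4."""
    return [0, 1, 0, -1][n % 4]

def arch(n, t=1.0):
    """The Archimedean character n -> n^{it}."""
    return cmath.exp(1j * t * cmath.log(n))

examples = [mobius, liouville, chi4, arch,
            lambda n: chi4(n) * arch(n)]  # a hybrid character chi(n) n^{it}
for f in examples:
    # 1-bounded: |f(n)| <= 1 for all n
    assert all(abs(f(n)) <= 1 + 1e-12 for n in range(1, 300))
    # multiplicative: f(mn) = f(m) f(n) whenever gcd(m, n) = 1
    assert all(abs(f(m * n) - f(m) * f(n)) < 1e-9
               for m in range(1, 40) for n in range(1, 40) if gcd(m, n) == 1)
print("1-boundedness and multiplicativity verified")
```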

Given a multiplicative function {f}, we are often interested in the asymptotics of long averages such as

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n)

for large values of {X}, as well as short sums

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n)

where {H} and {x} are both large, but {H} is significantly smaller than {x}. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by {O(1)}; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \ \ \ \ \ (1)

as {X \rightarrow \infty}. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

\displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1) \ \ \ \ \ (2)

as {X \rightarrow \infty}. (Hint: use the identities {\lambda(n) = \sum_{d^2|n} \mu(n/d^2)} and {\mu(n) = \sum_{d^2|n} \mu(d) \lambda(n/d^2)}.)
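The identities in the hint are easy to test numerically before attempting the exercise; the following sketch (my own illustration) verifies both of them for all {n \leq 2000}:

```python
# Verify lambda(n) = sum_{d^2|n} mu(n/d^2) and
# mu(n) = sum_{d^2|n} mu(d) lambda(n/d^2) numerically.
def factorize(n):
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def mobius(n):
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def liouville(n):
    return (-1) ** sum(factorize(n).values())

for n in range(1, 2001):
    # all d with d^2 | n
    sq_divisors = [d for d in range(1, int(n ** 0.5) + 1) if n % (d * d) == 0]
    assert liouville(n) == sum(mobius(n // (d * d)) for d in sq_divisors)
    assert mobius(n) == sum(mobius(d) * liouville(n // (d * d)) for d in sq_divisors)
print("both identities hold for n <= 2000")
```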

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) = o(1) \ \ \ \ \ (3)

as {x \rightarrow \infty} if {H = H(x)} is such that {H \geq \varepsilon x} for some fixed {\varepsilon>0}. However it is significantly more difficult to understand what happens when {H} grows much more slowly than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi that one can also establish (3) unconditionally when {H \geq x^{\theta}} for some fixed exponent {\theta < 1}. On the Riemann Hypothesis, Maier and Montgomery lowered the threshold to {H \geq x^{1/2} \log^C x} for an absolute constant {C} (the bound {H \geq x^{1/2+\varepsilon}} is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that {H} should be able to be taken as small as {x^\varepsilon}, and perhaps even {\log^{1+\varepsilon} x} if one is particularly optimistic about the accuracy of these probabilistic models. In the opposite direction, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that {H} cannot be taken to grow arbitrarily slowly in {x}, due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence {x \rightarrow \infty} and some sufficiently slowly growing {H = H(x)}, by modifying the arguments in these papers of mine).
The situation is better when one asks to understand the mean value on almost all short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let {H = H(X)} be a function of {X} such that {H \rightarrow \infty} and {H = o(X)} as {X \rightarrow \infty}. Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded function. Show that the following assertions are equivalent:

  • (i) One has

    \displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n) = o(1)

    as {X \rightarrow \infty}, uniformly for all {x \in [X,2X]} outside of a set of measure {o(X)}.

  • (ii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|\ dx = o(1)

    as {X \rightarrow \infty}.

  • (iii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx = o(1) \ \ \ \ \ (4)

    as {X \rightarrow \infty}.

As it turns out, the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).
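To get a concrete feel for the second moment (4), one can evaluate a discretised version of it numerically for the Liouville function; in the sketch below (my own, with arbitrarily chosen scales {X} and {H}), the second moment already falls well below the trivial bound of {O(1)}:

```python
import numpy as np

def liouville_sieve(N):
    """lam[n] = lambda(n) for 1 <= n <= N, via a smallest-prime-factor sieve."""
    spf = [0] * (N + 1)
    for i in range(2, N + 1):
        if spf[i] == 0:                    # i is prime
            for j in range(i, N + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    lam = [1] * (N + 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]         # Omega(n) = 1 + Omega(n / spf(n))
    return np.array(lam, dtype=float)

X, H = 30000, 100                          # illustrative choice of scales
lam = liouville_sieve(2 * X + H + 1)
S = np.concatenate([[0.0], np.cumsum(lam[1:])])   # S[n] = sum_{m <= n} lambda(m)
x = np.arange(X, 2 * X + 1)
window_means = (S[x + H] - S[x - 1]) / H   # (1/H) sum_{x <= n <= x+H} lambda(n)
m2 = np.mean(window_means ** 2)            # discretized version of (4)
print(f"second moment (4) ~ {m2:.4f} (trivial bound is O(1))")
```

Empirically the value is of the rough order {1/H}, in line with the randomness heuristics.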

Using zero density methods, for instance, it was shown by Ramachandra that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll_{A,\varepsilon} \log^{-A} X

whenever {X^{1/6+\varepsilon} \leq H \leq X} and {\varepsilon>0}. With this quality of bound (saving arbitrary powers of {\log X} over the trivial bound of {O(1)}), this is still the lowest value of {H} one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of {\log X} or {\log H}, or just a qualitative decay of {o(1)}), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any {2 \leq H \leq X}, one has

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll \log^{-c} H

for some absolute constant {c>0}.

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for any function {H = H(X)} that goes to infinity as {X \rightarrow \infty}, no matter how slowly! This ability to let {H} grow slowly with {X} is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that {H} be allowed to grow more slowly than {\log X}. See also this survey of Soundararajan for further discussion.

Exercise 4 In this exercise you may use Theorem 3 freely.

  • (i) Establish the upper bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) < 1-c

    for some absolute constant {c>0} and all sufficiently large {X}. (Hint: if this bound failed, then {\lambda(n)=\lambda(n+1)} would hold for almost all {n}; use this to create many intervals {[x,x+H]} for which {\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)} is extremely large.)

  • (ii) Show that Theorem 3 also holds with {\lambda(n)} replaced by {\chi_2 \lambda(n)}, where {\chi_2} is the principal character of period {2}. (Use the fact that {\lambda(2n)=-\lambda(n)} for all {n}.) Use this to establish the corresponding lower bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) > -1+c

    complementary to the upper bound in (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (i) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (ii) was only established in the Matomaki-Radziwill paper.)
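As a numerical aside (my own illustration; the scale is arbitrary), the correlation in Exercise 4 can be computed directly at modest scales, where it is already quite small:

```python
def liouville_sieve(N):
    """lam[n] = lambda(n) for 1 <= n <= N, via a smallest-prime-factor sieve."""
    spf = [0] * (N + 1)
    for i in range(2, N + 1):
        if spf[i] == 0:
            for j in range(i, N + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    lam = [1] * (N + 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

X = 100000
lam = liouville_sieve(X + 1)
corr = sum(lam[n] * lam[n + 1] for n in range(1, X + 1)) / X
print(f"(1/X) sum lambda(n)lambda(n+1) = {corr:+.4f} for X = {X}")
# Exercise 4 asserts this lies in (-1+c, 1-c); empirically it is near zero,
# consistent with (but of course not proving) the Chowla conjecture.
```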

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as {\mu} or {\lambda} have Dirichlet series {{\mathcal D} \mu(s) = \frac{1}{\zeta(s)}}, {{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}} that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does not rely on such meromorphic continuations, and in fact holds for more general classes of {1}-bounded multiplicative functions {f}, for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the pretentious multiplicative number theory of Granville and Soundararajan (as presented for instance in their draft monograph). A basic notion here is the pretentious distance between two {1}-bounded multiplicative functions {f,g} (at a given scale {X}), which informally measures the extent to which {f} “pretends” to be like {g} (or vice versa). The precise definition is

Definition 5 (Pretentious distance) Given two {1}-bounded multiplicative functions {f,g}, and a threshold {X>0}, the pretentious distance {\mathbb{D}(f,g;X)} between {f} and {g} up to scale {X} is given by the formula

\displaystyle  \mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}(f(p) \overline{g(p)})}{p} \right)^{1/2}

Note that one can also define an infinite version {\mathbb{D}(f,g;\infty)} of this distance by removing the constraint {p \leq X}, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because {\mathbb{D}(f,f;X)} can be non-zero, and furthermore {\mathbb{D}(f,g;X)} can vanish without {f,g} being equal), but it is still quite close to behaving like a metric, in particular it obeys the triangle inequality; see Exercise 16 below. The philosophy of pretentious multiplicative number theory is that two {1}-bounded multiplicative functions {f,g} will exhibit similar behaviour at scale {X} if their pretentious distance {\mathbb{D}(f,g;X)} is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:
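Since Definition 5 only involves the values of {f,g} at primes up to {X}, the pretentious distance is easy to compute numerically; the sketch below (my own illustration) evaluates a few of these distances and checks one instance of the triangle inequality from Exercise 16:

```python
import math

def primes_up_to(N):
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, N + 1, i)))
    return [p for p in range(2, N + 1) if sieve[p]]

def pdist(f, g, X, primes):
    """Pretentious distance D(f, g; X) of Definition 5; f, g map primes to C."""
    s = sum((1 - (f(p) * g(p).conjugate()).real) / p for p in primes if p <= X)
    return math.sqrt(max(s, 0.0))

X = 10 ** 5
primes = primes_up_to(X)
one = lambda p: 1 + 0j
lam = lambda p: -1 + 0j                       # lambda(p) = -1 at every prime
chi4 = lambda p: complex([0, 1, 0, -1][p % 4])

d1 = pdist(lam, one, X, primes)               # lambda does not pretend to be 1
d2 = pdist(lam, chi4, X, primes)
d3 = pdist(chi4, one, X, primes)
print(f"D(lambda,1;X) = {d1:.3f}; triangle check: {d2:.3f} <= {d1 + d3:.3f}")
assert d2 <= d1 + d3 + 1e-9                   # triangle inequality (Exercise 16(i))
assert pdist(lam, lam, X, primes) < 1e-9      # D(lambda, lambda; X) = 0 here
```

Note that {D(\lambda,1;X)^2 = 2 \sum_{p \leq X} \frac{1}{p} \approx 2\log\log X}, which grows (slowly) with {X}, matching the philosophy that {\lambda} is non-pretentious.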

Proposition 6 (Logarithmically averaged version of Halasz) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative functions {f,g}, one has

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n) \overline{g(n)}}{n} \ll \exp( - c \mathbb{D}(f, g;X)^2 )

for an absolute constant {c>0}.

In particular, if {f} does not pretend to be {1}, then the logarithmic average {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} will be small. This condition is basically necessary, since of course {\frac{1}{\log X} \sum_{n \leq X} \frac{1}{n} = 1 + o(1)}.

If one works with non-logarithmic averages {\frac{1}{X} \sum_{n \leq X} f(n)}, then not pretending to be {1} is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if {f} is an Archimedean character {f(n) = n^{it}} for some non-zero real {t}, then {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} goes to zero as {X \rightarrow \infty} (which is consistent with Proposition 6), but {\frac{1}{X} \sum_{n \leq X} f(n)} does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:
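This obstruction is visible numerically; in the sketch below (my own, taking {t=1} and an arbitrary scale {X}), the ordinary average of {n^{it}} stays bounded away from zero while the logarithmic average is small:

```python
import numpy as np

t, X = 1.0, 10 ** 5
n = np.arange(1, X + 1, dtype=float)
f = np.exp(1j * t * np.log(n))                 # the Archimedean character n^{it}

ordinary = abs(np.mean(f))                     # |(1/X) sum_{n<=X} n^{it}|
logarithmic = abs(np.sum(f / n)) / np.log(X)   # |(1/log X) sum_{n<=X} n^{it}/n|

print(f"|ordinary average| = {ordinary:.3f} (tends to 1/sqrt(1+t^2), not 0)")
print(f"|logarithmic average| = {logarithmic:.3f} (tends to 0)")
```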

Theorem 7 (Halasz’s theorem) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T}

for an absolute constant {c>0} and any {T > 0}.

Informally, we refer to a {1}-bounded multiplicative function as “pretentious” if it pretends to be a character such as {n^{it}}, and “non-pretentious” otherwise. The precise distinction is rather malleable, as the class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character {1} which needs to be considered, but in Theorem 7 it is the characters {n \mapsto n^{it}} with {|t| \leq T}. In other contexts one may also need to add Dirichlet characters {\chi(n)} or hybrid characters such as {\chi(n) n^{it}} to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.

In the contrapositive, Halasz’s theorem can be formulated as the assertion that if one has a large mean

\displaystyle  |\frac{1}{X} \sum_{n \leq X} f(n)| \geq \eta

for some {\eta > 0}, then one has the pretentious property

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \ll \sqrt{\log(1/\eta)}

for some {t \ll \eta^{-1}}. This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.
Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.

We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let {\eta>0}, and let {1 \leq H \leq X}, with {H} sufficiently large depending on {\eta}. Suppose that {f} is a {1}-bounded multiplicative function such that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \geq \eta^2.

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta \frac{X}{H}}.

The condition {t \ll_\eta \frac{X}{H}} is basically optimal, as the following example shows:

Exercise 9 Let {\varepsilon>0} be a sufficiently small constant, and let {1 \leq H \leq X} be such that {\frac{1}{\varepsilon} \leq H \leq \varepsilon X}. Let {f} be the Archimedean character {f(n) = n^{it}} for some {|t| \leq \varepsilon \frac{X}{H}}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \asymp 1.
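One can see the lower bound of Exercise 9 numerically; in the sketch below (my own choice of parameters, with {t} a small multiple of {X/H}), a discretised version of the second moment stays comparable to {1}:

```python
import numpy as np

X, H = 20000, 100
eps = 0.3
t = eps * X / H                            # an Archimedean frequency of size eps X/H
N = 2 * X + H + 1
n = np.arange(1, N + 1, dtype=float)
f = np.exp(1j * t * np.log(n))             # f(n) = n^{it}

S = np.concatenate([[0.0], np.cumsum(f)])  # S[k] = sum_{n <= k} f(n)
x = np.arange(X, 2 * X + 1)
window_means = (S[x + H] - S[x - 1]) / H   # (1/H) sum_{x <= n <= x+H} n^{it}
m2 = np.mean(np.abs(window_means) ** 2)
print(f"second moment for n^(it), t = {t:.0f}: {m2:.3f} (comparable to 1)")
```

Heuristically, the phase {t \log n} only varies by about {tH/x \leq \varepsilon} across each window, so there is essentially no cancellation in the inner sum.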

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only {o(1)} rather than {\log^{-c} H}). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.

With our current state of knowledge, the only arguments that can establish the full strength of Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function {f} with its Dirichlet series

\displaystyle  {\mathcal D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}

which one can view as a discrete Fourier transform of {f} (or more precisely of the measure {\sum_{n=1}^\infty \frac{f(n)}{n} \delta_{\log n}}, if one evaluates the Dirichlet series on the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let {f,g: {\bf N} \rightarrow {\bf C}} be finitely supported arithmetic functions, and let {\psi: {\bf R} \rightarrow {\bf R}} be a Schwartz function. Then

\displaystyle  \sum_{n=1}^\infty \sum_{m=1}^\infty \frac{f(n)}{n} \frac{\overline{g(m)}}{m} \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \overline{{\mathcal D} g(1+it)} \hat \psi(t)\ dt

where {\hat \psi(t) := \int_{\bf R} \psi(u) e^{itu}\ du} is the Fourier transform of {\psi}. (Note that the finite support of {f,g} and the Schwartz nature of {\psi,\hat \psi} ensure that both sides of the identity are absolutely convergent.)
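Proposition 10 can be tested numerically to high accuracy; in the sketch below (my own illustration, taking {f,g} to be random finitely supported functions and {\psi} a Gaussian, whose Fourier transform under the convention above is {\sqrt{2\pi} e^{-t^2/2}}), the two sides agree to many decimal places:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12
fv = rng.standard_normal(N)                # f(1), ..., f(N): a finitely supported f
gv = rng.standard_normal(N)                # likewise for g (real, so conj(g) = g)
n = np.arange(1, N + 1, dtype=float)

# Left-hand side: double sum against psi(u) = exp(-u^2/2).
u = np.log(n)[:, None] - np.log(n)[None, :]
lhs = np.sum((fv / n)[:, None] * (gv / n)[None, :] * np.exp(-(u ** 2) / 2))

# Right-hand side: (1/2 pi) int Df(1+it) conj(Dg(1+it)) psihat(t) dt, with
# psihat(t) = sqrt(2 pi) exp(-t^2/2); Gaussian decay makes truncation to
# |t| <= 12 plus the trapezoid rule extremely accurate.
t = np.linspace(-12.0, 12.0, 4001)
phase = np.exp(-1j * np.outer(np.log(n), t))      # matrix of n^{-it}
Df = (fv / n) @ phase                             # D f(1+it) on the grid
Dg = (gv / n) @ phase
psihat = np.sqrt(2 * np.pi) * np.exp(-(t ** 2) / 2)
w = np.full(t.size, t[1] - t[0])                  # trapezoid weights
w[0] = w[-1] = (t[1] - t[0]) / 2
rhs = float(np.real((Df * np.conj(Dg) * psihat) @ w)) / (2 * np.pi)

print(f"lhs = {lhs:.10f}, rhs = {rhs:.10f}")
assert abs(lhs - rhs) < 1e-8
```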

The restriction that {f,g} be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.

Proof: By expanding out the Dirichlet series, it suffices to show that

\displaystyle  \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} \frac{1}{n^{it}} \frac{1}{m^{-it}} \hat \psi(t)\ dt

for any natural numbers {n,m}. But this follows from the Fourier inversion formula {\psi(u) = \frac{1}{2\pi} \int_{\bf R} e^{-itu} \hat \psi(t)\ dt} applied at {u = \log n - \log m}. \Box
For applications to Halasz type theorems, one sets {g(n)} equal to the Kronecker delta {\delta_{n=1}}, producing weighted integrals of {{\mathcal D} f(1+it)} of “{L^1}” type. For applications to Matomaki-Radziwill theorems, one instead sets {f=g}, and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of {|{\mathcal D} f(1+it)|^2} of “{L^2}” type:

Exercise 11 (Plancherel type identity) If {f: {\bf N} \rightarrow {\bf C}} is finitely supported, and {\varphi: {\bf R} \rightarrow {\bf R}} is a Schwartz function, establish the identity

\displaystyle  \int_0^\infty |\sum_{n=1}^\infty \frac{f(n)}{n} \varphi(\log n - \log y)|^2 \frac{dy}{y} = \frac{1}{2\pi} \int_{\bf R} |{\mathcal D} f(1+it)|^2 |\hat \varphi(t)|^2\ dt.

In contrast, information about the non-pretentious nature of a multiplicative function {f} will give “pointwise” or “{L^\infty}” type control on the Dirichlet series {{\mathcal D} f(1+it)}, as is suggested from the Euler product factorisation of {{\mathcal D} f}.

It will be convenient to formalise the notion of {L^1}, {L^2}, and {L^\infty} control of the Dirichlet series {{\mathcal D} f}, which as previously mentioned can be viewed as a sort of “Fourier transform” of {f}:

Definition 12 (Fourier norms) Let {f: {\bf N} \rightarrow {\bf C}} be finitely supported, and let {\Omega \subset {\bf R}} be a bounded measurable set. We define the Fourier {L^\infty} norm

\displaystyle  \| f\|_{FL^\infty(\Omega)} := \sup_{t \in \Omega} |{\mathcal D} f(1+it)|,

the Fourier {L^2} norm

\displaystyle  \| f\|_{FL^2(\Omega)} := \left(\int_\Omega |{\mathcal D} f(1+it)|^2\ dt\right)^{1/2},

and the Fourier {L^1} norm

\displaystyle  \| f\|_{FL^1(\Omega)} := \int_\Omega |{\mathcal D} f(1+it)|\ dt.

One could more generally define {FL^p} norms for other exponents {p}, but we will only need the exponents {p=1,2,\infty} in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.

As mentioned above, Halasz’s theorem gives good control on the Fourier {L^\infty} norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier {L^\infty} control via Halasz) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, let {I} be an interval in {[C^{-1} X, CX]} for some {X \geq C \geq 1}, let {R \geq 1}, and let {\Omega \subset {\bf R}} be a bounded measurable set. Show that

\displaystyle  \| f 1_I \|_{FL^\infty(\Omega)} \ll_C \exp( - c \min_{t: \mathrm{dist}(t,\Omega) \leq R} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{R}.

(Hint: you will need to use summation by parts (or an equivalent device) to deal with a {\frac{1}{n}} weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier {L^2} norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 ({L^2} mean value theorem) Let {T \geq 1}, and let {f: {\bf N} \rightarrow {\bf C}} be finitely supported. Show that

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f(m)|)^2.

Conclude in particular that if {f} is supported in {[C^{-1} N, C N]} for some {C \geq 1} and {N \gg T}, then

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll C^{O(1)} \frac{1}{N} \sum_n |f(n)|^2.

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier {L^\infty} estimates are already sufficient to obtain decent control on the (weighted) Fourier {L^1} type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier {L^\infty} control to Fourier {L^1} or {L^2} control more efficiently, the key trick is to use Hölder’s inequality, which when combined with the basic Dirichlet series identity

\displaystyle  {\mathcal D}(f*g) = ({\mathcal D} f) ({\mathcal D} g)

gives the inequalities

\displaystyle  \| f*g \|_{FL^1(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^2(\Omega)} \ \ \ \ \ (5)

and

\displaystyle  \| f*g \|_{FL^2(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^\infty(\Omega)} \ \ \ \ \ (6)
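Both inequalities are pointwise consequences of the identity {{\mathcal D}(f*g) = ({\mathcal D} f)({\mathcal D} g)} together with the Cauchy-Schwarz and Hölder inequalities, and they survive any discretisation of {\Omega}; the sketch below (my own illustration, discretising {\Omega = [-5,5]}) first confirms the identity and then checks (5) and (6) on the grid:

```python
import numpy as np

def dirichlet_convolve(f, g, M):
    """(f*g)(n) = sum_{d|n} f(d) g(n/d) for n <= M; arrays indexed from 1."""
    h = np.zeros(M + 1)
    for d in range(1, len(f)):
        for m in range(1, len(g)):
            if d * m <= M:
                h[d * m] += f[d] * g[m]
    return h

def dseries(f, t):
    """D f(1+it) on a grid t, for f = (unused, f(1), ..., f(N))."""
    n = np.arange(1, len(f), dtype=float)
    return (np.asarray(f[1:]) / n) @ np.exp(-1j * np.outer(np.log(n), t))

rng = np.random.default_rng(1)
N = 20
f = np.concatenate([[0.0], rng.standard_normal(N)])   # supported on 1..N
g = np.concatenate([[0.0], rng.standard_normal(N)])
h = dirichlet_convolve(f, g, N * N)                   # full support, no truncation

t = np.linspace(-5.0, 5.0, 2001)
dt = t[1] - t[0]
Df, Dg, Dh = dseries(f, t), dseries(g, t), dseries(h, t)
assert np.max(np.abs(Dh - Df * Dg)) < 1e-9            # D(f*g) = (Df)(Dg)

FL1_h = np.sum(np.abs(Dh)) * dt                       # discretised FL^1 norm
FL2 = lambda D: np.sqrt(np.sum(np.abs(D) ** 2) * dt)  # discretised FL^2 norm
FLinf = lambda D: np.max(np.abs(D))                   # discretised FL^infty norm
assert FL1_h <= FL2(Df) * FL2(Dg) + 1e-9              # inequality (5)
assert FL2(Dh) <= FL2(Df) * FLinf(Dg) + 1e-9          # inequality (6)
print("convolution identity and inequalities (5), (6) verified on a grid")
```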

The strategy is then to factor (or approximately factor) the original function {f} as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier {L^2} or {L^\infty} estimates on various regions {\Omega}, and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split {f} into the Dirichlet convolution of three factors, one of which will be estimated in {FL^\infty} using the non-pretentiousness hypothesis, and the other two being estimated in {FL^2} using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of {f} into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain {[-T,T]} into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in {FL^\infty}, and the other in {FL^2}; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in {FL^\infty} by construction of the region; in other places, the {FL^\infty} control will come from Exercise 13. Similarly, in some regions, satisfactory {FL^2} control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard {L^2} mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.
There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing {f} as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9) that for various moderately wide ranges {[P,Q]} of primes, the number of prime divisors of a large number {n} in the range {[P,Q]} is almost always close to {\log\log Q - \log\log P}. Thus, if we introduce the arithmetic functions

\displaystyle  w_{[P,Q]}(n) = \frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} 1_{n=p} \ \ \ \ \ (7)

then we have

\displaystyle  1 \approx 1 * w_{[P,Q]}

and more generally we have a twisted approximation

\displaystyle  f \approx f * fw_{[P,Q]}

for multiplicative functions {f}. (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “{FL^2} energy” of a multiplicative function {f} is concentrated in those regions where {f w_{[P,Q]}} is extremely large in a {FL^\infty} sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of {{\mathcal D} f}.
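The approximation {1 \approx 1 * w_{[P,Q]}} can also be tested numerically; in the sketch below (my own, with an arbitrarily chosen range {[P,Q]} of primes), the average of {(1 * w_{[P,Q]})(n)} over a window of {n} is indeed close to {1}, although the fluctuations around this average are not small unless {[P,Q]} is rather wide:

```python
import math

def primes_in(P, Q):
    sieve = bytearray([1]) * (Q + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(Q ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, Q + 1, i)))
    return [p for p in range(P, Q + 1) if sieve[p]]

P, Q = 10, 10 ** 4
norm = math.log(math.log(Q)) - math.log(math.log(P))  # loglog Q - loglog P

# (1 * w_{[P,Q]})(n) = #{p | n : P <= p <= Q} / (loglog Q - loglog P);
# count prime divisors in [P, Q] for every n in [N0, N1] by sieving.
N0, N1 = 10 ** 6, 10 ** 6 + 10 ** 5
counts = [0] * (N1 - N0 + 1)
for p in primes_in(P, Q):
    start = ((N0 + p - 1) // p) * p
    for m in range(start, N1 + 1, p):
        counts[m - N0] += 1

values = [c / norm for c in counts]        # (1 * w_{[P,Q]})(n)
mean = sum(values) / len(values)
print(f"average of (1 * w)(n) over [{N0}, {N1}] = {mean:.3f} (close to 1)")
```

By Mertens' theorem the mean here is {\frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} \frac{1}{p} = 1 + O(\frac{1}{\log\log Q - \log\log P})}, so wider ranges give better approximations.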

— 1. Pretentious distance —

In this section we explore the notion of pretentious distance. The following Hilbert space lemma will be useful for establishing the triangle inequality for this distance:

Lemma 15 (Triangle inequality) Let {u,v,w} be vectors in a real Hilbert space {H} with {\|u\|, \|v\|, \|w\| \leq 1}. Then

\displaystyle  (1-\langle u,w \rangle)^{1/2} \leq (1-\langle u,v \rangle)^{1/2} + (1-\langle v,w \rangle)^{1/2}.

Proof: First suppose that {u,v,w} are unit vectors: {\|u\|=\|v\|=\|w\|=1}. Then by the cosine rule {(1-\langle u,v\rangle)^{1/2} = 2^{-1/2} \|u-v\|}, and similarly for {(1-\langle v,w \rangle)^{1/2}} and {(1-\langle u,w \rangle)^{1/2}}. The claim now follows from the usual triangle inequality {\|u-w\| \leq \|u-v\| + \|v-w\|}.

Now suppose we are in the general case when {\|u\|,\|v\|,\|w\| \leq 1}. In this case we extend {u,v,w} to unit vectors by working in the product {H \times {\bf R}^3} of {H} with the Euclidean space {{\bf R}^3} and applying the previous inequality to the extended unit vectors

\displaystyle  \tilde u := (u, (1-\|u\|^2)^{1/2},0,0)

\displaystyle  \tilde v := (v,0,(1-\|v\|^2)^{1/2},0)

\displaystyle  \tilde w := (w,0,0,(1-\|w\|^2)^{1/2}),

observing that the extensions have the same inner products as the original vectors. \Box

Exercise 16 (Basic properties of pretentious distance) Let {f,g,h,k} be {1}-bounded multiplicative functions, and let {X,Y \geq 2}.

  • (i) (Metric type properties) Show that {\mathbb{D}(f,g;X) \geq 0}, with equality if and only if {f(p)=g(p)} and {|f(p)|=1} for all primes {p \leq X}. Furthermore, show that {\mathbb{D}(f,g;X) = \mathbb{D}(g,f;X)} and {\mathbb{D}(f,h;X) \leq \mathbb{D}(f,g;X) + \mathbb{D}(g,h;X)}. (Hint: for the last property, apply Lemma 15 to a suitable Hilbert space {H}.)
  • (ii) (Alternate triangle inequality) Show that {\mathbb{D}(fh, gk;X) \leq \mathbb{D}(f,g;X) + \mathbb{D}(h,k;X)}.
  • (iii) (Bounds) One has

    \displaystyle  0 \leq \mathbb{D}(f,g;X)^2 \leq \log\log X + O(1)

    and if {X \leq Y}, then

    \displaystyle  \mathbb{D}(f,g;X)^2 \leq \mathbb{D}(f,g;Y)^2 \leq \mathbb{D}(f,g;X)^2 + \log\frac{\log Y}{\log X} + O(1).

  • (iv) (Invariance) One has {\mathbb{D}(\overline{f},\overline{g};X) = \mathbb{D}(f,g;X)}, {\mathbb{D}(fh,g;X) = \mathbb{D}(f,g\overline{h};X)} and

    \displaystyle  |\mathbb{D}(fh,gh;X)^2 - \mathbb{D}(f,g;X)^2| \leq \mathbb{D}(h,h;X)^2.

    In particular, if {|h(p)|=1} for all {p}, then {\mathbb{D}(fh,gh;X) = \mathbb{D}(f,g;X)}.

Exercise 17 If {\chi, \chi'} are Dirichlet characters of periods {q,q'} respectively induced from the same primitive character, and {X \geq 2}, show that {\mathbb{D}(\chi,\chi';X) \ll \log\log\log(C+qq')} for some absolute constant {C>0} (the only purpose of which is to keep the triple logarithm {\log\log\log(C+qq')} positive). (Hint: control the contributions of the primes {p} in each dyadic block {2^k \leq p < 2^{k+1}} separately for {\log qq' \ll 2^k \ll qq'}.)

Next, we relate pretentious distance to the value of Dirichlet series just to the right of the critical strip. There is an annoying minor technicality that the prime {2} has to be treated separately, but this will not cause too much trouble.

Lemma 18 (Dirichlet series and pretentious distance) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, {X \geq 10}, and {t \in {\bf R}}. Then

\displaystyle  \frac{1}{\log X} |{\mathcal D} f(1+\frac{1}{\log X}+it)| \asymp |\sum_{j=0}^\infty \frac{f(2^j)}{2^{(1+\frac{1}{\log X}+it)j}}| \ \ \ \ \ (8)

\displaystyle  \times \exp( - {\mathbb D}(f,n \mapsto n^{it};X)^2 ).

In particular, we always have the upper bound

\displaystyle  \frac{1}{\log X} |{\mathcal D} f(1+\frac{1}{\log X}+it)| \ll \exp( - {\mathbb D}(f,n \mapsto n^{it};X)^2 ), \ \ \ \ \ (9)

and if one imposes the technical condition that either {f(2^j) = f(2)^j} for all {j} or {f(2^j) = 0} for all {j > 1}, then

\displaystyle  \frac{1}{\log X} |{\mathcal D} f(1+\frac{1}{\log X}+it)| \asymp \exp( -{\mathbb D}(f,n \mapsto n^{it};X)^2 ). \ \ \ \ \ (10)

If {f(p^j) = 0} for all {p > X} and {j \geq 1}, then we may delete the {\frac{1}{\log X}} terms in the above claims.

Proof: By replacing {f} with {n \mapsto f(n) n^{-it}} we may assume without loss of generality that {t=0}. We begin with the first claim (8). By expanding out the Euler product, the left-hand side of (8) is equal to

\displaystyle  \frac{1}{\log X} \prod_p |\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}|

and from Definition 5 and Mertens’ theorem we have

\displaystyle {\mathbb D}(f,1;X)^2 = \log\log X - \sum_{p \leq X} \frac{\mathrm{Re} f(p)}{p} + O(1)

and so it will suffice, on canceling the {p=2} factor and taking logarithms, to show that

\displaystyle  \sum_{p>2} \log |\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}| = \sum_{p \leq X} \frac{\mathrm{Re} f(p)}{p} + O(1).

For {p>2}, the quantity {\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}} differs from {1} by at most {\sum_{j=1}^\infty \frac{1}{p^j} \leq \frac{1}{2}}. Also we have

\displaystyle \sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}} = 1 + \frac{f(p)}{p^{1+\frac{1}{\log X}}} + O(\frac{1}{p^2})

and hence by Taylor expansion

\displaystyle \log |\sum_{j=0}^\infty \frac{f(p^j)}{p^{(1+\frac{1}{\log X})j}}| = \frac{\mathrm{Re} f(p)}{p^{1+\frac{1}{\log X}}} + O(\frac{1}{p^2}).

By the triangle inequality, it thus suffices to show that

\displaystyle  \sum_{p \leq X} \frac{1}{p} - \frac{1}{p^{1+\frac{1}{\log X}}}, \sum_{p>X} \frac{1}{p^{1+\frac{1}{\log X}}} \ll 1.

But the first bound follows from the mean value estimate {\frac{1}{p} - \frac{1}{p^{1+\frac{1}{\log X}}} \ll \frac{\log p}{p \log X}} and Mertens’ theorems, while the second bound follows from summing the bounds

\displaystyle  \sum_{X^{2^j} < p \leq X^{2^{j+1}}} \frac{1}{p^{1+\frac{1}{\log X}}} \ll 2^{-2^j}

that also arise from Mertens’ theorems.
The quantity {\sum_{j=0}^\infty \frac{f(2^j)}{2^{(1+\frac{1}{\log X}+it)j}}} is bounded in magnitude by {\sum_{j=0}^\infty \frac{1}{2^j} = 2}, giving (9). Under either of the two technical conditions listed, this quantity is equal to either {\frac{1}{1-f(2)/2^{1+\frac{1}{\log X}}}} or {1 + \frac{f(2)}{2^{1+\frac{1}{\log X}}}}, and in either case it is comparable in magnitude to {1}, giving (10).

If {f(p^j)=0} for {p>X} and {j \geq 1}, we may repeat the above arguments with the {\frac{1}{\log X}} terms deleted, since we no longer need to control the tail contribution {\sum_{p>X}}. \Box
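As a numerical illustration of (10) (my own; not needed elsewhere), take {f = \lambda} at {t=0}: the technical condition {\lambda(2^j) = \lambda(2)^j} holds, and the Dirichlet series can be evaluated through the identity {{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}} mentioned in the introduction, computing {\zeta} by truncation plus the standard Euler-Maclaurin tail correction:

```python
import math

def zeta_real(s, M=20000):
    """zeta(s) for real s > 1 via truncation with Euler-Maclaurin tail terms."""
    return (sum(n ** (-s) for n in range(1, M + 1))
            + M ** (1 - s) / (s - 1) - 0.5 * M ** (-s))

def sum_recip_primes(X):
    """sum_{p <= X} 1/p via a simple sieve."""
    sieve = bytearray([1]) * (X + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(X ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, X + 1, i)))
    return sum(1.0 / p for p in range(2, X + 1) if sieve[p])

X = 10 ** 4
s = 1 + 1 / math.log(X)
# left side of (10): (1/log X) |D lambda(1 + 1/log X)| = zeta(2s)/(zeta(s) log X)
lhs = abs(zeta_real(2 * s) / zeta_real(s)) / math.log(X)
# right side: exp(-D(lambda, 1; X)^2), where D^2 = 2 sum_{p <= X} 1/p
D2 = 2 * sum_recip_primes(X)
rhs = math.exp(-D2)
print(f"lhs = {lhs:.5f}, rhs = {rhs:.5f}, ratio = {lhs / rhs:.2f}")
# The ratio is a bounded constant, illustrating the comparability in (10).
```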

Now we explore the geometry of the Archimedean characters {n \mapsto n^{it}} with respect to pretentious distance.

Proposition 19 If {X} is sufficiently large, then

\displaystyle  \mathbb{D}(1, n \mapsto n^{it}; X) \asymp \sqrt{\log(1 + \min(|t|,1) \log X)}

for {|t| \ll \exp( \log^{1.4} X )}. In particular one has

\displaystyle  \mathbb{D}(1, n \mapsto n^{it}; X) \asymp \sqrt{\log\log X}

for {1 \ll |t| \ll \exp( \log^{1.4} X )}.

The precise exponent {1.4} here is not of particular significance; any constant between {1} and {1.5} would work for our application to the Matomaki-Radziwill theorem, with the most important feature being that {\exp(\log^{1.4} X)} grows significantly faster than any fixed power {X^C} of {X}. The ability to raise the exponent beyond {1} will be provided by the Vinogradov estimates for the zeta function. (For Halasz’s theorem one only needs these bounds in the easier range {|t| \leq X}, which does not require the Vinogradov estimates.) As a particular corollary of this proposition and Exercise 16(iii), we see that

\displaystyle  \mathbb{D}(n \mapsto n^{it}, n \mapsto n^{it'}; X) \asymp \sqrt{\log\log X}

whenever {1 \ll |t-t'| \ll \exp( \log^{1.4} X )}; thus the Archimedean characters {n \mapsto n^{it}} do not pretend to be like each other at all once the {t} parameter is changed by at least a unit distance (but not changed by an enormous amount).
Proof: By Definition 5, our task is to show that

\displaystyle  \sum_{p \leq X} \frac{1 - \mathrm{Re} p^{it}}{p} \asymp \log(1 + \min(|t|,1) \log X).

We begin with the upper bound. For {|t| \gg 1}, the claim follows from Mertens’ theorems and the triangle inequality. For {|t| \ll \frac{1}{\log X}}, we bound

\displaystyle  1 - \mathrm{Re} p^{it} = 1 - \cos(t \log p) \ll (t \log p)^2 \ll |t|^2 \log X \log p

and the claim again follows from Mertens’ theorems (note that {\log(1 + \min(|t|,1) \log X) \asymp |t| \log X} in this case). For {\frac{1}{\log X} \ll |t| \ll 1}, we bound {1-\cos(t\log p)} by {O( |t| \log p )} for {\log p \leq \frac{1}{|t|}} and by {O(1)} for {\frac{1}{|t|} \leq \log p \leq \log X}, and the claim once again follows from Mertens’ theorems.
Now we establish the lower bound. We first work in the range {|t| \leq \frac{100}{\log X}}. In this case we have a matching lower bound

\displaystyle  1 - \mathrm{Re} p^{it} = 1 - \cos(t \log p) \gg (t \log p)^2

for {p \leq X^c} and some small absolute constant {c>0}, and hence

\displaystyle  \sum_{p \leq X^c} \frac{1 - \mathrm{Re} p^{it}}{p} \gg_c |t|^2 \sum_{X^{c/2} \leq p \leq X^c} \frac{\log^2 X}{p} \gg_c |t|^2 \log^2 X

giving the lower bound. Now suppose that {\frac{100}{\log X} \leq |t| \leq 100}. Applying Lemma 18 with {f=1} and {X} replaced by some {y>10}, we have

\displaystyle  \frac{1}{\log y} |\zeta(1+\frac{1}{\log y}+it)| \asymp \exp( -{\mathbb D}(1,n \mapsto n^{it};y)^2 ); \ \ \ \ \ (11)

for {|t| \leq 100} and {\log y \gg 1/|t|} one has

\displaystyle  |\zeta(1+\frac{1}{\log y}+it)| \asymp \frac{1}{|t|}

and thus

\displaystyle  {\mathbb D}(1,n \mapsto n^{it};y)^2 = \log\log y - \log \frac{1}{|t|} + O(1)

or equivalently by Mertens’ theorem

\displaystyle  \sum_{p \leq y} \frac{\mathrm{Re} p^{it}}{p} = \log \frac{1}{|t|} + O(1).

Applying this bound with {y} replaced by {\exp(1/|t|)} and by {X} we conclude that

\displaystyle \sum_{\exp(1/|t|) \leq p \leq X} \frac{\mathrm{Re} p^{it}}{p} \ll 1

and hence by Mertens’ theorem and the triangle inequality (if the constant {100} is chosen large enough)

\displaystyle \sum_{\exp(1/|t|) \leq p \leq X} \frac{1 - \mathrm{Re} p^{it}}{p} \gg \log( |t|\log X),

giving the claim.
Finally, assume that {100 \leq |t| \ll \exp(\log^{1.4} X)}. From Mertens’ theorems we have

\displaystyle  \sum_{\exp(\log^{0.99} X) \leq p \leq X} \frac{1}{p} \gg \log\log X

so by the triangle inequality it will suffice to show that

\displaystyle  \sum_{\exp(\log^{0.99} X) \leq p \leq X} \frac{1}{p^{1+it}} \ll 1.

Taking logarithms in (11) for {y = X, \exp(\log^{0.99} X)} we have

\displaystyle  \sum_{p \leq X} \frac{1}{p^{1+it}} = \log \zeta(1+\frac{1}{\log X} + it) + O(1)

and also

\displaystyle  \sum_{p \leq \exp(\log^{0.99} X)} \frac{1}{p^{1+it}} = \log \zeta(1+\frac{1}{\log^{0.99} X} + it) + O(1)

hence by the fundamental theorem of calculus

\displaystyle  \sum_{\exp(\log^{0.99} X) \leq p \leq X} \frac{1}{p^{1+it}} \ll \frac{1}{\log^{0.99} X} |\frac{\zeta'}{\zeta}(\sigma+it)| + 1

for some {1+\frac{1}{\log X} \leq \sigma \leq 1+\frac{1}{\log^{0.99} X}}. However, from the Vinogradov-Korobov estimates (Exercise 43 of Notes 2) we have

\displaystyle  \frac{\zeta'}{\zeta}(\sigma+it) \ll \log^{2/3} |t|

whenever {\sigma > 1 + \frac{(\log\log |t|)^{1/3}}{\log^{2/3} |t|}}; since we are assuming {\log |t| \ll \log^{1.4} X}, the claim follows. \Box
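Though Proposition 19 is an asymptotic statement, the comparison can be sanity-checked numerically at modest scales. The following sketch (an illustration only; the scale {X = 10^5}, the sample values of {t}, and the tolerance are ad hoc choices, not from the notes) computes {\sum_{p \leq X} \frac{1 - \cos(t \log p)}{p}} directly and compares it with {\log(1 + \min(|t|,1)\log X)}:

```python
import math

# Sieve primes up to X (X = 10^5 is an ad hoc choice for illustration).
X = 10 ** 5
is_prime = bytearray([1]) * (X + 1)
is_prime[0] = is_prime[1] = 0
for p in range(2, int(X ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p * p :: p] = bytearray(len(is_prime[p * p :: p]))
primes = [p for p in range(2, X + 1) if is_prime[p]]

def dist_sq(t):
    # the squared pretentious distance D(1, n -> n^{it}; X)^2
    return sum((1 - math.cos(t * math.log(p))) / p for p in primes)

ratios = []
for t in (0.5, 2.0, 10.0):
    d = dist_sq(t)
    target = math.log(1 + min(abs(t), 1.0) * math.log(X))
    ratios.append(d / target)
    print(t, d, target)
```

At this scale the two quantities already agree to within a bounded factor, consistent with the {\asymp} in the proposition.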

Exercise 20 Assume the Riemann hypothesis. Establish a bound of the form

\displaystyle  \frac{1}{P} \sum_{n \leq P} \Lambda(n) n^{-it} \ll \frac{1}{1+|t|} + P^{-c}

for some absolute constant {c>0} whenever {P \geq \log^C(2+|t|)} for a sufficiently large absolute constant {C}. (Hint: use Perron’s formula and shift the contour to within {\frac{1}{\log P}} of the critical line.) Use this to conclude that the upper bound {|t| \ll \exp(\log^{1.4} X)} in Proposition 19 can be relaxed (assuming RH) to {|t| \ll \exp(\exp(\log^{0.9} X))}.

Exercise 21 Let {f} be a {1}-bounded multiplicative function with {|f(p)|=1} for all {p}. For any {X \geq 2}, show that

\displaystyle  \liminf_{t \rightarrow \infty} \mathbb{D}(f, n \mapsto n^{it}; X) = 0.

Thus some sort of upper bound on {t} in Proposition 19 is necessary.

Exercise 22 Let {\chi} be a non-principal character of modulus {q}, and let {X} be sufficiently large depending on {q}. Show that

\displaystyle  \mathbb{D}(1, n \mapsto \chi(n) n^{it}; X) \asymp \sqrt{\log\log X}

for all {t = O( \exp( \log^{1.4} X ) )}. (One will need to adapt the Vinogradov-Korobov theory to Dirichlet {L}-functions.)

Proposition 19 measures how close the function {1} lies to the Archimedean characters {n \mapsto n^{it}}. Using the triangle inequality, one can then lower bound the distance of any other {1}-bounded multiplicative function to these characters:

Proposition 23 Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, there exists a real number {t_0} with {|t_0| \leq \exp(\log^{1.4} X)} such that

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \gg \sqrt{\log(1 + \min(|t-t_0|,1) \log X)}

whenever {|t| \leq \exp(\log^{1.4} X)}. In particular we have

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \asymp \sqrt{\log\log X}

if {|t| \leq \exp(\log^{1.4} X)} and {|t-t_0| \geq 1}. If {f} is real-valued, one can take {t_0=0}.

Proof: For the first claim, choose {t_0} to minimize {\mathbb{D}( f, n \mapsto n^{it_0}; X )} among all real numbers with {|t_0| \leq \exp(\log^{1.4} X)}. Then for any other {t}, we see from the triangle inequality that

\displaystyle  \mathbb{D}( n \mapsto n^{it}, n \mapsto n^{it_0}; X) \leq \mathbb{D}( f, n \mapsto n^{it}; X ) + \mathbb{D}( f, n \mapsto n^{it_0}; X ) \leq 2 \mathbb{D}( f, n \mapsto n^{it}; X ).

But from Proposition 19 we have

\displaystyle  \mathbb{D}( n \mapsto n^{it}, n \mapsto n^{it_0}; X) = \mathbb{D}( 1, n \mapsto n^{i(t-t_0)}; X)

\displaystyle \gg \sqrt{\log(1 + \min(|t-t_0|,1) \log X)},

giving the first claim. When {f} is real valued, we can similarly use the triangle inequality to bound

\displaystyle  \mathbb{D}( n \mapsto n^{it}, n \mapsto n^{-it}; X)

\displaystyle \leq \mathbb{D}( f, n \mapsto n^{it}; X ) + \mathbb{D}( f, n \mapsto n^{-it}; X )

\displaystyle = 2 \mathbb{D}( f, n \mapsto n^{it}; X )

which gives

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \gg \sqrt{\log(1 + \min(|t|,1) \log X)}

giving the second claim. \Box
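The manipulations above use only the fact that {\mathbb{D}} obeys the pretentious triangle inequality. As a quick concrete check (the random model, the cutoff {X = 10^4}, and the fixed seed are illustrative assumptions, not from the text), one can verify {\mathbb{D}(f,h;X) \leq \mathbb{D}(f,g;X) + \mathbb{D}(g,h;X)} for randomly chosen unimodular data at the primes:

```python
import math
import random

X = 10 ** 4  # illustrative cutoff
is_prime = bytearray([1]) * (X + 1)
is_prime[0] = is_prime[1] = 0
for p in range(2, int(X ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p * p :: p] = bytearray(len(is_prime[p * p :: p]))
primes = [p for p in range(2, X + 1) if is_prime[p]]

random.seed(0)

def random_values():
    # independent unimodular values f(p) at each prime p <= X
    vals = {}
    for p in primes:
        a = random.uniform(0, 2 * math.pi)
        vals[p] = complex(math.cos(a), math.sin(a))
    return vals

def D(f, g):
    # pretentious distance D(f, g; X); it depends only on the values at primes
    return math.sqrt(sum((1 - (f[p] * g[p].conjugate()).real) / p for p in primes))

f, g, h = random_values(), random_values(), random_values()
lhs, rhs = D(f, h), D(f, g) + D(g, h)
print(lhs, rhs)
```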
We can now quantify the non-pretentious nature of the Möbius and Liouville functions.

Exercise 24

  • (i) If {X} is sufficiently large, and {f} is a real-valued {1}-bounded multiplicative function, show that

    \displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X) \gg \mathbb{D}(f, 1; X)

    whenever {|t| \ll \exp(\log^{1.4} X)}.

  • (ii) Show that

    \displaystyle  \mathbb{D}( \mu, n \mapsto n^{it}; X) = \mathbb{D}( \lambda, n \mapsto n^{it}; X) \gg \sqrt{\log\log X}

    whenever {|t| \ll \exp(\log^{1.4} X)} and {X} is sufficiently large.

  • (iii) If {\chi} is a Dirichlet character of some period {q}, show that

    \displaystyle  \mathbb{D}( \mu, n \mapsto \chi(n) n^{it}; X) = \mathbb{D}( \lambda, n \mapsto \chi(n) n^{it}; X) \gg \sqrt{\log\log X}

    whenever {|t| \ll \exp(\log^{1.4} X)} and {X} is sufficiently large depending on {q}.

— 2. Halasz’s inequality —

We now prove Halasz’s inequality. As a warm up, we prove Proposition 6:

Proof: (Proof of Proposition 6) By Exercise 16(iv) we may normalise {g=1}. We may assume that {f(p^j)=0} when {p>X}, since the value of {f} on these primes has no impact on the sum {\sum_{n \leq X} \frac{f(n)}{n}} or on {\mathbb{D}(f,1;X)}. In particular, from Euler products we now have the absolute convergence {\sum_{n=1}^\infty \frac{|f(n)|}{n} < \infty}. Let {0 < \varepsilon < 1} be a small quantity to be optimised later, and let {\psi: {\bf R} \rightarrow {\bf R}} be a smooth function supported on {[-1,1]} that equals one on {[-1+\varepsilon,1-\varepsilon]} and obeys the derivative bounds {\psi^{(j)}(u) \ll_j \varepsilon^{-j}} on {[-1,1] \backslash [-1+\varepsilon,1-\varepsilon]}, so that by integration by parts the Fourier transform {\hat \psi(t) = \int_{\bf R} \psi(u) e^{itu}\ du} obeys the bounds

\displaystyle  \hat \psi(t) \ll_j \min( 1, \frac{1}{|t|}, \frac{1}{\varepsilon^{j-1} |t|^j} )

for any {t \in {\bf R}} and {j \geq 2}. From the triangle inequality we have

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} = \frac{1}{\log X} \sum_{n=1}^\infty \frac{f(n)}{n} \psi(\frac{\log n}{\log X}) + O( \varepsilon ).

Applying Proposition 10 to a finite truncation of {f}, and then using the absolute convergence of {\sum_{n=1}^\infty \frac{|f(n)|}{n}} and dominated convergence to eliminate the truncation (or by using Proposition 7 of Notes 2 and then shifting the contour), we can write the right-hand side as

\displaystyle  \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \hat \psi( t \log X )\ dt + O(\varepsilon)

which after rescaling by {\log X} gives

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} \ll_j \int_{\bf R} \frac{1}{\log X} |{\mathcal D} f(1+\frac{it}{\log X})| \min( 1, \frac{1}{|t|}, \frac{1}{\varepsilon^{j-1} |t|^j} )\ dt + \varepsilon. \ \ \ \ \ (12)

Now from Lemma 18 one has

\displaystyle  \frac{1}{\log X} {\mathcal D} f(1+\frac{it}{\log X}) \ll \exp( - \mathbb{D}( f, n \mapsto n^{-it/\log X}; X)^2 ) \ \ \ \ \ (13)

\displaystyle \ll \exp( -\frac{M}{2} + O(\log(1 + \min(|t/\log X|,1) \log X)) )

\displaystyle \leq e^{-M/2} (1 + |t|)^{O(1)},

where {M := \mathbb{D}(f,1;X)^2}, and thus we can bound (12) by

\displaystyle  \ll_j e^{-M/2} \int_{\bf R} (1+|t|)^{O(1)} \min( 1, \frac{1}{|t|}, \frac{1}{\varepsilon^{j-1} |t|^j} )\ dt + \varepsilon

\displaystyle  \ll e^{-M/2} \varepsilon^{-O(1)} + \varepsilon

if {j} is chosen to be a sufficiently large absolute constant. The claim then follows by optimising in {\varepsilon}. \Box
We remark that an elementary proof of Proposition 6 with {c=1/2} was given in Proposition 1.2.6 of this monograph of Granville and Soundararajan. Also, in this paper of Lamzouri and Mangerel the variant estimate

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} \ll \exp( - \sup_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T \log X}

is proven for any {0 < T \leq 1}. (Thanks to Andrew Granville for these references.)
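For a concrete instance of Proposition 6, take {f = \mu}: then {\mathbb{D}(\mu,1;X)^2 = \sum_{p \leq X} \frac{2}{p} = 2\log\log X + O(1)}, so the proposition predicts a logarithmic average of size {O(\log^{-2c} X)}. A small numerical check (the scale {X = 10^5} and the tolerance are ad hoc illustrative choices) is consistent with this smallness:

```python
import math

X = 10 ** 5  # ad hoc scale for illustration

# sieve the Moebius function up to X
mu = [1] * (X + 1)
is_prime = [True] * (X + 1)
for p in range(2, X + 1):
    if is_prime[p]:
        for m in range(2 * p, X + 1, p):
            is_prime[m] = False
        for m in range(p, X + 1, p):
            mu[m] *= -1
        for m in range(p * p, X + 1, p * p):
            mu[m] = 0  # n is not squarefree

# the normalised logarithmic average (1/log X) sum_{n <= X} mu(n)/n
log_avg = abs(sum(mu[n] / n for n in range(1, X + 1))) / math.log(X)
print(log_avg)
```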
It was observed by Granville, Koukoulopoulos, and Matomaki that we can also sharpen Proposition 6 when {f} is non-negative by purely elementary methods:

Proposition 25 Let {X \geq 10}, and suppose that {f: {\bf N} \rightarrow [0,1]} is multiplicative, {1}-bounded, and non-negative. Then

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n} \asymp \exp( -{\mathbb D}(f,1;X)^2 ).

Proof: From Definition 5 and Mertens’ theorems the estimate is equivalent to

\displaystyle  \sum_{n \leq X} \frac{f(n)}{n} \asymp \exp( \sum_{p \leq X} \frac{f(p)}{p} ).

For the upper bound, we may assume that {f(p^j)} vanishes for {p>X} since these primes make no contribution to either side. We can then bound

\displaystyle  \sum_{n \leq X} \frac{f(n)}{n} \leq \sum_{n=1}^\infty \frac{f(n)}{n}

\displaystyle  = \prod_{p \leq X} \sum_{j=0}^\infty \frac{f(p^j)}{p^j}

\displaystyle  = \prod_{p \leq X} \exp( \frac{f(p)}{p} + O(\frac{1}{p^2}) )

\displaystyle  \ll \exp( \sum_{p \leq X} \frac{f(p)}{p} ).

For the lower bound, we let {g(n)} be the {1}-bounded multiplicative function with {g(p) := 1-f(p)} and {g(p^j) = 1} for {j > 1} and all primes {p}. Then we observe the pointwise bound {f*g(n) \geq 1} for all {n}, hence

\displaystyle  \sum_{n \leq X} \frac{f(n)}{n} \sum_{n \leq X} \frac{g(n)}{n} \geq \sum_{n \leq X} \frac{1}{n} \gg \log X.

By the upper bound just obtained and Mertens’ theorems, we have

\displaystyle  \sum_{n \leq X} \frac{g(n)}{n} \ll \exp( \sum_{p \leq X} \frac{g(p)}{p} ) \ll \log X \exp( -\sum_{p \leq X} \frac{f(p)}{p} )

and the claim follows. \Box
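One can test the two-sided bound of Proposition 25 numerically on a simple example such as the completely multiplicative function {f(n) = (1/2)^{\Omega(n)}} (the choice of {f}, the scale {X = 10^5}, and the tolerance below are illustrative assumptions, not from the text):

```python
import math

X = 10 ** 5  # illustrative scale

# smallest-prime-factor sieve, used to compute Omega(n), the number of
# prime factors of n counted with multiplicity
spf = list(range(X + 1))
for p in range(2, int(X ** 0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, X + 1, p):
            if spf[m] == m:
                spf[m] = p

big_omega = [0] * (X + 1)
for n in range(2, X + 1):
    big_omega[n] = big_omega[n // spf[n]] + 1

# f(n) = (1/2)^{Omega(n)} is multiplicative with values in [0,1]
lhs = sum(0.5 ** big_omega[n] / n for n in range(1, X + 1)) / math.log(X)

# exp(-D(f,1;X)^2) = exp(-sum_{p <= X} (1 - f(p))/p) with f(p) = 1/2
rhs = math.exp(-sum(0.5 / p for p in range(2, X + 1) if spf[p] == p))

ratio = lhs / rhs
print(lhs, rhs, ratio)
```

Already at this modest scale the two sides agree to within a bounded factor, as the proposition asserts.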
Now we can prove Theorem 7.

Proof: (Proof of Theorem 7) We may assume that {T \geq 10}, since the claim is trivial otherwise. On the other hand, for {T > \log^C X} for a sufficiently large {C}, the second term on the right-hand side is dominated by the first, so the estimate does not become any stronger as one increases {T} beyond {\log^C X}, and hence we may assume without loss of generality that {T \leq \log^{O(1)} X}.

We abbreviate {M := \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2}, thus we wish to show that

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c M ) + \frac{1}{T}. \ \ \ \ \ (14)

It is convenient to remove some exceptional values of {n}. Let {0 < \varepsilon < \frac{1}{10}} be a small quantity to be chosen later, subject to the restriction

\displaystyle  X^{\varepsilon^3} \geq T^{10}. \ \ \ \ \ (15)

From standard sieves (e.g., Theorem 32 from Notes 4), we see that the proportion of numbers in {[X/2,X]} that do not have a “large” prime factor in {[X^\varepsilon,X]}, or do not have a “medium” prime factor in {[X^{\varepsilon^2},X^\varepsilon)}, is {O(\varepsilon)}. Thus by paying an error of {\varepsilon}, we may restrict to numbers that have at least one “large” prime factor in {[X^\varepsilon,X]} and at least one “medium” prime factor in {[X^{\varepsilon^2},X^\varepsilon)} (and no prime factor larger than {X}). This is the same as replacing {f} with the Dirichlet convolution

\displaystyle  \tilde f := f_1 * f_2 * f_3

where {f_1} is the restriction of {f} to numbers with all prime factors in the “small” range {[1,X^{\varepsilon^2})}, {f_2} is the restriction of {f} to numbers in {[2,X]} with all prime factors in the “medium” range {[X^{\varepsilon^2},X^\varepsilon)}, and {f_3} is the restriction of {f} to numbers in {[2,X]} with all prime factors in the “large” range {[X^\varepsilon,X]}. We can thus write

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) = \frac{1}{X} \sum_{n \leq X} \tilde f(n) + O(\varepsilon).

This we can write in turn as

\displaystyle  \sum_{n=1}^\infty \frac{\tilde f(n)}{n} \psi(\log n - \log X) + O(\varepsilon)

where {\psi(u) := 1_{(-\infty, 0]}(u) e^u}. It is not advantageous to immediately apply Proposition 10 due to the rough nature of {\psi} (which is not even continuous, let alone Schwartz). But if we let {\varphi} be a Schwartz function of total mass {1} whose Fourier transform is supported on {[-1,1]}, and define the mollified function

\displaystyle  \tilde \psi(u) := \int_{\bf R} \psi(u - \frac{v}{T}) \varphi(v)\ dv

then one easily checks that

\displaystyle  \psi(u) - \tilde \psi(u) \ll (1 + T |u|)^{-10}

which from the triangle inequality soon gives the bounds

\displaystyle  \sum_{n=1}^\infty \frac{\tilde f(n)}{n} (\psi-\tilde \psi)(\log n - \log X) \ll \frac{1}{T}.

Hence we may write

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) = \sum_{n=1}^\infty \frac{\tilde f(n)}{n} \tilde \psi(\log n - \log X) + O(\varepsilon) + O(\frac{1}{T}).

Now we apply Proposition 10 and the triangle inequality to bound this by

\displaystyle  \int_{\bf R} |{\mathcal D} \tilde f(1 + it)| |\hat{\tilde \psi}(t)|\ dt + \varepsilon + \frac{1}{T}.

But we may factor

\displaystyle  \hat{\tilde \psi}(t) = \hat \psi(t) \hat \varphi(t/T)

and we may rather crudely bound {\hat \psi(t) = O(1)}, hence we have

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \| \tilde f \|_{FL^1([-T,T])} + \varepsilon + \frac{1}{T}.

Using the Hölder inequalities (5), (6) we have

\displaystyle  \| \tilde f \|_{FL^1([-T,T])} \leq \| f_1 \|_{FL^\infty([-T,T])} \| f_2 \|_{FL^2([-T,T])} \| f_3 \|_{FL^2([-T,T])}.

From Exercise 14 we have

\displaystyle  \| f_2 \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f_2(m)|)^2.

Note that {f_2} is supported on {[X^{\varepsilon^2}, X]} and hence {n} is effectively also restricted to the range {X^{\varepsilon^2} \ll n \ll X}. From standard sieves (using (15)), we have

\displaystyle  \frac{T}{n} \sum_{m: |n-m| \leq n/T} |f_2(m)| \ll \frac{\varepsilon^{-O(1)}}{\log X}

and thus

\displaystyle \| f_2 \|_{FL^2([-T,T])}^2 \ll \frac{\varepsilon^{-O(1)}}{\log X}.

Similarly one has

\displaystyle \| f_3 \|_{FL^2([-T,T])}^2 \ll \frac{\varepsilon^{-O(1)}}{\log X}.

Finally, from Lemma 18 one has

\displaystyle  \| f_1 \|_{FL^\infty([-T,T])} \ll \log X^{\varepsilon^2} \exp( - \inf_{|t| \leq T} \mathbb{D}( f, n \mapsto n^{-it}; X^{\varepsilon^2})^2 )

\displaystyle  \ll \log X \exp( - \inf_{|t| \leq T} \mathbb{D}( f, n \mapsto n^{-it}; X)^2)

\displaystyle  = e^{-M} \log X.

Putting all this together, we conclude that

\displaystyle \frac{1}{X} \sum_{n \leq X} f(n) \ll \varepsilon^{-O(1)} e^{-M} + \varepsilon + \frac{1}{T}.

Setting {\varepsilon := e^{-cM}} for some sufficiently small constant {c>0} (which in particular will ensure (15) since {M \ll \log\log X + O(1)}), we obtain the claim. \Box
One can optimise this argument to make the constant {c} in Theorem 7 arbitrarily close to {1/2}; see this previous post. With an even more refined argument, one can prove the sharper estimate

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll (1+M) e^{-M} + \frac{1}{T}

with {M := \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2}, a result initially due to Montgomery and Tenenbaum; see Theorem 2.3.1 of this text of Granville and Soundararajan. In the case of non-negative {f}, an elementary argument gives the stronger bound {\frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - \mathbb{D}(f,1;X)^2)}; see Corollary 1.2.3 of the same text. However, the slightly weaker estimates in Theorem 7 will suffice for our applications. In particular, combining Theorem 7 with Exercise 24(i), we conclude the following: let {X} be sufficiently large. Then for any real-valued {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \mathbb{D}(f, 1;X)^2 )

for an absolute constant {c>0}.

Thus for instance, setting {f = \mu}, we can use Wirsing’s theorem and Exercise 24 (or Mertens’ theorem) to recover a form of the prime number theorem with a modestly decaying error term, in that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) \ll \log^{-c} X \ \ \ \ \ (16)

for all large {X} and some absolute constant {c>0}. (Admittedly, we did use the far stronger Vinogradov-Korobov estimates earlier in this set of notes; but a careful inspection reveals that those estimates were not used in the proof of (16), so this is a non-circular proof of the prime number theorem.)
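Even at modest scales the decay (16) is visible numerically. The following sketch (the scale {X = 10^5} is an ad hoc illustration) checks that the Möbius mean value is already well below {1/\log X}:

```python
import math

X = 10 ** 5  # ad hoc scale

# sieve the Moebius function up to X
mu = [1] * (X + 1)
is_prime = [True] * (X + 1)
for p in range(2, X + 1):
    if is_prime[p]:
        for m in range(2 * p, X + 1, p):
            is_prime[m] = False
        for m in range(p, X + 1, p):
            mu[m] *= -1
        for m in range(p * p, X + 1, p * p):
            mu[m] = 0

mean = sum(mu[1 : X + 1]) / X  # (1/X) sum_{n <= X} mu(n)
print(mean, 1 / math.log(X))
```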

— 3. The Matomaki-Radziwill theorem —

We now give the proof of the Matomaki-Radziwill theorem, though we will leave several of the details to exercises. We first make a small but convenient reduction:

Exercise 26 Show that to prove Theorem 8, it suffices to do so for functions {f} that are completely multiplicative. (This is similar to Exercise 1.)

Now we use Exercise 11 to phrase the theorem in an equivalent Fourier form:

Theorem 27 (Matomaki-Radziwill theorem, Fourier form) Let {\eta>0}, and let {1 \leq T \leq X}, with {X/T} sufficiently large depending on {\eta}. Let {\psi:{\bf R} \rightarrow {\bf R}} be a fixed smooth compactly supported function, and set {\psi_X(n) := \psi(\log n - \log X)}. Suppose that {f} is a {1}-bounded completely multiplicative function such that

\displaystyle  \| f \psi_X \|_{FL^2([-T,T])} \gg \eta. \ \ \ \ \ (17)

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta T}.

Let us assume Theorem 27 for the moment and see how it implies Theorem 8. In the latter theorem we may assume without loss of generality that {\eta} is small. We may assume that {H \leq \eta^2 X}, since the case {\eta^2 X \leq H \leq X} follows easily from Theorem 7.

Let {\psi} be a smooth compactly supported function with {\psi(u) = e^u} on {[-10,10]}. By hypothesis, we have

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f \psi_X(n)}{n}|^2\ dx \gg \eta^2. \ \ \ \ \ (18)

Let {\varphi} be a Schwartz function of mean {1} whose Fourier transform is supported on {[-1,1]}. For any {x\in [X,2X]}, we consider the expression

\displaystyle  \frac{X}{H} \int_x^{x+H} \frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )\ \frac{dy}{y}. \ \ \ \ \ (19)

A routine calculation using the rapid decrease of {\varphi} shows that

\displaystyle  \frac{X}{\eta^2 H} \int_x^{x+H} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )\ \frac{dy}{y} = 1_{[x,x+H]}(n)

\displaystyle + O( (1 + \frac{|\log n - \log x|}{\eta^2 H/X})^{-10} ) + O( (1 + \frac{|\log n - \log(x+H)|}{\eta^2 H/X})^{-10} )

and thus the expression (19) can be estimated as

\displaystyle  \frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f\psi_X(n)}{n} + O(\eta^2).

By Cauchy-Schwarz we then have

\displaystyle  |\frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f\psi_X(n)}{n}|^2

\displaystyle \ll \frac{X}{H} \int_x^{x+H} |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} + \eta^4.

Averaging this for {x \in [X,2X]} and using Fubini’s theorem, we have

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{X}{H} \sum_{x \leq n \leq x+H} \frac{f\psi_X(n)}{n}|^2\ dx

\displaystyle \ll \int_0^\infty |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} + \eta^4

and thus from (18) and the triangle inequality we have

\displaystyle  \int_0^\infty |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} \gg \eta^2.

On the other hand, from Exercise 11 we have

\displaystyle  \int_0^\infty |\frac{X}{\eta^2 H} \sum_n \frac{f\psi_X(n)}{n} \varphi( \frac{X}{\eta^2 H} (\log n - \log y) )|^2\ \frac{dy}{y} \ll \| f \psi_X \|_{FL^2([-T,T])}^2.

Applying Theorem 27 (with a slightly smaller value of {H} and {\eta}), we obtain the claim.

Exercise 28 In the converse direction, show that Theorem 27 is a consequence of Theorem 8.

Exercise 29 Let {f} be supported on {[X,3X]}, and let {1 \leq H \leq X}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx

\displaystyle \ll \frac{1}{H/X} \int_{H/X}^{3H/X} \int_0^\infty |\frac{X}{H} \sum_{x \leq n \leq (1+\theta)x} \frac{f(n)}{n}|^2\ \frac{dx}{x} d\theta.

(Hint: use summation by parts to express {\sum_{x \leq n \leq x+H} f(n)} as a suitable linear combination of sums {\sum_{x \leq n \leq (1+\theta)x} \frac{f(n)}{n}} and {\sum_{(x+H) \leq n \leq (1+\theta)(x+H)} \frac{f(n)}{n}}, then use the Cauchy-Schwarz inequality and the Fubini-Tonelli theorem.) Conclude in particular that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx

\displaystyle \ll \int_{\bf R} |{\mathcal D} f(1+it)|^2 (1 + \frac{|t|}{T})^{-2}\ dt

where {T := X/H}. (This argument is due to Saffari and Vaughan.)

It remains to establish Theorem 27. As before we may assume that {\eta} is small. Let us call a finitely supported arithmetic function {g} large on some subset {\Omega} of {[-X,X]} if

\displaystyle  \| g \|_{FL^2(\Omega)} \gg \eta,

and small on {\Omega} if

\displaystyle  \| g \|_{FL^2(\Omega)} \ll \eta^2.

Note that a function cannot be simultaneously large and small on the same set {\Omega}; and if a function is large on some subset {\Omega}, then it remains large on {\Omega} after modifying by any small error (assuming {\eta} is small enough, and adjusting the implied constants appropriately). From the hypothesis (17) we know that {f \psi_X} is large on {[-T,T]}. As discussed in the introduction, the strategy is now to decompose {[-T,T]} into various regions, and on each of these regions split {f \psi_X} (up to small errors) as an average of Dirichlet convolutions of other factors which enjoy either good {FL^\infty} estimates or good {FL^2} estimates on the given region.
We will need the following ranges:

  • (i) {[P_0,Q_0]} is the interval

    \displaystyle  [P_0,Q_0] := [\exp( \log^{0.9} X ), \exp( \log^{0.95} X )].

  • (ii) {[P_1,Q_1]} is the interval

    \displaystyle  [P_1,Q_1] := [\exp( (\log\log X)^2 ), \exp( (\log\log X)^3 )].

  • (iii) {[P_2,Q_2]} is the interval

    \displaystyle  [P_2,Q_2] := [\exp( (\log\log\log X)^2 ), \exp( (\log\log\log X)^3 )].

We will be able to cover the range {T \leq X/Q_0} just using arguments involving the zeroth interval {[P_0,Q_0]}; the range {T \leq X/Q_1} can be covered using arguments involving the zeroth interval {[P_0,Q_0]} and the first interval {[P_1,Q_1]}; and the range {T \leq X/Q_2} can be covered using arguments involving all three intervals {[P_0,Q_0], [P_1,Q_1], [P_2,Q_2]}. Coverage of the remaining ranges of {T} can be done by an extension of the methods given here and will be left to the exercises at the end of the notes.

We introduce some weight functions and some exceptional sets. For any {P > 10}, let {\phi: {\bf R} \rightarrow {\bf R}} be a bump function on {[-1,1]} of total mass {1}, and let {w_P} denote the arithmetic function

\displaystyle  w_P(n) := \frac{\log P}{\eta^4} \sum_{p} \phi( \frac{\log p - \log P}{\eta^4} ) 1_{n=p}

supported on the primes {p = (1+O(\eta^4)) P}. We then define the following subsets of {[-X,X]}:

  • (i) {\Omega_0} is the set of those {t \in [-X,X]} such that

    \displaystyle  |{\mathcal D}(fw_P)(1+it)| \geq \frac{1}{\log^{10} X}

    for some dyadic {P \in [P_0,Q_0]} (i.e., {P} is restricted to be a power of {2}).

  • (ii) {\Omega_1} is the set of those {t \in [-X,X]} such that

    \displaystyle  |{\mathcal D}(fw_P)(1+it)| \geq P^{-1/20}

    for some dyadic {P \in [P_1,Q_1]}.

  • (iii) {\Omega_2} is the set of those {t \in [-X,X]} such that

    \displaystyle  |{\mathcal D}(fw_P)(1+it)| \geq P^{-1/30}

    for some dyadic {P \in [P_2,Q_2]}.

We will establish the following claims:

Proposition 30

  • (i) If {T \leq X/Q_i} for some {i=0,1,2}, then {f \psi_X} is small on {[-T,T] \backslash \Omega_i}.
  • (ii) {f \psi_X} is small on {\Omega_1 \backslash \Omega_0}.
  • (iii) {f \psi_X} is small on {\Omega_2 \backslash \Omega_1}.
  • (iv) If {f \psi_X} is large on {[-T,T] \cap \Omega_0}, then one has {\mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1} for some {t \ll_\eta T}.

Note that parts (i) (with {i=0}) and (iv) of the claims are already enough to treat the case {T \leq X/Q_0}; parts (i) (with {i=1}), (ii), and (iv) are enough to treat the case {T \leq X/Q_1}; and parts (i) (with {i=2}), (ii), (iii), and (iv) are enough to treat the case {T \leq X/Q_2}.

We first prove (i). For {i=0,1,2}, let {\tilde w_{[P_i,Q_i]}} denote the function

\displaystyle  \tilde w_{[P_i,Q_i]} := \frac{1}{\log \frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} w_P. \ \ \ \ \ (20)

This function is a variant of the function {w_{[P_i,Q_i]}} introduced in (7). A key point is that the convolutions {1 * w_{[P_i,Q_i]}} stay close to {1}:

Exercise 31 (Turan-Kubilius inequalities) For {i=0,1,2}, show that

\displaystyle  \frac{1}{X} \sum_{X/2 \leq n \leq 4X} |1*\tilde w_{[P_i,Q_i]}(n) - 1|^2 \ll \eta^4.

(Hint: use the second moment method, as in the proof of Lemma 2 of Notes 9.)
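Exercise 31 is a weighted variant of the classical Turan-Kubilius inequality. The following sketch checks an unweighted analogue numerically: for primes in a range {[P,Q]}, the number of such prime factors of {n} has variance {O(\sum_{P \leq p \leq Q} \frac{1}{p})} on average. (The scale, the range {[P,Q]}, and the factor {4} in the final comparison are ad hoc illustrative choices, not from the notes.)

```python
X = 10 ** 5
P, Q = 100, 1000  # a toy "medium prime" range, chosen for illustration

is_prime = bytearray([1]) * (Q + 1)
is_prime[0] = is_prime[1] = 0
for p in range(2, int(Q ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p * p :: p] = bytearray(len(is_prime[p * p :: p]))
primes = [p for p in range(P, Q + 1) if is_prime[p]]

# omega[n] counts the prime factors of n lying in [P, Q]
omega = [0] * (X + 1)
for p in primes:
    for m in range(p, X + 1, p):
        omega[m] += 1

A = sum(1 / p for p in primes)  # the expected count of such prime factors
variance = sum((omega[n] - A) ** 2 for n in range(1, X + 1)) / X
print(A, variance)
```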

Let {i=0,1,2}. Inserting the bounded factor {|f \psi_X(n)|^2} in the above estimates, and applying Exercise 14, we conclude in particular that the expression

\displaystyle  f \psi_X (1 * \tilde w_{[P_i,Q_i]}) - f \psi_X

is small on {[-X,X]}. Since {f} is completely multiplicative, we can write this expression as

\displaystyle  (f * (f\tilde w_{[P_i,Q_i]})) \psi_X - f \psi_X

We now perform some technical manipulations to move the cutoff {\psi_X} to a more convenient location. From (20) we have

\displaystyle (f * (f \tilde w_{[P_i,Q_i]})) \psi_X = \frac{1}{\log\frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} (f * (f w_{P})) \psi_X.

We would like to approximate {(f*(f w_{P})) \psi_X} by {(f \psi_{X/P}) * (fw_{P})}. A brief triangle inequality calculation using the smoothness of {\psi}, the {1}-boundedness of {f}, and the narrow support of {w_{P}} shows that

\displaystyle  (f * (f w_{P})) \psi_X = (f \psi_{X/P}) * (fw_{P}) + O( \eta^4 (1 * w_{P}) \tilde \psi_X )

where {\tilde \psi_X} is defined similarly to {\psi_X} but with a slightly larger choice of initial cutoff {\tilde \psi}. Integrating this we conclude that

\displaystyle (f * (f \tilde w_{[P_i,Q_i]})) \psi_X

\displaystyle = \frac{1}{\log\frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} ((f\psi_{X/P}) * (f w_{P}))

\displaystyle + O( \eta^4 (1 * \tilde w_{[P_i,Q_i]}) \tilde \psi_X ).

Using Exercise 31 and Exercise 14, the error term is small on {[-X,X]}. Thus we conclude that

\displaystyle  \frac{1}{\log\frac{\log Q_i}{\log P_i}} \sum_{P \in [P_i,Q_i]} \frac{1}{\log P} ((f\psi_{X/P}) * (f w_{P})) - f \psi_X \ \ \ \ \ (21)

is small on {[-X,X]}, and hence also on {[-T,T] \backslash \Omega_i}. Thus by the triangle inequality, it will suffice to show that

\displaystyle  (f\psi_{X/P}) * (f w_{P})

is small on {[-T,T] \backslash \Omega_i} for each {P \in [P_i,Q_i]}. But by construction we definitely have

\displaystyle  \|fw_P \|_{FL^\infty([-T,T] \backslash \Omega_i)} \ll \eta^4

while from Exercise 14 and the hypothesis {T \leq X/Q_i \leq X/P} we have

\displaystyle  \|f \psi_{X/P} \|_{FL^2([-T,T] \backslash \Omega_i)} \ll 1

and the claim (i) now follows from (6).
We now jump to (iv). The first observation is that the set {\Omega_0} is quite small. To quantify this we use the following bound:

Proposition 32 (Large values of {{\mathcal D} w_P(1+it)}) Let {X} be sufficiently large depending on {\eta}, and let {2 \leq P \leq X^{1/2}}. Then for any {\sigma > 0}, the set

\displaystyle  \Omega := \{ t \in [-X,X]: |{\mathcal D} w_P(1+it)| \geq \sigma \}

has measure at most {\exp( (\frac{\log X}{\log P} +1) (1 + \log \frac{\log X}{\sigma \log P} ) )}.

Proof: We use the high moment method. Let {k} be a natural number to be optimised later, and let {w_P^{*k}} be the convolution of {k} copies of {w_P}. Then on {\Omega} we have {|{\mathcal D} w_P^{*k}(1+it)| \geq \sigma^k}. Thus by Markov’s inequality, the measure of {\Omega} is at most

\displaystyle  \sigma^{-2k} \| w_P^{*k} \|_{FL^2([-X,X])}^2.

To bound this we use Exercise 14. If we choose {k} to be the first integer for which {P^k \geq X}, then this exercise gives us the bound

\displaystyle  \| w_P^{*k} \|_{FL^2([-X,X])}^2 \ll O_\eta(1)^{O(k)} \frac{1}{P^k} \sum_n w_P^{*k}(n)^2.

From the fundamental theorem of arithmetic we see that {w_P^{*k}(n) \ll k^k}, hence

\displaystyle \sum_n w_P^{*k}(n)^2 \ll k^k \sum_n w_P^{*k}(n) = k^k (\sum_n w_P(n))^k.

From the prime number theorem we have {\sum_n w_P(n) \ll_\eta P}. Putting all this together, we conclude that the measure of {\Omega} is at most

\displaystyle  O_\eta( \frac{k}{\sigma^2} )^{k}.

Since {k \leq \frac{\log X}{\log P} + 1}, we obtain the claim. \Box
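The Markov inequality step above can be seen concretely on a toy exponential sum: on any finite grid of {t} values, the number of points where {|F(t)| \geq \sigma} is at most {\sigma^{-2k}} times the discrete {2k}-th moment, the discrete analogue of the measure bound used in the proof. (The primes, grid, and parameters here are ad hoc illustrations, not from the text.)

```python
import cmath
import math

# a toy prime Dirichlet sum, loosely mimicking D w_P(1+it), normalised so F(0) = 1
primes = [53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
norm = sum(1 / p for p in primes)

def F(t):
    return sum(cmath.exp(-1j * t * math.log(p)) / p for p in primes) / norm

k, sigma = 3, 0.3
grid = [i * 0.05 for i in range(-20000, 20001)]  # t in [-1000, 1000]
vals = [abs(F(t)) for t in grid]

count = sum(1 for v in vals if v >= sigma)  # grid points where |F| >= sigma
moment = sum(v ** (2 * k) for v in vals)    # discrete 2k-th moment
print(count, sigma ** (-2 * k) * moment)
```

Here the bound holds pointwise: every grid point counted contributes at least {\sigma^{2k}} to the moment.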
Applying this proposition with {P} ranging between {P_0} and {Q_0} and {\sigma = \frac{1}{\log^{10} X}}, and applying the union bound, we see that the measure of {\Omega_0} is at most {\exp(\log^{0.2} X)}. To exploit this, we will need some bounds of Vinogradov-Korobov type:

Exercise 33 (Vinogradov bounds) For {P \geq 10}, establish the bound

\displaystyle  {\mathcal D} w_P( 1+it ) \ll_{\eta,\varepsilon} \frac{1}{(1+|t|)^2}

\displaystyle  + \log^{O(1)}(2+|t|) \exp( - \frac{\log P}{\log^{2/3+\varepsilon}(2+|t|)} )

for any {t \in {\bf R}} and {\varepsilon>0}. (Hint: replace {w_P} with a weighted version of the von Mangoldt function, apply Proposition 7 of Notes 2, and shift the contour, using the zero free region from Exercise 43 of Notes 2.)

To establish (iv), we will use the following variant of the Montgomery-Halasz large values theorem (cf. Proposition 9 from Notes 6):

Proposition 34 (Montgomery-Halasz for primes) Let {X \geq 10}, let {P \in [P_0,Q_0]}, and {\Omega \subset [-X,X]} have measure {O(\exp(\log^{0.2} X))}. Then for any {1}-bounded function {f}, one has

\displaystyle  \| f w_P \|_{FL^2(\Omega)} \ll_{\eta} 1.

Proof: By duality, we may write

\displaystyle  \| f w_P \|_{FL^2(\Omega)} = \int_\Omega {\mathcal D}(f w_P)(1+it) G(t)\ dt

for some measurable function {G} with {\int_\Omega |G(t)|^2\ dt = 1}. We can rearrange the right-hand side as

\displaystyle  \sum_p \frac{w_P(p)}{p} f(p) \int_\Omega G(t) p^{-it}\ dt

which by Cauchy-Schwarz (and the bound {\sum_p \frac{w_P(p)}{p} \ll_\eta 1}) is bounded in magnitude by

\displaystyle  \ll (\sum_p \frac{w_P(p)}{p} |\int_\Omega G(t) p^{-it}\ dt|^2)^{1/2}.

The expression inside the square root can be rearranged as

\displaystyle  \int_\Omega \int_\Omega G(t) \overline{G(t')} {\mathcal D} w_P(1 + i(t-t'))\ dt dt',

which by the elementary inequality {G(t) \overline{G(t')} \ll |G(t)|^2 + |G(t')|^2} is bounded by

\displaystyle  \ll \sup_{t' \in \Omega} \int_\Omega |{\mathcal D} w_P(1 + i(t-t'))|\ dt

(cf. Lemma 6 from Notes 9). By Exercise 33 we have

\displaystyle  {\mathcal D} w_P( 1+i(t-t') ) \ll_{\eta,\varepsilon} \frac{1}{(1+|t-t'|)^2} + \exp( - \log^{0.2} X ).

Integrating over {\Omega}, the first term contributes {O(1)}, and the second term contributes {O(|\Omega| \exp(-\log^{0.2} X)) = O(1)} by the measure hypothesis on {\Omega}. The claim follows. \Box
Now we can prove (iv). By hypothesis, {f \chi_X} is large on {[-T,T] \cap \Omega_0}. On the other hand, the function (21) (with {i=0}) is small on {[-T,T] \cap \Omega_0}. By the triangle inequality, we conclude that

\displaystyle  \frac{1}{\log\frac{\log Q_0}{\log P_0}} \sum_{P \in [P_0,Q_0]} \frac{1}{\log P_0} ((f\psi_{X/P}) * (f w_{P}))

is large on {[-T,T] \cap \Omega_0}, hence by the pigeonhole principle, there exists {P \in [P_0,Q_0]} such that

\displaystyle (f\psi_{X/P}) * (f w_{P})

is large on {[-T,T] \cap \Omega_0}. On the other hand, from Proposition 32 and Proposition 34 we have

\displaystyle  \| f w_P \|_{FL^2([-T,T] \cap \Omega_0)} \ll_{\eta} 1,

hence by (6)

\displaystyle  \| f\psi_{X/P} \|_{FL^\infty([-T,T] \cap \Omega_0)} \gg_\eta 1.

By the pigeonhole principle, this implies that

\displaystyle  \| f 1_I \|_{FL^\infty([-T,T] \cap \Omega_0)} \gg_\eta 1

for some interval {I \subset [X/10P, 10X/P]}. The claim (iv) now follows from Exercise 13. Note that this already concludes the argument in the range {T \geq X/Q_0}.
Now we establish (ii). Here the set {\Omega_1} is not as well controlled in size as {\Omega_0}, but is still quite small. Indeed, applying Proposition 32 with {P} ranging between {P_1} and {Q_1} and {\sigma = P^{-1/20}}, and applying the union bound, we see that the measure of {\Omega_1} is at most {X^{0.1}}. This bound is too large for Proposition 34 to be applicable, but we may instead apply a different bound:

Exercise 35 (Montgomery-Halasz for integers) Let {X \geq 10}, and let {\Omega \subset [-X,X]} have measure {O(X^{0.1})}. For any {1}-bounded function {f} supported on {[X^{0.8},2X]}, show that

\displaystyle  \| f \|_{FL^2(\Omega)} \ll \log X.

(It is possible to remove the logarithmic loss here by being careful, but this loss will be acceptable for our arguments. One can either repeat the arguments used to prove Proposition 34, or else appeal to Proposition 9 from Notes 6.)

The point here is that we can get good bounds even when the function {f} is supported at narrower scales (such as {X^{0.8}}) than the Fourier interval under consideration (such as {[-X,X]} or {[-T,T]}). In particular, this exercise will serve as a replacement for Exercise 14, which will not give good estimates in this case.

As before, the function (21) is small on {\Omega_1 \backslash \Omega_0}, so it will suffice by the triangle inequality to show that

\displaystyle (f\psi_{X/P}) * (f w_{P})

is small on {\Omega_1 \backslash \Omega_0} for all {P \in [P_0,Q_0]}. From Exercise 35, we have

\displaystyle  \| f \psi_{X/P} \|_{FL^2(\Omega_1 \backslash \Omega_0)} \ll \log X

while from definition of {\Omega_0} we have

\displaystyle  \| f w_P \|_{FL^\infty(\Omega_1 \backslash \Omega_0)} \leq \log^{-10} X

and the claim now follows from (6). Note that this already concludes the argument in the range {T \leq X/Q_1}.
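Concretely, if (as elsewhere in this argument) smallness here is measured in the {FL^2} norm, then since the Dirichlet series of a Dirichlet convolution is the pointwise product of the Dirichlet series, the two bounds combine (in the manner of (6)) as

\displaystyle  \| (f\psi_{X/P}) * (f w_{P}) \|_{FL^2(\Omega_1 \backslash \Omega_0)} \leq \| f w_P \|_{FL^\infty(\Omega_1 \backslash \Omega_0)} \| f \psi_{X/P} \|_{FL^2(\Omega_1 \backslash \Omega_0)} \ll \log^{-10} X \cdot \log X = \log^{-9} X.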
Finally, we establish (iii). The function (21) (with {i=1}) is small on {\Omega_2 \backslash \Omega_1}, so by the triangle inequality as before it suffices to show that

\displaystyle (f\psi_{X/P}) * (f w_{P})

is small on {\Omega_2 \backslash \Omega_1} for all {P \in [P_1,Q_1]}. On the one hand, the definition of {\Omega_1} gives a good {FL^\infty} bound on {f w_P}:

\displaystyle  \| fw_P \|_{FL^\infty(\Omega_2 \backslash \Omega_1)} \ll P^{-1/20}. \ \ \ \ \ (22)

To conclude using (6) we now need a good bound for {\|f \psi_{X/P} \|_{FL^2(\Omega_2 \backslash \Omega_1)}}. Unfortunately, the function {f \psi_{X/P}} is now supported on too short of an interval for Exercise 14 to give good estimates, and {\Omega_2} is too large for Exercise 35 to be applicable either.
But from the definition of {\Omega_2} we see that for {t \in \Omega_2}, we have

\displaystyle  |{\mathcal D}(f w_{P'})(1+it)| \geq (P')^{-1/30} \ \ \ \ \ (23)

for at least one {P' \in [P_2,Q_2]}. We can use this to amplify the power of Exercise 14:

Proposition 36 (Amplified {L^2} mean value estimate) One has {\| f \psi_{X/P} \|_{FL^2(\Omega_2)} \ll P^{1/25}}.

Proof: For each {P' \in [P_2,Q_2]}, let {\Omega_{2,P'}} denote the set of {t \in [-X,X]} where (23) holds. Since there are {O((\log\log\log X)^3)} values of {P'}, and {P \geq P_1 = \exp((\log\log X)^2)}, it suffices by the union bound to show that

\displaystyle \| f \psi_{X/P} \|_{FL^2(\Omega_{2,P'})} \ll P^{1/26}

(say) for each {P'}. Let {k} be the first integer for which {(P')^k \geq P}, thus

\displaystyle  k \leq \frac{\log P}{\log P'} + 1.

From (23) we have the pointwise bound

\displaystyle  |{\mathcal D}( f \psi_{X/P})(1+it)| \leq (P')^{k/30} |{\mathcal D}( (f \psi_{X/P}) * (f w_{P'})^{*k} )(1+it)|

and hence by Exercise 14

\displaystyle  \| f \psi_{X/P} \|_{FL^2(\Omega_{2,P'})}^2 \ll \eta^{-O(k)} (P')^{2k/30} \frac{1}{Y} \sum_{n} |(f \psi_{X/P}) * (f w_{P'})^{*k}(n)|^2

where {Y := (P')^k X/P \geq X}. As {f} is {1}-bounded, and the summand is only non-vanishing when {n = \exp(O(k)) Y}, we can bound the right-hand side by

\displaystyle  \ll (\eta^{-O(1)} (P')^{2/30} \log P')^k \frac{1}{Y} \sum_{n = \exp(O(k)) Y} |1 * 1_A^{*k}(n)|^2

where {A} denotes the set of primes in the interval {[P'/2, 2P']}.
Suppose {n} has {l} prime factors in this interval (counting multiplicity). Then {1 * 1_A^{*k}(n)} vanishes unless {l \geq k}, in which case we can bound

\displaystyle  1 * 1_A^{*k}(n) \ll l^k

and

\displaystyle  1 * 1_A^{*l}(n) \geq 1.

Thus we may bound the above sum by

\displaystyle  \ll (\eta^{-O(1)} (P')^{2/30} \log P')^k \frac{1}{Y} \sum_{l \geq k} l^k \sum_{n = \exp(O(k)) Y} 1 * 1_A^{*l}(n).

By the prime number theorem, {A} has {O(P'/\log P')} elements, so by double counting we have

\displaystyle  \sum_{n = \exp(O(k)) Y} 1 * 1_A^{*l}(n) \ll O(P'/\log P')^l \exp(O(k)) Y / (P')^l

and thus the previous bound becomes

\displaystyle  \ll (\eta^{-O(1)} (P')^{2/30})^k \sum_{l \geq k} l^k O(\frac{1}{\log P'})^{l-k}

which sums to

\displaystyle  \ll (k \eta^{-O(1)} (P')^{2/30})^k.
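To verify this last summation, one can write {l = k+m} and use {(k+m)^k \leq k^k e^m} to obtain

\displaystyle  \sum_{l \geq k} l^k O(\frac{1}{\log P'})^{l-k} \leq k^k \sum_{m \geq 0} O(\frac{e}{\log P'})^{m} \ll k^k,

since {\log P'} is large (recall {P' \geq P_2}).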


Since

\displaystyle  (P')^k \leq P P',

we thus have

\displaystyle \| f \psi_{X/P} \|_{FL^2(\Omega_{2,P'})} \ll (PP')^{1/30} \exp( O( k \log( k/\eta ) ) );

since

\displaystyle  k \ll \frac{\log P}{\log P_2} \ll (\log\log\log X)^{-2} \log P \ll (\log\log X)^3

so that {\log(k/\eta) \ll \log\log\log X}, we obtain the claim. \Box
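The combinatorial inputs in the above proof — that {1 * 1_A^{*k}(n)} counts ordered {k}-tuples of primes from {A} whose product divides {n}, is at most {l^k} when {n} has {l \geq k} prime factors from {A}, and that {1 * 1_A^{*l}(n) \geq 1} — can be sanity-checked numerically. The following toy Python sketch (with an illustrative small prime interval, not tied to the actual parameters of these notes) does so:

```python
from itertools import product
from math import prod

# Toy check of the convolution bounds used above: for A the set of primes
# in a short interval, 1 * 1_A^{*k}(n) counts ordered k-tuples of primes
# from A whose product divides n.  If n has l >= k prime factors from A
# (with multiplicity), this count is at most l^k, and 1 * 1_A^{*l}(n) >= 1.

A = [11, 13, 17, 19]  # illustrative: the primes in the short interval [10, 20]

def conv(n, k):
    """1 * 1_A^{*k}(n): ordered k-tuples from A whose product divides n."""
    return sum(1 for tup in product(A, repeat=k) if n % prod(tup) == 0)

def factors_in_A(n):
    """Number of prime factors of n lying in A, counted with multiplicity."""
    l = 0
    for p in A:
        while n % p == 0:
            n //= p
            l += 1
    return l

n = 11 * 11 * 13 * 17  # l = 4 prime factors from A
l = factors_in_A(n)
for k in range(1, l + 1):
    assert conv(n, k) <= l ** k  # the bound 1 * 1_A^{*k}(n) <= l^k
assert conv(n, l) >= 1           # and 1 * 1_A^{*l}(n) >= 1
```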
Combining this proposition with (22) and (6), we conclude part (iii) of Proposition 30. This establishes Theorem 27 up to the range {T \leq X/Q_2}.

Exercise 37 Show that for any fixed {k \geq 2}, Theorem 27 holds in the range

\displaystyle  T \leq X \exp( - (\log_{k} X)^2 )

where {\log_k X = \log \dots \log X} denotes the {k}-fold iterated logarithm of {X}. (Hint: this is already accomplished for {k=3}. For higher {k}, one has to introduce additional exceptional intervals {\Omega_3,\dots,\Omega_{k-1}} and extend Proposition 30 appropriately.)

Exercise 38 Establish Theorem 27 (and hence Theorem 8) in full generality. (This is the preceding exercise, but now with {k} potentially as large as {O(\log_* X)}, where the inverse tower exponential function {\log_* X} is defined as the least {k} for which {\log_k X \leq 2}. Now one has to start tracking dependence on {k} of all the arguments in the above analysis; in particular, the convenient notation of arithmetic functions being “large” or “small” needs to be replaced with something more precise.)