One of the basic problems in analytic number theory is to estimate sums of the form

\displaystyle  \sum_{p<x} f(p)

as {x \rightarrow \infty}, where {p} ranges over primes and {f} is some explicit function of interest (e.g. a linear phase function {f(p) = e^{2\pi i \alpha p}} for some real number {\alpha}). This is essentially the same task as obtaining estimates on the sum

\displaystyle  \sum_{n<x} \Lambda(n) f(n)

where {\Lambda} is the von Mangoldt function. If {f} is bounded, {f(n)=O(1)}, then from the prime number theorem one has the trivial bound

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = O(x)

but often (when {f} is somehow “oscillatory” in nature) one is seeking the refinement

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = o(x) \ \ \ \ \ (1)

or equivalently

\displaystyle  \sum_{p<x} f(p) = o(\frac{x}{\log x}). \ \ \ \ \ (2)

Thanks to identities such as

\displaystyle  \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)

where {\mu} is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

\displaystyle  \sum_{n<x} \mu(n) f(n) = o(x). \ \ \ \ \ (4)

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of {\log x} before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.
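As a quick sanity check on the identity (3) (not needed for anything that follows), one can verify it numerically for small {n}. The Python sketch below uses naive implementations of {\mu} and {\Lambda} by trial factorization:

```python
import math

def mobius(n):
    # naive Moebius function via trial division
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # n was divisible by d^2
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def von_mangoldt(n):
    # Lambda(n) = log p if n = p^k for some prime p and k >= 1, else 0
    for p in range(2, n + 1):
        if n % p == 0:
            m = n
            while m % p == 0:
                m //= p
            return math.log(p) if m == 1 else 0.0
    return 0.0

# check Lambda(n) = sum_{d|n} mu(d) log(n/d) for small n
for n in range(2, 200):
    rhs = sum(mobius(d) * math.log(n / d) for d in range(1, n + 1) if n % d == 0)
    assert abs(von_mangoldt(n) - rhs) < 1e-9
```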

When {f} is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of {f}. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

\displaystyle  \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),

the Type I sum

\displaystyle  -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/V} \Lambda(d) b(m) f(dm),

and the error term {\sum_{n \leq V} \Lambda(n) f(n)}, whenever {1 \leq U, V \leq x} are parameters, and {a, b} are the sequences

\displaystyle  a(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(e) \Lambda(f)


\displaystyle  b(m) := \sum_{d|m: d \leq U} \mu(d).

Similarly one can express (4) as the Type I sum

\displaystyle  -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)

and the error term {\sum_{n \leq UV} \mu(n) f(n)}, whenever {1 \leq U,V \leq x} with {UV \leq x}, and {c} is the sequence

\displaystyle  c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(e) \mu(f).

After eliminating troublesome sequences such as {a(), b(), c()} via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

\displaystyle  \sum_{r \leq y} f(rd)

or Type II sums such as

\displaystyle  \sum_{r \leq y} f(rd) \overline{f(rd')}

for various {y, d, d' \geq 1}. Here, the trivial bound is {O(y)}, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like {O( \frac{y}{\log^C y})} for some constant {C} (e.g. {C=5}) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Möbius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of {f}. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let {f: {\bf N} \rightarrow {\bf C}} be a bounded function such that

\displaystyle  \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)

for any distinct primes {p, q} (where the decay rate of the error term {o(x)} may depend on {p} and {q}). Then

\displaystyle  \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which {\mu} can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which use the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that {\sum_{n \leq x} f(n) = o(x)} if one has {\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)} for each fixed non-zero {h}.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

\displaystyle  \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)

for any irrational {\alpha}. (For rational {\alpha}, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also applies this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).
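As a purely numerical illustration (not a proof, and not part of the argument below), one can compute the normalized sum {\frac{1}{x} \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n}} at a modest height; the choices {\alpha = \sqrt{2}} and {x = 10^5} here are arbitrary. The Python sketch below evaluates {\mu} with a standard linear sieve:

```python
import cmath
import math

def mobius_sieve(N):
    # compute mu(1), ..., mu(N) with a linear-style sieve
    mu = [1] * (N + 1)
    is_comp = [False] * (N + 1)
    primes = []
    for i in range(2, N + 1):
        if not is_comp[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            is_comp[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # i*p is divisible by p^2
                break
            mu[i * p] = -mu[i]
    return mu

alpha = math.sqrt(2)  # illustrative irrational frequency
N = 10**5
mu = mobius_sieve(N)
S = sum(mu[n] * cmath.exp(2j * math.pi * alpha * n) for n in range(1, N + 1))
# the normalized sum is already quite small at this modest height
assert abs(S) / N < 0.05
```

One expects square-root type cancellation here in practice, so the observed ratio is far smaller than the crude threshold tested above.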

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then {\mu(n)} would exhibit strong correlation with {f(n)}; by a change of variables, we would then expect {\mu(pn)} to correlate with {f(pn)} and {\mu(qn)} to correlate with {f(qn)}, for “typical” {p,q} at least. On the other hand, since {\mu} is multiplicative, {\mu(pn)} exhibits strong correlation with {\mu(qn)}. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above-mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if {P} is a “large” but finite set of primes (in the sense that the sum {A := \sum_{p \in P} \frac{1}{p}} is large), then for a typical large number {n} (much larger than the elements of {P}), the number of primes in {P} that divide {n} is pretty close to {A = \sum_{p \in P} \frac{1}{p}}:

\displaystyle  \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)

A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.
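One can probe the heuristic (7) numerically. The Python sketch below (with illustrative, hypothetical choices of {W}, {H}, and {x}) computes the second moment of {\sum_{p \in P: p|n} 1 - A} over {n \leq x} and checks that it is {O(Ax)} with a modest constant, in the spirit of the Turan-Kubilius inequality:

```python
def primes_in(lo, hi):
    # sieve of Eratosthenes, returning primes p with lo <= p < hi
    sieve = [True] * hi
    sieve[0:2] = [False, False]
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [p for p in range(lo, hi) if sieve[p]]

# illustrative parameters: x is much larger than the elements of P
W, H, x = 5, 200, 10**5
P = primes_in(W, H)
A = sum(1 / p for p in P)

# second moment of the deviation in (7), summed over n <= x
second_moment = sum(
    (sum(1 for p in P if n % p == 0) - A) ** 2 for n in range(1, x + 1)
)
# Turan-Kubilius predicts this is O(A x); check a modest implied constant works
assert second_moment < 10 * A * x
```

Empirically the second moment is close to {Ax} itself, so a typical {n \leq x} has about {A} prime factors from {P}, with fluctuations of size roughly {\sqrt{A}}.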

In particular, one can sum (7) against {\mu(n) f(n)} and obtain an approximation

\displaystyle  \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)

that approximates a sum of {\mu(n) f(n)} by a bunch of sparser sums of {\mu(n) f(n)}. Since

\displaystyle  x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

\displaystyle  \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})

for all {p \in P} (or at least for “most” {p \in P}).

Now we make the change of variables {n = pm}. As the Möbius function is multiplicative, we usually have {\mu(n) = \mu(p) \mu(m) = - \mu(m)}. (There is an exception when {n} is divisible by {p^2}, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

\displaystyle  \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )

for most {p \in P}. However, by the hypothesis (5), the sequences {m \mapsto f(pm)} are asymptotically orthogonal as {p} varies, and this claim will then follow from a Cauchy-Schwarz argument.
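To illustrate hypothesis (5) concretely: for the linear phase {f(n) = e^{2\pi i \alpha n}} with {\alpha} irrational, the correlation {\sum_{m \leq x} f(pm) \overline{f(qm)}} is the geometric series {\sum_{m \leq x} e^{2\pi i \alpha (p-q) m}}, which is bounded in magnitude by {1/|\sin(\pi \alpha (p-q))|} uniformly in {x}. The Python sketch below (purely illustrative; the choice {\alpha = \sqrt{2}} is arbitrary) checks this uniform bound for a few prime pairs:

```python
import cmath
import math

alpha = math.sqrt(2)  # an illustrative irrational frequency

def corr(p, q, x):
    # sum_{m <= x} f(pm) * conj(f(qm)) for f(n) = e^{2 pi i alpha n},
    # which equals the geometric series sum_{m <= x} e^{2 pi i alpha (p-q) m}
    return sum(
        cmath.exp(2j * math.pi * alpha * (p - q) * m) for m in range(1, x + 1)
    )

# geometric-series bound: |corr(p,q,x)| <= 1/|sin(pi alpha (p-q))| for all x
for p, q in [(2, 3), (3, 5), (5, 7)]:
    bound = 1 / abs(math.sin(math.pi * alpha * (p - q)))
    for x in [10**3, 10**4, 10**5]:
        assert abs(corr(p, q, x)) <= bound + 1e-3  # small float tolerance
```

Since the bound is independent of {x}, dividing by {x} shows that (5) holds for such phases, with a decay rate depending on {p} and {q} as permitted by the hypothesis.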

— 1. Rigorous proof —

We will need a slowly growing function {H = H(x)} of {x}, with {H(x) \rightarrow \infty} as {x \rightarrow \infty}, to be chosen later. As the sum of reciprocals of primes diverges, we see that

\displaystyle  \sum_{p < H} \frac{1}{p} \rightarrow \infty

as {x \rightarrow \infty}. It will also be convenient to eliminate small primes. Note that we may find an even slower growing function {W = W(x)} of {x}, with {W(x) \rightarrow \infty} as {x \rightarrow \infty}, such that

\displaystyle  \sum_{W \leq p < H} \frac{1}{p} \rightarrow \infty.

Although it is not terribly important, we will take {W} and {H} to be powers of two. Thus, if we set {P} to be all the primes between {W} and {H}, the quantity

\displaystyle  A := \sum_{p \in P} \frac{1}{p}

goes to infinity as {x \rightarrow \infty}.
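The divergence used here is quantified by Mertens’ second theorem: {\sum_{p < N} \frac{1}{p} = \log \log N + M + o(1)}, where {M \approx 0.2615} is the Meissel-Mertens constant. The Python sketch below (illustrative only) checks this numerically at a few heights:

```python
import math

def primes_up_to(N):
    # sieve of Eratosthenes
    sieve = [True] * (N + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [p for p in range(2, N + 1) if sieve[p]]

M = 0.2615  # Meissel-Mertens constant (approximate)
for N in [10**4, 10**5, 10**6]:
    s = sum(1 / p for p in primes_up_to(N))
    # Mertens: sum_{p < N} 1/p = log log N + M + o(1)
    assert abs(s - (math.log(math.log(N)) + M)) < 0.05
```

In particular {\sum_{W \leq p < H} \frac{1}{p} \approx \log\log H - \log\log W}, so choosing {H} growing much faster than {W} (both going to infinity) indeed makes {A} go to infinity.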

Lemma 2 (Turan-Kubilius inequality) One has

\displaystyle  \sum_{n \leq x} |\sum_{p \in P: p|n} 1 - A|^2 \ll A x. \ \ \ \ \ (8)

Proof: We have

\displaystyle  \sum_{n \leq x} \sum_{p \in P: p|n} 1 = \sum_{p \in P} \sum_{n \leq x: p|n} 1.

On the other hand, we have

\displaystyle  \sum_{n \leq x: p|n} 1 = \frac{x}{p} + O(1)

and thus (if {H} is sufficiently slowly growing)

\displaystyle  \sum_{n \leq x} \sum_{p \in P: p|n} 1 = x A + O(x).

Similarly, we have

\displaystyle  \sum_{n \leq x} (\sum_{p \in P: p|n} 1)^2 = \sum_{p,q \in P} \sum_{n \leq x: p|n, q|n} 1.

The expression {\sum_{n \leq x: p|n, q|n} 1} is equal to {\frac{x}{p}+O(1)} when {q=p}, and {\frac{x}{pq}+O(1)} when {p \neq q}. A brief calculation then shows that

\displaystyle  \sum_{n \leq x} (\sum_{p \in P: p|n} 1)^2 = x A^2 + O(A x)

if {H} is sufficiently slowly growing. Expanding out the square on the left-hand side of (8) and inserting these bounds, the claim follows. \Box

From (8) and the Cauchy-Schwarz inequality, one has

\displaystyle  \sum_{n \leq x} (\sum_{p \in P: p|n} 1 - A) \mu(n) f(n) = O( A^{1/2} x )

which we rearrange as

\displaystyle  \sum_{n \leq x} \mu(n) f(n) = \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n) + O( A^{-1/2} x ).

Since {A} goes to infinity, the {O(A^{-1/2} x)} term is {o(x)}, so it now suffices to show that

\displaystyle  \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n) = o( A x ).

Write {n=pm}. Then we have {\mu(n) f(n) = - \mu(m) f(pm)} for all but {O(x/p^2)} values of {n} (if {H} is sufficiently slowly growing). The exceptional values contribute at most

\displaystyle  \sum_{p \in P} \frac{x}{p^2} \leq \sum_{p \in P} \frac{x}{Wp} = O(Ax/W) = o(Ax)

which is acceptable. Thus it suffices to show that

\displaystyle  \sum_{p \in P} \sum_{m \leq x/p} \mu(m) f(pm) = o(Ax).

Partitioning into dyadic blocks, it suffices to show that

\displaystyle  \sum_{p \in P_k} \sum_{m \leq x/p} \mu(m) f(pm) = o( |P_k| x / 2^k )

uniformly for {W \leq 2^k < H}, where {P_k} are the primes between {2^k} and {2^{k+1}}.

Fix {k}. The left-hand side can be rewritten as

\displaystyle  \sum_{m \leq x/2^k} \mu(m) \sum_{p \in P_k} f(pm) 1_{m \leq x/p}

so by the Cauchy-Schwarz inequality it suffices to show that

\displaystyle  \sum_{m \leq x/2^k} |\sum_{p \in P_k} f(pm) 1_{m \leq x/p}|^2 = o( |P_k|^2 x / 2^k ).

We can rearrange the left-hand side as

\displaystyle  \sum_{p,q \in P_k} \sum_{m \leq \min(x/p,x/q)} f(pm) \overline{f(qm)}.

Now if {H} is sufficiently slowly growing as a function of {x}, we see from (5) that for all distinct {p,q \leq H}, we have

\displaystyle  \sum_{m \leq \min(x/p,x/q)} f(pm) \overline{f(qm)} = o( x / 2^k )

uniformly in {p,q}; meanwhile, for {p=q}, we have the crude bound

\displaystyle  \sum_{m \leq \min(x/p,x/q)} f(pm) \overline{f(qm)} = O( x / 2^k ).

The claim follows (noting from the prime number theorem that {|P_k| \rightarrow \infty} as {x \rightarrow \infty}, so that the diagonal contribution {|P_k| \cdot O(x/2^k)} is {o( |P_k|^2 x / 2^k )}).

— 2. From Möbius to von Mangoldt? —

It would be great if one could pass from the Möbius asymptotic orthogonality (4) to the von Mangoldt asymptotic orthogonality (1) (or equivalently, to (2)), as this would give some new information about the distribution of primes. Unfortunately, it seems that some additional input is needed to do so. Here is a simple example of a conditional implication that requires an additional input, namely some quantitative control on “Type I” sums:

Proposition 3 Let {f: {\bf N} \rightarrow {\bf C}} be a bounded function such that

\displaystyle  \sum_{n \leq x} \mu(n) f(dn) = o(x) \ \ \ \ \ (9)

for each fixed {d \geq 1} (with the decay rate allowed to depend on {d}). Suppose also that one has the Type I bound

\displaystyle  \sum_{1 \leq m \leq M} \sup_{y \leq x} |\sum_{n \leq y} f(mn)| \ll \frac{M x}{\log^{2+\epsilon} x} \ \ \ \ \ (10)

for all {M, x \geq 2} and some absolute constant {\epsilon>0}, where the implied constant is independent of both {M} and {x}. Then one has

\displaystyle  \sum_{n \leq x} \Lambda(n) f(n) = o(x) \ \ \ \ \ (11)

and thus (by discarding the prime powers and summing by parts)

\displaystyle  \sum_{p < x} f(p) = o(\frac{x}{\log x}).

Proof: We use the Dirichlet hyperbola method. Using (3), one can write the left-hand side of (11) as

\displaystyle  \sum_{dm \leq x} \mu(m) (\log d) f(dm).

We let {D = D(x)} be a slowly growing function of {x} to be chosen later, and split this sum as

\displaystyle  \sum_{d \leq D} \log d \sum_{m \leq x/d} \mu(m) f(dm) + \sum_{m < x/D} \mu(m) \sum_{D < d \leq x/m} (\log d) f(dm). \ \ \ \ \ (12)

If {D} is sufficiently slowly growing, then by (9) one has { \sum_{m \leq x/d} \mu(m) f(dm) = o(x)} uniformly for all {d \leq D}, and the first term in (12) is then also {o(x)}. As for the second term, we dyadically decompose it and bound it in absolute value by

\displaystyle  \sum_{2^k < x/D} \sum_{2^k \leq m < 2^{k+1}} |\sum_{D < d \leq x/m} (\log d) f(dm)|. \ \ \ \ \ (13)

By summation by parts, we can bound

\displaystyle  |\sum_{D < d \leq x/m} (\log d) f(dm)| \ll \log(x/2^k) \sup_{y \leq x/2^k} |\sum_{n \leq y} f(mn)|

and so by (10), we can bound (13) by

\displaystyle  \ll \sum_{2^k < x/D} \frac{x}{\log^{1+\epsilon}(x/2^k)}.

This sum evaluates to {O( x / \log^\epsilon D )}, and the claim follows since {D} goes to infinity. \Box
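To see why the sum in the last step is {O(x/\log^\epsilon D)}: writing {j = \log_2(x/2^k)}, the summands are comparable to {x (j \log 2)^{-(1+\epsilon)}} with {j > \log_2 D}, and the tail of the series {\sum_j j^{-(1+\epsilon)}} beyond {\log_2 D} is {O(\log^{-\epsilon} D)}. The Python sketch below (with arbitrary illustrative parameters) checks the bound with a modest implied constant:

```python
import math

def dyadic_sum(x, D, eps):
    # S = sum over k with 2^k < x/D of x / (log(x/2^k))^(1+eps), as in the
    # dyadic decomposition (13) after applying the Type I bound (10)
    S, k = 0.0, 0
    while 2 ** k < x / D:
        S += x / math.log(x / 2 ** k) ** (1 + eps)
        k += 1
    return S

# check S = O(x / log^eps D) with a modest constant at a few parameter choices
eps = 1.0
for x in [10**6, 10**8]:
    for D in [10**2, 10**3]:
        assert dyadic_sum(x, D, eps) <= 10 * x / math.log(D) ** eps
```

Since {x/\log^\epsilon D = o(x)} as soon as {D \rightarrow \infty}, this is all that is needed to close the argument, but note that the saving over the trivial bound is only a power of {\log D}, not a power of {D}.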

Note that the trivial bound on (10) is {Mx}, so one needs to gain about two logarithmic factors over the trivial bound in order to use the above proposition. The presence of the supremum is annoying, but it can be removed by a modification of the argument if one improves the bound by an additional logarithm by a variety of methods (e.g. completion of sums), or by smoothing out the constraint {n \leq x}. However, I do not know of a way to remove the need to improve the trivial bound by two logarithmic factors.