You are currently browsing the tag archive for the ‘multiplicative functions’ tag.

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

$\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)$

of the prime number theorem (where we take the limit ${x \rightarrow \infty}$). One can interpret this estimate probabilistically as

$\displaystyle {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)$

where ${\mathbf{n} = \mathbf{n}_{\leq x}}$ is a random variable drawn uniformly from the natural numbers up to ${x}$, and ${{\mathbb E}}$ denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1).$

After dividing by ${x}$, this is essentially

$\displaystyle {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).$

With probabilistic intuition, one may expect the random variables ${\mu(\mathbf{n}), \mu(\mathbf{n}+1)}$ to be approximately independent (there is no obvious relationship between the number of prime factors of ${\mathbf{n}}$, and of ${\mathbf{n}+1}$), and so the above average would be expected to be approximately equal to

$\displaystyle ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))$

which by (2) is equal to ${o(1)}$. Thus we are led to the prediction

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)$

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes; while there has been recent progress towards establishing it rigorously, it remains open for now.

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “${L^2}$” measure of independence known as covariance, and the “${L \log L}$” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of ${L^2}$ type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of ${L \log L}$ methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as ${L^2}$ methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

$\displaystyle \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)$

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound ${\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}$, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables ${\mathbf{X}}$ and ${\mathbf{Y}}$ of number-theoretic origin (such as the random variables ${\mu(\mathbf{n})}$ and ${\mu(\mathbf{n}+1)}$ mentioned previously), it can often be extremely difficult to determine the extent to which ${\mathbf{X},\mathbf{Y}}$ behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if ${\mathbf{Y}_1,\dots,\mathbf{Y}_k}$ are a large collection of “independent” random variables, and ${\mathbf{X}}$ is a further random variable that is “not too large” in some sense, then ${\mathbf{X}}$ must necessarily be nearly independent (or conditionally independent) to many of the ${\mathbf{Y}_i}$, even if one cannot pinpoint precisely which of the ${\mathbf{Y}_i}$ the variable ${\mathbf{X}}$ is independent with. In the case of the second moment method, this allows us to compute correlations such as ${{\mathbb E} {\mathbf X} \mathbf{Y}_i}$ for “most” ${i}$. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to say non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities ${{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )}$ for “most” ${i}$ thanks to tools such as the Pinsker inequality.

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures“, submitted to Duke Mathematical Journal. This paper builds upon my previous paper in which I introduced an “entropy decrement method” to prove the two-point (logarithmically averaged) cases of the Chowla and Elliott conjectures. A bit more specifically, I showed that

$\displaystyle \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_0(n+h_0) g_1(n+h_1)}{n} = 0$

whenever ${1 \leq \omega_m \leq x_m}$ were sequences going to infinity, ${h_0,h_1}$ were distinct integers, and ${g_0,g_1: {\bf N} \rightarrow {\bf C}}$ were ${1}$-bounded multiplicative functions which were non-pretentious in the sense that

$\displaystyle \liminf_{X \rightarrow \infty} \inf_{|t_j| \leq X} \sum_{p \leq X} \frac{1-\mathrm{Re}( g_j(p) \overline{\chi_j}(p) p^{it_j})}{p} = \infty \ \ \ \ \ (1)$

for all Dirichlet characters ${\chi_j}$ and for ${j=0,1}$. Thus, for instance, one had the logarithmically averaged two-point Chowla conjecture

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o(\log x)$

for fixed any non-zero ${h}$, where ${\lambda}$ was the Liouville function.

One would certainly like to extend these results to higher order correlations than the two-point correlations. This looks to be difficult (though perhaps not completely impossible if one allows for logarithmic averaging): in a previous paper I showed that achieving this in the context of the Liouville function would be equivalent to resolving the logarithmically averaged Sarnak conjecture, as well as establishing logarithmically averaged local Gowers uniformity of the Liouville function. However, in this paper we are able to avoid having to resolve these difficult conjectures to obtain partial results towards the (logarithmically averaged) Chowla and Elliott conjecture. For the Chowla conjecture, we can obtain all odd order correlations, in that

$\displaystyle \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x) \ \ \ \ \ (2)$

for all odd ${k}$ and all integers ${h_1,\dots,h_k}$ (which, in the odd order case, are no longer required to be distinct). (Superficially, this looks like we have resolved “half of the cases” of the logarithmically averaged Chowla conjecture; but it seems the odd order correlations are significantly easier than the even order ones. For instance, because of the Katai-Bourgain-Sarnak-Ziegler criterion, one can basically deduce the odd order cases of (2) from the even order cases (after allowing for some dilations in the argument ${n}$).

For the more general Elliott conjecture, we can show that

$\displaystyle \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_1(n+h_1) \dots g_k(n+h_k)}{n} = 0$

for any ${k}$, any integers ${h_1,\dots,h_k}$ and any bounded multiplicative functions ${g_1,\dots,g_k}$, unless the product ${g_1 \dots g_k}$ weakly pretends to be a Dirichlet character ${\chi}$ in the sense that

$\displaystyle \sum_{p \leq X} \frac{1 - \hbox{Re}( g_1 \dots g_k(p) \overline{\chi}(p)}{p} = o(\log\log X).$

This can be seen to imply (2) as a special case. Even when ${g_1,\dots,g_k}$ does pretend to be a Dirichlet character ${\chi}$, we can still say something: if the limits

$\displaystyle f(a) := \lim_{m \rightarrow \infty} \frac{1}{\log \omega_m} \sum_{x_m/\omega_m \leq n \leq x_m} \frac{g_1(n+ah_1) \dots g_k(n+ah_k)}{n}$

exist for each ${a \in {\bf Z}}$ (which can be guaranteed if we pass to a suitable subsequence), then ${f}$ is the uniform limit of periodic functions ${f_i}$, each of which is ${\chi}$isotypic in the sense that ${f_i(ab) = f_i(a) \chi(b)}$ whenever ${a,b}$ are integers with ${b}$ coprime to the periods of ${\chi}$ and ${f_i}$. This does not pin down the value of any single correlation ${f(a)}$, but does put significant constraints on how these correlations may vary with ${a}$.

Among other things, this allows us to show that all ${16}$ possible length four sign patterns ${(\lambda(n+1),\dots,\lambda(n+4)) \in \{-1,+1\}^4}$ of the Liouville function occur with positive density, and all ${65}$ possible length four sign patterns ${(\mu(n+1),\dots,\mu(n+4)) \in \{-1,0,+1\}^4 \backslash \{-1,+1\}^4}$ occur with the conjectured logarithmic density. (In a previous paper with Matomaki and Radziwill, we obtained comparable results for length three patterns of Liouville and length two patterns of Möbius.)

To describe the argument, let us focus for simplicity on the case of the Liouville correlations

$\displaystyle f(a) := \lim_{X \rightarrow \infty} \frac{1}{\log X} \sum_{n \leq X} \frac{\lambda(n) \lambda(n+a) \dots \lambda(n+(k-1)a)}{n}, \ \ \ \ \ (3)$

assuming for sake of discussion that all limits exist. (In the paper, we instead use the device of generalised limits, as discussed in this previous post.) The idea is to combine together two rather different ways to control this function ${f}$. The first proceeds by the entropy decrement method mentioned earlier, which roughly speaking works as follows. Firstly, we pick a prime ${p}$ and observe that ${\lambda(pn)=-\lambda(n)}$ for any ${n}$, which allows us to rewrite (3) as

$\displaystyle (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X}$

$\displaystyle \sum_{n \leq X} \frac{\lambda(pn) \lambda(pn+ap) \dots \lambda(pn+(k-1)ap)}{n}.$

Making the change of variables ${n' = pn}$, we obtain

$\displaystyle (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X}$

$\displaystyle \sum_{n' \leq pX} \frac{\lambda(n') \lambda(n'+ap) \dots \lambda(n'+(k-1)ap)}{n'} p 1_{p|n'}.$

The difference between ${n' \leq pX}$ and ${n' \leq X}$ is negligible in the limit (here is where we crucially rely on the log-averaging), hence

$\displaystyle (-1)^k f(a) = \lim_{X \rightarrow \infty} \frac{1}{\log X} \sum_{n \leq X} \frac{\lambda(n) \lambda(n+ap) \dots \lambda(n+(k-1)ap)}{n} p 1_{p|n}$

and thus by (3) we have

$\displaystyle (-1)^k f(a) = f(ap) + \lim_{X \rightarrow \infty} \frac{1}{\log X}$

$\displaystyle \sum_{n \leq X} \frac{\lambda(n) \lambda(n+ap) \dots \lambda(n+(k-1)ap)}{n} (p 1_{p|n}-1).$

The entropy decrement argument can be used to show that the latter limit is small for most ${p}$ (roughly speaking, this is because the factors ${p 1_{p|n}-1}$ behave like independent random variables as ${p}$ varies, so that concentration of measure results such as Hoeffding’s inequality can apply, after using entropy inequalities to decouple somewhat these random variables from the ${\lambda}$ factors). We thus obtain the approximate isotopy property

$\displaystyle (-1)^k f(a) \approx f(ap) \ \ \ \ \ (4)$

for most ${a}$ and ${p}$.

On the other hand, by the Furstenberg correspondence principle (as discussed in these previous posts), it is possible to express ${f(a)}$ as a multiple correlation

$\displaystyle f(a) = \int_X g(x) g(T^a x) \dots g(T^{(k-1)a} x)\ d\mu(x)$

for some probability space ${(X,\mu)}$ equipped with a measure-preserving invertible map ${T: X \rightarrow X}$. Using results of Bergelson-Host-Kra, Leibman, and Le, this allows us to obtain a decomposition of the form

$\displaystyle f(a) = f_1(a) + f_2(a) \ \ \ \ \ (5)$

where ${f_1}$ is a nilsequence, and ${f_2}$ goes to zero in density (even along the primes, or constant multiples of the primes). The original work of Bergelson-Host-Kra required ergodicity on ${X}$, which is very definitely a hypothesis that is not available here; however, the later work of Leibman removed this hypothesis, and the work of Le refined the control on ${f_1}$ so that one still has good control when restricting to primes, or constant multiples of primes.

Ignoring the small error ${f_2(a)}$, we can now combine (5) to conclude that

$\displaystyle f(a) \approx (-1)^k f_1(ap).$

Using the equidistribution theory of nilsequences (as developed in this previous paper of Ben Green and myself), one can break up ${f_1}$ further into a periodic piece ${f_0}$ and an “irrational” or “minor arc” piece ${f_3}$. The contribution of the minor arc piece ${f_3}$ can be shown to mostly cancel itself out after dilating by primes ${p}$ and averaging, thanks to Vinogradov-type bilinear sum estimates (transferred to the primes). So we end up with

$\displaystyle f(a) \approx (-1)^k f_0(ap),$

which already shows (heuristically, at least) the claim that ${f}$ can be approximated by periodic functions ${f_0}$ which are isotopic in the sense that

$\displaystyle f_0(a) \approx (-1)^k f_0(ap).$

But if ${k}$ is odd, one can use Dirichlet’s theorem on primes in arithmetic progressions to restrict to primes ${p}$ that are ${1}$ modulo the period of ${f_0}$, and conclude now that ${f_0}$ vanishes identically, which (heuristically, at least) gives (2).

The same sort of argument works to give the more general bounds on correlations of bounded multiplicative functions. But for the specific task of proving (2), we initially used a slightly different argument that avoids using the ergodic theory machinery of Bergelson-Host-Kra, Leibman, and Le, but replaces it instead with the Gowers uniformity norm theory used to count linear equations in primes. Basically, by averaging (4) in ${p}$ using the “${W}$-trick”, as well as known facts about the Gowers uniformity of the von Mangoldt function, one can obtain an approximation of the form

$\displaystyle (-1)^k f(a) \approx {\bf E}_{b: (b,W)=1} f(ab)$

where ${b}$ ranges over a large range of integers coprime to some primorial ${W = \prod_{p \leq w} p}$. On the other hand, by iterating (4) we have

$\displaystyle f(a) \approx f(apq)$

for most semiprimes ${pq}$, and by again averaging over semiprimes one can obtain an approximation of the form

$\displaystyle f(a) \approx {\bf E}_{b: (b,W)=1} f(ab).$

For ${k}$ odd, one can combine the two approximations to conclude that ${f(a)=0}$. (This argument is not given in the current paper, but we plan to detail it in a subsequent one.)

I’ve just uploaded two related papers to the arXiv:

This pair of papers is an outgrowth of these two recent blog posts and the ensuing discussion. In the first paper, we establish the following logarithmically averaged version of the Chowla conjecture (in the case ${k=2}$ of two-point correlations (or “pair correlations”)):

Theorem 1 (Logarithmically averaged Chowla conjecture) Let ${a_1,a_2}$ be natural numbers, and let ${b_1,b_2}$ be integers such that ${a_1 b_2 - a_2 b_1 \neq 0}$. Let ${1 \leq \omega(x) \leq x}$ be a quantity depending on ${x}$ that goes to infinity as ${x \rightarrow \infty}$. Let ${\lambda}$ denote the Liouville function. Then one has

$\displaystyle \sum_{x/\omega(x) < n \leq x} \frac{\lambda(a_1 n + b_1) \lambda(a_2 n+b_2)}{n} = o( \log \omega(x) ) \ \ \ \ \ (1)$

as ${x \rightarrow \infty}$.

Thus for instance one has

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)$

For comparison, the non-averaged Chowla conjecture would imply that

$\displaystyle \sum_{n \leq x} \lambda(n) \lambda(n+1) = o(x) \ \ \ \ \ (3)$

which is a strictly stronger estimate than (2), and remains open.

The arguments also extend to other completely multiplicative functions than the Liouville function. In particular, one obtains a slightly averaged version of the non-asymptotic Elliott conjecture that was shown in the previous blog post to imply a positive solution to the Erdos discrepancy problem. The averaged version of the conjecture established in this paper is slightly weaker than the one assumed in the previous blog post, but it turns out that the arguments there can be modified without much difficulty to accept this averaged Elliott conjecture as input. In particular, we obtain an unconditional solution to the Erdos discrepancy problem as a consequence; this is detailed in the second paper listed above. In fact we can also handle the vector-valued version of the Erdos discrepancy problem, in which the sequence ${f(1), f(2), \dots}$ takes values in the unit sphere of an arbitrary Hilbert space, rather than in ${\{-1,+1\}}$.

Estimates such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using “linear” estimates on functions such as the von Mangoldt function. However, it is known that the parity problem can be circumvented using “bilinear” estimates, and this is basically what is done here.

We now describe in informal terms the proof of Theorem 1, focusing on the model case (2) for simplicity. Suppose for contradiction that the left-hand side of (2) was large and (say) positive. Using the multiplicativity ${\lambda(pn) = -\lambda(n)}$, we conclude that

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+p) 1_{p|n}}{n}$

is also large and positive for all primes ${p}$ that are not too large; note here how the logarithmic averaging allows us to leave the constraint ${n \leq x}$ unchanged. Summing in ${p}$, we conclude that

$\displaystyle \sum_{n \leq x} \frac{ \sum_{p \in {\mathcal P}} \lambda(n) \lambda(n+p) 1_{p|n}}{n}$

is large and positive for any given set ${{\mathcal P}}$ of medium-sized primes. By a standard averaging argument, this implies that

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \lambda(n+j) \lambda(n+p+j) 1_{p|n+j} \ \ \ \ \ (4)$

is large for many choices of ${n}$, where ${H}$ is a medium-sized parameter at our disposal to choose, and we take ${{\mathcal P}}$ to be some set of primes that are somewhat smaller than ${H}$. (A similar approach was taken in this recent paper of Matomaki, Radziwill, and myself to study sign patterns of the Möbius function.) To obtain the required contradiction, one thus wants to demonstrate significant cancellation in the expression (4). As in that paper, we view ${n}$ as a random variable, in which case (4) is essentially a bilinear sum of the random sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ along a random graph ${G_{n,H}}$ on ${\{1,\dots,H\}}$, in which two vertices ${j, j+p}$ are connected if they differ by a prime ${p}$ in ${{\mathcal P}}$ that divides ${n+j}$. A key difficulty in controlling this sum is that for randomly chosen ${n}$, the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ and the graph ${G_{n,H}}$ need not be independent. To get around this obstacle we introduce a new argument which we call the “entropy decrement argument” (in analogy with the “density increment argument” and “energy increment argument” that appear in the literature surrounding Szemerédi’s theorem on arithmetic progressions, and also reminiscent of the “entropy compression argument” of Moser and Tardos, discussed in this previous post). This argument, which is a simple consequence of the Shannon entropy inequalities, can be viewed as a quantitative version of the standard subadditivity argument that establishes the existence of Kolmogorov-Sinai entropy in topological dynamical systems; it allows one to select a scale parameter ${H}$ (in some suitable range ${[H_-,H_+]}$) for which the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ and the graph ${G_{n,H}}$ exhibit some weak independence properties (or more precisely, the mutual information between the two random variables is small).

Informally, the entropy decrement argument goes like this: if the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ has significant mutual information with ${G_{n,H}}$, then the entropy of the sequence ${(\lambda(n+1),\dots,\lambda(n+H'))}$ for ${H' > H}$ will grow a little slower than linearly, due to the fact that the graph ${G_{n,H}}$ has zero entropy (knowledge of ${G_{n,H}}$ more or less completely determines the shifts ${G_{n+kH,H}}$ of the graph); this can be formalised using the classical Shannon inequalities for entropy (and specifically, the non-negativity of conditional mutual information). But the entropy cannot drop below zero, so by increasing ${H}$ as necessary, at some point one must reach a metastable region (cf. the finite convergence principle discussed in this previous blog post), within which very little mutual information can be shared between the sequence ${(\lambda(n+1),\dots,\lambda(n+H))}$ and the graph ${G_{n,H}}$. Curiously, for the application it is not enough to have a purely quantitative version of this argument; one needs a quantitative bound (which gains a factor of a bit more than ${\log H}$ on the trivial bound for mutual information), and this is surprisingly delicate (it ultimately comes down to the fact that the series ${\sum_{j \geq 2} \frac{1}{j \log j \log\log j}}$ diverges, which is only barely true).

Once one locates a scale ${H}$ with the low mutual information property, one can use standard concentration of measure results such as the Hoeffding inequality to approximate (4) by the significantly simpler expression

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j)}{p}. \ \ \ \ \ (5)$

The important thing here is that Hoeffding’s inequality gives exponentially strong bounds on the failure probability, which is needed to counteract the logarithms that are inevitably present whenever trying to use entropy inequalities. The expression (5) can then be controlled in turn by an application of the Hardy-Littlewood circle method and a non-trivial estimate

$\displaystyle \sup_\alpha \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (6)$

for averaged short sums of a modulated Liouville function established in another recent paper by Matomäki, Radziwill and myself.

When one uses this method to study more general sums such as

$\displaystyle \sum_{n \leq x} \frac{g_1(n) g_2(n+1)}{n},$

one ends up having to consider expressions such as

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} c_p \frac{g_1(n+j) g_2(n+p+j)}{p}.$

where ${c_p}$ is the coefficient ${c_p := \overline{g_1}(p) \overline{g_2}(p)}$. When attacking this sum with the circle method, one soon finds oneself in the situation of wanting to locate the large Fourier coefficients of the exponential sum

$\displaystyle S(\alpha) := \sum_{p \in {\mathcal P}} \frac{c_p}{p} e^{2\pi i \alpha p}.$

In many cases (such as in the application to the Erdös discrepancy problem), the coefficient ${c_p}$ is identically ${1}$, and one can understand this sum satisfactorily using the classical results of Vinogradov: basically, ${S(\alpha)}$ is large when ${\alpha}$ lies in a “major arc” and is small when it lies in a “minor arc”. For more general functions ${g_1,g_2}$, the coefficients ${c_p}$ are more or less arbitrary; the large values of ${S(\alpha)}$ are no longer confined to the major arc case. Fortunately, even in this general situation one can use a restriction theorem for the primes established some time ago by Ben Green and myself to show that there are still only a bounded number of possible locations ${\alpha}$ (up to the uncertainty mandated by the Heisenberg uncertainty principle) where ${S(\alpha)}$ is large, and we can still conclude by using (6). (Actually, as recently pointed out to me by Ben, one does not need the full strength of our result; one only needs the ${L^4}$ restriction theorem for the primes, which can be proven fairly directly using Plancherel’s theorem and some sieve theory.)

It is tempting to also use the method to attack higher order cases of the (logarithmically) averaged Chowla conjecture, for instance one could try to prove the estimate

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x).$

The above arguments reduce matters to obtaining some non-trivial cancellation for sums of the form

$\displaystyle \frac{1}{H} \sum_{j=1}^H \sum_{p \in {\mathcal P}} \frac{\lambda(n+j) \lambda(n+p+j) \lambda(n+2p+j)}{p}.$

A little bit of “higher order Fourier analysis” (as was done for very similar sums in the ergodic theory context by Frantzikinakis-Host-Kra and Wooley-Ziegler) lets one control this sort of sum if one can establish a bound of the form

$\displaystyle \frac{1}{X} \int_X^{2X} \sup_\alpha |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(1) \ \ \ \ \ (7)$

where ${X}$ goes to infinity and ${H}$ is a very slowly growing function of ${X}$. This looks very similar to (6), but the fact that the supremum is now inside the integral makes the problem much more difficult. However it looks worth attacking (7) further, as this estimate looks like it should have many nice applications (beyond just the ${k=3}$ case of the logarithmically averaged Chowla or Elliott conjectures, which is already interesting).

For higher ${k}$ than ${k=3}$, the same line of analysis requires one to replace the linear phase ${e(\alpha n)}$ by more complicated phases, such as quadratic phases ${e(\alpha n^2 + \beta n)}$ or even ${k-2}$-step nilsequences. Given that (7) is already beyond the reach of current literature, these even more complicated expressions are also unavailable at present, but one can imagine that they will eventually become tractable, in which case we would obtain an averaged form of the Chowla conjecture for all ${k}$, which would have a number of consequences (such as a logarithmically averaged version of Sarnak’s conjecture, as per this blog post).

It would of course be very nice to remove the logarithmic averaging, and be able to establish bounds such as (3). I did attempt to do so, but I do not see a way to use the entropy decrement argument in a manner that does not require some sort of averaging of logarithmic type, as it requires one to pick a scale ${H}$ that one cannot specify in advance, which is not a problem for logarithmic averages (which are quite stable with respect to dilations) but is problematic for ordinary averages. But perhaps the problem can be circumvented by some clever modification of the argument. One possible approach would be to start exploiting multiplicativity at products of primes, and not just individual primes, to try to keep the scale fixed, but this makes the concentration of measure part of the argument much more complicated as one loses some independence properties (coming from the Chinese remainder theorem) which allowed one to conclude just from the Hoeffding inequality.

The Chowla conjecture asserts that all non-trivial correlations of the Liouville function are asymptotically negligible; for instance, it asserts that

$\displaystyle \sum_{n \leq X} \lambda(n) \lambda(n+h) = o(X)$

as ${X \rightarrow \infty}$ for any fixed natural number ${h}$. This conjecture remains open, though there are a number of partial results (e.g. these two previous results of Matomaki, Radziwill, and myself).

A natural generalisation of Chowla’s conjecture was proposed by Elliott. For simplicity we will only consider Elliott’s conjecture for the pair correlations

$\displaystyle \sum_{n \leq X} g(n) \overline{g}(n+h).$

For such correlations, the conjecture was that one had

$\displaystyle \sum_{n \leq X} g(n) \overline{g}(n+h) = o(X) \ \ \ \ \ (1)$

as ${X \rightarrow \infty}$ for any natural number ${h}$, as long as ${g}$ was a completely multiplicative function with magnitude bounded by ${1}$, and such that

$\displaystyle \sum_p \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p} = +\infty \ \ \ \ \ (2)$

for any Dirichlet character ${\chi}$ and any real number ${t}$. In the language of “pretentious number theory”, as developed by Granville and Soundararajan, the hypothesis (2) asserts that the completely multiplicative function ${g}$ does not “pretend” to be like the completely multiplicative function ${n \mapsto \chi(n) n^{it}}$ for any character ${\chi}$ and real number ${t}$. A condition of this form is necessary; for instance, if ${g(n)}$ is precisely equal to ${\chi(n) n^{it}}$ and ${\chi}$ has period ${q}$, then ${g(n) \overline{g}(n+q)}$ is equal to ${1_{(n,q)=1} + o(1)}$ as ${n \rightarrow \infty}$ and (1) clearly fails. The prime number theorem in arithmetic progressions implies that the Liouville function obeys (2), and so the Elliott conjecture contains the Chowla conjecture as a special case.

As it turns out, Elliott’s conjecture is false as stated, with the counterexample ${g}$ having the property that ${g}$ “pretends” locally to be the function ${n \mapsto n^{it_j}}$ for ${n}$ in various intervals ${[1, X_j]}$, where ${X_j}$ and ${t_j}$ go to infinity in a certain prescribed sense. See this paper of Matomaki, Radziwill, and myself for details. However, we view this as a technicality, and continue to believe that certain “repaired” versions of Elliott’s conjecture still hold. For instance, our counterexample does not apply when ${g}$ is restricted to be real-valued rather than complex, and we believe that Elliott’s conjecture is valid in this setting. Returning to the complex-valued case, we still expect the asymptotic (1) provided that the condition (2) is replaced by the stronger condition

$\displaystyle \sup_{|t| \leq X} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \rightarrow +\infty$

as ${X \rightarrow +\infty}$ for all fixed Dirichlet characters ${\chi}$. In our paper we supported this claim by establishing a certain “averaged” version of this conjecture; see that paper for further details. (See also this recent paper of Frantzikinakis and Host which establishes a different averaged version of this conjecture.)

One can make a stronger “non-asymptotic” version of this corrected Elliott conjecture, in which the ${X}$ parameter does not go to infinity, or equivalently that the function ${g}$ is permitted to depend on ${X}$:

Conjecture 1 (Non-asymptotic Elliott conjecture) Let ${\varepsilon > 0}$, let ${A \geq 1}$ be sufficiently large depending on ${\varepsilon}$, and let ${X}$ be sufficiently large depending on ${A,\varepsilon}$. Suppose that ${g}$ is a completely multiplicative function with magnitude bounded by ${1}$, such that

$\displaystyle \inf_{|t| \leq AX} |\sum_{p \leq X} \hbox{Re} \frac{1 - g(p) \overline{\chi(p)} p^{-it}}{p}| \geq A$

for all Dirichlet characters ${\chi}$ of period at most ${A}$. Then one has

$\displaystyle |\sum_{n \leq X} g(n) \overline{g(n+h)}| \leq \varepsilon X$

for all natural numbers ${1 \leq h \leq 1/\varepsilon}$.

The ${\varepsilon}$-dependent factor ${A}$ in the constraint ${|t| \leq AX}$ is necessary, as can be seen by considering the completely multiplicative function ${g(n) := n^{2iX}}$ (for instance). Again, the results in my previous paper with Matomaki and Radziwill can be viewed as establishing an averaged version of this conjecture.

Meanwhile, we have the following conjecture that is the focus of the Polymath5 project:

Conjecture 2 (Erdös discrepancy conjecture) For any function ${f: {\bf N} \rightarrow \{-1,+1\}}$, the discrepancy

$\displaystyle \sup_{n,d \in {\bf N}} |\sum_{j=1}^n f(jd)|$

is infinite.

It is instructive to compute some near-counterexamples to Conjecture 2 that illustrate the difficulty of the Erdös discrepancy problem. The first near-counterexample is that of a non-principal Dirichlet character ${f(n) = \chi(n)}$ that takes values in ${\{-1,0,+1\}}$ rather than ${\{-1,+1\}}$. For this function, one has from the complete multiplicativity of ${\chi}$ that

$\displaystyle |\sum_{j=1}^n f(jd)| = |\sum_{j=1}^n \chi(j) \chi(d)|$

$\displaystyle \leq |\sum_{j=1}^n \chi(j)|.$

If ${q}$ denotes the period of ${\chi}$, then ${\chi}$ has mean zero on every interval of length ${q}$, and thus

$\displaystyle |\sum_{j=1}^n f(jd)| \leq |\sum_{j=1}^n \chi(j)| \leq q.$

Thus ${\chi}$ has bounded discrepancy.

Of course, this is not a true counterexample to Conjecture 2 because ${\chi}$ can take the value ${0}$. Let us now consider the following variant example, which is the simplest member of a family of examples studied by Borwein, Choi, and Coons. Let ${\chi = \chi_3}$ be the non-principal Dirichlet character of period ${3}$ (thus ${\chi(n)}$ equals ${+1}$ when ${n=1 \hbox{ mod } 3}$, ${-1}$ when ${n = 2 \hbox{ mod } 3}$, and ${0}$ when ${n = 0 \hbox{ mod } 3}$), and define the completely multiplicative function ${f = \tilde \chi: {\bf N} \rightarrow \{-1,+1\}}$ by setting ${\tilde \chi(p) := \chi(p)}$ when ${p \neq 3}$ and ${\tilde \chi(3) = +1}$. This is about the simplest modification one can make to the previous near-counterexample to eliminate the zeroes. Now consider the sum

$\displaystyle \sum_{j=1}^n \tilde \chi(j)$

with ${n := 1 + 3 + 3^2 + \dots + 3^k}$ for some large ${k}$. Writing ${j = 3^a m}$ with ${m}$ coprime to ${3}$ and ${a}$ at most ${k}$, we can write this sum as

$\displaystyle \sum_{a=0}^k \sum_{1 \leq m \leq n/3^j} \tilde \chi(3^a m).$

Now observe that ${\tilde \chi(3^a m) = \tilde \chi(3)^a \tilde \chi(m) = \chi(m)}$. The function ${\chi}$ has mean zero on every interval of length three, and ${\lfloor n/3^j\rfloor}$ is equal to ${1}$ mod ${3}$, and thus

$\displaystyle \sum_{1 \leq m \leq n/3^j} \tilde \chi(3^a m) = 1$

for every ${a=0,\dots,k}$, and thus

$\displaystyle \sum_{j=1}^n \tilde \chi(j) = k+1 \gg \log n.$

Thus ${\tilde \chi}$ also has unbounded discrepancy, but only barely so (it grows logarithmically in ${n}$). These examples suggest that the main “enemy” to proving Conjecture 2 comes from completely multiplicative functions ${f}$ that somehow “pretend” to be like a Dirichlet character but do not vanish at the zeroes of that character. (Indeed, the special case of Conjecture 2 when ${f}$ is completely multiplicative is already open, appears to be an important subcase.)

All of these conjectures remain open. However, I would like to record in this blog post the following striking connection, illustrating the power of the Elliott conjecture (particularly in its nonasymptotic formulation):

Theorem 3 (Elliott conjecture implies unbounded discrepancy) Conjecture 1 implies Conjecture 2.

The argument relies heavily on two observations that were previously made in connection with the Polymath5 project. The first is a Fourier-analytic reduction that replaces the Erdos Discrepancy Problem with an averaged version for completely multiplicative functions ${g}$. An application of Cauchy-Schwarz then shows that any counterexample to that version will violate the conclusion of Conjecture 1, so if one assumes that conjecture then ${g}$ must pretend to be like a function of the form ${n \mapsto \chi(n) n^{it}}$. One then uses (a generalisation) of a second argument from Polymath5 to rule out this case, basically by reducing matters to a more complicated version of the Borwein-Choi-Coons analysis. Details are provided below the fold.

There is some hope that the Chowla and Elliott conjectures can be attacked, as the parity barrier which is so impervious to attack for the twin prime conjecture seems to be more permeable in this setting. (For instance, in my previous post I raised a possible approach, based on establishing expander properties of a certain random graph, which seems to get around the parity problem, in principle at least.)

(Update, Sep 25: fixed some treatment of error terms, following a suggestion of Andrew Granville.)

In analytic number theory, an arithmetic function is simply a function ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers ${{\bf N} = \{1,2,3,\dots\}}$ to the real or complex numbers. (One occasionally also considers arithmetic functions taking values in more general rings than ${{\bf R}}$ or ${{\bf C}}$, as in this previous blog post, but we will restrict attention here to the classical situation of real or complex arithmetic functions.) Experience has shown that a particularly tractable and relevant class of arithmetic functions for analytic number theory are the multiplicative functions, which are arithmetic functions ${f: {\bf N} \rightarrow {\bf C}}$ with the additional property that

$\displaystyle f(nm) = f(n) f(m) \ \ \ \ \ (1)$

whenever ${n,m \in{\bf N}}$ are coprime. (One also considers arithmetic functions, such as the logarithm function ${L(n) := \log n}$ or the von Mangoldt function, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as “derived” versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function

$\displaystyle \tau(n) := \sum_{d|n} 1 \ \ \ \ \ (2)$

that counts the number of divisors of a natural number ${n}$. (The divisor function ${n \mapsto \tau(n)}$ is also denoted ${n \mapsto d(n)}$ in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.

There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions ${f}$. In elementary multiplicative number theory, which is the focus of this set of notes, particular emphasis is given on the following two statistics of a given arithmetic function ${f: {\bf N} \rightarrow {\bf C}}$:

1. The summatory functions

$\displaystyle \sum_{n \leq x} f(n)$

of an arithmetic function ${f}$, as well as the associated natural density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} f(n)$

(if it exists).

2. The logarithmic sums

$\displaystyle \sum_{n\leq x} \frac{f(n)}{n}$

of an arithmetic function ${f}$, as well as the associated logarithmic density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}$

(if it exists).

Here, we are normalising the arithmetic function ${f}$ being studied to be of roughly unit size up to logarithms, obeying bounds such as ${f(n)=O(1)}$, ${f(n) = O(\log^{O(1)} n)}$, or at worst

$\displaystyle f(n) = O(n^{o(1)}). \ \ \ \ \ (3)$

A classical case of interest is when ${f}$ is an indicator function ${f=1_A}$ of some set ${A}$ of natural numbers, in which case we also refer to the natural or logarithmic density of ${f}$ as the natural or logarithmic density of ${A}$ respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function ${1_{\mathcal P}}$ of the primes is often replaced with the von Mangoldt function ${\Lambda}$.

Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 7 below.

If an arithmetic function ${f}$ is multiplicative (or closely related to a multiplicative function), then there is an important further statistic on an arithmetic function ${f}$ beyond the summatory function and the logarithmic sum, namely the Dirichlet series

$\displaystyle {\mathcal D}f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s} \ \ \ \ \ (4)$

for various real or complex numbers ${s}$. Under the hypothesis (3), this series is absolutely convergent for real numbers ${s>1}$, or more generally for complex numbers ${s}$ with ${\hbox{Re}(s)>1}$. As we will see below the fold, when ${f}$ is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.

In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers ${s>1}$ (and focusing particularly on the asymptotic behaviour as ${s \rightarrow 1^+}$); in later notes we will focus instead on the important complex-analytic approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.

Remark 1 The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more “Fourier-analytic” approach, in which one studies convolution-type statistics such as

$\displaystyle \sum_n \frac{f(n)}{n} G( t - \log n ) \ \ \ \ \ (5)$

as ${t \rightarrow \infty}$ for various cutoff functions ${G: {\bf R} \rightarrow {\bf C}}$, such as smooth, compactly supported functions. See this previous blog post for an instance of such an approach. Another related approach is the “pretentious” approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.

To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow ${s}$ to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get upper bounds on such sums, known as Rankin’s trick, which we will discuss later in these notes.

The basic strategy of elementary multiplicative theory is to first gather useful estimates on the statistics of “smooth” or “non-oscillatory” functions, such as the constant function ${n \mapsto 1}$, the harmonic function ${n \mapsto \frac{1}{n}}$, or the logarithm function ${n \mapsto \log n}$; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function ${\Lambda}$ that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.

This is only an introduction to elementary multiplicative number theory techniques. More in-depth treatments may be found in this text of Montgomery-Vaughan, or this text of Bateman-Diamond.