You are currently browsing the tag archive for the ‘Vaughan identity’ tag.

A fundamental and recurring problem in analytic number theory is to demonstrate the presence of cancellation in an oscillating sum, a typical example of which might be a correlation

$\displaystyle \sum_{n} f(n) \overline{g(n)} \ \ \ \ \ (1)$

between two arithmetic functions ${f: {\bf N} \rightarrow {\bf C}}$ and ${g: {\bf N} \rightarrow {\bf C}}$, which to avoid technicalities we will assume to be finitely supported (or that the ${n}$ variable is localised to a finite range, such as ${\{ n: n \leq x \}}$). A key example to keep in mind for the purposes of this set of notes is the twisted von Mangoldt summatory function

$\displaystyle \sum_{n \leq x} \Lambda(n) \overline{\chi(n)} \ \ \ \ \ (2)$

that measures the correlation between the primes and a Dirichlet character ${\chi}$. One can get a “trivial” bound on such sums from the triangle inequality

$\displaystyle |\sum_{n} f(n) \overline{g(n)}| \leq \sum_{n} |f(n)| |g(n)|;$

for instance, from the triangle inequality and the prime number theorem we have

$\displaystyle |\sum_{n \leq x} \Lambda(n) \overline{\chi(n)}| \leq x + o(x) \ \ \ \ \ (3)$

as ${x \rightarrow \infty}$. But the triangle inequality is insensitive to the phase oscillations of the summands, and often we expect (e.g. from the probabilistic heuristics from Supplement 4) to be able to improve upon the trivial triangle inequality bound by a substantial amount; in the best case scenario, one typically expects a “square root cancellation” that gains a factor that is roughly the square root of the number of summands. (For instance, for Dirichlet characters ${\chi}$ of conductor ${O(x^{O(1)})}$, it is expected from probabilistic heuristics that the left-hand side of (3) should in fact be ${O_\varepsilon(x^{1/2+\varepsilon})}$ for any ${\varepsilon>0}$.)

It has proven surprisingly difficult, however, to establish significant cancellation in many of the sums of interest in analytic number theory, particularly if the sums do not have a strong amount of algebraic structure (e.g. multiplicative structure) which allow for the deployment of specialised techniques (such as multiplicative number theory techniques). In fact, we are forced to rely (to an embarrassingly large extent) on (many variations of) a single basic tool to capture at least some cancellation, namely the Cauchy-Schwarz inequality. In fact, in many cases the classical case

$\displaystyle |\sum_n f(n) \overline{g(n)}| \leq (\sum_n |f(n)|^2)^{1/2} (\sum_n |g(n)|^2)^{1/2}, \ \ \ \ \ (4)$

considered by Cauchy, where at least one of ${f, g: {\bf N} \rightarrow {\bf C}}$ is finitely supported, suffices for applications. Roughly speaking, the Cauchy-Schwarz inequality replaces the task of estimating a cross-correlation between two different functions ${f,g}$, to that of measuring self-correlations between ${f}$ and itself, or ${g}$ and itself, which are usually easier to compute (albeit at the cost of capturing less cancellation). Note that the Cauchy-Schwarz inequality requires almost no hypotheses on the functions ${f}$ or ${g}$, making it a very widely applicable tool.

There is however some skill required to decide exactly how to deploy the Cauchy-Schwarz inequality (and in particular, how to select ${f}$ and ${g}$); if applied blindly, one loses all cancellation and can even end up with a worse estimate than the trivial bound. For instance, if one tries to bound (2) directly by applying Cauchy-Schwarz with the functions ${\Lambda}$ and ${\chi}$, one obtains the bound

$\displaystyle |\sum_{n \leq x} \Lambda(n) \overline{\chi(n)}| \leq (\sum_{n \leq x} \Lambda(n)^2)^{1/2} (\sum_{n \leq x} |\chi(n)|^2)^{1/2}.$

The right-hand side may be bounded by ${\ll x \log^{1/2} x}$, but this is worse than the trivial bound (3) by a logarithmic factor. This can be “blamed” on the fact that ${\Lambda}$ and ${\chi}$ are concentrated on rather different sets (${\Lambda}$ is concentrated on primes, while ${\chi}$ is more or less uniformly distributed amongst the natural numbers); but even if one corrects for this (e.g. by weighting Cauchy-Schwarz with some suitable “sieve weight” that is more concentrated on primes), one still does not do any better than (3). Indeed, the Cauchy-Schwarz inequality suffers from the same key weakness as the triangle inequality: it is insensitive to the phase oscillation of the factors ${f, g}$.

While the Cauchy-Schwarz inequality can be poor at estimating a single correlation such as (1), its power improves when considering an average (or sum, or square sum) of multiple correlations. In this set of notes, we will focus on one such situation of this type, namely that of trying to estimate a square sum

$\displaystyle (\sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}|^2)^{1/2} \ \ \ \ \ (5)$

that measures the correlations of a single function ${f: {\bf N} \rightarrow {\bf C}}$ with multiple other functions ${g_j: {\bf N} \rightarrow {\bf C}}$. One should think of the situation in which ${f}$ is a “complicated” function, such as the von Mangoldt function ${\Lambda}$, but the ${g_j}$ are relatively “simple” functions, such as Dirichlet characters. In the case when the ${g_j}$ are orthonormal functions, we of course have the classical Bessel inequality:

Lemma 1 (Bessel inequality) Let ${g_1,\dots,g_J: {\bf N} \rightarrow {\bf C}}$ be finitely supported functions obeying the orthonormality relationship

$\displaystyle \sum_n g_j(n) \overline{g_{j'}(n)} = 1_{j=j'}$

for all ${1 \leq j,j' \leq J}$. Then for any function ${f: {\bf N} \rightarrow {\bf C}}$, we have

$\displaystyle (\sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}|^2)^{1/2} \leq (\sum_n |f(n)|^2)^{1/2}.$

For sake of comparison, if one were to apply the Cauchy-Schwarz inequality (4) separately to each summand in (5), one would obtain the bound of ${J^{1/2} (\sum_n |f(n)|^2)^{1/2}}$, which is significantly inferior to the Bessel bound when ${J}$ is large. Geometrically, what is going on is this: the Cauchy-Schwarz inequality (4) is only close to sharp when ${f}$ and ${g}$ are close to parallel in the Hilbert space ${\ell^2({\bf N})}$. But if ${g_1,\dots,g_J}$ are orthonormal, then it is not possible for any other vector ${f}$ to be simultaneously close to parallel to too many of these orthonormal vectors, and so the inner products of ${f}$ with most of the ${g_j}$ should be small. (See this previous blog post for more discussion of this principle.) One can view the Bessel inequality as formalising a repulsion principle: if ${f}$ correlates too much with some of the ${g_j}$, then it does not have enough “energy” to have large correlation with the rest of the ${g_j}$.

In analytic number theory applications, it is useful to generalise the Bessel inequality to the situation in which the ${g_j}$ are not necessarily orthonormal. This can be accomplished via the Cauchy-Schwarz inequality:

Proposition 2 (Generalised Bessel inequality) Let ${g_1,\dots,g_J: {\bf N} \rightarrow {\bf C}}$ be finitely supported functions, and let ${\nu: {\bf N} \rightarrow {\bf R}^+}$ be a non-negative function. Let ${f: {\bf N} \rightarrow {\bf C}}$ be such that ${f}$ vanishes whenever ${\nu}$ vanishes, we have

$\displaystyle (\sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}|^2)^{1/2} \leq (\sum_n |f(n)|^2 / \nu(n))^{1/2} \ \ \ \ \ (6)$

$\displaystyle \times ( \sum_{j=1}^J \sum_{j'=1}^J c_j \overline{c_{j'}} \sum_n \nu(n) g_j(n) \overline{g_{j'}(n)} )^{1/2}$

for some sequence ${c_1,\dots,c_J}$ of complex numbers with ${\sum_{j=1}^J |c_j|^2 = 1}$, with the convention that ${|f(n)|^2/\nu(n)}$ vanishes whenever ${f(n), \nu(n)}$ both vanish.

Note by relabeling that we may replace the domain ${{\bf N}}$ here by any other at most countable set, such as the integers ${{\bf Z}}$. (Indeed, one can give an analogue of this lemma on arbitrary measure spaces, but we will not do so here.) This result first appears in this paper of Boas.

Proof: We use the method of duality to replace the role of the function ${f}$ by a dual sequence ${c_1,\dots,c_J}$. By the converse to Cauchy-Schwarz, we may write the left-hand side of (6) as

$\displaystyle \sum_{j=1}^J \overline{c_j} \sum_{n} f(n) \overline{g_j(n)}$

for some complex numbers ${c_1,\dots,c_J}$ with ${\sum_{j=1}^J |c_j|^2 = 1}$. Indeed, if all of the ${\sum_{n} f(n) \overline{g_j(n)}}$ vanish, we can set the ${c_j}$ arbitrarily, otherwise we set ${(c_1,\dots,c_J)}$ to be the unit vector formed by dividing ${(\sum_{n} f(n) \overline{g_j(n)})_{j=1}^J}$ by its length. We can then rearrange this expression as

$\displaystyle \sum_n f(n) \overline{\sum_{j=1}^J c_j g_j(n)}.$

Applying Cauchy-Schwarz (dividing the first factor by ${\nu(n)^{1/2}}$ and multiplying the second by ${\nu(n)^{1/2}}$, after first removing those ${n}$ for which ${\nu(n)}$ vanish), this is bounded by

$\displaystyle (\sum_n |f(n)|^2 / \nu(n))^{1/2} (\sum_n \nu(n) |\sum_{j=1}^J c_j g_j(n)|^2)^{1/2},$

and the claim follows by expanding out the second factor. $\Box$

Observe that Lemma 1 is a special case of Proposition 2 when ${\nu=1}$ and the ${g_j}$ are orthonormal. In general, one can expect Proposition 2 to be useful when the ${g_j}$ are almost orthogonal relative to ${\nu}$, in that the correlations ${\sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}}$ tend to be small when ${j,j'}$ are distinct. In that case, one can hope for the diagonal term ${j=j'}$ in the right-hand side of (6) to dominate, in which case one can obtain estimates of comparable strength to the classical Bessel inequality. The flexibility to choose different weights ${\nu}$ in the above proposition has some technical advantages; for instance, if ${f}$ is concentrated in a sparse set (such as the primes), it is sometimes useful to tailor ${\nu}$ to a comparable set (e.g. the almost primes) in order not to lose too much in the first factor ${\sum_n |f(n)|^2 / \nu(n)}$. Also, it can be useful to choose a fairly “smooth” weight ${\nu}$, in order to make the weighted correlations ${\sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}}$ small.

Remark 3 In harmonic analysis, the use of tools such as Proposition 2 is known as the method of almost orthogonality, or the ${TT^*}$ method. The explanation for the latter name is as follows. For sake of exposition, suppose that ${\nu}$ is never zero (or we remove all ${n}$ from the domain for which ${\nu(n)}$ vanishes). Given a family of finitely supported functions ${g_1,\dots,g_J: {\bf N} \rightarrow {\bf C}}$, consider the linear operator ${T: \ell^2(\nu^{-1}) \rightarrow \ell^2(\{1,\dots,J\})}$ defined by the formula

$\displaystyle T f := ( \sum_{n} f(n) \overline{g_j(n)} )_{j=1}^J.$

This is a bounded linear operator, and the left-hand side of (6) is nothing other than the ${\ell^2(\{1,\dots,J\})}$ norm of ${Tf}$. Without any further information on the function ${f}$ other than its ${\ell^2(\nu^{-1})}$ norm ${(\sum_n |f(n)|^2 / \nu(n))^{1/2}}$, the best estimate one can obtain on (6) here is clearly

$\displaystyle (\sum_n |f(n)|^2 / \nu(n))^{1/2} \times \|T\|_{op},$

where ${\|T\|_{op}}$ denotes the operator norm of ${T}$.

The adjoint ${T^*: \ell^2(\{1,\dots,J\}) \rightarrow \ell^2(\nu^{-1})}$ is easily computed to be

$\displaystyle T^* (c_j)_{j=1}^J := (\sum_{j=1}^J c_j \nu(n) g_j(n) )_{n \in {\bf N}}.$

The composition ${TT^*: \ell^2(\{1,\dots,J\}) \rightarrow \ell^2(\{1,\dots,J\})}$ of ${T}$ and its adjoint is then given by

$\displaystyle TT^* (c_j)_{j=1}^J := (\sum_{j=1}^J c_j \sum_n \nu(n) g_j(n) \overline{g_{j'}}(n) )_{j=1}^J.$

From the spectral theorem (or singular value decomposition), one sees that the operator norms of ${T}$ and ${TT^*}$ are related by the identity

$\displaystyle \|T\|_{op} = \|TT^*\|_{op}^{1/2},$

and as ${TT^*}$ is a self-adjoint, positive semi-definite operator, the operator norm ${\|TT^*\|_{op}}$ is also the supremum of the quantity

$\displaystyle \langle TT^* (c_j)_{j=1}^J, (c_j)_{j=1}^J \rangle_{\ell^2(\{1,\dots,J\})} = \sum_{j=1}^J \sum_{j'=1}^J c_j \overline{c_{j'}} \sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}$

where ${(c_j)_{j=1}^J}$ ranges over unit vectors in ${\ell^2(\{1,\dots,J\})}$. Putting these facts together, we obtain Proposition 2; furthermore, we see from this analysis that the bound here is essentially optimal if the only information one is allowed to use about ${f}$ is its ${\ell^2(\nu^{-1})}$ norm.

For further discussion of almost orthogonality methods from a harmonic analysis perspective, see Chapter VII of this text of Stein.

Exercise 4 Under the same hypotheses as Proposition 2, show that

$\displaystyle \sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}| \leq (\sum_n |f(n)|^2 / \nu(n))^{1/2}$

$\displaystyle \times ( \sum_{j=1}^J \sum_{j'=1}^J |\sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}| )^{1/2}$

as well as the variant inequality

$\displaystyle |\sum_{j=1}^J \sum_{n} f(n) \overline{g_j(n)}| \leq (\sum_n |f(n)|^2 / \nu(n))^{1/2}$

$\displaystyle \times | \sum_{j=1}^J \sum_{j'=1}^J \sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}|^{1/2}.$

Proposition 2 has many applications in analytic number theory; for instance, we will use it in later notes to control the large value of Dirichlet series such as the Riemann zeta function. One of the key benefits is that it largely eliminates the need to consider further correlations of the function ${f}$ (other than its self-correlation ${\sum_n |f(n)|^2 / \nu(n)}$ relative to ${\nu^{-1}}$, which is usually fairly easy to compute or estimate as ${\nu}$ is usually chosen to be relatively simple); this is particularly useful if ${f}$ is a function which is significantly more complicated to analyse than the functions ${g_j}$. Of course, the tradeoff for this is that one now has to deal with the coefficients ${c_j}$, which if anything are even less understood than ${f}$, since literally the only thing we know about these coefficients is their square sum ${\sum_{j=1}^J |c_j|^2}$. However, as long as there is enough almost orthogonality between the ${g_j}$, one can estimate the ${c_j}$ by fairly crude estimates (e.g. triangle inequality or Cauchy-Schwarz) and still get reasonably good estimates.

In this set of notes, we will use Proposition 2 to prove some versions of the large sieve inequality, which controls a square-sum of correlations

$\displaystyle \sum_n f(n) e( -\xi_j n )$

of an arbitrary finitely supported function ${f: {\bf Z} \rightarrow {\bf C}}$ with various additive characters ${n \mapsto e( \xi_j n)}$ (where ${e(x) := e^{2\pi i x}}$), or alternatively a square-sum of correlations

$\displaystyle \sum_n f(n) \overline{\chi_j(n)}$

of ${f}$ with various primitive Dirichlet characters ${\chi_j}$; it turns out that one can prove a (slightly sub-optimal) version of this inequality quite quickly from Proposition 2 if one first prepares the sum by inserting a smooth cutoff with well-behaved Fourier transform. The large sieve inequality has many applications (as the name suggests, it has particular utility within sieve theory). For the purposes of this set of notes, though, the main application we will need it for is the Bombieri-Vinogradov theorem, which in a very rough sense gives a prime number theorem in arithmetic progressions, which, “on average”, is of strength comparable to the results provided by the Generalised Riemann Hypothesis (GRH), but has the great advantage of being unconditional (it does not require any unproven hypotheses such as GRH); it can be viewed as a significant extension of the Siegel-Walfisz theorem from Notes 2. As we shall see in later notes, the Bombieri-Vinogradov theorem is a very useful ingredient in sieve-theoretic problems involving the primes.

There is however one additional important trick, beyond the large sieve, which we will need in order to establish the Bombieri-Vinogradov theorem. As it turns out, after some basic manipulations (and the deployment of some multiplicative number theory, and specifically the Siegel-Walfisz theorem), the task of proving the Bombieri-Vinogradov theorem is reduced to that of getting a good estimate on sums that are roughly of the form

$\displaystyle \sum_{j=1}^J |\sum_n \Lambda(n) \overline{\chi_j}(n)| \ \ \ \ \ (7)$

for some primitive Dirichlet characters ${\chi_j}$. This looks like the type of sum that can be controlled by the large sieve (or by Proposition 2), except that this is an ordinary sum rather than a square sum (i.e., an ${\ell^1}$ norm rather than an ${\ell^2}$ norm). One could of course try to control such a sum in terms of the associated square-sum through the Cauchy-Schwarz inequality, but this turns out to be very wasteful (it loses a factor of about ${J^{1/2}}$). Instead, one should try to exploit the special structure of the von Mangoldt function ${\Lambda}$, in particular the fact that it can be expressible as a Dirichlet convolution ${\alpha * \beta}$ of two further arithmetic sequences ${\alpha,\beta}$ (or as a finite linear combination of such Dirichlet convolutions). The reason for introducing this convolution structure is through the basic identity

$\displaystyle (\sum_n \alpha*\beta(n) \overline{\chi_j}(n)) = (\sum_n \alpha(n) \overline{\chi_j}(n)) (\sum_n \beta(n) \overline{\chi_j}(n)) \ \ \ \ \ (8)$

for any finitely supported sequences ${\alpha,\beta: {\bf N} \rightarrow {\bf C}}$, as can be easily seen by multiplying everything out and using the completely multiplicative nature of ${\chi_j}$. (This is the multiplicative analogue of the well-known relationship ${\widehat{f*g}(\xi) = \hat f(\xi) \hat g(\xi)}$ between ordinary convolution and Fourier coefficients.) This factorisation, together with yet another application of the Cauchy-Schwarz inequality, lets one control (7) by square-sums of the sort that can be handled by the large sieve inequality.

As we have seen in Notes 1, the von Mangoldt function ${\Lambda}$ does indeed admit several factorisations into Dirichlet convolution type, such as the factorisation ${\Lambda = \mu * L}$. One can try directly inserting this factorisation into the above strategy; it almost works, however there turns out to be a problem when considering the contribution of the portion of ${\mu}$ or ${L}$ that is supported at very small natural numbers, as the large sieve loses any gain over the trivial bound in such settings. Because of this, there is a need for a more sophisticated decomposition of ${\Lambda}$ into Dirichlet convolutions ${\alpha * \beta}$ which are non-degenerate in the sense that ${\alpha,\beta}$ are supported away from small values. (As a non-example, the trivial factorisation ${\Lambda = \Lambda * \delta}$ would be a totally inappropriate factorisation for this purpose.) Fortunately, it turns out that through some elementary combinatorial manipulations, some satisfactory decompositions of this type are available, such as the Vaughan identity and the Heath-Brown identity. By using one of these identities we will be able to complete the proof of the Bombieri-Vinogradov theorem. (These identities are also useful for other applications in which one wishes to control correlations between the von Mangoldt function ${\Lambda}$ and some other sequence; we will see some examples of this in later notes.)

For further reading on these topics, including a significantly larger number of examples of the large sieve inequality, see Chapters 7 and 17 of Iwaniec and Kowalski.

Remark 5 We caution that the presentation given in this set of notes is highly ahistorical; we are using modern streamlined proofs of results that were first obtained by more complicated arguments.

Analytic number theory is often concerned with the asymptotic behaviour of various arithmetic functions: functions ${f: {\bf N} \rightarrow {\bf R}}$ or ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers ${{\bf N} = \{1,2,\dots\}}$ to the real numbers ${{\bf R}}$ or complex numbers ${{\bf C}}$. In this post, we will focus on the purely algebraic properties of these functions, and for reasons that will become clear later, it will be convenient to generalise the notion of an arithmetic function to functions ${f: {\bf N} \rightarrow R}$ taking values in some abstract commutative ring ${R}$. In this setting, we can add or multiply two arithmetic functions ${f,g: {\bf N} \rightarrow R}$ to obtain further arithmetic functions ${f+g, fg: {\bf N} \rightarrow R}$, and we can also form the Dirichlet convolution ${f*g: {\bf N} \rightarrow R}$ by the usual formula

$\displaystyle f*g(n) := \sum_{d|n} f(d) g(\frac{n}{d}).$

Regardless of what commutative ring ${R}$ is in used here, we observe that Dirichlet convolution is commutative, associative, and bilinear over ${R}$.

An important class of arithmetic functions in analytic number theory are the multiplicative functions, that is to say the arithmetic functions ${f: {\bf N} \rightarrow {\bf R}}$ such that ${f(1)=1}$ and

$\displaystyle f(nm) = f(n) f(m)$

for all coprime ${n,m \in {\bf N}}$. A subclass of these functions are the completely multiplicative functions, in which the restriction that ${n,m}$ be coprime is dropped. Basic examples of completely multiplicative functions (in the classical setting ${R={\bf C}}$) include

• the Kronecker delta ${\delta}$, defined by setting ${\delta(n)=1}$ for ${n=1}$ and ${\delta(n)=0}$ otherwise;
• the constant function ${1: n \mapsto 1}$ and the linear function ${n \mapsto n}$ (which by abuse of notation we denote by ${n}$);
• more generally monomials ${n \mapsto n^s}$ for any fixed complex number ${s}$ (in particular, the “Archimedean characters” ${n \mapsto n^{it}}$ for any fixed ${t \in {\bf R}}$), which by abuse of notation we denote by ${n^s}$;
• Dirichlet characters ${\chi}$;
• the Liouville function ${\lambda}$;
• the indicator function of the ${z}$smooth numbers (numbers whose prime factors are all at most ${z}$), for some given ${z}$; and
• the indicator function of the ${z}$rough numbers (numbers whose prime factors are all greater than ${z}$), for some given ${z}$.

Examples of multiplicative functions that are not completely multiplicative include

• the Möbius function ${\mu}$;
• the divisor function ${\tau}$ (also referred to as ${d}$);
• more generally, the higher order divisor functions ${\tau_k(n) = \sum_{d_1,\dots,d_k: d_1 \dots d_k = n} 1}$ for ${k \geq 1}$;
• the Euler totient function ${\phi}$;
• the number of roots ${n \mapsto \# \{ a \in {\bf Z}/n{\bf Z}: P(a) = 0\}}$ of a given polynomial ${P}$ defined over ${{\bf Z}}$;
• more generally, the point counting function ${n \mapsto V[{\bf Z}/n{\bf Z}]}$ of a given algebraic variety ${V}$ defined over ${{\bf Z}}$ (closely tied to the Hasse-Weil zeta function of ${V}$);
• the function ${r: n \mapsto r(n)}$ that counts the number of representations of ${n}$ as the sum of two squares;
• more generally, the function that maps a natural number ${n}$ to the number of ideals in a given number field ${K}$ of absolute norm ${n}$ (closely tied to the Dedekind zeta function of ${K}$).

These multiplicative functions interact well with the multiplication and convolution operations: if ${f,g: {\bf N} \rightarrow R}$ are multiplicative, then so are ${fg}$ and ${f * g}$, and if ${\psi}$ is completely multiplicative, then we also have

$\displaystyle \psi (f*g) = (\psi f) * (\psi g). \ \ \ \ \ (1)$

Finally, the product of completely multiplicative functions is again completely multiplicative. On the other hand, the sum of two multiplicative functions will never be multiplicative (just look at what happens at ${n=1}$), and the convolution of two completely multiplicative functions will usually just be multiplicative rather than completley multiplicative.

The specific multiplicative functions listed above are also related to each other by various important identities, for instance

$\displaystyle \delta * f = f; \quad \mu * 1 = \delta; \quad 1 * 1 = \tau; \quad \phi * 1 = n$

where ${f}$ is an arbitrary arithmetic function.

On the other hand, analytic number theory also is very interested in certain arithmetic functions that are not exactly multiplicative (and certainly not completely multiplicative). One particularly important such function is the von Mangoldt function ${\Lambda}$. This function is certainly not multiplicative, but is clearly closely related to such functions via such identities as ${\Lambda = \mu * L}$ and ${L = \Lambda * 1}$, where ${L: n\mapsto \log n}$ is the natural logarithm function. The purpose of this post is to point out that functions such as the von Mangoldt function lie in a class closely related to multiplicative functions, which I will call the derived multiplicative functions. More precisely:

Definition 1 A derived multiplicative function ${f: {\bf N} \rightarrow R}$ is an arithmetic function that can be expressed as the formal derivative

$\displaystyle f(n) = \frac{d}{d\varepsilon} F_\varepsilon(n) |_{\varepsilon=0}$

at the origin of a family ${f_\varepsilon: {\bf N}\rightarrow R}$ of multiplicative functions ${F_\varepsilon: {\bf N} \rightarrow R}$ parameterised by a formal parameter ${\varepsilon}$. Equivalently, ${f: {\bf N} \rightarrow R}$ is a derived multiplicative function if it is the ${\varepsilon}$ coefficient of a multiplicative function in the extension ${R[\varepsilon]/(\varepsilon^2)}$ of ${R}$ by a nilpotent infinitesimal ${\varepsilon}$; in other words, there exists an arithmetic function ${F: {\bf N} \rightarrow R}$ such that the arithmetic function ${F + \varepsilon f: {\bf N} \rightarrow R[\varepsilon]/(\varepsilon^2)}$ is multiplicative, or equivalently that ${F}$ is multiplicative and one has the Leibniz rule

$\displaystyle f(nm) = f(n) F(m) + F(n) f(m) \ \ \ \ \ (2)$

for all coprime ${n,m \in {\bf N}}$.

More generally, for any ${k\geq 0}$, a ${k}$-derived multiplicative function ${f: {\bf N} \rightarrow R}$ is an arithmetic function that can be expressed as the formal derivative

$\displaystyle f(n) = \frac{d^k}{d\varepsilon_1 \dots d\varepsilon_k} F_{\varepsilon_1,\dots,\varepsilon_k}(n) |_{\varepsilon_1,\dots,\varepsilon_k=0}$

at the origin of a family ${f_{\varepsilon_1,\dots,\varepsilon_k}: {\bf N} \rightarrow R}$ of multiplicative functions ${F_{\varepsilon_1,\dots,\varepsilon_k}: {\bf N} \rightarrow R}$ parameterised by formal parameters ${\varepsilon_1,\dots,\varepsilon_k}$. Equivalently, ${f}$ is the ${\varepsilon_1 \dots \varepsilon_k}$ coefficient of a multiplicative function in the extension ${R[\varepsilon_1,\dots,\varepsilon_k]/(\varepsilon_1^2,\dots,\varepsilon_k^2)}$ of ${R}$ by ${k}$ nilpotent infinitesimals ${\varepsilon_1,\dots,\varepsilon_k}$.

We define the notion of a ${k}$-derived completely multiplicative function similarly by replacing “multiplicative” with “completely multiplicative” in the above discussion.

There are Leibniz rules similar to (2) but they are harder to state; for instance, a doubly derived multiplicative function ${f: {\bf N} \rightarrow R}$ comes with singly derived multiplicative functions ${F_1, F_2: {\bf N} \rightarrow R}$ and a multiplicative function ${G: {\bf N} \rightarrow R}$ such that

$\displaystyle f(nm) = f(n) G(m) + F_1(n) F_2(m) + F_2(n) F_1(m) + G(n) f(m)$

for all coprime ${n,m \in {\bf N}}$.

One can then check that the von Mangoldt function ${\Lambda}$ is a derived multiplicative function, because ${\delta + \varepsilon \Lambda}$ is multiplicative in the ring ${{\bf C}[\varepsilon]/(\varepsilon^2)}$ with one infinitesimal ${\varepsilon}$. Similarly, the logarithm function ${L}$ is derived completely multiplicative because ${\exp( \varepsilon L ) := 1 + \varepsilon L}$ is completely multiplicative in ${{\bf C}[\varepsilon]/(\varepsilon^2)}$. More generally, any additive function ${\omega: {\bf N} \rightarrow R}$ is derived multiplicative because it is the top order coefficient of ${\exp(\varepsilon \omega) := 1 + \varepsilon \omega}$.

Remark 1 One can also phrase these concepts in terms of the formal Dirichlet series ${F(s) = \sum_n \frac{f(n)}{n^s}}$ associated to an arithmetic function ${f}$. A function ${f}$ is multiplicative if ${F}$ admits a (formal) Euler product; ${f}$ is derived multiplicative if ${F}$ is the (formal) first logarithmic derivative of an Euler product with respect to some parameter (not necessarily ${s}$, although this is certainly an option); and so forth.

Using the definition of a ${k}$-derived multiplicative function as the top order coefficient of a multiplicative function of a ring with ${k}$ infinitesimals, it is easy to see that the product or convolution of a ${k}$-derived multiplicative function ${f: {\bf N} \rightarrow R}$ and a ${l}$-derived multiplicative function ${g: {\bf N} \rightarrow R}$ is necessarily a ${k+l}$-derived multiplicative function (again taking values in ${R}$). Thus, for instance, the higher-order von Mangoldt functions ${\Lambda_k := \mu * L^k}$ are ${k}$-derived multiplicative functions, because ${L^k}$ is a ${k}$-derived completely multiplicative function. More explicitly, ${L^k}$ is the top order coeffiicent of the completely multiplicative function ${\prod_{i=1}^k \exp(\varepsilon_i L)}$, and ${\Lambda_k}$ is the top order coefficient of the multiplicative function ${\mu * \prod_{i=1}^k \exp(\varepsilon_i L)}$, with both functions taking values in the ring ${C[\varepsilon_1,\dots,\varepsilon_k]/(\varepsilon_1^2,\dots,\varepsilon_k^2)}$ of complex numbers with ${k}$ infinitesimals ${\varepsilon_1,\dots,\varepsilon_k}$ attached.

It then turns out that most (if not all) of the basic identities used by analytic number theorists concerning derived multiplicative functions, can in fact be viewed as coefficients of identities involving purely multiplicative functions, with the latter identities being provable primarily from multiplicative identities, such as (1). This phenomenon is analogous to the one in linear algebra discussed in this previous blog post, in which many of the trace identities used there are derivatives of determinant identities. For instance, the Leibniz rule

$\displaystyle L (f * g) = (Lf)*g + f*(Lg)$

for any arithmetic functions ${f,g}$ can be viewed as the top order term in

$\displaystyle \exp(\varepsilon L) (f*g) = (\exp(\varepsilon L) f) * (\exp(\varepsilon L) g)$

in the ring with one infinitesimal ${\varepsilon}$, and then we see that the Leibniz rule is a special case (or a derivative) of (1), since ${\exp(\varepsilon L)}$ is completely multiplicative. Similarly, the formulae

$\displaystyle \Lambda = \mu * L; \quad L = \Lambda * 1$

are top order terms of

$\displaystyle (\delta + \varepsilon \Lambda) = \mu * \exp(\varepsilon L); \quad \exp(\varepsilon L) = (\delta + \varepsilon \Lambda) * 1,$

and the variant formula ${\Lambda = - (L\mu) * 1}$ is the top order term of

$\displaystyle (\delta + \varepsilon \Lambda) = (\exp(-\varepsilon L)\mu) * 1,$

which can then be deduced from the previous identities by noting that the completely multiplicative function ${\exp(-\varepsilon L)}$ inverts ${\exp(\varepsilon L)}$ multiplicatively, and also noting that ${L}$ annihilates ${\mu*1=\delta}$. The Selberg symmetry formula

$\displaystyle \Lambda_2 = \Lambda*\Lambda + \Lambda L, \ \ \ \ \ (3)$

which plays a key role in the Erdös-Selberg elementary proof of the prime number theorem (as discussed in this previous blog post), is the top order term of the identity

$\displaystyle \delta + \varepsilon_1 \Lambda + \varepsilon_2 \Lambda + \varepsilon_1\varepsilon_2 \Lambda_2 = (\exp(\varepsilon_2 L) (\delta + \varepsilon_1 \Lambda)) * (\delta + \varepsilon_2 \Lambda)$

involving the multiplicative functions ${\delta + \varepsilon_1 \Lambda + \varepsilon_2 \Lambda + \varepsilon_1\varepsilon_2 \Lambda_2}$, ${\exp(\varepsilon_2 L)}$, ${\delta+\varepsilon_1 \Lambda}$, ${\delta+\varepsilon_2 \Lambda}$ with two infinitesimals ${\varepsilon_1,\varepsilon_2}$, and this identity can be proven while staying purely within the realm of multiplicative functions, by using the identities

$\displaystyle \delta + \varepsilon_1 \Lambda + \varepsilon_2 \Lambda + \varepsilon_1\varepsilon_2 \Lambda_2 = \mu * (\exp(\varepsilon_1 L) \exp(\varepsilon_2 L))$

$\displaystyle \exp(\varepsilon_1 L) = 1 * (\delta + \varepsilon_1 \Lambda)$

$\displaystyle \delta + \varepsilon_2 \Lambda = \mu * \exp(\varepsilon_2 L)$

and (1). Similarly for higher identities such as

$\displaystyle \Lambda_3 = \Lambda L^2 + 3 \Lambda L * \Lambda + \Lambda * \Lambda * \Lambda$

which arise from expanding out ${\mu * (\exp(\varepsilon_1 L) \exp(\varepsilon_2 L) \exp(\varepsilon_3 L))}$ using (1) and the above identities; we leave this as an exercise to the interested reader.

An analogous phenomenon arises for identities that are not purely multiplicative in nature due to the presence of truncations, such as the Vaughan identity

$\displaystyle \Lambda_{> V} = \mu_{\leq U} * L - \mu_{\leq U} * \Lambda_{\leq V} * 1 + \mu_{>U} * \Lambda_{>V} * 1 \ \ \ \ \ (4)$

for any ${U,V \geq 1}$, where ${f_{>V} = f 1_{>V}}$ is the restriction of a multiplicative function ${f}$ to the natural numbers greater than ${V}$, and similarly for ${f_{\leq V}}$, ${f_{>U}}$, ${f_{\leq U}}$. In this particular case, (4) is the top order coefficient of the identity

$\displaystyle (\delta + \varepsilon \Lambda)_{>V} = \mu_{\leq U} * \exp(\varepsilon L) - \mu_{\leq U} * (\delta + \varepsilon \Lambda)_{\leq V} * 1$

$\displaystyle + \mu_{>U} * (\delta+\varepsilon \Lambda)_{>V} * 1$

which can be easily derived from the identities ${\delta = \mu_{\leq U} * 1 + \mu_{>U} * 1}$ and ${\exp(\varepsilon L) = (\delta + \varepsilon \Lambda)_{>V} * 1 + (\delta + \varepsilon \Lambda)_{\leq V} + 1}$. Similarly for the Heath-Brown identity

$\displaystyle \Lambda = \sum_{j=1}^K (-1)^{j-1} \binom{K}{j} \mu_{\leq U}^{*j} * 1^{*j-1} * L \ \ \ \ \ (5)$

valid for natural numbers up to ${U^K}$, where ${U \geq 1}$ and ${K \geq 1}$ are arbitrary parameters and ${f^{*j}}$ denotes the ${j}$-fold convolution of ${f}$, and discussed in this previous blog post; this is the top order coefficient of

$\displaystyle \delta + \varepsilon \Lambda = \sum_{j=1}^K (-1)^{j-1} \binom{K}{j} \mu_{\leq U}^{*j} * 1^{*j-1} * \exp( \varepsilon L )$

and arises by first observing that

$\displaystyle (\mu - \mu_{\leq U})^{*K} * 1^{*K-1} * \exp(\varepsilon L) = \mu_{>U}^{*K} * 1^{*K-1} * \exp( \varepsilon L )$

vanishes up to ${U^K}$, and then expanding the left-hand side using the binomial formula and the identity ${\mu^{*K} * 1^{*K-1} * \exp(\varepsilon L) = \delta + \varepsilon \Lambda}$.

One consequence of this phenomenon is that identities involving derived multiplicative functions tend to have a dimensional consistency property: all terms in the identity have the same order of derivation in them. For instance, all the terms in the Selberg symmetry formula (3) are doubly derived functions, all the terms in the Vaughan identity (4) or the Heath-Brown identity (5) are singly derived functions, and so forth. One can then use dimensional analysis to help ensure that one has written down a key identity involving such functions correctly, much as is done in physics.

In addition to the dimensional analysis arising from the order of derivation, there is another dimensional analysis coming from the value of multiplicative functions at primes ${p}$ (which is more or less equivalent to the order of pole of the Dirichlet series at ${s=1}$). Let us say that a multiplicative function ${f: {\bf N} \rightarrow R}$ has a pole of order ${j}$ if one has ${f(p)=j}$ on the average for primes ${p}$, where we will be a bit vague as to what “on the average” means as it usually does not matter in applications. Thus for instance, ${1}$ or ${\exp(\varepsilon L)}$ has a pole of order ${1}$ (a simple pole), ${\delta}$ or ${\delta + \varepsilon \Lambda}$ has a pole of order ${0}$ (i.e. neither a zero or a pole), Dirichlet characters also have a pole of order ${0}$ (although this is slightly nontrivial, requiring Dirichlet’s theorem), ${\mu}$ has a pole of order ${-1}$ (a simple zero), ${\tau}$ has a pole of order ${2}$, and so forth. Note that the convolution of a multiplicative function with a pole of order ${j}$ with a multiplicative function with a pole of order ${j'}$ will be a multiplicative function with a pole of order ${j+j'}$. If there is no oscillation in the primes ${p}$ (e.g. if ${f(p)=j}$ for all primes ${p}$, rather than on the average), it is also true that the product of a multiplicative function with a pole of order ${j}$ with a multiplicative function with a pole of order ${j'}$ will be a multiplicative function with a pole of order ${jj'}$. The situation is significantly different though in the presence of oscillation; for instance, if ${\chi}$ is a quadratic character then ${\chi^2}$ has a pole of order ${1}$ even though ${\chi}$ has a pole of order ${0}$.

A ${k}$-derived multiplicative function will then be said to have an underived pole of order ${j}$ if it is the top order coefficient of a multiplicative function with a pole of order ${j}$; in terms of Dirichlet series, this roughly means that the Dirichlet series has a pole of order ${j+k}$ at ${s=1}$. For instance, the singly derived multiplicative function ${\Lambda}$ has an underived pole of order ${0}$, because it is the top order coefficient of ${\delta + \varepsilon \Lambda}$, which has a pole of order ${0}$; similarly ${L}$ has an underived pole of order ${1}$, being the top order coefficient of ${\exp(\varepsilon L)}$. More generally, ${\Lambda_k}$ and ${L^k}$ have underived poles of order ${0}$ and ${1}$ respectively for any ${k}$.

By taking top order coefficients, we then see that the convolution of a ${k}$-derived multiplicative function with underived pole of order ${j}$ and a ${k'}$-derived multiplicative function with underived pole of order ${j'}$ is a ${k+k'}$-derived multiplicative function with underived pole of order ${j+j'}$. If there is no oscillation in the primes, the product of these functions will similarly have an underived pole of order ${jj'}$, for instance ${\Lambda L}$ has an underived pole of order ${0}$. We then have the dimensional consistency property that in any of the standard identities involving derived multiplicative functions, all terms not only have the same derived order, but also the same underived pole order. For instance, in (3), (4), (5) all terms have underived pole order ${0}$ (with any Mobius function terms being counterbalanced by a matching term of ${1}$ or ${L}$). This gives a second way to use dimensional analysis as a consistency check. For instance, any identity that involves a linear combination of ${\mu_{\leq U} * L}$ and ${\Lambda_{>V} * 1}$ is suspect because the underived pole orders do not match (being ${0}$ and ${1}$ respectively), even though the derived orders match (both are ${1}$).

One caveat, though: this latter dimensional consistency breaks down for identities that involve infinitely many terms, such as Linnik’s identity

$\displaystyle \Lambda = \sum_{i=0}^\infty (-1)^{i} L * 1_{>1}^{*i}.$

In this case, one can still rewrite things in terms of multiplicative functions as

$\displaystyle \delta + \varepsilon \Lambda = \sum_{i=0}^\infty (-1)^i \exp(\varepsilon L) * 1_{>1}^{*i},$

so the former dimensional consistency is still maintained.

I thank Andrew Granville, Kannan Soundararajan, and Emmanuel Kowalski for helpful conversations on these topics.