
Let us call an arithmetic function $f: \mathbb{N} \to \mathbb{C}$ *$1$-bounded* if we have $|f(n)| \leq 1$ for all $n \in \mathbb{N}$. In this section we focus on the asymptotic behaviour of $1$-bounded multiplicative functions. Some key examples of such functions include:

- The Möbius function $\mu$;
- The Liouville function $\lambda$;
- “Archimedean” characters $n \mapsto n^{it}$ (which I call Archimedean because they are pullbacks of a Fourier character $x \mapsto x^{it}$ on the multiplicative group $\mathbb{R}^+$, which has the Archimedean property);
- Dirichlet characters (or “non-Archimedean” characters) $\chi$ (which are essentially pullbacks of Fourier characters on a multiplicative cyclic group with the discrete (non-Archimedean) metric);
- Hybrid characters $n \mapsto \chi(n) n^{it}$.

The space of $1$-bounded multiplicative functions is also closed under multiplication and complex conjugation.
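As a concrete companion to the list above, here is a minimal Python sketch that tabulates the Möbius and Liouville functions with a smallest-prime-factor sieve and confirms numerically that both take values of magnitude at most one. The helper names `smallest_prime_factors` and `liouville_and_mobius`, and the cutoff `1000`, are illustrative choices and are not taken from these notes.

```python
def smallest_prime_factors(limit):
    """Return spf with spf[n] = smallest prime factor of n, for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    p = 2
    while p * p <= limit:
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
        p += 1
    return spf


def liouville_and_mobius(limit):
    """Tabulate lambda(n) and mu(n) for 1 <= n <= limit (index 0 is unused)."""
    spf = smallest_prime_factors(limit)
    lam = [0] * (limit + 1)
    mu = [0] * (limit + 1)
    lam[1] = mu[1] = 1
    for n in range(2, limit + 1):
        p = spf[n]
        m = n // p
        lam[n] = -lam[m]                     # lambda is completely multiplicative
        mu[n] = 0 if m % p == 0 else -mu[m]  # mu vanishes when p^2 divides n
    return lam, mu


if __name__ == "__main__":
    N = 1000  # illustrative cutoff
    lam, mu = liouville_and_mobius(N)
    print("both 1-bounded:", all(abs(v) <= 1 for v in lam[1:] + mu[1:]))
    print("mean of mu up to N:    ", sum(mu[1:]) / N)
    print("mean of lambda up to N:", sum(lam[1:]) / N)
```

The two printed means are the normalised long averages of $\mu$ and $\lambda$ up to the cutoff; a single finite cutoff of course says nothing rigorous about asymptotics.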

Given a multiplicative function $f$, we are often interested in the asymptotics of long averages such as

$$\frac{1}{X} \sum_{n \leq X} f(n)$$

for large values of $X$, as well as short sums

$$\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)$$

where $H$ and $x$ are both large, but $H$ is significantly smaller than $x$. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by $O(1)$; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

$$\frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \qquad (1)$$

as $X \to \infty$. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

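Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

$$\frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1). \qquad (2)$$

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

$$\frac{1}{H} \sum_{X \leq n \leq X+H} \lambda(n) = o(1) \qquad (3)$$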

as if is such that for some fixed . However it is significantly more difficult to understand what happens when grows much slower than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi and that one can also establish \eqref. On the Riemann Hypothesis Maier and Montgomery lowered the threshold to for an absolute constant (the bound is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that should be able to be taken as small as , and perhaps even if one is particularly optimistic about the accuracy of these probabilistic models. On the other hand, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that cannot be taken arbitrarily slowly growing in , due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence and some sufficiently slowly growing , by modifying the arguments in these papers of mine).

The situation is better when one asks to understand the mean value on *almost all* short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let be a function of such that and as . Let be a $1$-bounded function. Show that the following assertions are equivalent:

As it turns out the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).

Using zero density methods, for instance, it was shown by Ramachandra that

whenever and . With this quality of bound (saving arbitrary powers of over the trivial bound of ), this is still the lowest value of one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of or , or just a qualitative decay of ), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any , one has for some absolute constant .

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for *any* function that goes to infinity as , no matter how slowly! This ability to let grow slowly with is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that be allowed to grow more slowly than . See also this survey of Soundararajan for further discussion.
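To get a feel for the mean-square short-interval averages discussed here, the following Python sketch computes a discretised second moment of the normalised sums of $\lambda$ over windows $x \leq n \leq x+H$ with $x$ ranging over the integers in $[X, 2X]$. The displayed estimate of Theorem 3 is not reproduced in the text above, so the normalisation used below is an assumption made for illustration, as are the function names and the sample values of `X` and `H`.

```python
def liouville(limit):
    """Tabulate lambda(n) = (-1)^Omega(n) for 1 <= n <= limit (index 0 unused)."""
    spf = list(range(limit + 1))  # spf[n] becomes the smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam


def short_interval_second_moment(X, H):
    """Average over integers x in [X, 2X] of |(1/H) sum_{x <= n <= x+H} lambda(n)|^2.

    This is an assumed, discretised stand-in for an integral over [X, 2X].
    """
    top = 2 * X + H
    lam = liouville(top)
    prefix = [0] * (top + 1)
    for n in range(1, top + 1):
        prefix[n] = prefix[n - 1] + lam[n]
    total = 0.0
    for x in range(X, 2 * X + 1):
        window = prefix[x + H] - prefix[x - 1]  # sum of lambda over x <= n <= x + H
        total += (window / H) ** 2
    return total / (X + 1)


if __name__ == "__main__":
    for H in (10, 50, 250):  # illustrative window lengths
        print(H, short_interval_second_moment(20000, H))
```

Prefix sums make each window sum an $O(1)$ lookup, so the whole computation is linear in $X + H$ once the sieve has run.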

Exercise 4 In this exercise you may use Theorem 3 freely.

- (i) Establish the lower bound
for some absolute constant and all sufficiently large . (Hint: if this bound failed, then would hold for almost all ; use this to create many intervals for which is extremely large.)
- (ii) Show that Theorem 3 also holds with replaced by , where is the principal character of period . (Use the fact that for all .) Use this to establish the corresponding upper bound
to (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (ii) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (i) was only established in the Matomaki-Radziwill paper.)

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as or have Dirichlet series , that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does *not* rely on such meromorphic continuations, and in fact holds for more general classes of $1$-bounded multiplicative functions , for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the *pretentious multiplicative number theory* of Granville and Soundararajan (as presented for instance in their draft monograph). A basic notion here is the *pretentious distance* between two $1$-bounded multiplicative functions (at a given scale ), which informally measures the extent to which “pretends” to be like (or vice versa). The precise definition is

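Definition 5 (Pretentious distance) Given two $1$-bounded multiplicative functions $f, g$, and a threshold $X > 0$, the *pretentious distance* $\mathbb{D}(f,g;X)$ between $f$ and $g$ up to scale $X$ is given by the formula

$$\mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}\left( f(p) \overline{g(p)} \right)}{p} \right)^{1/2}.$$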

Note that one can also define an infinite version of this distance by removing the constraint $p \leq X$, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because the distance $\mathbb{D}(f,f;X)$ from a function to itself can be non-zero, and furthermore the distance $\mathbb{D}(f,g;X)$ can vanish without $f, g$ being equal), but it is still quite close to behaving like a metric; in particular it obeys the triangle inequality (see Exercise 16 below). The philosophy of pretentious multiplicative number theory is that two $1$-bounded multiplicative functions $f, g$ will exhibit similar behaviour at scale $X$ if their pretentious distance is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:

Proposition 6 (Logarithmically averaged version of Halasz) Let be sufficiently large. Then for any $1$-bounded multiplicative functions , one has for an absolute constant .

In particular, if does not pretend to be , then the logarithmic average will be small. This condition is basically necessary, since of course .
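To make the notion of pretentious distance more tangible, here is a small Python sketch that evaluates the standard Granville-Soundararajan formula $\left( \sum_{p \leq X} (1 - \mathrm{Re}(f(p) \overline{g(p)}))/p \right)^{1/2}$ for a few of the characters mentioned earlier. We assume this is the formula intended in Definition 5; the function names and sample parameters are hypothetical. Only the values at primes are needed, so each function is represented by a callable on primes.

```python
import math


def primes_up_to(X):
    """Return the list of primes p <= X (for X >= 2) via the sieve of Eratosthenes."""
    is_prime = [True] * (X + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(X ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, X + 1, p):
                is_prime[m] = False
    return [p for p in range(2, X + 1) if is_prime[p]]


def pretentious_distance(f, g, X):
    """sqrt( sum_{p <= X} (1 - Re(f(p) * conj(g(p)))) / p ) for callables f, g on primes."""
    total = 0.0
    for p in primes_up_to(X):
        total += (1.0 - (complex(f(p)) * complex(g(p)).conjugate()).real) / p
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors


def one(p):
    return 1  # the constant function 1


def liouville(p):
    return -1  # lambda(p) = mu(p) = -1 at every prime


def archimedean(t):
    """The Archimedean character n -> n^{it}, restricted to primes."""
    return lambda p: p ** (1j * t)


if __name__ == "__main__":
    for X in (10 ** 2, 10 ** 4, 10 ** 6):
        print(X, pretentious_distance(liouville, one, X),
              pretentious_distance(archimedean(1.0), one, X))
```

For the Liouville function against $1$ the squared distance is exactly $2 \sum_{p \leq X} 1/p$, which grows without bound (very slowly) as $X$ increases.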

If one works with non-logarithmic averages , then not pretending to be is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if is an Archimedean character for some non-zero real , then goes to zero as (which is consistent with Proposition 6), but does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:

Theorem 7 (Halasz’s theorem) Let be sufficiently large. Then for any $1$-bounded multiplicative function , one has for an absolute constant and any .

Informally, we refer to a $1$-bounded multiplicative function as “pretentious” if it pretends to be a character such as , and “non-pretentious” otherwise. The precise distinction is rather malleable, as the precise class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character which needs to be considered, but in Theorem 7 it is the characters with . In other contexts one may also need to add Dirichlet characters or hybrid characters such as to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.

In the contrapositive, Halasz’ theorem can be formulated as the assertion that if one has a large mean

for some , then one has the pretentious property

for some . This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.

Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.

We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let , and let , with sufficiently large depending on . Suppose that is a $1$-bounded multiplicative function such that Then one has

for some .

The condition is basically optimal, as the following example shows:

Exercise 9 Let be a sufficiently small constant, and let be such that . Let be the Archimedean character for some . Show that

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only rather than ). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.

With our current state of knowledge, the only arguments that can establish the full strength of the Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function $f$ with its Dirichlet series

$$\mathcal{D}f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}$$

which one can view as a discrete Fourier transform of (or more precisely of the measure , if one evaluates the Dirichlet series on the right edge of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let be finitely supported arithmetic functions, and let be a Schwartz function. Then where is the Fourier transform of . (Note that the finite support of and the Schwartz nature of ensure that both sides of the identity are absolutely convergent.)

The restriction that be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.

*Proof:* By expanding out the Dirichlet series, it suffices to show that

for any natural numbers . But this follows from the Fourier inversion formula applied at .

For applications to Halasz type theorems, one sets equal to the Kronecker delta , producing weighted integrals of of “” type. For applications to Matomaki-Radziwill theorems, one instead sets , and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of of “” type:

Exercise 11 (Plancherel type identity) If is finitely supported, and is a Schwartz function, establish the identity

In contrast, information about the non-pretentious nature of a multiplicative function will give “pointwise” or “” type control on the Dirichlet series , as is suggested from the Euler product factorisation of .

It will be convenient to formalise the notion of , , and control of the Dirichlet series , which as previously mentioned can be viewed as a sort of “Fourier transform” of :

Definition 12 (Fourier norms) Let be finitely supported, and let be a bounded measurable set. We define the Fourier norm

the Fourier norm

and the Fourier norm

One could more generally define norms for other exponents , but we will only need the exponents in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.

As mentioned above, Halasz’s theorem gives good control on the Fourier norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier control via Halasz) Let be a $1$-bounded multiplicative function, let be an interval in for some , let , and let be a bounded measurable set. Show that (Hint: you will need to use summation by parts (or an equivalent device) to deal with a weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 ( mean value theorem) Let , and let be finitely supported. Show that Conclude in particular that if is supported in for some and , then

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier estimates are already sufficient to obtain decent control on the (weighted) Fourier type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier control to Fourier or control more efficiently, the key trick is to use Hölder’s inequality, which when combined with the basic Dirichlet series identity

The strategy is then to factor (or approximately factor) the original function as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier or estimates on various regions , and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split into the Dirichlet convolution of three factors, one of which will be estimated in using the non-pretentiousness hypothesis, and the other two being estimated in using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in , and the other in ; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in by construction of the region; in other places, the control will come from Exercise 13. Similarly, in some regions, satisfactory control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.

There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9) that for various moderately wide ranges of primes, the number of prime divisors of a large number in the range is almost always close to . Thus, if we introduce the arithmetic functions

and more generally we have a twisted approximation

for multiplicative functions . (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “ energy” of a multiplicative function is concentrated in those regions where is extremely large in a sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of .

In analytic number theory, it is a well-known phenomenon that for many arithmetic functions $f$ of interest in number theory, it is significantly easier to estimate logarithmic sums such as

$$\sum_{n \leq x} \frac{f(n)}{n}$$

than it is to estimate summatory functions such as

$$\sum_{n \leq x} f(n).$$

(Here we are normalising to be roughly constant in size, e.g. as .) For instance, when is the von Mangoldt function , the logarithmic sums can be adequately estimated by Mertens’ theorem, which can be easily proven by elementary means (see Notes 1); but a satisfactory estimate on the summatory function requires the prime number theorem, which is substantially harder to prove (see Notes 2). (From a complex-analytic or Fourier-analytic viewpoint, the problem is that the logarithmic sums can usually be controlled just from knowledge of the Dirichlet series for near ; but the summatory functions require control of the Dirichlet series for on or near a large portion of the line . See Notes 2 for further discussion.)
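The contrast between the two kinds of sums can be seen numerically for the von Mangoldt function. The sketch below (with illustrative function names and cutoffs of our own choosing) tabulates $\Lambda$ and compares the logarithmic sum $\sum_{n \leq x} \Lambda(n)/n$ against $\log x$, and the summatory function $\sum_{n \leq x} \Lambda(n)$ against $x$. It is only a sanity check at small heights and proves nothing about asymptotics.

```python
import math


def von_mangoldt(limit):
    """Tabulate Lambda(n) for 1 <= n <= limit: log p at prime powers p^k, else 0."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_composite[p]:  # p is prime
            for m in range(p * p, limit + 1, p):
                is_composite[m] = True
            q = p
            while q <= limit:  # mark every power of p
                lam[q] = math.log(p)
                q *= p
    return lam


def logarithmic_sum(lam, x):
    """sum_{n <= x} Lambda(n) / n."""
    return sum(lam[n] / n for n in range(1, x + 1))


def summatory_function(lam, x):
    """sum_{n <= x} Lambda(n)."""
    return sum(lam[n] for n in range(1, x + 1))


if __name__ == "__main__":
    lam = von_mangoldt(10 ** 5)
    for x in (10 ** 3, 10 ** 4, 10 ** 5):
        print(x,
              logarithmic_sum(lam, x) - math.log(x),  # stays bounded (Mertens)
              summatory_function(lam, x) / x)         # tends to 1 (prime number theorem)
```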

Viewed conversely, whenever one has a difficult estimate on a summatory function such as , one can look to see if there is a “cheaper” version of that estimate that only controls the logarithmic sums , which is easier to prove than the original, more “expensive” estimate. In this post, we shall do this for two theorems, a classical theorem of Halasz on mean values of multiplicative functions on long intervals, and a much more recent result of Matomaki and Radziwiłł on mean values of multiplicative functions in short intervals. The two are related; the former theorem is an ingredient in the latter (though in the special case of the Matomaki-Radziwiłł theorem considered here, we will not need Halasz’s theorem directly, instead using a key tool in the *proof* of that theorem).

We begin with Halasz’s theorem. Here is a version of this theorem, due to Montgomery and to Tenenbaum:

Theorem 1 (Halasz-Montgomery-Tenenbaum) Let be a multiplicative function with for all . Let and , and set Then one has

Informally, this theorem asserts that is small compared with , unless “pretends” to be like the character on primes for some small . (This is the starting point of the “pretentious” approach of Granville and Soundararajan to analytic number theory, as developed for instance here.) We now give a “cheap” version of this theorem which is significantly weaker (because it settles for controlling logarithmic sums rather than summatory functions, because it requires to be completely multiplicative instead of multiplicative, because it requires a strong bound on the analogue of the quantity , and because it only gives qualitative decay rather than quantitative estimates), but easier to prove:

Theorem 2 (Cheap Halasz) Let be an asymptotic parameter going to infinity. Let be a completely multiplicative function (possibly depending on ) such that for all , such that

Note that now that we are content with estimating logarithmic sums, we no longer need to preclude the possibility that pretends to be like ; see Exercise 11 of Notes 1 for a related observation.

To prove this theorem, we first need a special case of the Turan-Kubilius inequality.

Lemma 3 (Turan-Kubilius) Let be a parameter going to infinity, and let be a quantity depending on such that and as . Then

Informally, this lemma is asserting that

for most large numbers . Another way of writing this heuristically is in terms of Dirichlet convolutions:

This type of estimate was previously discussed as a tool to establish a criterion of Katai and Bourgain-Sarnak-Ziegler for Möbius orthogonality estimates in this previous blog post. See also Section 5 of Notes 1 for some similar computations.

*Proof:* By Cauchy-Schwarz it suffices to show that

Expanding out the square, it suffices to show that

for .

We just show the case, as the cases are similar (and easier). We rearrange the left-hand side as

We can estimate the inner sum as . But a routine application of Mertens’ theorem (handling the diagonal case when separately) shows that

and the claim follows.

Remark 4 As an alternative to the Turan-Kubilius inequality, one can use the Ramaré identity (see e.g. Section 17.3 of Friedlander-Iwaniec). This identity turns out to give quantitative results superior to those of the Turan-Kubilius inequality in applications; see the paper of Matomaki and Radziwiłł for an instance of this.

We now prove Theorem 2. Let denote the left-hand side of (2); by the triangle inequality we have . By Lemma 3 (for some to be chosen later) and the triangle inequality we have

We rearrange the left-hand side as

We now replace the constraint by . The error incurred in doing so is

which by Mertens’ theorem is . Thus we have

But by definition of , we have , thus

From Mertens’ theorem, the expression in brackets can be rewritten as

and so the real part of this expression is

By (1), Mertens’ theorem and the hypothesis on we have

for any . This implies that we can find going to infinity such that

and thus the expression in brackets has real part . The claim follows.

The Turan-Kubilius argument is certainly not the most efficient way to estimate sums such as . In the exercise below we give a significantly more accurate estimate that works when is non-negative.

Exercise 5 (Granville-Koukoulopoulos-Matomaki)

- (i) If is a completely multiplicative function with for all primes , show that
as . (Hint: for the upper bound, expand out the Euler product. For the lower bound, show that , where is the completely multiplicative function with for all primes .)
- (ii) If is multiplicative and takes values in , show that
for all .

Now we turn to a very recent result of Matomaki and Radziwiłł on mean values of multiplicative functions in short intervals. For sake of illustration we specialise their results to the simpler case of the Liouville function , although their arguments actually work (with some additional effort) for arbitrary multiplicative functions of magnitude at most that are real-valued (or more generally, stay far from complex characters ). Furthermore, we give a qualitative form of their estimates rather than a quantitative one:

Theorem 6 (Matomaki-Radziwiłł, special case) Let be a parameter going to infinity, and let be a quantity going to infinity as . Then for all but of the integers , one has

A simple sieving argument (see Exercise 18 of Supplement 4) shows that one can replace by the Möbius function and obtain the same conclusion. See this recent note of Matomaki and Radziwiłł for a simple proof of their (quantitative) main theorem in this special case.

Of course, (4) improves upon the trivial bound of . Prior to this paper, such estimates were only known (using arguments similar to those in Section 3 of Notes 6) for unconditionally, or for for some sufficiently large if one assumed the Riemann hypothesis. This theorem also represents some progress towards Chowla’s conjecture (discussed in Supplement 4) that

as for any fixed distinct ; indeed, it implies that this conjecture holds if one performs a small amount of averaging in the .

Below the fold, we give a “cheap” version of the Matomaki-Radziwiłł argument. More precisely, we establish

Theorem 7 (Cheap Matomaki-Radziwiłł) Let be a parameter going to infinity, and let . Then

Note that (5) improves upon the trivial bound of . Again, one can replace with if desired. Due to the cheapness of Theorem 7, the proof will require few ingredients; the deepest input is the improved zero-free region for the Riemann zeta function due to Vinogradov and Korobov. Other than that, the main tools are the Turan-Kubilius result established above, and some Fourier (or complex) analysis.

Analytic number theory is often concerned with the asymptotic behaviour of various arithmetic functions: functions $f: \mathbb{N} \to \mathbb{R}$ or $f: \mathbb{N} \to \mathbb{C}$ from the natural numbers $\mathbb{N}$ to the real numbers $\mathbb{R}$ or complex numbers $\mathbb{C}$. In this post, we will focus on the purely algebraic properties of these functions, and for reasons that will become clear later, it will be convenient to generalise the notion of an arithmetic function to functions $f: \mathbb{N} \to R$ taking values in some abstract commutative ring $R$. In this setting, we can add or multiply two arithmetic functions $f, g: \mathbb{N} \to R$ to obtain further arithmetic functions $f+g, fg: \mathbb{N} \to R$, and we can also form the Dirichlet convolution $f*g: \mathbb{N} \to R$ by the usual formula

$$f*g(n) := \sum_{d | n} f(d) g\left( \frac{n}{d} \right).$$

Regardless of what commutative ring $R$ is used here, we observe that Dirichlet convolution is commutative, associative, and bilinear over $R$.
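For readers who like to experiment, here is a minimal Python sketch of Dirichlet convolution on truncated tables, together with spot checks of commutativity and associativity. It works with ordinary integers rather than a general commutative ring, and the name `dirichlet_convolve` and the cutoff `N = 200` are illustrative assumptions.

```python
def dirichlet_convolve(f, g):
    """Return f*g on 1..N, where (f*g)(n) = sum over d | n of f(d) * g(n/d).

    f and g are lists of equal length N + 1, indexed by n (index 0 is unused).
    """
    N = len(f) - 1
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h


N = 200                                  # illustrative cutoff
one = [0] + [1] * N                      # the constant function 1
identity = list(range(N + 1))            # the linear function n -> n
delta = [0, 1] + [0] * (N - 1)           # Kronecker delta, the unit for convolution

tau = dirichlet_convolve(one, one)       # number of divisors
sigma = dirichlet_convolve(one, identity)  # sum of divisors

if __name__ == "__main__":
    print("tau(12) =", tau[12], " sigma(12) =", sigma[12])
```

Because the value of $f*g$ at $n$ only involves the divisors of $n$, truncating every table at the same $N$ loses no information about the values at $n \leq N$.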

An important class of arithmetic functions in analytic number theory are the multiplicative functions, that is to say the arithmetic functions $f: \mathbb{N} \to R$ such that $f(1) = 1$ and

$$f(nm) = f(n) f(m)$$

for all coprime $n, m \in \mathbb{N}$. A subclass of these functions are the completely multiplicative functions, in which the restriction that $n, m$ be coprime is dropped. Basic examples of completely multiplicative functions (in the classical setting $R = \mathbb{C}$) include

- the Kronecker delta $\delta$, defined by setting $\delta(n) = 1$ for $n = 1$ and $\delta(n) = 0$ otherwise;
- the constant function $1: n \mapsto 1$ and the linear function $n \mapsto n$ (which by abuse of notation we denote by $n$);
- more generally monomials $n \mapsto n^s$ for any fixed complex number $s$ (in particular, the “Archimedean characters” $n \mapsto n^{it}$ for any fixed $t \in \mathbb{R}$), which by abuse of notation we denote by $n^s$;
- Dirichlet characters $\chi$;
- the Liouville function $\lambda$;
- the indicator function of the –smooth numbers (numbers whose prime factors are all at most ), for some given ; and
- the indicator function of the –rough numbers (numbers whose prime factors are all greater than ), for some given .

Examples of multiplicative functions that are not completely multiplicative include

- the Möbius function ;
- the divisor function (also referred to as );
- more generally, the higher order divisor functions for ;
- the Euler totient function ;
- the number of roots of a given polynomial defined over ;
- more generally, the point counting function of a given algebraic variety defined over (closely tied to the Hasse-Weil zeta function of );
- the function that counts the number of representations of as the sum of two squares;
- more generally, the function that maps a natural number to the number of ideals in a given number field of absolute norm (closely tied to the Dedekind zeta function of ).

These multiplicative functions interact well with the multiplication and convolution operations: if $f, g$ are multiplicative, then so are $fg$ and $f*g$, and if $\psi$ is completely multiplicative, then we also have

$$\psi (f*g) = (\psi f) * (\psi g). \qquad (1)$$

Finally, the product of completely multiplicative functions is again completely multiplicative. On the other hand, the sum of two multiplicative functions will never be multiplicative (just look at what happens at $n = 1$), and the convolution of two completely multiplicative functions will usually just be multiplicative rather than completely multiplicative.
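These closure properties are easy to test by brute force. The Python sketch below uses a hypothetical helper `is_multiplicative`, written for illustration, to check the defining relation for all relevant pairs up to a cutoff. It confirms that the divisor function $1 * 1$ is multiplicative but not completely multiplicative, and that the sum $1 + 1$ already fails at $n = 1$.

```python
from math import gcd


def is_multiplicative(f, N, completely=False):
    """Brute-force check of f(1) = 1 and f(nm) = f(n) f(m) for all nm <= N.

    With completely=False only coprime pairs (n, m) are tested.
    """
    if f(1) != 1:
        return False
    for n in range(2, N + 1):
        for m in range(2, N // n + 1):
            if (completely or gcd(n, m) == 1) and f(n * m) != f(n) * f(m):
                return False
    return True


def one(n):
    return 1  # the constant function 1 (completely multiplicative)


def tau(n):
    """Divisor function, equal to the Dirichlet convolution 1 * 1."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def one_plus_one(n):
    return one(n) + one(n)  # sum of two multiplicative functions; equals 2 at n = 1


if __name__ == "__main__":
    print("1 completely multiplicative:  ", is_multiplicative(one, 100, completely=True))
    print("tau multiplicative:           ", is_multiplicative(tau, 100))
    print("tau completely multiplicative:", is_multiplicative(tau, 100, completely=True))
    print("1 + 1 multiplicative:         ", is_multiplicative(one_plus_one, 100))
```

A passing check up to a finite cutoff is of course only evidence, whereas a single failing pair (such as $\tau(4) = 3 \neq \tau(2)^2$) is a genuine counterexample.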

The specific multiplicative functions listed above are also related to each other by various important identities, for instance

where is an arbitrary arithmetic function.

On the other hand, analytic number theory also is very interested in certain arithmetic functions that are *not* exactly multiplicative (and certainly not completely multiplicative). One particularly important such function is the von Mangoldt function $\Lambda$. This function is certainly not multiplicative, but is clearly closely related to such functions via such identities as $\Lambda = \mu * L$ and $L = \Lambda * 1$, where $L: n \mapsto \log n$ is the natural logarithm function. The purpose of this post is to point out that functions such as the von Mangoldt function lie in a class closely related to multiplicative functions, which I will call the *derived multiplicative functions*. More precisely:

Definition 1 A *derived multiplicative function* is an arithmetic function that can be expressed as the formal derivative at the origin of a family of multiplicative functions parameterised by a formal parameter . Equivalently, is a derived multiplicative function if it is the coefficient of a multiplicative function in the extension of by a nilpotent infinitesimal ; in other words, there exists an arithmetic function such that the arithmetic function is multiplicative, or equivalently that is multiplicative and one has the Leibniz rule

More generally, for any , a *-derived multiplicative function* is an arithmetic function that can be expressed as the formal derivative at the origin of a family of multiplicative functions parameterised by formal parameters . Equivalently, is the coefficient of a multiplicative function in the extension of by nilpotent infinitesimals .

We define the notion of a -derived completely multiplicative function similarly by replacing “multiplicative” with “completely multiplicative” in the above discussion.

There are Leibniz rules similar to (2) but they are harder to state; for instance, a doubly derived multiplicative function comes with singly derived multiplicative functions and a multiplicative function such that

for all coprime .

One can then check that the von Mangoldt function is a derived multiplicative function, because is multiplicative in the ring with one infinitesimal . Similarly, the logarithm function is derived completely multiplicative because is completely multiplicative in . More generally, any additive function is derived multiplicative because it is the top order coefficient of .
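The claims in the preceding paragraph can be tested numerically with dual numbers $a + b\varepsilon$, $\varepsilon^2 = 0$, stored as pairs `(a, b)`. The formulas themselves are not displayed above, so the sketch below assumes the natural candidates: that $\delta + \varepsilon \Lambda$ is the multiplicative function witnessing that $\Lambda$ is derived multiplicative, and that $1 + \varepsilon L$ (with $L = \log$) is the completely multiplicative function witnessing that $L$ is derived completely multiplicative. All function names are illustrative.

```python
import math
from math import gcd


def dual_mul(a, b):
    """Multiply a0 + a1*eps by b0 + b1*eps in the ring where eps^2 = 0."""
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])


def dual_close(a, b, tol=1e-9):
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol


def von_mangoldt(n):
    """Lambda(n) = log p if n is a power p^k (k >= 1) of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def delta_plus_eps_lambda(n):
    """Assumed witness for Lambda: delta(n) + eps * Lambda(n)."""
    return (1.0 if n == 1 else 0.0, von_mangoldt(n))


def one_plus_eps_log(n):
    """Assumed witness for L: 1 + eps * log(n)."""
    return (1.0, math.log(n))


def is_multiplicative(F, N, completely=False):
    """Brute-force check of F(1) = 1 and F(nm) = F(n) F(m) for all nm <= N."""
    if not dual_close(F(1), (1.0, 0.0)):
        return False
    for n in range(2, N + 1):
        for m in range(2, N // n + 1):
            if (completely or gcd(n, m) == 1) and not dual_close(F(n * m), dual_mul(F(n), F(m))):
                return False
    return True


if __name__ == "__main__":
    print(is_multiplicative(delta_plus_eps_lambda, 200))
    print(is_multiplicative(delta_plus_eps_lambda, 200, completely=True))
    print(is_multiplicative(one_plus_eps_log, 200, completely=True))
```

The middle check fails at $n = m = 2$: the product of $\varepsilon \log 2$ with itself vanishes, while the value at $4$ is $\varepsilon \log 2$. This matches the text's description of $\Lambda$ as derived multiplicative rather than derived completely multiplicative.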

Remark 1 One can also phrase these concepts in terms of the formal Dirichlet series associated to an arithmetic function . A function is multiplicative if admits a (formal) Euler product; is derived multiplicative if is the (formal) first logarithmic derivative of an Euler product with respect to some parameter (not necessarily , although this is certainly an option); and so forth.

Using the definition of a -derived multiplicative function as the top order coefficient of a multiplicative function of a ring with infinitesimals, it is easy to see that the product or convolution of a -derived multiplicative function and a -derived multiplicative function is necessarily a -derived multiplicative function (again taking values in ). Thus, for instance, the higher-order von Mangoldt functions are -derived multiplicative functions, because is a -derived completely multiplicative function. More explicitly, is the top order coefficient of the completely multiplicative function , and is the top order coefficient of the multiplicative function , with both functions taking values in the ring of complex numbers with infinitesimals attached.

It then turns out that most (if not all) of the basic identities used by analytic number theorists concerning derived multiplicative functions, can in fact be viewed as coefficients of identities involving purely multiplicative functions, with the latter identities being provable primarily from multiplicative identities, such as (1). This phenomenon is analogous to the one in linear algebra discussed in this previous blog post, in which many of the trace identities used there are derivatives of determinant identities. For instance, the Leibniz rule

for any arithmetic functions can be viewed as the top order term in

in the ring with one infinitesimal , and then we see that the Leibniz rule is a special case (or a derivative) of (1), since is completely multiplicative. Similarly, the formulae

are top order terms of

and the variant formula is the top order term of

which can then be deduced from the previous identities by noting that the completely multiplicative function inverts multiplicatively, and also noting that annihilates . The Selberg symmetry formula

which plays a key role in the Erdös-Selberg elementary proof of the prime number theorem (as discussed in this previous blog post), is the top order term of the identity

involving the multiplicative functions , , , with two infinitesimals , and this identity can be proven while staying purely within the realm of multiplicative functions, by using the identities

and (1). Similarly for higher identities such as

which arise from expanding out using (1) and the above identities; we leave this as an exercise to the interested reader.
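As a numerical sanity check on the identities above, here is a minimal sketch that tests the Leibniz rule, the relations between the logarithm, Möbius and von Mangoldt functions, and the Selberg symmetry formula on the range $1 \leq n \leq 300$. The precise shapes assumed are the standard ones, $\Lambda * 1 = L$, $\mu * L = \Lambda$, $L(f*g) = (Lf)*g + f*(Lg)$ and $\Lambda L + \Lambda * \Lambda = \mu * L^2$; the helper names and the test function `g` are illustrative choices, not anything taken from the text.

```python
import math

def dirichlet_convolve(f, g, N):
    """(f*g)(n) = sum over d | n of f(d) g(n/d), for 1 <= n <= N (index 0 unused)."""
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if f[d] != 0:
            for m in range(1, N // d + 1):
                h[d * m] += f[d] * g[m]
    return h

def mobius_table(N):
    """mu(n), computed from the relation sum_{d | n} mu(d) = 0 for n > 1."""
    mu = [0] * (N + 1)
    mu[1] = 1
    for d in range(1, N + 1):
        for m in range(2 * d, N + 1, d):
            mu[m] -= mu[d]
    return mu

def von_mangoldt_table(N):
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    lam = [0.0] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            q = p
            while q <= N:
                lam[q] = math.log(p)
                q *= p
    return lam

def pointwise(u, v):
    return [a * b for a, b in zip(u, v)]

def max_gap(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

N = 300
one = [0] + [1] * N
L = [0.0] + [math.log(n) for n in range(1, N + 1)]
mu = mobius_table(N)
lam = von_mangoldt_table(N)

# Lambda * 1 = L and mu * L = Lambda
gap_log = max_gap(dirichlet_convolve(lam, one, N), L)
gap_mangoldt = max_gap(dirichlet_convolve(mu, L, N), lam)

# Leibniz rule L(f*g) = (Lf)*g + f*(Lg), tested with f = mu and an arbitrary g
g = [0] + [(n * n) % 7 - 3 for n in range(1, N + 1)]
leibniz_lhs = pointwise(L, dirichlet_convolve(mu, g, N))
leibniz_rhs = [a + b for a, b in zip(dirichlet_convolve(pointwise(L, mu), g, N),
                                     dirichlet_convolve(mu, pointwise(L, g), N))]
gap_leibniz = max_gap(leibniz_lhs, leibniz_rhs)

# Selberg symmetry: Lambda L + Lambda * Lambda = mu * L^2
selberg_lhs = [a + b for a, b in zip(pointwise(lam, L),
                                     dirichlet_convolve(lam, lam, N))]
gap_selberg = max_gap(selberg_lhs, dirichlet_convolve(mu, pointwise(L, L), N))

print(gap_log, gap_mangoldt, gap_leibniz, gap_selberg)
```

All four gaps should be at the level of floating point rounding error. Of course this only tests the identities and does not illustrate the infinitesimal calculus used to derive them.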

An analogous phenomenon arises for identities that are not purely multiplicative in nature due to the presence of truncations, such as the Vaughan identity

for any , where is the restriction of a multiplicative function to the natural numbers greater than , and similarly for , , . In this particular case, (4) is the top order coefficient of the identity

which can be easily derived from the identities and . Similarly for the Heath-Brown identity

valid for natural numbers up to , where and are arbitrary parameters and denotes the -fold convolution of , and discussed in this previous blog post; this is the top order coefficient of

and arises by first observing that

vanishes up to , and then expanding the left-hand side using the binomial formula and the identity .

One consequence of this phenomenon is that identities involving derived multiplicative functions tend to have a dimensional consistency property: all terms in the identity have the same order of derivation in them. For instance, all the terms in the Selberg symmetry formula (3) are doubly derived functions, all the terms in the Vaughan identity (4) or the Heath-Brown identity (5) are singly derived functions, and so forth. One can then use dimensional analysis to help ensure that one has written down a key identity involving such functions correctly, much as is done in physics.

In addition to the dimensional analysis arising from the order of derivation, there is another dimensional analysis coming from the value of multiplicative functions at primes (which is more or less equivalent to the order of pole of the Dirichlet series at ). Let us say that a multiplicative function has a *pole of order * if one has on the average for primes , where we will be a bit vague as to what “on the average” means as it usually does not matter in applications. Thus for instance, or has a pole of order (a simple pole), or has a pole of order (i.e. neither a zero nor a pole), Dirichlet characters also have a pole of order (although this is slightly nontrivial, requiring Dirichlet’s theorem), has a pole of order (a simple zero), has a pole of order , and so forth. Note that the convolution of a multiplicative function with a pole of order with a multiplicative function with a pole of order will be a multiplicative function with a pole of order . If there is no oscillation in the primes (e.g. if for *all* primes , rather than on the average), it is also true that the product of a multiplicative function with a pole of order with a multiplicative function with a pole of order will be a multiplicative function with a pole of order . The situation is significantly different though in the presence of oscillation; for instance, if is a quadratic character then has a pole of order even though has a pole of order .

A -derived multiplicative function will then be said to have an *underived pole of order * if it is the top order coefficient of a multiplicative function with a pole of order ; in terms of Dirichlet series, this roughly means that the Dirichlet series has a pole of order at . For instance, the singly derived multiplicative function has an underived pole of order , because it is the top order coefficient of , which has a pole of order ; similarly has an underived pole of order , being the top order coefficient of . More generally, and have underived poles of order and respectively for any .

By taking top order coefficients, we then see that the convolution of a -derived multiplicative function with underived pole of order and a -derived multiplicative function with underived pole of order is a -derived multiplicative function with underived pole of order . If there is no oscillation in the primes, the product of these functions will similarly have an underived pole of order ; for instance, has an underived pole of order . We then have the dimensional consistency property that in any of the standard identities involving derived multiplicative functions, all terms not only have the same derived order, but also the same underived pole order. For instance, in (3), (4), (5) all terms have underived pole order (with any Möbius function terms being counterbalanced by a matching term of or ). This gives a second way to use dimensional analysis as a consistency check. For instance, any identity that involves a linear combination of and is suspect because the underived pole orders do not match (being and respectively), even though the derived orders match (both are ).

One caveat, though: this latter dimensional consistency breaks down for identities that involve infinitely many terms, such as Linnik’s identity

In this case, one can still rewrite things in terms of multiplicative functions as

so the former dimensional consistency is still maintained.
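Linnik's identity can also be tested numerically. The sketch below assumes the identity in its usual form, $\frac{\Lambda(n)}{\log n} = \sum_{k \geq 1} \frac{(-1)^{k+1}}{k} t_k(n)$, where $t_k(n)$ (a name introduced here only for illustration) counts the ordered factorisations of $n$ into $k$ factors that are each at least $2$; the left-hand side equals $1/j$ when $n = p^j$ is a prime power and vanishes otherwise. Exact rational arithmetic is used, so the comparison is an equality rather than an approximation.

```python
from fractions import Fraction

def linnik_series(N):
    """sum_{k>=1} (-1)^(k+1)/k * t_k(n) for 0 <= n <= N, where t_k(n) counts
    ordered factorisations of n into k factors, each at least 2."""
    total = [Fraction(0)] * (N + 1)
    t_k = [0, 0] + [1] * (N - 1)           # t_1(n) = 1 for n >= 2
    k = 1
    while any(t_k):                         # t_k vanishes on [1, N] once 2^k > N
        sign = 1 if k % 2 == 1 else -1
        for n in range(2, N + 1):
            if t_k[n]:
                total[n] += Fraction(sign * t_k[n], k)
        nxt = [0] * (N + 1)
        for d in range(2, N + 1):           # t_{k+1}(d m) += t_k(d) for each m >= 2
            if t_k[d]:
                for m in range(2, N // d + 1):
                    nxt[d * m] += t_k[d]
        t_k, k = nxt, k + 1
    return total

def prime_power_weights(N):
    """Lambda(n)/log n: equal to 1/j if n = p^j for a prime p, and 0 otherwise."""
    w = [Fraction(0)] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:
            for m in range(2 * p, N + 1, p):
                composite[m] = True
            q, j = p, 1
            while q <= N:
                w[q] = Fraction(1, j)
                q, j = q * p, j + 1
    return w

N = 500
series = linnik_series(N)
weights = prime_power_weights(N)
print(series == weights)
```

Note that for each fixed $n$ only the terms with $2^k \leq n$ contribute, so the infinite sum is really a finite one once $n$ is fixed; the failure of the second dimensional consistency comes from the number of terms being unbounded as $n$ varies.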

I thank Andrew Granville, Kannan Soundararajan, and Emmanuel Kowalski for helpful conversations on these topics.

One of the basic problems in analytic number theory is to obtain bounds and asymptotics for sums of the form

$$\sum_{n \leq x} f(n)$$

in the limit $x \to \infty$, where $n$ ranges over natural numbers less than $x$, and $f: {\bf N} \to {\bf C}$ is some arithmetic function of number-theoretic interest. (It is also often convenient to replace this sharply truncated sum with a smoother sum such as $\sum_n f(n) \psi(\frac{n}{x})$ for a smooth cutoff $\psi$, but we will not discuss this technicality here.) For instance, the prime number theorem is equivalent to the assertion

$$\sum_{n \leq x} \Lambda(n) = x + o(x)$$

where $\Lambda$ is the von Mangoldt function, while the Riemann hypothesis is equivalent to the stronger assertion

$$\sum_{n \leq x} \Lambda(n) = x + O(x^{1/2+o(1)}).$$

It is thus of interest to develop techniques to estimate such sums $\sum_{n \leq x} f(n)$. Of course, the difficulty of this task depends on how “nice” the function $f$ is. The functions that come up in number theory lie on a broad spectrum of “niceness”, with some particularly nice functions being quite easy to sum, and some being insanely difficult.

At the easiest end of the spectrum are those functions $f$ that exhibit some sort of regularity or “smoothness”. Examples of smoothness include “Archimedean” smoothness, in which $f$ is the restriction of some smooth function $f: {\bf R} \to {\bf C}$ from the reals to the natural numbers, and the derivatives of $f$ are well controlled. A typical example is

$$\sum_{n \leq x} \log n.$$

One can already get quite good bounds on this quantity by comparison with the integral $\int_1^x \log t\ dt$, namely

$$\sum_{n \leq x} \log n = x \log x - x + O(\log x),$$

with sharper bounds available by using tools such as the Euler-Maclaurin formula (see this blog post). Exponentiating such asymptotics, incidentally, leads to one of the standard proofs of Stirling’s formula (as discussed in this blog post).
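To see the integral comparison in action, the following sketch computes $\sum_{n \leq x} \log n$ directly and compares it with $x \log x - x$, which is the value of $\int_1^x \log t\ dt$ up to an additive constant. It assumes the estimate in its standard form, with an error of size $O(\log x)$; the function names are illustrative.

```python
import math

def log_sum(x):
    """Sum of log n over the natural numbers n <= x."""
    return sum(math.log(n) for n in range(1, x + 1))

def integral_comparison(x):
    """Main term x log x - x coming from the integral of log t."""
    return x * math.log(x) - x

for x in (10, 1000, 100000):
    err = log_sum(x) - integral_comparison(x)
    # Second printed column: the error; third: the error in units of log x.
    print(x, round(err, 4), round(err / math.log(x), 4))
```

The last column should stay bounded, and in fact it drifts towards $1/2$, which is the $\frac{1}{2} \log x$ term that the Euler-Maclaurin formula (or Stirling's formula) would predict.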

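One can also consider “non-Archimedean” notions of smoothness, such as periodicity relative to a small period $q$. Indeed, if $f$ is periodic with period $q$ (and is thus essentially a function on the cyclic group ${\bf Z}/q{\bf Z}$), then one has the easy bound

$$\sum_{n \leq x} f(n) = \frac{x}{q} \sum_{n \in {\bf Z}/q{\bf Z}} f(n) + O\Big( \sum_{n \in {\bf Z}/q{\bf Z}} |f(n)| \Big).$$

In particular, we have the fundamental estimate

$$\sum_{n \leq x: q | n} 1 = \frac{x}{q} + O(1). \ \ \ \ \ (1)$$

This is a good estimate when $q$ is much smaller than $x$, but as $q$ approaches $x$ in magnitude, the error term $O(1)$ begins to overwhelm the main term $\frac{x}{q}$, and one needs much more delicate information on the fractional part of $\frac{x}{q}$ in order to obtain good estimates at this point.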

One can also consider functions which combine “Archimedean” and “non-Archimedean” smoothness into an “adelic” smoothness. We will not define this term precisely here (though the concept of a Schwartz-Bruhat function is one way to capture this sort of concept), but a typical example might be

where is periodic with some small period . By using techniques such as summation by parts, one can estimate such sums using the techniques used to estimate sums of periodic functions or functions with (Archimedean) smoothness.

Another class of functions that is reasonably well controlled consists of the multiplicative functions, in which $f(nm) = f(n) f(m)$ whenever $n, m$ are coprime. Here, one can use the powerful techniques of multiplicative number theory, for instance by working with the Dirichlet series

$$\sum_{n=1}^\infty \frac{f(n)}{n^s}$$

which are clearly related to the partial sums $\sum_{n \leq x} f(n)$ (essentially via the Mellin transform, a cousin of the Fourier and Laplace transforms); for this post we ignore the (important) issue of how to make sense of this series when it is not absolutely convergent (but see this previous blog post for more discussion). A primary reason that this technique is effective is that the Dirichlet series of a multiplicative function factorises as an Euler product

$$\sum_{n=1}^\infty \frac{f(n)}{n^s} = \prod_p \Big( \sum_{j=0}^\infty \frac{f(p^j)}{p^{js}} \Big).$$

One also obtains similar types of representations for functions that are not quite multiplicative, but are closely related to multiplicative functions, such as the von Mangoldt function $\Lambda$ (whose Dirichlet series is not given by an Euler product, but instead by the logarithmic derivative of an Euler product).
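As a small illustration of the Euler product factorisation, here is a sketch comparing a truncated Dirichlet series with a truncated Euler product in the simplest absolutely convergent case, namely the constant function $1$ at $s = 2$, where both sides converge to $\zeta(2) = \pi^2/6$. The truncation points and helper names are arbitrary choices made for this example.

```python
import math

def primes_up_to(P):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (P + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(P) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, P + 1, p)))
    return [p for p in range(2, P + 1) if sieve[p]]

def truncated_dirichlet_series(f, s, N):
    """Sum of f(n) / n^s over n <= N."""
    return sum(f(n) / n ** s for n in range(1, N + 1))

def truncated_euler_product(local_factor, s, P):
    """Product of the local factors over primes p <= P."""
    prod = 1.0
    for p in primes_up_to(P):
        prod *= local_factor(p, s)
    return prod

def zeta_factor(p, s):
    """Local factor for f = 1: sum_j p^(-js) = 1 / (1 - p^(-s))."""
    return 1.0 / (1.0 - p ** (-s))

def mobius_factor(p, s):
    """Local factor for f = mu: only j = 0, 1 contribute, giving 1 - p^(-s)."""
    return 1.0 - p ** (-s)

series = truncated_dirichlet_series(lambda n: 1.0, 2.0, 10**5)
product = truncated_euler_product(zeta_factor, 2.0, 10**5)
inverse = truncated_euler_product(mobius_factor, 2.0, 10**5)
print(series, product, math.pi ** 2 / 6, product * inverse)
```

The Möbius factors are the reciprocals of the zeta factors, which is the Euler product form of the fact that $\mu$ inverts the constant function $1$ under Dirichlet convolution.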

Moving another notch along the spectrum between well-controlled and ill-controlled functions, one can consider functions $f$ that are *divisor sums* such as

$$f(n) = \sum_{d \leq R; d | n} g(d)$$

for some other arithmetic function $g$, and some *level* $R$. This is a linear combination of periodic functions $1_{d|n}$ and is thus *technically* periodic in $n$ (with period equal to the least common multiple of all the numbers from $1$ to $R$), but in practice this period is far too large to be useful (except for extremely small levels $R$, e.g. $R = O(\log x)$). Nevertheless, we can still control the sum $\sum_{n \leq x} f(n)$ simply by rearranging the summation:

$$\sum_{n \leq x} f(n) = \sum_{d \leq R} g(d) \sum_{n \leq x: d | n} 1$$

and thus by (1) one can bound this by the sum of a main term $x \sum_{d \leq R} \frac{g(d)}{d}$ and an error term $O( \sum_{d \leq R} |g(d)| )$. As long as the level $R$ is significantly less than $x$, one may expect the main term to dominate, and one can often estimate this term by a variety of techniques (for instance, if $g$ is multiplicative, then multiplicative number theory techniques are quite effective, as mentioned previously). Similarly for other slight variants of divisor sums, such as expressions of the form

or expressions of the form

where each is periodic with period .
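The rearrangement of summation for divisor sums is easy to test numerically. The sketch below takes a divisor sum of level $R$ with some arbitrary weights `g` (chosen purely for illustration), checks that swapping the order of summation gives exactly the same value, and checks that the result differs from the main term $x \sum_{d \leq R} g(d)/d$ by at most $\sum_{d \leq R} |g(d)|$, as one would expect from the estimate for the number of multiples of $d$ up to $x$.

```python
def direct_sum(g, R, x):
    """Sum over n <= x of f(n), where f(n) is the sum of g(d) over d | n with d <= R."""
    return sum(g(d)
               for n in range(1, x + 1)
               for d in range(1, min(R, n) + 1)
               if n % d == 0)

def rearranged_sum(g, R, x):
    """Same quantity with the sums swapped: each d <= R divides floor(x/d) values of n <= x."""
    return sum(g(d) * (x // d) for d in range(1, R + 1))

def main_and_error(g, R, x):
    """Main term x * sum g(d)/d, and the bound sum |g(d)| for the error."""
    main = x * sum(g(d) / d for d in range(1, R + 1))
    error_bound = sum(abs(g(d)) for d in range(1, R + 1))
    return main, error_bound

def g(d):
    """Arbitrary signed test weights (an assumption of this example only)."""
    return (-1) ** d * (d % 5)

x, R = 2000, 40
exact = rearranged_sum(g, R, x)
main, error_bound = main_and_error(g, R, x)
print(direct_sum(g, R, x), exact, main, error_bound)
```

With $R = 40$ and $x = 2000$ the error bound is comfortably smaller than $x$; raising $R$ towards $x$ makes the bound $\sum_{d \leq R} |g(d)|$ grow until it is no longer obviously negligible, which is the phenomenon discussed next.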

One of the simplest examples of this comes when estimating the divisor function

$$\tau(n) := \sum_{d | n} 1,$$

which counts the number of divisors up to $n$. This is a multiplicative function, and is therefore most efficiently estimated using the techniques of multiplicative number theory; but for reasons that will become clearer later, let us “forget” the multiplicative structure and estimate the above sum by more elementary methods. By applying the preceding method, we see that

$$\sum_{n \leq x} \tau(n) = \sum_{d \leq x} \Big( \frac{x}{d} + O(1) \Big) = x \log x + O(x). \ \ \ \ \ (2)$$

Here, we are (barely) able to keep the error term smaller than the main term; this is right at the edge of the divisor sum method, because the level $R$ in this case is equal to $x$. Unfortunately, at this high choice of level, it is not always possible to keep the error term under control like this. For instance, if one wishes to use the standard divisor sum representation

where is the Möbius function, to compute , then one ends up looking at

From Dirichlet series methods, it is not difficult to establish the identities

and

This suggests (but does not quite prove) that one has

in the sense of conditionally convergent series. Assuming one can justify this (which, ultimately, requires one to exclude zeroes of the Riemann zeta function on the line $\mathrm{Re}(s) = 1$, as discussed in this previous post), one is eventually left with the estimate $\sum_{n \leq x} \Lambda(n) = x + O(x)$, which is useless as a lower bound (and recovers only the classical Chebyshev estimate $\sum_{n \leq x} \Lambda(n) \ll x$ as the upper bound). The inefficiency here when compared to the situation with the divisor function $\tau$ can be attributed to the signed nature of the Möbius function $\mu$, which causes some cancellation in the divisor sum expansion that needs to be compensated for with improved estimates.

However, there are a number of tricks available to reduce the level of divisor sums. The simplest comes from exploiting the change of variables $d \mapsto \frac{n}{d}$, which can in principle reduce the level by a square root. For instance, when computing the divisor function $\tau(n) = \sum_{d|n} 1$, one can observe using this change of variables that every divisor of $n$ above $\sqrt{n}$ is paired with one below $\sqrt{n}$, and so we have

$$\tau(n) = 2 \sum_{d \leq \sqrt{n}: d | n} 1 \ \ \ \ \ (5)$$

except when $n$ is a perfect square, in which case one must subtract one from the right-hand side. Using this reduced-level divisor sum representation, one can obtain an improvement to (2), namely

$$\sum_{n \leq x} \tau(n) = x \log x + (2\gamma - 1) x + O(\sqrt{x}),$$

where $\gamma$ is Euler's constant. This type of argument is also known as the Dirichlet hyperbola method. A variant of this argument can also deduce the prime number theorem from (3), (4) (and with some additional effort, one can even drop the use of (4)); this is discussed at this previous blog post.
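Here is a short sketch of the hyperbola method for the divisor function. It counts lattice points $(d,m)$ with $dm \leq x$ in two ways: naively, with $x$ terms, and via the square root trick, with only about $\sqrt{x}$ terms. It then compares the result against the classical main term $x \log x + (2\gamma - 1) x$, which is assumed here to be the asymptotic being referred to; the numerical value of Euler's constant $\gamma$ is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329

def tau_sum_naive(x):
    """Sum of tau(n) for n <= x, counting for each d the multiples of d up to x."""
    return sum(x // d for d in range(1, x + 1))

def tau_sum_hyperbola(x):
    """Same sum using only d <= sqrt(x): pairs (d, m) with d m <= x have d <= sqrt(x)
    or m <= sqrt(x), and the r^2 pairs with both at most r = floor(sqrt(x)) are counted twice."""
    r = math.isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r

def main_term(x):
    return x * math.log(x) + (2 * EULER_GAMMA - 1) * x

for x in (10, 1000, 10**5):
    exact = tau_sum_hyperbola(x)
    # Columns: x, exact sum, discrepancy from the main term, sqrt(x) for comparison.
    print(x, exact, round(exact - main_term(x), 2), round(math.sqrt(x), 2))
```

In practice the discrepancy is considerably smaller than $\sqrt{x}$, reflecting the fact that the $O(\sqrt{x})$ error term is not sharp (improving it is the Dirichlet divisor problem).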

Using this square root trick, one can now also control divisor sums such as

(Note that has no multiplicativity properties in , and so multiplicative number theory techniques cannot be directly applied here.) The level of the divisor sum here is initially of order , which is too large to be useful; but using the square root trick, we can expand this expression as

which one can rewrite as

The constraint is periodic in with period , so we can write this as

where is the number of solutions in to the equation , and so

The function is multiplicative, and can be easily computed at primes and prime powers using tools such as quadratic reciprocity and Hensel’s lemma. For instance, by Fermat’s two-square theorem, is equal to for and for . From this and standard multiplicative number theory methods (e.g. by obtaining asymptotics on the Dirichlet series ), one eventually obtains the asymptotic

and also

and thus

Similar arguments give asymptotics for $\tau$ on other quadratic polynomials; see for instance this paper of Hooley and these papers by McKee. Note that the irreducibility of the polynomial will be important. If one considers instead a sum involving a reducible polynomial, such as $\sum_{n \leq x} \tau(n^2-1)$, then the analogous quantity $\rho(p)$ becomes significantly larger, leading to a larger growth rate (of order $x \log^2 x$ rather than $x \log x$) for the sum.
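The ingredients of the above argument can be checked by brute force in the model case of the polynomial $n^2+1$. The sketch below (with illustrative function names) computes the number $\rho(d)$ of residues $n$ mod $d$ with $n^2 + 1 = 0 \hbox{ mod } d$, confirms its values at small primes and its multiplicativity on one example, and verifies the square root trick in the form $\sum_{n \leq x} \tau(n^2+1) = 2 \sum_{d \leq x} \# \{ d \leq n \leq x: d | n^2+1 \}$, which uses the fact that $n^2+1$ is never a perfect square for $n \geq 1$, so that its divisors below the square root are exactly the divisors $d \leq n$.

```python
def rho(d):
    """Number of residues n mod d with n^2 + 1 divisible by d."""
    return sum(1 for n in range(d) if (n * n + 1) % d == 0)

def tau(m):
    """Number of divisors of m, by trial division up to sqrt(m)."""
    count, d = 0, 1
    while d * d <= m:
        if m % d == 0:
            count += 1 if d * d == m else 2
        d += 1
    return count

def sum_direct(x):
    """Sum of tau(n^2 + 1) over 1 <= n <= x, computed term by term."""
    return sum(tau(n * n + 1) for n in range(1, x + 1))

def sum_sqrt_trick(x):
    """Same sum, counting only the divisors d <= n of n^2 + 1 and doubling."""
    total = 0
    for d in range(1, x + 1):
        total += sum(1 for n in range(d, x + 1) if (n * n + 1) % d == 0)
    return 2 * total

print([(p, rho(p)) for p in (2, 3, 5, 7, 11, 13, 17)])
print(rho(65), rho(5) * rho(13))
print(sum_direct(200), sum_sqrt_trick(200))
```

After the square root trick the level is $x$ rather than $x^2$, which is what makes the subsequent periodicity argument effective.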

However, the square root trick is insufficient by itself to deal with higher order sums involving the divisor function, such as

the level here is initially of order , and the square root trick only lowers this to about , creating an error term that overwhelms the main term. And indeed, the asymptotic for this sum has not yet been rigorously established (although if one heuristically drops error terms, one can arrive at a reasonable conjecture for this asymptotic), though some results are known if one averages over additional parameters (see e.g. this paper of Greaves, or this paper of Matthiesen).

Nevertheless, there is an ingenious argument of Erdös that allows one to obtain good *upper* and *lower* bounds for these sorts of sums, in particular establishing the asymptotic

for any *fixed* irreducible non-constant polynomial that maps to (with the implied constants depending of course on the choice of ). There is also the related moment bound

for any fixed (not necessarily irreducible) and any fixed , due to van der Corput; this bound is in fact used to dispose of some error terms in the proof of (6). These should be compared with what one can obtain from the divisor bound and the trivial bound , giving the bounds

for any fixed .

The lower bound in (6) is easy, since one can simply lower the level in (5) to obtain the lower bound

for any , and the preceding methods then easily allow one to obtain the lower bound by taking small enough (more precisely, if has degree , one should take equal to or less). The upper bounds in (6) and (7) are more difficult. Ideally, if we could obtain upper bounds of the form

for any fixed $\theta > 0$, then the preceding methods would easily establish both results. Unfortunately, this bound can fail, as illustrated by the following example. Suppose that $n$ is the product of $k$ distinct primes $p_1 \ldots p_k$, each of which is close to $n^{1/k}$. Then $n$ has $2^k$ divisors, with $\binom{k}{j}$ of them close to $n^{j/k}$ for each $0 \leq j \leq k$. One can think of (the logarithms of) these divisors as being distributed according to what is essentially a Bernoulli distribution, thus a randomly selected divisor of $n$ has magnitude about $n^{J/k}$, where $J$ is a random variable which has the same distribution as the number of heads in $k$ independently tossed fair coins. By the law of large numbers, $J$ should concentrate near $k/2$ when $k$ is large, which implies that the majority of the divisors of $n$ will be close to $n^{1/2}$. Sending $k \to \infty$, one can show that the bound (8) fails whenever $\theta < 1/2$.
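The counterexample can be made concrete with a small computation. The sketch below takes $n$ to be the product of $k$ consecutive primes just above $1000$ (an arbitrary choice, made so that the primes are of comparable size) and computes the proportion of the $2^k$ divisors of $n$ that are at most $n^\theta$, here with the illustrative value $\theta = 1/4$. Divisors are handled through their logarithms to avoid large integers.

```python
import math

def next_primes(start, k):
    """The first k primes that are >= start, by trial division."""
    primes, n = [], start
    while len(primes) < k:
        if n > 1 and all(n % q for q in range(2, math.isqrt(n) + 1)):
            primes.append(n)
        n += 1
    return primes

def fraction_of_small_divisors(primes, theta):
    """Proportion of divisors d of n = product of the given distinct primes
    that satisfy d <= n^theta."""
    logs = [0.0]                       # logarithms of the divisors found so far
    for p in primes:
        logs += [v + math.log(p) for v in logs]
    threshold = theta * sum(math.log(p) for p in primes)
    return sum(1 for v in logs if v <= threshold) / len(logs)

fractions = {k: fraction_of_small_divisors(next_primes(1000, k), 0.25)
             for k in (4, 8, 16)}
print(fractions)
```

The proportion decays as $k$ grows, in line with the binomial heuristic: only the divisors with about $k/4$ or fewer prime factors are small enough, and these become exponentially rare among all $2^k$ divisors.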

This however can be fixed in a number of ways. First of all, even when , one can show weaker substitutes for (8). For instance, for any fixed and one can show a bound of the form

for some depending only on . This nice elementary inequality (first observed by Landreau) already gives a quite short proof of van der Corput’s bound (7).

For Erdös’s upper bound (6), though, one cannot afford to lose these additional factors of , and one must argue more carefully. Here, the key observation is that the counterexample discussed earlier – when the natural number is the product of a large number of fairly small primes – is quite atypical; most numbers have at least one large prime factor. For instance, the number of natural numbers less than that contain a prime factor between and is equal to

which, thanks to Mertens’ theorem

for some absolute constant , is comparable to . In a similar spirit, one can show by similarly elementary means that the number of natural numbers less than that are *-smooth*, in the sense that all prime factors are at most , is only about or so. Because of this, one can hope that the bound (8), while not true in full generality, will still be true for *most* natural numbers , with some slightly weaker substitute available (such as (7)) for the exceptional numbers . This turns out to be the case by an elementary but careful argument.
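One can check numerically that a positive proportion of natural numbers have a large prime factor. The sketch below counts the $n \leq x$ having a prime factor exceeding $\sqrt{x}$ in two ways: directly from a table of largest prime factors, and as $\sum_{\sqrt{x} < p \leq x} \lfloor x/p \rfloor$, using the fact that a number up to $x$ can have at most one such prime factor. The function names, and the choice $x = 10^5$, are illustrative.

```python
def largest_prime_factor_table(x):
    """lpf[n] = largest prime factor of n for 2 <= n <= x (lpf[0] = lpf[1] = 0)."""
    lpf = [0] * (x + 1)
    for p in range(2, x + 1):
        if lpf[p] == 0:                 # p has no smaller prime factor, so p is prime
            for m in range(p, x + 1, p):
                lpf[m] = p              # later (larger) primes overwrite earlier ones
    return lpf

def count_directly(x, lpf):
    """Number of n <= x whose largest prime factor exceeds sqrt(x)."""
    return sum(1 for n in range(2, x + 1) if lpf[n] * lpf[n] > x)

def count_via_primes(x, lpf):
    """Sum of floor(x/p) over primes p with sqrt(x) < p <= x."""
    return sum(x // p for p in range(2, x + 1) if lpf[p] == p and p * p > x)

x = 10**5
lpf = largest_prime_factor_table(x)
direct = count_directly(x, lpf)
via_primes = count_via_primes(x, lpf)
print(direct, via_primes, direct / x)
```

By Mertens' theorem the proportion should approach $\log 2$ as $x \to \infty$, though the convergence is slow because of the $O(1)$ errors in each term of the sum over primes.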

The Erdös argument is quite robust; for instance, the more general inequality

for fixed irreducible and , which improves van der Corput’s inequality (7), was shown by Delmer using the same methods. (A slight error in the original paper of Erdös was also corrected in this latter paper.) In a forthcoming revision to my paper on the Erdös-Straus conjecture, Christian Elsholtz and I have also applied this method to obtain bounds such as

which turn out to be enough to obtain the right asymptotics for the number of solutions to the equation .

Below the fold I will provide some more details of the arguments of Landreau and of Erdös.

Given a positive integer $n$, let $d(n)$ denote the number of divisors of n (including 1 and n), thus for instance d(6)=4, and more generally, if n has a prime factorisation

$$n = p_1^{a_1} \ldots p_k^{a_k} \ \ \ \ \ (1)$$

then (by the fundamental theorem of arithmetic)

$$d(n) = (a_1+1) \ldots (a_k+1). \ \ \ \ \ (2)$$

Clearly, $d(n) \leq n$. The *divisor bound* asserts that, as $n$ gets large, one can improve this trivial bound to

$$d(n) \leq C_\varepsilon n^\varepsilon \ \ \ \ \ (3)$$

for any $\varepsilon > 0$, where $C_\varepsilon$ depends only on $\varepsilon$; equivalently, in asymptotic notation one has $d(n) = n^{o(1)}$. In fact one has a more precise bound

$$d(n) \leq n^{O(1/\log \log n)} = \exp\Big( O\Big( \frac{\log n}{\log \log n} \Big) \Big). \ \ \ \ \ (4)$$

The divisor bound is useful in many applications in number theory, harmonic analysis, and even PDE (on periodic domains); it asserts that for any large number n, only a “logarithmically small” set of numbers less than n will actually divide n exactly, even in the worst-case scenario when n is smooth. (The average value of d(n) is much smaller, being about $\log n$ on the average, as can be seen easily from the double counting identity

$$\sum_{n \leq N} d(n) = \# \{ (m,l) \in {\bf N} \times {\bf N}: ml \leq N \} = \sum_{m=1}^N \Big\lfloor \frac{N}{m} \Big\rfloor \sim N \log N,$$

or from the heuristic that a randomly chosen number m less than n has a probability about 1/m of dividing n, and $\sum_{m < n} \frac{1}{m} \sim \log n$. However, (4) is the correct “worst case” bound, as I discuss below.)
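To make the contrast between the average case and the worst case concrete, here is a brief sketch that computes the number of divisors from the prime factorisation, verifies the double counting identity for a moderate value of $N$, and compares the average number of divisors with the maximum over the same range. The function name and the choice $N = 2000$ are illustrative.

```python
import math

def divisor_count(n):
    """Number of divisors of n, as the product of (a_i + 1) over the
    exponents a_i in the prime factorisation of n."""
    count, p = 1, 2
    while p * p <= n:
        a = 0
        while n % p == 0:
            n //= p
            a += 1
        count *= a + 1
        p += 1
    if n > 1:                  # one remaining prime factor, with exponent 1
        count *= 2
    return count

N = 2000
values = [divisor_count(n) for n in range(1, N + 1)]
total = sum(values)
double_count = sum(N // m for m in range(1, N + 1))
print(total, double_count)                       # the two sides of the identity
print(total / N, math.log(N), max(values))       # average, log N, worst case
```

The average is close to $\log N$, while the maximum is noticeably larger and is attained at a smooth number with many small prime factors.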

The divisor bound is elementary to prove (and not particularly difficult), and I was asked about it recently, so I thought I would provide the proof here, as it serves as a case study in how to establish worst-case estimates in elementary multiplicative number theory.

[*Update*, Sep 24: some applications added.]
