Exercise 1 Show that the field of algebraic numbers $\overline{\mathbf{Q}}$ is indeed a field, and that the ring of algebraic integers $\mathcal{A}$ is indeed a ring, and is in fact an integral domain. Also, show that $\mathbf{Z} = \mathcal{A} \cap \mathbf{Q}$, that is to say the ordinary integers are precisely the algebraic integers that are also rational. Because of this, we will sometimes refer to elements of $\mathbf{Z}$ as *rational integers*.

In practice, the field $\overline{\mathbf{Q}}$ is too big to conveniently work with directly, having infinite dimension (as a vector space) over $\mathbf{Q}$. Thus, algebraic number theory generally restricts attention to intermediate fields $\mathbf{Q} \subset F \subset \overline{\mathbf{Q}}$ between $\mathbf{Q}$ and $\overline{\mathbf{Q}}$, which are of finite dimension over $\mathbf{Q}$; that is to say, finite degree extensions of $\mathbf{Q}$. Such fields are known as algebraic number fields, or *number fields* for short. Apart from $\mathbf{Q}$ itself, the simplest examples of such number fields are the quadratic fields, which have dimension exactly two over $\mathbf{Q}$.

Exercise 2 Show that if $\alpha$ is a rational number that is not a perfect square, then the field $\mathbf{Q}(\sqrt{\alpha})$ generated by $\mathbf{Q}$ and either of the square roots of $\alpha$ is a quadratic field. Conversely, show that all quadratic fields arise in this fashion. (Hint: show that every element of a quadratic field is a root of a quadratic polynomial over the rationals.)

The ring of algebraic integers $\mathcal{A}$ is similarly too large to conveniently work with directly, so in algebraic number theory one usually works with the rings $\mathcal{O}_F := \mathcal{A} \cap F$ of algebraic integers inside a given number field $F$. One can (and does) study this situation in great generality, but for the purposes of this post we shall restrict attention to a simple but illustrative special case, namely the quadratic fields with a certain type of negative discriminant.

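Exercise 3 Let $d$ be a square-free natural number with $d = 1\ (4)$ or $d = 2\ (4)$. Show that the ring $\mathcal{O} = \mathcal{O}_{\mathbf{Q}(\sqrt{-d})}$ of algebraic integers in $\mathbf{Q}(\sqrt{-d})$ is given by

$$\mathcal{O} = \mathbf{Z}[\sqrt{-d}] = \{ a + b \sqrt{-d}: a,b \in \mathbf{Z} \}.$$

If instead $d$ is square-free with $d = 3\ (4)$, show that the ring $\mathcal{O}$ is instead given by

$$\mathcal{O} = \mathbf{Z}\left[\tfrac{1+\sqrt{-d}}{2}\right] = \left\{ a + b \tfrac{1+\sqrt{-d}}{2}: a,b \in \mathbf{Z} \right\}.$$

What happens if $d$ is not square-free, or negative?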

Remark 1 In the case $d = 3\ (4)$, it may naively appear more natural to work with the ring $\mathbf{Z}[\sqrt{-d}]$, which is an index two subring of $\mathcal{O}$. However, because this ring only captures some of the algebraic integers in $\mathbf{Q}(\sqrt{-d})$ rather than all of them, the algebraic properties of these rings are somewhat worse than those of $\mathcal{O}$ (in particular, they generally fail to be Dedekind domains) and so are not convenient to work with in algebraic number theory.

We refer to fields of the form $\mathbf{Q}(\sqrt{-d})$ for natural square-free numbers $d$ as *quadratic fields of negative discriminant*, and similarly refer to $\mathcal{O}_{\mathbf{Q}(\sqrt{-d})}$ as a ring of quadratic integers of negative discriminant. Quadratic fields and quadratic integers of positive discriminant are just as important to analytic number theory as their negative discriminant counterparts, but we will restrict attention to the latter here for simplicity of discussion.

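Thus, for instance, when $d=1$, the ring of integers in $\mathbf{Q}(\sqrt{-1})$ is the ring of Gaussian integers

$$\mathbf{Z}[\sqrt{-1}] = \{ x + y \sqrt{-1}: x,y \in \mathbf{Z} \}$$

and when $d=3$, the ring of integers in $\mathbf{Q}(\sqrt{-3})$ is the ring of Eisenstein integers

$$\mathbf{Z}[\omega] := \{ x + y \omega: x,y \in \mathbf{Z} \}$$

where $\omega := e^{2\pi i/3}$ is a cube root of unity.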

As these examples illustrate, the additive structure of a ring $\mathcal{O} = \mathcal{O}_{\mathbf{Q}(\sqrt{-d})}$ of quadratic integers is that of a two-dimensional lattice in $\mathbf{C}$, which is isomorphic as an additive group to $\mathbf{Z}^2$. Thus, from an additive viewpoint, one can view quadratic integers as “two-dimensional” analogues of rational integers. From a *multiplicative* viewpoint, however, the quadratic integers (and more generally, integers in a number field) behave very similarly to the rational integers (as opposed to being some sort of “higher-dimensional” version of such integers). Indeed, a large part of basic algebraic number theory is devoted to treating the multiplicative theory of integers in number fields in a unified fashion that naturally generalises the classical multiplicative theory of the rational integers.

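For instance, every rational integer $n \in \mathbf{Z}$ has an absolute value $|n| \in \mathbf{N} \cup \{0\}$, with the multiplicativity property $|nm| = |n| |m|$ for $n,m \in \mathbf{Z}$, and the positivity property $|n| > 0$ for all $n \neq 0$. Among other things, the absolute value detects units: $|n| = 1$ if and only if $n$ is a unit in $\mathbf{Z}$ (that is to say, it is multiplicatively invertible in $\mathbf{Z}$). Similarly, in any ring of quadratic integers $\mathcal{O} = \mathcal{O}_{\mathbf{Q}(\sqrt{-d})}$ with negative discriminant, we can assign a norm $N(n) \in \mathbf{N} \cup \{0\}$ to any quadratic integer $n \in \mathcal{O}$ by the formula

$$N(n) = n \overline{n}$$

where $\overline{n}$ is the complex conjugate of $n$. (When working with other number fields than quadratic fields of negative discriminant, one instead defines $N(n)$ to be the product of all the Galois conjugates of $n$.) Thus for instance, when $d = 1,2\ (4)$ one has

$$N(x + y \sqrt{-d}) = x^2 + d y^2 \qquad (1)$$

and when $d = 3\ (4)$ one has

$$N\left(x + y \tfrac{1+\sqrt{-d}}{2}\right) = x^2 + xy + \tfrac{d+1}{4} y^2. \qquad (2)$$

Analogously to the rational integers, we have the multiplicativity property $N(nm) = N(n) N(m)$ for $n,m \in \mathcal{O}$ and the positivity property $N(n) > 0$ for $n \neq 0$, and the units in $\mathcal{O}$ are precisely the elements of norm one.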

Exercise 4 Establish the three claims of the previous paragraph. Conclude that the units (invertible elements) of $\mathcal{O}$ consist of the four elements $\pm 1, \pm i$ if $d=1$, the six elements $\pm 1, \pm \omega, \pm \omega^2$ if $d=3$, and the two elements $\pm 1$ if $d \neq 1,3$.
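The norm forms and the unit counts in Exercise 4 are easy to experiment with numerically. The following Python sketch is an illustration, not a proof. It assumes a pair `(x, y)` stands for $x + y\theta$, where $\theta = \sqrt{-d}$ when $d = 1,2\ (4)$ and $\theta = \frac{1+\sqrt{-d}}{2}$ when $d = 3\ (4)$; the helper names `norm`, `multiply` and `units` are made up for this example.

```python
from itertools import product

def norm(x, y, d):
    """Norm of x + y*theta in the ring of integers of Q(sqrt(-d))."""
    if d % 4 == 3:
        # theta = (1 + sqrt(-d)) / 2
        return x * x + x * y + (d + 1) // 4 * y * y
    # theta = sqrt(-d)
    return x * x + d * y * y

def multiply(u, v, d):
    """Product of two quadratic integers given in (x, y) coordinates."""
    (a, b), (c, e) = u, v
    if d % 4 == 3:
        k = (d + 1) // 4  # theta^2 = theta - k
        return (a * c - k * b * e, a * e + b * c + b * e)
    return (a * c - d * b * e, a * e + b * c)  # theta^2 = -d

def units(d, box=2):
    """Elements of norm one; a small search box suffices since the form is definite."""
    return sorted((x, y) for x, y in product(range(-box, box + 1), repeat=2)
                  if norm(x, y, d) == 1)

if __name__ == "__main__":
    for d in (1, 2, 3, 5, 7):
        print(d, units(d))
```

Running it lists four units for $d=1$, six for $d=3$, and two in the other cases tried.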

For the rational integers, we of course have the fundamental theorem of arithmetic, which asserts that every non-zero rational integer can be uniquely factored (up to permutation and units) as the product of irreducible integers, that is to say non-zero, non-unit integers that cannot be factored into the product of integers of strictly smaller norm. As it turns out, the same claim is true for a few additional rings of quadratic integers, such as the Gaussian integers and Eisenstein integers, but fails in general; for instance, in the ring $\mathbf{Z}[\sqrt{-5}]$, we have the famous counterexample

$$6 = 2 \times 3 = (1+\sqrt{-5})(1-\sqrt{-5})$$

that decomposes $6$ non-uniquely into the product of irreducibles in $\mathbf{Z}[\sqrt{-5}]$. Nevertheless, it is an important fact that the fundamental theorem of arithmetic can be salvaged if one uses an “idealised” notion of a number in a ring of integers $\mathcal{O}$, now known in modern language as an ideal of that ring. For instance, in $\mathbf{Z}[\sqrt{-5}]$, the principal ideal $(6)$ turns out to uniquely factor into the product of (non-principal) ideals $(2) + (1+\sqrt{-5})$, $(2) + (1-\sqrt{-5})$, $(3) + (1+\sqrt{-5})$, $(3) + (1-\sqrt{-5})$; see Exercise 15. We will review the basic theory of ideals in number fields (focusing primarily on quadratic fields of negative discriminant) below the fold.
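The classical counterexample in $\mathbf{Z}[\sqrt{-5}]$ can be verified by hand, or with a few lines of Python. In this sketch a pair `(x, y)` stands for $x + y\sqrt{-5}$ and the function names are illustrative. The irreducibility check relies on the multiplicativity of the norm: the factors $2$, $3$, $1 \pm \sqrt{-5}$ have norms $4$, $9$, $6$, so a proper factor of any of them would need norm $2$ or $3$.

```python
def norm(x, y):
    """Norm of x + y*sqrt(-5)."""
    return x * x + 5 * y * y

def multiply(u, v):
    """Product in Z[sqrt(-5)], using sqrt(-5)^2 = -5."""
    (a, b), (c, e) = u, v
    return (a * c - 5 * b * e, a * e + b * c)

def has_element_of_norm(n, box=5):
    """Search a small box for elements of the given norm (enough for small n)."""
    return any(norm(x, y) == n
               for x in range(-box, box + 1)
               for y in range(-box, box + 1))

if __name__ == "__main__":
    print(multiply((2, 0), (3, 0)))      # 2 * 3
    print(multiply((1, 1), (1, -1)))     # (1 + sqrt(-5)) (1 - sqrt(-5))
    print(has_element_of_norm(2), has_element_of_norm(3))
```

Since no element has norm $2$ or $3$, all four factors are irreducible, and the two factorisations of $6$ are genuinely different.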

The norm forms (1), (2) can be viewed as examples of positive definite quadratic forms $Q: \mathbf{Z}^2 \to \mathbf{Z}$ over the integers, by which we mean a polynomial of the form

$$Q(x,y) = ax^2 + bxy + cy^2$$

for some integer coefficients $a,b,c$. One can declare two quadratic forms $Q, Q'$ to be *equivalent* if one can transform one to the other by an invertible linear transformation $T: \mathbf{Z}^2 \to \mathbf{Z}^2$, so that $Q' = Q \circ T$. For example, the quadratic forms $(x,y) \mapsto x^2 + y^2$ and $(x,y) \mapsto 2x^2 + 2xy + y^2$ are equivalent, as can be seen by using the invertible linear transformation $(x,y) \mapsto (x, x+y)$. Such equivalences correspond to the different choices of basis available when expressing a ring such as $\mathcal{O}$ (or an ideal thereof) additively as a copy of $\mathbf{Z}^2$.

There is an important and classical invariant of a quadratic form $(x,y) \mapsto ax^2 + bxy + cy^2$, namely the discriminant $\Delta := b^2 - 4ac$, which will of course be familiar to most readers via the quadratic formula, which among other things tells us that a quadratic form with $a > 0$ will be positive definite precisely when its discriminant is negative. It is not difficult (particularly if one exploits the multiplicativity of the determinant of $2 \times 2$ matrices) to show that two equivalent quadratic forms have the same discriminant. Thus for instance any quadratic form equivalent to (1) has discriminant $-4d$, while any quadratic form equivalent to (2) has discriminant $-d$. Thus we see that each ring $\mathcal{O}$ of quadratic integers is associated with a certain negative discriminant $\Delta$, defined to equal $-4d$ when $d = 1,2\ (4)$ and $-d$ when $d = 3\ (4)$.
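The invariance of the discriminant under a change of basis can be checked mechanically. The sketch below is illustrative (`transform` and `discriminant` are names chosen here). It represents a form $ax^2 + bxy + cy^2$ as a triple `(a, b, c)` and substitutes $(x,y) \mapsto (px + qy, rx + sy)$ for an integer matrix with determinant $\pm 1$.

```python
def discriminant(form):
    """Discriminant b^2 - 4ac of the form ax^2 + bxy + cy^2."""
    a, b, c = form
    return b * b - 4 * a * c

def transform(form, matrix):
    """Coefficients of Q(px + qy, rx + sy) for matrix ((p, q), (r, s))."""
    a, b, c = form
    (p, q), (r, s) = matrix
    return (a * p * p + b * p * r + c * r * r,
            2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s,
            a * q * q + b * q * s + c * s * s)

if __name__ == "__main__":
    # (x, y) -> (x, x + y) turns x^2 + y^2 into 2x^2 + 2xy + y^2.
    new_form = transform((1, 0, 1), ((1, 0), (1, 1)))
    print(new_form, discriminant((1, 0, 1)), discriminant(new_form))
```

For a general integer matrix the discriminant is multiplied by the square of the determinant, which is why only determinant $\pm 1$ maps preserve it.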

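Exercise 5 (Geometric interpretation of discriminant) Let $Q(x,y) = ax^2 + bxy + cy^2$ be a positive definite quadratic form of negative discriminant $\Delta$, and extend it to a real form $Q: \mathbf{R}^2 \to \mathbf{R}$ in the obvious fashion. Show that for any $X > 0$, the set $\{ (x,y) \in \mathbf{R}^2: Q(x,y) \leq X \}$ is an ellipse of area $2\pi X / \sqrt{|\Delta|}$.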

It is natural to ask the converse question: if two quadratic forms have the same discriminant, are they necessarily equivalent? For certain choices of discriminant, this is the case:

Exercise 6 Show that any positive definite quadratic form $ax^2 + bxy + cy^2$ of discriminant $-4$ is equivalent to the form $x^2 + y^2$, and any positive definite quadratic form of discriminant $-3$ is equivalent to $x^2 + xy + y^2$. (Hint: use elementary transformations to try to make $|b|$ as small as possible, to the point where one only has to check a finite number of cases; this argument is due to Legendre.) More generally, show that for any negative discriminant $\Delta$, there are only finitely many positive definite quadratic forms of that discriminant up to equivalence (a result first established by Gauss).

Unfortunately, for most choices of discriminant, the converse question fails; for instance, the quadratic forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$ both have discriminant $-20$, but are not equivalent (Exercise 23). This particular failure of equivalence turns out to be intimately related to the failure of unique factorisation in the ring $\mathbf{Z}[\sqrt{-5}]$.
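The finiteness statement in Exercise 6 can be made concrete by enumerating reduced forms. The sketch below rests on a standard fact that is assumed here rather than taken from these notes: under the equivalence used above (all invertible integer linear maps, of determinant $\pm 1$), every positive definite form is equivalent to exactly one form with $0 \leq b \leq a \leq c$. The function name `reduced_forms` is illustrative, and no primitivity filter is applied.

```python
def reduced_forms(D):
    """List the triples (a, b, c) with b^2 - 4ac = D < 0 and 0 <= b <= a <= c."""
    if D >= 0 or D % 4 not in (0, 1):
        raise ValueError("D must be a negative integer congruent to 0 or 1 mod 4")
    forms = []
    a = 1
    # 0 <= b <= a <= c forces |D| = 4ac - b^2 >= 3a^2, so a is bounded.
    while 3 * a * a <= -D:
        for b in range(0, a + 1):
            four_ac = b * b - D
            if four_ac % (4 * a) == 0:
                c = four_ac // (4 * a)
                if c >= a:
                    forms.append((a, b, c))
        a += 1
    return forms

if __name__ == "__main__":
    for D in (-3, -4, -20):
        print(D, reduced_forms(D))
```

For $\Delta = -3$ and $\Delta = -4$ a single form appears, while $\Delta = -20$ yields the two inequivalent forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$ mentioned above.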

It turns out that there is a fundamental connection between quadratic fields, equivalence classes of quadratic forms of a given discriminant, and real Dirichlet characters, thus connecting the material discussed above with the last section of the previous set of notes. Here is a typical instance of this connection:

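Proposition 1 Let $\chi_4$ be the real non-principal Dirichlet character of modulus $4$, or more explicitly $\chi_4(n)$ is equal to $+1$ when $n = 1\ (4)$, $-1$ when $n = 3\ (4)$, and $0$ when $n = 0,2\ (4)$.

- (i) For any natural number $n$, the number of Gaussian integers $m \in \mathbf{Z}[\sqrt{-1}]$ with norm $N(m) = n$ is equal to $4 (1 * \chi_4)(n)$. Equivalently, the number of solutions to the equation $n = x^2 + y^2$ with $x,y \in \mathbf{Z}$ is $4 (1 * \chi_4)(n)$. (Here, as in the previous post, the symbol $*$ denotes Dirichlet convolution.)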
- (ii) For any natural number , the number of Gaussian integers that divide (thus for some ) is .

We will prove this proposition later in these notes. We observe that as a special case of part (i) of this proposition, we recover the Fermat two-square theorem: an odd prime $p$ is expressible as the sum of two squares if and only if $p = 1\ (4)$.
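Part (i) of Proposition 1 lends itself to a brute-force sanity check. The following Python sketch (with illustrative names `chi4`, `one_conv_chi4`, `count_two_squares`) compares the number of integer solutions of $n = x^2 + y^2$ with $4 \sum_{d | n} \chi_4(d)$ for small $n$. It is an experiment rather than a proof.

```python
from math import isqrt

def chi4(n):
    """The non-principal Dirichlet character of modulus 4."""
    return (0, 1, 0, -1)[n % 4]

def one_conv_chi4(n):
    """Dirichlet convolution (1 * chi4)(n): the sum of chi4 over the divisors of n."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

def count_two_squares(n):
    """Number of pairs (x, y) of integers with x^2 + y^2 = n."""
    count = 0
    r = isqrt(n)
    for x in range(-r, r + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            count += 1 if y == 0 else 2  # y and -y are distinct when y != 0
    return count

if __name__ == "__main__":
    for n in range(1, 21):
        print(n, count_two_squares(n), 4 * one_conv_chi4(n))
```

For an odd prime $p$ the right-hand side is $4(1 + \chi_4(p))$, which is $8$ when $p = 1\ (4)$ and $0$ when $p = 3\ (4)$, in line with the two-square theorem.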

As an illustration of the relevance of such connections to analytic number theory, let us now explicitly compute $L(1,\chi_4)$.

Corollary 2 $L(1,\chi_4) = \frac{\pi}{4}$.

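*Proof:* For a large number $x$, consider the quantity

$$\sum_{n \in \mathbf{Z}[\sqrt{-1}]: N(n) \leq x} 1$$

of all the Gaussian integers of norm at most $x$. On the one hand, this is the same as the number of lattice points of $\mathbf{Z}^2$ in the disk $\{ (a,b) \in \mathbf{R}^2: a^2 + b^2 \leq x \}$ of radius $\sqrt{x}$. Placing a unit square centred at each such lattice point, we obtain a region which differs from the disk by a region contained in an annulus of area $O(\sqrt{x})$. As the area of the disk is $\pi x$, we conclude the Gauss bound

$$\sum_{n \in \mathbf{Z}[\sqrt{-1}]: N(n) \leq x} 1 = \pi x + O(\sqrt{x}).$$

On the other hand, by Proposition 1(i) (and removing the $n=0$ contribution), we see that

$$\sum_{n \in \mathbf{Z}[\sqrt{-1}]: N(n) \leq x} 1 = 1 + 4 \sum_{n \leq x} (1 * \chi_4)(n).$$

Now we use the Dirichlet hyperbola method to expand the right-hand side sum, first expressing

$$\sum_{n \leq x} (1 * \chi_4)(n) = \sum_{d \leq \sqrt{x}} \chi_4(d) \sum_{m \leq x/d} 1 + \sum_{m \leq \sqrt{x}} \sum_{d \leq x/m} \chi_4(d) - \Big(\sum_{d \leq \sqrt{x}} \chi_4(d)\Big) \Big(\sum_{m \leq \sqrt{x}} 1\Big)$$

and then using the bounds $\sum_{d \leq y} \chi_4(d) = O(1)$, $\sum_{m \leq y} 1 = y + O(1)$, $\sum_{d \leq y} \frac{\chi_4(d)}{d} = L(1,\chi_4) + O(\frac{1}{y})$ from the previous set of notes to conclude that

$$\sum_{n \leq x} (1 * \chi_4)(n) = x L(1,\chi_4) + O(\sqrt{x}).$$

Comparing the two formulae for $\sum_{n \in \mathbf{Z}[\sqrt{-1}]: N(n) \leq x} 1$ and sending $x \to \infty$, we obtain the claim.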
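The two counts compared in this argument can be reproduced numerically. The sketch below uses illustrative function names. It counts lattice points in the disk of radius $\sqrt{x}$ directly, evaluates $1 + 4\sum_{d \leq x} \chi_4(d) \lfloor x/d \rfloor$ (which equals $1 + 4 \sum_{n \leq x} (1 * \chi_4)(n)$ after swapping the order of summation), and compares partial sums of $\sum_n \chi_4(n)/n$ with $\pi/4$.

```python
import math

def chi4(n):
    """The non-principal Dirichlet character of modulus 4."""
    return (0, 1, 0, -1)[n % 4]

def gauss_count(x):
    """Number of integer pairs (a, b) with a^2 + b^2 <= x, for an integer x >= 0."""
    r = math.isqrt(x)
    return sum(2 * math.isqrt(x - a * a) + 1 for a in range(-r, r + 1))

def divisor_sum_side(x):
    """1 + 4 * sum_{n <= x} (1 * chi4)(n), written as a sum over d of chi4(d) * floor(x / d)."""
    return 1 + 4 * sum(chi4(d) * (x // d) for d in range(1, x + 1))

def l_one_partial(n_terms):
    """Partial sum of the series for L(1, chi4) = 1 - 1/3 + 1/5 - ..."""
    return sum(chi4(n) / n for n in range(1, n_terms + 1))

if __name__ == "__main__":
    x = 10_000
    print(gauss_count(x), divisor_sum_side(x), math.pi * x)
    print(l_one_partial(100_000), math.pi / 4)
```

The first two printed numbers agree exactly, and both are within $O(\sqrt{x})$ of $\pi x$, as the Gauss bound predicts.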

Exercise 7 Give an alternate proof of Corollary 2 that relies on obtaining asymptotics for the Dirichlet series $\sum_{n=1}^\infty \frac{(1 * \chi_4)(n)}{n^s}$ as $s \to 1^+$, rather than using the Dirichlet hyperbola method.

Exercise 8 Give a direct proof of Corollary 2 that does not use Proposition 1, instead using Taylor expansions of the complex logarithm $\log(1+z)$.

More generally, one can relate $L(1,\chi)$ for a real Dirichlet character $\chi$ with the number of inequivalent quadratic forms of a certain discriminant, via the famous class number formula; we will give a special case of this formula below the fold.

The material here is only a very rudimentary introduction to algebraic number theory, and is not essential to the rest of the course. A slightly expanded version of the material here, from the perspective of analytic number theory, may be found in Sections 5 and 6 of Davenport’s book. A more in-depth treatment of algebraic number theory may be found in a number of texts, e.g. Fröhlich and Taylor.

** — 1. Ideals — **

We begin by reviewing the notion of an ideal in an arbitrary commutative ring.

Definition 3 (Ideals) Let $R$ be a commutative ring (in this set of notes, rings are understood to contain a multiplicative unit $1$). An *ideal* of $R$ is an additive subgroup $I$ of $R$ with the property that $ar \in I$ whenever $a \in I$ and $r \in R$. Note that if $I$ is an ideal, then the quotient $R/I$ is well defined as a commutative ring. We write $a \mapsto a \ \mathrm{mod}\ I$ for the quotient map from $R$ to $R/I$, and write $a = b\ (I)$ if $a - b \in I$, or equivalently if $a \ \mathrm{mod}\ I$ is equal to $b \ \mathrm{mod}\ I$.

An ideal is *proper* if it is not all of $R$. An ideal is *principal* if it is of the form $(a) := aR$ for some $a \in R$, and *non-zero* if it is not the zero ideal $(0)$.

If $I, J$ are ideals, then the intersection $I \cap J$ is an ideal, as is the sum $I + J := \{ a + b: a \in I, b \in J \}$. The product set $I \cdot J := \{ ab: a \in I, b \in J \}$ need not be an ideal in general (it is not always closed under addition); however, we can define the product ideal $IJ$ to be the ideal generated by this product set (that is, the intersection of all the ideals containing this product set). One can then define powers $I^k$ for any $k \geq 0$ in the obvious fashion, with the convention that $I^0 = R$.

We say that $I$ *divides* $J$ if $I \supset J$, thus for instance $I$ divides $IJ$ for any ideals $I, J$. If $I \supsetneq J$, we say that $I$ *strictly divides* $J$.

An ideal $P$ is *prime* if it is proper, and it has the property that for any $a, b \in R$ with $ab \in P$, one has at least one of $a \in P$ or $b \in P$ true. Equivalently, an ideal $P$ is prime if the quotient ring $R/P$ is an integral domain.

One can easily check in the rational integers $\mathbf{Z}$ that product, divisibility, and primality correspond to their counterpart notions in the natural numbers. More precisely, if $n, m$ are natural numbers, then $(n)(m) = (nm)$, that $(n)$ divides $(m)$ if and only if $n$ divides $m$, and that $(n)$ is prime if and only if $n$ is prime. (But note that the zero ideal $(0)$ is also prime, and can be viewed as a sort of “prime at infinity” from the perspective of scheme theory.) Also, if $n$ is a natural number and $a, b$ are integers, then $a = b\ (n)$ holds if and only if $a = b\ ((n))$. Thus we see that the above operations on ideals are quite compatible with their classical counterparts in $\mathbf{Z}$ or $\mathbf{N}$. Also, the integers form a principal ideal domain, in that every ideal $I$ is principal; indeed, if $I$ is non-zero, it is generated by a non-zero element of minimal norm. In particular, from the classical fundamental theorem of arithmetic we see that every non-zero ideal in $\mathbf{Z}$ is uniquely factorisable (up to rearrangement) as the product of prime ideals.

Now we specialise to rings $\mathcal{O} = \mathcal{O}_{\mathbf{Q}(\sqrt{-d})}$ of quadratic integers, where $d$ is a squarefree natural number. These more general rings need no longer be principal ideal domains. For instance, $\mathbf{Z}[\sqrt{-5}]$ contains the non-principal ideal $(2) + (1+\sqrt{-5})$. Closely related to this is the breakdown of the fundamental theorem of arithmetic for quadratic integers (i.e. $\mathcal{O}$ need not be a unique factorisation domain); for instance, $6$ factors non-uniquely as $2 \times 3$ and $(1+\sqrt{-5})(1-\sqrt{-5})$. Despite this, one still has unique factorisation at the level of ideals; for instance, in $\mathbf{Z}[\sqrt{-5}]$ it turns out that $(6)$ factors uniquely as the product of $(2) + (1+\sqrt{-5})$, $(2) + (1-\sqrt{-5})$, $(3) + (1+\sqrt{-5})$, and $(3) + (1-\sqrt{-5})$. As we shall see, the precise failure of unique factorisation at the level of quadratic numbers can be quantified by an important number $h(\Delta)$, known as the class number of the ring of integers, where $\Delta$ is the discriminant mentioned in the introduction (equal to $-4d$ when $d = 1,2\ (4)$, and $-d$ when $d = 3\ (4)$).

** — 2. Unique factorisation of ideals — **

Henceforth $d$ is a fixed squarefree natural number, and $\mathcal{O}$ is the ring of integers in $\mathbf{Q}(\sqrt{-d})$. We set the discriminant $\Delta$ equal to $-4d$ when $d = 1,2\ (4)$ and equal to $-d$ when $d = 3\ (4)$.

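Exercise 9 (Algebraic interpretation of discriminant) Let $\alpha, \beta$ be an additive basis for $\mathcal{O}$ (thus $\mathcal{O}$ is generated by $\alpha, \beta$ as an additive group). Show that

$$\Delta = \det \begin{pmatrix} \alpha & \overline{\alpha} \\ \beta & \overline{\beta} \end{pmatrix}^2.$$

We remark that the discriminant of a more general number field is defined similarly.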

As mentioned in the introduction, one can view $\mathcal{O}$ additively, as a rank two lattice in the complex numbers $\mathbf{C}$. Any non-zero ideal $I$ of $\mathcal{O}$ can then be seen to be a rank two sublattice of $\mathcal{O}$, and in particular must have finite index. We refer to this index as the norm $N(I)$ of the ideal, thus $N(I)$ is the natural number defined by the formula

$$N(I) := |\mathcal{O}/I|.$$

For the ring of quadratic integers we are considering here, one can interpret $N(I)$ geometrically as the area of the torus $\mathbf{C}/I$, divided by the area of the torus $\mathbf{C}/\mathcal{O}$, which can be easily computed to be $\sqrt{|\Delta|}/2$.
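As a concrete illustration of the norm of an ideal as a lattice index, the following Python sketch computes $|\mathcal{O}/I|$ for ideals of $\mathbf{Z}[\sqrt{-d}]$ with $d = 1,2\ (4)$, given by a finite list of generators. It is a sketch under stated assumptions: a pair `(x, y)` stands for $x + y\sqrt{-d}$, the function names are made up, and it uses the standard fact that the index of a full-rank sublattice of $\mathbf{Z}^2$ is the greatest common divisor of the $2 \times 2$ minors of any spanning set.

```python
from itertools import combinations
from math import gcd

def times_sqrt_minus_d(v, d):
    """Multiply x + y*sqrt(-d) by sqrt(-d)."""
    x, y = v
    return (-d * y, x)

def ideal_norm(generators, d):
    """Index in Z[sqrt(-d)] of the ideal generated by the given elements."""
    # As an additive group, the ideal is spanned by each generator g
    # together with g * sqrt(-d).
    lattice = []
    for g in generators:
        lattice.append(g)
        lattice.append(times_sqrt_minus_d(g, d))
    index = 0
    for (a, b), (c, e) in combinations(lattice, 2):
        index = gcd(index, abs(a * e - b * c))
    return index

if __name__ == "__main__":
    print(ideal_norm([(2, 0), (1, 1)], 5))   # (2) + (1 + sqrt(-5))
    print(ideal_norm([(3, 0), (1, 1)], 5))   # (3) + (1 + sqrt(-5))
    print(ideal_norm([(1, 1)], 5))           # principal ideal (1 + sqrt(-5))
```

In this example the principal ideal generated by $1 + \sqrt{-5}$ has index $6$, which is also the norm $1^2 + 5 \cdot 1^2$ of its generator.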

It is clear that if $I$ divides $J$, then $N(I)$ divides $N(J)$, since $\mathcal{O}/I$ is a quotient of $\mathcal{O}/J$. Similarly, if $I$ divides $J$ and $N(I) = N(J)$, then one must have $I = J$. This implies the important Noetherian property: $\mathcal{O}$ does not contain any infinite strictly increasing sequence of ideals

$$I_1 \subsetneq I_2 \subsetneq I_3 \subsetneq \dots$$

since their norms must be strictly decreasing, creating an infinite descent which is absurd. This notion of norm is compatible with the notion of the norm of a quadratic integer:

Exercise 10 If $n \in \mathcal{O}$ is a non-zero quadratic integer, show that $N((n)) = N(n)$, where $N(n)$ was defined in the introduction.

We remark that for quadratic integers of positive discriminant the situation is slightly more complicated, because the norm of an individual element can now be negative, whereas the norm of an ideal is always positive. We will not dwell on this complication further here.

Now we develop a unique factorisation theory for ideals. We first establish that prime ideals are prime within the multiplicative structure of ideals (rather than of quadratic integers):

Lemma 4 Let $P$ be a prime ideal that divides the product $IJ$ of two ideals $I, J$. Then $P$ must divide at least one of $I, J$.

*Proof:* If $P$ does not divide either of $I, J$, then we can find $a \in I$, $b \in J$ that lie outside of $P$. As $P$ is prime, we conclude that $ab$ also lies outside of $P$, and so $P$ does not divide $IJ$, a contradiction.

Also, prime ideals are maximal:

Exercise 11 Show that the only ideals that divide a non-zero prime ideal $P$ are $P$ itself, and the full ring $\mathcal{O}$.

If a non-zero proper ideal $I$ is not prime, then by definition there exist two quadratic integers $a, b$ outside of $I$ such that $ab \in I$. If we set $J := I + (a)$ and $K := I + (b)$, we then see that $J, K$ strictly divide $I$, and that $I$ divides $JK$. Thus any non-zero proper ideal is either prime, or divides the product of two non-zero ideals that strictly divide it (and thus have smaller norm). Iterating this (and using the Noetherian property), we conclude

Proposition 5 Every non-zero ideal divides the product of a finite number of prime ideals.

A similar argument gives

Exercise 12 Show that every non-zero proper ideal is divisible by at least one prime ideal.

We now need a technical lemma that allows one to “invert” a prime ideal $P$.

Lemma 6 Let $P$ be a non-zero prime ideal. Then there exists a quadratic field element $\gamma \in \mathbf{Q}(\sqrt{-d})$ that is not a quadratic integer (thus $\gamma \not \in \mathcal{O}$), but is such that $\gamma P \subset \mathcal{O}$.

*Proof:* Let $\alpha$ be a non-zero element of $P$. By Proposition 5, $(\alpha)$ must divide some product $P_1 \dots P_k$ of prime ideals. In particular, $P$ also divides $P_1 \dots P_k$, which by Lemma 4 and Exercise 11 implies that one of the $P_i$, say $P_k$, is equal to $P$. By taking $k$ to be minimal, we may assume that $(\alpha)$ does not divide $P_1 \dots P_{k-1}$. Thus, we may find an element $\beta$ of $P_1 \dots P_{k-1}$ that does not lie in $(\alpha)$, but such that $\beta P$ is contained in $(\alpha)$. Setting $\gamma := \beta/\alpha$, we obtain the claim.

Remark 2 We can formalise the notion of inverting an ideal by introducing the concept of a fractional ideal; fractional ideals are to ideals as rational numbers are to integers, but we will not do so in this set of notes.

Now we can give the most difficult step of unique factorisation:

Proposition 7 Suppose $I$ is a non-zero ideal that is divisible by a prime ideal $P$. Then one has $I = PJ$ for some non-zero ideal $J$ which is a strict divisor of $I$.

*Proof:* By the previous lemma, we can find $\gamma \in \mathbf{Q}(\sqrt{-d})$ that is not a quadratic integer, such that $\gamma P \subset \mathcal{O}$. Note that $P + \gamma P$ is an ideal dividing $P$, so by Exercise 11 is either equal to $P$ or $\mathcal{O}$.

Suppose first that $P + \gamma P = P$. The ideal $P$ is a rank two lattice, and thus isomorphic as an abelian group to $\mathbf{Z}^2$. The action of multiplication by $\gamma$ on $P$ is then conjugate to the action of a $2 \times 2$ matrix with integer coefficients. By the Cayley-Hamilton theorem, this implies that there is a monic quadratic polynomial of $\gamma$ with integer coefficients that annihilates $P$, and is thus zero (since $\mathbf{Q}(\sqrt{-d})$ is an integral domain). In other words, $\gamma$ is an algebraic integer, and hence $\gamma \in \mathcal{O}$, a contradiction. (Note here that we crucially used the fact that $\mathcal{O}$ contains *all* the algebraic integers of $\mathbf{Q}(\sqrt{-d})$; cf. Remark 1.)

Thus we must have $P + \gamma P = \mathcal{O}$. If we then set $J := I + \gamma I$, then we have $PJ = I$, and $J$ is an ideal dividing $I$. We are thus done unless $J = I$, that is to say $\gamma I \subset I$. But one can then repeat the previous argument to conclude that $\gamma$ is an algebraic integer and thus in $\mathcal{O}$, again reaching a contradiction.

We now have enough tools to mimic the usual proof of unique factorisation for natural numbers, to obtain the analogous result for ideals in a ring of quadratic integers:

Exercise 13 (Unique factorisation) Show that any non-zero ideal $I$ can be uniquely expressed (up to rearrangement) as a product $P_1 \dots P_k$ of prime ideals, with $k \geq 0$. Show that one non-zero ideal $I$ divides another $J$ if and only if the number of times any given prime ideal appears in the unique factorisation of $I$ is less than or equal to the number of times it appears in $J$.

A basic application of unique factorisation is

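Proposition 8 (Chinese Remainder Theorem) Let $I, J$ be non-zero ideals that are coprime (they have no prime ideal divisors in common). Then the obvious ring homomorphism $\phi$ from $\mathcal{O}/IJ$ to $\mathcal{O}/I \times \mathcal{O}/J$, defined by setting $\phi(a \ \mathrm{mod}\ IJ) := (a \ \mathrm{mod}\ I, a \ \mathrm{mod}\ J)$, is an isomorphism.

*Proof:* Observe that the ideal $I + J$ divides both $I$ and $J$ and must therefore be all of $\mathcal{O}$, by unique factorisation and coprimality. Similarly, the ideal $I \cap J$ is divisible by both $I$ and $J$ while dividing $IJ$, and must therefore be exactly $IJ$, by unique factorisation and coprimality. Since the kernel of $\phi$ is $(I \cap J)/IJ$, which is trivial, we conclude that $\phi$ is injective; it remains to show that $\phi$ is surjective. Since $I + J = \mathcal{O}$, we can split $1 = a + b$ for some $a \in I$ and $b \in J$. But then $\phi(a \ \mathrm{mod}\ IJ) = (0, 1)$ and $\phi(b \ \mathrm{mod}\ IJ) = (1, 0)$, and the surjectivity then follows since these two elements generate $\mathcal{O}/I \times \mathcal{O}/J$ as an $\mathcal{O}$-module.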

In the non-coprime case, we have the following basic fact.

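Proposition 9 Let $P$ be a non-zero prime ideal. Then for any non-negative integer $k$, we have $P^k/P^{k+1}$ isomorphic (as an $\mathcal{O}$-module, and in particular as an additive group) to $\mathcal{O}/P$.

*Proof:* By unique factorisation, $P^k$ is a strict divisor of $P^{k+1}$, thus we can find $\pi \in P^k$ that does not lie in $P^{k+1}$. This gives an $\mathcal{O}$-module homomorphism $\phi: \mathcal{O}/P \to P^k/P^{k+1}$ defined by $\phi(a \ \mathrm{mod}\ P) := a\pi \ \mathrm{mod}\ P^{k+1}$. The kernel of this map is (the image of) an ideal dividing $P$ that does not contain $1$, and is thus $P$. Thus we have an injection from $\mathcal{O}/P$ to $P^k/P^{k+1}$.

It remains to show surjectivity. By several applications of Proposition 7, we may write $(\pi) = P^k J$ for some non-zero ideal $J$ not divisible by $P$. By the Chinese remainder theorem, we may then find, for any $a \in P^k$, a quadratic integer $b$ such that $b = a\ (P^{k+1})$ and $b = 0\ (J)$. Thus $b$ lies in both $P^k$ and $J$, and hence in $P^k J = (\pi)$ by coprimality; thus $b/\pi$ is a quadratic integer. By construction, we have $\phi(b/\pi \ \mathrm{mod}\ P) = a \ \mathrm{mod}\ P^{k+1}$, giving the desired surjectivity.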

Corollary 10 (Multiplicativity of norm) For any non-zero ideals $I, J$, we have $N(IJ) = N(I) N(J)$.

*Proof:* When $I, J$ are coprime this follows directly from the Chinese remainder theorem. By unique factorisation, it thus suffices to show that $N(P^k) = N(P)^k$ for all prime ideals $P$ and natural numbers $k$. But this follows from Proposition 9 and induction on $k$.

Exercise 14 Show that the Gaussian integers $\mathbf{Z}[\sqrt{-1}]$ and the Eisenstein integers $\mathbf{Z}[\omega]$ are principal ideal domains. (Hint: if $I$ is a non-zero ideal in one of these rings, consider a non-zero element $n$ of $I$ of minimal norm.) Conclude a unique factorisation theorem for elements of these rings.

Exercise 15 Verify that in $\mathbf{Z}[\sqrt{-5}]$, the principal ideal $(6)$ factors into the four ideals mentioned in the introduction, and that these ideals are prime. What are the norms of all the ideals involved?

Remark 3 The unique factorisation theorem for ideals holds in the more general context of Dedekind domains, but we will not develop the abstract theory of Dedekind domains here.

** — 3. Connection with the Kronecker symbol — **

Let $P$ be a non-zero prime ideal. Then $\mathcal{O}/P$ is a finite integral domain, and is thus a finite field (each non-zero element acts via multiplication by a permutation). On the other hand, since $\mathcal{O}$ is a rank two abelian group, this finite field must have rank at most two. We conclude that $\mathcal{O}/P$ is isomorphic to a finite field of order either $p$ or $p^2$ for some rational prime $p$, which is the characteristic of the field. In particular, $N(P)$ is either a prime $p$ or a square $p^2$ of that prime. On the other hand, since $\mathcal{O}/P$ has characteristic $p$, $P$ must divide $(p)$, which has norm exactly $p^2$ by Exercise 10. By unique factorisation, we conclude that for each rational prime $p$, the ideal $(p)$ is either prime of norm $p^2$, or is the product of two prime ideals that each have norm $p$, and furthermore that all non-zero prime ideals arise in this fashion.

We can determine precisely which of the two is the case:

Proposition 11 Let $p$ be a rational prime.

- If $\Delta$ is a quadratic residue modulo $4p$, then $(p)$ is the product of two prime ideals of norm $p$.
- If $\Delta$ is a quadratic non-residue modulo $4p$, then $(p)$ is a prime ideal of norm $p^2$.

*Proof:* We just handle the case and leave the case as an exercise. Suppose there is a prime ideal of norm , then is isomorphic to the field of order . In particular, if is an element of not divisible by , then must be a multiple of modulo , thus one can find non-zero such that and , which implies that , since are not both zero modulo . Thus (and hence ) is a quadratic residue modulo . Conversely, if (and hence ) is a quadratic residue, then we can find with and with not both zero modulo , and then is an ideal dividing of norm , and thus prime. The claim follows.

Exercise 16 Complete the proof of the proposition in the case $d = 3\ (4)$.

Exercise 17 Show that when $\Delta$ is a quadratic residue modulo $4p$, the two prime ideals appearing in the above proposition are distinct unless $p$ divides $\Delta$, in which case the two ideals are equal.

The above proposition gives us a formula for the number of prime ideals of a given norm. For any natural number $n$, define the Kronecker symbol $\chi_\Delta(n)$ to be the completely multiplicative function of $n$ such that for each prime $p$, $\chi_\Delta(p)$ equals $0$ if $p$ divides $\Delta$, equals $+1$ if $\Delta$ is a non-zero quadratic residue mod $4p$, and $-1$ if $\Delta$ is a non-zero quadratic nonresidue mod $4p$. From the law of quadratic reciprocity, one can verify that $\chi_\Delta$ is a Dirichlet character of conductor $|\Delta|$. For instance, if $\Delta = -4$, $\chi_\Delta = \chi_4$.
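For experimentation, the Kronecker symbol attached to a negative discriminant can be computed directly from its values at primes. The Python sketch below is an illustration under the following assumptions, which are standard but not spelled out in these notes: at an odd prime $p$ not dividing $\Delta$ the value is the Legendre symbol (computed via Euler's criterion), and at $p = 2$ with $\Delta$ odd the value is $+1$ if $\Delta = 1\ (8)$ and $-1$ if $\Delta = 5\ (8)$. The function names are illustrative.

```python
def kronecker_at_prime(delta, p):
    """Value of the Kronecker symbol chi_delta at a prime p."""
    if delta % p == 0:
        return 0
    if p == 2:
        # delta is odd here; it is a square modulo 8 exactly when delta = 1 (8)
        return 1 if delta % 8 in (1, 7) else -1
    # Euler's criterion for the Legendre symbol (delta / p)
    return 1 if pow(delta % p, (p - 1) // 2, p) == 1 else -1

def kronecker(delta, n):
    """Completely multiplicative extension to natural numbers n."""
    result = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            result *= kronecker_at_prime(delta, p)
            n //= p
        p += 1
    if n > 1:
        result *= kronecker_at_prime(delta, n)
    return result

if __name__ == "__main__":
    print([kronecker(-4, n) for n in range(1, 9)])
    print([kronecker(-3, n) for n in range(1, 7)])
    print([kronecker(-20, n) for n in range(1, 21)])
```

The first two lists are periodic with periods $4$ and $3$ respectively, and the third with period $20$, consistent with $\chi_\Delta$ being a Dirichlet character of modulus $|\Delta|$.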

Exercise 18 Show that for any natural number $n$, the number of ideals of norm $n$ is equal to $(1 * \chi_\Delta)(n)$.

Exercise 19 Prove Proposition 1.

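Another way to phrase the conclusion of Exercise 18 is as the factorisation

$$\zeta_{\mathbf{Q}(\sqrt{-d})}(s) = \zeta(s) L(s, \chi_\Delta)$$

(for $\mathrm{Re}(s) > 1$ at least), where $\zeta$ is the Riemann zeta function, $\zeta_{\mathbf{Q}(\sqrt{-d})}$ is the Dedekind zeta function

$$\zeta_{\mathbf{Q}(\sqrt{-d})}(s) := \sum_{I \neq (0)} \frac{1}{N(I)^s} = \prod_P \left(1 - \frac{1}{N(P)^s}\right)^{-1}$$

where $P$ ranges over non-zero prime ideals, and $L(s,\chi_\Delta)$ is the Dirichlet $L$-function

$$L(s,\chi_\Delta) := \sum_{n=1}^\infty \frac{\chi_\Delta(n)}{n^s}.$$

For instance, in the case of the Gaussian integers $\mathbf{Z}[\sqrt{-1}]$, we have

$$\zeta_{\mathbf{Q}(\sqrt{-1})}(s) = \zeta(s) L(s,\chi_4).$$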

** — 4. Connection with quadratic forms — **

Let us say that two non-zero ideals $I, J$ are *equivalent* if one has $(a) I = (b) J$ for some non-zero $a, b \in \mathcal{O}$. This is clearly an equivalence relation; the equivalence class of $\mathcal{O}$ is simply the class of principal ideals. Using unique factorisation (and the fact that every ideal divides a principal ideal), the space of such equivalence classes is a group, called the class group of the ring of integers $\mathcal{O}$.

One can analyse this class group by associating a positive definite quadratic form $Q_I: I \to \mathbf{Z}$ to each ideal $I$, by the formula

$$Q_I(n) := \frac{N(n)}{N(I)}$$

for all $n \in I$. Note that $(n) \subset I$ for $n \in I$, and so $Q_I$ takes values in the non-negative integers (and is strictly positive for non-zero $n$). Since $N$ is a quadratic form on $\mathcal{O}$, we see that $Q_I$ is a quadratic form on $I$.

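We call two quadratic forms $Q: \Gamma \to \mathbf{Z}$, $Q': \Gamma' \to \mathbf{Z}$ on rank two lattices *equivalent* if there is an additive group isomorphism $\phi: \Gamma \to \Gamma'$ such that $Q(n) = Q'(\phi(n))$ for all $n \in \Gamma$. This relation captures the equivalence relation on ideals:

Exercise 20 Let $I, J$ be ideals. Show that $I$ and $J$ are equivalent if and only if $Q_I, Q_J$ are equivalent.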

A quadratic form $Q$ on the standard lattice $\mathbf{Z}^2$ is of the form $Q(x,y) = ax^2 + bxy + cy^2$ for some integers $a,b,c$. The discriminant of this quadratic form is defined to be $b^2 - 4ac$. This is an invariant with respect to invertible linear transformations of $\mathbf{Z}^2$. Thus, given any other quadratic form $Q$ on a rank two lattice $\Gamma$, one can define the discriminant of $Q$ by identifying $\Gamma$ with $\mathbf{Z}^2$ via a linear isomorphism; it is clear that this definition does not depend on the choice of isomorphism. It turns out that the quadratic forms of all ideals have the same discriminant:

Exercise 21 Let $I$ be an ideal. Show that $Q_I$ has discriminant $\Delta$. (Hint: if one identifies $\mathcal{O}$ with $\mathbf{Z}^2$, show that $Q_I$ takes the form $\frac{1}{N(I)} Q_{\mathcal{O}} \circ T$ for some linear transformation $T$ of determinant $\pm N(I)$, and use this to show that $Q_I$ has the same discriminant as $Q_{\mathcal{O}}$.)

From Exercise 6 we see that the class group of $\mathcal{O}$ is finite. The order of this group is known as the class number and will be denoted $h(\Delta)$.

In the converse direction, we have

Lemma 12 Let $Q: \mathbf{Z}^2 \to \mathbf{Z}$ be a positive definite quadratic form of discriminant $\Delta$. Then there exists an ideal $I$ such that $Q_I$ is equivalent to $Q$.

In particular, the class group is in one-to-one correspondence with the equivalence classes of positive definite quadratic forms of a given discriminant, a famous result of Gauss. Consequently, the group law on the class group induces the *Gauss composition law* on equivalence classes of quadratic forms of a given discriminant.

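*Proof:* We perform some *ad hoc* computations in the case $d = 1,2\ (4)$. If $Q(x,y) = ax^2 + bxy + cy^2$, then $a > 0$ and $b^2 - 4ac = -4d$, which makes $b$ even. One may verify that the set

$$I := \left\{ ax + \left(\tfrac{b}{2} + \sqrt{-d}\right) y: x,y \in \mathbf{Z} \right\}$$

is an additive subgroup of $\mathcal{O}$ that is also closed under multiplication by $\sqrt{-d}$, and is thus an ideal; one may also calculate its norm to be $a$, with

$$N\left( ax + \left(\tfrac{b}{2} + \sqrt{-d}\right) y \right) = a (ax^2 + bxy + cy^2)$$

for $x, y \in \mathbf{Z}$. This implies that $Q_I$ is equivalent to $Q$ as required.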

Exercise 22 Complete the proof of the lemma by treating the $d = 3\ (4)$ case.

Exercise 23 Show that $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$ are inequivalent quadratic forms of discriminant $-20$, and that all other positive definite quadratic forms of this discriminant are equivalent to one of these two forms.

Now we relate the distribution of norms of ideals to representation by quadratic forms.

Proposition 13 Let $n$ be a natural number, and let $Q_1, \dots, Q_{h(\Delta)}$ be representatives of the equivalence classes of positive definite quadratic forms of discriminant $\Delta$. Then the number of ideals $J$ with $N(J) = n$ is equal to the number of representations of $n$ of the form $n = Q_i(x,y)$ for some $i = 1, \dots, h(\Delta)$ and $x, y \in \mathbf{Z}$, divided by the number of units (i.e. $4$ if $d=1$, $6$ if $d=3$, or $2$ otherwise, thanks to Exercise 4).

*Proof:* Suppose we can write $n = Q_i(x,y)$ for some $i$ and $x,y \in \mathbf{Z}$. By construction, $Q_i$ is equivalent to $Q_{I_i}$ for some ideal $I_i$, and so we can write $n = Q_{I_i}(m)$ for some $m \in I_i$, thus $N(m) = n N(I_i)$. By unique factorisation, we may write $(m) = I_i J$ for some ideal $J$ of norm $n$. Note that if we replace $i$ with a different index, then $I_i$ is replaced with an inequivalent ideal, and then so is $J$. On the other hand, if we keep $i$ fixed and replace $m$ with some $m' \in I_i$, thus also replacing $J$ with some $J'$, then we also change $J$ to a different ideal unless $(m) = (m')$, or equivalently if $m$ differs from $m'$ by a unit, in which case $J$ is unchanged. Thus we have a map from representations $n = Q_i(x,y)$ with $i = 1, \dots, h(\Delta)$ and $x,y \in \mathbf{Z}$ to ideals of norm $n$ whose multiplicity is exactly equal to the number of units.

To conclude the proposition, we need to show that every ideal of norm arises in this fashion. But by Lagrange’s theorem, lies in , and then , giving the claim.
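Proposition 13 can be tested numerically in the case $d = 5$, $\Delta = -20$. The sketch below makes the following assumptions explicit: the two forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$ represent all the equivalence classes (Exercise 23), there are $w = 2$ units (Exercise 4), the number of ideals of norm $n$ is $(1 * \chi_\Delta)(n)$ (Exercise 18), and $\chi_{-20}$ is given by the hard-coded residue table in `chi20`. All function names are illustrative.

```python
from math import isqrt

FORMS = [(1, 0, 5), (2, 2, 3)]  # representatives for discriminant -20
UNITS = 2                       # the units of Z[sqrt(-5)] are +1 and -1

def chi20(n):
    """Kronecker symbol for discriminant -20, via its residues mod 20."""
    r = n % 20
    if r in (1, 3, 7, 9):
        return 1
    if r in (11, 13, 17, 19):
        return -1
    return 0

def ideal_count(n):
    """(1 * chi20)(n): the number of ideals of norm n, per Exercise 18."""
    return sum(chi20(k) for k in range(1, n + 1) if n % k == 0)

def representations(form, n):
    """Number of integer pairs (x, y) with a x^2 + b x y + c y^2 = n."""
    a, b, c = form
    disc = b * b - 4 * a * c  # negative for a positive definite form
    # Completing the square gives x^2 <= 4cn/|disc| and y^2 <= 4an/|disc|.
    bound = isqrt((4 * max(a, c) * n) // (-disc)) + 1
    return sum(1
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1)
               if a * x * x + b * x * y + c * y * y == n)

if __name__ == "__main__":
    for n in range(1, 16):
        total = sum(representations(f, n) for f in FORMS)
        print(n, total, UNITS * ideal_count(n))
```

For each $n$ printed, the two columns agree: the total number of representations is twice the number of ideals of norm $n$.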

We can then give an elementary asymptotic for the norm of ideals:

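Corollary 14 We have

$$\sum_{I: N(I) \leq x} 1 = \frac{2\pi h(\Delta)}{w \sqrt{|\Delta|}} x + O_\Delta(\sqrt{x})$$

for $x \geq 1$, where $w$ is the number of units.

*Proof:* By the preceding proposition, the left-hand side can be written as

$$\frac{1}{w} \sum_{i=1}^{h(\Delta)} \sum_{(a,b) \in \mathbf{Z}^2 \backslash \{(0,0)\}: Q_i(a,b) \leq x} 1.$$

The inner sum is the number of non-zero lattice points in the ellipse $\{ (a,b) \in \mathbf{R}^2: Q_i(a,b) \leq x \}$, which has area $2\pi x / \sqrt{|\Delta|}$ by Exercise 5, since $Q_i$ has discriminant $\Delta$. If one places a unit square centred at each such lattice point, one obtains a region that differs from the ellipse by a set of area $O_\Delta(\sqrt{x})$ (since the difference set is contained in an $O(1)$-neighbourhood of the boundary of the ellipse). The claim follows.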

This, combined with Exercise 18, gives a special case of the famous Dirichlet class number formula, generalising Corollary 2:

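Exercise 24 (Dirichlet class number formula) Show that

$$\zeta_{\mathbf{Q}(\sqrt{-d})}(s) = \frac{2\pi h(\Delta)}{w \sqrt{|\Delta|}} \frac{1}{s-1} + O_\Delta(1)$$

for $s > 1$ sufficiently close to $1$, and then conclude that

$$L(1,\chi_\Delta) = \frac{2\pi h(\Delta)}{w \sqrt{|\Delta|}}.$$

In particular, since the class number $h(\Delta)$ is clearly at least one, we obtain a “trivial” lower bound

$$L(1,\chi_\Delta) \gg |\Delta|^{-1/2}.$$

This looks weaker than Siegel’s bound

$$L(1,\chi_\Delta) \gg_\varepsilon |\Delta|^{-\varepsilon}$$

for any $\varepsilon > 0$ (which we will discuss in later notes), but a key difference is that the trivial bound has effective constants, whereas Siegel’s bound is ineffective. The best effective bound currently known on $L(1,\chi_\Delta)$ only improves on the trivial bound by a logarithmic factor, and involves quite deep and difficult mathematics relating to elliptic curves; see this survey of Goldfeld.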
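The class number formula $L(1,\chi_\Delta) = \frac{2\pi h(\Delta)}{w\sqrt{|\Delta|}}$ for negative discriminants can be sanity-checked numerically in a few small cases. In the Python sketch below, the characters are hard-coded as residue tables and the values of $h(\Delta)$ and $w$ are taken as inputs ($h = 1$ for $\Delta = -3, -4$ and $h = 2$ for $\Delta = -20$, with $w = 6, 4, 2$ respectively); these inputs and all names are assumptions of the example, not outputs of the code.

```python
import math

# Residue tables for the Kronecker symbols chi_Delta, indexed by n mod |Delta|.
CHARACTER_TABLES = {
    -3: (0, 1, -1),
    -4: (0, 1, 0, -1),
    -20: (0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, -1, 0, -1, 0, 0, 0, -1, 0, -1),
}

# (class number h, number of units w), taken as known inputs.
CLASS_DATA = {-3: (1, 6), -4: (1, 4), -20: (2, 2)}

def l_one_partial(delta, n_terms):
    """Partial sum of L(1, chi_delta) = sum over n of chi_delta(n) / n."""
    table = CHARACTER_TABLES[delta]
    q = len(table)
    return sum(table[n % q] / n for n in range(1, n_terms + 1))

def class_number_prediction(delta):
    """Right-hand side 2 pi h / (w sqrt(|delta|)) of the class number formula."""
    h, w = CLASS_DATA[delta]
    return 2 * math.pi * h / (w * math.sqrt(-delta))

if __name__ == "__main__":
    for delta in (-3, -4, -20):
        print(delta, l_one_partial(delta, 100_000), class_number_prediction(delta))
```

The partial sums converge slowly (with an error of size roughly $|\Delta|/N$ after $N$ terms), so the agreement is only to a few decimal places.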

Remark 4 Most of the above discussion also extends to the rings of integers in real quadratic fields of positive discriminant, with a few changes; for instance, there are now infinitely many units, quadratic integers may now have negative norm, and the field is now embedded (in two different ways) into $\mathbf{R}$ rather than into $\mathbf{C}$. The ellipses of Exercise 5 become hyperbolae, which creates a logarithmic correction term (known as a regulator) in the class number formula. We leave the detailed modifications needed to the interested reader. It turns out that every real Dirichlet character will essentially arise from a quadratic field (of either positive or negative discriminant); see Chapters 5, 6 of Davenport for details. One can also consider higher degree number fields than quadratic fields; again, much of the above theory carries through in this case, but the characters that then emerge are not necessarily Dirichlet characters, but lie instead in the more general class of Hecke characters. We will not discuss this more general theory here, but see for instance the book of Fröhlich and Taylor.

Remark 5 There is an important connection between class groups and abelian extensions of fields (an abelian version of the connection between Galois groups and arbitrary extensions of fields), known as class field theory, but we will not discuss this topic further in this course.


whenever are coprime. (One also considers arithmetic functions, such as the logarithm function or the von Mangoldt function, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as “derived” versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function

that counts the number of divisors of a natural number . (The divisor function is also denoted in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.
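As a quick sanity check on the definition, the following is a minimal Python sketch (the helper names `divisor_counts` and `check_multiplicative` are our own, not part of these notes) that tabulates the divisor function by a sieve and verifies the multiplicativity property on coprime pairs within a finite range.

```python
from math import gcd


def divisor_counts(limit):
    """Return a list tau with tau[n] = number of divisors of n, for 1 <= n <= limit."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # d divides exactly the multiples d, 2d, 3d, ... that are <= limit
        for multiple in range(d, limit + 1, d):
            tau[multiple] += 1
    return tau


def check_multiplicative(values, limit):
    """True if values[n*m] == values[n]*values[m] for all coprime n, m with n*m <= limit."""
    return all(
        values[n * m] == values[n] * values[m]
        for n in range(1, limit + 1)
        for m in range(1, limit // n + 1)
        if gcd(n, m) == 1
    )
```

Of course, a finite check of this kind is only an illustration and not a proof; the actual argument for multiplicativity is given in Section 3.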

There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions $f$. In *elementary multiplicative number theory*, which is the focus of this set of notes, particular emphasis is placed on the following two statistics of a given arithmetic function $f$:

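- The *summatory functions* $\sum_{n \leq x} f(n)$ of an arithmetic function $f$, as well as the associated natural density $\lim_{x \to \infty} \frac{1}{x} \sum_{n \leq x} f(n)$ (if it exists).

- The *logarithmic sums* $\sum_{n \leq x} \frac{f(n)}{n}$ of an arithmetic function $f$, as well as the associated *logarithmic density* $\lim_{x \to \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}$ (if it exists).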

Here, we are normalising the arithmetic function being studied to be of roughly unit size up to logarithms, obeying bounds such as , , or at worst

A classical case of interest is when is an indicator function of some set of natural numbers, in which case we also refer to the natural or logarithmic density of as the natural or logarithmic density of respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function of the primes is often replaced with the von Mangoldt function .

Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 3 below.

If an arithmetic function is multiplicative (or closely related to a multiplicative function), then there is an important further statistic on an arithmetic function beyond the summatory function and the logarithmic sum, namely the Dirichlet series

for various real or complex numbers . Under the hypothesis (3), this series is absolutely convergent for real numbers , or more generally for complex numbers with . As we will see below the fold, when is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.

In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers (and focusing particularly on the asymptotic behaviour as ); in later notes we will focus instead on the important *complex-analytic* approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.

Remark 1 The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more “Fourier-analytic” approach, in which one studies convolution-type statistics such as

as for various cutoff functions , such as smooth, compactly supported functions. See for instance this previous blog post for an instance of such an approach. Another related approach is the “pretentious” approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.

To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get *upper bounds* on such sums, known as *Rankin’s trick*, which we will discuss later in these notes.

The basic strategy of elementary multiplicative theory is to first gather useful estimates on the statistics of “smooth” or “non-oscillatory” functions, such as the constant function , the harmonic function , or the logarithm function ; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.

This is only an introduction to elementary multiplicative number theory techniques. More in-depth treatments may be found in this text of Montgomery-Vaughan, or this text of Bateman-Diamond.

** — 1. Summing monotone functions — **

The most fundamental estimate regarding the equidistribution of the natural numbers is the trivial bound

for any , which reflects the evenly spaced nature of the natural numbers. In particular, one has the weaker bound

for any , but it is important to note that (7) does NOT hold in the range when is much smaller than one, and so some care needs to be taken with applying (7) if one does not know *a priori* that is going to be large. (In general, I recommend using (6) instead of (7), to avoid this trap.)

We have the following generalisation of (6) to summation of monotone functions:

Lemma 1 (Quantitative integral test) Let be real numbers, and let be a monotone function. Then

Note that monotone functions are automatically Riemann integrable and Lebesgue integrable, so there is no difficulty in defining the integral appearing above.

*Proof:* By replacing with if necessary, we may assume that is non-decreasing. By rounding up and rounding down , we may assume that are integers. We have

and similarly

and the claim follows.
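To see the integral test in action, here is a small Python sketch under our own assumptions: we take integer endpoints $a \leq b$ and a non-decreasing $f$, in which case the argument in the proof shows that the sum $\sum_{n=a}^{b} f(n)$ exceeds the integral $\int_a^b f(t)\, dt$ by an amount between $f(a)$ and $f(b)$. The helper names are hypothetical, and the test function is the logarithm, whose antiderivative is $t \log t - t$.

```python
import math


def monotone_sum_vs_integral(f, antiderivative, a, b):
    """Return (sum of f(n) for a <= n <= b, integral of f over [a, b]) for integers a <= b."""
    total = math.fsum(f(n) for n in range(a, b + 1))
    integral = antiderivative(b) - antiderivative(a)
    return total, integral


def log_antiderivative(t):
    """An antiderivative of the logarithm: t log t - t."""
    return t * math.log(t) - t
```

For instance, with $f = \log$ on $[1, N]$ the difference between the sum and the integral stays between $0$ and $\log N$, which is consistent with the weak form of Stirling's formula discussed next.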

for any (a weak form of Stirling’s formula, discussed in this previous blog post), and more generally one has

for all and some polynomial with leading term . (The remaining terms in may be computed explicitly, but for our purposes it will not be essential to know what they are.)

Remark 2If one assumes more differentiability on , one can get more precise control on the error term in Lemma 1 using the Euler-Maclaurin formula; see e.g. Exercise 7 below. However, we will usually not need these more refined estimates here. In any event, one can get even better control on the error term if one works with smoother sums such as (5) with smooth, thanks to tools such as the Poisson summation formula. See this previous blog post for some related discussion.

Lemma 1 combines well with the following basic lemma.

Lemma 2 (Cauchy sequences converge) Let , and be functions such that as . Then the following are equivalent:

- (i) One has
for all .

- (ii) There exists a constant such that
for all . In particular, .

The quantity in (ii) is unique; it is also real-valued if are real-valued.

If in addition as , then when (ii) holds, converges conditionally to .

Exercise 2 Prove Lemma 2.

We now give some basic applications of these lemmas. If is a real number not equal to , then from Lemma 1 we have

for , and hence by Lemma 2, there is a real number with

for all . In the case , we conclude that the sum is absolutely convergent, and

however for this identity is technically not true, since the sum on the right-hand side is now divergent using the usual conventions for infinite sums. (The identity can however be recovered by using more general interpretations of infinite sums; see this previous blog post.) The function is of course the famous Riemann zeta function.

For the case, we again see from Lemma 1 that

for , and thus there exists a real number such that

for . The constant $\gamma$ is known as Euler’s constant (or the *Euler-Mascheroni constant*), and has a value of $0.57721566\dots$.
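The convergence of the harmonic sum minus the logarithm to Euler's constant can be observed numerically. The following Python sketch (with a hypothetical helper name) computes $\sum_{n \leq N} \frac{1}{n} - \log N$ for a given natural number $N$; the decimal value of Euler's constant used for comparison is the standard one and is an input to the check, not something derived here.

```python
import math


def euler_constant_estimate(N):
    """Return (1 + 1/2 + ... + 1/N) - log N, which tends to Euler's constant as N grows."""
    harmonic = math.fsum(1.0 / n for n in range(1, N + 1))
    return harmonic - math.log(N)
```

The estimates decrease towards the limit, with an error that decays like a constant multiple of $1/N$.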

Exercise 3 Let be an arithmetic function. If the natural density of exists and is equal to some complex number , show that the logarithmic density also exists and is equal to . (Hint: first establish the identity .) An important counterexample to the converse claim is given in Exercise 7 below.

Exercise 4Let be an arithmetic function obeying the crude bound (3), and let be a complex number. If the logarithmic density of exists and is equal to , show that as , or in other words that as . (Hint:.)

Exercise 5

Exercise 6Show rigorously that for any non-negative integer and real , the Riemann zeta function is -times differentiable at and one has(There are several ways to justify the term-by-term differentiation in the first equation; one way is to instead establish a term-by-term integration formula and then apply the fundamental theorem of calculus. Another is to use Taylor series with remainder to control the error in Newton quotients. A third approach is to use complex analysis.)

Exercise 7 Let be real numbers, and let be a continuously differentiable function. Show that

(Hint: compare with . One may first wish to consider the case when are integers, and deal with the roundoff errors when this is not the case later.) Conclude that if is a non-zero real number, then the function has logarithmic density zero, but does not have a natural density. (Hint: for the latter claim, argue by contradiction and consider sums of the form .)

Remark 3The above exercise demonstrates that logarithmic density is a cruder statistic than natural density, as it completely ignores oscillations by , whereas natural density is very sensitive to such oscillations. As such, controlling natural density of some arithmetic functions (such as the von Mangoldt function ) often boils down to determining to what extent such functions “conspire”, “correlate”, or “pretend to be” for various real numbers . This fact is a little tricky to discern in the elementary approach to multiplicative number theory, but is more readily apparent in the complex analytic approach, which we will discuss in later notes.

** — 2. The Euler product and Rankin’s trick — **

The fundamental theorem of arithmetic tells us that every natural number is uniquely expressible as a prime factorisation

A very useful way to encode this fact analytically (in a “generating function” form) is as follows. Recall from the introduction that an arithmetic function is multiplicative if one has and whenever are coprime. If is a multiplicative function, then we have

at least when is non-negative. If is complex-valued and the product

is finite, then an application of monotone convergence shows that (18) holds in this case also, with both sides of the equation being absolutely convergent. Multiplying by the multiplicative function , we conclude in particular the Euler product identity

whenever the right-hand side is absolutely convergent in the sense that

is finite, or equivalently that

is finite. Observe that if obeys the crude bound (3), then this absolute convergence is obtained whenever . Thus for instance one has the Euler product identity for the Riemann zeta function ,

whenever . From (11) we therefore have

Taking logarithms, we thus have

when (note that is clearly at least one). The logarithm here on the left-hand side is not so desirable, but we may remove it by using the Taylor expansion

and noting that is absolutely convergent, to conclude that

Taking , and using monotone convergence, we recover Euler’s theorem , but (23) gives more precise information. For instance, we have the following bound, already anticipated by Euler:

Theorem 3 (Cheap second Mertens’ theorem) We have

for .

We will improve this bound later in these notes.

*Proof:* We use a device known as *Rankin’s trick* to compare a Dirichlet series with a natural or logarithmic mean. Namely, observe that

for all and all natural numbers (in fact the implied constant can be given to be ), and hence

The claim now follows from (23).
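The Euler product identity for the Riemann zeta function used in this argument is easy to test numerically. Below is a minimal Python sketch (helper names are our own) that multiplies the factors $(1 - p^{-s})^{-1}$ over the primes $p$ up to a cutoff, for real $s > 1$. As a reference value we use the classical evaluation $\zeta(2) = \pi^2/6$, which is not proved in these notes.

```python
import math


def primes_up_to(limit):
    """Return the list of primes <= limit, using the sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # cross out p^2, p^2 + p, ... (smaller multiples were already handled)
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def euler_product(s, prime_limit):
    """Partial Euler product: product of 1 / (1 - p^(-s)) over primes p <= prime_limit."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product
```

Since every factor exceeds one, the partial products increase to $\zeta(s)$ from below as the cutoff grows.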

Remark 4 The same Rankin trick (24), when used to bound the harmonic series , gives an upper bound of , which is inferior to (13) by a factor of about . In general, one expects Rankin’s trick to lose a constant factor when estimating non-negative logarithmic sums.

Rankin’s trick can also be used to cheaply bound other means of multiplicative functions, as follows.

Theorem 4 (Cheap upper bound on multiplicative functions) Let be a real number, and let be a multiplicative function such that

for all primes and . Then we have

for .

*Proof:* For brevity we allow all implied constants here to depend on . By replacing by , we may assume that is non-negative. From Rankin’s trick (24) we have

Using the Euler product (19), we conclude that

We can crudely bound

and hence by (25)

By Taylor expansion we have

and hence by (21)

But , and the claim follows.

Comparing this bound with (14), we are led to the guess that for as in the above theorem, should behave roughly like on the average. In the next section we will see some arguments that can make this more precise.

We can get more mileage out of (22) by differentiating in . Formally differentiating in , we arrive at the identity

Exercise 8 Derive (26) rigorously. (Hint: For fixed and small , expand using Taylor series with remainder.)

Using the geometric series expansion

and introducing the von Mangoldt function , defined by setting whenever is a prime and is a natural number, and for all other , we thus obtain the important identity

for the Dirichlet series of , and for all . (In fact this identity will hold for a wider range of , but we will address this in later sections.) For comparison, a Taylor expansion of (22) gives the closely related identity

Indeed, (27) is essentially the derivative of (28) in .

for (note the claim is trivial if is large).

Exercise 9 (Cheap first Mertens’ theorem) Show that

for all . (Hint: use Rankin’s trick.)

One can also use Rankin’s trick to bound summatory functions by using the very crude bound

but this is wasteful (often losing a factor of about ). For instance, one could bound the summatory function by using

followed by Exercise 9, but this only gives the crude upper bound

which was trivial anyway since when . The problem is that (30) is very inefficient when is much smaller than . To do better, there is a standard rearrangement trick that moves much of the “mass” of a summatory function to smaller values of , effectively increasing the cutoff and so replacing (30) with a less inefficient comparison. To describe this rearrangement trick, we return to the fundamental theorem of arithmetic (17) and take logarithms to obtain

whenever is a natural number with prime factorisation (17). Using the von Mangoldt function defined previously, we can thus encode the fundamental theorem of arithmetic neatly in the important identity

For , we rearrange this identity as

which allows us to rewrite an arbitrary summatory function for some as

For the latter sum, we write and interchange summations to obtain (after replacing with ) the rearrangement identity

The expression in parentheses can be viewed as a weighted version of , with the weight tending to be larger for small than for large . Because of this reweighting, if one applies Rankin’s trick to bound the sum on the right-hand side, one can often obtain an upper bound on that recovers the loss of that would occur if one applied Rankin’s trick directly. We illustrate this with

Proposition 5 (Chebyshev’s upper bound) We have

In particular, specialising to primes we have

*Proof:* Without loss of generality we may take . By (32), we may write

Observe that the expression is only non-zero when and are both powers of the same prime , in which case this expression equals . Furthermore, one must have if one is to have . Each prime has powers less than , so the total sum can be crudely bounded by . (We are conceding several unnecessary powers of here, but we can afford to do so because is so much smaller than .) Also, we can crudely bound . Applying Exercise 9, we conclude that

and the claim follows.
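The von Mangoldt function and the identity (31) encoding the fundamental theorem of arithmetic can be explored with a short Python sketch. The helper names below are our own, and we write $\psi(x)$ for the summatory function of the von Mangoldt function (a standard notation that these notes have not introduced at this point). The sketch tabulates the function, so that one can check numerically that its sum over the divisors of $n$ equals $\log n$, and that $\psi(x)$ is comparable to $x$.

```python
import math


def von_mangoldt_table(limit):
    """Return lam with lam[n] = log p if n = p^j for a prime p and j >= 1, else 0."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        # p is prime: mark its proper multiples as composite
        for multiple in range(2 * p, limit + 1, p):
            is_composite[multiple] = True
        # assign log p to every power of p up to the limit
        power = p
        while power <= limit:
            lam[power] = math.log(p)
            power *= p
    return lam


def chebyshev_psi(lam, x):
    """Summatory function of the von Mangoldt table: sum of lam[n] for 1 <= n <= x."""
    return math.fsum(lam[1 : x + 1])
```

Numerically, `chebyshev_psi(lam, x) / x` stays bounded in the range one can test, which is consistent with the upper bound just established.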

Exercise 10 (Chebyshev’s upper bound, alternate formulation) Show that the number of primes less than is for any . (Hint: first bound the number of primes between and .)

Exercise 11 Give an alternate proof of Exercise 10, based on the observation that the binomial coefficient is divisible by every prime between and , while also being bounded by .

Remark 5 Chebyshev also established the matching lower bound for ; we will establish this bound (by a slightly different method) in Exercise 19. Both of Chebyshev’s bounds will later be superseded by the prime number theorem , discussed in later notes.

Remark 6 We have seen that the von Mangoldt function obeys at least three useful identities: (27), (28) and (31). (Not surprisingly, these identities are closely related to each other: (27) is basically the derivative of (28) and the Dirichlet series transform of (31).) It is because of these identities (and further related identities which we will see in later posts) that the von Mangoldt function is an extremely convenient proxy for the prime numbers in analytic number theory. Indeed, many questions about primes in analytic number theory are most naturally tackled by converting them to a statement about the von Mangoldt function (by adding or subtracting the contribution of prime powers for , which are typically of significantly lower order), in order to access identities such as (27), (28) or (31).

Exercise 12 (Cheap upper bound for summatory functions) With obeying the conditions of Theorem 4, show that

for . (Hint: use (32). Treat the contribution of those that are powers of a prime separately by crude bounds, as in the proof of Proposition 5. The remaining contributions can be dealt with using Proposition 5 and Theorem 4.)

** — 3. The number of divisors — **

We now consider the asymptotics of the divisor function

that counts the number of divisors of a given natural number . Informally: how many divisors does one expect a typical large number to have? Remarkably, the answer turns out to depend on exactly what one means by “typical”.

The first basic observation to make is that is a multiplicative function: whenever are coprime, since every divisor of can be uniquely expressed as the product of a divisor of and a divisor of . The same argument has an important generalisation: if are multiplicative functions, then the Dirichlet convolution , defined by

is also multiplicative. The multiplicativity of the divisor function then follows from the identity

since the constant function is clearly multiplicative.
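Dirichlet convolution is straightforward to implement on finite tables, which gives a way to experiment with these identities. The Python sketch below (the function name and the convention of storing $f(n)$ at index `n` with index `0` unused are our own choices) computes the convolution of two arithmetic functions up to a cutoff.

```python
def dirichlet_convolution(f, g, limit):
    """Return h with h[n] = sum of f[d] * g[m] over all factorisations n = d * m,
    for 1 <= n <= limit. Inputs are lists indexed by n, with index 0 unused."""
    h = [0] * (limit + 1)
    for d in range(1, limit + 1):
        # pair d with every m such that d * m stays within the cutoff
        for m in range(1, limit // d + 1):
            h[d * m] += f[d] * g[m]
    return h
```

Convolving the constant function one with itself recovers the divisor function, and convolving once more gives the number of ways to write $n$ as an ordered product of three natural numbers.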

We begin with a crude bound:

Lemma 6 (Divisor bound) We have as . Equivalently, we have for any fixed .

*Proof:* Let be fixed. Observe that for any prime power , we have . In particular, we see that whenever is sufficiently large depending on , and is a natural number. For the remaining values of , we have . By the multiplicativity of , we thus have as required.

Exercise 13If are arithmetic functions obeying (3), show that the Dirichlet convolution also obeys (3). Then establish the fundamental relationshiprelating the Dirichlet series of for all . (Compare with analogous identities for the Fourier transform or Laplace transform of ordinary convolutions.) Use this and (31) to obtain an alternate proof of (27). We will make heavier use of (33) in later notes.

Exercise 14 Obtain the sharper bound for all . (Hint: first establish that for any by performing the proof of Lemma 6 more carefully, then optimise in .) It was established by Wigert that one in fact has as , and that the constant cannot be replaced by any smaller constant.

Now we consider the mean value of . From the case of Theorem 4 we have

and from Exercise 12 we have

which suggest that the mean value of for is comparable to . We can verify this by using the general identity

A similar argument gives the variant

We can use this to control the summatory function and logarithmic sum of the divisor function. For instance, by applying (34) with followed by (6), we have

and hence by (13)

(We will improve the control on the error term here later in this section.) Similarly, from (35) followed by (13) we have

and thus by (14) and a brief calculation

for some quadratic polynomial with leading term . Comparing these bounds with (9) and (8), we see this is indeed compatible with the heuristic that behaves like on the average.

Remark 7One can justify the heuristic by the following non-rigorous probabilistic argument: for a typical large number , each number would be expected to divide with probability about ; since , the “expected value” of should thus be about . We will continue this heuristic argument later in this set of notes.

Even though behaves like on the average, it can fluctuate to be much larger or much smaller than this value. For instance, equals when is prime, and when is odd, is twice as large as , even though and are roughly the same size. A further hint of the irregularity of distribution can be seen by considering the * moments* and of for natural numbers . The function is multiplicative with for every prime . From Theorem 4 and Exercise 12 we have the upper bounds

for all , suggesting that behaves like on average. This may seem at first glance to be incompatible with the heuristic that behaves like on the average, but what is happening here is that can occasionally be much larger than , which does not significantly affect the mean of , but does affect higher moments with . (Continuing Remark 7, the issue here is that the events “ divides ” can be highly correlated with each other as varies, due to common factors between different choices of .) In fact, the *typical* value (in the sense of median, rather than mean) of is not or , but is in fact . See Section 5 below.

We can be more precise on the mean behaviour of , by establishing the following variant of Theorem 4.

Theorem 7 (Mean values of divisor-type multiplicative functions) Let be a natural number, and let be a multiplicative function obeying the estimates

for all primes and all . Let denote the singular series

Note from Taylor expansion that

so is finite (though it may vanish, if one of its factors vanishes), with .

- (i) If , then one has
for all .

- (ii) If , we have
for .

- (iii) If , one has
for .

Note that the case of this proposition gives (a slightly weaker form of) the above estimates for , since in that case . Comparing with (9), (14), (16) we see that behaves approximately like on the average for . The hypotheses on this theorem may be relaxed somewhat; see Exercise 16 below.

*Proof:* For brevity we omit the dependence of implicit constants on . We begin with (i). If , then from (19) and crude bounds we have , which suffices, so we will assume that is sufficiently close to . From (19) we have

By Taylor expansion, we may write

where

Since is convergent, we conclude that

and the claim follows from (21).

To prove (ii) we induct on . First we establish the base case of (ii). From (19) we have

so it suffices to show that

But from (19) we see that

(for instance), and the claim follows.

Now suppose that and the claim (ii) has already been proven for . We let be the multiplicative function with

for primes and , then one easily verifies that . Note that satisfies the same hypotheses as , but with replaced by . A brief calculation shows that

and thus . From (35) one has

and thus by induction hypothesis

From Lemma 1 we have

and

and the claim (ii) follows.

Finally, we prove (iii). Let ; if , we assume inductively that (iii) is established for . Let be as before. From (34) one has

and thus by (6)

From (ii) we have

The error term can be treated by the induction hypothesis when , giving the claim. When , we instead use the Rankin trick and bound

and use (19) to bound as before, giving the claim.

Thus for instance we have

for any , where is the quantity

Among other things, this shows that can be larger than any fixed power of for arbitrarily large ; compare with Lemma 6.

A more accurate description of the distribution of is that for is asymptotically distributed in the limit like a gaussian random variable with mean and variance comparable to ; see Section 5 below.

Exercise 15 Let be the arithmetic function defined by setting when is square-free (that is, is not divisible by any perfect square other than ) and zero otherwise; the reason for the notation will be given later. Show that

and

for . Thus, the square-free numbers have natural and logarithmic density . It can be shown by a variety of methods that , although we will not use this fact here. The error term may also be improved significantly (as we shall see when we turn to sieve-theoretic methods).
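The density of the square-free numbers can be estimated empirically. The Python sketch below (with a hypothetical helper name) sieves out the multiples of perfect squares, so that the proportion of square-free numbers up to a cutoff can be compared with $1/\zeta(2) = 6/\pi^2 \approx 0.6079$; here we take the classical evaluation $\zeta(2) = \pi^2/6$ on trust.

```python
import math


def squarefree_indicator(limit):
    """Return flags with flags[n] True exactly when 1 <= n <= limit is square-free."""
    flags = [True] * (limit + 1)
    flags[0] = False
    for k in range(2, math.isqrt(limit) + 1):
        square = k * k
        # every multiple of k^2 is divisible by a perfect square other than 1
        for multiple in range(square, limit + 1, square):
            flags[multiple] = False
    return flags
```

The observed proportion approaches the predicted density quite rapidly, in line with the square-root type error term in the exercise.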

Exercise 16 The purpose of this exercise is to give an alternate derivation of parts (ii) and (iii) of Theorem 7 that does not rely so strongly on being an integer; it also allows to only be close to on average, rather than in the pointwise sense. The arguments here are from Appendix A.2 of Friedlander-Iwaniec, which are in turn based on this paper of Wirsing.

- (i) Let be as in Theorem 7, and define . By using (32) with replaced by , establish the integral equation
for all , and deduce that

for all and some independent of . Then use part (i) of Theorem 7 to deduce that , thus establishing part (ii) of Theorem 7.

- (ii) Assume the following form of the prime number theorem: for all . By using (32) for , establish part (iii) of Theorem 7.
- (iii) Extend Theorem 7 to the case when is real rather than integer (replacing factorials by Gamma functions as appropriate), and (39) is replaced by for all . (For part (ii) of Theorem 7, one can weaken (39) further to .)

We now present a basic method to improve upon the estimate (36), namely the Dirichlet hyperbola method. The point here is that the error term in (6) is much stronger for large than for small , so one would like to rearrange the proof of (36) so that (6) is only used in the former case and not the latter. To do this, we decompose

where and , so that may be decomposed as

Actually, it is traditional to rearrange this a bit as the identity

This decomposition is convenient for improving (36) for two reasons. Firstly, is supported on and thus makes no contribution to . At the other extreme, is supported in and so the restriction may be removed here, simplifying the sum substantially:

For the final sum , we use (34) and (6) as before, to conclude that

By (13), we thus obtain an improved version of (36):

Remark 8The Dirichlet hyperbola method may be visualised geometrically as follows, in a fashion that explains the terminology “hyperbola method”. The sum can be interpreted as the number of lattice points (that is to say, elements of ) lying underneath the hyperbola . The proof of (36) basically proceeded to count these lattice points by summing up the contribution of each column separately; this was an efficient process for the columns close to the -axis, but led to relatively large error terms for columns far away from the -axis. Symmetrically, one could proceed by summing by rows, which is efficient for rows close to the -axis, but not far from that axis. The hyperbola method splits the difference between these two counting procedures, counting rows within of the -axis and columns within of the -axis, and then removing one copy of the square to correct for it being double-counted. The estimation of this lattice point problem can be made more precise still by more sophisticated decompositions and approximations of the hyperbola, but we will not discuss this problem (known as the Dirichlet divisor problem) here.From an algebraic perspective, it is the identity (41), decomposing into Dirichlet convolutions of expressions with good spatial support properties, that is the key to the successful application of the hyperbola method. In later posts we will encounter more sophisticated identities that decompose various arithmetic functions (such as the von Mangoldt function ) into similar convolutions of spatially localised expressions. Unfortunately these identities are not as easy to visualise geometrically as the hyperbola method identity, as the corresponding geometric picture often takes place in three or higher dimensions.
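The two counting procedures described in the above remark can be compared directly in code. In the Python sketch below (function names are our own), the first routine counts the lattice points under the hyperbola column by column, while the second uses the hyperbola method, counting the columns and rows within $\sqrt{x}$ of the axes and subtracting the doubly counted square. For comparison we also include the classical main term $x \log x + (2\gamma - 1) x$, where the numerical value of Euler's constant $\gamma$ is an input we supply.

```python
import math

EULER_GAMMA = 0.5772156649015329  # standard decimal value of Euler's constant


def divisor_sum_by_columns(x):
    """Sum of tau(n) over n <= x, counted as the sum of floor(x / d) over all d <= x."""
    return sum(x // d for d in range(1, x + 1))


def divisor_sum_by_hyperbola(x):
    """Same sum via the hyperbola method: 2 * sum_{d <= sqrt(x)} floor(x / d) - floor(sqrt(x))^2."""
    root = math.isqrt(x)
    return 2 * sum(x // d for d in range(1, root + 1)) - root * root


def dirichlet_main_term(x):
    """Classical main term x log x + (2 * gamma - 1) * x for the divisor summatory function."""
    return x * math.log(x) + (2 * EULER_GAMMA - 1) * x
```

The hyperbola version needs only about $\sqrt{x}$ operations rather than $x$, which mirrors the gain in the error term: the two routines agree exactly, and the result differs from the main term by an amount of size at most comparable to $\sqrt{x}$.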

Exercise 17 Define the third divisor function by , or equivalently . Show that

for all and some polynomial with leading term . (Note that Theorem 7(iii) only gives ; one needs a three-dimensional version of the hyperbola method to get the better error term. Hint: Decompose at the threshold . If one is having difficulty figuring out exactly what identity to use, try working backwards, by writing down all the relevant convolutions (e.g. ) that look tractable, and then do some elementary linear algebra to express in terms of all the expressions that you know how to estimate well.)

Exercise 18 State and prove an extension of the previous exercise to the divisor function for , defined as the Dirichlet convolution of copies of .

** — 4. Mertens’ theorems — **

In the previous section, we used identities such as (34) and (35) to obtain control on statistics of Dirichlet convolutions in terms of statistics of the individual factors . This method is particularly well suited for functions such as the divisor function , which can be expressed conveniently in such Dirichlet convolution form.

When dealing with arithmetic functions related to the primes, it turns out that one often has to run this procedure in reverse, for instance trying to control statistics of given information on and . This is basically a deconvolution problem, which from a Fourier-analytic point of view corresponds to dividing one Fourier transform by another. This can become problematic when the latter Fourier transform vanishes; in our arithmetic context, this corresponds to zeroes of the Dirichlet series . As such, we will see the location of the zeroes of Dirichlet series such as the Riemann zeta function play an extremely important role in later posts. However, the elementary approach cannot easily access this information directly. As such, it is somewhat limited in the type of information it can recover about the primes (at least in the more basic formulations of the theory); however, one can still obtain some non-trivial control on the primes by purely elementary methods, as we shall shortly see.

Let us first see where this deconvolution problem is coming from. As discussed in Remark 6, it is convenient to study the primes through the von Mangoldt function. The identity (31) concerning this function can be rewritten as

where is the logarithm function. Thanks to the methods in Section 1, we understand the statistics of and relatively well. The “deconvolution problem” is then to somehow use this information to control the statistics of . We demonstrate this with the following improvement of Exercise 9, controlling logarithmic sums of the von Mangoldt function:

Theorem 8 (First Mertens’ theorem) We have

$\displaystyle \sum_{n \leq x} \frac{\Lambda(n)}{n} = \log x + O(1) \ \ \ \ \ (44)$

for all $x \geq 1$.

*Proof:* Applying (34) to (43) we have

If we apply (8) and (6), we conclude that

and so (44) follows from Proposition 5.

It is easy to convert control of to control on primes:

Corollary 9 (Alternate form of first Mertens’ theorem) We have

for all .

*Proof:* From the definition of the von Mangoldt function, one has

We crudely bound for . Since , we obtain

and the claim follows from (44).

Exercise 19 (Chebyshev’s theorem) Show that there exists an absolute constant such that there are primes in for all sufficiently large . Conclude in particular that .

From this we can obtain a sharper form of Theorem 3, controlling logarithmic sums of the primes:

Theorem 10 (Second Mertens’ theorem)One has

The constant has no particular significance in applications, but the constant can be usefully computed: see (49) below.

*Proof:* We just prove the first claim, as the second is similar (and can be deduced from the first by a modification of the argument used to prove Corollary 9). By Lemma 2, it suffices to show that

as both go to infinity. From the fundamental theorem of calculus we have

for all , and thus

From Corollary 9 one has

for all ; since

and

the claim follows.
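As a numerical companion to the second Mertens theorem, the sketch below (illustrative only; the function names are our own) prints $\sum_{p \leq x} \frac{1}{p} - \log\log x$ for increasing $x$. The theorem predicts that this difference converges to a constant; numerically the limit is about $0.2615$, a value usually called the Meissel-Mertens constant.

```python
import math

def primes_up_to(n):
    """Return the list of primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if is_prime[i]]

def second_mertens_gap(x):
    """Return sum_{p <= x} 1/p - log(log(x))."""
    return sum(1.0 / p for p in primes_up_to(x)) - math.log(math.log(x))

if __name__ == "__main__":
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, round(second_mertens_gap(x), 5))
```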

This leads to a useful general formula for computing various slowly varying sums over primes:

- (i) For any fixed , show that
as .

- (ii) If is a fixed compactly supported, Riemann integrable function, show that
as .

- (iii) If is a fixed natural number and is a fixed compactly supported, Riemann integrable function, show that
as .

- (iv) Obtain analogues of (i)-(iii) when the sum over primes are replaced by the sum over integers , but with factors of replaced by (with the convention that this expression vanishes at ).

Remark 9An alternate way to phrase the above exercise is that the Radon measureson converge in the vague topology to the absolutely continuous measure in the limit , where denotes the Dirac probability measure at . Similarly for the Radon measures

To put this another way, the Radon measures or behave like Lebesgue measure on dyadic intervals such as for fixed and large . This is weaker than the prime number theorem that we will prove in later notes, which basically asserts the same statement but on the much smaller intervals . (Statements such as the Riemann hypothesis make the same assertion on even finer intervals, such as .) See this previous blog post for some examples of this Radon measure perspective, which we will not emphasise in this set of notes.

Exercise 21 (Smooth numbers)For any , let denote the set of natural numbers less than which are -smooth, in the sense that they have no prime factors larger than .

- (i) Show that for any fixed , one has
where the Dickman function is defined by the alternating series

(note that for any given that only finitely many of the summands are non-zero; one can also view the term as the term of the summation after carefully working out what zero-dimensional spaces and empty products evaluate to).

Hint: Use the inclusion-exclusion principle to remove the multiples of $p$ from the natural numbers less than $x$ for each prime $p > x^{1/u}$. This is a simple example of a sieve, which we will study in more detail in later notes.

- (ii) (Delay-differential equation) Show that $\rho$ is continuous on $[0,+\infty)$, continuously differentiable except at $u=1$, equals $1$ for $0 \leq u \leq 1$ and obeys the equation

$$\rho'(u) = -\frac{1}{u} \rho(u-1)$$

for $u > 1$. Give a heuristic justification for this equation by considering how the number of $x^{1/u}$-smooth numbers less than $x$ varies with respect to small perturbations of $u$.

- (iii) (Wirsing integral equation) Show that
for all , or equivalently that

Give a heuristic justification for this equation by starting with a -smooth number and considering a factor , where is a prime factor of chosen at random with probability (if occurs times in the prime factorisation of ).

- (iv) (Laplace transform) Show that
for all .
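The Dickman function from Exercise 21 can be tabulated numerically. The sketch below is a rough illustration under our own choices: it integrates the delay-differential equation $\rho'(u) = -\rho(u-1)/u$ with the trapezoid rule on a uniform grid (starting from $\rho = 1$ on $[0,1]$), and separately counts $x^{1/u}$-smooth numbers up to $x$ by a sieve for the largest prime factor. The function names, grid size and cutoffs are assumptions made for the example.

```python
def dickman_rho(u_max, steps_per_unit=1000):
    """Tabulate rho on the grid u_i = i / steps_per_unit for 0 <= u_i <= u_max."""
    n = steps_per_unit
    h = 1.0 / n
    size = int(round(u_max * n)) + 1
    rho = [1.0] * size  # rho(u) = 1 for 0 <= u <= 1
    for i in range(n + 1, size):
        # trapezoid rule for the integral of rho(t-1)/t over [u_{i-1}, u_i]
        left = rho[i - 1 - n] / ((i - 1) * h)
        right = rho[i - n] / (i * h)
        rho[i] = rho[i - 1] - 0.5 * h * (left + right)
    return rho

def smooth_fraction(x, u):
    """Fraction of 1 <= n <= x having no prime factor larger than x**(1/u)."""
    y = x ** (1.0 / u)
    largest = [0] * (x + 1)  # largest prime factor; stays 0 for n = 1
    for p in range(2, x + 1):
        if largest[p] == 0:  # p is prime
            for m in range(p, x + 1, p):
                largest[m] = p
    return sum(1 for n in range(1, x + 1) if largest[n] <= y) / x

if __name__ == "__main__":
    rho = dickman_rho(3.0)
    for u in (1.5, 2.0, 2.5, 3.0):
        print(u, round(rho[int(round(u * 1000))], 5), smooth_fraction(10**5, u))
```

For $1 \leq u \leq 2$ the equation integrates in closed form to $\rho(u) = 1 - \log u$, which gives a convenient check on the tabulated values. The agreement between the two printed columns is only approximate at such small $x$, since the convergence in part (i) is slow.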

Now we incorporate the information on primes coming from Euler products that we established in Section 2. Recall from (29) that

for . We can compare this against (47) by a variant of the Rankin trick. Namely, if we apply (48) with for some large , one obtains

and thus on subtracting (47)

For any fixed , we see from Exercise 20 that

On the other hand, by using Exercise 20 and dyadic decomposition of we see that the expressions

and

can be made arbitrarily small by making sufficiently small, sufficiently large, and sufficiently large. Putting all this together, we conclude that

We can compute this limit explicitly:

Lemma 11 (Exponential integral asymptotics)For sufficiently small , one has

*Proof:* We start by using the identity to express the harmonic series as

or on summing the geometric series

Since , we thus have

making the change of variables , this becomes

As , converges pointwise to and is pointwise dominated by . Taking limits as using dominated convergence and (13), we conclude that

or equivalently

The claim then follows by bounding the portion of the integral on the left-hand side.

Exercise 22Show thatConclude that if is the Gamma function, defined for by the formula

then one has .

By arguing as in the proof of Corollary 9, we then have

or equivalently by Taylor expansion

Taking exponentials, we conclude

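Theorem 12 (Third Mertens’ theorem) We have

$$\prod_{p \leq x} \left(1 - \frac{1}{p}\right) = \frac{e^{-\gamma} + o(1)}{\log x}$$

as $x \to \infty$.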
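The third Mertens theorem can be checked numerically in a few lines. The sketch below (an illustration with our own function names, using a hard-coded decimal approximation of the Euler-Mascheroni constant $\gamma$) prints $\log x \prod_{p \leq x} (1 - \frac{1}{p})$ next to $e^{-\gamma}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # decimal approximation of Euler's constant

def primes_up_to(n):
    """Return the list of primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if is_prime[i]]

def mertens_product_times_log(x):
    """Return log(x) * prod_{p <= x} (1 - 1/p)."""
    product = 1.0
    for p in primes_up_to(x):
        product *= 1.0 - 1.0 / p
    return math.log(x) * product

if __name__ == "__main__":
    print("exp(-gamma) =", math.exp(-EULER_GAMMA))
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, mertens_product_times_log(x))
```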

Because of this theorem, the factors and frequently arise in analytic prime number theory. We give two examples of this below.

Exercise 23Let be the Dickman function from Exercise 21. Show that .

Exercise 24For any natural number , define the Euler totient function of to be the number of natural numbers less than that are coprime to , or equivalently is the order of the multiplicative group of .

- (i) Show that
for all .

- (ii) Show the more refined lower bound
as . Show that cannot be replaced here by any larger quantity.

** — 5. The number of prime factors (optional) — **

Given a natural number , we use to denote the number of prime factors of (not counting multiplicity), and to denote the number of prime factors of (counting multiplicity). Thus we can write as *divisor sums*

We can ask what the mean values of and are. We start with the estimation of

We can rearrange this sum (cf. (34)) as

which by (6) is

so by Theorem 10 (and crudely bounding by , as we will not need additional accuracy here) gives

for (say). Thus we see that for , one expects to have about prime factors on the average.

Now we look at the second moment

If we expand this out directly, we get

which we can rearrange as

where is the least common multiple of , that is to say if and if . Here we run into a serious problem, which is that can be significantly less than , in which case the estimate

is horrendously inefficient (the error term is larger than the main term, which is always a bad sign). One could fix this by using the trivial estimate when . But there is another cheap trick one can use here, due to Turán, coming from the fact that a given natural number can have at most prime factors that are larger than, say, . Thus we can approximate the divisor sum (50) by a *truncated* divisor sum:

One can then run the previous argument with the truncated divisor sum and avoid the problem of dipping below . Indeed, on squaring we see that

and hence (by (51))

for . Rearranging and using (6) as before, we obtain

The contribution of the error may be crudely bounded by , which can easily be absorbed into the error term. The diagonal case also contributes thanks to Theorem 10. We conclude that

and thus by a final application of Theorem 10

We can combine this with (51) to give a variance bound:

To interpret this probabilistically, we see that if we pick uniformly at random, then the random quantity will have mean and standard deviation . In particular, by Chebyshev’s inequality, we expect to have prime factors most of the time.
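The mean and variance computations above can be compared against actual data. The following sketch (illustrative; `omega_up_to` and `omega_mean_and_variance` are our own names) computes the number of distinct prime factors of every $n \leq x$ with a sieve, and prints the empirical mean and variance next to $\log\log x$.

```python
import math

def omega_up_to(x):
    """omega[n] = number of distinct prime factors of n, for 0 <= n <= x."""
    omega = [0] * (x + 1)
    for p in range(2, x + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, x + 1, p):
                omega[m] += 1
    return omega

def omega_mean_and_variance(x):
    """Empirical mean and variance of omega(n) for n drawn uniformly from 1..x."""
    values = omega_up_to(x)[1:]
    mean = sum(values) / x
    variance = sum((w - mean) ** 2 for w in values) / x
    return mean, variance

if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        mean, variance = omega_mean_and_variance(x)
        print(x, round(math.log(math.log(x)), 4), round(mean, 4), round(variance, 4))
```

Since $\log\log x$ grows extremely slowly, one should only expect the mean to match $\log\log x$ up to a bounded error, and the variance to be of comparable or smaller size, at any range reachable by direct computation.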

Remark 10 The strategy of truncating a divisor sum to obtain better control on error terms (perhaps at the expense of some inefficiency in the main terms) is one of the core techniques in sieve theory, which we will discuss in later notes.

The same estimates are true for :

Exercise 25

- (i) Show that
for , and conclude that

- (ii) Show that
for all natural numbers .

From the above exercise and Chebyshev’s inequality, we now know the typical number of prime factors of a large number, a fact known as the Hardy-Ramanujan theorem:

Theorem 13 (Hardy-Ramanujan theorem)Let be an asymptotic parameter going to infinity, and let be any quantity depending on that goes to infinity as . Let be a natural number selected uniformly at random from . Then with probability , has distinct prime factors, and repeated prime factors (counting multiplicity, thus counts as repeated prime factors). In particular, has prime factors counting multiplicity.

This already has a cute consequence with regards to the multiplication table:

Proposition 14 (Multiplication table bound)At most of the natural numbers up to can be written in the form with natural numbers.

In other words, for large , the multiplication table only uses a small fraction of the numbers up to . This simple-sounding fact is surprisingly hard to prove if one does not use the simple argument provided below.

*Proof:* Pick natural numbers uniformly at random. By the Hardy-Ramanujan theorem, with probability , and will each have prime factors counting multiplicity. Hence with probability , will have prime factors counting multiplicity. But by a further application of the Hardy-Ramanujan theorem, the set of natural numbers up to with this property has cardinality . Thus all but of the products with are contained in a set of cardinality , and the claim follows.
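The multiplication table bound can be watched in action, although the decay is very slow. The sketch below (illustrative only; the function name is ours) counts the distinct entries of the $N \times N$ multiplication table, so that $x = N^2$, and prints the fraction of the numbers up to $x$ that appear.

```python
def distinct_products(n):
    """Number of distinct values a*b with 1 <= a, b <= n."""
    # by symmetry it suffices to take a <= b
    return len({a * b for a in range(1, n + 1) for b in range(a, n + 1)})

if __name__ == "__main__":
    for n in (10, 100, 1000):
        count = distinct_products(n)
        print(n, count, count / n**2)
```

The printed fractions decrease as $N$ grows, consistent with the proposition, but only at a slow (logarithmic-type) rate, so direct computation cannot make the fraction look small.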

Remark 11In fact, the cardinality of the multiplication table is known to be comparable to with ; see this paper of Ford.

Exercise 26 (Typical number of divisors)Let be an asymptotic parameter going to infinity. Show that one has for all but of the natural numbers less than . (Hint:first establish the bounds .)

In fact, one can describe the distribution of or more precisely, a fact known as the Erdös-Kac theorem:

Exercise 27 (Erdös-Kac theorem)(This exercise is intended for readers who are familiar with probability theory, and more specifically with the moment method proof of the central limit theorem, as discussed in this previous set of notes.) Let be an asymptotic parameter going to infinity, and let denote the truncated divisor sumDefine the quantity

thus by Mertens’ theorem

- (i) Show that for all .
- (ii) For any fixed natural number , show that
where the quantity is defined to equal zero when is odd, or

when is even.

- (iii) If $n$ is drawn uniformly at random from the natural numbers up to $x$, show that the random variable

$$\frac{\omega(n) - \log\log x}{\sqrt{\log\log x}}$$

converges in distribution to the standard normal distribution in the limit $x \to \infty$. (You will need something like the moment continuity theorem from Theorem 4 of these notes.)

- (iv) Obtain the same claim as (iii), but with $\omega$ replaced by $\Omega$.

Informally, we thus have the heuristic formula

for , where is distributed approximately according to a standard normal distribution. As in Exercise 26, this leads to a related heuristic formula

for the number of divisors. This helps reconcile (though does not fully explain) the discrepancy between the typical (or median) value of , which is , and the mean (or higher moments) of , which is of the order of or , as it suggests that is in fact significantly larger than its median value of with a relatively high probability. (Unfortunately, though, the heuristic (52) is not very accurate at the very tail end of the distribution when is extremely large, and one cannot recover the correct exponent of the logarithm in (38), for instance, through a naive application of this heuristic.)

Remark 12 Another rough heuristic to keep in mind is that a “typical” number less than would be expected to have divisors in any dyadic block between and , and prime divisors in any hyper-dyadic block between and . For instance, for any fixed , one should have divisors in the interval , but only prime divisors. Typically, the only prime divisors that occur with multiplicity will be quite small (of size ). Note that such heuristics are compatible with the fact that has mean and that and both have mean . One can make these heuristics more precise by introducing the Poisson-Dirichlet process, as is done in this previous blog post, but we will not do so here. The study of the distribution of factors of “typical” large natural numbers is a topic sometimes referred to as anatomy of integers. Interestingly, there is a strong analogy between this problem and the problem of studying the distribution of cycles of “typical” large permutations; see for instance this article of Granville for further discussion.

Exercise 28Let be a natural number. Show that for sufficiently large , the number of natural numbers up to that are the products of exactly primes is .

** — 6. Mobius inversion and the Selberg symmetry formula — **

In Section 4, we used the identity , together with elementary estimates on and , to deduce various estimates on the von Mangoldt function . Another way to extract information about from this identity is to “deconvolve” or “invert” the operation of convolution to . This can be achieved by the basic tool of Möbius inversion, which we now discuss. We first observe that the Kronecker delta function , defined by , is an identity for Dirichlet convolution, thus

for any arithmetic function . Since Dirichlet convolution is associative and commutative, this implies that if we can find an arithmetic function with the property that

then any formula of the form may be inverted to the equivalent form , a fact known as the *Möbius inversion formula*. It is then a routine matter to locate such a function :

Exercise 29Define the Möbius function by setting when is the product of distinct primes for some , and otherwise. Show that is the unique arithmetic function that obeys (53).
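For readers who like to experiment, here is a minimal Python sketch (our own helper names, not part of the exercise) that computes the Möbius function $\mu$ by a sieve and checks numerically that $\mu * 1$ is the Kronecker delta, that is to say that $\sum_{d | n} \mu(d)$ equals $1$ for $n = 1$ and $0$ otherwise. This is of course only a finite check and not a proof.

```python
def mobius_up_to(n):
    """mu[k] for 0 <= k <= n (with mu[0] = 0 as a placeholder)."""
    mu = [1] * (n + 1)
    mu[0] = 0
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(2 * p, n + 1, p):
                is_prime[m] = False
            for m in range(p, n + 1, p):
                mu[m] = -mu[m]        # one more distinct prime factor
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0             # m is not square-free
    return mu

def convolve_with_one(f, n):
    """Return g with g[m] = sum of f[d] over the divisors d of m, for 1 <= m <= n."""
    g = [0] * (n + 1)
    for d in range(1, n + 1):
        for m in range(d, n + 1, d):
            g[m] += f[d]
    return g

if __name__ == "__main__":
    n = 1000
    delta = convolve_with_one(mobius_up_to(n), n)
    print(delta[1:11])
```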

Observe that is a multiplicative function that obeys the trivial bound

for all . Furthermore, is precisely when is square-free, and zero otherwise, so the notation here is consistent with that in Exercise 15.

One can express the Möbius inversion formula as the assertion that

for any compactly supported arithmetic function . This already reveals that must exhibit some cancellation beyond the trivial bound (54):

Lemma 15 (Weak cancellation for )For any non-negative integer , we have

*Proof:* We may of course take .

First suppose that . We apply (55) with to conclude that

Since , we conclude from (54) that

and the case of (56) follows.

Now suppose inductively that , and that the claim has already been proven for smaller values of . We apply (55) with to conclude that

By (15) we have

for some polynomial with leading term . Inserting this asymptotic and using the induction hypothesis to handle all the lower order terms of , and (54) to handle the error term, we conclude that

for some polynomial of degree at most . By (10) the error term is , and the claim follows.

Exercise 30Sharpen the case of the above lemma to , for any .

From Möbius inversion we can write in terms of :

Since , we have the general Leibniz-type identity

Since , we can obtain the alternative representation

by multiplying (53) by then applying (59), (58).
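The two representations of the von Mangoldt function can be verified numerically. The sketch below assumes (as the surrounding discussion suggests, but the labels are our reading) that the identities in question are $\Lambda = \mu * L$ and $\Lambda = -(L\mu) * 1$, where $L(n) = \log n$. It computes $\Lambda$ directly from its definition on prime powers and compares it with both convolutions; all function names are our own.

```python
import math

def mobius_and_von_mangoldt(n):
    """Return (mu, lam) as lists indexed by 0..n."""
    mu = [1] * (n + 1)
    mu[0] = 0
    lam = [0.0] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(2 * p, n + 1, p):
                is_prime[m] = False
            for m in range(p, n + 1, p):
                mu[m] = -mu[m]
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0
            q = p
            while q <= n:          # Lambda(p^j) = log p
                lam[q] = math.log(p)
                q *= p
    return mu, lam

def dirichlet_convolve(f, g, n):
    """Dirichlet convolution (f*g)[m] = sum over d*k = m of f[d]*g[k], for m <= n."""
    out = [0.0] * (n + 1)
    for d in range(1, n + 1):
        if f[d] == 0:
            continue
        for k in range(1, n // d + 1):
            out[d * k] += f[d] * g[k]
    return out

def max_identity_error(n):
    """Largest deviation of mu*L and -(L mu)*1 from Lambda on 1..n."""
    mu, lam = mobius_and_von_mangoldt(n)
    log = [0.0] + [math.log(k) for k in range(1, n + 1)]
    one = [0] + [1] * n
    mu_log = [mu[k] * log[k] for k in range(n + 1)]
    first = dirichlet_convolve(mu, log, n)
    second = [-v for v in dirichlet_convolve(mu_log, one, n)]
    return max(max(abs(a - l), abs(b - l))
               for a, b, l in zip(first[1:], second[1:], lam[1:]))

if __name__ == "__main__":
    print(max_identity_error(2000))  # should be at the level of rounding error
```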

Using these identities and Lemma 15, we can recover many of the estimates in Section 4:

Exercise 31 (Alternate proof of Chebyshev and Mertens bounds)Use (58) and Lemma 15 to reprove the estimatesand

If one has a little bit more cancellation in the Möbius function, one can do better:

Theorem 16 (Equivalent forms of the prime number theorem)The following three statements are logically equivalent:

- (i) We have as .
- (ii) We have as .
- (iii) We have as .

In later notes we will prove that the claims (i), (ii), (iii) are indeed true; this is the famous prime number theorem. This result also illustrates a general principle, that one route to distribution estimates of the primes is via distribution estimates on the Möbius function, which can sometimes be a simpler object to study (for instance, the Möbius function is bounded in magnitude by , whereas the von Mangoldt function can grow logarithmically).
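To get a feeling for the three statements, one can compute the relevant sums directly. The sketch below assumes that (i), (ii), (iii) concern $\sum_{n \leq x} \Lambda(n)$, $\sum_{n \leq x} \mu(n)$ and $\sum_{n \leq x} \frac{\mu(n)}{n}$ respectively (this is our reading of the statement, consistent with the proof that follows), and prints the first two normalised by $x$ together with the third. The function name `pnt_statistics` is hypothetical.

```python
import math

def pnt_statistics(x):
    """Return (psi(x)/x, M(x)/x, sum_{n<=x} mu(n)/n)."""
    mu = [1] * (x + 1)
    mu[0] = 0
    lam = [0.0] * (x + 1)
    is_prime = bytearray([1]) * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(2 * p, x + 1, p):
                is_prime[m] = 0
            for m in range(p, x + 1, p):
                mu[m] = -mu[m]
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0
            q = p
            while q <= x:
                lam[q] = math.log(p)
                q *= p
    psi = sum(lam)
    mertens = sum(mu)
    weighted = sum(mu[n] / n for n in range(1, x + 1))
    return psi / x, mertens / x, weighted

if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, pnt_statistics(x))
```

One would expect the first entry to approach $1$ and the other two to approach $0$, although no finite computation can distinguish $o(x)$ from a small multiple of $x$.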

*Proof:* We use some arguments of Diamond. We first show that (i) implies (ii). From (60) and Möbius inversion we have

By (34) we thus have

For any fixed , we may use (i) to write as when is sufficiently large, and otherwise. From this, (54), and the case of (56) we thus have

when is large enough. By (10), (54) we have

summing and then dividing by , we obtain (ii) since is arbitrary.

Now we show that (ii) implies (iii). We start with the identity (57), which we write as

Let be a small fixed quantity. For , decreases through a fixed set of values, and from (ii) we conclude that

Meanwhile, since , we see from (54) that

Combining all three inequalities and dividing by , we conclude that

replacing by , then sending to zero, we obtain (iii).

From Exercise 3 we see that (iii) implies (ii). Thus, to conclude the theorem, it suffices to show that (ii) and (iii) jointly imply (i). From (58), (34) we have

Meanwhile, since , we have from (34) that

Since , it will thus suffice to show that

Let be a small fixed quantity. Arguing as in the implication of (iii) from (ii), we see from (ii) that

Next, we see from (8), (42) that

From Lemma 1 we have

and so from (iii) and (54) we conclude that

Summing this with (62) and then sending to zero, we obtain the claim.

Exercise 32 (Further reformulations of the prime number theorem)Show that the statements (i)-(iii) in the above theorem are also equivalent to the following statements:

- (iv) The number of primes less than or equal to is as .
- (v) The prime number is equal to as .

Unfortunately it is not so easy to actually obtain the required cancellation of the Möbius function , and to obtain the desired asymptotics for . However, one can do better if one works with the *higher-order von Mangoldt functions* , defined by setting

for all . Thus is the usual von Mangoldt function, and from (59), (61) we easily obtain the recursive identity

for . Among other things, this implies by induction that the are non-negative, and are supported on those natural numbers that have at most distinct prime factors. We have the following asymptotic for the summatory functions of :

Proposition 17 (Summatory function of )For any , we havefor all , and for some polynomial of leading term .

For , the error term here is comparable to the main term, and we obtain no improvement over the Chebyshev bound (Proposition 5). However, the estimates here become more useful for . For an explicit formula for the polynomials , together with sharper bounds on the error term, see this paper of Balazard.

*Proof:* From (63), (34) we have

By (9) we have

for some polynomial with leading term . The claim then follows from (56), using (54), (10) to control the error term.

The case of this proposition is known as the *Selberg symmetry formula*:

Among other things, this gives an upper bound that comes within a factor of two of the prime number theorem:

Corollary 18 (Cheap Brun-Titchmarsh theorem)For any , one has

Using the methods of sieve theory, we will obtain a stronger inequality, known as the Brun-Titchmarsh inequality, in later notes. This loss of a factor of two reflects a basic problem in analytic prime number theory known as the parity problem: estimates which involve only primes (or more generally, numbers whose number of prime factors has a fixed parity) often lose a factor of two in their upper bounds, and are trivial with regards to lower bounds, unless some non-trivial input about prime numbers is somehow injected into the argument. We will discuss the parity problem in more detail in later notes.

*Proof:* From the Selberg symmetry formula we have

(since ). From (64) we have the pointwise bound , thus

By Proposition 5 and dyadic decomposition we have

and the claim follows.
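The Selberg symmetry formula can also be tested numerically. The sketch below uses the $k=1$ case of the recursive identity for the higher-order von Mangoldt functions, which we read as $\Lambda_2 = \Lambda L + \Lambda * \Lambda$, to compute $\sum_{n \leq x} \Lambda_2(n)$ via $\sum_{n \leq x} \Lambda(n) \log n + \sum_{d \leq x} \Lambda(d) \psi(x/d)$, where $\psi(y) := \sum_{n \leq y} \Lambda(n)$ is notation introduced here for convenience. It then prints the ratio to $x \log x$, which should tend (slowly) to the leading coefficient $2$.

```python
import math

def von_mangoldt_up_to(x):
    """lam[n] = Lambda(n) for 0 <= n <= x."""
    lam = [0.0] * (x + 1)
    is_composite = bytearray(x + 1)
    for p in range(2, x + 1):
        if not is_composite[p]:
            for m in range(2 * p, x + 1, p):
                is_composite[m] = 1
            q = p
            while q <= x:
                lam[q] = math.log(p)
                q *= p
    return lam

def selberg_ratio(x):
    """Return (sum_{n <= x} Lambda_2(n)) / (x log x)."""
    lam = von_mangoldt_up_to(x)
    psi = [0.0] * (x + 1)          # psi[n] = sum_{m <= n} Lambda(m)
    for n in range(1, x + 1):
        psi[n] = psi[n - 1] + lam[n]
    log_part = sum(lam[n] * math.log(n) for n in range(2, x + 1))
    conv_part = sum(lam[d] * psi[x // d] for d in range(2, x + 1) if lam[d])
    return (log_part + conv_part) / (x * math.log(x))

if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, round(selberg_ratio(x), 4))
```

The ratio stays noticeably below $2$ in this range because the $O(x)$ error term is only smaller than the main term by a factor of $\log x$.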

With some additional argument of a “Fourier-analytic” flavour (or using arguments closely related to Fourier analysis, such as Tauberian theorems or Gelfand’s theory of Banach algebras), one can use the Selberg symmetry formula to derive the prime number theorem; see for instance these previous blog posts for examples of this. However, in this course we will focus instead on the more traditional complex-analytic proof of the prime number theorem, which highlights an important connection between the distribution of the primes and the zeroes of the Riemann zeta function.

Exercise 33 (Cheap Brun-Titchmarsh, again)Show that for any , the number of primes between and is at most .

Exercise 34 (Golomb identity)Let be coprime natural numbers. Show that

Exercise 35 (Diamond-Steinig identity)Let . Show that can be expressed as a linear combination of convolutions of the form , where appears times and are non-negative integers with and . Identities of this form are due to Diamond and Steinig.

** — 7. Dirichlet characters — **

Now we consider the following vaguely worded question: how many primes are there in a given congruence class ? For instance, how many primes are there whose last digit is (i.e. lie in )?

If the congruence class is not primitive, that is to say that and share a common factor, then clearly the answer is either zero or one, with the latter occurring if the greatest common divisor of and is a prime which is congruent to modulo . So the interesting case is when is primitive, that is to say that it lies in the multiplicative group of primitive congruence classes.

In this case, we have the fundamental theorem of Dirichlet:

Theorem 19 (Dirichlet’s theorem, Euclid form)Every primitive congruence class contains infinitely many primes.

For a small number of primitive congruence classes, such as or , it is possible to prove Dirichlet’s theorem by mimicking one of the elementary proofs of Euclid’s theorem, but we do not know of a general way to do so; see this paper of Keith Conrad for some further discussion. For instance, there is no proof known that there are infinitely many primes that end in that does not basically go through most of the machinery of Dirichlet’s proof (in particular introducing the notion of a Dirichlet character). Indeed, it looks like the problem of finding a new proof of Dirichlet’s theorem is an excellent test case for any proposed alternative approach to studying the primes that does not go through the standard approach of analytic number theory (cf. Remark 2 from the announcement for this course).

In fact, Dirichlet’s arguments prove the following stronger statement, generalising Euler’s theorem (Theorem 2 from this set of notes):

Theorem 20 (Dirichlet’s theorem, Euler form) Let $a\ (q)$ be a primitive residue class. Then the sum $\sum_{p = a\ (q)} \frac{1}{p}$ is divergent.

There is a more quantitative form, analogous to Mertens’ theorem:

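Theorem 21 (Dirichlet’s theorem, Mertens form) Let $a\ (q)$ be a primitive residue class. Then one has

$$\sum_{n \leq x: n = a\ (q)} \frac{\Lambda(n)}{n} = \frac{1}{\phi(q)} \log x + O_q(1)$$

for any $x \geq 1$, where the Euler totient function $\phi(q)$ is defined as the order of the multiplicative group $({\bf Z}/q{\bf Z})^\times$.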
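The equidistribution phenomenon behind Dirichlet’s theorem is easy to observe experimentally. The following sketch (illustrative; function names and the choice of modulus $10$ are ours) counts the primes up to $x$ in each primitive residue class modulo $q$, and also records the logarithmic sums $\sum_{p \leq x: p = a\ (q)} \frac{\log p}{p}$, which differ from the sums over prime powers weighted by the von Mangoldt function only by a bounded amount.

```python
import math

def primes_up_to(n):
    """Return the list of primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if is_prime[i]]

def primes_by_residue(x, q):
    """Return (counts, log_sums), both keyed by the primitive residues a mod q."""
    residues = [a for a in range(q) if math.gcd(a, q) == 1]
    counts = {a: 0 for a in residues}
    log_sums = {a: 0.0 for a in residues}
    for p in primes_up_to(x):
        a = p % q
        if a in counts:   # skips the finitely many primes dividing q
            counts[a] += 1
            log_sums[a] += math.log(p) / p
    return counts, log_sums

if __name__ == "__main__":
    x, q = 10**5, 10
    counts, log_sums = primes_by_residue(x, q)
    phi_q = len(counts)
    for a in counts:
        print(a, counts[a], round(log_sums[a], 4), round(math.log(x) / phi_q, 4))
```

The counts come out nearly equal across the classes, while the logarithmic sums each track $\frac{1}{\phi(q)} \log x$ up to a bounded, class-dependent shift.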

Exercise 36Let be a primitive residue class. Use Theorem 21 to show thatthus giving Theorem 20. (Hint: adapt the proof of Theorem 10.)

If one tries to adapt one of the above proofs of Mertens’ theorem (or Euler’s theorem) to this setting, one soon runs into the problem that the function is not multiplicative: . To resolve this issue, Dirichlet used some Fourier analysis to express in terms of completely multiplicative functions, known as Dirichlet characters.

We first quickly recall the Fourier analysis of finite abelian groups:

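Theorem 22 (Fourier transform for finite abelian groups) Let $G$ be a finite abelian group (which can be written additively or multiplicatively). Define a character on $G$ to be a homomorphism $\chi: G \to S^1$ to the unit circle $S^1 := \{ z \in {\bf C}: |z| = 1 \}$ of the complex numbers, and let $\hat G$ be the set of characters. Then $|\hat G| = |G|$. Furthermore, given any function $f: G \to {\bf C}$, one has a Fourier decomposition

$$f(x) = \sum_{\chi \in \hat G} \hat f(\chi) \chi(x)$$

for all $x \in G$, where the Fourier coefficients $\hat f(\chi)$ are given by the formula

$$\hat f(\chi) := \frac{1}{|G|} \sum_{x \in G} f(x) \overline{\chi(x)}.$$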

*Proof:* Let be the -dimensional complex Hilbert space of functions with inner product . Clearly any character is a unit vector in this space. Furthermore, for any two characters , we may shift the variable by any shift and conclude that

for any ; in particular, we see that if , then . Thus is an orthonormal system in . To complete the proof of the theorem, it thus suffices to show that this orthonormal system is complete, that is to say that the characters span .

Each shift generates a unitary shift operator , defined by setting (if the group is written multiplicatively). These operators all commute with each other, so by the spectral theorem they may all be simultaneously diagonalised by an orthonormal basis of joint eigenvectors. It is easy to see that these eigenvectors are characters (up to scaling), and so the characters span as required.

See this previous post for a more detailed discussion of the Fourier transform on both finite and infinite abelian groups. We remark that an alternate way to establish that the characters of span is to use the classification of finite abelian groups to express as the product of cyclic groups, at which point one can write down the characters explicitly.

Define a Dirichlet character of modulus to be a function of the form

where is a character of . Thus, for instance, we have the *principal character*

of modulus $q$. Another important example of a Dirichlet character is the quadratic character $\chi$ to a prime modulus $p$, defined by setting $\chi(n)$ to be $+1$ when $n$ is a non-zero quadratic residue modulo $p$, $-1$ if $n$ is a quadratic non-residue modulo $p$, and zero if $n$ is divisible by $p$. (There are quadratic characters to composite moduli as well, but one needs to define them using the Kronecker symbol.) One can also easily verify that the product of two Dirichlet characters is again a Dirichlet character (even if the characters were initially of different modulus).
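To make the definition concrete, the sketch below constructs all Dirichlet characters of a prime modulus $q$ explicitly. The restriction to prime $q$ is an assumption of this example (it lets us use a primitive root, found by brute force), and the function names are ours. It then evaluates $\frac{1}{\phi(q)} \sum_\chi \overline{\chi(a)} \chi(n)$, which by Fourier inversion on the multiplicative group should be the indicator function of the residue class $a\ (q)$.

```python
import cmath

def primitive_root(q):
    """Smallest generator of the multiplicative group modulo a prime q (brute force)."""
    for g in range(1, q):
        if len({pow(g, k, q) for k in range(q - 1)}) == q - 1:
            return g
    raise ValueError("q must be prime")

def dirichlet_characters(q):
    """All q-1 Dirichlet characters modulo a prime q, each as a list of length q."""
    g = primitive_root(q)
    index = {pow(g, k, q): k for k in range(q - 1)}  # discrete logarithm table
    characters = []
    for j in range(q - 1):
        chi = [0j] * q                               # chi vanishes at multiples of q
        for n in range(1, q):
            chi[n] = cmath.exp(2j * cmath.pi * j * index[n] / (q - 1))
        characters.append(chi)
    return characters

def indicator_via_characters(q, a, n):
    """(1/phi(q)) * sum over chi of conj(chi(a)) * chi(n), for prime q."""
    chars = dirichlet_characters(q)
    return sum(chi[a % q].conjugate() * chi[n % q] for chi in chars) / (q - 1)

if __name__ == "__main__":
    q, a = 7, 3
    print([round(indicator_via_characters(q, a, n).real, 6) for n in range(14)])
```

For $q = 5$ the primitive root found is $2$, and the character with $j = 1$ sends $2$ to $i$; it is a complex character in the sense used later in this section.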

Dirichlet characters $\chi$ of modulus $q$ are completely multiplicative (thus $\chi(nm) = \chi(n) \chi(m)$ for all $n, m$, not necessarily coprime) and periodic of period $q$ (thus $\chi(n+q) = \chi(n)$ for all $n$). From Theorem 22 we see that there are exactly $\phi(q)$ Dirichlet characters of modulus $q$, and from (66) one has the Fourier inversion formula

From Mertens’ theorem we have

since the contribution of those for which is not coprime to is easily seen to be ( must be a power of a prime dividing in these cases). Thus, Theorem 21 follows from

Theorem 23 (Dirichlet’s theorem, character form) Let $\chi$ be a non-principal Dirichlet character of modulus $q$. Then

$$\sum_{n \leq x} \frac{\Lambda(n) \chi(n)}{n} = O_q(1)$$

for any $x \geq 1$.

To prove this theorem, we use the “deconvolution” strategy. Observe that for any completely multiplicative function , one has

for any arithmetic functions . In particular, from (43) one has

Theorem 23 is seeking control on the logarithmic sums of , so it is natural to first control the logarithmic sums of and . To do this we use a somewhat crude lemma (cf. Lemma 1):

Lemma 24Let be a non-principal character of modulus , and let be an arithmetic function that is monotone on an interval . Then

*Proof:* Without loss of generality we may assume that is monotone non-increasing. By rounding up and down to the nearest multiple of , we may assume that are multiples of , then the left-hand side may be split into the sum of expressions of the form . As is non-principal, it is orthogonal to the principal character, and in particular . Thus we may write as , which by the trivial bound and monotonicity may be bounded by . The claim then follows from telescoping series.

From this lemma we obtain the crude upper bounds

for any . (Strictly speaking, the function is not monotone decreasing for , but clearly we may just delete this portion of the sum from (68) without significantly affecting the estimate.) By Lemma 2 we thus have

for all and some complex number .

Exercise 37 (Continuity of -function at )Let be a non-principal character. For any , define the Dirichlet -function by the formulaShow that for any . In particular, the Dirichlet -function extends continuously to . (In later notes we will extend this function to a much larger domain.)

We will shortly prove the following fundamental fact:

Theorem 25 (Non-vanishing) One has $L(1,\chi) \neq 0$ for any non-principal character $\chi$.
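The non-vanishing can at least be observed numerically for small moduli. The sketch below approximates $L(1,\chi)$ by the partial sums $\sum_{n \leq N} \frac{\chi(n)}{n}$ for three hand-written characters: the non-principal character of modulus $4$, a complex character of modulus $5$, and the quadratic character of modulus $5$. The tables and names are our own, and the truncation point $N$ is arbitrary. For the character of modulus $4$ the sum is the Leibniz series, whose value is $\pi/4$.

```python
import math

CHI_4 = [0, 1, 0, -1]                # non-principal character of modulus 4
CHI_5_COMPLEX = [0, 1, 1j, -1j, -1]  # a complex character of modulus 5
CHI_5_REAL = [0, 1, -1, -1, 1]       # the quadratic character of modulus 5

def partial_L1(chi, n_terms):
    """Partial sum of chi(n)/n over 1 <= n <= n_terms; chi is a table of one period."""
    q = len(chi)
    return sum(chi[n % q] / n for n in range(1, n_terms + 1))

if __name__ == "__main__":
    n_terms = 10**5
    print("mod 4:          ", partial_L1(CHI_4, n_terms), "vs pi/4 =", math.pi / 4)
    print("mod 5 (complex):", partial_L1(CHI_5_COMPLEX, n_terms))
    print("mod 5 (real):   ", partial_L1(CHI_5_REAL, n_terms))
```

Because the values of a non-principal character sum to zero over each period, the tail of the series beyond $N$ is of size $O(q/N)$, so these partial sums are already accurate to several decimal places.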

Let us assume this theorem for now and conclude the proof of Theorem 23. Starting with the identity (67) and using (35), we see that

Inserting (68), (70) and using the trivial bound to control error terms, we conclude that

and Theorem 23 follows by dividing by and using Proposition 5.

Remark 13 It is important to observe that this argument is potentially ineffective: the implied constant in Theorem 23 will depend on what upper bound one can obtain for the quantity $\frac{1}{|L(1,\chi)|}$. Theorem 25 ensures that this quantity is finite, but does not directly supply a bound for it, and so we cannot explicitly (or effectively) describe what the implied constant is as a computable function of $q$, at least if one only uses Theorem 25 as a “black box”. It is thus of interest to strengthen Theorem 25 by obtaining effective lower bounds on $|L(1,\chi)|$ for various characters $\chi$. This can be done in some cases (particularly if $\chi$ is not real-valued), but to get a good effective bound for all characters $\chi$ is a surprisingly difficult problem, essentially the Siegel zero problem; we will return to this issue in later notes.

Exercise 38Show thatfor any and any non-principal character . (You will not need to know the non-vanishing of to establish this.) Conclude that

for any and any primitive residue class , where is the polynomial in Proposition 17. Deduce in particular the cheap Brun-Titchmarsh bound

for any and primitive residue class .

It remains to prove the non-vanishing of $L(1,\chi)$. Here we encounter a curious *repulsion* phenomenon (a special case of the Deuring-Heilbronn repulsion phenomenon): the vanishing of $L(1,\chi)$ for one character $\chi$ prevents (or “repels”) the vanishing of $L(1,\chi')$ for another character $\chi'$. More precisely, we have

Proposition 26 Let $q \geq 1$. Then there is at most one non-principal Dirichlet character $\chi$ of modulus $q$ for which $L(1,\chi) = 0$.

*Proof:* Let denote all the Dirichlet characters of modulus , including the principal character . The idea is to exploit a certain positivity when all the characters are combined together, which will be incompatible with two or more of the vanishing.

There are a number of ways to see the positivity, but we will start with the Euler product identity

from (22). We can “twist” this identity by replacing by for any Dirichlet character , which by the complete multiplicativity of gives

for any , where we allow for logarithms to be ambiguous up to multiples of . By Taylor expansion, we thus have

for (cf. (28)). Summing this for , we have

From (66) we see that . In particular, the left-hand side is non-negative. Exponentiating, we conclude the lower bound

Now we let . For non-principal characters , we see from Exercise 37 that stays bounded as , and decays like if vanishes. For the principal character , we will just use the crude upper bound . By (11), we conclude that if two or more are vanishing, then the product will go to zero as , contradicting (71), and the claim follows.

Call a Dirichlet character *real* if it only takes real values, and *complex* if it is not real. For instance, the character of modulus that takes the values on , on , on , on , and on is a complex character. The above theorem, together with conjugation symmetry, quickly disposes of the complex characters , as such characters can be “repelled” by their complex conjugates:

Corollary 27 Let $\chi$ be a complex character. Then $L(1,\chi) \neq 0$.

*Proof:* If $\chi$ is a complex character of some modulus $q$, then its complex conjugate $\overline{\chi}$ is a different complex character with the same modulus $q$, and $L(1,\overline{\chi}) = \overline{L(1,\chi)}$. If $L(1,\chi)$ vanishes, we therefore have at least two non-principal characters of modulus $q$ whose $L$-function vanishes at $1$, contradicting Proposition 26.

This only leaves the case of real non-principal characters $\chi$ to deal with. These characters are also known as *quadratic characters*, as $\chi^2$ is the principal character; they are also connected to quadratic number fields, as we will discuss in a subsequent post. In this case, we cannot exploit the repulsion phenomenon, as we now only have one character $\chi$ for which $L(1,\chi)$ vanishes. On the other hand, for quadratic characters we have a much simpler positivity property, namely that

for all natural numbers . Actually, it is convenient to use a variant of this positivity property, namely that

which can be proven first by working in the case that is a power of a prime and using (72), and then using multiplicativity to handle the general case. Crucially, we can do a little better than this: we can improve (73) to

whenever $n$ is a perfect square. Again, this can be verified by first working in the case when $n$ is an even power of a prime.
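The positivity properties used here can be sanity-checked by brute force. The sketch below assumes that the two properties in question are $1 * \chi(n) \geq 0$ for all $n$ and $1 * \chi(n) \geq 1$ when $n$ is a perfect square, and tests them for the quadratic character of a few odd prime moduli, computed through Euler’s criterion. The function names and the ranges tested are our own choices.

```python
import math

def quadratic_character(n, p):
    """Quadratic character modulo an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)  # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def one_star_chi(n, p):
    """(1 * chi)(n) = sum of chi(d) over the divisors d of n."""
    return sum(quadratic_character(d, p) for d in range(1, n + 1) if n % d == 0)

def check_positivity(p, limit):
    """Check both positivity properties for all 1 <= n <= limit."""
    for n in range(1, limit + 1):
        value = one_star_chi(n, p)
        if value < 0:
            return False
        if math.isqrt(n) ** 2 == n and value < 1:
            return False
    return True

if __name__ == "__main__":
    for p in (3, 5, 7, 11):
        print(p, check_positivity(p, 300))
```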

It is now natural to consider sums such as to exploit this positivity. It turns out that the best choice of to use here is , that is to say to control the sum

On the one hand, from positivity on the squares (74), we can bound this sum by

for (say), thanks to (13). On the other hand, we can expand (75) in the spirit of (34) or (35) as

From (12) one has

and so (using the trivial bound to control the error term) the previous expression can be rewritten as

From Lemma 24 we have . Inserting (70), we conclude that

If vanishes, then this leads to a contradiction if is large enough. This concludes the proof of Theorem 25, and hence Dirichlet’s theorem.

Remark 14 The inequality (76) in fact shows that $L(1,\chi)$ is positive for every real character $\chi$. In fact, with the assistance of some algebraic number theory, one can show the class number formula, which asserts (roughly speaking) that $L(1,\chi)$ is proportional to the class number of a certain quadratic number field. This will be discussed in a subsequent post.

Exercise 39 By using an effective version of the above arguments, establish the lower bound for all non-principal characters $\chi$ of conductor $q$ (both real and complex).

Remark 15 The bound (77) is very poor and can be improved. For instance, the class number formula alluded to in the previous remark gives the effective bound $L(1,\chi) \gg q^{-1/2}$ for real non-principal characters. In later notes we will also establish Siegel’s theorem, which gives an ineffective bound of the form $L(1,\chi) \gg_\varepsilon q^{-\varepsilon}$ for such characters, and any $\varepsilon > 0$.

Exercise 40 Let $\chi$ be a non-principal character. Show that the sum is conditionally convergent. Then show that the product is conditionally convergent to .


- For the twin prime conjecture, one can use the linear forms $n$, $n+2$, and the property $P(n,n+2)$ in question is the assertion that $n$ and $n+2$ are both prime.
- For the even Goldbach conjecture, the claim is similar but one uses the linear forms $n$, $N-n$ for some even integer $N$.
- For Chen’s theorem, we use the same linear forms $n_1, n_2$ as in the previous two cases, but now $P(n_1,n_2)$ is the assertion that $n_1$ is prime and $n_2$ is an almost prime (in the sense that there are at most two prime factors).
- In the recent results establishing bounded gaps between primes, we use the linear forms $n+h_i$ for some admissible tuple $(h_1,\dots,h_k)$, and take $P(n+h_1,\dots,n+h_k)$ to be the assertion that at least two of $n+h_1,\dots,n+h_k$ are prime.

For these sorts of results, one can try a sieve-theoretic approach, which can broadly be formulated as follows:

- First, one chooses a carefully selected *sieve weight*, which could for instance be a non-negative function having a divisor sum form for some coefficients , where is a natural scale parameter. The precise choice of sieve weight is often quite a delicate matter, but will not be discussed here. (In some cases, one may work with multiple sieve weights .)

- Next, one uses tools from analytic number theory (such as the Bombieri-Vinogradov theorem) to obtain upper and lower bounds for sums such as
where is some “arithmetic” function involving the prime factorisation of (we will be a bit vague about what this means precisely, but a typical choice of might be a Dirichlet convolution of two other arithmetic functions ).

- Using some combinatorial arguments, one manipulates these upper and lower bounds, together with the non-negative nature of , to conclude the existence of an in the support of (or of at least one of the sieve weights being considered) for which holds

For instance, in the recent results on bounded gaps between primes, one selects a sieve weight for which one has upper bounds on

and lower bounds on

so that one can show that the expression

is strictly positive, which implies the existence of an in the support of such that at least two of are prime. As another example, to prove Chen’s theorem to find such that is prime and is almost prime, one uses a variety of sieve weights to produce a lower bound for

and an upper bound for

and

where is some parameter between and , and “rough” means that all prime factors are at least . One can observe that if , then there must be at least one for which is prime and is almost prime, since for any rough number , the quantity

is only positive when is an almost prime (if has three or more factors, then either it has at least two factors less than , or it is of the form for some ). The upper and lower bounds on are ultimately produced via asymptotics for expressions of the form (1), (2), (3) for various divisor sums and various arithmetic functions .

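Unfortunately, there is an obstruction to sieve-theoretic techniques working for certain types of properties $P$, which Zeb Brady and I recently formalised at an AIM workshop this week. To state the result, we recall the Liouville function $\lambda$, defined by setting $\lambda(n) = (-1)^j$ whenever $n$ is the product of exactly $j$ primes (counting multiplicity). Define a *sign pattern* to be an element $(\epsilon_1,\dots,\epsilon_k)$ of the discrete cube $\{-1,+1\}^k$. Given a property $P(n_1,\dots,n_k)$ of $k$ natural numbers $n_1,\dots,n_k$, we say that a sign pattern $(\epsilon_1,\dots,\epsilon_k)$ is *forbidden* by $P$ if there do not exist any natural numbers $n_1,\dots,n_k$ obeying $P(n_1,\dots,n_k)$ for which

$\displaystyle (\lambda(n_1),\dots,\lambda(n_k)) = (\epsilon_1,\dots,\epsilon_k).$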

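Example 1 Let $P(n_1,n_2,n_3)$ be the property that at least two of $n_1,n_2,n_3$ are prime. Then the sign patterns $(+1,+1,+1)$, $(+1,+1,-1)$, $(+1,-1,+1)$, $(-1,+1,+1)$ are forbidden, because prime numbers have a Liouville function of $-1$, so that $P(n_1,n_2,n_3)$ can only occur when at least two of $\lambda(n_1),\lambda(n_2),\lambda(n_3)$ are equal to $-1$.

Example 2 Let $P(n_1,n_2)$ be the property that $n_1$ is prime and $n_2$ is almost prime. Then the only forbidden sign patterns are $(+1,+1)$ and $(+1,-1)$.

Example 3 Let $P(n_1,n_2)$ be the property that $n_1$ and $n_2$ are both prime. Then $(+1,+1)$, $(+1,-1)$, $(-1,+1)$ are all forbidden sign patterns.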
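To make the notion of a forbidden sign pattern concrete, one can tabulate the Liouville sign patterns that actually occur when a property holds. The sketch below does this for the "both prime" and "at least two of three prime" properties; the shifts $(0,2)$ and $(0,2,6)$, the search limit $2000$, and the helper names are all assumptions chosen for illustration.

```python
def liouville(n):
    """Liouville function: (-1) to the number of prime factors of n, with multiplicity."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:  # a leftover prime factor
        count += 1
    return -1 if count % 2 else 1


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def observed_patterns(shifts, holds, limit):
    """Sign patterns (lambda(n + h) for h in shifts) over n <= limit at which `holds` is true."""
    patterns = set()
    for n in range(1, limit + 1):
        numbers = [n + h for h in shifts]
        if holds(numbers):
            patterns.add(tuple(liouville(m) for m in numbers))
    return patterns


# Both of n, n + 2 prime (hypothetical shifts chosen for illustration).
twin = observed_patterns((0, 2), lambda v: all(is_prime(m) for m in v), 2000)

# At least two of n, n + 2, n + 6 prime.
two_of_three = observed_patterns(
    (0, 2, 6), lambda v: sum(is_prime(m) for m in v) >= 2, 2000
)

print(sorted(twin))
print(sorted(two_of_three))
```

Every pattern that shows up for the second property has at least two entries equal to $-1$, so patterns with fewer than two such entries never occur, which is what it means for them to be forbidden.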

We then have a parity obstruction as soon as has “too many” forbidden sign patterns, in the following (slightly informal) sense:

Claim 1 (Parity obstruction) Suppose $P$ is such that the convex hull of the forbidden sign patterns of $P$ contains the origin. Then one cannot use the above sieve-theoretic approach to establish the existence of an $n$ for which $P$ holds.

Thus for instance, the property in Example 3 is subject to the parity obstruction since $(0,0)$ is a convex combination of $(+1,-1)$ and $(-1,+1)$, whereas the properties in Examples 1, 2 are not. One can also check that the property “at least $j$ of the $k$ numbers $n_1,\dots,n_k$ are prime” is subject to the parity obstruction as soon as $j \geq \frac{k}{2}+1$. Thus, the largest number of elements of a $k$-tuple that one can force to be prime by purely sieve-theoretic methods is $k/2$, rounded up.
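The convex hull criterion in Claim 1 can be tested exactly for small examples. The sketch below is one possible way to do so and is not taken from the text: it uses Carathéodory's theorem (a point in the convex hull of a set in $k$ dimensions is a convex combination of at most $k+1$ affinely independent points of that set) and solves each small linear system in exact rational arithmetic. The function names are hypothetical, and the brute-force search is only practical for small $k$.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_unique(columns, target):
    """Solve sum_j w_j * columns[j] = target exactly.

    Returns the weights only when the solution exists and is unique.
    """
    rows, cols = len(target), len(columns)
    m = [[Fraction(columns[j][i]) for j in range(cols)] + [Fraction(target[i])]
         for i in range(rows)]
    pivot_row = 0
    for c in range(cols):
        pivot = next((r for r in range(pivot_row, rows) if m[r][c] != 0), None)
        if pivot is None:
            return None  # free variable, so no unique solution
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][c]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(rows):
            if r != pivot_row and m[r][c] != 0:
                factor = m[r][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    # Any remaining row must read 0 = 0 for the system to be consistent.
    if any(m[r][cols] != 0 for r in range(pivot_row, rows)):
        return None
    return [m[j][cols] for j in range(cols)]


def origin_in_hull(points):
    """Exact test of whether the origin lies in the convex hull of `points`."""
    points = list(points)
    if not points:
        return False
    dim = len(points[0])
    target = [0] * dim + [1]  # combination equals 0 and weights sum to 1
    for size in range(1, dim + 2):
        for subset in combinations(points, size):
            weights = solve_unique([list(p) + [1] for p in subset], target)
            if weights is not None and all(w >= 0 for w in weights):
                return True
    return False


def forbidden_patterns(k, j):
    """Patterns forbidden by 'at least j of k numbers are prime': fewer than j entries are -1."""
    return [s for s in product((1, -1), repeat=k) if s.count(-1) < j]


print(origin_in_hull(forbidden_patterns(2, 2)))  # both of two numbers prime
print(origin_in_hull(forbidden_patterns(3, 2)))  # at least two of three prime
print(origin_in_hull([(1, 1), (1, -1)]))         # prime plus almost prime
```

Under these assumptions the first call reports that the origin is in the hull (so the parity obstruction applies), while the second and third report that it is not, matching the three examples above.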

This claim is not precisely a theorem, because it presumes a certain “Liouville pseudorandomness conjecture” (a very close cousin of the more well known “Möbius pseudorandomness conjecture”) which is a bit difficult to formalise precisely. However, this conjecture is widely believed by analytic number theorists, see e.g. this blog post for a discussion. (Note though that there are scenarios, most notably the “Siegel zero” scenario, in which there is a severe breakdown of this pseudorandomness conjecture, and the parity obstruction then disappears. A typical instance of this is Heath-Brown’s proof of the twin prime conjecture (which would ordinarily be subject to the parity obstruction) under the hypothesis of a Siegel zero.) The obstruction also does not prevent the establishment of an such that holds by introducing additional sieve axioms beyond upper and lower bounds on quantities such as (1), (2), (3). The proof of the Friedlander-Iwaniec theorem is a good example of this latter scenario.

Now we give a (slightly nonrigorous) proof of the claim.

*Proof:* (Nonrigorous) Suppose that the convex hull of the forbidden sign patterns contains the origin. Then we can find non-negative numbers $p_\epsilon$ for sign patterns $\epsilon$, which sum to $1$, are non-zero only for forbidden sign patterns, and which have mean zero in the sense that

for all . By Fourier expansion (or Lagrange interpolation), one can then write as a polynomial

where is a polynomial in variables that is a linear combination of monomials with and (thus has no constant or linear terms, and no monomials with repeated terms). The point is that the mean zero condition allows one to eliminate the linear terms. If we now consider the weight function

then is non-negative, is supported solely on for which is a forbidden pattern, and is equal to plus a linear combination of monomials with .

The Liouville pseudorandomness principle then predicts that sums of the form

and

or more generally

should be asymptotically negligible; intuitively, the point here is that the prime factorisation of should not influence the Liouville function of , even on the short arithmetic progressions that the divisor sum is built out of, and so any monomial occurring in should exhibit strong cancellation for any of the above sums. If one accepts this principle, then all the expressions (1), (2), (3) should be essentially unchanged when is replaced by .

Suppose now for sake of contradiction that one could use sieve-theoretic methods to locate an in the support of some sieve weight obeying . Then, by reweighting all sieve weights by the additional multiplicative factor of , the same arguments should also be able to locate in the support of for which holds. But is only supported on those whose Liouville sign pattern is forbidden, a contradiction.

Claim 1 is sharp in the following sense: if the convex hull of the forbidden sign patterns of $P$ does *not* contain the origin, then by the Hahn-Banach theorem (in the hyperplane separation form), there exist real coefficients such that

for all forbidden sign patterns and some . On the other hand, from Liouville pseudorandomness one expects that

is negligible (as compared against $\sum_n \nu(n)$) for any reasonable sieve weight $\nu$. We conclude that for some $n$ in the support of $\nu$, that

and hence is not a forbidden sign pattern. This does not actually imply that holds, but it does not prevent from holding purely from parity considerations. Thus, we do not expect a parity obstruction of the type in Claim 1 to hold when the convex hull of forbidden sign patterns does not contain the origin.

Example 4 Let $G$ be a graph on $k$ vertices $\{1,\dots,k\}$, and let $P(n_1,\dots,n_k)$ be the property that one can find an edge $\{i,j\}$ of $G$ with $n_i, n_j$ both prime. We claim that this property is subject to the parity problem precisely when $G$ is two-colourable. Indeed, if $G$ is two-colourable, then we can colour $\{1,\dots,k\}$ into two colours (say, red and green) such that all edges in $G$ connect a red vertex to a green vertex. If we then consider the two sign patterns in which all the red vertices have one sign and the green vertices have the opposite sign, these are two forbidden sign patterns whose convex hull contains the origin, and so the parity problem applies. Conversely, suppose that $G$ is not two-colourable; then it contains an odd cycle. Any forbidden sign pattern then must contain more $+1$s on this odd cycle than $-1$s (since otherwise two of the $-1$s are adjacent on this cycle by the pigeonhole principle, and this is not forbidden), and so by convexity any tuple in the convex hull of the forbidden sign patterns has a positive sum on this odd cycle. Hence the origin is not in the convex hull, and the parity obstruction does not apply. (See also this previous post for a similar obstruction ultimately coming from two-colourability.)

Example 5 An example of a parity-obstructed property (supplied by Zeb Brady) that does not come from two-colourability: we let $P( n_{\{1,2\}}, n_{\{1,3\}}, n_{\{1,4\}}, n_{\{2,3\}}, n_{\{2,4\}}, n_{\{3,4\}} )$ be the property that $n_{A_1},\dots,n_{A_r}$ are prime for some collection $A_1,\dots,A_r$ of pair sets that cover $\{1,2,3,4\}$. For instance, this property holds if $n_{\{1,2\}}, n_{\{3,4\}}$ are both prime, or if $n_{\{1,2\}}, n_{\{1,3\}}, n_{\{1,4\}}$ are all prime, but not if $n_{\{1,2\}}, n_{\{1,3\}}, n_{\{2,3\}}$ are the only primes. An example of a forbidden sign pattern is the pattern where $\{1,2\}, \{2,3\}, \{1,3\}$ are given the sign $-1$, and the other three pairs are given $+1$. Averaging over permutations of $\{1,2,3,4\}$ we see that zero lies in the convex hull, and so this example is blocked by parity. However, there is no sign pattern such that it and its negation are both forbidden, which is another formulation of two-colourability.

Of course, the absence of a parity obstruction does not automatically mean that the desired claim is true. For instance, given an admissible -tuple , parity obstructions do not prevent one from establishing the existence of infinitely many such that at least three of are prime, however we are not yet able to actually establish this, even assuming strong sieve-theoretic hypotheses such as the generalised Elliott-Halberstam hypothesis. (However, the argument giving (4) does easily give the far weaker claim that there exist infinitely many such that at least three of have a Liouville function of .)

Remark 1 Another way to get past the parity problem in some cases is to take advantage of linear forms that are constant multiples of each other (which correlates the Liouville functions to each other). For instance, on GEH we can find two numbers (products of exactly three primes) that differ by exactly ; a direct sieve approach using the linear forms fails due to the parity obstruction, but instead one can first find such that two of are prime, and then among the pairs of linear forms , , one can find a pair of numbers that differ by exactly . See this paper of Goldston, Graham, Pintz, and Yildirim for more examples of this type.

I thank John Friedlander and Sid Graham for helpful discussions and encouragement.


The type of results about primes that one aspires to prove here is well captured by Landau’s classical list of problems:

- Even Goldbach conjecture: every even number $N$ greater than two is expressible as the sum of two primes.
- Twin prime conjecture: there are infinitely many pairs $n, n+2$ which are simultaneously prime.
- Legendre’s conjecture: for every natural number $n$, there is a prime between $n^2$ and $(n+1)^2$.
- There are infinitely many primes of the form $n^2+1$.

All four of Landau’s problems remain open, but we have convincing heuristic evidence that they are all true, and in each of the four cases we have some highly non-trivial partial results, some of which will be covered in this course. We also now have some understanding of the barriers we are facing to fully resolving each of these problems, such as the parity problem; this will also be discussed in the course.

One of the main reasons that the prime numbers are so difficult to deal with rigorously is that they have very little usable algebraic or geometric structure that we know how to exploit; for instance, we do not have any useful prime generating functions. One of course can create *non-useful* functions of this form, such as the ordered parameterisation $n \mapsto p_n$ that maps each natural number $n$ to the $n^{\mathrm{th}}$ prime $p_n$, or invoke Matiyasevich’s theorem to produce a polynomial of many variables whose only positive values are prime, but these sorts of functions have no usable structure to exploit (for instance, they give no insight into any of the Landau problems listed above; see also Remark 2 below). The various primality tests in the literature, while useful for practical applications (e.g. cryptography) involving primes, have also proven to be of little utility for these sorts of problems; again, see Remark 2. In fact, in order to make plausible heuristic predictions about the primes, it is best to take almost the opposite point of view to the structured viewpoint, using as a starting point the belief that the primes exhibit strong pseudorandomness properties that are largely incompatible with the presence of rigid algebraic or geometric structure. We will discuss such heuristics later in this course.

It may be in the future that some usable structure to the primes (or related objects) will eventually be located (this is for instance one of the motivations in developing a rigorous theory of the “field with one element”, although this theory is far from being fully realised at present). For now, though, analytic and combinatorial methods have proven to be the most effective way forward, as they can often be used even in the near-complete absence of structure.

In this course, we will not discuss combinatorial approaches (such as the deployment of tools from additive combinatorics) in depth, but instead focus on the analytic methods. The basic principles of this approach can be summarised as follows:

- Rather than try to isolate individual primes in ${\mathcal P}$, one works with the set of primes ${\mathcal P}$ in *aggregate*, focusing in particular on *asymptotic statistics* of this set. For instance, rather than try to find a single pair $n, n+2$ of twin primes, one can focus instead on the *count* $|\{ n \leq x: n, n+2 \in {\mathcal P} \}|$ of twin primes up to some threshold $x$. Similarly, one can focus on counts such as $|\{ n \leq N: n, N-n \in {\mathcal P} \}|$, $|\{ p \in {\mathcal P}: n^2 < p < (n+1)^2 \}|$, or $|\{ n \leq x: n^2+1 \in {\mathcal P} \}|$, which are the natural counts associated to the other three Landau problems. In all four of Landau’s problems, the basic task is now to obtain non-trivial lower bounds on these counts.
- If one wishes to proceed analytically rather than combinatorially, one should convert all these counts into sums, using the fundamental identity

$\displaystyle |A| = \sum_n 1_A(n)$

(or variants thereof) for the cardinality of subsets $A$ of the natural numbers ${\bf N}$, where $1_A$ is the indicator function of $A$ (and $n$ ranges over ${\bf N}$). Thus we are now interested in estimating (and particularly in lower bounding) sums such as

or

- Once one expresses number-theoretic problems in this fashion, we are naturally led to the more general question of how to accurately estimate (or, less ambitiously, to lower bound or upper bound) sums such as
or more generally bilinear or multilinear sums such as

or

for various functions of arithmetic interest. (Importantly, one should also generalise to include integrals as well as sums, particularly contour integrals or integrals over the unit circle or real line, but we postpone discussion of these generalisations to later in the course.) Indeed, a huge portion of modern analytic number theory is devoted to precisely this sort of question. In many cases, we can predict an *expected main term* for such sums, and then the task is to control the *error term* between the true sum and its expected main term. It is often convenient to normalise the expected main term to be zero or negligible (e.g. by subtracting a suitable constant from the summand), so that one is now trying to show that a sum of signed real numbers (or perhaps complex numbers) is small. In other words, the question becomes one of rigorously establishing a significant amount of *cancellation* in one’s sums (also referred to as a *gain* or *savings* over a benchmark “trivial bound”). Or to phrase it negatively, the task is to rigorously prevent a *conspiracy* of non-cancellation, caused for instance by two factors in the summand exhibiting an unexpectedly large correlation with each other.
- It is often difficult to discern cancellation (or to prevent conspiracy) directly for a given sum of interest. However, analytic number theory has developed a large number of techniques to relate one sum to another, and then the strategy is to keep transforming the sum into more and more analytically tractable expressions, until one arrives at a sum for which cancellation can be directly exhibited. (Note though that there is often a short-term tradeoff between analytic tractability and algebraic simplicity; in a typical analytic number theory argument, the sums will get expanded and decomposed into many quite messy-looking sub-sums, until at some point one applies some crude estimation to replace these messy sub-sums by tractable ones again.) There are many transformations available, ranging from such basic tools as the triangle inequality, pointwise domination, or the Cauchy-Schwarz inequality to key identities such as multiplicative number theory identities (such as the Vaughan identity and the Heath-Brown identity), Fourier-analytic identities (e.g. Fourier inversion, Poisson summation, or more advanced trace formulae), or complex analytic identities (e.g. the residue theorem, Perron’s formula, or Jensen’s formula). The sheer range of transformations available can be intimidating at first; there is no shortage of transformations and identities in this subject, and if one applies them randomly then one will typically just transform a difficult sum into an even more difficult and intractable expression. However, one can make progress if one is guided by the strategy of isolating and enhancing a desired cancellation (or conspiracy) to the point where it can be easily established (or dispelled), or alternatively to reach the point where no deep cancellation is needed for the application at hand (or equivalently, that no deep conspiracy can disrupt the application).
- One particularly powerful technique (albeit one which, ironically, can be highly “ineffective” in a certain technical sense to be discussed later) is to use one potential conspiracy to defeat another, a technique I refer to as the “dueling conspiracies” method. This technique may be unable to prevent a single strong conspiracy, but it can sometimes be used to prevent *two or more* such conspiracies from occurring, which is particularly useful if conspiracies come in pairs (e.g. through complex conjugation symmetry, or a functional equation). A related (but more “effective”) strategy is to try to “disperse” a single conspiracy into several distinct conspiracies, which can then be used to defeat each other.
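The step of converting counts into sums of indicator functions, described in the list above, can be sketched numerically. In the following, the twin prime count up to a threshold and the Goldbach representation count of an even number are both computed as sums of products of the prime indicator function; the function names and the sample arguments are illustrative assumptions.

```python
import math


def prime_indicator(limit):
    """Return a list ind with ind[n] = 1 if n is prime and 0 otherwise, for 0 <= n <= limit."""
    ind = [1] * (limit + 1)
    for n in range(min(2, limit + 1)):
        ind[n] = 0  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if ind[p]:
            for m in range(p * p, limit + 1, p):
                ind[m] = 0
    return ind


def twin_prime_count(x):
    """Number of n <= x with n and n + 2 both prime, written as a sum of indicator products."""
    ind = prime_indicator(x + 2)
    return sum(ind[n] * ind[n + 2] for n in range(1, x + 1))


def goldbach_count(N):
    """Number of n < N with n and N - n both prime (ordered representations of N)."""
    ind = prime_indicator(N)
    return sum(ind[n] * ind[N - n] for n in range(1, N))


print(twin_prime_count(100))
print(goldbach_count(10))
```

Here `twin_prime_count(100)` evaluates to 8 and `goldbach_count(10)` to 3 (counting $3+7$, $5+5$, $7+3$). Lower bounding such sums for large arguments is where the real difficulty lies; the code only illustrates the bookkeeping.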

As stated before, the above strategy has not been able to establish any of the four Landau problems as stated. However, it can come close to such problems (and we now have some understanding as to why these problems remain out of reach of current methods). For instance, by using these techniques (and a lot of additional effort) one can obtain the following sample partial results in the Landau problems:

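- Chen’s theorem: every sufficiently large even number $N$ is expressible as the sum of a prime and an almost prime (the product of at most two primes). The proof proceeds by finding a nontrivial lower bound on $\sum_{n \leq N} 1_{\mathcal P}(n) 1_{{\mathcal E}_2}(N-n)$, where ${\mathcal E}_2$ is the set of almost primes.
- Zhang’s theorem: There exist infinitely many pairs $p_n, p_{n+1}$ of consecutive primes with $p_{n+1} - p_n \leq 7 \times 10^7$. The proof proceeds by giving a positive lower bound on the quantity $\sum_{x \leq n \leq 2x} (\sum_{i=1}^k 1_{\mathcal P}(n+h_i) - 1)$ for large $x$ and certain distinct integers $h_1,\dots,h_k$ between $0$ and $7 \times 10^7$. (The bound $7 \times 10^7$ has since been lowered to $246$.)
- The Baker-Harman-Pintz theorem: for sufficiently large $x$, there is a prime between $x$ and $x + x^{0.525}$. Proven by finding a nontrivial lower bound on $\sum_{x \leq n \leq x + x^{0.525}} 1_{\mathcal P}(n)$.
- The Friedlander-Iwaniec theorem: There are infinitely many primes of the form $n^2+m^4$. Proven by finding a nontrivial lower bound on $\sum_{n,m: n^2+m^4 \leq x} 1_{\mathcal P}(n^2+m^4)$.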

We will discuss (simpler versions of) several of these results in this course.

Of course, for the above general strategy to have any chance of succeeding, one must at some point use *some* information about the set ${\mathcal P}$ of primes. As stated previously, usefully structured parametric descriptions of ${\mathcal P}$ do not appear to be available. However, we do have two other fundamental and useful ways to describe ${\mathcal P}$:

- (Sieve theory description) The primes ${\mathcal P}$ consist of those numbers greater than one that are not divisible by any smaller prime.
- (Multiplicative number theory description) The primes ${\mathcal P}$ are the multiplicative generators of the natural numbers ${\bf N}$: every natural number is uniquely factorisable (up to permutation) into the product of primes (the fundamental theorem of arithmetic).

The sieve-theoretic description and its variants lead one to a good understanding of the *almost primes*, which turn out to be excellent tools for controlling the primes themselves, although there are known limitations as to how much information on the primes one can extract from sieve-theoretic methods alone, which we will discuss later in this course. The multiplicative number theory methods lead one (after some complex or Fourier analysis) to the Riemann zeta function (and other L-functions, particularly the Dirichlet L-functions), with the distribution of zeroes (and poles) of these functions playing a particularly decisive role in the multiplicative methods.

Many of our strongest results in analytic prime number theory are ultimately obtained by incorporating some combination of the above two fundamental descriptions of ${\mathcal P}$ (or variants thereof) into the general strategy described above. In contrast, more advanced descriptions of ${\mathcal P}$, such as those coming from the various primality tests available, have (until now, at least) been surprisingly ineffective in practice for attacking problems such as Landau’s problems. One reason for this is that such tests generally involve operations such as exponentiation or the factorial function $n \mapsto n!$, which grow too quickly to be amenable to the analytic techniques discussed above.

To give a simple illustration of these two basic approaches to the primes, let us first give two variants of the usual proof of Euclid’s theorem:

Theorem 1 (Euclid’s theorem) There are infinitely many primes.

*Proof:* (Multiplicative number theory proof) Suppose for contradiction that there were only finitely many primes $p_1,\dots,p_n$. Then, by the fundamental theorem of arithmetic, every natural number is expressible as the product of the primes $p_1,\dots,p_n$. But the natural number $p_1 \dots p_n + 1$ is larger than one, but not divisible by any of the primes $p_1,\dots,p_n$, a contradiction.

(Sieve-theoretic proof) Suppose for contradiction that there were only finitely many primes $p_1,\dots,p_n$. Then, by the Chinese remainder theorem, the set $A$ of natural numbers that is not divisible by any of the $p_1,\dots,p_n$ has density $\prod_{i=1}^n (1-\frac{1}{p_i})$, that is to say

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} | A \cap \{1,\dots,N\} | = \prod_{i=1}^n (1-\frac{1}{p_i}).$

In particular, $A$ has positive density and thus contains an element larger than $1$. But the least such element is one further prime in addition to $p_1,\dots,p_n$, a contradiction.
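The density claim used in the sieve-theoretic proof can be verified exactly over one full period. The sketch below counts, among the integers from $1$ to the product of a given finite list of primes, those divisible by none of them, and compares the resulting proportion with the product of the factors $1 - \frac{1}{p}$. The list `[2, 3, 5]` and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import prod


def sifted_density(primes):
    """Exact proportion of 1..P (P = product of the primes) divisible by none of the primes."""
    period = prod(primes)
    survivors = sum(1 for n in range(1, period + 1) if all(n % p for p in primes))
    return Fraction(survivors, period)


def predicted_density(primes):
    """Product of (1 - 1/p) over the given primes, as an exact fraction."""
    return prod(Fraction(p - 1, p) for p in primes)


primes = [2, 3, 5]  # illustrative choice
print(sifted_density(primes), predicted_density(primes))

# The number used in the multiplicative proof: one more than the product.
euclid_number = prod(primes) + 1
print(euclid_number, [euclid_number % p for p in primes])
```

Since the sifted set is periodic with period equal to the product of the primes, agreement over one period gives the density exactly. The last line also illustrates the multiplicative proof: the product plus one leaves remainder $1$ on division by each listed prime.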

Remark 1 One can also phrase the proof of Euclid’s theorem in a fashion that largely avoids the use of contradiction; see this previous blog post for more discussion.

Both proofs in fact extend to give a stronger result:

Theorem 2 (Euler’s theorem) The sum $\sum_p \frac{1}{p}$ is divergent.

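*Proof:* (Multiplicative number theory proof) By the fundamental theorem of arithmetic, every natural number is expressible uniquely as the product of primes in increasing order. In particular, we have the identity

$\displaystyle \sum_n \frac{1}{n} = \prod_p ( 1 + \frac{1}{p} + \frac{1}{p^2} + \dots )$

(both sides make sense in $[0,+\infty]$ as everything is unsigned). Since the left-hand side is divergent, the right-hand side is as well. But

$\displaystyle 1 + \frac{1}{p} + \frac{1}{p^2} + \dots = \exp( \frac{1}{p} + O( \frac{1}{p^2} ) )$

and $\sum_p \frac{1}{p^2} < \infty$, so $\sum_p \frac{1}{p}$ must be divergent.

(Sieve-theoretic proof) Suppose for contradiction that the sum $\sum_p \frac{1}{p}$ is convergent. For each natural number $k$, let $A_k$ be the set of natural numbers not divisible by the first $k$ primes $p_1,\dots,p_k$, and let $A$ be the set of numbers not divisible by any prime in ${\mathcal P}$. As in the previous proof, each $A_k$ has density $\prod_{i=1}^k (1-\frac{1}{p_i})$. Also, since $\{1,\dots,N\}$ contains at most $N/p$ multiples of $p$, we have from the union bound that

$\displaystyle 0 \leq |A_k \cap \{1,\dots,N\}| - |A \cap \{1,\dots,N\}| \leq N \sum_{i > k} \frac{1}{p_i}.$

Since $\sum_p \frac{1}{p}$ is assumed to be convergent, we conclude that the density of $A_k$ converges to the density of $A$; thus $A$ has density $\prod_{i=1}^\infty (1-\frac{1}{p_i})$, which is non-zero by the hypothesis that $\sum_p \frac{1}{p}$ converges. On the other hand, since the primes are the only numbers greater than one not divisible by smaller primes, $A$ is just $\{1\}$, which has density zero, giving the desired contradiction.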
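The divergence of the sum of reciprocals of the primes is very slow, which a short computation makes visible. The sketch below tabulates the partial sums up to a few thresholds and prints $\log \log x$ alongside for comparison (by Mertens' second theorem, the two differ by a bounded amount). The thresholds and function names are assumptions for illustration.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes not exceeding limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p * p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def reciprocal_prime_sum(x):
    """Partial sum of 1/p over primes p <= x."""
    return sum(1.0 / p for p in primes_up_to(x))


for x in (10**2, 10**3, 10**4, 10**5):
    print(x, round(reciprocal_prime_sum(x), 4), round(math.log(math.log(x)), 4))
```

The partial sums keep increasing, but only at the doubly logarithmic rate, so no finite computation of this kind could by itself establish divergence; that requires an argument such as the two given above.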

Remark 2 We have seen how easy it is to prove Euler’s theorem by analytic methods. In contrast, there does not seem to be any known proof of this theorem that proceeds by using any sort of prime-generating formula or a primality test, which is further evidence that such tools are not the most effective way to make progress on problems such as Landau’s problems. (But the weaker theorem of Euclid, Theorem 1, can sometimes be proven by such devices.)

The two proofs of Theorem 2 given above are essentially the same proof, as is hinted at by the geometric series identity

$\displaystyle (1 - \frac{1}{p})^{-1} = 1 + \frac{1}{p} + \frac{1}{p^2} + \dots.$

One can also see the Riemann zeta function begin to make an appearance in both proofs. Once one goes beyond Euler’s theorem, though, the sieve-theoretic and multiplicative methods begin to diverge significantly. On one hand, sieve theory can still handle to some extent sets such as twin primes, despite the lack of multiplicative structure (one simply has to sieve out two residue classes per prime, rather than one); on the other, multiplicative number theory can attain results such as the prime number theorem, which purely sieve-theoretic techniques have not been able to establish. The deepest results in analytic number theory will typically require a combination of both sieve-theoretic methods and multiplicative methods in conjunction with the many transforms discussed earlier (and, in many cases, additional inputs from other fields of mathematics such as arithmetic geometry, ergodic theory, or additive combinatorics).

** — 1. Topics covered — **

Analytic prime number theory is a vast subject (the 615-page text of Iwaniec and Kowalski, for instance, gives a good indication as to its scope). I will therefore have to be somewhat selective in deciding what subset of this field to cover. I have chosen the following “core” topics to focus on:

- Elementary multiplicative number theory.
- Heuristic random models for the primes.
- The basic theory of the Riemann zeta function and Dirichlet L-functions, and their relationship with the primes.
- Zero-free regions for the zeta function and the Dirichlet L-function, including Siegel’s theorem.
- The prime number theorem, the Siegel-Walfisz theorem, and the Bombieri-Vinogradov theorem.
- Sieve theory, small and large gaps between the primes, and the parity problem.
- Exponential sum estimates over the integers, and the Vinogradov-Korobov zero-free region.
- Zero density estimates, Hoheisel’s theorem, and Linnik’s theorem.
- Exponential sum estimates over finite fields, and improved distribution estimates for the primes.
- (If time permits) Exponential sum estimates over the primes, the circle method, and Vinogradov’s three-primes theorem.

In order to cover all this material, I will focus on more qualitative results, as opposed to the strongest quantitative results; in particular, I will not attempt to optimise many of the numerical constants and exponents appearing in various estimates. This also allows me to downplay the role of some key components of the field which are not essential for establishing the core results of this course at such a qualitative level:

- I will minimise the use of algebraic number theory tools (such as the class number formula).
- I will avoid deploying the functional equation (or related identities, such as Poisson summation) if they are unnecessary at a qualitative level (though I will note when the functional equation can be used to improve the quantitative results). As it turns out, *all* of the core results mentioned above can in fact be derived without ever invoking the functional equation, although one usually gets poorer numerical exponents as a consequence.
- Somewhat related to this, I will reduce the reliance on complex analytic methods as compared to more traditional presentations of the material, relying in some places instead on Fourier-analytic substitutes, or on results about harmonic functions. (But I will not go as far as deploying the primarily real-variable “pretentious” approach to analytic number theory currently in development by Granville and Soundararajan, although my approach here does align in spirit with that approach.)
- The discussion on sieve methods will be somewhat abridged, focusing primarily on the Selberg sieve, which is a good general-purpose sieve for qualitative applications at least.
- I will almost certainly avoid any discussion of automorphic forms methods.
- Similarly, I will not cover methods that rely on additive combinatorics or ergodic theory.

Of course, many of these additional topics are well covered in existing textbooks, such as the above-mentioned text of Iwaniec and Kowalski (or, for the finer points of sieve theory, the text of Friedlander and Iwaniec). Other good texts that can be used for supplementary reading are Davenport’s “Multiplicative number theory” and Montgomery-Vaughan’s “Multiplicative number theory I”. As for prerequisites: some exposure to complex analysis, Fourier analysis, and real analysis will be particularly helpful, although we will review some of this material as needed (particularly with regard to complex analysis and the theory of harmonic functions). Experience with other quantitative areas of mathematics in which lower bounds, upper bounds, and other forms of estimation are emphasised (e.g. asymptotic combinatorics or theoretical computer science) will also be useful. Knowledge of algebraic number theory or arithmetic geometry will add a valuable additional perspective to the course, but will not be necessary to follow most of the material.

** — 2. Notation — **

In this course, all sums will be understood to be over the natural numbers unless otherwise specified, with the exception of sums over the variable (or variants such as , , etc.), which will be understood to be over primes.

We will use asymptotic notation in two contexts, one in which there is no asymptotic parameter present, and one in which there is an asymptotic parameter (such as $x$) that is going to infinity. In the non-asymptotic setting (which is the default context if no asymptotic parameter is explicitly specified), we use $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote an estimate of the form $|X| \leq CY$, where $C$ is an absolute constant. In some cases we would like the implied constant $C$ to depend on some additional parameters such as $k$, in which case we will denote this by subscripts, for instance $X = O_k(Y)$ denotes the claim that $|X| \leq C_k Y$ for some $C_k$ depending on $k$.

In some cases it will instead be convenient to work in an asymptotic setting, in which there is an explicitly designated asymptotic parameter (such as $x$) going to infinity. In that case, all mathematical objects will be permitted to depend on this asymptotic parameter, unless they are explicitly referred to as being *fixed*. We then use $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote the claim that $|X| \leq CY$ for some fixed $C$. Note that in slight contrast to the non-asymptotic setting, the implied constant here is allowed to depend on other parameters, so long as these parameters are also fixed. As such, the asymptotic setting can be a convenient way to manage dependencies of various implied constants on parameters. In the asymptotic setting we also use $X = o(Y)$ to denote the claim that $|X| \leq cY$, where $c$ is a quantity which goes to zero as the asymptotic parameter goes to infinity.

Remark 3 In later posts we will make a distinction between implied constants that are *effective* (they can be computed, at least in principle, by some explicit method) and those that are *ineffective* (they can be proven to be finite, but there is no algorithm known to compute them in finite time).

We use $d|n$ to denote the assertion that $d$ divides $n$, and $n\ (q)$ to denote the residue class of $n$ modulo $q$.

We use $1_E$ to denote the indicator function of a set $E$, thus $1_E(n) = 1$ when $n \in E$ and $1_E(n) = 0$ otherwise. Similarly, for any mathematical statement $S$, we use $1_S$ to denote the value $1$ when $S$ is true and $0$ when $S$ is false. Thus for instance $1_{2|n}$ is the indicator function of the even numbers.
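As a small illustration of these conventions, here is a minimal Python sketch of indicator functions of sets and of statements. The function names `indicator`, `divides` and `indicator_of_set` are hypothetical and chosen only for this example.

```python
def indicator(statement: bool) -> int:
    """Value 1 when the statement is true and 0 when it is false."""
    return 1 if statement else 0


def divides(d: int, n: int) -> bool:
    """The assertion that d divides n (for a non-zero integer d)."""
    return n % d == 0


def indicator_of_set(E):
    """Indicator function of a set E, returned as a Python function."""
    return lambda n: indicator(n in E)


def even(n: int) -> int:
    """Indicator function of the even numbers, i.e. the indicator of '2 divides n'."""
    return indicator(divides(2, n))


# Summing an indicator counts elements: the number of even n with 1 <= n <= 10.
count_even = sum(even(n) for n in range(1, 11))

# Indicator of the set of primes below 10, evaluated at two points.
small_primes = indicator_of_set({2, 3, 5, 7})
values = (small_primes(5), small_primes(6))
```

Writing conditions as indicators in this way lets a restricted sum be treated as an unrestricted sum of a product, which is the main reason the notation is convenient.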

We use to denote the cardinality of a set .

Filed under: 254A - analytic prime number theory, admin

where is a function of both time and space , with being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain with some other manifold and replacing the Laplacian with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on . We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc. One then has a conserved energy

which we can rewrite using integration by parts and the inner product on as

A key feature of the wave equation is *finite speed of propagation*: if, at time (say), the initial position and initial velocity are both supported in a ball , then at any later time , the position and velocity are supported in the larger ball . This can be seen for instance (formally, at least) by inspecting the exterior energy

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time .

The wave equation is second order in time, but one can turn it into a first order system by working with the pair rather than just the single field , where is the velocity field. The system is then

and the conserved energy is now

Finite speed of propagation then tells us that if are both supported on , then are supported on for all . One also has time reversal symmetry: if is a solution, then is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times using this symmetry.

If one has an eigenfunction

of the Laplacian, then we have the explicit solutions

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.

When one has vanishing initial velocity , the solution is given via functional calculus by

and the propagator can be expressed as the average of half-wave operators:

One can view as a minor of the full wave propagator

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

Viewing the contraction as a minor of a unitary operator is an instance of the “dilation trick”.

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable now lives on the integers rather than on , and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

where is now an integer, take values in some Hilbert space (e.g. functions on a graph ), and is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

where is a small parameter (representing the discretised time step), now takes values in the integer multiples of , and is the wave propagator operator or the heat propagator (the two operators are different, but agree to fourth order in ). One can then formally verify that the wave equation emerges from this rescaled system in the limit . (Thus, is not exactly the direct analogue of the Laplacian , but can be viewed as something like in the case of small , or if we are not rescaling to the small case. The operator is sometimes known as the *diffusion operator*.)

Assuming is self-adjoint, solutions to the system (3) formally conserve the energy

This energy is positive semi-definite if is a contraction. We have the same time reversal symmetry as before: if solves the system (3), then so does . If one has an eigenfunction

to the operator , then one has an explicit solution

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field by one unit. Suppose we know that has unit speed in the sense that whenever is supported in a ball , then is supported in the ball . Then an easy induction shows that if are supported in respectively, then are supported in .
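To see conservation of energy and finite speed of propagation in action, here is a minimal Python sketch on the cycle ${\bf Z}/N{\bf Z}$. It assumes that the discrete system takes the form $u(t+1) = Pu(t) + v(t)$, $v(t+1) = Pv(t) - (1-P^2)u(t)$, with energy $\frac{1}{2}\langle (1-P^2)u, u\rangle + \frac{1}{2}\langle v, v\rangle$. These explicit forms, the choice of $P$ as the nearest-neighbour averaging operator, and all names in the code are assumptions made for illustration.

```python
from fractions import Fraction

N = 41        # number of vertices of the cycle Z/NZ (illustrative choice)
CENTRE = 20   # vertex around which the initial data is supported


def P(f):
    """Averaging operator (Pf)(x) = (f(x-1) + f(x+1)) / 2.

    On the cycle this is a self-adjoint contraction of unit speed."""
    return [(f[(x - 1) % N] + f[(x + 1) % N]) / 2 for x in range(N)]


def step(u, v):
    """One time step of the ASSUMED system u' = Pu + v, v' = Pv - (1 - P^2)u."""
    Pu, Pv = P(u), P(v)
    PPu = P(Pu)
    u_next = [a + b for a, b in zip(Pu, v)]
    v_next = [b - a + c for a, b, c in zip(u, Pv, PPu)]
    return u_next, v_next


def energy(u, v):
    """ASSUMED energy form (1/2)<(1 - P^2)u, u> + (1/2)<v, v>."""
    PPu = P(P(u))
    potential = sum(a * (a - c) for a, c in zip(u, PPu))
    kinetic = sum(b * b for b in v)
    return (potential + kinetic) / 2


def radius(f):
    """Largest cyclic distance from CENTRE at which f is non-zero (-1 if f = 0)."""
    dists = [min(abs(x - CENTRE), N - abs(x - CENTRE))
             for x in range(N) if f[x] != 0]
    return max(dists, default=-1)


# Initial data: u(0) supported in B(CENTRE, 0), v(0) supported in B(CENTRE, 1).
u = [Fraction(0)] * N
v = [Fraction(0)] * N
u[CENTRE] = Fraction(1)
v[CENTRE + 1] = Fraction(1)

initial_energy = energy(u, v)
history = []   # entries (t, radius of u(t), radius of v(t), energy at time t)
for t in range(1, 16):
    u, v = step(u, v)
    history.append((t, radius(u), radius(v), energy(u, v)))
```

Exact rational arithmetic is used so that the energy can be compared with equality rather than up to rounding error. Under the stated assumptions, the support of $u(t)$ stays in the ball of radius $t$ and that of $v(t)$ in the ball of radius $t+1$, matching the offset by one unit described above.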

The fundamental solution to the discretised wave equation (3), in the sense of (2), is given by the formula

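where $T_t$ and $U_t$ are the Chebyshev polynomials of the first and second kind, thus

$$T_t(\cos\theta) = \cos(t\theta)$$

and

$$U_t(\cos\theta) = \frac{\sin((t+1)\theta)}{\sin\theta}.$$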

In particular, is now a minor of , and can also be viewed as an average of with its inverse :

As before, is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers and are discrete analogues of the heat propagators and wave propagators respectively.
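The Chebyshev description of the propagator can be checked exactly in the simplest case where the Hilbert space is one-dimensional, so that $P$ is a scalar $p$ with $|p| \leq 1$. The sketch below assumes the one-step propagator is the matrix with rows $(P, 1)$ and $(P^2-1, P)$, and that its $t$-th power has rows $(T_t(P), U_{t-1}(P))$ and $((P^2-1)U_{t-1}(P), T_t(P))$. Both forms are assumptions made for this illustration, as are the function names.

```python
from fractions import Fraction


def chebyshev_T(n, x):
    """First kind, via T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    a, b = Fraction(1), x
    for _ in range(n):
        a, b = b, 2 * x * b - a
    return a


def chebyshev_U(n, x):
    """Second kind, via U_{-1} = 0, U_0 = 1, U_{k+1} = 2x U_k - U_{k-1}."""
    a, b = Fraction(0), Fraction(1)
    for _ in range(n + 1):
        a, b = b, 2 * x * b - a
    return a


def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(A, t):
    """t-th power of a 2x2 matrix by repeated multiplication."""
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(t):
        result = mat_mul(result, A)
    return result


def propagator(p):
    """ASSUMED one-step propagator [[P, 1], [P^2 - 1, P]] for scalar P = p."""
    return [[p, Fraction(1)], [p * p - 1, p]]


def chebyshev_form(p, t):
    """ASSUMED closed form of the t-th power in terms of T_t and U_{t-1}."""
    T, U_prev = chebyshev_T(t, p), chebyshev_U(t - 1, p)
    return [[T, U_prev], [(p * p - 1) * U_prev, T]]


p = Fraction(1, 3)
step_matrix = propagator(p)
inverse = [[p, Fraction(-1)], [1 - p * p, p]]

# Compare matrix powers with the Chebyshev closed form for t = 0, ..., 11.
checks = [mat_pow(step_matrix, t) == chebyshev_form(p, t) for t in range(12)]

# Average of the propagator with its inverse; this should be P times the identity.
average = [[(step_matrix[i][j] + inverse[i][j]) / 2 for j in range(2)]
           for i in range(2)]
```

The general case follows the same algebra, since all the operators involved are functions of the single self-adjoint operator $P$ and therefore commute.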

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality) Let be a (possibly infinite) regular graph, let , and let be vertices in . Then the probability that the simple random walk at lands at at time is at most , where is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers . Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length concentrate on the ball of radius or so centred at the origin of the random walk.

*Proof:* Let be the graph Laplacian, thus

for any , where is the degree of the regular graph and the sum is over the vertices that are adjacent to . This is a contraction of unit speed, and the probability that the random walk at lands at at time is

where are the Dirac deltas at . Using (5), we can rewrite this as

where we are now using the energy form (4). We can write

where is the simple random walk of length on the integers, that is to say where are independent uniform Bernoulli signs. Thus we wish to show that

By finite speed of propagation, the inner product here vanishes if . For we can use Cauchy-Schwarz and the unitary nature of to bound the inner product by . Thus the left-hand side may be upper bounded by

and the claim now follows from the Chernoff inequality.
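Two ingredients of this proof are easy to test numerically. The first is the polynomial identity $x^t = {\bf E}\, T_{|S_t|}(x)$, where $S_t$ is the simple random walk of length $t$ on the integers. The second is the Chernoff-type tail bound ${\bf P}(|S_t| \geq d) \leq 2\exp(-d^2/2t)$. The sketch below checks both in exact arithmetic; the precise form of the tail bound and the function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb, exp


def chebyshev_T(n, x):
    """First kind, via T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    a, b = Fraction(1), x
    for _ in range(n):
        a, b = b, 2 * x * b - a
    return a


def power_via_random_walk(x, t):
    """E T_{|S_t|}(x): if k of the t signs equal -1 then S_t = t - 2k."""
    total = sum(comb(t, k) * chebyshev_T(abs(t - 2 * k), x)
                for k in range(t + 1))
    return total / 2 ** t


def tail_probability(t, d):
    """P(|S_t| >= d), computed exactly from the binomial distribution."""
    favourable = sum(comb(t, k) for k in range(t + 1) if abs(t - 2 * k) >= d)
    return Fraction(favourable, 2 ** t)


def chernoff_bound(t, d):
    """ASSUMED form of the tail bound: 2 exp(-d^2 / 2t)."""
    return 2 * exp(-d * d / (2 * t))


x = Fraction(2, 5)
identity_holds = all(power_via_random_walk(x, t) == x ** t for t in range(10))
bound_holds = all(float(tail_probability(t, d)) <= chernoff_bound(t, d)
                  for t in range(1, 31) for d in range(t + 1))
```

The identity is just the binomial expansion of $(\frac{e^{i\theta}+e^{-i\theta}}{2})^t$ with $x = \cos\theta$, which is why it holds as an identity of polynomials and can be applied to the operator $P$.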

This inequality has many applications, particularly with regard to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.

For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality) Let , and let be supported on compact sets respectively. Then where is the Euclidean distance between and .

*Proof:* By Fourier inversion one has

for any real , and thus

By finite speed of propagation, the inner product vanishes when ; otherwise, we can use Cauchy-Schwarz and the contractive nature of to bound this inner product by . Thus

Bounding by , we obtain the claim.

Observe that the argument is quite general and can be applied for instance to other Riemannian manifolds than .

Filed under: expository, math.AP, math.MG, math.OA Tagged: Cayley graphs, random walks, Varopoulos-Carne bound, wave equation, Yuval Peres

for any fixed . Unconditionally, the best result so far (up to logarithmic factors) that holds for all primes is by Burgess, who showed that

for any fixed . See this previous post for a proof of these bounds.
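For experimentation, the least quadratic nonresidue is easy to compute directly. The sketch below uses Euler's criterion and compares $\log n(p) / \log p$ with the classical exponents $\frac{1}{2\sqrt{e}}$ (Vinogradov) and $\frac{1}{4\sqrt{e}}$ (Burgess). The notation $n(p)$ for the least quadratic nonresidue, the function names, and the sample primes are assumptions made for this illustration.

```python
import math


def is_quadratic_residue(a: int, p: int) -> bool:
    """Euler's criterion, for an odd prime p and a not divisible by p."""
    return pow(a, (p - 1) // 2, p) == 1


def least_quadratic_nonresidue(p: int) -> int:
    """Smallest natural number that is not a square modulo the odd prime p."""
    n = 2
    while is_quadratic_residue(n, p):
        n += 1
    return n


vinogradov_exponent = 1 / (2 * math.sqrt(math.e))   # about 0.3033
burgess_exponent = 1 / (4 * math.sqrt(math.e))      # about 0.1516

# Hypothetical sample of odd primes; each row is (p, n(p), log n(p) / log p).
table = []
for p in (7, 23, 71, 311, 479, 1559, 5711):
    n_p = least_quadratic_nonresidue(p)
    table.append((p, n_p, math.log(n_p) / math.log(p)))
```

For primes of this size the ratio in the last column says little about the asymptotic exponents, since the bounds carry implied constants and hold only for large $p$; the table is meant only to make the quantity $n(p)$ concrete.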

In this paper, we show that the Vinogradov conjecture is a consequence of the Elliott-Halberstam conjecture. Using a variant of the argument, we also show that the “Type II” estimates established by Zhang and numerically improved by the Polymath8a project can be used to improve a little on the Vinogradov bound (1), to

although this falls well short of the Burgess bound. However, the method is somewhat different (although in both cases it is the Weil exponential sum bounds that are the source of the gain over (1)) and it is conceivable that a numerically stronger version of the Type II estimates could obtain results that are more competitive with the Burgess bound. At any rate, this demonstrates that the equidistribution estimates introduced by Zhang may have further applications beyond the family of results related to bounded gaps between primes.

The connection between the least quadratic nonresidue problem and Elliott-Halberstam is as follows. Suppose for contradiction we can find a prime with unusually large. Letting be the quadratic character modulo , this implies that the sums are also unusually large for a significant range of (e.g. ), although the sum is also quite small for large (e.g. ), due to the cancellation present in . It turns out (by a sort of “uncertainty principle” for multiplicative functions, as per this paper of Granville and Soundararajan) that these facts force to be unusually large in magnitude for some large (with for two large absolute constants ). By the periodicity of , this means that

must be unusually large also. However, because is large, one can factorise as for a fairly sparsely supported function . The Elliott-Halberstam conjecture, which controls the distribution of in arithmetic progressions on the average, can then be used to show that is small, giving the required contradiction.

The implication involving Type II estimates is proven by a variant of the argument. If is large, then a character sum is unusually large for a certain . By multiplicativity of , this shows that correlates with , and then by periodicity of , this shows that correlates with for various small . By the Cauchy-Schwarz inequality (cf. this previous blog post), this implies that correlates with for some distinct . But this can be ruled out by using Type II estimates.

I’ll record here a well-known observation concerning potential counterexamples to any improvement to the Burgess bound, that is to say an infinite sequence of primes with . Suppose we let be the asymptotic mean value of the quadratic character at and the mean value of ; these quantities are defined precisely in my paper, but roughly speaking one can think of

and

Thanks to the basic Dirichlet convolution identity , one can establish the *Wirsing integral equation*

for all ; see my paper for details (actually far sharper claims than this appear in previous work of Wirsing and Granville-Soundararajan). If we have an infinite sequence of counterexamples to any improvement to the Burgess bound, then we have

while from the Burgess exponential sum estimates we have

These two constraints, together with the Wirsing integral equation, end up determining and completely. It turns out that we must have

and

and then for , evolves by the integral equation

For instance

and then oscillates in a somewhat strange fashion towards zero as . This very odd behaviour of is surely impossible, but it seems remarkably hard to exclude it without invoking a strong hypothesis, such as GRH or the Elliott-Halberstam conjecture (or weaker versions thereof).

Filed under: math.NT, paper Tagged: Elliott-Halberstam conjecture, quadratic nonresidue, Type II estimate, Vinogradov conjecture

is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula

where the second von Mangoldt function is defined by the formula

(We are avoiding the use of the symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate

In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space of compactly supported continuous functions equipped with the convolution operation

This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula

for any element of a unital commutative Banach algebra , where is the space of characters (i.e., continuous unital algebra homomorphisms from to ) of . This formula is due to Gelfand and may be found in any text on Banach algebras; for sake of completeness we prove it below the fold.

The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.

Theorem 1 (Construction of a Banach algebra norm) For any , let denote the quantity Then is a seminorm on with the bound

We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that contains a plentiful supply of local units):

Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum) Let be a seminorm on obeying (7), (8). Suppose that is not identically zero. Then there exists such that for all . In particular, by (7), one has

whenever is a non-negative function.

The second is a consequence of the Selberg symmetry formula and the fact that is real (as well as Mertens’ theorem, in the case), and is closely related to the non-vanishing of the Riemann zeta function on the line :

Theorem 3 (Breaking the parity barrier) Let . Then there exists such that is non-negative, and

Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm constructed in Theorem 1 is trivial, and thus

as for any Schwartz function (the decay rate in may depend on ). Specialising to functions of the form for some smooth compactly supported on , we conclude that

as ; by the smooth Urysohn lemma this implies that

as for any fixed , and the prime number theorem then follows by a telescoping series argument.

The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that

for any fixed Dirichlet character ; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity is non-vanishing.

** — 1. Proof of Selberg symmetry formula — **

We now prove (2). From (3) we have

From the integral test we have the estimates

for some absolute constants whose exact value is unimportant for us, and for any . We conclude that

for some further absolute constants . Replacing by and inserting this into (9), one obtains

The error term can be computed to be . The main term simplifies by Möbius inversion to , and the claim follows.
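The Selberg symmetry formula can also be tested numerically. The sketch below assumes the standard definitions $\Lambda_2(n) = \Lambda(n)\log n + \sum_{d|n} \Lambda(d)\Lambda(n/d)$ and the standard form $\sum_{n \leq x} \Lambda_2(n) = 2x\log x + O(x)$ of the formula, and computes the normalised discrepancy $\frac{1}{x}(\sum_{n \leq x}\Lambda_2(n) - 2x\log x)$, which should stay bounded. The function names and the cut-offs used are assumptions made for this illustration.

```python
import math


def von_mangoldt_table(limit):
    """List lam with lam[n] = log p if n is a power of a prime p, else 0."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, limit + 1, p):
            is_composite[m] = True
        q = p
        while q <= limit:          # mark all powers of the prime p
            lam[q] = math.log(p)
            q *= p
    return lam


def second_von_mangoldt_table(limit):
    """ASSUMED definition: lam2(n) = lam(n) log n + sum_{d | n} lam(d) lam(n/d)."""
    lam = von_mangoldt_table(limit)
    lam2 = [0.0] * (limit + 1)
    for n in range(1, limit + 1):
        lam2[n] = lam[n] * math.log(n)
    for d in range(2, limit + 1):
        if lam[d] == 0.0:
            continue               # only prime powers d contribute
        for m in range(1, limit // d + 1):
            lam2[d * m] += lam[d] * lam[m]
    return lam2


def selberg_discrepancy(x):
    """(sum_{n <= x} lam2(n) - 2 x log x) / x, expected to remain bounded."""
    lam2 = second_von_mangoldt_table(x)
    return (sum(lam2[1:]) - 2 * x * math.log(x)) / x


discrepancies = {x: selberg_discrepancy(x) for x in (10, 100, 1000, 5000)}
```

Note that a bounded discrepancy is all that the symmetry formula asserts; on its own it does not separate the contribution of primes from that of products of two primes, which is the parity obstruction addressed later in the argument.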

** — 2. Constructing the Banach algebra — **

We now prove Theorem 1. It is convenient to transform the situation from the classical context of arithmetic functions on (such as or ) to the more Fourier-analytic context of Radon measures on the real line . Define the discrete Radon measure

and for any , let denote the left translate of the measure by , thus

for any continuous compactly supported . We note in passing that the prime number theorem (1) is equivalent to the assertion that the translates converge in the vague topology to Lebesgue measure as .

where is the convolution of the Radon measures , and is the measure multiplied by the identity function . From (4) one has

We claim that the Selberg symmetry formula (5) implies (in fact, it is equivalent to) the assertion that the translates converge in the vague topology to . Indeed, (5) implies for any fixed that

or equivalently that

which we rewrite as

Since for , we thus have

which implies that converges vaguely to , and the claim follows.

Now we begin the proof of Theorem 1. Observe that the quantity can be rewritten as

and converges vaguely to , we see that the measures are precompact in the vague topology, thanks to the Helly selection principle or Prokhorov theorem. In particular, we have

for some limit point of the translates in the vague topology. From (12) we have

Finally, we prove (8). By (11), it suffices to show that

for any , where the decay errors are allowed to depend on . Since converges vaguely to , we already have from (10) that

so it suffices to show that

Let be Lebesgue measure on the half-line . Then , so converges vaguely to . The measure is equal to times the function , so by Mertens’ theorem this function also converges vaguely to . We conclude that

converges vaguely to , and so it suffices to show that

We rewrite this as

On the support of , we have , so it suffices to show that

(The error term in can be controlled by using (15) with replaced by , and modifying the preceding arguments to replace by .)

From Fubini’s theorem we have

The integrand vanishes unless . By (11), we have

and

and the claim (15) follows.

** — 3. Non-trivial algebras with many local units have non-trivial spectrum — **

We now prove Theorem 2. Let be the Banach algebra completion of under the seminorm (thus is the space of Cauchy sequences in , quotiented out by the sequences that go to zero in the seminorm ). Since is not identically zero, is a non-trivial commutative Banach algebra (but it is not necessarily unital).

It is convenient to adjoin a unit to to create a unital commutative Banach algebra with the extended norm

for and ; one easily verifies that is a unital commutative Banach algebra.

Suppose that all elements of have zero spectral radius (as defined in (6)). Let be a Schwartz function with compactly supported Fourier transform. Then we can find another Schwartz function with compactly supported Fourier transform such that (by ensuring that on the support of ; thus is a “local unit” on the Fourier support of ). Thus for all . But has spectral radius zero, thus is zero in . By density this implies that is trivial, a contradiction.

Thus there is an element of with positive spectral radius. Then by (6), there is a character that does not vanish identically on . Suppose that for each there exists in the kernel of whose Fourier coefficient is non-vanishing. Since the kernel of is a space closed with respect to convolutions by functions, some Fourier analysis and a smooth partition of unity then shows that the kernel of contains any Schwartz function with compactly supported Fourier transform, and thus by density is trivial, a contradiction. Thus there must exist such that contains all test functions with Fourier coefficient vanishing at . From this we conclude that on is a constant multiple of the Fourier coefficient map ; being a non-trivial algebra homomorphism on , we thus have

for all . Since characters have norm at most (as can be seen for instance from (6)), we obtain the claim.

** — 4. Breaking the parity barrier — **

We now prove Theorem 3. We divide into two cases, depending on whether or . If , we let be a continuous function that equals on and is supported on for some large . From Mertens’ theorem we have

for sufficiently large depending on , and thus

The claim then follows by taking sufficiently large.

Now suppose . In the language of Section 2, we have

for some limit point of the . We can write the right-hand side as

for some phase . From (14), is a real measure between and , so by the triangle inequality we have

Now we set , where is as before. Then

Since is periodic with period and has mean value strictly less than (in fact, it has mean ), we thus have

if is sufficiently large depending on . The claim follows.

** — 5. The prime number theorem in arithmetic progressions — **

Let be a non-principal Dirichlet character of some period . We allow all implied constants in the notation to depend on . In this section we sketch the changes to the above arguments needed to establish

which gives the prime number theorem in arithmetic progressions by the usual Fourier expansion into Dirichlet characters.
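As a numerical illustration of the cancellation being claimed, the sketch below compares the twisted sum $\sum_{n \leq x} \chi(n)\Lambda(n)$ with the untwisted sum $\sum_{n \leq x}\Lambda(n)$ for the non-principal character $\chi$ of period $4$. The choice of character, the cut-off, and the function names are assumptions made for this illustration.

```python
import math


def von_mangoldt_table(limit):
    """List lam with lam[n] = log p if n is a power of a prime p, else 0."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, limit + 1, p):
            is_composite[m] = True
        q = p
        while q <= limit:          # mark all powers of the prime p
            lam[q] = math.log(p)
            q *= p
    return lam


def chi4(n):
    """Non-principal Dirichlet character of period 4: values 0, 1, 0, -1."""
    return (0, 1, 0, -1)[n % 4]


def twisted_sum(x):
    """Sum of chi4(n) * Lambda(n) over 1 <= n <= x."""
    lam = von_mangoldt_table(x)
    return sum(chi4(n) * lam[n] for n in range(1, x + 1))


def untwisted_sum(x):
    """Sum of Lambda(n) over 1 <= n <= x."""
    return sum(von_mangoldt_table(x)[1:])


x = 10000
twisted_ratio = twisted_sum(x) / x       # expected to be close to 0
untwisted_ratio = untwisted_sum(x) / x   # expected to be close to 1
```

Expanding the indicator of a residue class modulo $4$ as an average of the two characters of period $4$ then shows that the primes are asymptotically equally split between the classes $1\ (4)$ and $3\ (4)$.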

We have the twisted versions

and

of (3), (4). Since has mean zero, a decomposition into intervals of length reveals that

from which we obtain the twisted Selberg symmetry formula

If we define the twisted measures

and

then

and hence converges weakly to zero as . Introducing the twisted norms

we may verify that obeys the conclusions of Theorem 1.

By repeating the previous arguments, it will suffice that the analogue of Theorem 3 for holds. When , we can argue as in Section 4, where the role of Mertens’ theorem is replaced by Dirichlet’s theorem

which is ultimately a consequence of the non-vanishing of .

For , the argument in Section 4 works with minimal changes if is real-valued. If is complex valued, it still takes only a finite number of values in the unit disk. Then the limit measures appearing in Section 4 are equal to Lebesgue measure times a density taking values in the convex hull of this finite set of values, which is a polygon in the unit disk. One can then modify the arguments in Section 4 to bound

for some phase . If we set as before, we again observe that the function is periodic and has mean strictly less than one, and so we can again establish the required bound if is large enough.

** — 6. Proof of Gelfand formula — **

We now prove (6).

If is a character, then it has an operator norm:

But we may eliminate this norm by using the “tensor power trick”: replacing with and then taking roots we conclude that

and then on sending we have

Replacing by again, taking roots, and sending we conclude that

(The limit exists because is submultiplicative.) This gives one direction of (6). To give the other direction, suppose for sake of contradiction that we could find an such that

There are two cases, depending on whether we can find a complex number with and non-invertible. First suppose that such a exists; then generates an ideal of , which by Zorn’s lemma is contained in a maximal ideal , whose quotient is then a field. By Neumann series, any element of sufficiently close to the identity is invertible and thus not in ; since is a field, we conclude that the complement of is open, and so is closed. This makes a Banach algebra as well as a field. If is not a multiple of the identity, then is invertible for every and so (by Neumann series) is an analytic function from to which goes to zero at infinity, contradicting Liouville’s theorem. Thus is one-dimensional (this is the Banach-Mazur theorem) and thus isomorphic to ; this gives a continuous unital algebra homomorphism with in the kernel, thus , contradicting the second inequality in (16).

Now suppose that is invertible for all . Then, as in the preceding argument, is an analytic function from to which decays to zero at infinity, so we have the Cauchy integral formula

for any natural number . From the triangle inequality we conclude in particular that

which contradicts the first inequality in (16).
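The spectral radius formula can be seen concretely in a finite-dimensional model. The sketch below takes the convolution algebra of functions on the cyclic group ${\bf Z}/N{\bf Z}$ with the $\ell^1$ norm, which is a unital commutative Banach algebra whose characters are the $N$ Fourier coefficient maps, and compares $\|f^n\|^{1/n}$ with the largest Fourier coefficient in magnitude. The model, the sample function $f$, and the function names are assumptions made for this illustration.

```python
import cmath

N = 5  # size of the cyclic group Z/NZ (illustrative choice)


def convolve(f, g):
    """Convolution on Z/NZ: (f*g)(x) = sum_y f(y) g(x - y)."""
    return [sum(f[y] * g[(x - y) % N] for y in range(N)) for x in range(N)]


def l1_norm(f):
    """Submultiplicative norm ||f|| = sum_x |f(x)|."""
    return sum(abs(a) for a in f)


def fourier_coefficient(f, k):
    """Character of the algebra: f -> sum_x f(x) exp(-2 pi i k x / N)."""
    return sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / N) for x in range(N))


def spectral_radius(f):
    """Supremum of |lambda(f)| over the N characters lambda."""
    return max(abs(fourier_coefficient(f, k)) for k in range(N))


def norm_root(f, n):
    """||f^n||^(1/n), where f^n is the n-fold convolution power of f."""
    power = f
    for _ in range(n - 1):
        power = convolve(power, f)
    return l1_norm(power) ** (1.0 / n)


f = [1.0, -2.0, 0.5, 0.0, 3.0]      # hypothetical sample element
rho = spectral_radius(f)
exponents = (1, 4, 16, 64)
roots = [norm_root(f, n) for n in exponents]
```

In this model Fourier inversion gives the two-sided bound $\rho \leq \|f^n\|^{1/n} \leq N^{1/n}\rho$, where $\rho$ is the spectral radius, so the convergence is visible already for moderate $n$; replacing $f$ by $f^n$ and taking roots is exactly the tensor power trick removing the constant $N$.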

Filed under: expository, math.NT, math.OA, math.SP Tagged: Banach algebra, prime number theorem, spectral theorem

converge to the integral

the triangle density

converges to the integral

the four-cycle density

converges to the integral

and so forth. One can use graph limits to prove many results in graph theory that were traditionally proven using the regularity lemma, such as the triangle removal lemma, and can also reduce many asymptotic graph theory problems to continuous problems involving multilinear integrals (although the latter problems are not necessarily easy to solve!). See this text of Lovász for a detailed study of graph limits and their applications.
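For a single finite graph, the statistics that a graph limit is required to capture are straightforward to compute. The sketch below evaluates the edge, triangle and four-cycle densities of a graph, viewing the edge set as a symmetric set of ordered pairs and normalising by the appropriate power of the number of vertices. The function names and the choice of the complete graph on four vertices as the example are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def symmetric_edge_set(pairs):
    """Edge set as a symmetric set of ordered pairs of vertices."""
    pairs = list(pairs)
    return {(x, y) for x, y in pairs} | {(y, x) for x, y in pairs}


def density(pattern_edges, k, V, E):
    """Fraction of k-tuples of vertices for which every pattern edge lies in E."""
    hits = sum(
        all((phi[a], phi[b]) in E for a, b in pattern_edges)
        for phi in product(V, repeat=k)
    )
    return Fraction(hits, len(V) ** k)


V = range(4)
E = symmetric_edge_set((x, y) for x in V for y in V if x < y)  # complete graph

edge_density = density([(0, 1)], 2, V, E)
triangle_density = density([(0, 1), (1, 2), (2, 0)], 3, V, E)
four_cycle_density = density([(0, 1), (1, 2), (2, 3), (3, 0)], 4, V, E)
```

The brute-force enumeration costs $|V|^k$ evaluations and is only meant for small examples; its purpose is to make explicit that each density is a normalised count of tuples, which is what becomes a multilinear integral of the graphon in the limit.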

One can also express graph limits (and more generally hypergraph limits) in the language of nonstandard analysis (or of ultraproducts); see for instance this paper of Elek and Szegedy, Section 6 of this previous blog post, or this paper of Towsner. (In this post we assume some familiarity with nonstandard analysis, as reviewed for instance in the previous blog post.) Here, one starts as before with a sequence of finite graphs, and then takes an ultraproduct (with respect to some arbitrarily chosen non-principal ultrafilter ) to obtain a nonstandard graph , where is the ultraproduct of the , and similarly for the . The set can then be viewed as a symmetric subset of which is measurable with respect to the Loeb $\sigma$-algebra of the product (see this previous blog post for the construction of Loeb measure). A crucial point is that this $\sigma$-algebra is larger than the product of the Loeb $\sigma$-algebra of the individual vertex set . This leads to a decomposition

where the “graphon” is the orthogonal projection of onto , and the “regular error” is orthogonal to all product sets for . The graphon then captures the statistics of the nonstandard graph , in exact analogy with the more traditional graph limits: for instance, the edge density

(or equivalently, the limit of the along the ultrafilter ) is equal to the integral

where denotes Loeb measure on a nonstandard finite set ; the triangle density

(or equivalently, the limit along of the triangle densities of ) is equal to the integral

and so forth. Note that with this construction, the graphon is living on the Cartesian square of an abstract probability space , which is likely to be inseparable; but it is possible to cut down the Loeb $\sigma$-algebra on to a minimal countable $\sigma$-algebra for which remains measurable (up to null sets), and then one can identify with , bringing this construction of a graphon in line with the traditional notion of a graphon. (See Remark 5 of this previous blog post for more discussion of this point.)

Additive combinatorics, which studies things like the additive structure of finite subsets of an abelian group , has many analogies and connections with asymptotic graph theory; in particular, there is the arithmetic regularity lemma of Green which is analogous to the graph regularity lemma of Szemerédi. (There is also a higher order arithmetic regularity lemma analogous to hypergraph regularity lemmas, but this is not the focus of the discussion here.) Given this, it is natural to suspect that there is a theory of “additive limits” for large additive sets of bounded doubling, analogous to the theory of graph limits for large dense graphs. The purpose of this post is to record a candidate for such an additive limit. This limit can be used as a substitute for the arithmetic regularity lemma in certain results in additive combinatorics, at least if one is willing to settle for qualitative results rather than quantitative ones; I give a few examples of this below the fold.

It seems that to allow for the most flexible and powerful manifestation of this theory, it is convenient to use the nonstandard formulation (among other things, it allows for full use of the transfer principle, whereas a more traditional limit formulation would only allow for a transfer of those quantities continuous with respect to the notion of convergence). Here, the analogue of a nonstandard graph is an *ultra approximate group* in a nonstandard group , defined as the ultraproduct of finite -approximate groups for some standard . (A -approximate group is a symmetric set containing the origin such that can be covered by or fewer translates of .) We then let be the external subgroup of generated by ; equivalently, is the union of over all standard . This space has a Loeb measure , defined by setting

whenever is an internal subset of for any standard , and extended to a countably additive measure; the arguments in Section 6 of this previous blog post can be easily modified to give a construction of this measure.
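To make the notion of an approximate group concrete, here is a minimal Python sketch that checks the definition for a finite set of integers, given a candidate set of translates. Finding an optimal covering is a harder problem, so the sketch only verifies a covering that is supplied to it. The function names and the example of an interval are assumptions made for this illustration.

```python
def sumset(A, B):
    """The set of all sums a + b with a in A and b in B."""
    return {a + b for a in A for b in B}


def is_approximate_group(A, translates, K):
    """Check that A is symmetric, contains the origin, and that A + A is
    covered by the translates x + A, x in `translates`, using at most K of them."""
    A = set(A)
    contains_origin = 0 in A
    symmetric = all(-a in A for a in A)
    few_translates = len(set(translates)) <= K
    covered = sumset(A, A) <= sumset(translates, A)
    return contains_origin and symmetric and few_translates and covered


N = 10
interval = set(range(-N, N + 1))

# The interval {-N, ..., N} doubles to {-2N, ..., 2N}, which two translates cover.
interval_is_2_approximate = is_approximate_group(interval, {-N, N}, 2)

# A single translate cannot cover the sumset, since the sumset is larger than A.
interval_is_1_approximate = is_approximate_group(interval, {0}, 1)
```

Intervals of this type are the basic non-compact examples; genuine finite subgroups of a group are exactly the case in which one translate suffices.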

The Loeb measure is a translation invariant measure on , normalised so that has Loeb measure one. As such, one should think of as being analogous to a locally compact abelian group equipped with a Haar measure. It should be noted though that is not *actually* a locally compact group with Haar measure, for two reasons:

- There is not an obvious topology on that makes it simultaneously locally compact, Hausdorff, and $\sigma$-compact. (One can get one or two out of three without difficulty, though.)
- The addition operation is not measurable from the product Loeb algebra to . Instead, it is measurable from the coarser Loeb algebra to (compare with the analogous situation for nonstandard graphs).

Nevertheless, the analogy is a useful guide for the arguments that follow.

Let denote the space of bounded Loeb measurable functions (modulo almost everywhere equivalence) that are supported on for some standard ; this is a complex algebra with respect to pointwise multiplication. There is also a convolution operation , defined by setting

whenever , are bounded nonstandard functions (extended by zero to all of ), and then extending to arbitrary elements of by density. Equivalently, is the pushforward of the -measurable function under the map .

The basic structural theorem is then as follows.

Theorem 1 (Kronecker factor) Let be an ultra approximate group. Then there exists a (standard) locally compact abelian group of the form for some standard and some compact abelian group , equipped with a Haar measure and a measurable homomorphism (using the Loeb $\sigma$-algebra on and the Borel $\sigma$-algebra on ), with the following properties:

- (i) has dense image, and is the pushforward of Loeb measure by .
- (ii) There exists sets with open and compact, such that
- (iii) Whenever with compact and open, there exists a nonstandard finite set such that
- (iv) If , then we have the convolution formula
where are the pushforwards of to , the convolution on the right-hand side is convolution using , and is the pullback map from to . In particular, if , then for all .

One can view the locally compact abelian group as a “model” or “Kronecker factor” for the ultra approximate group (in close analogy with the Kronecker factor from ergodic theory). In the case that is a genuine nonstandard finite group rather than an ultra approximate group, the non-compact components of the Kronecker group are trivial, and this theorem was implicitly established by Szegedy. The compact group is quite large, and in particular is likely to be inseparable; but as with the case of graphons, when one is only studying at most countably many functions , one can cut down the size of this group to be separable (or equivalently, second countable or metrisable) if desired, so one often works with a “reduced Kronecker factor” which is a quotient of the full Kronecker factor .

Given any sequence of uniformly bounded functions for some fixed , we can view the function defined by

as an “additive limit” of the , in much the same way that graphons are limits of the indicator functions . The additive limits capture some of the statistics of the , for instance the normalised means

converge (along the ultrafilter ) to the mean

and for three sequences of functions, the normalised correlation

converges along to the correlation

the normalised Gowers norm

converges along to the Gowers norm

and so forth. We caution however that some correlations that involve evaluating more than one function at the same point will not necessarily be preserved in the additive limit; for instance the normalised norm

does not necessarily converge to the norm

but can converge instead to a larger quantity, due to the presence of the orthogonal projection in the definition (4) of .

An important special case of an additive limit occurs when the functions involved are indicator functions of some subsets of . The additive limit does not necessarily remain an indicator function, but instead takes values in (much as a graphon takes values in even though the original indicators take values in ). The convolution is then the ultralimit of the normalised convolutions ; in particular, the measure of the support of provides a lower bound on the limiting normalised cardinality of a sumset. In many situations this lower bound is an equality, but this is not necessarily the case, because the sumset could contain a large number of elements which have very few () representations as the sum of two elements of , and in the limit these portions of the sumset fall outside of the support of . (One can think of the support of as describing the “essential” sumset of , discarding those elements that have only very few representations.) Similarly for higher convolutions of . Thus one can use additive limits to partially control the growth of iterated sumsets of subsets of approximate groups , in the regime where stays bounded and goes to infinity.
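The gap between the full sumset and the "essential" sumset can be seen in a small finite experiment. The following is only a sketch under assumed choices that do not come from the discussion above: the cyclic group of order 1000, a set consisting of an interval plus one isolated point, and a hypothetical cutoff `threshold` standing in for "very few representations".

```python
from collections import Counter

# Assumed toy setting: the cyclic group Z/N and a set A made of a
# dense interval together with one isolated point.
N = 1000
A = set(range(250)) | {500}

# r[x] = number of ordered pairs (a, b) in A x A with a + b = x (mod N).
# Dividing r[x] by N gives a normalised convolution of 1_A with itself.
r = Counter((a + b) % N for a in A for b in A)

sumset = set(r)  # every x with at least one representation

# Hypothetical cutoff separating "few" from "many" representations.
threshold = 10
essential = {x for x, count in r.items() if count >= threshold}

# Sums that use the isolated point have very few representations.
sparse_part = [x for x in sumset if 500 <= x < 750]

print("normalised size of sumset:          ", len(sumset) / N)
print("normalised size of essential sumset:", len(essential) / N)
print("max representations in sparse part: ", max(r[x] for x in sparse_part))
```

In this toy example roughly a quarter of the group lies in the sumset only because of the single isolated point, and each such element has just two representations, so it would not be seen by the support of the limiting convolution.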

Theorem 1 can be proven by Fourier-analytic means (combined with Freiman’s theorem from additive combinatorics), and we will do so below the fold. For now, we give some illustrative examples of additive limits.

Example 1 (Bohr sets) We take to be the intervals , where is a sequence going to infinity; these are -approximate groups for all . Let be an irrational real number, let be an interval in , and for each natural number let be the Bohr set. In this case, the (reduced) Kronecker factor can be taken to be the infinite cylinder with the usual Lebesgue measure . The additive limits of and end up being and , where is the finite cylinder

and is the rectangle

Geometrically, one should think of and as being wrapped around the cylinder via the homomorphism , and then one sees that is converging in some normalised weak sense to , and similarly for and . In particular, the additive limit predicts the growth rate of the iterated sumsets to be quadratic in until becomes comparable to , at which point the growth transitions to linear growth, in the regime where is bounded and is large.

If were rational instead of irrational, then one would need to replace by the finite subgroup here.
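The "wrapping around the cylinder" picture can be tested numerically. The sketch below makes several assumptions that are not taken from the text: the interval is the set of integers of absolute value at most `N`, the irrational is the square root of 2, and the wrapping map sends an integer `a` to the pair `(a / N, alpha * a mod 1)`. It checks that the fraction of points landing in a box on the cylinder is close to the normalised area of that box, which is what convergence to the indicator of a rectangle predicts.

```python
import math

# Assumed parameters (illustrative only).
N = 100_000                # half-length of the interval {-N, ..., N}
alpha = math.sqrt(2)       # an irrational frequency
I = (0.2, 0.5)             # arc of the circle R/Z defining the Bohr set
box = (-0.5, 0.25)         # sub-interval of [-1, 1] in the non-compact direction

def wrap(a):
    """Assumed form of the wrapping map: integer -> point of the cylinder."""
    return (a / N, (alpha * a) % 1.0)

P = range(-N, N + 1)

hits = 0
for a in P:
    height, angle = wrap(a)
    if box[0] <= height < box[1] and I[0] <= angle < I[1]:
        hits += 1

fraction = hits / len(P)

# Area of box x I, normalised by the area 2 of the finite cylinder [-1, 1] x R/Z.
predicted = (box[1] - box[0]) / 2 * (I[1] - I[0])

print("observed fraction:", fraction)
print("predicted area:   ", predicted)
```

The agreement relies on the equidistribution of multiples of an irrational number modulo one; with a rational frequency the angles would only visit finitely many points of the circle, matching the remark above.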

Example 2 (Structured subsets of progressions) We take to be the rank two progression where is a sequence going to infinity; these are -approximate groups for all . Let be the subset

Then the (reduced) Kronecker factor can be taken to be with Lebesgue measure , and the additive limits of the and are then and , where is the square

and is the circle

Geometrically, the picture is similar to the Bohr set one, except now one uses a Freiman homomorphism for to embed the original sets into the plane . In particular, one now expects the growth rate of the iterated sumsets and to be quadratic in , in the regime where is bounded and is large.

Example 3 (Dissociated sets) Let be a fixed natural number, and take where are randomly chosen elements of a large cyclic group , where is a sequence of primes going to infinity. These are -approximate groups. The (reduced) Kronecker factor can (almost surely) then be taken to be with counting measure, and the additive limit of is , where and is the standard basis of . In particular, the cardinalities of should grow approximately like for bounded and large.
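One way to see the lattice model at work is to compare iterated sumset sizes in the cyclic group with those in the lattice. The sketch below assumes, purely for illustration, that the set in question is the symmetric set consisting of 0 and plus or minus each random element, with three random elements and the Mersenne prime 2^61 - 1 as modulus; the helper `iterated_sumset` is a hypothetical name.

```python
import random

def iterated_sumset(gens, k, add, zero):
    """Return the k-fold sumset of gens, using the supplied addition law."""
    current = {zero}
    for _ in range(k):
        current = {add(x, g) for x in current for g in gens}
    return current

# Assumed parameters (illustrative only).
d = 3
p = 2**61 - 1              # a large prime modulus
rng = random.Random(0)     # fixed seed so the run is reproducible

# Random elements of Z/p and the assumed symmetric set {0, +-v_1, ..., +-v_d}.
v = [rng.randrange(1, p) for _ in range(d)]
gens_cyclic = [0] + v + [(-x) % p for x in v]

# The model set {0, +-e_1, ..., +-e_d} in the lattice Z^d.
origin = (0,) * d
basis = [tuple(int(i == j) for j in range(d)) for i in range(d)]
gens_model = [origin] + basis + [tuple(-c for c in e) for e in basis]

def add_cyclic(x, y):
    return (x + y) % p

def add_model(x, y):
    return tuple(s + t for s, t in zip(x, y))

ks = range(1, 5)
sizes_cyclic = [len(iterated_sumset(gens_cyclic, k, add_cyclic, 0)) for k in ks]
sizes_model = [len(iterated_sumset(gens_model, k, add_model, origin)) for k in ks]

print("k-fold sumset sizes in Z/p:", sizes_cyclic)
print("k-fold sumset sizes in Z^d:", sizes_model)
```

For a fixed number of summands, a coincidence between two distinct short integer combinations of the random elements has probability of order one over the modulus, so the two lists agree with overwhelming probability; the lattice sizes are the cardinalities of balls in the taxicab metric, which grow polynomially of degree `d` in the number of summands.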

Example 4 (Random subsets of groups) Let be a sequence of finite additive groups whose order is going to infinity. Let be a random subset of of some fixed density . Then (almost surely) the Kronecker factor here can be reduced all the way to the trivial group , and the additive limit of the is the constant function . The convolutions then converge in the ultralimit (modulo almost everywhere equivalence) to the pullback of ; this reflects the fact that of the elements of can be represented as the sum of two elements of in ways. In particular, occupies a proportion of .
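The flattening of the convolution for random sets is easy to observe in a finite simulation. The sketch below assumes a cyclic group of prime order 1009, density one half, and an arbitrary tolerance of 0.08; none of these specific values come from the text. It counts representations of each group element as a sum of two elements of the random set and compares the normalised count with the square of the empirical density.

```python
import random

# Assumed parameters (illustrative only).
N = 1009        # order of the cyclic group Z/N
delta = 0.5     # target density of the random subset
tol = 0.08      # hypothetical tolerance for "close to density squared"

rng = random.Random(1)  # fixed seed so the run is reproducible
A = {x for x in range(N) if rng.random() < delta}
density = len(A) / N

def normalised_convolution(x):
    """(1/N) * number of pairs (a, b) in A x A with a + b = x (mod N)."""
    return sum(1 for a in A if (x - a) % N in A) / N

conv = [normalised_convolution(x) for x in range(N)]

# The mean of the normalised convolution is exactly density squared.
mean_conv = sum(conv) / N

# Fraction of group elements whose convolution value is close to that mean.
close = sum(1 for c in conv if abs(c - density**2) < tol) / N

print("empirical density:        ", density)
print("density squared:          ", density**2)
print("mean of the convolution:  ", mean_conv)
print("fraction of x within tol: ", close)
```

Almost every element of the group has close to the expected number of representations, which is the finite shadow of the convolution converging to a constant and of the sumset filling up essentially the whole group.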

Example 5 (Trigonometric series) Take for a sequence of primes going to infinity, and for each let be an infinite sequence of frequencies chosen uniformly and independently from . Let denote the random trigonometric series. Then (almost surely) we can take the reduced Kronecker factor to be the infinite torus (with the Haar probability measure ), and the additive limit of the then becomes the function defined by the formula

In fact, the pullback is the ultralimit of the . As such, for any standard exponent , the normalised norm

can be seen to converge to the limit

The reader is invited to consider combinations of the above examples, e.g. random subsets of Bohr sets, to get a sense of the general case of Theorem 1.

It is likely that this theorem can be extended to the noncommutative setting, using the noncommutative Freiman theorem of Emmanuel Breuillard, Ben Green, and myself, but I have not attempted to do so here (see though this recent preprint of Anush Tserunyan for some related explorations); in a separate direction, there should be extensions that can control higher Gowers norms, in the spirit of the work of Szegedy.

Note: the arguments below will presume some familiarity with additive combinatorics and with nonstandard analysis, and will be a little sketchy in places.

** — 1. Proof of theorem — **

By Freiman’s theorem for arbitrary abelian groups (see this paper of Green and Ruzsa), we can find an *ultra coset progression* such that

for some standard ; we abbreviate the latter inclusion as . By an ultra coset progression, we mean the sumset of a nonstandard finite group and a nonstandard generalised arithmetic progression

with (known as the *rank*) standard, the *generators* in and the *dimensions* being nonstandard natural numbers. (To get the containment , one can first use the Bogolyubov lemma to get a large ultra coset progression inside , so that can be covered by translates of ; one can then add these translates to the generators of to obtain an ultra coset progression with the required properties.)

We call the ultra coset progression *-proper* if the sums for and for are all distinct. If fails to be -proper, then we can find a containment

where the coset progression has strictly smaller rank than (see e.g. Lemma 5.1 of this paper of Van Vu and myself). Iterating this fact, we see that we may assume without loss of generality that is -proper. In particular, the group can now be parameterised by the sums with for , with each element of having exactly one representation of this form.

The dimensions are either bounded (and thus standard natural numbers) or unbounded. After permuting the generators if necessary, we may assume that are unbounded and are bounded for some with . We then have an external surjective group homomorphism defined by

this will end up being the non-compact portion of the projection map that we will eventually construct. The image is precompact in (in fact it is compact, thanks to countable saturation).

Now we perform some Fourier analysis on (analogous to the usual theory of Fourier analysis on locally compact abelian groups). Define a *frequency* to be a measurable homomorphism from to , and let denote the space of such frequencies; this is an additive group, which should be thought of as a “Pontryagin dual” to (even though is not a locally compact group). Meanwhile, we have the (genuine) Pontryagin dual of , using the identification

The homomorphism then induces a dual homomorphism , defined by the formula

for all and . This homomorphism is easily seen to be injective. If we let denote the cokernel of this map, then is an abelian group (which we will view as a discrete group) and we have the short exact sequence

Observe that is a divisible group. From this and a Zorn’s lemma argument we can split this short exact sequence, lifting up to a subgroup of , so that the latter group can be viewed as the direct sum of and .

Let be the Pontryagin dual of , that is to say the space of all homomorphisms from to (with no measurability or continuity hypotheses imposed). This is a compact abelian group (it is a closed subset of , which is compact by Tychonoff’s theorem). Set . We have a homomorphism , defined by

We claim that has dense image. Since is surjective, it suffices to show that the map from to has dense image from to , where

is the kernel of . The closure of the image of is a compact subgroup of , so if this map did not have dense image, there would exist a non-trivial in the Pontryagin dual of which annihilates all of . The map then factors through and thus can be identified with an element of ; but and only intersect at , a contradiction. Thus has dense image.

It is a routine matter to verify that is measurable, that is precompact, and that the inverse image of any compact set is contained in for some standard . From this and the Riesz representation theorem, we can define a Haar measure on by defining

for all continuous, compactly supported functions ; the translation invariance of this measure follows from the surjectivity of . From Urysohn’s lemma and the inner and outer regularity of Haar measures, one can then show that is the pushforward of Loeb measure under .

Now we show the convolution property (3). First suppose that , which in particular implies that

for all , since the function factors through . By the Loeb measure construction, we can write as the limit (in ) of functions , where are uniformly bounded nonstandard functions and is some standard natural number. Then we have

which in particular implies that

where ranges over all nonstandard maps of the form

for some and nonstandard homomorphism . From (nonstandard) Fourier analysis, we conclude that

for any bounded nonstandard function , or equivalently that

and thus on taking limits we see that , and on taking further limits we see that for any , as required. This proves (3) when ; similarly when . To finish off the general case of (3), it suffices to show that

for bounded measurable . By Fourier decomposition, we may assume that takes the form

for some and some continuous compactly supported , and similarly

for some and continuous compactly supported .

If , then for some , and one can use this to show that and both vanish. Thus we may assume that ; using modulation symmetry we may then assume that . It thus suffices to show that

A direct calculation shows that the left and right hand sides agree up to constants; but both sides also have integral when integrated against , so they must agree identically.

Now, we prove the inclusions (1). The outer inclusion comes from the compactness (or precompactness) of . For the inner inclusion, we note from (3) and the positive measure and symmetry of that is the pullback of a continuous function on that is positive at the origin, and thus also bounded away from zero on a neighbourhood of the origin. This implies that the set has full measure in . We then let be a smaller symmetric neighbourhood of the origin such that . We then see that for any