You are currently browsing the tag archive for the ‘prime number theorem’ tag.

A basic object of study in multiplicative number theory are the arithmetic functions: functions from the natural numbers to the complex numbers. Some fundamental examples of such functions include

- The constant function ;
- The Kronecker delta function ;
- The natural logarithm function ;
- The divisor function ;
- The von Mangoldt function , with defined to equal when is a power of a prime for some , and defined to equal zero otherwise; and
- The Möbius function , with defined to equal when is the product of distinct primes, and defined to equal zero otherwise.

Given an arithmetic function , we are often interested in statistics such as the summatory function

the logarithmically (or harmonically) weighted summatory function

or the Dirichlet series

In the latter case, one typically has to first restrict to those complex numbers whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as , but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions , forms a new arithmetic function , defined by the formula

Thus for instance , , , and for any arithmetic function . Dirichlet convolution and Dirichlet series are related by the fundamental formula

at least when the real part of is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

at least when the real part of is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function , thus for instance

Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers , which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval , which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of at the infinite place , hence the terminology.) Accordingly, let us define a *continuous arithmetic function* to be a locally integrable function . The analogue of the summatory function (1) is then an integral

and similarly the analogue of (2) is

The analogue of the Dirichlet series is the Mellin-type transform

which will be well-defined at least if the real part of is large enough and if the continuous arithmetic function does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

For instance, the continuous analogue of the discrete constant function would be the constant function , which maps any to , and which we will denote by in order to keep it distinct from . The two functions and have approximately similar statistics; for instance one has

and

where is the harmonic number, and we are deliberately vague as to what the symbol means. Continuing this analogy, we would expect

which reflects the fact that has a simple pole at with residue , and no other poles. Note that the identity is initially only valid in the region , but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at , and so one can define in this region also.

In a similar vein, the logarithm function is approximately similar to the logarithm function , giving for instance the crude form

of Stirling’s formula, or the Dirichlet series approximation

The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure : given two continuous arithmetic functions , one can define their convolution by the formula

Thus for instance . A short computation using Fubini’s theorem shows the analogue

of (3) whenever the real part of is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

again assuming that the real part of is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.

Direct calculation shows that for any complex number , one has

(at least for the real part of large enough), and hence by several applications of (5)

for any natural number . This can lead to the following heuristic: if a Dirichlet series behaves like a linear combination of poles , in that

for some set of poles and some coefficients and natural numbers (where we again are vague as to what means, and how to interpret the sum if the set of poles is infinite), then one should expect the arithmetic function to behave like the continuous arithmetic function

In particular, if we only have simple poles,

then we expect to have behave like continuous arithmetic function

Integrating this from to , this heuristically suggests an approximation

for the summatory function, and similarly

with the convention that is when , and similarly is when . One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)

For instance, using the more refined approximation

to the zeta function near , we have

we would expect that

and thus for instance

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).

Or, noting that has a simple pole at and assuming simple zeroes elsewhere, the log derivative will have simple poles of residue at and at all the zeroes, leading to the heuristic

suggesting that should behave like the continuous arithmetic function

leading for instance to the summatory approximation

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).

Exercise 1Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also -adic analogues of arithmetic functions to which a similar set of heuristics can be applied, perhaps to study sums such as . A key problem here is that there does not seem to be any good interpretation of the expression when is complex and is a -adic number, so it is not clear that one can analyse a Dirichlet series -adically. For similar reasons, we don’t have a canonical way to define for a Dirichlet character (unless its conductor happens to be a power of ), so there doesn’t seem to be much to say in the -aspect either.

In analytic number theory, there is a well known analogy between the prime factorisation of a large integer, and the cycle decomposition of a large permutation; this analogy is central to the topic of “anatomy of the integers”, as discussed for instance in this survey article of Granville. Consider for instance the following two parallel lists of facts (stated somewhat informally). Firstly, some facts about the prime factorisation of large integers:

- Every positive integer has a prime factorisation
into (not necessarily distinct) primes , which is unique up to rearrangement. Taking logarithms, we obtain a partition

of .

- (Prime number theorem) A randomly selected integer of size will be prime with probability when is large.
- If is a randomly selected large integer of size , and is a randomly selected prime factor of (with each index being chosen with probability ), then is approximately uniformly distributed between and . (See Proposition 9 of this previous blog post.)
- The set of real numbers arising from the prime factorisation of a large random number converges (away from the origin, and in a suitable weak sense) to the Poisson-Dirichlet process in the limit . (See the previously mentioned blog post for a definition of the Poisson-Dirichlet process, and a proof of this claim.)

Now for the facts about the cycle decomposition of large permutations:

- Every permutation has a cycle decomposition
into disjoint cycles , which is unique up to rearrangement, and where we count each fixed point of as a cycle of length . If is the length of the cycle , we obtain a partition

of .

- (Prime number theorem for permutations) A randomly selected permutation of will be an -cycle with probability exactly . (This was noted in this previous blog post.)
- If is a random permutation in , and is a randomly selected cycle of (with each being selected with probability ), then is exactly uniformly distributed on . (See Proposition 8 of this blog post.)
- The set of real numbers arising from the cycle decomposition of a random permutation converges (in a suitable sense) to the Poisson-Dirichlet process in the limit . (Again, see this previous blog post for details.)

See this previous blog post (or the aforementioned article of Granville, or the Notices article of Arratia, Barbour, and Tavaré) for further exploration of the analogy between prime factorisation of integers and cycle decomposition of permutations.

There is however something unsatisfying about the analogy, in that it is not clear *why* there should be such a kinship between integer prime factorisation and permutation cycle decomposition. It turns out that the situation is clarified if one uses another fundamental analogy in number theory, namely the analogy between integers and polynomials over a finite field , discussed for instance in this previous post; this is the simplest case of the more general function field analogy between number fields and function fields. Just as we restrict attention to positive integers when talking about prime factorisation, it will be reasonable to restrict attention to monic polynomials . We then have another analogous list of facts, proven very similarly to the corresponding list of facts for the integers:

- Every monic polynomial has a factorisation
into irreducible monic polynomials , which is unique up to rearrangement. Taking degrees, we obtain a partition

of .

- (Prime number theorem for polynomials) A randomly selected monic polynomial of degree will be irreducible with probability when is fixed and is large.
- If is a random monic polynomial of degree , and is a random irreducible factor of (with each selected with probability ), then is approximately uniformly distributed in when is fixed and is large.
- The set of real numbers arising from the factorisation of a randomly selected polynomial of degree converges (in a suitable sense) to the Poisson-Dirichlet process when is fixed and is large.

The above list of facts addressed the *large limit* of the polynomial ring , where the order of the field is held fixed, but the degrees of the polynomials go to infinity. This is the limit that is most closely analogous to the integers . However, there is another interesting asymptotic limit of polynomial rings to consider, namely the *large limit* where it is now the *degree* that is held fixed, but the order of the field goes to infinity. Actually to simplify the exposition we will use the slightly more restrictive limit where the *characteristic* of the field goes to infinity (again keeping the degree fixed), although all of the results proven below for the large limit turn out to be true as well in the large limit.

The large (or large ) limit is technically a different limit than the large limit, but in practice the asymptotic statistics of the two limits often agree quite closely. For instance, here is the prime number theorem in the large limit:

Theorem 1 (Prime number theorem)The probability that a random monic polynomial of degree is irreducible is in the limit where is fixed and the characteristic goes to infinity.

*Proof:* There are monic polynomials of degree . If is irreducible, then the zeroes of are distinct and lie in the finite field , but do not lie in any proper subfield of that field. Conversely, every element of that does not lie in a proper subfield is the root of a unique monic polynomial in of degree (the minimal polynomial of ). Since the union of all the proper subfields of has size , the total number of irreducible polynomials of degree is thus , and the claim follows.

Remark 2The above argument and inclusion-exclusion in fact gives the well known exact formula for the number of irreducible monic polynomials of degree .

Now we can give a precise connection between the cycle distribution of a random permutation, and (the large limit of) the irreducible factorisation of a polynomial, giving a (somewhat indirect, but still connected) link between permutation cycle decomposition and integer factorisation:

Theorem 3The partition of a random monic polynomial of degree converges in distribution to the partition of a random permutation of length , in the limit where is fixed and the characteristic goes to infinity.

We can quickly prove this theorem as follows. We first need a basic fact:

Lemma 4 (Most polynomials square-free in large limit)A random monic polynomial of degree will be square-free with probability when is fixed and (or ) goes to infinity. In a similar spirit, two randomly selected monic polynomials of degree will be coprime with probability if are fixed and or goes to infinity.

*Proof:* For any polynomial of degree , the probability that is divisible by is at most . Summing over all polynomials of degree , and using the union bound, we see that the probability that is *not* squarefree is at most , giving the first claim. For the second, observe from the first claim (and the fact that has only a bounded number of factors) that is squarefree with probability , giving the claim.

Now we can prove the theorem. Elementary combinatorics tells us that the probability of a random permutation consisting of cycles of length for , where are nonnegative integers with , is precisely

since there are ways to write a given tuple of cycles in cycle notation in nondecreasing order of length, and ways to select the labels for the cycle notation. On the other hand, by Theorem 1 (and using Lemma 4 to isolate the small number of cases involving repeated factors) the number of monic polynomials of degree that are the product of irreducible polynomials of degree is

which simplifies to

and the claim follows.

This was a fairly short calculation, but it still doesn’t quite explain *why* there is such a link between the cycle decomposition of permutations and the factorisation of a polynomial. One immediate thought might be to try to link the multiplication structure of permutations in with the multiplication structure of polynomials; however, these structures are too dissimilar to set up a convincing analogy. For instance, the multiplication law on polynomials is abelian and non-invertible, whilst the multiplication law on is (extremely) non-abelian but invertible. Also, the multiplication of a degree and a degree polynomial is a degree polynomial, whereas the group multiplication law on permutations does not take a permutation in and a permutation in and return a permutation in .

I recently found (after some discussions with Ben Green) what I feel to be a satisfying conceptual (as opposed to computational) explanation of this link, which I will place below the fold.

In the previous set of notes, we saw how zero-density theorems for the Riemann zeta function, when combined with the zero-free region of Vinogradov and Korobov, could be used to obtain prime number theorems in short intervals. It turns out that a more sophisticated version of this type of argument also works to obtain prime number theorems in arithmetic progressions, in particular establishing the celebrated theorem of Linnik:

Theorem 1 (Linnik’s theorem)Let be a primitive residue class. Then contains a prime with .

In fact it is known that one can find a prime with , a result of Xylouris. For sake of comparison, recall from Exercise 65 of Notes 2 that the Siegel-Walfisz theorem gives this theorem with a bound of , and from Exercise 48 of Notes 2 one can obtain a bound of the form if one assumes the generalised Riemann hypothesis. The probabilistic random models from Supplement 4 suggest that one should in fact be able to take .

We will not aim to obtain the optimal exponents for Linnik’s theorem here, and follow the treatment in Chapter 18 of Iwaniec and Kowalski. We will in fact establish the following more quantitative result (a special case of a more powerful theorem of Gallagher), which splits into two cases, depending on whether there is an exceptional zero or not:

Theorem 2 (Quantitative Linnik theorem)Let be a primitive residue class for some . For any , let denote the quantityAssume that for some sufficiently large .

- (i) (No exceptional zero) If all the real zeroes of -functions of real characters of modulus are such that , then
for all and some absolute constant .

- (ii) (Exceptional zero) If there is a zero of an -function of a real character of modulus with for some sufficiently small , then
for all and some absolute constant .

The implied constants here are effective.

Note from the Landau-Page theorem (Exercise 54 from Notes 2) that at most one exceptional zero exists (if is small enough). A key point here is that the error term in the exceptional zero case is an *improvement* over the error term when no exceptional zero is present; this compensates for the potential reduction in the main term coming from the term. The splitting into cases depending on whether an exceptional zero exists or not turns out to be an essential technique in many advanced results in analytic number theory (though presumably such a splitting will one day become unnecessary, once the possibility of exceptional zeroes are finally eliminated for good).

Exercise 3Assuming Theorem 2, and assuming for some sufficiently large absolute constant , establish the lower boundwhen there is no exceptional zero, and

when there is an exceptional zero . Conclude that Theorem 2 implies Theorem 1, regardless of whether an exceptional zero exists or not.

Remark 4The Brun-Titchmarsh theorem (Exercise 33 from Notes 4), in the sharp form of Montgomery and Vaughan, gives thatfor any primitive residue class and any . This is (barely) consistent with the estimate (1). Any lowering of the coefficient in the Brun-Titchmarsh inequality (with reasonable error terms), in the regime when is a large power of , would then lead to at least some elimination of the exceptional zero case. However, this has not led to any progress on the Landau-Siegel zero problem (and may well be just a reformulation of that problem). (When is a relatively small power of , some improvements to Brun-Titchmarsh are possible that are not in contradiction with the presence of an exceptional zero; see this paper of Maynard for more discussion.)

Theorem 2 is deduced in turn from facts about the distribution of zeroes of -functions. We first need a version of the truncated explicit formula that does not lose unnecessary logarithms:

Exercise 5 (Log-free truncated explicit formula)With the hypotheses as above, show thatfor any non-principal character of modulus , where we assume for some large ; for the principal character establish the same formula with an additional term of on the right-hand side. (

Hint:this is almost immediate from Exercise 45(iv) and Theorem 21 ofNotes 2) with (say) , except that there is a factor of in the error term instead of when is extremely large compared to . However, a closer inspection of the proof (particularly with regards to the truncated Perron formula in Proposition 12 of Notes 2) shows that the factor can be replaced fairly easily by . To get rid of the final factor of , note that the proof of Proposition 12 used the rather crude bound . If one replaces this crude bound by more sophisticated tools such as the Brun-Titchmarsh inequality, one will be able to remove the factor of .

Using the Fourier inversion formula

(see Theorem 69 of Notes 1), we thus have

and so it suffices by the triangle inequality (bounding very crudely by , as the contribution of the low-lying zeroes already turns out to be quite dominant) to show that

when no exceptional zero is present, and

when an exceptional zero is present.

To handle the former case (2), one uses two facts about zeroes. The first is the classical zero-free region (Proposition 51 from Notes 2), which we reproduce in our context here:

Proposition 6 (Classical zero-free region)Let . Apart from a potential exceptional zero , all zeroes of -functions with of modulus and are such thatfor some absolute constant .

Using this zero-free region, we have

whenever contributes to the sum in (2), and so the left-hand side of (2) is bounded by

where we recall that is the number of zeroes of any -function of a character of modulus with and (here we use conjugation symmetry to make non-negative, accepting a multiplicative factor of two).

In Exercise 25 of Notes 6, the grand density estimate

is proven. If one inserts this bound into the above expression, one obtains a bound for (2) which is of the form

Unfortunately this is off from what we need by a factor of (and would lead to a weak form of Linnik’s theorem in which was bounded by rather than by ). In the analogous problem for prime number theorems in short intervals, we could use the Vinogradov-Korobov zero-free region to compensate for this loss, but that region does not help here for the contribution of the low-lying zeroes with , which as mentioned before give the dominant contribution. Fortunately, it is possible to remove this logarithmic loss from the zero-density side of things:

Theorem 7 (Log-free grand density estimate)For any and , one hasThe implied constants are effective.

We prove this estimate below the fold. The proof follows the methods of the previous section, but one inserts various sieve weights to restrict sums over natural numbers to essentially become sums over “almost primes”, as this turns out to remove the logarithmic losses. (More generally, the trick of restricting to almost primes by inserting suitable sieve weights is quite useful for avoiding any unnecessary losses of logarithmic factors in analytic number theory estimates.)

Now we turn to the case when there is an exceptional zero (3). The argument used to prove (2) applies here also, but does not gain the factor of in the exponent. To achieve this, we need an additional tool, a version of the Deuring-Heilbronn repulsion phenomenon due to Linnik:

Theorem 9 (Deuring-Heilbronn repulsion phenomenon)Suppose is such that there is an exceptional zero with small. Then all other zeroes of -functions of modulus are such thatIn other words, the exceptional zero enlarges the classical zero-free region by a factor of . The implied constants are effective.

Exercise 10Use Theorem 7 and Theorem 9 to complete the proof of (3), and thus Linnik’s theorem.

Exercise 11Use Theorem 9 to give an alternate proof of (Tatuzawa’s version of) Siegel’s theorem (Theorem 62 of Notes 2). (Hint:if two characters have different moduli, then they can be made to have the same modulus by multiplying by suitable principal characters.)

Theorem 9 is proven by similar methods to that of Theorem 7, the basic idea being to insert a further weight of (in addition to the sieve weights), the point being that the exceptional zero causes this weight to be quite small on the average. There is a strengthening of Theorem 9 due to Bombieri that is along the lines of Theorem 7, obtaining the improvement

with effective implied constants for any and in the presence of an exceptional zero, where the prime in means that the exceptional zero is omitted (thus if ). Note that the upper bound on falls below one when for a sufficiently small , thus recovering Theorem 9. Bombieri’s theorem can be established by the methods in this set of notes, and will be given as an exercise to the reader.

Remark 12There are a number of alternate ways to derive the results in this set of notes, for instance using the Turan power sums method which is based on studying derivatives such asfor and large , and performing various sorts of averaging in to attenuate the contribution of many of the zeroes . We will not develop this method here, but see for instance Chapter 9 of Montgomery’s book. See the text of Friedlander and Iwaniec for yet another approach based primarily on sieve-theoretic ideas.

Remark 13When one optimises all the exponents, it turns out that the exponent in Linnik’s theorem isextremelygood in the presence of an exceptional zero – indeed Friedlander and Iwaniec showed can even get a bound of the form for some , which is even stronger than one can obtain from GRH! There are other places in which exceptional zeroes can be used to obtain results stronger than what one can obtain even on the Riemann hypothesis; for instance, Heath-Brown used the hypothesis of an infinite sequence of Siegel zeroes to obtain the twin prime conejcture.

In the previous set of notes, we studied upper bounds on sums such as for that were valid for all in a given range, such as ; this led in turn to upper bounds on the Riemann zeta for in the same range, and for various choices of . While some improvement over the trivial bound of was obtained by these methods, we did not get close to the conjectural bound of that one expects from pseudorandomness heuristics (assuming that is not too large compared with , e.g. .

However, it turns out that one can get much better bounds if one settles for estimating sums such as , or more generally finite Dirichlet series (also known as *Dirichlet polynomials*) such as , for *most* values of in a given range such as . Equivalently, we will be able to get some control on the *large values* of such Dirichlet polynomials, in the sense that we can control the set of for which exceeds a certain threshold, even if we cannot show that this set is empty. These large value theorems are often closely tied with estimates for *mean values* such as of a Dirichlet series; these latter estimates are thus known as *mean value theorems* for Dirichlet series. Our approach to these theorems will follow the same sort of methods used in Notes 3, in particular relying on the generalised Bessel inequality from those notes.

Our main application of the large value theorems for Dirichlet polynomials will be to control the number of zeroes of the Riemann zeta function (or the Dirichlet -functions ) in various rectangles of the form for various and . These rectangles will be larger than the zero-free regions for which we can exclude zeroes completely, but we will often be able to limit the number of zeroes in such rectangles to be quite small. For instance, we will be able to show the following weak form of the Riemann hypothesis: as , a proportion of zeroes of the Riemann zeta function in the critical strip with will have real part . Related to this, the number of zeroes with and can be shown to be bounded by as for any .

In the next set of notes we will use refined versions of these theorems to establish Linnik’s theorem on the least prime in an arithmetic progression.

Our presentation here is broadly based on Chapters 9 and 10 in Iwaniec and Kowalski, who give a number of more sophisticated large value theorems than the ones discussed here.

The prime number theorem can be expressed as the assertion

is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula

where the second von Mangoldt function is defined by the formula

(We are avoiding the use of the symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate

In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space of compactly supported continuous functions equipped with the convolution operation

This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula

for any element of a unital commutative Banach algebra , where is the space of characters (i.e., continuous unital algebra homomorphisms from to ) of . This formula is due to Gelfand and may be found in any text on Banach algebras; for sake of completeness we prove it below the fold.

The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.

Theorem 1 (Construction of a Banach algebra norm)For any , let denote the quantityThen is a seminorm on with the bound

for all . Furthermore, we have the Banach algebra bound

We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that contains a plentiful supply of local units):

Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum)Let be a seminorm on obeying (7), (8). Suppose that is not identically zero. Then there exists such thatfor all . In particular, by (7), one has

whenever is a non-negative function.

The second is a consequence of the Selberg symmetry formula and the fact that is real (as well as Mertens’ theorem, in the case), and is closely related to the non-vanishing of the Riemann zeta function on the line :

Theorem 3 (Breaking the parity barrier)Let . Then there exists such that is non-negative, and

Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm constructed in Theorem 1 is trivial, and thus

as for any Schwartz function (the decay rate in may depend on ). Specialising to functions of the form for some smooth compactly supported on , we conclude that

as ; by the smooth Urysohn lemma this implies that

as for any fixed , and the prime number theorem then follows by a telescoping series argument.

The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that

for any fixed Dirichlet character ; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity is non-vanishing.

One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations of an integer as the sum of three primes , as a Fourier-analytic integral over the unit circle involving exponential sums such as

where the sum here ranges over all primes up to , and . For instance, the expression mentioned earlier can be written as

The strategy is then to obtain sufficiently accurate bounds on exponential sums such as in order to obtain non-trivial bounds on quantities such as . For instance, if one can show that for all odd integers greater than some given threshold , this implies that all odd integers greater than are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.

Remark 1In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff with a smoother cutoff for a suitable choice of cutoff function , or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function . However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.

In many cases, it turns out that one can get fairly precise evaluations on sums such as in the *major arc* case, when is close to a rational number with small denominator , by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that

and the prime number theorem in residue classes modulo suggests more generally that

when is small and is close to , basically thanks to the elementary calculation that the phase has an average value of when is uniformly distributed amongst the residue classes modulo that are coprime to . Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.

In the *minor arc* case when is not close to a rational with small denominator, one no longer expects to have such precise control on the value of , due to the “pseudorandom” fluctuations of the quantity . Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of “pseudorandom” phases should fluctuate randomly and be of typical magnitude , one expects upper bounds of the shape

for “typical” minor arc . Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that

which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds such as are far more typical.

Because one only expects to have upper bounds on , rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of , so that averaging in is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude oscillates in sympathetic resonance with the phase , thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region :

Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as , at least when is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer is the sum of three primes; my own result that all odd numbers greater than can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis ). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of , such as . (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)

However, I (and many other analytic number theorists) are considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number as the sum of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations of as the *difference* of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large , one has to find a non-trivial lower bound for the quantity

for sufficiently large , as this quantity is also the number of ways to represent as the sum of two primes . Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity

that goes to infinity as , as this quantity is also the number of ways to represent as the difference of two primes less than or equal to .

In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums . Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity or , expresses it in terms of using (5) or (6), and then uses (5) or (6) again to convert these integrals back into a the combinatorial expression of counting solutions to or , and then uses the hypothetical solution to the given problem to obtain the required lower bounds on or .

Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on or (or similar quantities) purely from the upper and lower bounds on or similar quantities (and of various type norms on such quantities, such as the bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds in are yet (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:

- (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
- (ii) Upper and lower bounds on the magnitude of are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are
*extremely*tight (within a relative error of or better); but - (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.

I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.

In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which lets one make very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artifically insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.

A fundamental problem in analytic number theory is to understand the distribution of the prime numbers . For technical reasons, it is convenient not to study the primes directly, but a proxy for the primes known as the von Mangoldt function , defined by setting to equal when is a prime (or a power of that prime) and zero otherwise. The basic reason why the von Mangoldt function is useful is that it encodes the fundamental theorem of arithmetic (which in turn can be viewed as the defining property of the primes) very neatly via the identity

The most important result in this subject is the prime number theorem, which asserts that the number of prime numbers less than a large number is equal to :

Here, of course, denotes a quantity that goes to zero as .

It is not hard to see (e.g. by summation by parts) that this is equivalent to the asymptotic

for the von Mangoldt function (the key point being that the squares, cubes, etc. of primes give a negligible contribution, so is essentially the same quantity as ). Understanding the nature of the term is a very important problem, with the conjectured optimal decay rate of being equivalent to the Riemann hypothesis, but this will not be our concern here.

The prime number theorem has several important generalisations (for instance, there are analogues for other number fields such as the Chebotarev density theorem). One of the more elementary such generalisations is the prime number theorem in arithmetic progressions, which asserts that for fixed and with coprime to (thus ), the number of primes less than equal to mod is equal to , where is the Euler totient function:

(Of course, if is not coprime to , the number of primes less than equal to mod is . The subscript in the and notation denotes that the implied constants in that notation is allowed to depend on .) This is a more quantitative version of Dirichlet’s theorem, which asserts the weaker statement that the number of primes equal to mod is infinite. This theorem is important in many applications in analytic number theory, for instance in Vinogradov’s theorem that every sufficiently large odd number is the sum of three odd primes. (Imagine for instance if almost all of the primes were clustered in the residue class mod , rather than mod . Then almost all sums of three odd primes would be divisible by , leaving dangerously few sums left to cover the remaining two residue classes. Similarly for other moduli than . This does not fully rule out the possibility that Vinogradov’s theorem could still be true, but it does indicate why the prime number theorem in arithmetic progressions is a relevant tool in the proof of that theorem.)

As before, one can rewrite the prime number theorem in arithmetic progressions in terms of the von Mangoldt function as the equivalent form

Philosophically, one of the main reasons why it is so hard to control the distribution of the primes is that we do not currently have too many tools with which one can rule out “conspiracies” between the primes, in which the primes (or the von Mangoldt function) decide to correlate with some structured object (and in particular, with a totally multiplicative function) which then visibly distorts the distribution of the primes. For instance, one could imagine a scenario in which the probability that a randomly chosen large integer is prime is not asymptotic to (as is given by the prime number theorem), but instead to fluctuate depending on the phase of the complex number for some fixed real number , thus for instance the probability might be significantly less than when is close to an integer, and significantly more than when is close to a half-integer. This would contradict the prime number theorem, and so this scenario would have to be somehow eradicated in the course of proving that theorem. In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the Riemann zeta function at .

In the above scenario, the primality of a large integer was somehow sensitive to asymptotic or “Archimedean” information about , namely the approximate value of its logarithm. In modern terminology, this information reflects the local behaviour of at the infinite place . There are also potential consipracies in which the primality of is sensitive to the local behaviour of at finite places, and in particular to the residue class of mod for some fixed modulus . For instance, given a Dirichlet character of modulus , i.e. a completely multiplicative function on the integers which is periodic of period (and vanishes on those integers not coprime to ), one could imagine a scenario in which the probability that a randomly chosen large integer is prime is large when is close to , and small when is close to , which would contradict the prime number theorem in arithmetic progressions. (Note the similarity between this scenario at and the previous scenario at ; in particular, observe that the functions and are both totally multiplicative.) In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the -function of at .

An especially difficult scenario to eliminate is that of *real characters*, such as the Kronecker symbol , in which numbers which are quadratic nonresidues mod are very likely to be prime, and quadratic residues mod are unlikely to be prime. Indeed, there is a scenario of this form – the Siegel zero scenario – which we are still not able to eradicate (without assuming powerful conjectures such as GRH), though fortunately Siegel zeroes are not quite strong enough to destroy the prime number theorem in arithmetic progressions.

It is difficult to prove that no conspiracy between the primes exist. However, it is not entirely impossible, because we have been able to exploit two important phenomena. The first is that there is often a “all or nothing dichotomy” (somewhat resembling the *zero-one laws* in probability) regarding conspiracies: in the asymptotic limit, the primes can either conspire totally (or more precisely, anti-conspire totally) with a multiplicative function, or fail to conspire at all, but there is no middle ground. (In the language of Dirichlet series, this is reflected in the fact that zeroes of a meromorphic function can have order , or order (i.e. are not zeroes after all), but cannot have an intermediate order between and .) As a corollary of this fact, the prime numbers cannot conspire with two distinct multiplicative functions at once (by having a partial correlation with one and another partial correlation with another); thus one can use the existence of one conspiracy to exclude all the others. In other words, there is at most one conspiracy that can significantly distort the distribution of the primes. Unfortunately, this argument is *ineffective*, because it doesn’t give any control at all on what that conspiracy is, or even if it exists in the first place!

But now one can use the second important phenomenon, which is that because of symmetries, one type of conspiracy can lead to another. For instance, because the von Mangoldt function is real-valued rather than complex-valued, we have conjugation symmetry; if the primes correlate with, say, , then they must also correlate with . (In the language of Dirichlet series, this reflects the fact that the zeta function and -functions enjoy symmetries with respect to reflection across the real axis (i.e. complex conjugation).) Combining this observation with the all-or-nothing dichotomy, we conclude that the primes cannot correlate with for any non-zero , which in fact leads directly to the prime number theorem (2), as we shall discuss below. Similarly, if the primes correlated with a Dirichlet character , then they would also correlate with the conjugate , which also is inconsistent with the all-or-nothing dichotomy, except in the exceptional case when is real – which essentially means that is a quadratic character. In this one case (which is the only scenario which comes close to threatening the truth of the prime number theorem in arithmetic progressions), the above tricks fail and one has to instead exploit the algebraic number theory properties of these characters instead, which has so far led to weaker results than in the non-real case.

As mentioned previously in passing, these phenomena are usually presented using the language of Dirichlet series and complex analysis. This is a very slick and powerful way to do things, but I would like here to present the elementary approach to the same topics, which is slightly weaker but which I find to also be very instructive. (However, I will not be *too* dogmatic about keeping things elementary, if this comes at the expense of obscuring the key ideas; in particular, I will rely on multiplicative Fourier analysis (both at and at finite places) as a substitute for complex analysis in order to expedite various parts of the argument. Also, the emphasis here will be more on heuristics and intuition than on rigour.)

The material here is closely related to the theory of *pretentious characters* developed by Granville and Soundararajan, as well as an earlier paper of Granville on elementary proofs of the prime number theorem in arithmetic progressions.

Atle Selberg, who made immense and fundamental contributions to analytic number theory and related areas of mathematics, died last Monday, aged 90.

Selberg’s early work was focused on the study of the Riemann zeta function . In 1942, Selberg showed that a positive fraction of the zeroes of this function lie on the critical line . Apart from improvements in the fraction (the best value currently being a little over 40%, a result of Conrey), this is still one of the strongest partial results we have towards the Riemann hypothesis. (I discuss Selberg’s result, and the method of mollifiers he introduced there, in a little more detail after the jump.)

In working on the zeta function, Selberg developed two powerful tools which are still used routinely in analytic number theory today. The first is the method of mollifiers to smooth out the magnitude oscillations of the zeta function, making the (more interesting) phase oscillation more visible. The second was the method of the Selberg sieve, which is a particularly elegant choice of sieve which allows one to count patterns in almost primes (and hence to upper bound patterns in primes) quite accurately. Variants of the Selberg sieve were a crucial ingredient in, for instance, the recent work of Goldston-Yıldırım-Pintz on prime gaps, as well as the work of Ben Green and myself on arithmetic progressions in primes. (I discuss the Selberg sieve, as well as the Selberg symmetry formula below, in my post on the parity problem. Incidentally, Selberg was the first to formalise this problem as a significant obstruction in sieve theory.)

For all of these achievements, Selberg was awarded the Fields Medal in 1950. Around that time, Selberg and Erdős also produced the first elementary proof of the prime number theorem. A key ingredient here was the Selberg symmetry formula, which is an elementary analogue of the prime number theorem for almost primes.

But perhaps Selberg’s greatest contribution to mathematics was his discovery of the Selberg trace formula, which is a non-abelian generalisation of the Poisson summation formula, and which led to many further deep connections between representation theory and number theory, and in particular being one of the main inspirations for the Langlands program, which in turn has had an impact on many different parts of mathematics (for instance, it plays a role in Wiles’ proof of Fermat’s last theorem). For an introduction to the trace formula, its history, and its impact, I recommend the survey article of Arthur.

Other major contributions of Selberg include the Rankin-Selberg theory connecting Artin L-functions from representation theory to the integrals of automorphic forms (very much in the spirit of the Langlands program), and the Chowla-Selberg formula relating the Gamma function at rational values to the periods of elliptic curves with complex multiplication. He also made an influential conjecture on the spectral gap of the Laplacian on quotients of by congruence groups, which is still open today (Selberg had the first non-trivial partial result). As an example of this conjecture’s impact, Selberg’s eigenvalue conjecture has inspired some recent work of Sarnak-Xue, Gamburd, and Bourgain-Gamburd on new constructions of expander graphs, and has revealed some further connections between number theory and arithmetic combinatorics (such as sum-product theorems); see this announcement of Bourgain-Gamburd-Sarnak for the most recent developments (this work, incidentally, also employs the Selberg sieve). As observed by Satake, Selberg’s eigenvalue conjecture and the more classical Ramanujan-Petersson conjecture can be unified into a single conjecture, now known as the* Ramanujan-Selberg conjecture*; the eigenvalue conjecture is then essentially an archimedean (or “non-dyadic“) special case of the general Ramanujan-Selberg conjecture. (The original (dyadic) Ramanujan-Petersson conjecture was finally proved by Deligne-Serre, after many important contributions by other authors, but the non-dyadic version remains open.)

## Recent Comments