We now move away from the world of *multiplicative* prime number theory covered in Notes 1 and Notes 2, and enter the wider, and complementary, world of *non-multiplicative* prime number theory, in which one studies statistics related to non-multiplicative patterns, such as twins . This creates a major jump in difficulty; for instance, even the most basic multiplicative result about the primes, namely Euclid’s theorem that there are infinitely many of them, remains unproven for twin primes. Of course, the situation is even worse for stronger results, such as Euler’s theorem, Dirichlet’s theorem, or the prime number theorem. Finally, even many multiplicative questions about the primes remain open. The most famous of these is the Riemann hypothesis, which gives the asymptotic (see Proposition 24 from Notes 2). But even if one assumes the Riemann hypothesis, the precise distribution of the error term in the above asymptotic (or in related asymptotics, such as for the sum that measures the distribution of primes in short intervals) is not entirely clear.

Despite this, we do have a number of extremely convincing and well supported *models* for the primes (and related objects) that let us *predict* what the answer to many prime number theory questions (both multiplicative and non-multiplicative) should be, particularly in asymptotic regimes where one can work with aggregate statistics about the primes, rather than with a small number of individual primes. These models are based on taking some statistical distribution related to the primes (e.g. the primality properties of a randomly selected -tuple), and replacing that distribution by a *model* distribution that is easy to compute with (e.g. a distribution with strong joint independence properties). One can then predict the asymptotic value of various (normalised) statistics about the primes by replacing the relevant statistical distributions of the primes with their simplified models. In this non-rigorous setting, many difficult conjectures on the primes reduce to relatively simple calculations; for instance, all four of the (still unsolved) Landau problems may now be justified in the affirmative by one or more of these models. Indeed, the models are so effective at this task that analytic number theory is in the curious position of being able to confidently predict the answer to a large proportion of the open problems in the subject, whilst not possessing a clear way forward to rigorously confirm these answers!

As it turns out, the models for primes that have turned out to be the most accurate in practice are *random* models, which involve (either explicitly or implicitly) one or more random variables. This is despite the prime numbers being obviously deterministic in nature; no coins are flipped or dice rolled to create the set of primes. The point is that while the primes have a lot of obvious *multiplicative* structure (for instance, the product of two primes is never another prime), they do not appear to exhibit much discernible *non-multiplicative* structure asymptotically, in the sense that they rarely exhibit statistical anomalies in the asymptotic limit that cannot be easily explained in terms of the multiplicative properties of the primes. As such, when considering non-multiplicative statistics of the primes, the primes appear to behave pseudorandomly, and can thus be modeled with reasonable accuracy by a random model. And even for multiplicative problems, which are in principle controlled by the zeroes of the Riemann zeta function, one can obtain good predictions by positing various pseudorandomness properties of these zeroes, so that the distribution of these zeroes can be modeled by a random model.

Of course, one cannot expect *perfect* accuracy when replicating a deterministic set such as the primes by a probabilistic model of that set, and each of the heuristic models we discuss below have some limitations to the range of statistics about the primes that they can expect to track with reasonable accuracy. For instance, many of the models about the primes do not fully take into account the multiplicative structure of primes, such as the connection with a zeta function with a meromorphic continuation to the entire complex plane; at the opposite extreme, we have the GUE hypothesis which appears to accurately model the zeta function, but does not capture such basic properties of the primes as the fact that the primes are all natural numbers. Nevertheless, each of the models described below, when deployed within their sphere of reasonable application, has (possibly after some fine-tuning) given predictions that are in remarkable agreement with numerical computation and with known rigorous theoretical results, as well as with other models in overlapping spheres of application; they are also broadly compatible with the general heuristic (discussed in this previous post) that in the absence of any exploitable structure, asymptotic statistics should default to the most “uniform”, “pseudorandom”, or “independent” distribution allowable.

As hinted at above, we do not have a single unified model for the prime numbers (other than the primes themselves, of course), but instead have an overlapping family of useful models that each appear to accurately describe some, but not all, aspects of the prime numbers. In this set of notes, we will discuss four such models:

- The Cramér random model and its refinements, which model the set of prime numbers by a random set.
- The
*Möbius pseudorandomness principle*, which predicts that the Möbius function does not correlate with any genuinely different arithmetic sequence of reasonable “complexity”. - The
*equidistribution of residues principle*, which predicts that the residue classes of a large number modulo a small or medium-sized prime behave as if they are independently and uniformly distributed as varies. - The
*GUE hypothesis*, which asserts that the zeroes of the Riemann zeta function are distributed (at microscopic and mesoscopic scales) like the zeroes of a GUE random matrix, and which generalises the*pair correlation conjecture*regarding pairs of such zeroes.

This is not an exhaustive list of models for the primes and related objects; for instance, there is also the model in which the major arc contribution in the Hardy-Littlewood circle method is predicted to always dominate, and with regards to various finite groups of number-theoretic importance, such as the class groups discussed in Supplement 1, there are also heuristics of Cohen-Lenstra type. Historically, the first heuristic discussion of the primes along these lines was by Sylvester, who worked informally with a model somewhat related to the equidistribution of residues principle. However, we will not discuss any of these models here.

A word of warning: the discussion of the above four models will inevitably be largely informal, and “fuzzy” in nature. While one can certainly make precise formalisations of at least some aspects of these models, one should not be inflexibly wedded to a specific such formalisation as being “the” correct way to pin down the model rigorously. (To quote the statistician George Box: “all models are wrong, but some are useful”.) Indeed, we will see some examples below the fold in which some finer structure in the prime numbers leads to a correction term being added to a “naive” implementation of one of the above models to make it more accurate, and it is perfectly conceivable that some further such fine-tuning will be applied to one or more of these models in the future. These sorts of mathematical models are in some ways closer in nature to the scientific theories used to model the physical world, than they are to the axiomatic theories one is accustomed to in rigorous mathematics, and one should approach the discussion below accordingly. In particular, and in contrast to the other notes in this course, the material here is not *directly* used for proving further theorems, which is why we have marked it as “optional” material. Nevertheless, the heuristics and models here are still used *indirectly* for such purposes, for instance by

- giving a clearer indication of what results one
*expects*to be true, thus guiding one to fruitful conjectures; - providing a quick way to scan for possible errors in a mathematical claim (e.g. by finding that the main term is off from what a model predicts, or an error term is too small);
- gauging the relative strength of various assertions (e.g. classifying some results as “unsurprising”, others as “potential breakthroughs” or “powerful new estimates”, others as “unexpected new phenomena”, and yet others as “way too good to be true”); or
- setting up heuristic barriers (such as the parity barrier) that one has to resolve before resolving certain key problems (e.g. the twin prime conjecture).

See also my previous essay on the distinction between “rigorous” and “post-rigorous” mathematics, or Thurston’s essay discussing, among other things, the “definition-theorem-proof” model of mathematics and its limitations.

Remark 1The material in this set of notes presumes some prior exposure to probability theory. See for instance this previous post for a quick review of the relevant concepts.

** — 1. The Cramér random model — **

We begin by introducing the “naive” Cramér random model for the primes , explicitly introduced by Harold Cramér in 1936. The naive version of the Cramér model is a little too oversimplified to give very accurate results, but tends to give predictions of the right order of magnitude at least, and will serve as a basis for a more accurate version of the Cramér model to be introduced shortly.

The starting point for the model is the prime number theorem, which implies that there are approximately primes between and for any fixed and large . Thus, if one chooses a natural number at random in , the probability that it will be prime is about . Inspired by this, we construct a random model of the primes to be a random subset of the natural numbers, such that each natural number has an independent probability of of lying in , thus

for all distinct . We exclude from consideration in (1) to avoid the issue that is greater than one, and thus cannot serve as a probability. The membership of and in is not of any great importance, but given that is supposed to model , let us require that and . We ignore here the (routine) measure-theoretic machinery needed to rigorously construct a probability space that can support a countably infinite number of random variables, as this set of notes is not focused on rigorous formalism. (See for instance this previous post for the relevant theory needed.)

We then have

Model 2 (Naive Cramér random model)The asymptotic statistics for the set of primes should (almost surely) behave like the asymptotic statistics of the random set .

This model asserts that while the set of primes is certainly deterministic, it is pseudorandom, in the sense that it behaves like a random set, namely . The naive version of the Cramér model uses a random set that has no structure beyond its density.

Let’s see some basic predictions of this model. The first is that it recovers the Riemann hypothesis (cf. Proposition 24 of Notes 2):

Prediction 3 (Riemann hypothesis)For any fixed , we have

*Proof:* (Uses naive Cramér model) The contribution of those that are squares or higher order of primes can be easily seen to be , so it suffices to show that

We then invoke the Cramér random model, and verify that almost surely one has

for all (here we allow the implied constant in the notation to be random). By the Borel-Cantelli lemma, it suffices for each fixed choice of (which we can take to be a natural number) to obtain the estimate

Fix , and consider the random variable

which we rewrite as

where

From (1), the have mean zero and are independent, and have size . Thus

for any fixed natural number (the only terms in that give a non-zero contribution are those in which each appears at least twice, so there are at most distinct indices of that arise, and so there are only such terms, each contributing . From Chebyshev’s inequality this implies that with probability . Setting sufficiently large depending on , we obtain the claim.

Remark 4One could also have proven the above claim using concentration of measure results such as the Chernoff inequality, as discussed in this previous blog post. The estimate (2) reflects the general heuristic ofexpected square root cancellation: when summing “independent” expressions, each of size , the sum should typically deviate by around its expected value, thus saving a factor of about over the trivial deviation bound of .

Exercise 5Use the naive Cramér model to predict the size of the error term iCn the first, second, and third Mertens theorems (Theorems 33, 36, 42 from Notes 1). (Note that the Cramér model will not predict the constant in the third Mertens theorem, as this constant is sensitive to the values of small primes, which the Cramér model does not accurately model; however, the model can predict a rate of convergence with replaced by an unspecified constant, which can thena posterioribe matched with using the existing version of the third Mertens theorem.)

Remark 6The fact that the naive Cramér model recovers essentially the same prediction on the prime number error term as the Riemann hypothesis (up to factors of ) is encouraging as far as evidence for the Riemann hypothesis is concerned. But one should caution that due to the multiplicative structure of the von Mangoldt function (as evidenced by the explicit formula), the naive Cramér model does not quite give the right bounds on error terms if one looks at the factors more carefully. For instance, from the smoothed explicit formula (Exercise 22 of Notes 2) it is a routine matter (using Proposition 16 from Notes 2 and Exercise 28 of Supplement 2) to arrive at the boundassuming the Riemann hypothesis for a fixed smooth, compactly supported function and all . On the other hand, if one applies the naive Cramér random model, the quantity becomes modeled by a quantity of mean and variance comparable to ; we leave this computation as an exercise to the interested reader. This suggests that the error term should be closer to than to . Thus we see that the explicit formula, which is a feature of the genuine primes that are not present in the Cramér random model, bestows (on the Riemann hypothesis) slightly subrandom behaviour on the primes – the error terms can in fact be slightly better than what a purely random model predicts! (See Section 4 below for further examples of this subrandom behaviour.) Ultimately, the reason for this is that the multiplicative semigroup generated by the primes – that is to say, the natural numbers – is unusually well distributed in the sense that , which we have seen in previous notes to lead to an meromorphic continuation of the zeta function to the left of the critical line. In contrast, almost any conceivable Cramér-type random model for the primes will only have a distribution of at best (where we view as a multiset, so the sum is weighted by multiplicity), and so cannot capture consequences of this subrandom distribution of , such as the explicit formula. However, it appears that such deviations from random behaviour are only significant if one wishes to obtain accuracy better than in one’s estimates, and so one should not be too concerned about such deviations if one is willing to concede these errors. See this paper of Pintz for further discussion of this deviation of the primes from Cramér type models.

Next, we give another prediction of the naive Cramér model, which was in fact Cramér’s original motivation for introducing the model. For any , let

denote the largest gap between the primes up to , where denotes the prime.

Note that this conjecture would imply Legendre’s conjecture that there is always a prime between and , at least for sufficiently large ; indeed, any bound of the shape would suffice for this purpose. Intuitively, the reason for this is that with an average density of , the probability that a given gap between primes exceeds should be like ; as one has gaps to play with, the largest gap that one still expects to actually occur would then be of size with , which explains the factor.

*Proof:* (Uses naive Cramér model) Let be the elements of the random set in increasing order, and let be the random variable

Invoking the naive Cramér model, we will verify that almost surely one has

as . By the Borel-Cantelli lemma (and restricting to be a power of two), it suffices to show for each and that one has

We begin with (4). To show this bound, it suffices to show that every discrete interval in of cardinality contains at least one element of . There are such intervals, so by the union bound it suffices to show that for any given interval , the probability that contains at least one element of is at least . From (1) and the inclusion-exclusion principle, this probability is precisely

Upper bounding by , this probability is at least , and the claim follows.

Now we show (5). Observe that the interval contains disjoint discrete intervals of cardinality . It will suffice to show that with probability , at least one of these intervals is disjoint from . (It is easy to see that the interval will contain at least one element of with probability .) By (1), the probability that a given interval has this property is

Lower bounding by , where the goes to zero as keeping fixed, we see that the probability that a given is disjoint from is at least

As the events that are disjoint from are independent in , the probability that none of them are disjoint from is then at most

and the claim follows.

As it turns out, we now believe this prediction of the naive Cramér model to be slightly incorrect; see below. For the most recent rigorous lower bounds on , see this recent post. As for upper bounds, it was shown by Cramér on the Riemann hypothesis that , and a slight further improvement under the additional assumption of the pair correlation conjecture (see Section 4 below) was obtained by Heath-Brown.

Our final example of the naive Cramér model is to the twin prime conjecture.

Prediction 8 (Twin prime asymptotic, first attempt)We haveas . In particular, there are infinitely many primes such that is also prime.

Intuitively, the reason for this asymptotic is that for most of the numbers between and , each of and have an independent probability of about being prime (or more precisely, of lying in the Cramér model for the primes), so the probability that they are both prime should be about .

*Proof:* (Uses naive Cramér model) We will show that

almost surely. By the Borel-Cantelli lemma again, it suffices to show that for each , we have (6) with probability (say) as .

Fix . We write the left-hand side as

where

To estimate the sum , we can easily check that the contribution with is , and for the remaining portion we have , and thus

It thus suffices to show that

with probability . But from (1), each has mean zero, is bounded by , and is independent of unless , from which we easily calculate that

A direct application of Chebyshev’s inequality then gives the required bound with a failure probability of , which is not quite good enough for our purposes; but a similar calculation gives

which is sufficient. (Alternatively, one can first round to the nearest power of for a small fixed , in which case Borel-Cantelli would already work with a failure probability as large as, say, .)

Exercise 9Sharpen the prediction of the naive Cramér model tofor any and .

Unfortunately, the above argument reveals a significant drawback to the naive Cramér model, in that the same argument also predicts the asymptotic

implying that there are an infinite number of *consecutive* primes . This is of course absurd, since all but one of the primes are odd. The problem is that the naive Cramér model does not take into account the severe irregularity in the distribution of the primes with respect to the residue classes modulo . One also needs to take into account irregularity modulo , to avoid predicting the infinitude of prime triplets (note that as one of must be divisible by , the only prime triplet is in fact ); and so forth. We formalise this with the following revision of the Cramér model. Given a parameter , let denote the product of all the primes up to (this is sometimes known as the primorial of ). Note that after removing the primes less than or equal to , all the remaining primes lie in the residue classes with coprime to . From the prime number theorem in arithmetic progressions (Exercise 50 from Notes 2), we expect (for sufficiently large depending on ) each such residue class to contain about primes less than or equal to , compared to about natural numbers less than or equal to , leading to a density of . This motivates a variant of the random set , defined by letting consist (deterministically) of the primes up to , together with a random subset of those natural numbers coprime to , with

for distinct coprime to . (The minimum here is to ensure that probabilities stay less than , and can be largely ignored in practice.) Thus, for instance, consists of the prime together with a random set of odd numbers, with each large odd number having an independent probability of of lying in . Similarly, consists of the primes and , together with a random set of numbers coprime to , with each such large number having an independent probability of of lying in . We then have the following (somewhat imprecise) modification of the Cramér model:

Model 10 (Modified Cramér random model)The asymptotic statistics for the set of primes up to some threshold should (almost surely) behave like the asymptotic statistics of the random set for slowly growing in (or alternatively as the limit of the statistics of for fixed , in the limit ).

This model is not as precise as the naive Cramér model, because it does not specify exactly how is to depend on , but in many applications it does not matter so much (except perhaps for predicting the size of error terms) exactly what is, so long as it grows slowly (e.g. or slower).

We illustrate this model with an refined prediction for the twin prime conjecture.

Prediction 11 (Refined twin prime asymptotic)We haveas , where is the twin primes constant, defined as

In particular, there are infinitely many primes such that is also prime.

In analogy with Exercise 9, one is led to the refined prediction

for any fixed . This prediction matches well with numerics. For instance, the number of twin primes up to is , which is within of the prediction given by the above formula (in contrast, (8) is off by about thirty percent). (See this web page of Gourdon and Sebah for some further numerical statistics on twin primes.)

*Proof:* (Uses modified Cramér model; informal) The argument here is essentially due to Pólya. We study the statistic

for fixed and large . The primes less than or equal to contribute to this statistic, so we restrict attention to those with . A residue class only contributes to the above statistic in case and are both coprime to ; by the Chinese remainder theorem, the number of such residue classes is

On the other hand, a modification of the computation in Prediction 8 shows that the contribution of a given residue class with both coprime to should be

with “high probability” (we do not quantify this term here). Thus the total size of (8) should be approximately

This simplifies as

Letting grow slowly to infinity, we obtain the claim.

Note now that we do not run into the problem of the Cramér model predicting infinitely many pairs of consecutive primes (as will avoid such patterns once ), or infinitely many prime triplets , etc.

The modified Cramér model already predicts answers (and asymptotics) to many of the basic unsolved questions in analytic prime number theory, including an affirmative answer to all four of Landau’s problems; these are discussed in the exercises below.

Exercise 12 (Goldbach conjecture)Let be a large even number. Use the modified Cramér model to predict that the number of ways to represent as the sum of two primes is equal to

as . In particular, the even Goldbach conjecture should be true for sufficiently large even numbers. Given that this conjecture has been numerically verified for even integers up to about , it is thus widely expected that this conjecture is true.

Exercise 13 (Hardy-Littlewood prime tuples conjecture)Let , and let be a tuple of distinct integers which areadmissiblein the sense that for every prime , there is at least one residue class that is disjoint from all of the . (For instance, is admissible, but or is not admissible.) Use the modified Cramér model to predict the Hardy-Littlewood prime tuples conjecture:

as , where is the

singular seriesand is the number of distinct residue classes that the lie in modulo . Show also that is convergent to a non-zero number when the tuple is admissible; in particular, the modified Cramér model predicts that there are infinitely many such that are simultaneously prime.

Again in analogy with Exercise 9, one expects to be able to improve (10) to

for any and ; we shall refer to this as the *strong Hardy-Littlewood prime tuples conjecture*.

Remark 14The even Goldbach conjecture and the prime tuples conjecture are still unsolved. However, if one performs more averaging over additional parameters, the analogous asymptotics to (9) or (10) can be proven in many cases, thus giving some support to the accuracy of the modified Cramér model. For instance, Vinogradov famously established (via the Hardy-Littlewood circle method) the analogue to (9) for the number of representations of a large odd natural number as the sum ofthreeprimes, implying in particular that every sufficiently large odd number was the sum of three primes. Similarly, an averaged version of the Hardy-Littlewood conjecture was obtained by Balog, and a general higher-dimensional version of this conjecture in a series of papers by Green, Ziegler, and myself. This allows one to unconditionally establish the asymptotics predicted by the modified Cramér model for statistics such as the number of arithmetic progressions of length consisting solely of primes less than a given threshold .

Exercise 15 (Landau’s fourth problem)Use the modified Cramér model to predict thatwhere

and is the number of residue classes such that , with convergent to a positive non-zero number. (To show the latter claim, you will need quadratic reciprocity (Exercise 59 from Notes 2) and Dirichlet’s theorem on primes in arithmetic progression, in the form of Exercise 81 of Notes 1.) In particular, the modified Cramér model predicts an affirmative answer to Landau’s fourth problem, namely that there are infinitely many such that is prime.

Remark 16A common generalisation of the previous two exercises is the Bateman-Horn conjecture, that asserts that given a collection of distinct non-constant irreducible polynomials that take integer values at the integers, have positive leading coefficients, and such that for every prime there is a natural number such that are not divisible by , one hasas , where is the product of the degrees of the polynomials , and

where is the number of residue classes such that . This can be justified from the modified Cramér model by similar reasoning to the above, although to show that the product converges to a non-zero number requires some additional algebraic number theory input (the Chebotarev density theorem, which can be viewed as a generalisation of Dirichlet’s theorem) which we are not covering here. A consequence of the Bateman-Horn conjecture is Schinzel’s hypothesis H, which asserts under the above hypotheses that there are infinitely many natural numbers such that are simultaneously prime.

We saw that the twin prime conjecture asymptotic (Prediction 8) had to be adjusted (to Prediction 11) when the naive Cramér model was replaced by the modified Cramér model. At the other extreme of prime gap behaviour, a somewhat analogous modification of the Cramér conjecture (Conjecture 7) on was shown by Granville:

Prediction 17 (Modified Cramér conjecture)We have as .

Since , we thus see that Prediction 8 is now expected to be incorrect by over ten percent. It is uncertain whether the quantity can be improved further, however it seems reasonable to continue to conjecture a weak form of the Cramér conjecture.

*Proof:* (Uses modified Cramér model) The basic idea here is not to look at *all* prime gaps, but gaps associated to given residue classes modulo , for a well chosen choice of . This is an instance of the Maier matrix method, which we will revisit later in this course.

More specifically, fix a small quantity . Set ; the reason for this choice is that this is almost the largest choice that keeps significantly less than , since from the prime number theorem one has

where are the elements of in increasing order. We will show that almost surely one has as , which leads to the above prediction by applying the modified Cramér model with this choice of .

By Borel-Cantelli, it suffices to show that with probability for sufficiently large . Set . Suppose we can find an natural number between and such that the numbers all lie outside of . It is easy to see that with probability there is at least one element of in , and so we will have in this case. So it suffices to show that with probability there is at least one with .

Fix . The key point is that an unusually large number of the elements of the sequence are automatically excluded from by virtue of having a common factor with ; probabilistic heuristics suggest that at least or so of the elements should be coprime, but the truth turns out to be closer to , which is ultimately where the gain of over Conjecture 7 is coming from. Indeed, since for large enough (and small), we see from the sieve of Erathosthenes that the only elements in this sequence which are coprime to are , , for primes , and for primes . Let us count how many such elements there are. From the prime number theorem there are at most

elements of the form . From the prime number theorem again, the number of elements of the form , noting that and , is at most

By choice of , we have for in this sum, so by Mertens’ theorem this contribution is at most

which simplifies to

Thus the total number of elements in that are coprime to is at most . In the modified Cramér model, each of these elements has an independent probability of of lying in . From Mertens’ third theorem we have

and so each of the has an independent probability of of lying in . The probability that none of the lie in is thus at least

These events are independent in , and there are choices of . Thus the probability that none of these events occurs is at most

which is as required.

** — 2. Möbius pseudorandomness — **

We now turn to a somewhat different model, which is not based on modeling the set of primes directly, but rather the Möbius function , which we have seen in Notes 1 to be closely related to the distribution of the primes. One can also work with the closely related Liouville function , if one wants to consider a function consisting of pure sign patterns with no zeroes.

One could adopt a Cramér type model for the Möbius or Liouville functions by replacing them with a random arithmetic function, or perhaps a random *multiplicative* arithmetic function, taking values in or . While one can certainly do this (see e.g. this paper of Harper for a recent study of such models), we will not directly build such models here, but instead focus on studying *linear statistics*

of the Möbius function (or analogous statistics for the Liouville function), tested against various arithmetic functions of interest. The Möbius and Liouville statistics are closely related, with estimates on one of these sums usually leading quite quickly in practice to analogous estimates on the other sum:

Exercise 18Establish the identitiesand

for any arithmetic function and any . (In practice, due to the absolute convergence of , the summation in is fairly manageable, so as a first approximation the above identities are asserting that and have comparable behaviour.)

Henceforth we focus solely on the Möbius statistics. Let us normalise to be bounded in size by , then we have the trivial bound

This bound can be sharp if oscillates with the same sign as , for instance if is equal to itself (or to ). However, in most other situations we expect significant cancellation. One can informally (and imprecisely) summarise this principle as follows:

Principle 19 (Möbius pseudorandomness principle)Let be a large quantity, and let be a function bounded by , which is not deliberately chosen so as to oscillate with the same sign as , and has “polynomial complexity” in some sense (e.g. it is defined using parameters that are natural numbers of size . Then we predict one of the following cancellation properties:

- (Weak pseudorandomness) One has as .
- (Moderate pseudorandomness) One has for any .
- (Strong pseudorandomness) One has for any .
Similarly if we replace by for some non-constant polynomial of degree that takes natural number values on , and which is of polynomial complexity in the sense that its coefficients are .

Note that unlike the Cramér random model, the Möbius pseudorandomness principle is not explicitly probabilistic in nature; the predictions here are completely deterministic in nature, and do not require any analysis of a random model. However, the principle is certainly *compatible* with various random models for the Möbius function; for instance, if one models by a random function from to that is independent of all functions for which one wishes to apply this principle, then one can justify the strong form of this pseudorandomness principle from the Chernoff inequality (or from arguments similar to that used to justify Prediction 3).

The above principle is imprecise, as we have not formally described what functions are admissible for this principle. One clean formalisation of the principle is the *Sarnak conjecture*, in which is generated by a zero-entropy dynamical system; see this survey of Sarnak or this previous blog post for more discussion. However, we will stick with the more imprecise form of Principle 19, as it covers some situations that are not strictly included in the Sarnak conjecture.

We have allowed for three levels of possible strength for this principle. The weak form of this principle can actually be established rigorously for a somewhat broad range of functions , but is not particularly useful for applications concerning primes (basically because the primes are so sparse within the integers that any properties about primes one wants to discern will usually be lost in the error). The moderate form of this principle can be rigorously established for a smaller range of functions than the weak form, but the error bounds are strong enough to make non-trivial predictions regarding the primes. The strong form of course gives the best predictions, but has not been proven unconditionally for any non-trivial of number-theoretic interest, although as noted above it is certainly compatible with Cramér type models for .

Setting in the above principle, we see that the strong Möbius pseudorandomness principle gives

which as noted in Exercise 33 of Notes 2 was equivalent to the Riemann hypothesis. Using shifted polynomials we also obtain the variants

for . One could be more ambitious and extend this asymptotic to larger values of , but Cramér-type probabilistic heuristics suggest that if is too large (of exponential size in ), then just from (pseudo)-random fluctuations in the sign patten of the Möbius function, one should eventually end up with a choice of in which all (or most) of the signs of for line up, leading to a failure of (14). However, the probabilistic heuristics indicate that (14) is a reasonable claim to make for restricted to be of polynomial size in . More generally, by using the affine polynomials we arrive at the prediction

At the other extreme, the weak form of the Möbius pseudorandomness principle gives

which as noted in Theorem 58 of Notes 1 is equivalent to the prime number theorem. It should then be unsurprising that the moderate form of the conjecture gives a version of the prime number theorem with an error term better than , but worse than predicted by the Riemann hypothesis:

for any and .

*Proof:* (Uses moderate Möbius pseudorandomness) We use the Dirichlet hyperbola method. Since , we may write

From the moderate Möbius pseudorandomness principle (applied with replaced by and , and then subtracting), we have

for any , so the contribution of the second sum can be seen to be

Meanwhile, from Stirling’s formula (see equation (8) of Notes 1) one has

and thus

Now from further applications of the Möbius pseudorandomness principle and dyadic decomposition, we see that

and

for sufficiently small . But on noting that for near , we see that and converge to and respectively as , and the claim follows.

Of course, the above prediction already follows from the prime number theorem with classical error term (Corollary 39 of Notes 2). But one can achieve much more general results by using other weights than the constant weight in Proposition 19. For instance, since has no obvious propensity to oscillate with the same sign as , we expect to have

from the weak Möbius pseudorandomness principle; more generally, we expect the Chowla conjecture

for any distinct , and more generally still that

whenever are distinct irreducible polynomials of degree and coefficients that attain natural number values when . Of course, if one assumes the moderate or strong Möbius pseudorandomness principle one can improve the decay rates here to or . Using such asymptotics, one can adapt the justification of Prediction 20 to recover asymptotics such as the twin prime asymptotic in Prediction 11:

Prediction 21 (Improved twin prime asymptotic, again)We havefor any and , where is the twin prime constant from Prediction 11.

It is not difficult to see that this prediction implies Prediction 11, thus demonstrating some consistency between the modified Cramér model and the Möbius pseudorandomness heuristic. We give further examples of this consistency in the exercises below.

*Proof:* (uses moderate Möbius pseudorandomness) To avoid some minor complications later in the argument we will restrict the sum over to odd ; this is easily accomplished because the contribution of even is . For odd we have

where is the principal character of modulus . We can then write the odd contribution to as

We first consider the contribution of those with (say), which we write as

An application of moderate Möbius pseudorandomness (dividing by to make it bounded) gives

for any , so the contribution of this case is acceptable. Thus it will suffice to show that

We write the left-hand side as

where

We once again use the expansion (16) to write this as

Using moderate Möbius pseudorandomness again, the contribution of those with is acceptable, so it suffices to show that

for any . The left-hand side may be rearranged as

As are restricted to be odd, we see that the inner sum vanishes unless and are coprime. In that case, we see from the Chinese remainder theorem the inner sum is counting the portion of an arithmetic progression of modulus that is less than or equal to , so the inner sum is . The contribution of the term is easily seen to be , which is acceptable, so it suffices to show that

for any .

From moderate Möbius pseudorandomness (applied to either the or sum, depending on which of is larger) we have

for any and any , ; by dyadic decomposition we may thus approximate the left-hand side of (17) up to acceptable errors by

if is small enough. But this may be written as

where

Factoring the series over as an Euler product, we have

Comparing this with the Euler product , we have

where

For , the second product is absolutely convergent and defines a smooth function in . Thus, extends smoothly to a neighbourhood of , and the left-hand side of (18) can then be written (using the fact that has a simple pole of residue at ) as , which can be easily computed to equal as required.

Exercise 22 (Goldbach conjecture again)Use moderate Möbius pseudorandomness to give another justification of the Goldbach conjecture (9); indeed, establish the stronger asymptotic

for any even and . (

Hint:here, the condition of coprimality of has to be replaced by a more complicated condition involving , and a denominator of has to similarly be replaced by , where is the least common multiple of . Nevertheless, this more complicated expression can still be handled by Euler products as previously. On the plus side, with these complications there is no additional need to introduce the character .)

Exercise 23Use moderate Möbius pseudorandomness to give another justification of the Hardy-Littlewood prime tuples conjecture (10); indeed, establish the stronger asymptoticfor any and , under the hypotheses of Exercise 13.

Exercise 24Use moderate Möbius pseudorandomness to justify the claimfor any primitive residue class , , and , assuming that for some . What goes wrong when is larger than ?

Remark 25A Cramér type model (using in place of ) predicts the same asymptotic with as large as , rather than . This exposes some limitations of the Möbius pseudorandomness principle as formulated in Principle 19, in that it is not cannot obtain some of the asymptotics that Cramér type models can reach, particularly when trying to detect primes in very sparse sets of integers. (For similar reasons, Principle 19 has trouble recovering the asymptotic in Exercise 15; it also does not appear to be an appropriate tool for predicting the size of the largest prime gaps.) One can address this weakness of Principle 19 by assuming a stronger version of pseudorandomness, in which the function is allowed to be concentrated on a sparse set of integers, rather than being uniformly bounded by , but we will not detail this variant here.

Remark 26Using the above sorts of arguments, one might expect that square root cancellation for Möbius correlations (i.e., the strong Möbius pseudorandomness conjecture) might lead to square root error terms for von Mangoldt correlations . However, this is not always the case. For instance, in Appendix C of this paper of Iwaniec, Luo, and Sarnak, the strong Möbius pseudorandomness conjecture is used to predict an unusual non-square-root cancellationfor some non-zero , where is a smooth compactly supported function and

is a normalised modulated version of the Ramanujan tau function (not to be confused with the divisor function , which we have called in previous notes). This asymptotic does not seem to be easily explainable by a Cramér type model, but is still believed to be accurate, as even the strong Möbius pseudorandomness conjecture appears to be quite reliable in practice (but see the caveat below for the function field analogue).

Remark 27A huge portion of analytic number theory over the natural numbers has a counterpart in thefunction field setting, in which the ring of integers is replaced with the ring of polynomials over some finite field , with the notion of primality, Möbius function, etc. modified accordingly. As it turns out, in that setting there is a modification that needs to be made to the Möbius pseudorandomness principle, due to biases in the Möbius function applied to polynomials in this ring, particularly when the characteristic of divides the degree of . For instance, in characteristic three, the Möbius function of the degree polynomial is twice as likely to be as it is to be (see Example 3.2 of this paper of Conrad, Conrad, and Gross). Similarly, in an arbitrary finite field , the Möbius function of the degree polynomial is never , despite being irreducible. We refer the reader to the Conrad-Conrad-Gross paper for further discussion of these biases, and how to adjust the pseudorandomness principle to account for them; but we will not discuss the function field setting further in these notes.

** — 3. Equidistribution of residues — **

We now discuss a model closely related to the Cramér model, concerning the residue classes of a randomly chosen large number with respect to various residues . For sake of discussion, let us select uniformly at random from the natural numbers less than , for some large quantity . Each residue class then occupies possible values of , so is equal to with probability . Thus, for large compared to , the residue class is approximately uniformly distributed in the cyclic group .

Now we consider the *joint* distribution of the residue classes as varies. From the Chinese remainder theorem, if and are coprime, then the residue class is completely determined by the residue classes and . Thus, in principle at least, we only need to consider the joint distribution of for moduli that are prime powers . To simplify the discussion, we will just consider residue classes for prime. This is already enough information (in principle, at least) for many questions about the primes. For instance, to determine whether a number between and is a twin prime (in the sense that are both prime), it is necessary and sufficient that avoids the residue classes for all , thanks to the sieve of Eratosthenes.

If are distinct primes with product much less than , then by the previous discussion, is approximately uniformly distributed in , and so by the Chinese remainder theorem, the residue classes are approximately jointly independent. But what happens in the opposite case, when is much larger than ? Then is not close to uniformly distributed in in any strong sense (such as in the total variation sense), since it only takes of the possible values in . However, one can still hope for the random variables to be approximately independent in some *weak* sense, in that various probabilistic statistics of these random variables approximately match what they would be if these random variables were genuinely independent.

For instance, suppose one is interested in predicting the number of natural numbers less than or equal to which are *-rough* in the sense that they are not divisible by any prime less than , for some . If we pick uniformly at random from the natural numbers up to , then the probability that is not divisible by any given prime (i.e., that ) is approximately . Assuming that the residue classes behave as if they are independent, we then expect that the probability that is not divisible by *any* of the primes should be approximately . Thus we arrive at the naive prediction

As it turns out, this prediction works well for small , but becomes inaccurate when is very large. Consider for instance the case . The sieve of Erathosthenes then tells us that consists precisely of the primes with , thus

If we have (in the asymptotic regime ), then the prime number theorem tells us that

In contrast, Mertens’ third theorem (Theorem 42 of Notes 1) gives

Thus we see that the naive independence prediction is incorrect in the range (except when is close to , but one should view this as a numerical coincidence); this failure is sometimes known as the *Mertens paradox*. In retrospect, one can see this after noting that for any two distinct primes , the events that and are not independent, since they can hold separately, but not simultaneously. There is a similar issue for slightly smaller primes: for instance, if , the events , , can hold separately, but not simultaneously. As such, one should expect the naive independence prediction to typical fail when for a bounded exponent . This is indeed the case, as worked out in the following exercise (which is the “dual” of Exercise 39 from Notes 1).

Exercise 28 (Density of rough numbers)Let be an asymptotic parameter tending to infinity.

- (i) Show that for any fixed , one has
where the Buchstab function is defined by the series

with the convention that . (Note that for any given that only finitely many of the summands are non-zero; one can also view the first term as the term of the summation after carefully working out what zero-dimensional spaces and empty products evaluate to).

- (ii) (Delay-differential equation) Show that is continuous on except at , continuously differentiable except at , equals for and obeys the equation
for with . Give a heuristic justification for this equation by considering how varies with respect to small perturbations of .

- (iii) (Wirsing integral equation) Show that
Give a heuristic justification for this equation by starting with a -rough number and considering a factor , where is a prime factor of chosen at random with probability (if occurs times in the prime factorisation of ).

- (iv) (Laplace transform) Show that
for all .

- (v) (Duality with Dickman function) If is the Dickman function from Exercise 39 of Notes 1, show that
for all .

- (vi) (Asymptotic constancy) For , show that for all sufficiently large . Conclude in particular that converges to a limit as .
- (vii) (Limiting value) Show that converges to as . (
Hint:Use (iv) with small, combined with (vi).)

Remark 29One can model the dependencies of the events for large more accurately by using a more sophisticated model than a jointly independent model, namely aPoisson-Dirichlet process, discussed in this previous post. This model can be used for instance to recover for instance the Dickman and Buchstab statistics for smooth and rough numbers respectively, thus also helps explain the factor appearing in (22). However, we will not discuss this process further here.

Parts (i) and (vii) of Exercise 28 suggest that the naive prediction (20) is in fact quite accurate for with large, even if it becomes inaccurate when is small. We now informally generalise this to a heuristic:

Heuristic 30 (Equidistribution of residues)Let as , and let be chosen uniformly at random from the numbers up to . Then the residues for behave “as if” they were independent random variables. Similarly for other “reasonable” probability distributions on natural numbers up to (e.g. the uniform distribution on natural numbers between and ). A similar heuristic can apply for as large as if one is prepared to accept some error terms which go to zero as goes to infinity.

In later notes, we will develop the basics of sieve theory, which provides a number of rigorous interpretations of the above heuristic (such as the fundamental lemma of sieve theory).

Because the heuristic stops working for as large as or , one cannot use Heuristic 30 directly to count primes to make predictions for problems such as Landau’s problems. However, one can input this heuristic into the modified Cramér model (Model 10), effectively giving the ability to take the parameter in that model to be as large as . (But, as mentioned before, in many applications the precise choice of the parameter is not terribly important, so long as it is eventually taken to go to infinity.)

Alternatively, by stopping the sieve of Eratosthenes early, one can use Heuristic (30) to predict *upper bounds* for various prime statistics. For instance, let us return to the problem of counting prime twins with . We select a natural number uniformly between and and ask for the probability that are simultaneously prime (the contribution of those primes below is negligible). By the sieve of Eratosthenes, this is equivalent to requiring that avoid the residue classes , for all . We cannot apply the heuristic all the way to , so let us stop at for some small , with the assumption that the error terms will be somehow manageable if is small enough. For each , the probability that avoids both of and is , except when , where the probability is . Thus, by Heuristic 30, the probability that avoids for all should be approximately

Writing as and recalling the twin prime constant (7), this probability is approximately

which by Mertens’ third theorem is approximately

leading to an upper bound of approximately for the number of twin primes up to . As with the problem of counting , this upper bound fails to recover the predicted asymptotic from Prediction 11 if one naively sets equal to or , but it is in some sense off “by the same factor” that (22) is off from (21). In particular, if one formally sets equal to the “magic” value of , one recovers Prediction 11; the use of this magic value has been proposed by Pólya as a computational trick for arriving at suitable predictions for prime asymptotics up to , though I would be cautious about using this choice of blindly. One can perform an analogous upper bound analysis to the problems considered in Exercises 12, 13, 15; we leave this as an exercise to the interested reader.

** — 4. The pair correlation conjecture and the GUE hypothesis — **

We now turn to what appears at first glance to be a completely different way to model the primes, via the explicit formula connecting primes with zeroes of the zeta function.

To simplify the exposition, and to obtain the strongest results and predictions, we will work in this section under the assumption of the Riemann hypothesis (Conjecture 28 from Notes 2), which from the functional equation (Theorem 1 from Supplement 3) is equivalent to the assertion that all the non-trivial zeroes of the Riemann zeta function lie on the critical line . To emphasise this fact, we will write such zeroes as for various real numbers , known as the *ordinates* of non-trivial zeroes of the zeta function. All sums over (or , etc.) are understood to be over such ordinates. For the purposes of notational simplicity, we will also assume that the zeroes of are simple, so the ordinates are all distinct; it is not difficult to modify the discussion and notation below to take into account the (unlikely) possibility of repeated zeroes, but we will leave this as an exercise to the interested reader.

The explicit formula connects primes with the ordinates . There are many equivalent ways to express this formula, but we will begin with the smoothed explicit formula (Exercise 46 from Supplement 3), which asserts that

for any smooth compactly supported supported on , where is the Fourier-Laplace transform. (One could also use the Riemann-Weil explicit formula, see Exercise 47 of Supplement 3, which has the advantage of not requiring to be supported on the positive real axis.) We can get rid of the and factors and the inconvenient shift of in the sum over ordinates by introducing the normalised signed measure on the real line defined by

and then the explicit formula (after replacing by ) becomes

for any smooth compactly supported supported on . This implies a multidimensional analogue

of the explicit formula, where and is a smooth compactly supported function supported on , and is now the multidimensional Fourier transform

Exercise 31Derive (25) from (24). (Hint:first handle the case when the multidimensional function decomposes as a finite linear combination of tensor products of one-dimensional smooth compactly supported functions supported on , and use the Stone-Weierstrass theorem and a limiting argument to then obtain the general case. See this paper of Rudnick and Sarnak for a more general version of (25).

One can use (25) with various choices of to relate various correlations between the primes to correlations between ordinates. To illustrate this, let us apply (25) with and equal to the function

for some (which we view as an asymptotic parameter going to infinity), and are fixed smooth compactly supported functions with supported on ; it will be convenient to normalise . This may seem like a somewhat arbitrary choice of test function , but informally this choice is measuring a smoothed pair correlation between primes of size which differ by a multiplicative factor of .

With this choice of , (25) becomes (after a routine change of variables)

Let us first look at the right-hand side of (26). From conjugation symmetry we see that the ordinates are symmetric around the origin, so on replacing with , we can rewrite this expression as

where we have introduced the normalising factor . The significance of this factor comes from the Riemann-von Mangoldt formula (Theorem 40 from Supplement 3), which tells us that the number of ordinates in is equal to , so the average spacing between ordinates in this interval is approximately . The sum is a smoothed count of ordinates that lie within of .

We can use the Riemann-von Mangoldt formula to isolate the diagonal term:

Exercise 32Show that

The right-hand side of (26) is thus

Let us now make the hypothesis that the *normalised differences* between distinct ordinates of size (as weighted by ) have a limiting density , in the sense that

for any fixed smooth, rapidly decreasing function ; we refer to as the *pair correlation function*. Intuitively, should be thought of as the asymptotic probability that a randomly chosen ordinate has another ordinate with normalised difference lying in the interval if is infinitesimal. If we assume such an asymptotic, then the right-hand side of (26) is now

Since the average spacing between ordinates is , we expect to be asymptotically equal to , at least on average, so we write as . By hypothesis , so from the Fourier inversion formula we have . Thus the left-hand side of (26) is now

which (assuming is absolutely integrable) can be expressed using Fubini’s theorem as

Of course, to make use of this formula we will need to know or conjecture something about the pair correlation function . To do this we return to the left-hand side of (26). In principle, we can use (23) to rewrite this as an expression involving the von Mangoldt function; very roughly, the left-hand side of (26) is a normalised variance of the error term in the prime number theorem in smoothed out versions of the interval with drawn from the support of . The second term on the right-hand side in (23) is somewhat annoying to deal with in general, but if is supported close enough to the origin then things simplify somewhat:

Proposition 33If is supported in , then the left-hand side of (26) is equal to

*Proof:* By hypothesis, is supported on for some fixed . In this argument we will allow implied constants to depend on .

Expanding out (23), we see that the left-hand side of (26) is the sum of the four terms

We first inspect (32). From the support of and we can write

on the support of the integrand. The contribution of the error to (32) is , so (32) simplifies after a change of variables to

Next we turn to (30). On the support of the integrand we have

and the integral vanishes unless . From this we see that the contribution of the error is , and (30) simplifies to

As we are assuming the Riemann hypothesis, we see from Proposition 24 from Notes 2 that

for all in the support of . A routine summation by parts then shows that

and so (30) simplifies to

As for (31), a Taylor expansion gives

on the support of the integrand, and the integral vanishes unless . We can then repeat the argument used to estimate (30) to obtain the same estimate (33) for (31). Adding together all these estimates, we obtain the claim.

We thus arrive at the asymptotic

whenever are fixed functions with and supported in . We can use this relation to try to recover the pair correlation function , or at least the portion of this function described by the Fourier transform with .

We first deal with the easy case when is in fact supported on :

Proposition 34 (Easy case)If is supported on , then the left-hand side of (34) is equal to

Note that when is supported on , the first term on the right-hand side of (34) is , and so this proposition indicates that

for .

*Proof:* By hypothesis, is supported in for some fixed . The summand in the left-hand side of (34) vanishes unless and , which implies that . But are natural numbers, and so (for large enough) this forces . Thus the left-hand side of (34) simplifies (using the normalisation ) to

The contribution of those that are prime powers for is easily seen to be , so we can write this expression as

The claim now follows from Exercise 37 of Notes 1.

Now we look at the complementary case when is supported on , and thus on for some fixed . (This skips over a transitional case when is supported near , but it turns out that the analysis there is basically a mixture of the two other cases, and is left as an exercise to the reader below.) This is significantly more difficult, because are no longer forced to be equal. As such, we are unable to compute the left-hand side of (34) just from the Riemann hypothesis; we will need an even stronger claim, namely the strong Hardy-Littlewood prime tuples conjecture (11) (or more precisely, the case of this conjecture) to make headway.

The diagonal contribution to the left-hand side of (34) can still be computed to be by the arguments used to prove Proposition 34, so we focus on the contribution of the off-diagonal term . We can write with non-zero, arriving at

Note that the summand vanishes unless and , so that . By Taylor expansion, we then have

and

The contribution of the errors can be easily seen to be , so the off-diagonal contribution to (34) simplifies to

Now, the strong Hardy-Littlewood prime tuples conjecture gives

(say) for any and non-zero with , where is the singular series

A summation by parts then lets us write as the sum of the integral

and the error term

For the error term, the derivative can be computed to be , and vanish unless , so the error term is , and then summing over non-zero we see that this contribution is . Thus the off-diagonal contribution has simplified (after some rearranging) to

To proceed further we need to understand mean values of the singular series . It turns out (as first observed by Bombieri and Davenport, and then generalised by Gallagher) has average value of , (see e.g. this paper of Ford for a short proof, related to the equidistribution of residues heuristics discussed previously). However, for our analysis we will need a finer estimate that detects a deviation from coming from the exclusion of from the sum (where the singular series is undefined). Namely:

Proposition 35 (Mean value of singular series)Let , and let be a fixed smooth, compactly supported function. Then

Informally, the term here is reflecting a very slight but non-trivial repulsion estimate among the primes (detected by the modified Cramér model, but not the naive Cramér model): if is prime, and is small and *non-zero*, then is slightly less likely to be prime than a typical number of size . This is because most small primes will not divide either or , and so will be slightly more likely to divide than a typical number of this size. (Conversely, the small primes that do divide will increase the probability that is prime, but because is restricted to be non-zero, this latter effect is slightly weaker than the former.)

*Proof:* We allow implied constants to depend on . Since vanishes for odd , we can restrict attention to even . We can then factor

where is the twin prime constant (7) (cf. (9)). This can then be expanded as

where is the multiplicative function supported on square-free numbers with and for . Writing , we may therefore write the left-hand side of (35) as

The summand vanishes unless . In this regime, we may apply Exercise 11 of Notes 1 (or the Poisson summation formula, see Exercise 35 of Supplement 2) and restore the term to conclude that

Noting that whenever is sufficiently large, we see that . From this we see that the contribution of the term is . The left-hand side of (35) is now

Using the bound again, we see that ; also, from Theorem 27 of Notes 1 (with ) we have

From (7) we have

and from Euler products we have

and the claim follows.

We insert this proposition into the previous asymptotic. The contribution of the error in (35) is easily seen to be , so the off-diagonal contribution to (34) is now

where we have used the normalisation . The contribution of the term can be written after a change of variables as

that is to say the first term of (34). A similar change of variables expresses the contribution of the term as

Combining this with the diagonal contribution of to (34) computed previously, we conclude (on the strong Hardy-Littlewood prime tuples conjecture) that the left-hand side of (34) is equal to

when is supported on .

Exercise 36Assuming the strong Hardy-Littlewood prime tuples conjecture, obtain the combined formulafor the left-hand side of (34) when is supported on .

By interchanging and in (27) one sees that the asymptotic pair correlation function should be even. If one compares the above exercise with (34), one arrives at the prediction

for all ; the justification provided for this prediction requires the strong Hardy-Littlewood prime tuples conjecture when , but is unconditional for .

Remark 37If one uses the strong Möbius pseudorandomness principle (see Principle 19) in place of the strong Hardy-Littlewood prime tuples conjecture, one will also recover (36), but in a range intermediate between and , due to some inefficiencies in deploying this principle to study the primes, as discussed in Remark 26. The moderate or weak Möbius pseudorandomness principles do not appear to give any improvement over the easy range , except possibly to clarify some endpoint behaviour at ; this is because the term we are trying to isolate in (34), namely the second term on the right-hand side, is dominated by the first term on the right-hand side when , and so very strong asymptotics are needed in order to discern this lower order behaviour. It seems of interest though to come up with a stronger version of Möbius pseudorandomness (possibly involving multidimensional sums rather than just one-dimensional sums) that could recover (36) in the full range , or possibly even beyond.

Based on a similar calculation to the above, Montgomery conjectured that (36) should in fact hold for *all* , not just for . Using Fourier inversion, this leads to a prediction

for the pair correlation function for all , with the convention that the sinc function equals when . Note that vanishes to second order as ; this indicates a strong *repulsion* effect between ordinates (far stronger than the weak repulsion between primes discussed previously), asserting roughly speaking that the probability that the gap between adjacent ordinates is less than decays like , rather than the naive prediction of which would occur if there was no repulsion phenomenon.

Exercise 38Formally derive (37) from the hypothesis that (36) holds for all .

We informally refer to the prediction (37) for all as the pair correlation conjecture. Actually, because the above analysis only assumes the asymptotic (27) for smooth compactly supported , we have only been working thus far with a smoothed version of the pair correlation conjecture. In the literature, stronger versions of the pair correlation conjecture (replacing with rougher cutoffs) are sometimes formulated; also, in some conventions, a Dirac mass is added to in order to remove the restriction of distinct ordinates; as such, one should view the pair correlation conjecture as a *family* of closely related conjectures, rather than a single conjecture (cf. the Möbius pseudorandomness principle, Principle 19). With such strengthenings of the pair correlation conjecture, one can obtain asymptotics for rougher expressions than (26). For instance, we have the following elegant result of Goldston: assuming the Riemann hypothesis, the statement

is logically equivalent to the statement

- (Variance asymptotic for prime number theorem in short intervals) One has
as for any fixed , where is the von Mangoldt summatory function.

which cleanly illustrates the duality between correlations of ordinates and correlations of primes. The implication of (39) from (38) was previously obtained by Gallagher and Mueller. See also this paper of Goldston and Montgomery for a similar equivalence (in which both of the above statements are replaced by slightly stronger versions with less averaging), as well as this paper of Keating. One should view the above variance asymptotic as a prediction that the statistic for has variance about for . (Note that is the derivative of .) In contrast, the naive Cramér model predicts a variance of approximately ; there is a variance saturation effect expected for which is coming from the slight repulsion between nearby primes discussed previously; this is a further manifestation of the slight “subrandom” nature of the primes, as discussed in Remark 6. See this survey of Goldston for a further discussion of these results; see also this paper of Montgomery and Soundarajan for a more refined analysis, predicting a central limit theorem for the prime number theorem in short intervals that is consistent both with the refined Cramér model and the GUE hypothesis (see below).

Exercise 39Modify the proof of Proposition 34 to verify (39) unconditionally for .

The pair correlation conjecture is also supported by substantial numerical evidence, notably through the computations of Andrew Odlyzko and his co-authors.

It was famously observed by Freeman Dyson, in a tea-time conversation with Hugh Montgomery at the institute for advanced study, that the pair correlation function (37) predicted for ordinates of the zeta function also appeared as the pair correlation function for many models of random matrix theory. Here is one typical such place where it appears:

Theorem 40 (Pair correlation for GUE)Let be a natural number going to infinity, and let be an Hermitian random matrix whose diagonal entries are independent real Gaussians of mean zero and variance , and whose strictly upper triangular entries are independent complex Gaussians of mean zero and variance (independent of the diagonal entries); this random matrix ensemble is known as the Gaussian unitary ensemble (GUE). Let be the (necessarily real) eigenvalues of , counted with multiplicity. Let be fixed, and define the normalisation factor . Then we havefor any fixed whenever goes to infinity slower than , and where is the function (37).

See this previous blog post for a proof of this claim. The factor of reflects the famous Wigner semicircular law for the GUE ensemble, which does not directly appear in the theory of ordinates of zeta (the nearest analogue is the Riemann-von Mangoldt formula). But the appearance of the pair correlation function (37) does not appear to be closely tied with bulk distribution laws such as the Wigner law. For instance, here is another occurrence of the pair correlation function (37) in random matrix theory that does not involve the semicircle law:

Theorem 41 (Pair correlation for CUE)Let be a natural number going to infinity, and let be an unitary random matrix chosen using Haar measure; this random matrix ensemble is known as the circular unitary ensemble (CUE). Let be the (necessarily unitary) eigenvalues of , counted with multiplicity, with . Let be fixed, and define the normalisation factor . Then we havefor any fixed whenever goes to infinity slower than , and where is the function (37).

*Proof:* This is a result of Dyson.

In fact, the pair correlation function arises in extremely broad classes of random matrices, reflecting a general phenomenon now known as *random matrix universality*; see this previous blog post for some non-technical discussion of this phenomenon.

This result inspired a more general *GUE hypothesis*, which broadly speaking asserts that the (suitably normalised) distribution of ordinates of size at the “microscopic” scale of (or at the “mesoscopic” scale between and ) as should asymptotically attain the same limit as the analogous (suitably normalised) asymptotics of the eigenvalues of GUE (or CUE) at the analogous microscopic or mesoscopic scales as . (Roughly speaking, is analogous to , as can be seen by comparing the normalising factors in all of the above examples.) An analogous hypothesis has also been made for other -functions, although in some cases the GUE model needs to be replaced by another standard random matrix model; see this expository article of Katz and Sarnak for details. As such, rigorous results about the asymptotic distribution of eigenvalues of GUE or CUE can be used to make *predictions* about the distribution of ordinates, which through the explicit formula can also lead to predictions about prime numbers.

For instance, the theory of ensembles such as GUE and CUE in fact provides control not just of pair correlations such as

but of higher order correlations involving -tuples of nearby eigenvalues for . We will not state the precise results here, but remark simply that the role of the two-point correlation function from (37) is replaced by the more general function

with the convention that . Again, we refer the interested reader to this previous blog post for the relationship between these correlation functions and the GUE ensemble. The determinantal structure of the -point correlation function is an example of the correlation function of a more general class of determinantal point processes, discussed in this previous blog post.

The -point correlation asymptotics for GUE or CUE leads to one precise instance (but certainly not the *only* such instance) of the GUE hypothesis:

Prediction 42 (GUE hypothesis)Let be fixed, and let be a smooth compactly supported function. Let be an asymptotic parameter going to infinity, and set . Thenwhere is defined by (40).

Much as the pair correlation conjecture leads (via the explicit formula) to asymptotics about pairs of primes that are close to each other, the more general GUE hypothesis turns out to lead to asymptotics about pairs of *products of primes* of primes (for some ) that are close to each other, although the precise description of such asymptotics becomes somewhat messy; see this paper of Rodgers for a precise formulation. One can then develop an analogue of Proposition 34 to obtain a partial verification of the GUE hypothesis (essentially verifying that the low frequency component of (40) is correct); this was done (in great generality, for a much wider class of -functions than just the Riemann zeta function, and assuming GRH in place of RH) by Rudnick and Sarnak (with some earlier results by Hejhal). The analogous deployment of the strong Hardy-Littlewood tuples conjecture to verify a larger portion of the GUE hypothesis was carried out by Montgomery and Soundararajan. A heuristic derivation of the entire GUE hypothesis along these lines (without rigorous treatment of error terms) was carried out by Bogomolny and Keating. A function field analogue of the GUE hypothesis, in which the integers are replaced by a function field for some large , was also established by Katz and Sarnak.

As mentioned above, Prediction 42 is not the only way to formalise the GUE hypothesis. The GUE hypothesis can be used to predict the behaviour of the Riemann zeta function , by drawing an analogy between this function and the characteristic polynomial of a GUE or CUE matrix. This can be used for instance to predict central limit theorems for the logarithm of the zeta function, or asymptotics for moments of the zeta function (or its logarithm, or logarithmic derivative) on or near the critical line; furthermore, in the few cases where these predictions can actually be verified rigorously (or on hypotheses such as the Riemann hypothesis), the predictions agree with the rigorous computations. There is now a vast literature on this topic; see for instance this survey of Keating and Snaith for further discussion.

Remark 43Using the analogy between the statistics of the zeta function and random matrix theory, the Riemann hypothesis is then analogous to the trivial fact that the eigenvalues of a GUE matrix are all real (or that the eigenvalues of a CUE matrix all lie on the unit circle). This is reminiscent of an old idea of Hilbert and Pólya that the ordinates of zeta should arise as the eigenvalues of some “natural” unbounded linear operator on a Hilbert space, which could then be shown to be self-adjoint in order to verify the Riemann hypothesis. Unfortunately, the GUE hypothesis does not suggest a useful candidate for such an operator. In fact, the breadth of the universality phenomenon extends well beyond random matrix models, and so the GUE hypothesis may not actually indicate any direct connection between zeta and matrices (or linear operators), instead reflecting the universal nature of the correlation functions (40).

Remark 44As with all of the other models discussed in these notes, one should bear in mind that there are finer adjustments to the GUE hypothesis that need to be made in certain regimes. For instance, it appears that there is a very slight repulsion effect between differences of two large ordinates and one small ordinate , in that is slightly less likely to be close to than a direct application of the GUE hypothesis (or Riemann-von Mangoldt law) would indicate. This repulsion was first observed by Bogolmony and Keating, and is ultimately a reflection of the obvious negative correlation between and ; see the papers of Snaith or of Rodgers for further discussion.

## 51 comments

Comments feed for this article

4 January, 2015 at 4:57 pm

Eytan PaldiThe RHS of (39) seems to be negative(!) for and sufficiently large .

[Corrected, thanks – T.]4 January, 2015 at 8:57 pm

anthonyquasIn prediction 3, I guess it should say that the non-vanishing terms in the expectation are from the ‘s that appear at *least* twice (not at most).

[Corrected, thanks – T.]4 January, 2015 at 11:42 pm

Avi LevyRight after Prediction 11, I believe there is a spurious term in the refined estimate. Should it read:

[Corrected, thanks – T.]5 January, 2015 at 3:16 am

AnonymousIs there a similar probabilistic prediction of EH conjecture?

[Yes, though if one uses only the naive Cramer model one predicts a version of EH that turns out to be a bit too strong to be true. This will be discussed in the notes on sieve theory, which should appear in a few weeks. -T.]6 January, 2015 at 6:17 am

arch1pairs of prime triplets -> prime triplets?

[Corrected, thanks – T.]6 January, 2015 at 9:35 am

MrCactu5 (@MonsieurCactus)I remember getting frustrated the other day I couldn’t find enough primes. Euclid’s argument shows if you have a complete list of primes you can generate one more. But if you just have some of them, you don’t know anything. Then I found some primes on the internet and I was able to produce some of the randomness you speak of.

In fact, if you wanted to prove a universality phenomenon, you could start by arguing that the Gaussian distribution looks the same at all scales, just have to change the variance. The renormalization group flow is much harder to describe in other situations. Nor do I have any idea what it is!

7 January, 2015 at 9:19 pm

arch1“…we will restrict the sum n to odd n; this is easily accomplished the contribution of even n is O(1)” -> “…we will restrict the sum *over* n to odd n; this is easily accomplished *because* the contribution of even n is O(1)”

But this be written as -> But this *may* be written as

view the first as the j=1 term -> view the first *term* as the j=1 term

Also: The number of formula (23) gets clipped in my (Chrome) browser

[Corrected, thanks – T.]9 January, 2015 at 9:27 am

AnonymousThe distribution of prime numbers is always in every interval (Dirichlet serie)

bewen: (30a + 11) and (42a + 43) ; a = (1; 2;3 ;4: 5; …….)

The number of primes per interval ranges from a minimum of two and a maximum of seven.

The Riemann Hypotesis has resolved and shows that there are only zeros for ( a = 1/2) if not also for ( a = 1/3 ; 1/5 ; ……..); in all convergent series have the absence of any ; p > 3

I present one of the series:

and:

U. Journal of Applied Mathematic (2013)

13 January, 2015 at 10:41 am

AnonymousThere is a polynomial generates prime numnbers only (but not all prime numbers).

With, x = (1;2;3;4;5;6……….)

p = (13;53;213;853;3413;13653;……….)

13 January, 2015 at 10:47 am

Terence TaoUnfortunately, the next entry in this sequence (which, incidentally, is not a polynomial sequence) is not prime. (In fact, whenever , the quantity generated by this sequence will be divisible by 13.)

19 June, 2015 at 9:36 am

Sridhar Ramesh213 isn’t prime… It’s 3 * 71. Similarly, 13653 is also divisible by 3, and in general, every third element of this sequence is divisible by 3 [as the remainders modulo 3 increase by 1 each time].

13 January, 2015 at 11:23 am

arch1Naive Q about the Naive Cramer random model: The denominator in the RHS of (1) is the log of a tower of exponentials. Does that tower of exponentials have any independent significance in this context?

13 January, 2015 at 11:36 am

arch1Oh – I think it signifies that I forgot how logarithms work:-)

13 January, 2015 at 7:09 pm

Eytan PaldiFor a probabilistic prediction of RH with the simplicity of the zeta zeros, consider the Mertens function .

It is known that RH + simplicity of zeta zeros is implied by

(1)

Using probabilistic arguments (the law of iterated logarithm) it was conjectured that

(2)

The current conjecture (using more refined analysis) is

(3)

(see e.g. http://arxiv.org/abs/math/0310381)

Showing subrandom growth of .

Clearly (1) is implied by both (2) and (3).

15 January, 2015 at 8:24 pm

arch1Beginner Q: Can you hint at how the Borel-Cantelli lemma is used to establish that (3) is almost surely true for all x? A straightforward application of the lemma (using the probability estimate you cite) seems to establish only that the probability of (3) being *false* for an infinite number of values of x, is 0.

16 January, 2015 at 9:35 am

Terence TaoIf (3) only fails for a finite number of values, then it holds for all values after increasing the implied constant in the notation.

29 January, 2015 at 11:33 am

254A, Supplement 5: The linear sieve and Chen’s theorem (optional) | What's new[…] number theorem for the Liouville function (Exercise 41 from Notes 2, combined with Exercise 18 from Supplement 4) we […]

13 February, 2015 at 10:16 pm

254A, Notes 6: Large values of Dirichlet polynomials, zero density estimates, and primes in short intervals | What's new[…] studied, and are conjectured by Conrey and Gonek (using the random matrix model, see Section 4 of Supplement 4) to be asymptotic to for certain explicit constants , but this remains unproven. It can be shown […]

18 February, 2015 at 1:51 pm

254A, Notes 4: Some sieve theory | What's new[…] from the equidistribution of residues principle (Section 3 of Supplement 4), bearing in mind (from the modified Cramér model, see Section 1 of Supplement […]

22 February, 2015 at 9:01 am

254A, Notes 7: Linnik’s theorem on primes in arithmetic progressions | What's new[…] the form if one assumes the generalised Riemann hypothesis. The probabilistic random models from Supplement 4 suggest that one should in fact be able to take […]

24 February, 2015 at 11:29 am

254A, Supplement 6: A cheap version of the theorems of Halasz and Matomaki-Radziwill | What's new[…] simple sieving argument (see Exercise 18 of Supplement 4) shows that one can replace by the Möbius function and obtain the same conclusion. See this […]

30 March, 2015 at 12:49 pm

254A, Notes 8: The Hardy-Littlewood circle method and Vinogradov’s theorem | What's new[…] Exercise 4 Obtain a heuristic derivation of the main term using the modified Cramér model (Section 1 of Supplement 4). […]

2 May, 2015 at 5:09 am

billyProfessor Tao

Is there any proof that the limit of the possibility that the twin prime conjecture is correct equals 1 when prime numbers approach infinity?

Thank you in advance

19 May, 2015 at 1:46 am

Anonymousin order to make prediction 11 rigorous, define for

where the product is over all twin primes .

This function is well defined, non-zero and analytic for (if the twin prime conjecture is false it is defined and analytic for ).

It seems that this function plays a similar role for a possible proof of prediction 11 as the role played by for the PNT.

More precisely, it seems that if the second derivative of has an analytic continuation for a neighborhood of the line except for a simple pole at than prediction 11 implies (via Perron’s formula) that the residue of this pole should be (i.e. the main term constant in prediction 11).

Is it possible to prove prediction 11 by such (conjectural) properties of ?

19 May, 2015 at 6:55 am

Terence TaoOne can indeed reformulate the Hardy-Littlewood asymptotic for the number of twin primes (with strong error term) as an assertion about the analytic continuation of , but unless one has access to some other characterisation of that does not involve twin primes, this is basically just a restatement of the original problem. The reason why is such a useful tool is that it has two different interpretations, one as an Euler product and another as the Dirichlet series associated to the natural numbers, which have additive structure as well as multiplicative structure (leading in particular to the Poisson summation formula and thence to the functional equation). In contrast, is the Dirichlet series of the multiplicative semigroup generated by the twin primes, which has no evident additive structure (other than that of being a subset of the natural numbers). In principle, if one knew something further about this multiplicative semigroup, one could say something about twin primes, but thus far the only things we can say about this semigroup come from known or conjectured facts about twin primes, so one is not really making genuine progress through this reformulation at present.

19 May, 2015 at 10:59 pm

AnonymousIf has a meromorphic continuation for a neighborhood of with a pole at , then the second derivative of should have a double pole (not a simple pole as stated in my previous comment) at with the (conjectured) residue as above.

20 May, 2015 at 8:22 am

AnonymousBy expanding where is the modified Mobius function corresponding to the twin primes, and assuming(!) the corresponding modified RH, i.e.

It follows (via summation by parts) that has an analytic continuation to the half plane (i.e. has a meromorphic non-zero continuation to ).

25 May, 2015 at 11:40 pm

Nick CookIn the proof of Conjecture 7 (using the naive Cramer model), the argument for the upper bound seems to give a bound of rather that for the failure probability, so maybe this should be applied to a lacu

[Corrected, thanks – T.]25 May, 2015 at 11:48 pm

Nick CookTypos in the proof of Prediction 17 (using modified Cramer model): in the paragraph starting “Fix k.”, should be , and a couple of lines later “not coprime to ” should be “coprime to “.

[Corrected, thanks – T.]12 July, 2015 at 6:25 am

AnonymousLandau’s fourth problem may be generalized by defining an integer to be a “Landau number” if there are infinitely many primes of the form . Is there any known “landau number”? (or even a proof that the set of “Landau numbers” is nonempty?)

13 July, 2015 at 8:35 am

Terence TaoAs far as I know even the weaker statement that there exists a nonlinear polynomial which attains prime values infinitely often is still open (though everyone believes it to be true, of course). Informally, the basic problem is that the set of large numbers which are representable as nonlinear polynomials with bounded coefficients and degree applied to a medium-sized input (with ) is just too sparse a set for any of our current methods to be capable of detecting primes.

23 October, 2015 at 8:06 am

275A, Notes 3: The weak and strong law of large numbers | What's new[…] number theory. (It can be refined to give what are believed to be more accurate predictions; see this previous blog post for further […]

19 November, 2015 at 3:08 pm

sylvainjulienThe following link might be of some interest: http://ideasfornumbertheory.com/2015/11/18/a-pnt-based-variance-inequality-for-cramers-conjecture/

4 January, 2016 at 12:34 am

Pumping the Primes | TRIFORCE STUDIO[…] the individual primes fall within the interval. For more on randomness and the primes, see the erudite essay of Terry Tao.There’s nothing random or indeterminate about the distribution of the primes, and […]

20 January, 2016 at 1:24 pm

AnonymousIs it possible to predict by the above probabilistic models a distribution law for Mersenne primes? similarly, is it possible to predict that there are only finitely many Fermat primes?

22 January, 2016 at 2:52 pm

Terence TaoYes, though things are a little tricky for Mersenne and Fermat primes because of some patterns in the distribution of modulo various primes, and I think we are in general not as confident in our heuristics for such exponentially sparse sequences as we are for, say, twin primes, for which much more numerical evidence is available. See e.g. http://www.utm.edu/staff/caldwell/preprints/Heuristics.pdf for a discussion.

23 January, 2016 at 3:26 am

AnonymousThanks for your answer and the link!

31 January, 2016 at 11:21 am

ZakIf one applies the naive Cramer model to provide a hueristic to log(x)-rough numbers, that is locating an interval free of log(x)-rough numbers, then this predicts order of magnitude . But it seems to me if one assumes a uniform k-tuples conjecture, one could prove something significantly stronger using your work with Ford, Green, Konyagin and Maynard. I am curious what is going on here.

5 February, 2016 at 9:04 am

ZakI made a silly mistake in the previous post, but I still have a question. The Cramer model for log(x)-rough numbers would predict gaps of order log(x)log(log(x)), yet it is conjectured that , which would improve upon gaps for log(x)-rough number beyond the Cramer model.

5 February, 2016 at 11:20 am

Terence TaoYes, the primes have (or are conjectured to have) unexpected irregularity (not predicted by the naive Cramer model) when measured in intervals of length close to due to the largeness of the Jacobsthal function, a fact first observed by Maier. See this survey of Granville for some more discussion.

7 March, 2016 at 6:42 pm

Alan ChangAt the end of prediction 3, it says “From Chebyshev’s inequality this implies that with probability .” Should this be “with probability “?

[Corrected, thanks – T.]14 March, 2016 at 4:22 pm

Biases between consecutive primes | What's new[…] nature of the bias.) The naive Cramér random model for the primes (discussed for instance in this post) suggests, as a first approximation, that for any , the number of prime differences that are equal […]

2 April, 2016 at 11:47 pm

AnonymousConsider Ramanujan’s description (as told by Hardy) of 1729 as “the smallest number expressible as a sum of two cubes in two different ways”, is there any probabilistic prediction for the size of this number with the corresponding property for a general exponent ?

3 April, 2016 at 10:12 am

Terence TaoIt is believed that no such solutions exist for , see https://en.wikipedia.org/wiki/Lander,_Parkin,_and_Selfridge_conjecture ; this is consistent with the probabilistic heuristics of this post. For , some parametric infinite families of solutions are known, starting with . One can in fact find (a translation of) Euler’s original article on this on the arXiv at http://arxiv.org/abs/math/0505629 . The situation here is similar to that of the FLT, in that naive probabilistic heuristics, not taking into account the presence of low genus curves, give slightly incorrect predictions.

12 April, 2016 at 5:57 pm

Bayesianism, frequentism, and the planted clique, or do algorithms believe in unicorns? | Windows On Theory[…] hope is to obtain general heuristic methods that, like random models for the primes in number theory or the replica method in statistical physics, would allow us to predict the right […]

9 July, 2016 at 4:13 am

AnonymousIs there any probabilistic (perhaps even effective) prediction for the abc conjecture?

20 July, 2016 at 2:34 pm

Clarify and justify how get the derivative of the Laplace transform of the Buchstab function - mathfreebook[…] of Buchstab function, and its Laplace transform from Tao What’s new? I say Exercise 28 of 254A, Supplement 4: Probabilistic models and heuristics for primes, and take required theorems from Wikipedia Differentiation under the integral sign. I¡ve computed […]

30 August, 2016 at 4:10 pm

Heuristic computation of correlations of the divisor function | What's new[…] of (2) and modify it a bit, in the spirit of the “modified Cramér models” from this previous blog post. Our arguments here will be even less rigorous than in the preceding section, for instance we shall […]

5 November, 2016 at 10:52 pm

Ravi BajajThe last remark was over my head but Remark 43 really clarifies the situation. I didn’t actually get that universality was at play in the GUE hypothesis… I was misled by the idea that you refer to about operators.

6 November, 2016 at 6:08 pm

Ravi BajajFollow Up: As I study measure theory… this is sort of the goal of the output of my understanding: to know this PDF:

6 November, 2016 at 6:10 pm

Ravi BajajFollow Up: As I study measure theory… this is sort of the goal of the output of my understanding: to know this article “Nonclassical stochastic flows

and continuous products” by Tsilerson.