A fundamental problem in analytic number theory is to understand the distribution of the prime numbers . For technical reasons, it is convenient not to study the primes directly, but a proxy for the primes known as the von Mangoldt function , defined by setting to equal when is a prime (or a power of that prime) and zero otherwise. The basic reason why the von Mangoldt function is useful is that it encodes the fundamental theorem of arithmetic (which in turn can be viewed as the defining property of the primes) very neatly via the identity

The most important result in this subject is the prime number theorem, which asserts that the number of prime numbers less than a large number is equal to :

Here, of course, denotes a quantity that goes to zero as .

It is not hard to see (e.g. by summation by parts) that this is equivalent to the asymptotic

for the von Mangoldt function (the key point being that the squares, cubes, etc. of primes give a negligible contribution, so is essentially the same quantity as ). Understanding the nature of the term is a very important problem, with the conjectured optimal decay rate of being equivalent to the Riemann hypothesis, but this will not be our concern here.

The prime number theorem has several important generalisations (for instance, there are analogues for other number fields such as the Chebotarev density theorem). One of the more elementary such generalisations is the prime number theorem in arithmetic progressions, which asserts that for fixed and with coprime to (thus ), the number of primes less than equal to mod is equal to , where is the Euler totient function:

(Of course, if is not coprime to , the number of primes less than equal to mod is . The subscript in the and notation denotes that the implied constants in that notation is allowed to depend on .) This is a more quantitative version of Dirichlet’s theorem, which asserts the weaker statement that the number of primes equal to mod is infinite. This theorem is important in many applications in analytic number theory, for instance in Vinogradov’s theorem that every sufficiently large odd number is the sum of three odd primes. (Imagine for instance if almost all of the primes were clustered in the residue class mod , rather than mod . Then almost all sums of three odd primes would be divisible by , leaving dangerously few sums left to cover the remaining two residue classes. Similarly for other moduli than . This does not fully rule out the possibility that Vinogradov’s theorem could still be true, but it does indicate why the prime number theorem in arithmetic progressions is a relevant tool in the proof of that theorem.)

As before, one can rewrite the prime number theorem in arithmetic progressions in terms of the von Mangoldt function as the equivalent form

Philosophically, one of the main reasons why it is so hard to control the distribution of the primes is that we do not currently have too many tools with which one can rule out “conspiracies” between the primes, in which the primes (or the von Mangoldt function) decide to correlate with some structured object (and in particular, with a totally multiplicative function) which then visibly distorts the distribution of the primes. For instance, one could imagine a scenario in which the probability that a randomly chosen large integer is prime is not asymptotic to (as is given by the prime number theorem), but instead to fluctuate depending on the phase of the complex number for some fixed real number , thus for instance the probability might be significantly less than when is close to an integer, and significantly more than when is close to a half-integer. This would contradict the prime number theorem, and so this scenario would have to be somehow eradicated in the course of proving that theorem. In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the Riemann zeta function at .

In the above scenario, the primality of a large integer was somehow sensitive to asymptotic or “Archimedean” information about , namely the approximate value of its logarithm. In modern terminology, this information reflects the local behaviour of at the infinite place . There are also potential consipracies in which the primality of is sensitive to the local behaviour of at finite places, and in particular to the residue class of mod for some fixed modulus . For instance, given a Dirichlet character of modulus , i.e. a completely multiplicative function on the integers which is periodic of period (and vanishes on those integers not coprime to ), one could imagine a scenario in which the probability that a randomly chosen large integer is prime is large when is close to , and small when is close to , which would contradict the prime number theorem in arithmetic progressions. (Note the similarity between this scenario at and the previous scenario at ; in particular, observe that the functions and are both totally multiplicative.) In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the -function of at .

An especially difficult scenario to eliminate is that of *real characters*, such as the Kronecker symbol , in which numbers which are quadratic nonresidues mod are very likely to be prime, and quadratic residues mod are unlikely to be prime. Indeed, there is a scenario of this form – the Siegel zero scenario – which we are still not able to eradicate (without assuming powerful conjectures such as GRH), though fortunately Siegel zeroes are not quite strong enough to destroy the prime number theorem in arithmetic progressions.

It is difficult to prove that no conspiracy between the primes exist. However, it is not entirely impossible, because we have been able to exploit two important phenomena. The first is that there is often a “all or nothing dichotomy” (somewhat resembling the *zero-one laws* in probability) regarding conspiracies: in the asymptotic limit, the primes can either conspire totally (or more precisely, anti-conspire totally) with a multiplicative function, or fail to conspire at all, but there is no middle ground. (In the language of Dirichlet series, this is reflected in the fact that zeroes of a meromorphic function can have order , or order (i.e. are not zeroes after all), but cannot have an intermediate order between and .) As a corollary of this fact, the prime numbers cannot conspire with two distinct multiplicative functions at once (by having a partial correlation with one and another partial correlation with another); thus one can use the existence of one conspiracy to exclude all the others. In other words, there is at most one conspiracy that can significantly distort the distribution of the primes. Unfortunately, this argument is *ineffective*, because it doesn’t give any control at all on what that conspiracy is, or even if it exists in the first place!

But now one can use the second important phenomenon, which is that because of symmetries, one type of conspiracy can lead to another. For instance, because the von Mangoldt function is real-valued rather than complex-valued, we have conjugation symmetry; if the primes correlate with, say, , then they must also correlate with . (In the language of Dirichlet series, this reflects the fact that the zeta function and -functions enjoy symmetries with respect to reflection across the real axis (i.e. complex conjugation).) Combining this observation with the all-or-nothing dichotomy, we conclude that the primes cannot correlate with for any non-zero , which in fact leads directly to the prime number theorem (2), as we shall discuss below. Similarly, if the primes correlated with a Dirichlet character , then they would also correlate with the conjugate , which also is inconsistent with the all-or-nothing dichotomy, except in the exceptional case when is real – which essentially means that is a quadratic character. In this one case (which is the only scenario which comes close to threatening the truth of the prime number theorem in arithmetic progressions), the above tricks fail and one has to instead exploit the algebraic number theory properties of these characters instead, which has so far led to weaker results than in the non-real case.

As mentioned previously in passing, these phenomena are usually presented using the language of Dirichlet series and complex analysis. This is a very slick and powerful way to do things, but I would like here to present the elementary approach to the same topics, which is slightly weaker but which I find to also be very instructive. (However, I will not be *too* dogmatic about keeping things elementary, if this comes at the expense of obscuring the key ideas; in particular, I will rely on multiplicative Fourier analysis (both at and at finite places) as a substitute for complex analysis in order to expedite various parts of the argument. Also, the emphasis here will be more on heuristics and intuition than on rigour.)

The material here is closely related to the theory of *pretentious characters* developed by Granville and Soundararajan, as well as an earlier paper of Granville on elementary proofs of the prime number theorem in arithmetic progressions.

** — 1. A heuristic elementary proof of the prime number theorem — **

To motivate some of the later discussion, let us first give a highly non-rigorous *heuristic* elementary “proof” of the prime number theorem (2). Since we clearly have

one can view the prime number theorem as an assertion that the von Mangoldt function “behaves like on the average”,

where we will be deliberately vague as to what the “” symbol means. (One can think of this symbol as denoting some sort of proximity in the weak topology or vague topology, after suitable normalisation.)

To see why one would expect (3) to be true, we take divisor sums of (3) to heuristically obtain

By (1), the left-hand side is ; meanwhile, the right-hand side is the divisor function of , by definition. So we have a heuristic relationship between (3) and the informal approximation

The right-hand side of (6) can be approximated using the integral test as

(one can also use Stirling’s formula to obtain a similar asymptotic). As for the left-hand side, we write and then make the substitution to obtain

The right-hand side is the number of lattice points underneath the hyperbola , and can be counted using the Dirichlet hyperbola method:

The third sum is equal to . The second sum is equal to the first. The first sum can be computed as

meanwhile, from the integral test and the definition of Euler’s constant one has

for any ; combining all these estimates one obtains

Comparing this with (7) we do see that and are roughly equal “to top order” on average, thus giving some form of (5) and hence (4); if one could somehow invert the divisor sum operation, one could hope to get (3) and thus the prime number theorem.

(Looking at the next highest order terms in (7), (9), we see that we expect to in fact be slightly larger than on the average, and so should be slightly less than on the average. There is indeed a slight effect of this form; for instance, it is possible (using the prime number theorem) to prove

which should be compared with (8).)

One can partially translate the above discussion into the language of Dirichlet series, by transforming various arithmetical functions to their associated Dirichlet series

ignoring for now the issue of convergence of this series. By definition, the constant function transforms to the Riemann zeta function . Taking derivatives in , we see (formally, at least) that if has Dirichlet series , then has Dirichlet series ; thus, for instance, has Dirichlet series .

Most importantly, though, if have Dirichlet series respectively, then their Dirichlet convolution has Dirichlet series ; this is closely related to the well-known ability of the Fourier transform to convert convolutions to pointwise multiplication. Thus, for instance, has Dirichlet series . Also, from (1) and the preceding discussion, we see that has Dirichlet series (formally, at least). This already suggests that the von Mangoldt function will be sensitive to the zeroes of the zeta function.

An integral test computation closely related to (8) gives the asymptotic

for close to one (and , to avoid issues of convergence). This implies that the Dirichlet series for has asymptotic

thus giving support to (3); similarly, the Dirichlet series for and have asymptotic

and

which gives support to (5) (and is also consistent with (7), (9)).

Remark 1One can connect the properties of Dirichlet series more rigorously to asymptotics of partial sums by means of various transforms in Fourier analysis and complex analysis, in particular contour integration or the Hilbert transform, but this becomes somewhat technical and we will not do so here. I will remark, though, that asymptotics of for close to are not enough, by themselves, to get really precise asymptotics for the sharply truncated partial sums , for reasons related to the uncertainty principle; in order to control such sums one also needs to understand the behaviour of far away from , and in particular for for large real . On the other hand, the asymptotics for for near are just about all one needs to controlsmoothlytruncated partial sums such as for suitable cutoff functions . Also, while Dirichlet series are very powerful tools, particularly with regards to understanding Dirichlet convolution identities, and controlling everything in terms of the zeroes and poles of such series, they do have the drawback that they do not easily encode such fundamental “physical space” facts as the pointwise inequalities and , which are also an important aspect of the theory.

** — 2. Almost primes — **

One can hope to make the above heuristics precise by applying the Möbius inversion formula

where is the Möbius function, defined as when is the product of distinct primes for some , and zero otherwise. In terms of Dirichlet series, we thus see that has the Dirichlet series of , and so can invert the divisor sum operation (which corresponds to multiplication by ):

From (1) we then conclude

One can now hope to derive the prime number theorem (2) from the formulae (7), (9). Unfortunately, this doesn’t quite work: the prime number theorem is equivalent to the assertion

but if one inserts (10), (11) into the left-hand side of (12), one obtains

which if one then inserts (7), (9) and the trivial bound , leads to

As noted in this previous post, we have

so we only obtain a bound of for (12) instead of . (A refinement of this argument, though, shows that the prime number theorem would follow if one had the asymptotic , which is in fact equivalent to the prime number theorem; see the previous post.)

We remark that if one computed or by the above methods, one would eventually be led to a variant of (13), namely

which is an estimate which will be useful later.

So we see that when trying to sum the von Mangoldt function by elementary means, the error term overwhelms the main term . But there is a slight tweaking of the von Mangoldt function, the *second von Mangoldt function* , that increases the size of the main term to while keeping the error term at , thus leading to a useful estimate; the price one pays for this is that this function is now a proxy for the almost primes rather than the primes. This function is defined by a variant of (10), namely

It is not hard to see that vanishes once has at least three distinct prime factors (basically because the quadratic function vanishes after being differentiated three or more times). Indeed, one can easily verify the identity

(which corresponds to the Dirichlet series identity ); the first term is mostly concentrated on primes, while the second term is mostly concentrated on semiprimes (products of two distinct primes).

Now let us sum . In analogy with the previous discussion, we will do so by comparing the function with something involving the divisor function. In view of (5), it is reasonable to try the approximation

(which corresponds to the Dirichlet series identity ) we thus expect

Now we make these heuristics more precise. From the integral test we have

while from (9) and summation by parts one has

where are explicit absolute constants whose exact value is not important here. Thus

Meanwhile, from (15), (17) one has

applying (19), (13), (14) we see that the right-hand side is . Computing by the integral test, we deduce the *Selberg symmetry formula*

One can view (20) as the “almost prime number theorem” – the analogue of the prime number theorem for almost primes.

The fact that the almost primes have a relatively easy asymptotic, while the genuine primes do not, is a reflection of the *parity problem* in sieve theory; see this post for further discussion. The symmetry formula is however enough to get “within a factor of two” of the prime number theorem: if we discard the semiprimes from (16), we see that , and thus

which by a summation by parts argument leads to

which is within a factor of of (2) in some sense.

One can “twist” all of the above arguments by a Dirichlet character . For instance, (15) twists to

On the other hand, if is a non-principal character of modulus , then it has mean zero on any interval with length , and it is then not hard to establish the asymptotic

This soon leads to the twisted version of (20):

thus almost primes are asymptotically unbiased with respect to non-principal characters.

From the multiplicative Fourier analysis of Dirichlet characters modulo (and the observation that is quite small on residue classes not coprime to ) one then has an “almost prime number theorem in arithmetic progressions”:

As before, this lets us come within a factor of two of the actual prime number theorem in arithmetic progressions:

One can also twist things by the completely multiplicative function , but with the caveat that the approximation to can locally correlate with . Thus for instance one has

for any fixed and ; in particular, if is non-principal, one has

** — 3. The all-or-nothing dichotomy — **

To summarise so far, the almost primes (as represented by ) are quite uniformly distributed. These almost primes can be split up into the primes (as represented by ) and the semiprimes (as represented by ), thanks to (16).

One can rewrite (16) as a recursive formula for :

One can also twist this formula by a character and/or a completely multiplicative function , thus for instance

This recursion, combined with the uniform distribution properties on , lead to various *all-or-nothing* dichotomies for . Suppose, for instance, that behaves like a constant on the average for some non-principal character :

Then (from (5)) we expect to behave like , thus

On the other hand, from (21), is asymptotically uncorrelated with :

Putting all this together, one obtains

which suggests that must be either close to , or close to .

Basically, the point is that there are only two equilibria for the recursion (23). One equilibrium occurs when is asymptotically uncorrelated with ; the other is when it is completely anti-correlated with , so that is supported primarily on those for which is close to . Note in the latter case for most primes , and thus for most semiprimes , thus leading to an equidistribution of for almost primes (weighted by ). Any intermediate distribution of would be inconsistent with the distribution of . (In terms of Dirichlet series, this assertion corresponds to the fact that the -function of either has a zero of order , or a zero of order (i.e. not a zero at all) at .)

A similar phenomenon occurs when twisting by ; basically, the average value of must asymptotically either be close to , or close to ; no other asymptotic ends up being compatible with the distribution of . (Again, this corresponds to the fact that the Riemann zeta function has a zero of order or at .) More generally, the average value of must asymptotically approach either or .

Remark 2One can make the above heuristics precise either by using Dirichlet series (and analytic continuation, and the theory of zeroes of meromorphic functions), or by smoothing out arithmetic functions such as by a suitable multiplicative convolution with a mollifier (as is basically done in elementary proofs of the prime number theorem); see also the Granville-Soundararajan theory of pretentious characters for a closely related theory. We will not pursue these details here, however.

** — 4. Dueling conspiracies — **

In the previous section we have seen (heuristically, at least), that the von Mangoldt function (or more precisely, ) will either have no correlation, or a maximal amount of anti-correlation, with a completely multiplicative function such as , , or . On the other hand, it is not possible for this function to maximally anti-correlate (or to *conspire*) with two such functions; thus the presence of one conspiracy excludes the presence of all others.

Suppose for instance that we had two distinct non-principal characters for which one had maximal anti-correlation:

One could then combine the two statements to obtain

Meanwhile, doesn’t correlate with either or . It will be convenient to exploit this to normalise , obtaining

(Note from (3), (18) that we expect to have mean zero.)

On the other hand, since , one has

and hence by the triangle inequality

in the sense that averages of the left-hand side should be at least as large as averages of the right-hand side. From this, (18), and Cauchy-Schwarz, one thus expects

But if one expands out the left-hand side using (18), (21), one only ends up with on the average, a contradiction for sufficiently large.

Remark 3The above argument belongs to a family of -based arguments which go by various names (almost orthogonality, ,large sieve, etc.). The argument can more generally be used to establish square-summability estimates on averages such as as varies, but we will not make this precise here.

As one consequence of the above arguments, one can show that cannot maximally anti-correlate with any non-real character , since (by the reality of ) it would then also maximally anti-correlate with the complex conjugate , which is distinct from . A similar argument shows that cannot maximally anti-correlate with for any non-zero , a fact which can soon lead to the prime number theorem, either by Dirichlet series methods, by Fourier-analytic means, or by elementary means. (Sketch of Fourier-analytic proof: methods provide -type bounds on the averages of in , while the above arguments show that these averages are also small in . Applying (22) a few times to take advantage of the smoothing effects of convolution, one eventually concludes that these averages can be made arbitrarily small in asymptotically, at which point the prime number theorem follows from Fourier inversion.)

Remark 4There is a slightly different argument of an nature rather than an nature (i.e. using tools such as the triangle inequality, union bound, etc.) that can also achieve similar results. For instance, suppose that maximally anti-correlates with and . Then for most primes , which implies that for most primes , which is incompatible with the all-or-nothing dichotomy unless is principal. This is an alternate way to exclude correlation with non-real characters. Similarly, if , then , which is also incompatible with the zero-one law; this is essentially the method underlying the standard proof of the prime number theorem (which relates with ).

** — 5. Quadratic characters — **

The one difficult scenario to eliminate is that of maximal anti-correlation with a real non-principal (i.e. quadratic) character , thus

This scenario implies that the quantity

vanishes. Indeed, if one starts with the identity

and sums in , one sees that

The left-hand side is by the mean zero and periodicity properties of . To estimate the right-hand side, we use the hyperbola method and rewrite it as

for some parameter (sufficiently slowly growing in ) to be optimised later. Writing and , we can express this as

sending (and at a slower rate) we conclude as required.

It is remarkably difficult to show that does not, in fact, vanish. One way to do this is to use the class number formula, that relates this quantity to the class number of the quadratic number field associated to the conductor of , together with some related number-theoretic quantities. A more elementary (but significantly weaker) method proceeds by using the easily verified fact that the convolution is non-negative, and is at least on the squares; this should be interpreted as a fact from algebraic number theory, and basically corresponds to the fact that the number of representations of an integer as the norm of an integer in (or more generally, as the norm of an ideal in that ring) is non-negative, and is at least on the squares. In particular we have

On the other hand, from the hyperbola method we can express the left-hand side as

From the mean zero and periodicity properties of we have , so the second term in (24) is . Meanwhile, from the midpoint rule, , and so the first term in (24) is

Putting all this together we have

which leads to a contradiction as if vanishes.

Note in fact that the above argument shows that is positive. If one carefully computes the dependence of the above argument on the modulus , one obtains a lower bound of the form , which is quite poor. Using a non-trivial improvement on the error term in counting lattice points under the hyperbola (or better still, by smoothing the sum ), one can improve this a bit, to . In contrast, the class number method gives a bound .

We can improve this even further for all but at most one real primitive character :

Theorem 1 (Siegel’s theorem)For every , one has for all but at most one real primitive character , where the implied constant is effective, and is the modulus of .

Throwing in this (hypothetical) one exceptional character, we conclude that for *all* real primitive characters , but now the implied constant is ineffective, which is the usual way in which Siegel’s theorem is formulated (but the above nearly effective refinement can be obtained by the same methods). It is a major open problem in the subject to eliminate this exceptional character and recover an effective estimate for some .

*Proof:* Let (which we can assume to be small), and let be a small number depending (effectively) on to be chosen later. Our task is to show that for all but at most one primitive real character . Note we may assume is large (effectively) depending on , as the claim follows from the previous bounds on otherwise.

Suppose then for contradiction that and for two distinct primitive real characters of (large) modulus respectively.

We begin by modifying the proof that was positive, which relied (among other things) on the observation that , and equals at . In particular, one has

for any and any real . (One can get slightly better bounds by exploiting that is also at least on square numbers, as before, but this is really only useful for , and we are now going to take much closer to .)

On the other hand, one has the asymptotics

for any real close (but not equal) to , and similarly

for any real close to ; similarly for . From the hyperbola method, we can then conclude

for all real sufficiently close to . Indeed, one can expand the left-hand side of (26) as

and the claim then follows from the previous asymptotics. (One can improve the error term by smoothing the summation, but we will not need to do so here.)

Now set for a large absolute constant . If , then the error term in is then at most (say) if is large enough. We conclude from (25) that

for . Since and is assumed small (depending on ), this implies that is positive in the range

(this can be seen by checking the cases and separately). On the other hand, has a simple pole at with positive residue, and is thus negative for extremely close to . By the intermediate value theorem, we conclude that has a zero for some . Conversely, it is not difficult (using summation by parts) to show that for , and so by the mean value theorem we see that the zero of must also obey . Thus has a zero for some with

Similarly, has a zero for some with

Now, we consider the function

One can also show that is non-negative and equals at , thus

(The algebraic number theory interpretation of this positivity is that is the number of representations of as the norm of an ideal in the biquadratic field generated by and .)

Also, by (a more complicated version of) the derivation of (26), one has

(say). Arguing as before, we conclude that

for . Using the bound (which can be established by summation by parts), we conclude that is positive in the range

Since we already know and have zeroes for some obeying (27), (28)

taking geometric means and rearranging we obtain

But this contradicts the hypotheses , if is small enough.

Remark 5Siegel’s theorem leads to a version of the prime number theorem in arithmetic progressions known as the Siegel-Walfisz theorem. As with Siegel’s theorem, the bounds are ineffective unless one is allowed to exclude a single exceptional modulus (and its multiples), in which case one has a modified prime number theorem which favours the quadratic nonresidues mod ; see this paper of Granville.

Remark 6One can improve the effective bounds in Siegel’s theorem if one is allowed to exclude a larger set of bad moduli. For instance, the arguments in Section 4 allow one to establish a bound of the form after excluding at most one in each hyper-dyadic range for each ; one can of course replace by other exponents here, but at the cost of worsening the term. (This is essentially an observation of Landau.)

## 31 comments

Comments feed for this article

24 September, 2009 at 7:05 pm

David SpeyerVery nice exposition!

(1) I remember reading that, before the classification of imaginary quadratic fields of class number 1 was proved, it was known that there could be at most one example in addition to the ones already known. Is that the same “at most one”?

(2) If you felt like expanding Remark 1 into a blog post, I would certainly be interested in reading it.

25 September, 2009 at 1:49 am

Terence Tao(1) Yes, this is very closely related, though I think the discovery that there is at most one exceptional field of class number 1 predates Siegel’s theorem slightly. The elimination of this field is quite difficult (it can’t be done by the methods discussed here), but I don’t know many details.

(2) Actually I was already thinking of doing something along those lines, though perhaps not immediately…

26 September, 2009 at 6:06 pm

anonThanks for an excellent posting.

A minor expository point: the numbers 0 and 1 occur in the initial discussion of the “zero-one law”, in a way which (to any reader not already knowing the term) reinforces but also confuses the meaning of the term here. In the original probability context, 0 and 1 had meaning as probabilities, but in the present situation it just refers to an all-or-nothing dichotomy of possible asymptotic behaviors.

26 September, 2009 at 6:33 pm

Terence TaoHmm, that’s a fair point. It’s a shame that the term “zero-one law” is so narrowly defined, because I find it a useful phrase in other contexts. I wonder if anyone can suggest a replacement term; “all-or-nothing dichotomy” is OK, I guess, but perhaps a little clunky.

30 September, 2009 at 6:04 pm

Terence TaoI’ve gone ahead and replaced “zero-one law” with “all-or-nothing dichotomy”; this is not my ideal terminology, but will serve as a placeholder until a better suggestion comes along.

I also fixed an error in which the rings and were confused. (At some point in the future I plan to learn class field theory properly, and perhaps blog about it here; my algebraic number theory knowledge is unfortunately still rather superficial at present.)

1 October, 2009 at 6:52 pm

same anon(from the anon who posted the earlier 0-1 comment)

Actually, “zero-one law” is too good a metaphor to not use. It works beautifully for any reader who already understands the meaning. Not only does it convey the idea, but seeing the term used here outside its traditional settings also suggests the idea of spreading that usage to other contexts.

The only caveat is that readers who aren’t already familiar with the idea would probably benefit if, at the first location in the essay where “0-1 law” appears, there were also:

— an aside, giving a definition or a few words of explanation (e.g., “borrowing a metaphor from the 0-1 laws of probability”);

— disambiguation, in the discussion that follows of the order of vanishing of the associated Dirichlet series being either 0 or 1 and this being an avatar of the 0-1 law, that these orders being “0” and “1” (in whichever order corresponds to the dichotomy in the 0-1 law!) is just a numerical peculiarity of this problem and is otherwise unrelated to the words in the term “zero-one law”.

I guess it also bears clarifying what a 0-1 law is, in general. I don’t think the meaning is simply that of a dichotomy, there is also some dynamics driving the outcome, asymptotically, to one of two antipodal equilibria (indeed you call the two values of $c$ “equilibria”, so maybe you share the same intuition). This is literally or tacitly true in every theorem I can remember being called a 0-1 law (in probability, random graphs, ergodic theory, information theory, etc), and it would be interesting to make a list and see whether this dynamical, asymptotic aspect is always part of the story.

26 September, 2009 at 11:05 pm

AnonymousNear the beginning, when you say that the number of primes congruent to a mod q when a is not coprime to q is O(1), you can leave out the dependence of the implied constant on q.

[Well spotted, thanks! -T.]27 September, 2009 at 4:37 pm

Mustafa SaidOne open problem in analytic number theory is to determine whether or not $\sum_1^{\infty} (-1)^n \frac{n}{p_n}$ converges. Here $p_n$ denotes the $n^th$ prime. The prime number theorem gives us some information on the asymptotics of the terms in the sum, but not enough to solve the problem. Would the validity of the Riemann Hypothesis help us solve this problem?

30 September, 2009 at 4:10 pm

Terence TaoHmm. I played around with this sum a bit and it seems like the convergence of this sum is equivalent to the convergence of the sum (by using the prime number theorem and estimating the difference between and for odd and even n.) So it comes down to understanding the equidistribution of the parity of the prime counting function; roughly speaking, any quantatitative equidistribution which gains more than a log log factor should suffice. By coincidence, we happen to be studying the parity of pi(x) over at the polymath4 project, though I don’t think the methods we have currently for computing that parity are strong enough to establish this sort of equidistribution. Nevertheless it does seem plausible (though difficult to prove) that one has convergence. (If one had a sufficiently uniform and quantitative version of the Poisson statistics conjecture for prime gaps, one might be able to obtain the result; in particular, a sufficiently quantitative version of the prime tuples conjecture might suffice, thanks to Gallagher’s computations relating the two.)

The Riemann hypothesis concerns larger-scale fluctuations in the primes, whereas here it seems that fine-scale fluctuations such as prime gaps statistics are more relevant.

7 March, 2010 at 6:14 pm

Mustafa SaidCan you please explain your statement “any quantatitative equidistribution (of the parity of the prime counting function) which gains more than a log log factor should suffice.”

I also have another question about the parity of the prime counting

function:

Let $E_N = { m \in \mathbb{N} : \pi(m) is even and m \leq N}$

and $O_N = { m \in \mathbb{N} : \pi(m) is odd and m \leq N}

then is it true that $\lim_{N \to \infty} |E_N|/|O_N| = 1$ ?

28 September, 2009 at 9:19 am

AnonymousThere are lots of ways you can use the phrase “all or nothing.” You can say all-or-nothing behavior, all-or-nothing phenomenon, all-or-nothing situation, etc. Another expression you can use in place of all or nothing is “black or white” — as in no shades of gray. One reason “all-or-nothing dichotomy” seems clunky is that it is redundant.

1 October, 2009 at 8:11 pm

Arithmetic Progression in finite fieldsProfessor Tao:

I have a particular problem that is of interest in electrical engineering. It seems to fall under the category of arithmetic progression but in finite fields.

Field field: F_{p^n} with n>1

What is the maximum N so that one can find N different polynomials

p_{i}(x) of degree in F_{p}[x] for i = 1,…,N so that the following

conditions are satisfied?

1) each p_{i}(x) has at least 2^n roots in F_{p^n} in arithmetic

progression. By AP what I mean is: let s be primitive element. If r is

root, then r, r+1, r+s, r+1+s,r+s^2, r+s^2+1, r+s^2+s,

r+s^2+s+1,…,r+s^(n-1)+s^(n-2)+…+1 are roots.

2) If i is not j, p_{i}(x) and p_{j}(x) should have no common roots

from the roots in arithmetic progression.

As n increases from 0 to infinity what is the highest such N one can

get for each F_{p^n}?

(I could not get more than N=5 for p=5. For p=3, N=1.)

Is there any mathematician who has worked on problems involving such progressions in finite fields?

Thank you

2 October, 2009 at 3:42 pm

At the AustMS conference « What’s new[...] follows from (for instance) Siegel’s theorem on the size of the class group, as discussed in this previous post. Ellenberg, Venkatesh, and Westerland have recently established some partial results towards the [...]

6 October, 2009 at 4:36 pm

Giampiero CampaHi Terry,

first of all i am an not a Mathematician, so i’d like to apologize in advance if the language i am using is not entirely correct or appropriate.

Let me put it this way, do we “understand”, say, the Mandelbrot fractal ? I mean, yes we do have a dynamic system that generates it, but as far as i know the only way we can determine whether a point belongs to the set or not is by iterating the dynamic system. Isn’t that somehow similar to the situation that we have with prime numbers where the only way to find out if a number is prime or not is by going through the sieve ? or simply dividing the number by all its predecessors ? (which i am pretty sure it can be done with a simple 1-state nonlinear dynamic system)

In other words, are we expecting too much by saying, as you did in your first sentence, that we want to “understand the distribution of prime numbers” ?

Again, maybe these are trivial questions and i just don’t know enough number theory, in which case i apologize.

28 June, 2012 at 7:23 am

KSomewhat like, to see if a living creature is a human being, it’s a good idea to test its DNA, but perhaps it’s not very useful for the understanding of the distribution of people in the world…

6 November, 2009 at 7:49 am

The “no self-defeating object” argument « What’s new[...] 1 Continuing the number-theoretic theme, the “dueling conspiracies” arguments in a previous blog post can also be viewed as an instance of this type of “no-self-defeating-object”; for [...]

23 July, 2011 at 7:40 pm

Erdos’ divisor bound « What’s new[...] requires one to exclude zeroes of the Riemann zeta function on the line , as discussed in this previous post), one is eventually left with the estimate , which is useless as a lower bound (and recovers only [...]

21 August, 2011 at 12:45 pm

Hilbert’s seventh problem, and powers of 2 and 3 « What’s new[...] (The reason the Thue-Siegel-Roth theorem is ineffective is because it relies heavily on the dueling conspiracies argument, i.e. playing off multiple “conspiracies” against each other; the other results [...]

11 November, 2011 at 1:18 pm

Number Theory: Lecture 16 « Theorem of the week[...] there was no known elementary proof when the first edition was written!). Terry Tao has a nice blog post discussing a number of aspects of the distribution of prime [...]

19 April, 2012 at 7:23 pm

Sumit Kumar JhaProfessor Tao:

Your derivation of Selberg’s symmetry formula is the one which Selberg originally used in his derivation of an elementary proof of prime number theorem.

But in a paper by Iseki and Tatsukava, they easily derived

and this is equivalent to

where

Can you help me to deduce

from above equation.

19 April, 2012 at 7:38 pm

Sumit Kumar JhaActually,

when

when

when

but how is it that

in other cases.

20 April, 2012 at 5:15 am

Terence TaoThis follows from the formula in your previous comment.

20 April, 2012 at 7:22 am

Sumit Kumar JhaProfessor Tao: Thanks!!

I am also looking forward for an article on “sieves” on your blog. Have you already written one??

I especially want to know about the Bomberi sieve which treats

.

Can you also discuss the departure from “almost primes” to “primes” namely parity problem in detail??

20 May, 2012 at 2:58 pm

Heuristic limitations of the circle method « What’s new[...] the only thing which is strong enough to rule out a conspiracy is a conflicting conspiracy (see my previous blog post on this topic). A good example of this is Heath-Brown’s result that the existence of a Siegel zero (which [...]

27 June, 2012 at 8:25 am

What if primes hated to start with nine? « Secret Blogging Seminar[...] appeared earlier in an answer I left on MO. There is also a lot of similarity to Terry Tao’s post on conspiracies and the prime number theorem. One can think of “primes hate to start with nine” as a “conspiracy” whose [...]

27 November, 2012 at 9:04 pm

Sujit NairTypo: There should be a instead of in the rightmost term in the equation above (8)

[Corrected, thanks - T.]19 July, 2013 at 2:31 pm

The Riemann hypothesis in various settings | What's new[…] e.g. this previous blog post for more […]

15 November, 2013 at 3:15 am

Number Theory: Lecture 16 | Theorem of the week[…] there was no known elementary proof when the first edition was written!). Terry Tao has a nice blog post discussing a number of aspects of the distribution of prime […]

11 December, 2013 at 11:50 am

Mertens’ theorems | What's new[…] on the Riemann zeta function in the neighbourhood of the pole at , whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line . […]

7 July, 2014 at 2:20 am

The subspace theorem approach to Siegel’s theorem on integral points on curves via nonstandard analysis | What's new[…] an effective bound may be easily established). This is ultimately due to the reliance on the “dueling conspiracy” (or “repulsion phenomenon”) strategy. We do not as yet have a good way to rule […]

24 September, 2014 at 2:29 pm

Derived multiplicative functions | What's new[…] a key role in the Erdös-Selberg elementary proof of the prime number theorem (as discussed in this previous blog post), is the top order term of the […]