The prime number theorem in arithmetic progressions, and dueling conspiracies

24 September, 2009 in expository, math.NT | Tags: Dirichlet characters, prime number theorem, Siegel's theorem | by Terence Tao

A fundamental problem in analytic number theory is to understand the distribution of the prime numbers ${\{2,3,5,\ldots\}}$ . For technical reasons, it is convenient not to study the primes directly, but a proxy for the primes known as the von Mangoldt function ${\Lambda: {\mathbb N} \rightarrow {\mathbb R}}$ , defined by setting ${\Lambda(n)}$ to equal ${\log p}$ when ${n}$ is a prime ${p}$ (or a power of that prime) and zero otherwise. The basic reason why the von Mangoldt function is useful is that it encodes the fundamental theorem of arithmetic (which in turn can be viewed as the defining property of the primes) very neatly via the identity

$\displaystyle \log n = \sum_{d|n} \Lambda(d) \ \ \ \ \ (1)$

for every natural number ${n}$ .

The most important result in this subject is the prime number theorem, which asserts that the number of prime numbers less than a large number ${x}$ is equal to ${(1+o(1)) \frac{x}{\log x}}$ :

$\displaystyle \sum_{p \leq x} 1 = (1+o(1)) \frac{x}{\log x}.$

Here, of course, ${o(1)}$ denotes a quantity that goes to zero as ${x \rightarrow \infty}$ .

It is not hard to see (e.g. by summation by parts) that this is equivalent to the asymptotic

$\displaystyle \sum_{n \leq x} \Lambda(n) = (1+o(1)) x \ \ \ \ \ (2)$

for the von Mangoldt function (the key point being that the squares, cubes, etc. of primes give a negligible contribution, so ${\sum_{n \leq x} \Lambda(n)}$ is essentially the same quantity as ${\sum_{p \leq x} \log p}$ ). Understanding the nature of the ${o(1)}$ term is a very important problem, with the conjectured optimal decay rate of ${O(\sqrt{x} \log x)}$ being equivalent to the Riemann hypothesis, but this will not be our concern here.

The prime number theorem has several important generalisations (for instance, there are analogues for other number fields such as the Chebotarev density theorem). One of the more elementary such generalisations is the prime number theorem in arithmetic progressions, which asserts that for fixed ${a}$ and ${q}$ with ${a}$ coprime to ${q}$ (thus ${(a,q)=1}$ ), the number of primes less than ${x}$ equal to ${a}$ mod ${q}$ is equal to ${(1+o_q(1)) \frac{1}{\phi(q)} \frac{x}{\log x}}$ , where ${\phi(q) := \# \{ 1 \leq a \leq q: (a,q)=1 \}}$ is the Euler totient function:

$\displaystyle \sum_{p \leq x: p = a \hbox{ mod } q} 1 = (1+o_q(1)) \frac{1}{\phi(q)} \frac{x}{\log x}.$

(Of course, if ${a}$ is not coprime to ${q}$ , the number of primes less than ${x}$ equal to ${a}$ mod ${q}$ is ${O(1)}$ . The subscript ${q}$ in the ${o()}$ and ${O()}$ notation denotes that the implied constants in that notation is allowed to depend on ${q}$ .) This is a more quantitative version of Dirichlet’s theorem, which asserts the weaker statement that the number of primes equal to ${a}$ mod ${q}$ is infinite. This theorem is important in many applications in analytic number theory, for instance in Vinogradov’s theorem that every sufficiently large odd number is the sum of three odd primes. (Imagine for instance if almost all of the primes were clustered in the residue class ${2}$ mod ${3}$ , rather than ${1}$ mod ${3}$ . Then almost all sums of three odd primes would be divisible by ${3}$ , leaving dangerously few sums left to cover the remaining two residue classes. Similarly for other moduli than ${3}$ . This does not fully rule out the possibility that Vinogradov’s theorem could still be true, but it does indicate why the prime number theorem in arithmetic progressions is a relevant tool in the proof of that theorem.)

As before, one can rewrite the prime number theorem in arithmetic progressions in terms of the von Mangoldt function as the equivalent form

$\displaystyle \sum_{n \leq x: n = a \hbox{ mod } q} \Lambda(n) = (1+o_q(1)) \frac{1}{\phi(q)} x.$

Philosophically, one of the main reasons why it is so hard to control the distribution of the primes is that we do not currently have too many tools with which one can rule out “conspiracies” between the primes, in which the primes (or the von Mangoldt function) decide to correlate with some structured object (and in particular, with a totally multiplicative function) which then visibly distorts the distribution of the primes. For instance, one could imagine a scenario in which the probability that a randomly chosen large integer ${n}$ is prime is not asymptotic to ${\frac{1}{\log n}}$ (as is given by the prime number theorem), but instead to fluctuate depending on the phase of the complex number ${n^{it}}$ for some fixed real number ${t}$ , thus for instance the probability might be significantly less than ${1/\log n}$ when ${t \log n}$ is close to an integer, and significantly more than ${1/\log n}$ when ${t \log n}$ is close to a half-integer. This would contradict the prime number theorem, and so this scenario would have to be somehow eradicated in the course of proving that theorem. In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the Riemann zeta function at ${1+it}$ .

In the above scenario, the primality of a large integer ${n}$ was somehow sensitive to asymptotic or “Archimedean” information about ${n}$ , namely the approximate value of its logarithm. In modern terminology, this information reflects the local behaviour of ${n}$ at the infinite place ${\infty}$ . There are also potential consipracies in which the primality of ${n}$ is sensitive to the local behaviour of ${n}$ at finite places, and in particular to the residue class of ${n}$ mod ${q}$ for some fixed modulus ${q}$ . For instance, given a Dirichlet character ${\chi: {\mathbb Z} \rightarrow {\mathbb C}}$ of modulus ${q}$ , i.e. a completely multiplicative function on the integers which is periodic of period ${q}$ (and vanishes on those integers not coprime to ${q}$ ), one could imagine a scenario in which the probability that a randomly chosen large integer ${n}$ is prime is large when ${\chi(n)}$ is close to ${+1}$ , and small when ${\chi(n)}$ is close to ${-1}$ , which would contradict the prime number theorem in arithmetic progressions. (Note the similarity between this scenario at ${q}$ and the previous scenario at ${\infty}$ ; in particular, observe that the functions ${n \rightarrow \chi(n)}$ and ${n \rightarrow n^{it}}$ are both totally multiplicative.) In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the ${L}$ -function of ${\chi}$ at ${1}$ .

An especially difficult scenario to eliminate is that of real characters, such as the Kronecker symbol ${\chi(n) = \left( \frac{n}{q} \right)}$ , in which numbers ${n}$ which are quadratic nonresidues mod ${q}$ are very likely to be prime, and quadratic residues mod ${q}$ are unlikely to be prime. Indeed, there is a scenario of this form – the Siegel zero scenario – which we are still not able to eradicate (without assuming powerful conjectures such as GRH), though fortunately Siegel zeroes are not quite strong enough to destroy the prime number theorem in arithmetic progressions.

It is difficult to prove that no conspiracy between the primes exist. However, it is not entirely impossible, because we have been able to exploit two important phenomena. The first is that there is often a “all or nothing dichotomy” (somewhat resembling the zero-one laws in probability) regarding conspiracies: in the asymptotic limit, the primes can either conspire totally (or more precisely, anti-conspire totally) with a multiplicative function, or fail to conspire at all, but there is no middle ground. (In the language of Dirichlet series, this is reflected in the fact that zeroes of a meromorphic function can have order ${1}$ , or order ${0}$ (i.e. are not zeroes after all), but cannot have an intermediate order between ${0}$ and ${1}$ .) As a corollary of this fact, the prime numbers cannot conspire with two distinct multiplicative functions at once (by having a partial correlation with one and another partial correlation with another); thus one can use the existence of one conspiracy to exclude all the others. In other words, there is at most one conspiracy that can significantly distort the distribution of the primes. Unfortunately, this argument is ineffective, because it doesn’t give any control at all on what that conspiracy is, or even if it exists in the first place!

But now one can use the second important phenomenon, which is that because of symmetries, one type of conspiracy can lead to another. For instance, because the von Mangoldt function is real-valued rather than complex-valued, we have conjugation symmetry; if the primes correlate with, say, ${n^{it}}$ , then they must also correlate with ${n^{-it}}$ . (In the language of Dirichlet series, this reflects the fact that the zeta function and ${L}$ -functions enjoy symmetries with respect to reflection across the real axis (i.e. complex conjugation).) Combining this observation with the all-or-nothing dichotomy, we conclude that the primes cannot correlate with ${n^{it}}$ for any non-zero ${t}$ , which in fact leads directly to the prime number theorem (2), as we shall discuss below. Similarly, if the primes correlated with a Dirichlet character ${\chi(n)}$ , then they would also correlate with the conjugate ${\overline{\chi}(n)}$ , which also is inconsistent with the all-or-nothing dichotomy, except in the exceptional case when ${\chi}$ is real – which essentially means that ${\chi}$ is a quadratic character. In this one case (which is the only scenario which comes close to threatening the truth of the prime number theorem in arithmetic progressions), the above tricks fail and one has to instead exploit the algebraic number theory properties of these characters instead, which has so far led to weaker results than in the non-real case.

As mentioned previously in passing, these phenomena are usually presented using the language of Dirichlet series and complex analysis. This is a very slick and powerful way to do things, but I would like here to present the elementary approach to the same topics, which is slightly weaker but which I find to also be very instructive. (However, I will not be too dogmatic about keeping things elementary, if this comes at the expense of obscuring the key ideas; in particular, I will rely on multiplicative Fourier analysis (both at ${\infty}$ and at finite places) as a substitute for complex analysis in order to expedite various parts of the argument. Also, the emphasis here will be more on heuristics and intuition than on rigour.)

The material here is closely related to the theory of pretentious characters developed by Granville and Soundararajan, as well as an earlier paper of Granville on elementary proofs of the prime number theorem in arithmetic progressions.

— 1. A heuristic elementary proof of the prime number theorem —

To motivate some of the later discussion, let us first give a highly non-rigorous heuristic elementary “proof” of the prime number theorem (2). Since we clearly have

$\displaystyle \sum_{n \leq x} 1 = x + O(1)$

one can view the prime number theorem as an assertion that the von Mangoldt function ${\Lambda}$ “behaves like ${1}$ on the average”,

$\displaystyle \Lambda(n) \approx 1, \ \ \ \ \ (3)$

where we will be deliberately vague as to what the “ ${\approx}$ ” symbol means. (One can think of this symbol as denoting some sort of proximity in the weak topology or vague topology, after suitable normalisation.)

To see why one would expect (3) to be true, we take divisor sums of (3) to heuristically obtain

$\displaystyle \sum_{d|n} \Lambda(d) \approx \sum_{d|n} 1. \ \ \ \ \ (4)$

By (1), the left-hand side is ${\log n}$ ; meanwhile, the right-hand side is the divisor function ${\tau(n)}$ of ${n}$ , by definition. So we have a heuristic relationship between (3) and the informal approximation

$\displaystyle \tau(n) \approx \log n. \ \ \ \ \ (5)$

In particular, we expect

$\displaystyle \sum_{n \leq x} \tau(n) \approx \sum_{n \leq x} \log n. \ \ \ \ \ (6)$

The right-hand side of (6) can be approximated using the integral test as

$\displaystyle \sum_{n \leq x} \log n = \int_1^x \log t\ dt + O(\log x ) = x \log x - x + O(\log x) \ \ \ \ \ (7)$

(one can also use Stirling’s formula to obtain a similar asymptotic). As for the left-hand side, we write ${\tau(n) = \sum_{d|n} 1}$ and then make the substitution ${n=dm}$ to obtain

$\displaystyle \sum_{n \leq x} \tau(n) = \sum_{d,m: dm \leq x} 1.$

The right-hand side is the number of lattice points underneath the hyperbola ${dm=x}$ , and can be counted using the Dirichlet hyperbola method:

$\displaystyle \sum_{d,m: dm \leq x} 1 = \sum_{d \leq \sqrt{x}} \sum_{m \leq x/d} 1 + \sum_{m \leq \sqrt{x}} \sum_{d \leq x/m} 1 - \sum_{d \leq \sqrt{x}} \sum_{m \leq \sqrt{x}} 1.$

The third sum is equal to ${(\sqrt{x}+O(1))^2 = x + O(\sqrt{x})}$ . The second sum is equal to the first. The first sum can be computed as

$\displaystyle \sum_{d \leq \sqrt{x}} \sum_{m \leq x/d} 1 = \sum_{d \leq \sqrt{x}} (\frac{x}{d} + O(1)) = x \sum_{d \leq \sqrt{x}} \frac{1}{d} + O(\sqrt{x});$

meanwhile, from the integral test and the definition of Euler’s constant ${\gamma = 0.577\ldots}$ one has

$\displaystyle \sum_{d \leq y} \frac{1}{d} = \log y + \gamma + O(1/y) \ \ \ \ \ (8)$

for any ${y \geq 1}$ ; combining all these estimates one obtains

$\displaystyle \sum_{n \leq x} \tau(n) = x \log x + (2\gamma-1) x + O(\sqrt{x}). \ \ \ \ \ (9)$

Comparing this with (7) we do see that ${\tau(n)}$ and ${\log n}$ are roughly equal “to top order” on average, thus giving some form of (5) and hence (4); if one could somehow invert the divisor sum operation, one could hope to get (3) and thus the prime number theorem.

(Looking at the next highest order terms in (7), (9), we see that we expect ${\tau(n)}$ to in fact be slightly larger than ${\log n}$ on the average, and so ${\Lambda(n)}$ should be slightly less than ${1}$ on the average. There is indeed a slight effect of this form; for instance, it is possible (using the prime number theorem) to prove

$\displaystyle \sum_{d \leq y} \frac{\Lambda(d)}{d} = \log y - \gamma + o(1),$

which should be compared with (8).)

One can partially translate the above discussion into the language of Dirichlet series, by transforming various arithmetical functions ${f(n)}$ to their associated Dirichlet series

$\displaystyle F(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s},$

ignoring for now the issue of convergence of this series. By definition, the constant function ${1}$ transforms to the Riemann zeta function ${\zeta(s)}$ . Taking derivatives in ${s}$ , we see (formally, at least) that if ${f(n)}$ has Dirichlet series ${F(s)}$ , then ${f(n) \log n}$ has Dirichlet series ${-F'(s)}$ ; thus, for instance, ${\log n}$ has Dirichlet series ${-\zeta'(s)}$ .

Most importantly, though, if ${f(n), g(n)}$ have Dirichlet series ${F(s), G(s)}$ respectively, then their Dirichlet convolution ${f * g(n) := \sum_{d|n} f(d) g(\frac{n}{d})}$ has Dirichlet series ${F(s) G(s)}$ ; this is closely related to the well-known ability of the Fourier transform to convert convolutions to pointwise multiplication. Thus, for instance, ${\tau(n)}$ has Dirichlet series ${\zeta(s)^2}$ . Also, from (1) and the preceding discussion, we see that ${\Lambda(n)}$ has Dirichlet series ${-\zeta'(s)/\zeta(s)}$ (formally, at least). This already suggests that the von Mangoldt function will be sensitive to the zeroes of the zeta function.

An integral test computation closely related to (8) gives the asymptotic

$\displaystyle \zeta(s) = \frac{1}{s-1} + \gamma + O(s-1)$

for ${s}$ close to one (and ${\hbox{Re}(s) > 1}$ , to avoid issues of convergence). This implies that the Dirichlet series ${-\zeta'(s)/\zeta(s)}$ for ${\Lambda(n)}$ has asymptotic

$\displaystyle \frac{-\zeta'(s)}{\zeta(s)} = \frac{1}{s-1} - \gamma + O(s-1)$

thus giving support to (3); similarly, the Dirichlet series for ${\log n}$ and ${\tau(n)}$ have asymptotic

$\displaystyle -\zeta'(s) = \frac{1}{(s-1)^2} + O(1)$

and

$\displaystyle \zeta(s)^2 = \frac{1}{(s-1)^2} + \frac{2\gamma}{s-1} + O(1)$

which gives support to (5) (and is also consistent with (7), (9)).

Remark 1 One can connect the properties of Dirichlet series ${F(s)}$ more rigorously to asymptotics of partial sums ${\sum_{n \leq x} f(n)}$ by means of various transforms in Fourier analysis and complex analysis, in particular contour integration or the Hilbert transform, but this becomes somewhat technical and we will not do so here. I will remark, though, that asymptotics of ${F(s)}$ for ${s}$ close to ${1}$ are not enough, by themselves, to get really precise asymptotics for the sharply truncated partial sums ${\sum_{n \leq x} f(n)}$ , for reasons related to the uncertainty principle; in order to control such sums one also needs to understand the behaviour of ${F}$ far away from ${s=1}$ , and in particular for ${s=1+it}$ for large real ${t}$ . On the other hand, the asymptotics for ${F(s)}$ for ${s}$ near ${1}$ are just about all one needs to control smoothly truncated partial sums such as ${\sum_n f(n) \eta(n/x)}$ for suitable cutoff functions ${\eta}$ . Also, while Dirichlet series are very powerful tools, particularly with regards to understanding Dirichlet convolution identities, and controlling everything in terms of the zeroes and poles of such series, they do have the drawback that they do not easily encode such fundamental “physical space” facts as the pointwise inequalities ${|\mu(n)| \leq 1}$ and ${\Lambda(n) \geq 0}$ , which are also an important aspect of the theory.

— 2. Almost primes —

One can hope to make the above heuristics precise by applying the Möbius inversion formula

$\displaystyle 1_{n=1} = \sum_{d|n} \mu(d)$

where ${\mu(d)}$ is the Möbius function, defined as ${(-1)^k}$ when ${d}$ is the product of ${k}$ distinct primes for some ${k \geq 0}$ , and zero otherwise. In terms of Dirichlet series, we thus see that ${\mu}$ has the Dirichlet series of ${1/\zeta(s)}$ , and so can invert the divisor sum operation ${f(n) \mapsto \sum_{d|n} f(d)}$ (which corresponds to multiplication by ${\zeta(s)}$ ):

$\displaystyle f(n) = \sum_{m|n} \mu(m) ( \sum_{d|n/m} f(d) ).$

From (1) we then conclude

$\displaystyle \Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d} \ \ \ \ \ (10)$

while from ${\tau(n) = \sum_{d|n} 1}$ we have

$\displaystyle 1 = \sum_{d|n} \mu(d) \tau(\frac{n}{d}). \ \ \ \ \ (11)$

One can now hope to derive the prime number theorem (2) from the formulae (7), (9). Unfortunately, this doesn’t quite work: the prime number theorem is equivalent to the assertion

$\displaystyle \sum_{n \leq x} (\Lambda(n)-1) = o(x), \ \ \ \ \ (12)$

but if one inserts (10), (11) into the left-hand side of (12), one obtains

$\displaystyle \sum_{d \leq x} \mu(d) \sum_{m \leq x/d} (\log m - \tau(m)),$

which if one then inserts (7), (9) and the trivial bound ${\mu(d)=O(1)}$ , leads to

$\displaystyle 2C x \sum_{d \leq x} \frac{\mu(d)}{d} + O(x).$

As noted in this previous post, we have

$\displaystyle |\sum_{d \leq x} \frac{\mu(d)}{d}| \leq 1, \ \ \ \ \ (13)$

so we only obtain a bound of ${O(x)}$ for (12) instead of ${o(x)}$ . (A refinement of this argument, though, shows that the prime number theorem would follow if one had the asymptotic ${\sum_{n \leq x} \mu(n) = o(x)}$ , which is in fact equivalent to the prime number theorem; see the previous post.)

We remark that if one computed ${\sum_{n \leq x} \tau(n)}$ or ${\sum_{n \leq x} \Lambda(n)}$ by the above methods, one would eventually be led to a variant of (13), namely

$\displaystyle \sum_{d \leq x} \frac{\mu(d)}{d} \log \frac{x}{d} = O(1), \ \ \ \ \ (14)$

which is an estimate which will be useful later.

So we see that when trying to sum the von Mangoldt function ${\Lambda}$ by elementary means, the error term ${O(x)}$ overwhelms the main term ${x}$ . But there is a slight tweaking of the von Mangoldt function, the second von Mangoldt function ${\Lambda_2}$ , that increases the size of the main term to ${2x \log x}$ while keeping the error term at ${O(x)}$ , thus leading to a useful estimate; the price one pays for this is that this function is now a proxy for the almost primes rather than the primes. This function is defined by a variant of (10), namely

$\displaystyle \Lambda_2(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}. \ \ \ \ \ (15)$

It is not hard to see that ${\Lambda_2(n)}$ vanishes once ${n}$ has at least three distinct prime factors (basically because the quadratic function ${x \mapsto x^2}$ vanishes after being differentiated three or more times). Indeed, one can easily verify the identity

$\displaystyle \Lambda_2(n) = \Lambda(n) \log n + \Lambda*\Lambda(n) \ \ \ \ \ (16)$

(which corresponds to the Dirichlet series identity ${\zeta''(s)/\zeta(s) = -(-\zeta'(s)/\zeta(s))' + (-\zeta'(s)/\zeta(s))^2}$ ); the first term ${\Lambda(n) \log n}$ is mostly concentrated on primes, while the second term ${\Lambda*\Lambda(n)}$ is mostly concentrated on semiprimes (products of two distinct primes).

Now let us sum ${\Lambda_2(n)}$ . In analogy with the previous discussion, we will do so by comparing the function ${\log^2 n}$ with something involving the divisor function. In view of (5), it is reasonable to try the approximation

$\displaystyle \log^2 n \approx \tau(n) \log n;$

from the identity

$\displaystyle 2 \log n = \sum_{d|n} \mu(d) \tau(\frac{n}{d}) \log \frac{n}{d} \ \ \ \ \ (17)$

(which corresponds to the Dirichlet series identity ${- 2 \zeta'(s) = \frac{1}{\zeta(s)} -(\zeta^2(s))'}$ ) we thus expect

$\displaystyle \Lambda_2(n) \approx 2\log n. \ \ \ \ \ (18)$

Now we make these heuristics more precise. From the integral test we have

$\displaystyle \sum_{n \leq x} \log^2 n = x \log^2 x + C_1 x \log x + C_2 x + O(\log^2 x)$

while from (9) and summation by parts one has

$\displaystyle \sum_{n \leq x} \tau(n) \log n = x \log^2 x + C_3 x \log x + C_4 x + O(\sqrt{x} \log x)$

where ${C_1,C_2,C_3,C_4}$ are explicit absolute constants whose exact value is not important here. Thus

$\displaystyle \sum_{n \leq x} (\log^2 n - \tau(n) \log n) = C_5 x \log x + C_6 x + O(\sqrt{x} \log x) \ \ \ \ \ (19)$

for some other constants ${C_5, C_6}$ .

Meanwhile, from (15), (17) one has

$\displaystyle \sum_{n \leq x} (\Lambda_2(n) - 2\log(n)) = \sum_{d \leq x} \mu(d) \sum_{m \leq x/d} \log^2 n - \tau(n) \log n;$

applying (19), (13), (14) we see that the right-hand side is ${O(x)}$ . Computing ${\sum_{n \leq x} \log n}$ by the integral test, we deduce the Selberg symmetry formula

$\displaystyle \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x). \ \ \ \ \ (20)$

One can view (20) as the “almost prime number theorem” – the analogue of the prime number theorem for almost primes.

The fact that the almost primes have a relatively easy asymptotic, while the genuine primes do not, is a reflection of the parity problem in sieve theory; see this post for further discussion. The symmetry formula is however enough to get “within a factor of two” of the prime number theorem: if we discard the semiprimes ${\Lambda*\Lambda}$ from (16), we see that ${\Lambda(n) \log n \leq \Lambda_2(n)}$ , and thus

$\displaystyle \sum_{n \leq x} \Lambda(n) \log n \leq 2 x \log x + O(x)$

which by a summation by parts argument leads to

$\displaystyle 0 \leq \sum_{n \leq x} \Lambda(n) \leq 2 x + O(\frac{x}{\log x}),$

which is within a factor of ${2}$ of (2) in some sense.

One can “twist” all of the above arguments by a Dirichlet character ${\chi}$ . For instance, (15) twists to

$\displaystyle \Lambda_2(n) \chi(n) = \sum_{d|n} \mu(d)\chi(d) \log^2 \frac{n}{d} \chi(\frac{n}{d}).$

On the other hand, if ${\chi}$ is a non-principal character of modulus ${q}$ , then it has mean zero on any interval with length ${q}$ , and it is then not hard to establish the asymptotic

$\displaystyle \sum_{n \leq y} \log^2 n \chi(n) = O_q(\log^2 y).$

This soon leads to the twisted version of (20):

$\displaystyle \sum_{n \leq x} \Lambda_2(n) \chi(n) = O_q(x), \ \ \ \ \ (21)$

thus almost primes are asymptotically unbiased with respect to non-principal characters.

From the multiplicative Fourier analysis of Dirichlet characters modulo ${q}$ (and the observation that ${\Lambda_2}$ is quite small on residue classes not coprime to ${q}$ ) one then has an “almost prime number theorem in arithmetic progressions”:

$\displaystyle \sum_{n \leq x: n = a \mod q} \Lambda_2(n) = \frac{2}{\phi(q)} x \log x + O_q(x).$

As before, this lets us come within a factor of two of the actual prime number theorem in arithmetic progressions:

$\displaystyle \sum_{n \leq x: n = a \mod q} \Lambda(n) \leq \frac{2}{\phi(q)} x + O_q(\frac{x}{\log x}).$

One can also twist things by the completely multiplicative function ${n \mapsto n^{it}}$ , but with the caveat that the approximation ${2 \log n}$ to ${\Lambda_2(n)}$ can locally correlate with ${n^{it}}$ . Thus for instance one has

$\displaystyle \sum_{n \leq x} (\Lambda_2(n)-2 \log n) \chi(n) n^{it} = O_q(x)$

for any fixed ${t}$ and ${\chi}$ ; in particular, if ${\chi}$ is non-principal, one has

$\displaystyle \sum_{n \leq x} \Lambda_2(n) \chi(n) n^{it} = O_q(x).$

— 3. The all-or-nothing dichotomy —

To summarise so far, the almost primes (as represented by ${\Lambda_2}$ ) are quite uniformly distributed. These almost primes can be split up into the primes (as represented by ${\Lambda(n) \log n}$ ) and the semiprimes (as represented by ${\Lambda*\Lambda(n)}$ ), thanks to (16).

One can rewrite (16) as a recursive formula for ${\Lambda}$ :

$\displaystyle \Lambda(n) = \frac{1}{\log n} \Lambda_2(n) - \frac{1}{\log n} \Lambda * \Lambda(n). \ \ \ \ \ (22)$

One can also twist this formula by a character ${\chi}$ and/or a completely multiplicative function ${n \mapsto n^{it}}$ , thus for instance

$\displaystyle \Lambda \chi(n) = \frac{1}{\log n} \Lambda_2 \chi(n) - \frac{1}{\log n} \Lambda \chi * \Lambda \chi(n). \ \ \ \ \ (23)$

This recursion, combined with the uniform distribution properties on ${\Lambda_2}$ , lead to various all-or-nothing dichotomies for ${\Lambda}$ . Suppose, for instance, that ${\Lambda \chi}$ behaves like a constant ${c}$ on the average for some non-principal character ${\chi}$ :

$\displaystyle \Lambda \chi(n) \approx c.$

Then (from (5)) we expect ${\Lambda \chi * \Lambda \chi}$ to behave like ${c^2 \log n}$ , thus

$\displaystyle \frac{1}{\log n} \Lambda \chi * \Lambda \chi(n) \approx c^2.$

On the other hand, from (21), ${\frac{1}{\log n} \Lambda_2(n)}$ is asymptotically uncorrelated with ${\chi}$ :

$\displaystyle \frac{1}{\log n} \Lambda_2 \chi \approx 0.$

Putting all this together, one obtains

$\displaystyle c \approx -c^2$

which suggests that ${c}$ must be either close to ${0}$ , or close to ${-1}$ .

Basically, the point is that there are only two equilibria for the recursion (23). One equilibrium occurs when ${\Lambda}$ is asymptotically uncorrelated with ${\chi}$ ; the other is when it is completely anti-correlated with ${\chi}$ , so that ${\Lambda(n)}$ is supported primarily on those ${n}$ for which ${\chi(n)}$ is close to ${-1}$ . Note in the latter case ${\chi(n) \approx -1}$ for most primes ${n}$ , and thus ${\chi(n) \approx +1}$ for most semiprimes ${n}$ , thus leading to an equidistribution of ${\chi(n)}$ for almost primes (weighted by ${\Lambda_2}$ ). Any intermediate distribution of ${\Lambda \chi}$ would be inconsistent with the distribution of ${\Lambda_2 \chi}$ . (In terms of Dirichlet series, this assertion corresponds to the fact that the ${L}$ -function of ${\chi}$ either has a zero of order ${1}$ , or a zero of order ${0}$ (i.e. not a zero at all) at ${s=1}$ .)

A similar phenomenon occurs when twisting ${\Lambda}$ by ${n^{it}}$ ; basically, the average value of ${(\Lambda(n)-1) n^{it}}$ must asymptotically either be close to ${0}$ , or close to ${-1}$ ; no other asymptotic ends up being compatible with the distribution of ${(\Lambda_2(n)-2\log n) n^{it}}$ . (Again, this corresponds to the fact that the Riemann zeta function has a zero of order ${1}$ or ${0}$ at ${1+it}$ .) More generally, the average value of ${(\Lambda(n)-1) \chi(n) n^{it}}$ must asymptotically approach either ${0}$ or ${-1}$ .

Remark 2 One can make the above heuristics precise either by using Dirichlet series (and analytic continuation, and the theory of zeroes of meromorphic functions), or by smoothing out arithmetic functions such as ${\Lambda \chi}$ by a suitable multiplicative convolution with a mollifier (as is basically done in elementary proofs of the prime number theorem); see also the Granville-Soundararajan theory of pretentious characters for a closely related theory. We will not pursue these details here, however.

— 4. Dueling conspiracies —

In the previous section we have seen (heuristically, at least), that the von Mangoldt function ${\Lambda(n)}$ (or more precisely, ${\Lambda(n)-1}$ ) will either have no correlation, or a maximal amount of anti-correlation, with a completely multiplicative function such as ${\chi(n)}$ , ${n^{it}}$ , or ${\chi(n) n^{it}}$ . On the other hand, it is not possible for this function to maximally anti-correlate (or to conspire) with two such functions; thus the presence of one conspiracy excludes the presence of all others.

Suppose for instance that we had two distinct non-principal characters ${\chi, \chi'}$ for which one had maximal anti-correlation:

$\displaystyle \Lambda(n) \chi(n), \Lambda(n) \chi'(n) \approx -1.$

One could then combine the two statements to obtain

$\displaystyle \Lambda(n) (\chi(n) + \chi'(n)) \approx -2.$

Meanwhile, ${\frac{1}{\log n} \Lambda_2(n)}$ doesn’t correlate with either ${\chi}$ or ${\chi'}$ . It will be convenient to exploit this to normalise ${\Lambda}$ , obtaining

$\displaystyle (\Lambda(n) - \frac{1}{2\log n} \Lambda_2(n)) (\chi(n) + \chi'(n)) \approx -2.$

(Note from (3), (18) that we expect ${\Lambda(n) - \frac{1}{2\log n} \Lambda_2(n)}$ to have mean zero.)

On the other hand, since ${0 \leq \Lambda(n) \log n \leq \Lambda_2(n)}$ , one has

$\displaystyle |\Lambda(n) - \frac{1}{2\log n} \Lambda_2(n)| \leq \frac{1}{2\log n} \Lambda_2(n)$

and hence by the triangle inequality

$\displaystyle \Lambda_2(n) |\chi(n) + \chi'(n)| \gtrapprox 4 \log n$

in the sense that averages of the left-hand side should be at least as large as averages of the right-hand side. From this, (18), and Cauchy-Schwarz, one thus expects

$\displaystyle \Lambda_2(n) |\chi(n) + \chi'(n)|^2 \gtrapprox 8 \log n.$

But if one expands out the left-hand side using (18), (21), one only ends up with ${4 \log n + O_q(1)}$ on the average, a contradiction for ${n}$ sufficiently large.

Remark 3 The above argument belongs to a family of ${L^2}$ -based arguments which go by various names (almost orthogonality, ${TT^*}$ , large sieve, etc.). The ${L^2}$ argument can more generally be used to establish square-summability estimates on averages such as ${\frac{1}{x} \sum_{n \leq x} \Lambda(n) \chi(n)}$ as ${\chi}$ varies, but we will not make this precise here.

As one consequence of the above arguments, one can show that ${\Lambda(n)}$ cannot maximally anti-correlate with any non-real character ${\chi}$ , since (by the reality of ${\Lambda}$ ) it would then also maximally anti-correlate with the complex conjugate ${\overline{\chi}}$ , which is distinct from ${\chi}$ . A similar argument shows that ${\Lambda(n)}$ cannot maximally anti-correlate with ${n^{it}}$ for any non-zero ${t}$ , a fact which can soon lead to the prime number theorem, either by Dirichlet series methods, by Fourier-analytic means, or by elementary means. (Sketch of Fourier-analytic proof: ${L^2}$ methods provide ${L^2}$ -type bounds on the averages of ${\Lambda(n) n^{it}}$ in ${t}$ , while the above arguments show that these averages are also small in ${L^\infty}$ . Applying (22) a few times to take advantage of the smoothing effects of convolution, one eventually concludes that these averages can be made arbitrarily small in ${L^1}$ asymptotically, at which point the prime number theorem follows from Fourier inversion.)

Remark 4 There is a slightly different argument of an ${L^1}$ nature rather than an ${L^2}$ nature (i.e. using tools such as the triangle inequality, union bound, etc.) that can also achieve similar results. For instance, suppose that ${\Lambda(n)}$ maximally anti-correlates with ${\chi}$ and ${\chi'}$ . Then ${\chi(n), \chi'(n) \approx -1}$ for most primes ${n}$ , which implies that ${\chi \chi'(n) \approx +1}$ for most primes ${n}$ , which is incompatible with the all-or-nothing dichotomy unless ${\chi \chi'}$ is principal. This is an alternate way to exclude correlation with non-real characters. Similarly, if ${\Lambda(n) n^{it} \approx -1}$ , then ${\Lambda(n) n^{2it} \approx +1}$ , which is also incompatible with the zero-one law; this is essentially the method underlying the standard proof of the prime number theorem (which relates ${\zeta(1+it)}$ with ${\zeta(1+2it)}$ ).

— 5. Quadratic characters —

The one difficult scenario to eliminate is that of maximal anti-correlation with a real non-principal (i.e. quadratic) character ${\chi}$ , thus

$\displaystyle \Lambda(n) \chi(n) \approx -1.$

This scenario implies that the quantity

$\displaystyle L(1,\chi) := \sum_{n=1}^\infty \frac{\chi(n)}{n}$

vanishes. Indeed, if one starts with the identity

$\displaystyle \log n \chi(n) = \sum_{d|n} \Lambda \chi(d) \chi(\frac{n}{d})$

and sums in ${n}$ , one sees that

$\displaystyle \sum_{n \leq x} \log n \chi(n) = \sum_{d,m: dm \leq x} \Lambda \chi(d) \chi(m).$

The left-hand side is ${O_q( \log x)}$ by the mean zero and periodicity properties of ${\chi}$ . To estimate the right-hand side, we use the hyperbola method and rewrite it as

$\displaystyle \sum_{m \leq M} \chi(m) \sum_{d \leq x/m} \Lambda \chi(d) + \sum_{d \leq x/M} \Lambda \chi(d) \sum_{M < m \leq x/d} \chi(m)$

for some parameter ${M}$ (sufficiently slowly growing in ${x}$ ) to be optimised later. Writing ${\sum_{d \leq x/m} \Lambda \chi(d) = (-1 + o_q(1)) x/m}$ and ${\sum_{M < m \leq x/d} \chi(m) = O_q(1)}$ , we can express this as

$\displaystyle x( \sum_{m \leq M} \frac{\chi(m)}{m} + o_q(1) ) + O_q( x / M );$

sending ${x \rightarrow \infty}$ (and ${M \rightarrow \infty}$ at a slower rate) we conclude ${L(1,\chi)=0}$ as required.

It is remarkably difficult to show that ${L(1,\chi)}$ does not, in fact, vanish. One way to do this is to use the class number formula, that relates this quantity to the class number of the quadratic number field ${{\Bbb Q}(\sqrt{-d})}$ associated to the conductor ${d}$ of ${\chi}$ , together with some related number-theoretic quantities. A more elementary (but significantly weaker) method proceeds by using the easily verified fact that the convolution ${1*\chi}$ is non-negative, and is at least ${1}$ on the squares; this should be interpreted as a fact from algebraic number theory, and basically corresponds to the fact that the number of representations of an integer ${n}$ as the norm ${x^2 + d y^2}$ of an integer in ${{\Bbb Z}(\sqrt{d})}$ (or more generally, as the norm of an ideal in that ring) is non-negative, and is at least ${1}$ on the squares. In particular we have

$\displaystyle \sum_{n \leq x} \frac{1*\chi(n)}{\sqrt{n}} \geq \frac{1}{2} \log x + O(1).$

On the other hand, from the hyperbola method we can express the left-hand side as

$\displaystyle \sum_{d \leq \sqrt{x}} \frac{\chi(d)}{\sqrt{d}} \sum_{m \leq x/d} \frac{1}{\sqrt{m}} + \sum_{m < \sqrt{x}} \frac{1}{\sqrt{m}} \sum_{\sqrt{x} < d \leq x/m} \frac{\chi(d)}{\sqrt{d}}. \ \ \ \ \ (24)$

From the mean zero and periodicity properties of ${\chi}$ we have ${\sum_{\sqrt{x} < d \leq x/m} \frac{\chi(d)}{\sqrt{d}} = O_q( x^{-1/4} )}$ , so the second term in (24) is ${O_q(1)}$ . Meanwhile, from the midpoint rule, ${\sum_{m \leq y} \frac{1}{\sqrt{m}} = 2 \sqrt{y} - \frac{3}{2} + O( 1 / \sqrt{y} )}$ , and so the first term in (24) is

$\displaystyle 2 \sqrt{x} \sum_{d \leq \sqrt{x}} \frac{\chi(d)}{d} + O( |\sum_{d \leq \sqrt{x}} \frac{\chi(d)}{\sqrt{d}}| ) + O( 1 ) = 2 \sqrt{x} L(1,\chi) + O(1).$

Putting all this together we have

$\displaystyle \frac{1}{2} \log x + O(1) \leq 2 \sqrt{x} L(1,\chi) + O_q(1),$

which leads to a contradiction as ${x \rightarrow \infty}$ if ${L(1,\chi)}$ vanishes.

Note in fact that the above argument shows that ${L(1,\chi)}$ is positive. If one carefully computes the dependence of the above argument on the modulus ${q}$ , one obtains a lower bound of the form ${L(1,\chi) \geq \exp( -q^{1/2+o(1)} )}$ , which is quite poor. Using a non-trivial improvement on the error term in counting lattice points under the hyperbola (or better still, by smoothing the sum ${\sum_{n \leq x}}$ ), one can improve this a bit, to ${L(1,\chi) \geq q^{-O(1)}}$ . In contrast, the class number method gives a bound ${L(1,\chi) \geq q^{-1/2+o(1)}}$ .

We can improve this even further for all but at most one real primitive character ${\chi}$ :

Theorem 1 (Siegel’s theorem) For every ${\epsilon > 0}$ , one has ${L(1,\chi) \gg_\epsilon q^{-\epsilon}}$ for all but at most one real primitive character ${\chi}$ , where the implied constant is effective, and ${q}$ is the modulus of ${\chi}$ .

Throwing in this (hypothetical) one exceptional character, we conclude that ${L(1,\chi) \gg_\epsilon q^{-\epsilon}}$ for all real primitive characters ${\chi}$ , but now the implied constant is ineffective, which is the usual way in which Siegel’s theorem is formulated (but the above nearly effective refinement can be obtained by the same methods). It is a major open problem in the subject to eliminate this exceptional character and recover an effective estimate for some ${\epsilon < 1/2}$ .

Proof: Let ${\epsilon > 0}$ (which we can assume to be small), and let ${c > 0}$ be a small number depending (effectively) on ${\epsilon}$ to be chosen later. Our task is to show that ${L(1,\chi) \geq c q^{-\epsilon}}$ for all but at most one primitive real character ${\chi}$ . Note we may assume ${q}$ is large (effectively) depending on ${\epsilon}$ , as the claim follows from the previous bounds on ${L(1,\chi)}$ otherwise.

Suppose then for contradiction that ${L(1,\chi) < c q^{-\epsilon}}$ and ${L(1,\chi') < c (q')^{-\epsilon}}$ for two distinct primitive real characters ${\chi, \chi'}$ of (large) modulus ${q,q'}$ respectively.

We begin by modifying the proof that ${L(1,\chi)}$ was positive, which relied (among other things) on the observation that ${1 * \chi}$ , and equals ${1}$ at ${1}$ . In particular, one has

$\displaystyle \sum_{n \leq x} \frac{1*\chi(n)}{n^s} \geq 1 \ \ \ \ \ (25)$

for any ${x \geq 1}$ and any real ${s}$ . (One can get slightly better bounds by exploiting that ${1*\chi}$ is also at least ${1}$ on square numbers, as before, but this is really only useful for ${s \leq 1/2}$ , and we are now going to take ${s}$ much closer to ${1}$ .)

On the other hand, one has the asymptotics

$\displaystyle \sum_{n \leq x} \frac{1}{n^s} = \zeta(s) + \frac{x^{1-s}}{1-s} + O( x^{-s} )$

for any real ${s}$ close (but not equal) to ${1}$ , and similarly

$\displaystyle \sum_{n \leq x} \frac{\chi(n)}{n^s} = L(s,\chi) + O( q^{O(1)} x^{-s} )$

for any real ${s}$ close to ${1}$ ; similarly for ${\chi', \chi \chi'}$ . From the hyperbola method, we can then conclude

$\displaystyle \sum_{n \leq x} \frac{1*\chi(n)}{n^s} = \zeta(s) L(s,\chi) + \frac{x^{1-s}}{1-s} L(1,\chi) + O( q^{O(1)} x^{0.5-s} ) \ \ \ \ \ (26)$

for all real ${s}$ sufficiently close to ${1}$ . Indeed, one can expand the left-hand side of (26) as

$\displaystyle \sum_{d \leq \sqrt{x}} \frac{\chi(d)}{d^s} \sum_{m \leq x/d} \frac{1}{m^s} + \sum_{m < \sqrt{x}} \frac{1}{m^s} \sum_{\sqrt{x} < d \leq x/m} \frac{\chi(d)}{d^s}$

and the claim then follows from the previous asymptotics. (One can improve the error term by smoothing the summation, but we will not need to do so here.)

Now set ${x = C q^{C}}$ for a large absolute constant ${C}$ . If ${0.99 \leq s < 1}$ , then the error term in ${O( q^{O(1)} x^{0.5-s} )}$ is then at most ${1/2}$ (say) if ${C}$ is large enough. We conclude from (25) that

$\displaystyle \zeta(s) L(s,\chi) \geq \frac{1}{2} - O( \frac{q^{O(1-s)}}{1-s} L(1,\chi) )$

for ${0.99 \leq s < 1}$ . Since ${L(1,\chi) \leq c q^{-\epsilon}}$ and ${c}$ is assumed small (depending on ${\epsilon}$ ), this implies that ${\zeta(s) L(s,\chi)}$ is positive in the range

$\displaystyle L(1,\chi) \ll 1-s \ll \epsilon$

(this can be seen by checking the cases ${1-s \leq 1/\log q}$ and ${1-s > 1/\log q}$ separately). On the other hand, ${\zeta(s) L(s,\chi)}$ has a simple pole at ${s=1}$ with positive residue, and is thus negative for ${s<1}$ extremely close to ${1}$ . By the intermediate value theorem, we conclude that ${L(s,\chi)}$ has a zero for some ${s = 1 - O(L(1,\chi))}$ . Conversely, it is not difficult (using summation by parts) to show that ${L'(s,\chi) = O(\log^2 q)}$ for ${s = 1 - O( 1/\log q )}$ , and so by the mean value theorem we see that the zero of ${L(s,\chi)}$ must also obey ${1-s \gg L(1,\chi) / \log^2 q}$ . Thus ${L(s,\chi)}$ has a zero for some ${s < 1}$ with

$\displaystyle L(1,\chi)/\log^2 q \ll 1-s \ll L(1,\chi). \ \ \ \ \ (27)$

Similarly, ${L(s',\chi')}$ has a zero for some ${s' < 1}$ with

$\displaystyle L(1,\chi')/\log^2 q' \ll 1-s' \ll L(1,\chi'). \ \ \ \ \ (28)$

Now, we consider the function

$\displaystyle f := 1 * \chi * \chi' * \chi\chi'.$

One can also show that ${f}$ is non-negative and equals ${1}$ at ${1}$ , thus

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n^s} \geq 1.$

(The algebraic number theory interpretation of this positivity is that ${f(n)}$ is the number of representations of ${n}$ as the norm of an ideal in the biquadratic field generated by ${\sqrt{q}}$ and ${\sqrt{q'}}$ .)

Also, by (a more complicated version of) the derivation of (26), one has

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n^s} = \zeta(s) L(s,\chi) L(s,\chi') L(s,\chi\chi') + \frac{x^{1-s}}{1-s} L(1,\chi) L(1,\chi') L(1,\chi\chi') + O( (qq')^{O(1)} x^{0.9-s} )$

(say). Arguing as before, we conclude that

$\displaystyle \zeta(s) L(s,\chi) L(s,\chi') L(s,\chi\chi') \geq \frac{1}{2} - O( \frac{(qq')^{O(1-s)}}{1-s} L(1,\chi) L(1,\chi')L(1,\chi\chi') )$

for ${0.99 \leq s < 1}$ . Using the bound ${L(1,\chi\chi') \ll \log(qq')}$ (which can be established by summation by parts), we conclude that ${\zeta(s) L(s,\chi) L(s,\chi') L(s,\chi\chi')}$ is positive in the range

$\displaystyle L(1,\chi) L(1,\chi') \log(qq') \ll 1-s \ll \epsilon.$

Since we already know ${L(s,\chi)}$ and ${L(s',\chi')}$ have zeroes for some ${s, s'}$ obeying (27), (28)

$\displaystyle \frac{L(1,\chi)}{\log^2 q}, \frac{L(1,\chi')}{\log^2 q'} \ll L(1,\chi) L(1,\chi') \log(qq');$

taking geometric means and rearranging we obtain

$\displaystyle L(1,\chi) L(1,\chi') \gg \log(qq')^{-O(1)}.$

But this contradicts the hypotheses ${L(1,\chi) \leq c q^{-\epsilon}}$ , ${L(1,\chi') \leq c (q')^{-\epsilon}}$ if ${c}$ is small enough. $\Box$

Remark 5 Siegel’s theorem leads to a version of the prime number theorem in arithmetic progressions known as the Siegel-Walfisz theorem. As with Siegel’s theorem, the bounds are ineffective unless one is allowed to exclude a single exceptional modulus ${q}$ (and its multiples), in which case one has a modified prime number theorem which favours the quadratic nonresidues mod ${q}$ ; see this paper of Granville.

Remark 6 One can improve the effective bounds in Siegel’s theorem if one is allowed to exclude a larger set of bad moduli. For instance, the arguments in Section 4 allow one to establish a bound of the form ${L(1,\chi) \gg \log^{-O(1)} q}$ after excluding at most one ${q}$ in each hyper-dyadic range ${2^{100^k} \leq q \leq 2^{100^{k+1}}}$ for each ${k}$ ; one can of course replace ${100}$ by other exponents here, but at the cost of worsening the ${O(1)}$ term. (This is essentially an observation of Landau.)

35 comments

Comments feed for this article

24 September, 2009 at 7:05 pm

David Speyer

Very nice exposition!

(1) I remember reading that, before the classification of imaginary quadratic fields of class number 1 was proved, it was known that there could be at most one example in addition to the ones already known. Is that the same “at most one”?

(2) If you felt like expanding Remark 1 into a blog post, I would certainly be interested in reading it.

25 September, 2009 at 1:49 am

Terence Tao

(1) Yes, this is very closely related, though I think the discovery that there is at most one exceptional field of class number 1 predates Siegel’s theorem slightly. The elimination of this field is quite difficult (it can’t be done by the methods discussed here), but I don’t know many details.

(2) Actually I was already thinking of doing something along those lines, though perhaps not immediately…

26 September, 2009 at 6:06 pm

anon

Thanks for an excellent posting.

A minor expository point: the numbers 0 and 1 occur in the initial discussion of the “zero-one law”, in a way which (to any reader not already knowing the term) reinforces but also confuses the meaning of the term here. In the original probability context, 0 and 1 had meaning as probabilities, but in the present situation it just refers to an all-or-nothing dichotomy of possible asymptotic behaviors.

26 September, 2009 at 6:33 pm

Terence Tao

Hmm, that’s a fair point. It’s a shame that the term “zero-one law” is so narrowly defined, because I find it a useful phrase in other contexts. I wonder if anyone can suggest a replacement term; “all-or-nothing dichotomy” is OK, I guess, but perhaps a little clunky.

30 September, 2009 at 6:04 pm

Terence Tao

I’ve gone ahead and replaced “zero-one law” with “all-or-nothing dichotomy”; this is not my ideal terminology, but will serve as a placeholder until a better suggestion comes along.

I also fixed an error in which the rings $Z[\sqrt{q}]$ and $Z[\sqrt{-q}]$ were confused. (At some point in the future I plan to learn class field theory properly, and perhaps blog about it here; my algebraic number theory knowledge is unfortunately still rather superficial at present.)

1 October, 2009 at 6:52 pm

same anon

(from the anon who posted the earlier 0-1 comment)

Actually, “zero-one law” is too good a metaphor to not use. It works beautifully for any reader who already understands the meaning. Not only does it convey the idea, but seeing the term used here outside its traditional settings also suggests the idea of spreading that usage to other contexts.

The only caveat is that readers who aren’t already familiar with the idea would probably benefit if, at the first location in the essay where “0-1 law” appears, there were also:

— an aside, giving a definition or a few words of explanation (e.g., “borrowing a metaphor from the 0-1 laws of probability”);

— disambiguation, in the discussion that follows of the order of vanishing of the associated Dirichlet series being either 0 or 1 and this being an avatar of the 0-1 law, that these orders being “0” and “1” (in whichever order corresponds to the dichotomy in the 0-1 law!) is just a numerical peculiarity of this problem and is otherwise unrelated to the words in the term “zero-one law”.

I guess it also bears clarifying what a 0-1 law is, in general. I don’t think the meaning is simply that of a dichotomy, there is also some dynamics driving the outcome, asymptotically, to one of two antipodal equilibria (indeed you call the two values of $c$ “equilibria”, so maybe you share the same intuition). This is literally or tacitly true in every theorem I can remember being called a 0-1 law (in probability, random graphs, ergodic theory, information theory, etc), and it would be interesting to make a list and see whether this dynamical, asymptotic aspect is always part of the story.

26 September, 2009 at 11:05 pm

Anonymous

Near the beginning, when you say that the number of primes congruent to a mod q when a is not coprime to q is O(1), you can leave out the dependence of the implied constant on q. [Well spotted, thanks! -T.]

27 September, 2009 at 4:37 pm

Mustafa Said

One open problem in analytic number theory is to determine whether or not $\sum_1^{\infty} (-1)^n \frac{n}{p_n}$ converges. Here $p_n$ denotes the $n^th$ prime. The prime number theorem gives us some information on the asymptotics of the terms in the sum, but not enough to solve the problem. Would the validity of the Riemann Hypothesis help us solve this problem?

30 September, 2009 at 4:10 pm

Terence Tao

Hmm. I played around with this sum a bit and it seems like the convergence of this sum is equivalent to the convergence of the sum $\sum_{m=2}^\infty \frac{(-1)^{\pi(m)}}{m \log m}$ (by using the prime number theorem and estimating the difference between $\frac{n}{p_n}$ and $\frac{n+1}{p_{n+1}}$ for odd and even n.) So it comes down to understanding the equidistribution of the parity of the prime counting function; roughly speaking, any quantatitative equidistribution which gains more than a log log factor should suffice. By coincidence, we happen to be studying the parity of pi(x) over at the polymath4 project, though I don’t think the methods we have currently for computing that parity are strong enough to establish this sort of equidistribution. Nevertheless it does seem plausible (though difficult to prove) that one has convergence. (If one had a sufficiently uniform and quantitative version of the Poisson statistics conjecture for prime gaps, one might be able to obtain the result; in particular, a sufficiently quantitative version of the prime tuples conjecture might suffice, thanks to Gallagher’s computations relating the two.)

The Riemann hypothesis concerns larger-scale fluctuations in the primes, whereas here it seems that fine-scale fluctuations such as prime gaps statistics are more relevant.

7 March, 2010 at 6:14 pm

Mustafa Said

Can you please explain your statement “any quantatitative equidistribution (of the parity of the prime counting function) which gains more than a log log factor should suffice.”

I also have another question about the parity of the prime counting
function:

Let $E_N = { m \in \mathbb{N} : \pi(m) is even and m \leq N}$
and $O_N = { m \in \mathbb{N} : \pi(m) is odd and m \leq N}

then is it true that $\lim_{N \to \infty} |E_N|/|O_N| = 1$ ?

28 September, 2009 at 9:19 am

Anonymous

There are lots of ways you can use the phrase “all or nothing.” You can say all-or-nothing behavior, all-or-nothing phenomenon, all-or-nothing situation, etc. Another expression you can use in place of all or nothing is “black or white” — as in no shades of gray. One reason “all-or-nothing dichotomy” seems clunky is that it is redundant.

1 October, 2009 at 8:11 pm

Arithmetic Progression in finite fields

Professor Tao:
I have a particular problem that is of interest in electrical engineering. It seems to fall under the category of arithmetic progression but in finite fields.

Field field: F_{p^n} with n>1

What is the maximum N so that one can find N different polynomials
p_{i}(x) of degree in F_{p}[x] for i = 1,…,N so that the following
conditions are satisfied?

1) each p_{i}(x) has at least 2^n roots in F_{p^n} in arithmetic
progression. By AP what I mean is: let s be primitive element. If r is
root, then r, r+1, r+s, r+1+s,r+s^2, r+s^2+1, r+s^2+s,
r+s^2+s+1,…,r+s^(n-1)+s^(n-2)+…+1 are roots.
2) If i is not j, p_{i}(x) and p_{j}(x) should have no common roots
from the roots in arithmetic progression.

As n increases from 0 to infinity what is the highest such N one can
get for each F_{p^n}?

(I could not get more than N=5 for p=5. For p=3, N=1.)

Is there any mathematician who has worked on problems involving such progressions in finite fields?

Thank you

2 October, 2009 at 3:42 pm

At the AustMS conference « What’s new

[…] follows from (for instance) Siegel’s theorem on the size of the class group, as discussed in this previous post. Ellenberg, Venkatesh, and Westerland have recently established some partial results towards the […]

6 October, 2009 at 4:36 pm

Giampiero Campa

Hi Terry,
first of all i am an not a Mathematician, so i’d like to apologize in advance if the language i am using is not entirely correct or appropriate.

Let me put it this way, do we “understand”, say, the Mandelbrot fractal ? I mean, yes we do have a dynamic system that generates it, but as far as i know the only way we can determine whether a point belongs to the set or not is by iterating the dynamic system. Isn’t that somehow similar to the situation that we have with prime numbers where the only way to find out if a number is prime or not is by going through the sieve ? or simply dividing the number by all its predecessors ? (which i am pretty sure it can be done with a simple 1-state nonlinear dynamic system)

In other words, are we expecting too much by saying, as you did in your first sentence, that we want to “understand the distribution of prime numbers” ?

Again, maybe these are trivial questions and i just don’t know enough number theory, in which case i apologize.

28 June, 2012 at 7:23 am

Somewhat like, to see if a living creature is a human being, it’s a good idea to test its DNA, but perhaps it’s not very useful for the understanding of the distribution of people in the world…

6 November, 2009 at 7:49 am

The “no self-defeating object” argument « What’s new

[…] 1 Continuing the number-theoretic theme, the “dueling conspiracies” arguments in a previous blog post can also be viewed as an instance of this type of “no-self-defeating-object”; for […]

23 July, 2011 at 7:40 pm

Erdos’ divisor bound « What’s new

[…] requires one to exclude zeroes of the Riemann zeta function on the line , as discussed in this previous post), one is eventually left with the estimate , which is useless as a lower bound (and recovers only […]

21 August, 2011 at 12:45 pm

Hilbert’s seventh problem, and powers of 2 and 3 « What’s new

[…] (The reason the Thue-Siegel-Roth theorem is ineffective is because it relies heavily on the dueling conspiracies argument, i.e. playing off multiple “conspiracies” against each other; the other results […]

11 November, 2011 at 1:18 pm

Number Theory: Lecture 16 « Theorem of the week

[…] there was no known elementary proof when the first edition was written!). Terry Tao has a nice blog post discussing a number of aspects of the distribution of prime […]

19 April, 2012 at 7:23 pm

Sumit Kumar Jha

Professor Tao:
Your derivation of Selberg’s symmetry formula is the one which Selberg originally used in his derivation of an elementary proof of prime number theorem.
But in a paper by Iseki and Tatsukava, they easily derived
$\psi(x)\log x+\sum_{n\leq x}\Lambda(n)\psi\left(\frac{x}{n}\right)=2x\log x+O(x)$
and this is equivalent to
$\sum_{n\leq x}\Lambda_{2}(n)=2x\log x+O(x)$
where
$\Lambda_2=\Lambda'+\Lambda \ast \Lambda=\mu \ast u''$
Can you help me to deduce
$\sum_{p\leq x}\log^2 p+\sum_{pq \leq x} \log p\log q=2x\log x+O(x)$
from above equation.

19 April, 2012 at 7:38 pm

Sumit Kumar Jha

Actually,
$\Lambda_{2}(n)=\log^2 p$ when $n=p$
$\Lambda_{2}(n)=\log p\log{\frac{n^2}{p}}$ when $n=p^\alpha$
$\Lambda_{2}(n)=2\log p\log q$ when $n=p^\alpha q^\beta$
but how is it that
$\Lambda_{2}(n)=0$ in other cases.

20 April, 2012 at 5:15 am

Terence Tao

This follows from the formula $\Lambda_2 = \Lambda' + \Lambda*\Lambda$ in your previous comment.

20 April, 2012 at 7:22 am

Sumit Kumar Jha

Professor Tao: Thanks!!
I am also looking forward for an article on “sieves” on your blog. Have you already written one??
I especially want to know about the Bomberi sieve which treats
$\Lambda_{k}=\sum_{n|d}\mu(d)\log^k(\frac{n}{d})$ .
Can you also discuss the departure from “almost primes” to “primes” namely parity problem in detail??

20 May, 2012 at 2:58 pm

Heuristic limitations of the circle method « What’s new

[…] the only thing which is strong enough to rule out a conspiracy is a conflicting conspiracy (see my previous blog post on this topic). A good example of this is Heath-Brown’s result that the existence of a Siegel zero (which […]

27 June, 2012 at 8:25 am

What if primes hated to start with nine? « Secret Blogging Seminar

[…] appeared earlier in an answer I left on MO. There is also a lot of similarity to Terry Tao’s post on conspiracies and the prime number theorem. One can think of “primes hate to start with nine” as a “conspiracy” whose […]

27 November, 2012 at 9:04 pm

Sujit Nair

Typo: There should be a $O(\sqrt{x})$ instead of $O(1)$ in the rightmost term in the equation above (8)

[Corrected, thanks – T.]

19 July, 2013 at 2:31 pm

The Riemann hypothesis in various settings | What's new

[…] e.g. this previous blog post for more […]

15 November, 2013 at 3:15 am

Number Theory: Lecture 16 | Theorem of the week

[…] there was no known elementary proof when the first edition was written!). Terry Tao has a nice blog post discussing a number of aspects of the distribution of prime […]

11 December, 2013 at 11:50 am

Mertens’ theorems | What's new

[…] on the Riemann zeta function in the neighbourhood of the pole at , whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line . […]

7 July, 2014 at 2:20 am

The subspace theorem approach to Siegel’s theorem on integral points on curves via nonstandard analysis | What's new

[…] an effective bound may be easily established). This is ultimately due to the reliance on the “dueling conspiracy” (or “repulsion phenomenon”) strategy. We do not as yet have a good way to rule […]

24 September, 2014 at 2:29 pm

Derived multiplicative functions | What's new

[…] a key role in the Erdös-Selberg elementary proof of the prime number theorem (as discussed in this previous blog post), is the top order term of the […]

9 December, 2014 at 6:05 pm

Anonymous

Prof Tao

I was wondering how exactly the summation by parts works to get equation (2) from prime number theorem ? Thanks.

[This is an instructive exercise and I recommend that you try your hand at it. You can take a look at the proof of Theorem 36 of https://terrytao.wordpress.com/2014/11/23/254a-notes-1-elementary-multiplicative-number-theory/ for some related arguments. -T.]

17 June, 2015 at 7:50 am

The prime tuples conjecture, sieve theory, and the work of Goldston-Pintz-Yildirim, Motohashi-Pintz, and Zhang | fragments

[…] annoyingly, the implied constant here is only ineffectively bounded with current methods; see this previous post for further […]

12 November, 2015 at 8:20 am

Klaus Roth | What's new

[…] when the denominators are very different in size. (I refer to this sort of argument as a “dueling conspiracies” argument; they are strangely prevalent throughout analytic number […]

5 December, 2021 at 1:48 pm

Sam Li

Great post. Minor thing: The link to PlanetMath for “Dirichlet hyperbola method” has inevitably rotted.

[Corrected, thanks -T.]

	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on 275A, Notes 3: The weak and st…
	Anonymous on It ought to be common knowledg…
	Ring Theory Intervie… on Reading seminar: “Stable…
	Anonymous on Work hard
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Erratum for “An inverse…
	Anonymous on Erratum for “An inverse…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…
	Anonymous on Infinite partial sumsets in th…

The prime number theorem in arithmetic progressions, and dueling conspiracies

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

35 comments

Leave a comment Cancel reply

For commenters

The prime number theorem in arithmetic progressions, and dueling conspiracies

Share this:

Recent Comments

Articles by others

Diversions

Mathematics

Selected articles

Software

The sciences

Top Posts

Archives

Categories

The Polymath Blog

35 comments

Leave a comment Cancel reply

For commenters