You are currently browsing the tag archive for the ‘Siegel zero’ tag.

The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function {\mu} is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements are true:

  • (Twin prime conjecture) There are infinitely many primes {p} such that {p+2} is also prime.
  • (No Siegel zeroes) There exists a constant {c>0} such that for every real Dirichlet character {\chi} of conductor {q > 1}, the associated Dirichlet {L}-function {s \mapsto L(s,\chi)} has no zeroes in the interval {[1-\frac{c}{\log q}, 1]}.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and to date proven to be impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

\displaystyle \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)


for some large value of {x}, where {\Lambda} is the von Mangoldt function. Actually, in this post we will work with the slight variant

\displaystyle \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))


\displaystyle \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}

is the second von Mangoldt function, and {*} denotes Dirichlet convolution, and {\nu} is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval {x \leq n \leq 2x} and remove very small primes from {n}, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve {\nu} is essentially replaced by the more combinatorial restriction {1_{(n(n+2),q^{1/C}\#)=1}} for some large {C}, where {q^{1/C}\#} is the primorial of {q^{1/C}}, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)

If there is a Siegel zero {L(\beta,\chi)=0} with {\beta} close to {1} and {\chi} a Dirichlet character of conductor {q}, then multiplicative number theory methods can be used to show that the Möbius function {\mu} “pretends” to be like the character {\chi} in the sense that {\mu(p) \approx \chi(p)} for “most” primes {p} near {q} (e.g. in the range {q^\varepsilon \leq p \leq q^C} for some small {\varepsilon>0} and large {C>0}). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that {\mu} pretends to be like {\chi} can be used to construct a tractable approximation (after inserting the sieve weight {\nu}) in the range {[x,2x]} (where {x = q^C} for some large {C}) for the second von Mangoldt function {\Lambda_2}, namely the function

\displaystyle \tilde \Lambda_2(n) := (\chi * L)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.

Roughly speaking, we think of the periodic function {\chi} and the slowly varying function {\log^2} as being of about the same “complexity” as the constant function {1}, so that {\tilde \Lambda_2} is roughly of the same “complexity” as the divisor function

\displaystyle \tau(n) := (1*1)(n) = \sum_{d|n} 1,

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate {\sum_{x \leq n \leq 2x} \tau(n)} to accuracy {O(\sqrt{x})} with little difficulty, whereas to obtain a comparable level of accuracy for {\sum_{x \leq n \leq 2x} \Lambda(n)} or {\sum_{x \leq n \leq 2x} \Lambda_2(n)} is essentially the Riemann hypothesis.)

One expects {\tilde \Lambda_2(n)} to be a good approximant to {\Lambda_2(n)} if {n} is of size {O(x)} and has no prime factors less than {q^{1/C}} for some large constant {C}. The Selberg sieve {\nu} will be mostly supported on numbers with no prime factor less than {q^{1/C}}. As such, one can hope to approximate (1) by the expression

\displaystyle \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)


as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

\displaystyle \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)


Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation {ab+2=cd} with {a,b,c,d} in various ranges; this is clearly related to understanding the equidistribution of the hyperbola {\{ (a,b) \in {\bf Z}/d{\bf Z}: ab + 2 = 0 \hbox{ mod } d \}} in {({\bf Z}/d{\bf Z})^2}. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )

where {\overline{m}} denotes the inverse of {m} in {({\bf Z}/r{\bf Z})^\times}. One can then use the Weil bound

\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)


where {(a,b,r)} is the greatest common divisor of {a,b,r} (with the convention that this is equal to {r} if {a,b} vanish), and the {o(1)} decays to zero as {r \rightarrow \infty}. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that {x=q^C} with {C} large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of {r} will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)


whenever {r \geq 1} and {a,b} are coprime to {r}, where the {o(1)} is with respect to the limit {r \rightarrow \infty} (and is uniform in {a,b}).

Proof: Observe from change of variables that the Kloosterman sum {\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )} is unchanged if one replaces {(a,b)} with {(\lambda a, \lambda^{-1} b)} for {\lambda \in ({\bf Z}/d{\bf Z})^\times}. For fixed {a,b}, the number of such pairs {(\lambda a, \lambda^{-1} b)} is at least {r^{1-o(1)} / (a,b,r)}, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

\displaystyle \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll d^{4+o(1)}.

The left-hand side can be rearranged as

\displaystyle \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/d{\bf Z}}

\displaystyle e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)

which by Fourier summation is equal to

\displaystyle d^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:

\displaystyle m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.

Observe from the quadratic formula and the divisor bound that each pair {(x,y)\in ({\bf Z}/r{\bf Z})^2} has at most {O(r^{o(1)})} solutions {(m_1,m_2)} to the system of equations {m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}. Hence the number of quadruples {(m_1,m_2,m_3,m_4)} of the desired form is {r^{2+o(1)}}, and the claim follows. \Box

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let {\chi} be a primitive real Dirichlet character of conductor {q}, and let {a,b,c,d \in{\bf Z}/q{\bf Z}}. Then

\displaystyle \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).

Proof: As {q} is the conductor of a primitive real Dirichlet character, {q} is equal to {2^j} times a squarefree odd number for some {j \leq 3}. By the Chinese remainder theorem, it thus suffices to establish the claim when {q} is an odd prime. We may assume that {ad-bc} is not divisible by this prime {q}, as the claim is trivial otherwise. If {a} vanishes then {c} does not vanish, and the claim follows from the mean zero nature of {\chi}; similarly if {c} vanishes. Hence we may assume that {a,c} do not vanish, and then we can normalise them to equal {1}. By completing the square it now suffices to show that

\displaystyle \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1

whenever {b \neq 0 \hbox{ mod } p}. As {\chi} is {+1} on the quadratic residues and {-1} on the non-residues, it now suffices to show that

\displaystyle \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).

But by making the change of variables {(x,y) = (n+m,n-m)}, the left-hand side becomes {\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}, and the claim follows. \Box

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function {\Lambda_2(n(n+2))} in place of {\Lambda(n) \Lambda(n+2)}. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

Read the rest of this entry »

In Notes 1, we approached multiplicative number theory (the study of multiplicative functions {f: {\bf N} \rightarrow {\bf C}} and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions {\sum_{n \leq x} f(n)} and logarithmic sums {\sum_{n \leq x} \frac{f(n)}{n}}. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series {{\mathcal D} f}, defined (at least for {s} of sufficiently large real part) by the formula

\displaystyle  {\mathcal D} f(s) := \sum_n \frac{f(n)}{n^s}.

These series also made an appearance in the elementary approach to the subject, but only for real {s} that were larger than {1}. But now we will exploit the freedom to extend the variable {s} to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as {\sum_{n\leq x} f(n)} or {\sum_{n\leq x} \frac{f(n)}{n}} from control on the Dirichlet series. Crucially, for many key functions {f} of number-theoretic interest, the Dirichlet series {{\mathcal D} f} can be analytically (or at least meromorphically) continued to the left of the line {\{ s: \hbox{Re}(s) = 1 \}}. The zeroes and poles of the resulting meromorphic continuations of {{\mathcal D} f} (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of {f}; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function {\zeta} are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function {\Lambda} (and hence to the primes) of the form

\displaystyle  \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (1)

where the sum is over zeroes {\rho} (counting multiplicity) of the Riemann zeta function {\zeta = {\mathcal D} 1} (with the sum often restricted so that {\rho} has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that

\displaystyle  \sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\ dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\ dy \ \ \ \ \ (2)

for suitable “test functions” {g} (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem

\displaystyle  \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (3)

as {x \rightarrow \infty}, with the size of the error term {o(x)} closely tied to the location of the zeroes {\rho} of the Riemann zeta function.

The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation

\displaystyle  -\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \ \ \ \ \ (4)

for the Dirichlet series {{\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}} of the von Mangoldt function; note that (4) is formally the special case of (2) when {g(n) = n^{-s}}. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.

More generally, one has an explicit formula

\displaystyle  \Lambda(n) \chi(n) \approx - \sum_\rho n^{\rho-1} \ \ \ \ \ (5)

for any (non-principal) Dirichlet character {\chi}, where {\rho} now ranges over the zeroes of the associated Dirichlet {L}-function {L(s,\chi) := {\mathcal D} \chi(s)}; we view this formula as a “twist” of (1) by the Dirichlet character {\chi}. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that

\displaystyle  \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \ \ \ \ \ (6)

as {x \rightarrow \infty}, whenever {a\ (q)} is a fixed primitive residue class. Again, the size of the error term {o(x)} here is closely tied to the location of the zeroes of the Dirichlet {L}-function, with particular importance given to whether there is a zero very close to {s=1} (such a zero is known as an exceptional zero or Siegel zero).

While any information on the behaviour of zeta functions or {L}-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are

  1. The region on or near the point {s=1}.
  2. The region on or near the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip {\{ s: 0 \leq \hbox{Re}(s) \leq 1 \}}.
  3. The right half {\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}} of the critical strip.
  4. The region on or near the critical line {\{ \frac{1}{2} + it: t \in {\bf R} \}} that bisects the critical strip.
  5. Everywhere else.

For instance:

  1. We will shortly show that the Riemann zeta function {\zeta} has a simple pole at {s=1} with residue {1}, which is already sufficient to recover much of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function {\tau}. For Dirichlet {L}-functions, the behaviour is instead controlled by the quantity {L(1,\chi)} discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
  2. The zeta function is also known to have no zeroes on the right edge {\{1+it: t \in {\bf R}\}} of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for {\zeta} into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for {L}-functions and the prime number theorem in arithmetic progressions.
  3. The (as yet unproven) Riemann hypothesis prohibits {\zeta} from having any zeroes within the right half {\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}} of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for {L}-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
  4. Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for {L}-functions and primes in short arithmetic progressions.
  5. The functional equation of the zeta function describes the behaviour of {\zeta} to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for {L}-functions.

Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function {\zeta(\sigma+it) = \sum_n n^{-\sigma-it}} and all of the {L}-functions {L(\sigma+it,\chi) = \sum_n \chi(n) n^{-\sigma-it}} for various Dirichlet characters {\chi} into a single object, viewing {n \mapsto \chi(n) n^{-it}} as a general multiplicative character on the adeles; thus the imaginary coordinate {t} and the Dirichlet character {\chi} are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet {L}-functions. (The non-Archimedean character {\chi(n)} and the Archimedean character {n^{it}} behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)

Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.

As a consequence of this hierarchy of importance, information about the {\zeta} function away from the critical strip, such as Euler’s identity

\displaystyle  \zeta(2) = \frac{\pi^2}{6}

or equivalently

\displaystyle  1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}

or the infamous identity

\displaystyle  \zeta(-1) = -\frac{1}{12},

which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as

\displaystyle  1 + 2 + 3 + \dots = -\frac{1}{12},

are of relatively little direct importance in analytic prime number theory, although they are still of interest for some other, non-number-theoretic, applications. (The quantity {\zeta(2)} does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value {L(1,\chi)} of an {L}-function at {s=1} turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower-bound on this quantity coming from Siegel’s theorem, discussed below the fold.

For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory“.

Read the rest of this entry »