You are currently browsing the tag archive for the ‘prime number theorem’ tag.

A basic object of study in multiplicative number theory are the arithmetic functions: functions {f: {\bf N} \rightarrow {\bf C}} from the natural numbers to the complex numbers. Some fundamental examples of such functions include

Given an arithmetic function {f}, we are often interested in statistics such as the summatory function

\displaystyle \sum_{n \leq x} f(n), \ \ \ \ \ (1)


the logarithmically (or harmonically) weighted summatory function

\displaystyle \sum_{n \leq x} \frac{f(n)}{n}, \ \ \ \ \ (2)


or the Dirichlet series

\displaystyle {\mathcal D}[f](s) := \sum_n \frac{f(n)}{n^s}.

In the latter case, one typically has to first restrict {s} to those complex numbers whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as {\sum_{n \leq x} f(n) f(n+h)}, but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions {f,g: {\bf N} \rightarrow {\bf C}}, forms a new arithmetic function {f*g: {\bf N} \rightarrow {\bf C}}, defined by the formula

\displaystyle f*g(n) := \sum_{d|n} f(d) g(\frac{n}{d}).

Thus for instance {1*1 = d_2}, {1 * \Lambda = L}, {1 * \mu = \delta}, and {\delta * f = f} for any arithmetic function {f}. Dirichlet convolution and Dirichlet series are related by the fundamental formula

\displaystyle {\mathcal D}[f * g](s) = {\mathcal D}[f](s) {\mathcal D}[g](s), \ \ \ \ \ (3)


at least when the real part of {s} is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

\displaystyle {\mathcal D}[Lf](s) = - \frac{d}{ds} {\mathcal D}[f](s), \ \ \ \ \ (4)


at least when the real part of {s} is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function {\zeta = {\mathcal D}[1]}, thus for instance

\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s)

\displaystyle {\mathcal D}[L](s) = - \zeta'(s)

\displaystyle {\mathcal D}[\delta](s) = 1

\displaystyle {\mathcal D}[\mu](s) = \frac{1}{\zeta(s)}

\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)}.

Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers {{\bf N}}, which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval {{\bf N}_\infty := [1,+\infty)}, which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of {{\bf N}} at the infinite place {\infty}, hence the terminology.) Accordingly, let us define a continuous arithmetic function to be a locally integrable function {f: {\bf N}_\infty \rightarrow {\bf C}}. The analogue of the summatory function (1) is then an integral

\displaystyle \int_1^x f(t)\ dt,

and similarly the analogue of (2) is

\displaystyle \int_1^x \frac{f(t)}{t}\ dt.

The analogue of the Dirichlet series is the Mellin-type transform

\displaystyle {\mathcal D}_\infty[f](s) := \int_1^\infty \frac{f(t)}{t^s}\ dt,

which will be well-defined at least if the real part of {s} is large enough and if the continuous arithmetic function {f: {\bf N}_\infty \rightarrow {\bf C}} does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

For instance, the continuous analogue of the discrete constant function {1: {\bf N} \rightarrow {\bf C}} would be the constant function {1_\infty: {\bf N}_\infty \rightarrow {\bf C}}, which maps any {t \in [1,+\infty)} to {1}, and which we will denote by {1_\infty} in order to keep it distinct from {1}. The two functions {1_\infty} and {1} have approximately similar statistics; for instance one has

\displaystyle \sum_{n \leq x} 1 = \lfloor x \rfloor \approx x-1 = \int_1^x 1\ dt


\displaystyle \sum_{n \leq x} \frac{1}{n} = H_{\lfloor x \rfloor} \approx \log x = \int_1^x \frac{1}{t}\ dt

where {H_n} is the {n^{th}} harmonic number, and we are deliberately vague as to what the symbol {\approx} means. Continuing this analogy, we would expect

\displaystyle {\mathcal D}[1](s) = \zeta(s) \approx \frac{1}{s-1} = {\mathcal D}_\infty[1_\infty](s)

which reflects the fact that {\zeta} has a simple pole at {s=1} with residue {1}, and no other poles. Note that the identity {{\mathcal D}_\infty[1_\infty](s) = \frac{1}{s-1}} is initially only valid in the region {\mathrm{Re} s > 1}, but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at {1}, and so one can define {{\mathcal D}_\infty[1_\infty]} in this region also.

In a similar vein, the logarithm function {L: {\bf N} \rightarrow {\bf C}} is approximately similar to the logarithm function {L_\infty: {\bf N}_\infty \rightarrow {\bf C}}, giving for instance the crude form

\displaystyle \sum_{n \leq x} L(n) = \log \lfloor x \rfloor! \approx x \log x - x = \int_1^\infty L_\infty(t)\ dt

of Stirling’s formula, or the Dirichlet series approximation

\displaystyle {\mathcal D}[L](s) = -\zeta'(s) \approx \frac{1}{(s-1)^2} = {\mathcal D}_\infty[L_\infty](s).

The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure {\frac{dt}{t}}: given two continuous arithmetic functions {f_\infty, g_\infty: {\bf N}_\infty \rightarrow {\bf C}}, one can define their convolution {f_\infty *_\infty g_\infty: {\bf N}_\infty \rightarrow {\bf C}} by the formula

\displaystyle f_\infty *_\infty g_\infty(t) := \int_1^t f_\infty(s) g_\infty(\frac{t}{s}) \frac{ds}{s}.

Thus for instance {1_\infty * 1_\infty = L_\infty}. A short computation using Fubini’s theorem shows the analogue

\displaystyle D_\infty[f_\infty *_\infty g_\infty](s) = D_\infty[f_\infty](s) D_\infty[g_\infty](s)

of (3) whenever the real part of {s} is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

\displaystyle D_\infty[L_\infty f_\infty](s) = -\frac{d}{ds} D_\infty[f_\infty](s) \ \ \ \ \ (5)


again assuming that the real part of {s} is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.

Direct calculation shows that for any complex number {\rho}, one has

\displaystyle \frac{1}{s-\rho} = D_\infty[ t \mapsto t^{\rho-1} ](s)

(at least for the real part of {s} large enough), and hence by several applications of (5)

\displaystyle \frac{1}{(s-\rho)^k} = D_\infty[ t \mapsto \frac{1}{(k-1)!} t^{\rho-1} \log^{k-1} t ](s)

for any natural number {k}. This can lead to the following heuristic: if a Dirichlet series {D[f](s)} behaves like a linear combination of poles {\frac{1}{(s-\rho)^k}}, in that

\displaystyle D[f](s) \approx \sum_\rho \frac{c_\rho}{(s-\rho)^{k_\rho}}

for some set {\rho} of poles and some coefficients {c_\rho} and natural numbers {k_\rho} (where we again are vague as to what {\approx} means, and how to interpret the sum {\sum_\rho} if the set of poles is infinite), then one should expect the arithmetic function {f} to behave like the continuous arithmetic function

\displaystyle t \mapsto \sum_\rho \frac{c_\rho}{(k_\rho-1)!} t^{\rho-1} \log^{k_\rho-1} t.

In particular, if we only have simple poles,

\displaystyle D[f](s) \approx \sum_\rho \frac{c_\rho}{s-\rho}

then we expect to have {f} behave like continuous arithmetic function

\displaystyle t \mapsto \sum_\rho c_\rho t^{\rho-1}.

Integrating this from {1} to {x}, this heuristically suggests an approximation

\displaystyle \sum_{n \leq x} f(n) \approx \sum_\rho c_\rho \frac{x^\rho-1}{\rho}

for the summatory function, and similarly

\displaystyle \sum_{n \leq x} \frac{f(n)}{n} \approx \sum_\rho c_\rho \frac{x^{\rho-1}-1}{\rho-1},

with the convention that {\frac{x^\rho-1}{\rho}} is {\log x} when {\rho=0}, and similarly {\frac{x^{\rho-1}-1}{\rho-1}} is {\log x} when {\rho=1}. One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)

For instance, using the more refined approximation

\displaystyle \zeta(s) \approx \frac{1}{s-1} + \gamma

to the zeta function near {s=1}, we have

\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s) \approx \frac{1}{(s-1)^2} + \frac{2 \gamma}{s-1}

we would expect that

\displaystyle d_2 \approx L_\infty + 2 \gamma

and thus for instance

\displaystyle \sum_{n \leq x} d_2(n) \approx x \log x - x + 2 \gamma x

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).

Or, noting that {\zeta(s)} has a simple pole at {s=1} and assuming simple zeroes elsewhere, the log derivative {-\zeta'(s)/\zeta(s)} will have simple poles of residue {+1} at {s=1} and {-1} at all the zeroes, leading to the heuristic

\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho}

suggesting that {\Lambda} should behave like the continuous arithmetic function

\displaystyle t \mapsto 1 - \sum_\rho t^{\rho-1}

leading for instance to the summatory approximation

\displaystyle \sum_{n \leq x} \Lambda(n) \approx x - \sum_\rho \frac{x^\rho-1}{\rho}

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).

Exercise 1 Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also {p}-adic analogues of arithmetic functions to which a similar set of heuristics can be applied, perhaps to study sums such as {\sum_{n \leq x: n = a \hbox{ mod } p^j} f(n)}. A key problem here is that there does not seem to be any good interpretation of the expression {\frac{1}{t^s}} when {s} is complex and {t} is a {p}-adic number, so it is not clear that one can analyse a Dirichlet series {p}-adically. For similar reasons, we don’t have a canonical way to define {\chi(t)} for a Dirichlet character {\chi} (unless its conductor happens to be a power of {p}), so there doesn’t seem to be much to say in the {q}-aspect either.

In analytic number theory, there is a well known analogy between the prime factorisation of a large integer, and the cycle decomposition of a large permutation; this analogy is central to the topic of “anatomy of the integers”, as discussed for instance in this survey article of Granville. Consider for instance the following two parallel lists of facts (stated somewhat informally). Firstly, some facts about the prime factorisation of large integers:

  • Every positive integer {m} has a prime factorisation

    \displaystyle  m = p_1 p_2 \dots p_r

    into (not necessarily distinct) primes {p_1,\dots,p_r}, which is unique up to rearrangement. Taking logarithms, we obtain a partition

    \displaystyle  \log m = \log p_1 + \log p_2 + \dots + \log p_r

    of {\log m}.

  • (Prime number theorem) A randomly selected integer {m} of size {m \sim N} will be prime with probability {\approx \frac{1}{\log N}} when {N} is large.
  • If {m \sim N} is a randomly selected large integer of size {N}, and {p = p_i} is a randomly selected prime factor of {m = p_1 \dots p_r} (with each index {i} being chosen with probability {\frac{\log p_i}{\log m}}), then {\log p_i} is approximately uniformly distributed between {0} and {\log N}. (See Proposition 9 of this previous blog post.)
  • The set of real numbers {\{ \frac{\log p_i}{\log m}: i=1,\dots,r \}} arising from the prime factorisation {m = p_1 \dots p_r} of a large random number {m \sim N} converges (away from the origin, and in a suitable weak sense) to the Poisson-Dirichlet process in the limit {N \rightarrow \infty}. (See the previously mentioned blog post for a definition of the Poisson-Dirichlet process, and a proof of this claim.)

Now for the facts about the cycle decomposition of large permutations:

  • Every permutation {\sigma \in S_n} has a cycle decomposition

    \displaystyle  \sigma = C_1 \dots C_r

    into disjoint cycles {C_1,\dots,C_r}, which is unique up to rearrangement, and where we count each fixed point of {\sigma} as a cycle of length {1}. If {|C_i|} is the length of the cycle {C_i}, we obtain a partition

    \displaystyle  n = |C_1| + \dots + |C_r|

    of {n}.

  • (Prime number theorem for permutations) A randomly selected permutation of {S_n} will be an {n}-cycle with probability exactly {1/n}. (This was noted in this previous blog post.)
  • If {\sigma} is a random permutation in {S_n}, and {C_i} is a randomly selected cycle of {\sigma} (with each {i} being selected with probability {|C_i|/n}), then {|C_i|} is exactly uniformly distributed on {\{1,\dots,n\}}. (See Proposition 8 of this blog post.)
  • The set of real numbers {\{ \frac{|C_i|}{n} \}} arising from the cycle decomposition {\sigma = C_1 \dots C_r} of a random permutation {\sigma \in S_n} converges (in a suitable sense) to the Poisson-Dirichlet process in the limit {n \rightarrow \infty}. (Again, see this previous blog post for details.)

See this previous blog post (or the aforementioned article of Granville, or the Notices article of Arratia, Barbour, and Tavaré) for further exploration of the analogy between prime factorisation of integers and cycle decomposition of permutations.

There is however something unsatisfying about the analogy, in that it is not clear why there should be such a kinship between integer prime factorisation and permutation cycle decomposition. It turns out that the situation is clarified if one uses another fundamental analogy in number theory, namely the analogy between integers and polynomials {P \in {\mathbf F}_q[T]} over a finite field {{\mathbf F}_q}, discussed for instance in this previous post; this is the simplest case of the more general function field analogy between number fields and function fields. Just as we restrict attention to positive integers when talking about prime factorisation, it will be reasonable to restrict attention to monic polynomials {P}. We then have another analogous list of facts, proven very similarly to the corresponding list of facts for the integers:

  • Every monic polynomial {f \in {\mathbf F}_q[T]} has a factorisation

    \displaystyle  f = P_1 \dots P_r

    into irreducible monic polynomials {P_1,\dots,P_r \in {\mathbf F}_q[T]}, which is unique up to rearrangement. Taking degrees, we obtain a partition

    \displaystyle  \hbox{deg} f = \hbox{deg} P_1 + \dots + \hbox{deg} P_r

    of {\hbox{deg} f}.

  • (Prime number theorem for polynomials) A randomly selected monic polynomial {f \in {\mathbf F}_q[T]} of degree {n} will be irreducible with probability {\approx \frac{1}{n}} when {q} is fixed and {n} is large.
  • If {f \in {\mathbf F}_q[T]} is a random monic polynomial of degree {n}, and {P_i} is a random irreducible factor of {f = P_1 \dots P_r} (with each {i} selected with probability {\hbox{deg} P_i / n}), then {\hbox{deg} P_i} is approximately uniformly distributed in {\{1,\dots,n\}} when {q} is fixed and {n} is large.
  • The set of real numbers {\{ \hbox{deg} P_i / n \}} arising from the factorisation {f = P_1 \dots P_r} of a randomly selected polynomial {f \in {\mathbf F}_q[T]} of degree {n} converges (in a suitable sense) to the Poisson-Dirichlet process when {q} is fixed and {n} is large.

The above list of facts addressed the large {n} limit of the polynomial ring {{\mathbf F}_q[T]}, where the order {q} of the field is held fixed, but the degrees of the polynomials go to infinity. This is the limit that is most closely analogous to the integers {{\bf Z}}. However, there is another interesting asymptotic limit of polynomial rings to consider, namely the large {q} limit where it is now the degree {n} that is held fixed, but the order {q} of the field goes to infinity. Actually to simplify the exposition we will use the slightly more restrictive limit where the characteristic {p} of the field goes to infinity (again keeping the degree {n} fixed), although all of the results proven below for the large {p} limit turn out to be true as well in the large {q} limit.

The large {q} (or large {p}) limit is technically a different limit than the large {n} limit, but in practice the asymptotic statistics of the two limits often agree quite closely. For instance, here is the prime number theorem in the large {q} limit:

Theorem 1 (Prime number theorem) The probability that a random monic polynomial {f \in {\mathbf F}_q[T]} of degree {n} is irreducible is {\frac{1}{n}+o(1)} in the limit where {n} is fixed and the characteristic {p} goes to infinity.

Proof: There are {q^n} monic polynomials {f \in {\mathbf F}_q[T]} of degree {n}. If {f} is irreducible, then the {n} zeroes of {f} are distinct and lie in the finite field {{\mathbf F}_{q^n}}, but do not lie in any proper subfield of that field. Conversely, every element {\alpha} of {{\mathbf F}_{q^n}} that does not lie in a proper subfield is the root of a unique monic polynomial in {{\mathbf F}_q[T]} of degree {f} (the minimal polynomial of {\alpha}). Since the union of all the proper subfields of {{\mathbf F}_{q^n}} has size {o(q^n)}, the total number of irreducible polynomials of degree {n} is thus {\frac{q^n - o(q^n)}{n}}, and the claim follows. \Box

Remark 2 The above argument and inclusion-exclusion in fact gives the well known exact formula {\frac{1}{n} \sum_{d|n} \mu(\frac{n}{d}) q^d} for the number of irreducible monic polynomials of degree {n}.

Now we can give a precise connection between the cycle distribution of a random permutation, and (the large {p} limit of) the irreducible factorisation of a polynomial, giving a (somewhat indirect, but still connected) link between permutation cycle decomposition and integer factorisation:

Theorem 3 The partition {\{ \hbox{deg}(P_1), \dots, \hbox{deg}(P_r) \}} of a random monic polynomial {f= P_1 \dots P_r\in {\mathbf F}_q[T]} of degree {n} converges in distribution to the partition {\{ |C_1|, \dots, |C_r|\}} of a random permutation {\sigma = C_1 \dots C_r \in S_n} of length {n}, in the limit where {n} is fixed and the characteristic {p} goes to infinity.

We can quickly prove this theorem as follows. We first need a basic fact:

Lemma 4 (Most polynomials square-free in large {q} limit) A random monic polynomial {f \in {\mathbf F}_q[T]} of degree {n} will be square-free with probability {1-o(1)} when {n} is fixed and {q} (or {p}) goes to infinity. In a similar spirit, two randomly selected monic polynomials {f,g} of degree {n,m} will be coprime with probability {1-o(1)} if {n,m} are fixed and {q} or {p} goes to infinity.

Proof: For any polynomial {g} of degree {m}, the probability that {f} is divisible by {g^2} is at most {1/q^{2m}}. Summing over all polynomials of degree {1 \leq m \leq n/2}, and using the union bound, we see that the probability that {f} is not squarefree is at most {\sum_{1 \leq m \leq n/2} \frac{q^m}{q^{2m}} = o(1)}, giving the first claim. For the second, observe from the first claim (and the fact that {fg} has only a bounded number of factors) that {fg} is squarefree with probability {1-o(1)}, giving the claim. \Box

Now we can prove the theorem. Elementary combinatorics tells us that the probability of a random permutation {\sigma \in S_n} consisting of {c_k} cycles of length {k} for {k=1,\dots,r}, where {c_k} are nonnegative integers with {\sum_{k=1}^r k c_k = n}, is precisely

\displaystyle  \frac{1}{\prod_{k=1}^r c_k! k^{c_k}},

since there are {\prod_{k=1}^r c_k! k^{c_k}} ways to write a given tuple of cycles {C_1,\dots,C_r} in cycle notation in nondecreasing order of length, and {n!} ways to select the labels for the cycle notation. On the other hand, by Theorem 1 (and using Lemma 4 to isolate the small number of cases involving repeated factors) the number of monic polynomials of degree {n} that are the product of {c_k} irreducible polynomials of degree {k} is

\displaystyle  \frac{1}{\prod_{k=1}^r c_k!} \prod_{k=1}^r ( (\frac{1}{k}+o(1)) q^k )^{c_k} + o( q^n )

which simplifies to

\displaystyle  \frac{1+o(1)}{\prod_{k=1}^r c_k! k^{c_k}} q^n,

and the claim follows.

This was a fairly short calculation, but it still doesn’t quite explain why there is such a link between the cycle decomposition {\sigma = C_1 \dots C_r} of permutations and the factorisation {f = P_1 \dots P_r} of a polynomial. One immediate thought might be to try to link the multiplication structure of permutations in {S_n} with the multiplication structure of polynomials; however, these structures are too dissimilar to set up a convincing analogy. For instance, the multiplication law on polynomials is abelian and non-invertible, whilst the multiplication law on {S_n} is (extremely) non-abelian but invertible. Also, the multiplication of a degree {n} and a degree {m} polynomial is a degree {n+m} polynomial, whereas the group multiplication law on permutations does not take a permutation in {S_n} and a permutation in {S_m} and return a permutation in {S_{n+m}}.

I recently found (after some discussions with Ben Green) what I feel to be a satisfying conceptual (as opposed to computational) explanation of this link, which I will place below the fold.

Read the rest of this entry »

In the previous set of notes, we saw how zero-density theorems for the Riemann zeta function, when combined with the zero-free region of Vinogradov and Korobov, could be used to obtain prime number theorems in short intervals. It turns out that a more sophisticated version of this type of argument also works to obtain prime number theorems in arithmetic progressions, in particular establishing the celebrated theorem of Linnik:

Theorem 1 (Linnik’s theorem) Let {a\ (q)} be a primitive residue class. Then {a\ (q)} contains a prime {p} with {p \ll q^{O(1)}}.

In fact it is known that one can find a prime {p} with {p \ll q^{5}}, a result of Xylouris. For sake of comparison, recall from Exercise 65 of Notes 2 that the Siegel-Walfisz theorem gives this theorem with a bound of {p \ll \exp( q^{o(1)} )}, and from Exercise 48 of Notes 2 one can obtain a bound of the form {p \ll \phi(q)^2 \log^2 q} if one assumes the generalised Riemann hypothesis. The probabilistic random models from Supplement 4 suggest that one should in fact be able to take {p \ll q^{1+o(1)}}.

We will not aim to obtain the optimal exponents for Linnik’s theorem here, and follow the treatment in Chapter 18 of Iwaniec and Kowalski. We will in fact establish the following more quantitative result (a special case of a more powerful theorem of Gallagher), which splits into two cases, depending on whether there is an exceptional zero or not:

Theorem 2 (Quantitative Linnik theorem) Let {a\ (q)} be a primitive residue class for some {q \geq 2}. For any {x > 1}, let {\psi(x;q,a)} denote the quantity

\displaystyle  \psi(x;q,a) := \sum_{n \leq x: n=a\ (q)} \Lambda(n).

Assume that {x \geq q^C} for some sufficiently large {C}.

  • (i) (No exceptional zero) If all the real zeroes {\beta} of {L}-functions {L(\cdot,\chi)} of real characters {\chi} of modulus {q} are such that {1-\beta \gg \frac{1}{\log q}}, then

    \displaystyle  \psi(x;q,a) = \frac{x}{\phi(q)} ( 1 + O( \exp( - c \frac{\log x}{\log q} ) ) + O( \frac{\log^2 q}{q} ) )

    for all {x \geq 1} and some absolute constant {c>0}.

  • (ii) (Exceptional zero) If there is a zero {\beta} of an {L}-function {L(\cdot,\chi_1)} of a real character {\chi_1} of modulus {q} with {\beta = 1 - \frac{\varepsilon}{\log q}} for some sufficiently small {\varepsilon>0}, then

    \displaystyle  \psi(x;q,a) = \frac{x}{\phi(q)} ( 1 - \chi_1(a) \frac{x^{\beta-1}}{\beta} \ \ \ \ \ (1)

    \displaystyle + O( \exp( - c \frac{\log x}{\log q} \log \frac{1}{\varepsilon} ) )

    \displaystyle  + O( \frac{\log^2 q}{q} ) )

    for all {x \geq 1} and some absolute constant {c>0}.

The implied constants here are effective.

Note from the Landau-Page theorem (Exercise 54 from Notes 2) that at most one exceptional zero exists (if {\varepsilon} is small enough). A key point here is that the error term {O( \exp( - c \frac{\log x}{\log q} \log \frac{1}{\varepsilon} ) )} in the exceptional zero case is an improvement over the error term when no exceptional zero is present; this compensates for the potential reduction in the main term coming from the {\chi_1(a) \frac{x^{\beta-1}}{\beta}} term. The splitting into cases depending on whether an exceptional zero exists or not turns out to be an essential technique in many advanced results in analytic number theory (though presumably such a splitting will one day become unnecessary, once the possibility of exceptional zeroes are finally eliminated for good).

Exercise 3 Assuming Theorem 2, and assuming {x \geq q^C} for some sufficiently large absolute constant {C}, establish the lower bound

\displaystyle  \psi(x;a,q) \gg \frac{x}{\phi(q)}

when there is no exceptional zero, and

\displaystyle  \psi(x;a,q) \gg \varepsilon \frac{x}{\phi(q)}

when there is an exceptional zero {\beta = 1 - \frac{\varepsilon}{\log q}}. Conclude that Theorem 2 implies Theorem 1, regardless of whether an exceptional zero exists or not.

Remark 4 The Brun-Titchmarsh theorem (Exercise 33 from Notes 4), in the sharp form of Montgomery and Vaughan, gives that

\displaystyle  \pi(x; q, a) \leq 2 \frac{x}{\phi(q) \log (x/q)}

for any primitive residue class {a\ (q)} and any {x \geq q}. This is (barely) consistent with the estimate (1). Any lowering of the coefficient {2} in the Brun-Titchmarsh inequality (with reasonable error terms), in the regime when {x} is a large power of {q}, would then lead to at least some elimination of the exceptional zero case. However, this has not led to any progress on the Landau-Siegel zero problem (and may well be just a reformulation of that problem). (When {x} is a relatively small power of {q}, some improvements to Brun-Titchmarsh are possible that are not in contradiction with the presence of an exceptional zero; see this paper of Maynard for more discussion.)

Theorem 2 is deduced in turn from facts about the distribution of zeroes of {L}-functions. Recall from the truncated explicit formula (Exercise 45(iv) of Notes 2) with (say) {T := q^2} that

\displaystyle  \sum_{n \leq x} \Lambda(n) \chi(n) = - \sum_{\hbox{Re}(\rho) > 3/4; |\hbox{Im}(\rho)| \leq q^2; L(\rho,\chi)=0} \frac{x^\rho}{\rho} + O( \frac{x}{q^2} \log^2 q)

for any non-principal character {\chi} of modulus {q}, where we assume {x \geq q^C} for some large {C}; for the principal character one has the same formula with an additional term of {x} on the right-hand side (as is easily deduced from Theorem 21 of Notes 2). Using the Fourier inversion formula

\displaystyle  1_{n = a\ (q)} = \frac{1}{\phi(q)} \sum_{\chi\ (q)} \overline{\chi(a)} \chi(n)

(see Theorem 69 of Notes 1), we thus have

\displaystyle  \psi(x;a,q) = \frac{x}{\phi(q)} ( 1 - \sum_{\chi\ (q)} \overline{\chi(a)} \sum_{\hbox{Re}(\rho) > 3/4; |\hbox{Im}(\rho)| \leq q^2; L(\rho,\chi)=0} \frac{x^{\rho-1}}{\rho}

\displaystyle  + O( \frac{\log^2 q}{q} ) )

and so it suffices by the triangle inequality (bounding {1/\rho} very crudely by {O(1)}, as the contribution of the low-lying zeroes already turns out to be quite dominant) to show that

\displaystyle  \sum_{\chi\ (q)} \sum_{\sigma > 3/4; |t| \leq q^2; L(\sigma+it,\chi)=0} x^{\sigma-1} \ll \exp( - c \frac{\log x}{\log q} ) \ \ \ \ \ (2)

when no exceptional zero is present, and

\displaystyle  \sum_{\chi\ (q)} \sum_{\sigma > 3/4; |t| \leq q^2; L(\sigma+it,\chi)=0; \sigma+it \neq \beta} x^{\sigma-1} \ll \exp( - c \frac{\log x}{\log q} \log \frac{1}{\varepsilon} ) \ \ \ \ \ (3)

when an exceptional zero is present.

To handle the former case (2), one uses two facts about zeroes. The first is the classical zero-free region (Proposition 51 from Notes 2), which we reproduce in our context here:

Proposition 5 (Classical zero-free region) Let {q, T \geq 2}. Apart from a potential exceptional zero {\beta}, all zeroes {\sigma+it} of {L}-functions {L(\cdot,\chi)} with {\chi} of modulus {q} and {|t| \leq T} are such that

\displaystyle  \sigma \leq 1 - \frac{c}{\log qT}

for some absolute constant {c>0}.

Using this zero-free region, we have

\displaystyle  x^{\sigma-1} \ll \log x \int_{1/2}^{1-c/\log q} 1_{\alpha < \sigma} x^{\alpha-1}\ d\alpha

whenever {\sigma} contributes to the sum in (2), and so the left-hand side of (2) is bounded by

\displaystyle  \ll \log x \int_{1/2}^{1 - c/\log q} N( \alpha, q, q^2 ) x^{\alpha-1}\ d\alpha

where we recall that {N(\alpha,q,T)} is the number of zeroes {\sigma+it} of any {L}-function of a character {\chi} of modulus {q} with {\sigma \geq \alpha} and {0 \leq t \leq T} (here we use conjugation symmetry to make {t} non-negative, accepting a multiplicative factor of two).

In Exercise 25 of Notes 6, the grand density estimate

\displaystyle  N(\alpha,q,T) \ll (qT)^{4(1-\alpha)} \log^{O(1)}(qT) \ \ \ \ \ (4)

is proven. If one inserts this bound into the above expression, one obtains a bound for (2) which is of the form

\displaystyle  \ll (\log^{O(1)} q) \exp( - c \frac{\log x}{\log q} ).

Unfortunately this is off from what we need by a factor of {\log^{O(1)} q} (and would lead to a weak form of Linnik’s theorem in which {p} was bounded by {O( \exp( \log^{O(1)} q ) )} rather than by {q^{O(1)}}). In the analogous problem for prime number theorems in short intervals, we could use the Vinogradov-Korobov zero-free region to compensate for this loss, but that region does not help here for the contribution of the low-lying zeroes with {t = O(1)}, which as mentioned before give the dominant contribution. Fortunately, it is possible to remove this logarithmic loss from the zero-density side of things:

Theorem 6 (Log-free grand density estimate) For any {q, T > 1} and {1/2 \leq \alpha \leq 1}, one has

\displaystyle  N(\alpha,q,T) \ll (qT)^{O(1-\alpha)}.

The implied constants are effective.

We prove this estimate below the fold. The proof follows the methods of the previous section, but one inserts various sieve weights to restrict sums over natural numbers to essentially become sums over “almost primes”, as this turns out to remove the logarithmic losses. (More generally, the trick of restricting to almost primes by inserting suitable sieve weights is quite useful for avoiding any unnecessary losses of logarithmic factors in analytic number theory estimates.)

Exercise 7 Use Theorem 6 to complete the proof of (2).

Now we turn to the case when there is an exceptional zero (3). The argument used to prove (2) applies here also, but does not gain the factor of {\log \frac{1}{\varepsilon}} in the exponent. To achieve this, we need an additional tool, a version of the Deuring-Heilbronn repulsion phenomenon due to Linnik:

Theorem 8 (Deuring-Heilbronn repulsion phenomenon) Suppose {q \geq 2} is such that there is an exceptional zero {\beta = 1 - \frac{\varepsilon}{\log q}} with {\varepsilon} small. Then all other zeroes {\sigma+it} of {L}-functions of modulus {q} are such that

\displaystyle  \sigma \leq 1 - c \frac{\log \frac{1}{\varepsilon}}{\log(q(2+|t|))}.

In other words, the exceptional zero enlarges the classical zero-free region by a factor of {\log \frac{1}{\varepsilon}}. The implied constants are effective.

Exercise 9 Use Theorem 6 and Theorem 8 to complete the proof of (3), and thus Linnik’s theorem.

Exercise 10 Use Theorem 8 to give an alternate proof of (Tatuzawa’s version of) Siegel’s theorem (Theorem 62 of Notes 2). (Hint: if two characters have different moduli, then they can be made to have the same modulus by multiplying by suitable principal characters.)

Theorem 8 is proven by similar methods to that of Theorem 6, the basic idea being to insert a further weight of {1 * \chi_1} (in addition to the sieve weights), the point being that the exceptional zero causes this weight to be quite small on the average. There is a strengthening of Theorem 8 due to Bombieri that is along the lines of Theorem 6, obtaining the improvement

\displaystyle  N'(\alpha,q,T) \ll \varepsilon (1 + \frac{\log T}{\log q}) (qT)^{O(1-\alpha)} \ \ \ \ \ (5)

with effective implied constants for any {1/2 \leq \alpha \leq 1} and {T \geq 1} in the presence of an exceptional zero, where the prime in {N'(\alpha,q,T)} means that the exceptional zero {\beta} is omitted (thus {N'(\alpha,q,T) = N(\alpha,q,T)-1} if {\alpha \leq \beta}). Note that the upper bound on {N'(\alpha,q,T)} falls below one when {\alpha > 1 - c \frac{\log \frac{1}{\varepsilon}}{\log(qT)}} for a sufficiently small {c>0}, thus recovering Theorem 8. Bombieri’s theorem can be established by the methods in this set of notes, and will be given as an exercise to the reader.

Remark 11 There are a number of alternate ways to derive the results in this set of notes, for instance using the Turan power sums method which is based on studying derivatives such as

\displaystyle \frac{L'}{L}(s,\chi)^{(k)} = (-1)^k \sum_n \frac{\Lambda(n) \chi(n) \log^k n}{n^s}

\displaystyle  \approx (-1)^{k+1} k! \sum_\rho \frac{1}{(s-\rho)^{k+1}}

for {\hbox{Re}(s)>1} and large {k}, and performing various sorts of averaging in {k} to attenuate the contribution of many of the zeroes {\rho}. We will not develop this method here, but see for instance Chapter 9 of Montgomery’s book. See the text of Friedlander and Iwaniec for yet another approach based primarily on sieve-theoretic ideas.

Remark 12 When one optimises all the exponents, it turns out that the exponent in Linnik’s theorem is extremely good in the presence of an exceptional zero – indeed Friedlander and Iwaniec showed can even get a bound of the form {p \ll q^{2-c}} for some {c>0}, which is even stronger than one can obtain from GRH! There are other places in which exceptional zeroes can be used to obtain results stronger than what one can obtain even on the Riemann hypothesis; for instance, Heath-Brown used the hypothesis of an infinite sequence of Siegel zeroes to obtain the twin prime conejcture.

Read the rest of this entry »

In the previous set of notes, we studied upper bounds on sums such as {|\sum_{N \leq n \leq N+M} n^{-it}|} for {1 \leq M \leq N} that were valid for all {t} in a given range, such as {[T,2T]}; this led in turn to upper bounds on the Riemann zeta {\zeta(\sigma+it)} for {t} in the same range, and for various choices of {\sigma}. While some improvement over the trivial bound of {O(N)} was obtained by these methods, we did not get close to the conjectural bound of {O( N^{1/2+o(1)})} that one expects from pseudorandomness heuristics (assuming that {T} is not too large compared with {N}, e.g. {T = O(N^{O(1)})}.

However, it turns out that one can get much better bounds if one settles for estimating sums such as {|\sum_{N \leq n \leq N+M} n^{-it}|}, or more generally finite Dirichlet series (also known as Dirichlet polynomials) such as {|\sum_n a_n n^{-it}|}, for most values of {t} in a given range such as {[T,2T]}. Equivalently, we will be able to get some control on the large values of such Dirichlet polynomials, in the sense that we can control the set of {t} for which {|\sum_n a_n n^{-it}|} exceeds a certain threshold, even if we cannot show that this set is empty. These large value theorems are often closely tied with estimates for mean values such as {\frac{1}{T}\int_T^{2T} |\sum_n a_n n^{-it}|^{2k}\ dt} of a Dirichlet series; these latter estimates are thus known as mean value theorems for Dirichlet series. Our approach to these theorems will follow the same sort of methods used in Notes 3, in particular relying on the generalised Bessel inequality from those notes.

Our main application of the large value theorems for Dirichlet polynomials will be to control the number of zeroes of the Riemann zeta function {\zeta(s)} (or the Dirichlet {L}-functions {L(s,\chi)}) in various rectangles of the form {\{ \sigma+it: \sigma \geq \alpha, |t| \leq T \}} for various {T > 1} and {1/2 < \alpha < 1}. These rectangles will be larger than the zero-free regions for which we can exclude zeroes completely, but we will often be able to limit the number of zeroes in such rectangles to be quite small. For instance, we will be able to show the following weak form of the Riemann hypothesis: as {T \rightarrow \infty}, a proportion {1-o(1)} of zeroes of the Riemann zeta function in the critical strip with {|\hbox{Im}(s)| \leq T} will have real part {1/2+o(1)}. Related to this, the number of zeroes with {|\hbox{Im}(s)| \leq T} and {|\hbox{Re}(s)| \geq \alpha} can be shown to be bounded by {O( T^{O(1-\alpha)+o(1)} )} as {T \rightarrow \infty} for any {1/2 < \alpha < 1}.

In the next set of notes we will use refined versions of these theorems to establish Linnik’s theorem on the least prime in an arithmetic progression.

Our presentation here is broadly based on Chapters 9 and 10 in Iwaniec and Kowalski, who give a number of more sophisticated large value theorems than the ones discussed here.

Read the rest of this entry »

The prime number theorem can be expressed as the assertion

\displaystyle  \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (1)

as {x \rightarrow \infty}, where

\displaystyle  \Lambda(n) := \sum_{d|n} \mu(d) \log \frac{n}{d}

is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula

\displaystyle  \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + O(x) \ \ \ \ \ (2)

where the second von Mangoldt function {\Lambda_2} is defined by the formula

\displaystyle  \Lambda_2(n) := \sum_{d|n} \mu(d) \log^2 \frac{n}{d} \ \ \ \ \ (3)

or equivalently

\displaystyle  \Lambda_2(n) = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}). \ \ \ \ \ (4)

(We are avoiding the use of the {*} symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate

\displaystyle  \sum_{n \leq x} \Lambda_2(n) = 2 x \log x + o(x \log x) \ \ \ \ \ (5)


In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space {C_c({\bf R})} of compactly supported continuous functions {f: {\bf R} \rightarrow {\bf C}} equipped with the convolution operation

\displaystyle  f*g(t) := \int_{\bf R} f(u) g(t-u)\ du.

This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula

\displaystyle  \lim_{n \rightarrow \infty} \|f^n\|^{1/n} = \sup_{\lambda \in \hat B} |\lambda(f)| \ \ \ \ \ (6)

for any element {f} of a unital commutative Banach algebra {B}, where {\hat B} is the space of characters (i.e., continuous unital algebra homomorphisms from {B} to {{\bf C}}) of {B}. This formula is due to Gelfand and may be found in any text on Banach algebras; for sake of completeness we prove it below the fold.

The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.

Theorem 1 (Construction of a Banach algebra norm) For any {G \in C_c({\bf R})}, let {\|G\|} denote the quantity

\displaystyle  \|G\| := \limsup_{x \rightarrow \infty} |\sum_n \frac{\Lambda(n)}{n} G( \log \frac{x}{n} ) - \int_{\bf R} G(t)\ dt|.

Then {\| \|} is a seminorm on {C_c({\bf R})} with the bound

\displaystyle  \|G\| \leq \|G\|_{L^1({\bf R})} := \int_{\bf R} |G(t)|\ dt \ \ \ \ \ (7)

for all {G \in C_c({\bf R})}. Furthermore, we have the Banach algebra bound

\displaystyle  \| G * H \| \leq \|G\| \|H\| \ \ \ \ \ (8)

for all {G,H \in C_c({\bf R})}.

We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that {C_c({\bf R})} contains a plentiful supply of local units:

Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum) Let {\| \|} be a seminorm on {C_c({\bf R})} obeying (7), (8). Suppose that {\| \|} is not identically zero. Then there exists {\xi \in {\bf R}} such that

\displaystyle  |\int_{\bf R} G(t) e^{-it\xi}\ dt| \leq \|G\|

for all {G \in C_c}. In particular, by (7), one has

\displaystyle  \|G\| = \| G \|_{L^1({\bf R})}

whenever {G(t) e^{-it\xi}} is a non-negative function.

The second is a consequence of the Selberg symmetry formula and the fact that {\Lambda} is real (as well as Mertens’ theorem, in the {\xi=0} case), and is closely related to the non-vanishing of the Riemann zeta function {\zeta} on the line {\{ 1+i\xi: \xi \in {\bf R}\}}:

Theorem 3 (Breaking the parity barrier) Let {\xi \in {\bf R}}. Then there exists {G \in C_c({\bf R})} such that {G(t) e^{-it\xi}} is non-negative, and

\displaystyle  \|G\| < \|G\|_{L^1({\bf R})}.

Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm {\| \|} constructed in Theorem 1 is trivial, and thus

\displaystyle  \sum_n \frac{\Lambda(n)}{n} G( \log \frac{x}{n} ) = \int_{\bf R} G(t)\ dt + o(1)

as {x \rightarrow \infty} for any Schwartz function {G} (the decay rate in {o(1)} may depend on {G}). Specialising to functions of the form {G(t) = e^{-t} \eta( e^{-t} )} for some smooth compactly supported {\eta} on {(0,+\infty)}, we conclude that

\displaystyle  \sum_n \Lambda(n) \eta(\frac{n}{x}) = \int_{\bf R} \eta(u)\ du + o(x)

as {x \rightarrow \infty}; by the smooth Urysohn lemma this implies that

\displaystyle  \sum_{\varepsilon x \leq n \leq x} \Lambda(n) = x - \varepsilon x + o(x)

as {x \rightarrow \infty} for any fixed {\varepsilon>0}, and the prime number theorem then follows by a telescoping series argument.

The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that

\displaystyle  \sum_{n \leq x} \Lambda(n) \chi(n) = o(x)

for any fixed Dirichlet character {\chi}; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity {L(1,\chi) = \sum_n \frac{\chi(n)}{n}} is non-vanishing.

Read the rest of this entry »

One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations {f_3(x)} of an integer {x} as the sum of three primes {x = p_1+p_2+p_3}, as a Fourier-analytic integral over the unit circle {{\bf R}/{\bf Z}} involving exponential sums such as

\displaystyle  S(x,\alpha) := \sum_{p \leq x} e( \alpha p) \ \ \ \ \ (1)

where the sum here ranges over all primes up to {x}, and {e(x) := e^{2\pi i x}}. For instance, the expression {f(x)} mentioned earlier can be written as

\displaystyle  f_3(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha. \ \ \ \ \ (2)

The strategy is then to obtain sufficiently accurate bounds on exponential sums such as {S(x,\alpha)} in order to obtain non-trivial bounds on quantities such as {f_3(x)}. For instance, if one can show that {f_3(x)>0} for all odd integers {x} greater than some given threshold {x_0}, this implies that all odd integers greater than {x_0} are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.

Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff {p \leq x} with a smoother cutoff {\chi(p/x)} for a suitable choice of cutoff function {\chi}, or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function {\Lambda(n)}. However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.

In many cases, it turns out that one can get fairly precise evaluations on sums such as {S(x,\alpha)} in the major arc case, when {\alpha} is close to a rational number {a/q} with small denominator {q}, by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that

\displaystyle  S(x,0) \approx \frac{x}{\log x}

and the prime number theorem in residue classes modulo {q} suggests more generally that

\displaystyle  S(x,\frac{a}{q}) \approx \frac{\mu(q)}{\phi(q)} \frac{x}{\log x}

when {q} is small and {a} is close to {q}, basically thanks to the elementary calculation that the phase {e(an/q)} has an average value of {\mu(q)/\phi(q)} when {n} is uniformly distributed amongst the residue classes modulo {q} that are coprime to {q}. Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.

In the minor arc case when {\alpha} is not close to a rational {a/q} with small denominator, one no longer expects to have such precise control on the value of {S(x,\alpha)}, due to the “pseudorandom” fluctuations of the quantity {e(\alpha p)}. Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of {k} “pseudorandom” phases should fluctuate randomly and be of typical magnitude {\sim \sqrt{k}}, one expects upper bounds of the shape

\displaystyle  |S(x,\alpha)| \lessapprox \sqrt{\frac{x}{\log x}} \ \ \ \ \ (3)

for “typical” minor arc {\alpha}. Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that

\displaystyle  \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2\ d\alpha \sim \frac{x}{\log x} \ \ \ \ \ (4)

which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds such as {x^{4/5+o(1)}} are far more typical.

Because one only expects to have upper bounds on {|S(x,\alpha)|}, rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as {e(-x\alpha)} for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of {x}, so that averaging in {x} is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude {|S(x,\alpha)|^3} oscillates in sympathetic resonance with the phase {e(-x\alpha)}, thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region {\Omega_{minor}}:

\displaystyle  |\int_{\Omega_{minor}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha| \leq \int_{\Omega_{minor}} |S(x,\alpha)|^3\ d\alpha.

Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as {f(x)}, at least when {x} is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer {x} is the sum of three primes; my own result that all odd numbers greater than {1} can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis ). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of {x}, such as {x \sim 10^{50}}. (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)

However, I (and many other analytic number theorists) are considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number {x} as the sum {x = p_1 + p_2} of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations {2 = p_1 - p_2} of {2} as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large {x}, one has to find a non-trivial lower bound for the quantity

\displaystyle  f_2(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^2 e(-x\alpha)\ d\alpha \ \ \ \ \ (5)

for sufficiently large {x}, as this quantity {f_2(x)} is also the number of ways to represent {x} as the sum {x=p_1+p_2} of two primes {p_1,p_2}. Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity

\displaystyle  \tilde f_2(x) = \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2 e(-2\alpha)\ d\alpha \ \ \ \ \ (6)

that goes to infinity as {x \rightarrow \infty}, as this quantity {\tilde f_2(x)} is also the number of ways to represent {2} as the difference {2 = p_1-p_2} of two primes less than or equal to {x}.

In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums {S(x,\alpha)}. Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity {f_2(x)} or {\tilde f_2(x)}, expresses it in terms of {S(x,\alpha)} using (5) or (6), and then uses (5) or (6) again to convert these integrals back into a the combinatorial expression of counting solutions to {x=p_1+p_2} or {2=p_1-p_2}, and then uses the hypothetical solution to the given problem to obtain the required lower bounds on {f_2(x)} or {\tilde f_2(x)}.

Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on {f_2(x)} or {\tilde f_2(x)} (or similar quantities) purely from the upper and lower bounds on {S(x,\alpha)} or similar quantities (and of various {L^p} type norms on such quantities, such as the {L^2} bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds in {S(x,\alpha)} are yet (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:

  • (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
  • (ii) Upper and lower bounds on the magnitude of {S(x,\alpha)} are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of {O(1/\log x)} or better); but
  • (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.

I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.

In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which lets one make very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artifically insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.

Read the rest of this entry »

A fundamental problem in analytic number theory is to understand the distribution of the prime numbers {\{2,3,5,\ldots\}}. For technical reasons, it is convenient not to study the primes directly, but a proxy for the primes known as the von Mangoldt function {\Lambda: {\mathbb N} \rightarrow {\mathbb R}}, defined by setting {\Lambda(n)} to equal {\log p} when {n} is a prime {p} (or a power of that prime) and zero otherwise. The basic reason why the von Mangoldt function is useful is that it encodes the fundamental theorem of arithmetic (which in turn can be viewed as the defining property of the primes) very neatly via the identity

\displaystyle  \log n = \sum_{d|n} \Lambda(d) \ \ \ \ \ (1)

for every natural number {n}.

The most important result in this subject is the prime number theorem, which asserts that the number of prime numbers less than a large number {x} is equal to {(1+o(1)) \frac{x}{\log x}}:

\displaystyle  \sum_{p \leq x} 1 = (1+o(1)) \frac{x}{\log x}.

Here, of course, {o(1)} denotes a quantity that goes to zero as {x \rightarrow \infty}.

It is not hard to see (e.g. by summation by parts) that this is equivalent to the asymptotic

\displaystyle  \sum_{n \leq x} \Lambda(n) = (1+o(1)) x \ \ \ \ \ (2)

for the von Mangoldt function (the key point being that the squares, cubes, etc. of primes give a negligible contribution, so {\sum_{n \leq x} \Lambda(n)} is essentially the same quantity as {\sum_{p \leq x} \log p}). Understanding the nature of the {o(1)} term is a very important problem, with the conjectured optimal decay rate of {O(\sqrt{x} \log x)} being equivalent to the Riemann hypothesis, but this will not be our concern here.

The prime number theorem has several important generalisations (for instance, there are analogues for other number fields such as the Chebotarev density theorem). One of the more elementary such generalisations is the prime number theorem in arithmetic progressions, which asserts that for fixed {a} and {q} with {a} coprime to {q} (thus {(a,q)=1}), the number of primes less than {x} equal to {a} mod {q} is equal to {(1+o_q(1)) \frac{1}{\phi(q)} \frac{x}{\log x}}, where {\phi(q) := \# \{ 1 \leq a \leq q: (a,q)=1 \}} is the Euler totient function:

\displaystyle  \sum_{p \leq x: p = a \hbox{ mod } q} 1 = (1+o_q(1)) \frac{1}{\phi(q)} \frac{x}{\log x}.

(Of course, if {a} is not coprime to {q}, the number of primes less than {x} equal to {a} mod {q} is {O(1)}. The subscript {q} in the {o()} and {O()} notation denotes that the implied constants in that notation is allowed to depend on {q}.) This is a more quantitative version of Dirichlet’s theorem, which asserts the weaker statement that the number of primes equal to {a} mod {q} is infinite. This theorem is important in many applications in analytic number theory, for instance in Vinogradov’s theorem that every sufficiently large odd number is the sum of three odd primes. (Imagine for instance if almost all of the primes were clustered in the residue class {2} mod {3}, rather than {1} mod {3}. Then almost all sums of three odd primes would be divisible by {3}, leaving dangerously few sums left to cover the remaining two residue classes. Similarly for other moduli than {3}. This does not fully rule out the possibility that Vinogradov’s theorem could still be true, but it does indicate why the prime number theorem in arithmetic progressions is a relevant tool in the proof of that theorem.)

As before, one can rewrite the prime number theorem in arithmetic progressions in terms of the von Mangoldt function as the equivalent form

\displaystyle  \sum_{n \leq x: n = a \hbox{ mod } q} \Lambda(n) = (1+o_q(1)) \frac{1}{\phi(q)} x.

Philosophically, one of the main reasons why it is so hard to control the distribution of the primes is that we do not currently have too many tools with which one can rule out “conspiracies” between the primes, in which the primes (or the von Mangoldt function) decide to correlate with some structured object (and in particular, with a totally multiplicative function) which then visibly distorts the distribution of the primes. For instance, one could imagine a scenario in which the probability that a randomly chosen large integer {n} is prime is not asymptotic to {\frac{1}{\log n}} (as is given by the prime number theorem), but instead to fluctuate depending on the phase of the complex number {n^{it}} for some fixed real number {t}, thus for instance the probability might be significantly less than {1/\log n} when {t \log n} is close to an integer, and significantly more than {1/\log n} when {t \log n} is close to a half-integer. This would contradict the prime number theorem, and so this scenario would have to be somehow eradicated in the course of proving that theorem. In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the Riemann zeta function at {1+it}.

In the above scenario, the primality of a large integer {n} was somehow sensitive to asymptotic or “Archimedean” information about {n}, namely the approximate value of its logarithm. In modern terminology, this information reflects the local behaviour of {n} at the infinite place {\infty}. There are also potential consipracies in which the primality of {n} is sensitive to the local behaviour of {n} at finite places, and in particular to the residue class of {n} mod {q} for some fixed modulus {q}. For instance, given a Dirichlet character {\chi: {\mathbb Z} \rightarrow {\mathbb C}} of modulus {q}, i.e. a completely multiplicative function on the integers which is periodic of period {q} (and vanishes on those integers not coprime to {q}), one could imagine a scenario in which the probability that a randomly chosen large integer {n} is prime is large when {\chi(n)} is close to {+1}, and small when {\chi(n)} is close to {-1}, which would contradict the prime number theorem in arithmetic progressions. (Note the similarity between this scenario at {q} and the previous scenario at {\infty}; in particular, observe that the functions {n \rightarrow \chi(n)} and {n \rightarrow n^{it}} are both totally multiplicative.) In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the {L}-function of {\chi} at {1}.

An especially difficult scenario to eliminate is that of real characters, such as the Kronecker symbol {\chi(n) = \left( \frac{n}{q} \right)}, in which numbers {n} which are quadratic nonresidues mod {q} are very likely to be prime, and quadratic residues mod {q} are unlikely to be prime. Indeed, there is a scenario of this form – the Siegel zero scenario – which we are still not able to eradicate (without assuming powerful conjectures such as GRH), though fortunately Siegel zeroes are not quite strong enough to destroy the prime number theorem in arithmetic progressions.

It is difficult to prove that no conspiracy between the primes exist. However, it is not entirely impossible, because we have been able to exploit two important phenomena. The first is that there is often a “all or nothing dichotomy” (somewhat resembling the zero-one laws in probability) regarding conspiracies: in the asymptotic limit, the primes can either conspire totally (or more precisely, anti-conspire totally) with a multiplicative function, or fail to conspire at all, but there is no middle ground. (In the language of Dirichlet series, this is reflected in the fact that zeroes of a meromorphic function can have order {1}, or order {0} (i.e. are not zeroes after all), but cannot have an intermediate order between {0} and {1}.) As a corollary of this fact, the prime numbers cannot conspire with two distinct multiplicative functions at once (by having a partial correlation with one and another partial correlation with another); thus one can use the existence of one conspiracy to exclude all the others. In other words, there is at most one conspiracy that can significantly distort the distribution of the primes. Unfortunately, this argument is ineffective, because it doesn’t give any control at all on what that conspiracy is, or even if it exists in the first place!

But now one can use the second important phenomenon, which is that because of symmetries, one type of conspiracy can lead to another. For instance, because the von Mangoldt function is real-valued rather than complex-valued, we have conjugation symmetry; if the primes correlate with, say, {n^{it}}, then they must also correlate with {n^{-it}}. (In the language of Dirichlet series, this reflects the fact that the zeta function and {L}-functions enjoy symmetries with respect to reflection across the real axis (i.e. complex conjugation).) Combining this observation with the all-or-nothing dichotomy, we conclude that the primes cannot correlate with {n^{it}} for any non-zero {t}, which in fact leads directly to the prime number theorem (2), as we shall discuss below. Similarly, if the primes correlated with a Dirichlet character {\chi(n)}, then they would also correlate with the conjugate {\overline{\chi}(n)}, which also is inconsistent with the all-or-nothing dichotomy, except in the exceptional case when {\chi} is real – which essentially means that {\chi} is a quadratic character. In this one case (which is the only scenario which comes close to threatening the truth of the prime number theorem in arithmetic progressions), the above tricks fail and one has to instead exploit the algebraic number theory properties of these characters instead, which has so far led to weaker results than in the non-real case.

As mentioned previously in passing, these phenomena are usually presented using the language of Dirichlet series and complex analysis. This is a very slick and powerful way to do things, but I would like here to present the elementary approach to the same topics, which is slightly weaker but which I find to also be very instructive. (However, I will not be too dogmatic about keeping things elementary, if this comes at the expense of obscuring the key ideas; in particular, I will rely on multiplicative Fourier analysis (both at {\infty} and at finite places) as a substitute for complex analysis in order to expedite various parts of the argument. Also, the emphasis here will be more on heuristics and intuition than on rigour.)

The material here is closely related to the theory of pretentious characters developed by Granville and Soundararajan, as well as an earlier paper of Granville on elementary proofs of the prime number theorem in arithmetic progressions.

Read the rest of this entry »

Atle Selberg, who made immense and fundamental contributions to analytic number theory and related areas of mathematics, died last Monday, aged 90.

Selberg’s early work was focused on the study of the Riemann zeta function \zeta(s). In 1942, Selberg showed that a positive fraction of the zeroes of this function lie on the critical line \hbox{Re}(s)=1/2. Apart from improvements in the fraction (the best value currently being a little over 40%, a result of Conrey), this is still one of the strongest partial results we have towards the Riemann hypothesis. (I discuss Selberg’s result, and the method of mollifiers he introduced there, in a little more detail after the jump.)

In working on the zeta function, Selberg developed two powerful tools which are still used routinely in analytic number theory today. The first is the method of mollifiers to smooth out the magnitude oscillations of the zeta function, making the (more interesting) phase oscillation more visible. The second was the method of the Selberg \Lambda^2 sieve, which is a particularly elegant choice of sieve which allows one to count patterns in almost primes (and hence to upper bound patterns in primes) quite accurately. Variants of the Selberg sieve were a crucial ingredient in, for instance, the recent work of Goldston-Yıldırım-Pintz on prime gaps, as well as the work of Ben Green and myself on arithmetic progressions in primes. (I discuss the Selberg sieve, as well as the Selberg symmetry formula below, in my post on the parity problem. Incidentally, Selberg was the first to formalise this problem as a significant obstruction in sieve theory.)

For all of these achievements, Selberg was awarded the Fields Medal in 1950. Around that time, Selberg and Erdős also produced the first elementary proof of the prime number theorem. A key ingredient here was the Selberg symmetry formula, which is an elementary analogue of the prime number theorem for almost primes.

But perhaps Selberg’s greatest contribution to mathematics was his discovery of the Selberg trace formula, which is a non-abelian generalisation of the Poisson summation formula, and which led to many further deep connections between representation theory and number theory, and in particular being one of the main inspirations for the Langlands program, which in turn has had an impact on many different parts of mathematics (for instance, it plays a role in Wiles’ proof of Fermat’s last theorem). For an introduction to the trace formula, its history, and its impact, I recommend the survey article of Arthur.

Other major contributions of Selberg include the Rankin-Selberg theory connecting Artin L-functions from representation theory to the integrals of automorphic forms (very much in the spirit of the Langlands program), and the Chowla-Selberg formula relating the Gamma function at rational values to the periods of elliptic curves with complex multiplication. He also made an influential conjecture on the spectral gap of the Laplacian on quotients of SL(2,{\Bbb R}) by congruence groups, which is still open today (Selberg had the first non-trivial partial result). As an example of this conjecture’s impact, Selberg’s eigenvalue conjecture has inspired some recent work of Sarnak-Xue, Gamburd, and Bourgain-Gamburd on new constructions of expander graphs, and has revealed some further connections between number theory and arithmetic combinatorics (such as sum-product theorems); see this announcement of Bourgain-Gamburd-Sarnak for the most recent developments (this work, incidentally, also employs the Selberg sieve). As observed by Satake, Selberg’s eigenvalue conjecture and the more classical Ramanujan-Petersson conjecture can be unified into a single conjecture, now known as the Ramanujan-Selberg conjecture; the eigenvalue conjecture is then essentially an archimedean (or “non-dyadic“) special case of the general Ramanujan-Selberg conjecture. (The original (dyadic) Ramanujan-Petersson conjecture was finally proved by Deligne-Serre, after many important contributions by other authors, but the non-dyadic version remains open.)

Read the rest of this entry »