You are currently browsing the monthly archive for November 2014.

Analytic number theory is only one of many different approaches to number theory. Another important branch of the subject is algebraic number theory, which studies algebraic structures (e.g. groups, rings, and fields) of number-theoretic interest. With this perspective, the classical field of rationals ${{\bf Q}}$, and the classical ring of integers ${{\bf Z}}$, are placed inside the much larger field ${\overline{{\bf Q}}}$ of algebraic numbers, and the much larger ring ${{\mathcal A}}$ of algebraic integers, respectively. Recall that an algebraic number is a root of a polynomial with integer coefficients, and an algebraic integer is a root of a monic polynomial with integer coefficients; thus for instance ${\sqrt{2}}$ is an algebraic integer (a root of ${x^2-2}$), while ${\sqrt{2}/2}$ is merely an algebraic number (a root of ${4x^2-2}$). For the purposes of this post, we will adopt the concrete (but somewhat artificial) perspective of viewing algebraic numbers and integers as lying inside the complex numbers ${{\bf C}}$, thus ${{\mathcal A} \subset \overline{{\bf Q}} \subset {\bf C}}$. (From a modern algebraic perspective, it is better to think of ${\overline{{\bf Q}}}$ as existing as an abstract field separate from ${{\bf C}}$, but which has a number of embeddings into ${{\bf C}}$ (as well as into other fields, such as the completed p-adics ${{\bf C}_p}$), no one of which should be considered favoured over any other; cf. this mathOverflow post. But for the rudimentary algebraic number theory in this post, we will not need to work at this level of abstraction.) In particular, we identify the algebraic integer ${\sqrt{-d}}$ with the complex number ${\sqrt{d} i}$ for any natural number ${d}$.

Exercise 1 Show that the field of algebraic numbers ${\overline{{\bf Q}}}$ is indeed a field, and that the ring of algebraic integers ${{\mathcal A}}$ is indeed a ring, and is in fact an integral domain. Also, show that ${{\bf Z} = {\mathcal A} \cap {\bf Q}}$, that is to say the ordinary integers are precisely the algebraic integers that are also rational. Because of this, we will sometimes refer to elements of ${{\bf Z}}$ as rational integers.

In practice, the field ${\overline{{\bf Q}}}$ is too big to conveniently work with directly, having infinite dimension (as a vector space) over ${{\bf Q}}$. Thus, algebraic number theory generally restricts attention to intermediate fields ${{\bf Q} \subset F \subset \overline{{\bf Q}}}$ between ${{\bf Q}}$ and ${\overline{{\bf Q}}}$, which are of finite dimension over ${{\bf Q}}$; that is to say, finite degree extensions of ${{\bf Q}}$. Such fields are known as algebraic number fields, or number fields for short. Apart from ${{\bf Q}}$ itself, the simplest examples of such number fields are the quadratic fields, which have dimension exactly two over ${{\bf Q}}$.

Exercise 2 Show that if ${\alpha}$ is a rational number that is not a perfect square, then the field ${{\bf Q}(\sqrt{\alpha})}$ generated by ${{\bf Q}}$ and either of the square roots of ${\alpha}$ is a quadratic field. Conversely, show that all quadratic fields arise in this fashion. (Hint: show that every element of a quadratic field is a root of a quadratic polynomial over the rationals.)

The ring of algebraic integers ${{\mathcal A}}$ is similarly too large to conveniently work with directly, so in algebraic number theory one usually works with the rings ${{\mathcal O}_F := {\mathcal A} \cap F}$ of algebraic integers inside a given number field ${F}$. One can (and does) study this situation in great generality, but for the purposes of this post we shall restrict attention to a simple but illustrative special case, namely the quadratic fields with a certain type of negative discriminant. (The positive discriminant case will be briefly discussed in Remark 42 below.)

Exercise 3 Let ${d}$ be a square-free natural number with ${d=1\ (4)}$ or ${d=2\ (4)}$. Show that the ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ of algebraic integers in ${{\bf Q}(\sqrt{-d})}$ is given by

$\displaystyle {\mathcal O} = {\bf Z}[\sqrt{-d}] = \{ a + b \sqrt{-d}: a,b \in {\bf Z} \}.$

If instead ${d}$ is square-free with ${d=3\ (4)}$, show that the ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ is instead given by

$\displaystyle {\mathcal O} = {\bf Z}[\frac{1+\sqrt{-d}}{2}] = \{ a + b \frac{1+\sqrt{-d}}{2}: a,b \in {\bf Z} \}.$

What happens if ${d}$ is not square-free, or negative?

Remark 4 In the case ${d=3\ (4)}$, it may naively appear more natural to work with the ring ${{\bf Z}[\sqrt{-d}]}$, which is an index two subring of ${{\mathcal O}}$. However, because this ring only captures some of the algebraic integers in ${{\bf Q}(\sqrt{-d})}$ rather than all of them, the algebraic properties of these rings are somewhat worse than those of ${{\mathcal O}}$ (in particular, they generally fail to be Dedekind domains) and so are not convenient to work with in algebraic number theory.

We refer to fields of the form ${{\bf Q}(\sqrt{-d})}$ for natural square-free numbers ${d}$ as quadratic fields of negative discriminant, and similarly refer to ${{\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ as a ring of quadratic integers of negative discriminant. Quadratic fields and quadratic integers of positive discriminant are just as important to analytic number theory as their negative discriminant counterparts, but we will restrict attention to the latter here for simplicity of discussion.

Thus, for instance, when ${d=1}$, the ring of integers in ${{\bf Q}(\sqrt{-1})}$ is the ring of Gaussian integers

$\displaystyle {\bf Z}[\sqrt{-1}] = \{ x + y \sqrt{-1}: x,y \in {\bf Z} \}$

and when ${d=3}$, the ring of integers in ${{\bf Q}(\sqrt{-3})}$ is the ring of Eisenstein integers

$\displaystyle {\bf Z}[\omega] := \{ x + y \omega: x,y \in {\bf Z} \}$

where ${\omega := e^{2\pi i /3}}$ is a cube root of unity.

As these examples illustrate, the additive structure of a ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ of quadratic integers is that of a two-dimensional lattice in ${{\bf C}}$, which is isomorphic as an additive group to ${{\bf Z}^2}$. Thus, from an additive viewpoint, one can view quadratic integers as “two-dimensional” analogues of rational integers. From a multiplicative viewpoint, however, the quadratic integers (and more generally, integers in a number field) behave very similarly to the rational integers (as opposed to being some sort of “higher-dimensional” version of such integers). Indeed, a large part of basic algebraic number theory is devoted to treating the multiplicative theory of integers in number fields in a unified fashion, that naturally generalises the classical multiplicative theory of the rational integers.

For instance, every rational integer ${n \in {\bf Z}}$ has an absolute value ${|n| \in {\bf N} \cup \{0\}}$, with the multiplicativity property ${|nm| = |n| |m|}$ for ${n,m \in {\bf Z}}$, and the positivity property ${|n| > 0}$ for all ${n \neq 0}$. Among other things, the absolute value detects units: ${|n| = 1}$ if and only if ${n}$ is a unit in ${{\bf Z}}$ (that is to say, it is multiplicatively invertible in ${{\bf Z}}$). Similarly, in any ring of quadratic integers ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ with negative discriminant, we can assign a norm ${N(n) \in {\bf N} \cup \{0\}}$ to any quadratic integer ${n \in {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ by the formula

$\displaystyle N(n) = n \overline{n}$

where ${\overline{n}}$ is the complex conjugate of ${n}$. (When working with other number fields than quadratic fields of negative discriminant, one instead defines ${N(n)}$ to be the product of all the Galois conjugates of ${n}$.) Thus for instance, when ${d=1,2\ (4)}$ one has

$\displaystyle N(x + y \sqrt{-d}) = x^2 + dy^2 \ \ \ \ \ (1)$

and when ${d=3\ (4)}$ one has

$\displaystyle N(x + y \frac{1+\sqrt{-d}}{2}) = x^2 + xy + \frac{d+1}{4} y^2. \ \ \ \ \ (2)$

Analogously to the rational integers, we have the multiplicativity property ${N(nm) = N(n) N(m)}$ for ${n,m \in {\mathcal O}}$ and the positivity property ${N(n) > 0}$ for ${n \neq 0}$, and the units in ${{\mathcal O}}$ are precisely the elements of norm one.
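As a quick numerical sanity check of these claims in the case ${d=3\ (4)}$, here is a minimal sketch in Python. The representation of ${x + y \frac{1+\sqrt{-d}}{2}}$ as an integer pair `(x, y)`, and the helper names `mul`, `norm`, `units`, `check_multiplicative`, are choices made for this illustration only; a finite search is of course not a proof.

```python
from itertools import product

def mul(u, v, d):
    """Multiply u = x1 + y1*theta and v = x2 + y2*theta, theta = (1 + sqrt(-d))/2.

    Uses the relation theta^2 = theta - (d + 1)/4, valid when d = 3 mod 4.
    """
    (x1, y1), (x2, y2) = u, v
    c = (d + 1) // 4
    return (x1 * x2 - c * y1 * y2, x1 * y2 + x2 * y1 + y1 * y2)

def norm(u, d):
    """Norm form (2): N(x + y*theta) = x^2 + x*y + ((d + 1)/4) * y^2."""
    x, y = u
    return x * x + x * y + ((d + 1) // 4) * y * y

def units(d, box=3):
    """Elements of norm one with both coordinates in [-box, box]."""
    rng = range(-box, box + 1)
    return [(x, y) for x, y in product(rng, rng) if norm((x, y), d) == 1]

def check_multiplicative(d, box=4):
    """Test N(uv) = N(u) N(v) for all u, v with coordinates in [-box, box]."""
    rng = range(-box, box + 1)
    elems = list(product(rng, rng))
    return all(norm(mul(u, v, d), d) == norm(u, d) * norm(v, d)
               for u in elems for v in elems)

for d in (3, 7, 11):
    print(d, check_multiplicative(d), units(d))
```

Since the norm form is positive definite, the elements of norm one lie in a bounded region, so a small search box already finds all of them.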

Exercise 5 Establish the three claims of the previous paragraph. Conclude that the units (invertible elements) of ${{\mathcal O}}$ consist of the four elements ${\pm 1, \pm i}$ if ${d=1}$, the six elements ${\pm 1, \pm \omega, \pm \omega^2}$ if ${d=3}$, and the two elements ${\pm 1}$ if ${d \neq 1,3}$.

For the rational integers, we of course have the fundamental theorem of arithmetic, which asserts that every non-zero rational integer can be uniquely factored (up to permutation and units) as the product of irreducible integers, that is to say non-zero, non-unit integers that cannot be factored into the product of integers of strictly smaller norm. As it turns out, the same claim is true for a few additional rings of quadratic integers, such as the Gaussian integers and Eisenstein integers, but fails in general; for instance, in the ring ${{\bf Z}[\sqrt{-5}]}$, we have the famous counterexample

$\displaystyle 6 = 2 \times 3 = (1+\sqrt{-5}) (1-\sqrt{-5})$

that decomposes ${6}$ non-uniquely into the product of irreducibles in ${{\bf Z}[\sqrt{-5}]}$. Nevertheless, it is an important fact that the fundamental theorem of arithmetic can be salvaged if one uses an “idealised” notion of a number in a ring of integers ${{\mathcal O}}$, now known in modern language as an ideal of that ring. For instance, in ${{\bf Z}[\sqrt{-5}]}$, the principal ideal ${(6)}$ turns out to uniquely factor into the product of (non-principal) ideals ${(2) + (1+\sqrt{-5}), (2) + (1-\sqrt{-5}), (3) + (1+\sqrt{-5}), (3) + (1-\sqrt{-5})}$; see Exercise 27. We will review the basic theory of ideals in number fields (focusing primarily on quadratic fields of negative discriminant) below the fold.
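The irreducibility of the four factors in this counterexample can be checked by a norm computation: a proper factor of an element of norm ${4}$, ${6}$, or ${9}$ would have to have norm ${2}$ or ${3}$, and no such elements exist. The following Python sketch carries this out; the pair encoding `(a, b)` for ${a + b\sqrt{-5}}$ and the function names are hypothetical choices for this illustration, and `certifiably_irreducible` only tests a sufficient condition for irreducibility.

```python
from math import isqrt

D = 5  # we work in Z[sqrt(-5)]; the pair (a, b) stands for a + b*sqrt(-5)

def norm(a, b):
    return a * a + D * b * b

def mul(u, v):
    (a, b), (c, e) = u, v
    return (a * c - D * b * e, a * e + b * c)

def elements_of_norm(n):
    """All pairs (a, b) with a^2 + 5 b^2 = n."""
    out = []
    bmax = isqrt(n // D)
    for b in range(-bmax, bmax + 1):
        rest = n - D * b * b
        a = isqrt(rest)
        if a * a == rest:
            out.extend({(a, b), (-a, b)})  # the set removes the duplicate when a = 0
    return sorted(out)

def certifiably_irreducible(u):
    """Sufficient test: no divisor k of N(u) with 1 < k < N(u) is a norm."""
    n = norm(*u)
    if n <= 1:
        return False
    return all(not elements_of_norm(k) for k in range(2, n) if n % k == 0)

print(mul((2, 0), (3, 0)), mul((1, 1), (1, -1)))   # both products equal 6
print(elements_of_norm(2), elements_of_norm(3))    # no elements of norm 2 or 3
print([certifiably_irreducible(u) for u in [(2, 0), (3, 0), (1, 1), (1, -1)]])
```

As the only units here are ${\pm 1}$, the two factorisations are genuinely different and not related by units.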

The norm forms (1), (2) can be viewed as examples of positive definite quadratic forms ${Q: {\bf Z}^2 \rightarrow {\bf Z}}$ over the integers, by which we mean a polynomial of the form

$\displaystyle Q(x,y) = ax^2 + bxy + cy^2$

for some integer coefficients ${a,b,c}$. One can declare two quadratic forms ${Q, Q': {\bf Z}^2 \rightarrow {\bf Z}}$ to be equivalent if one can transform one to the other by an invertible linear transformation ${T: {\bf Z}^2 \rightarrow {\bf Z}^2}$, so that ${Q' = Q \circ T}$. For example, the quadratic forms ${(x,y) \mapsto x^2 + y^2}$ and ${(x',y') \mapsto 2 (x')^2 + 2 x' y' + (y')^2}$ are equivalent, as can be seen by using the invertible linear transformation ${(x,y) = (x',x'+y')}$. Such equivalences correspond to the different choices of basis available when expressing a ring such as ${{\mathcal O}}$ (or an ideal thereof) additively as a copy of ${{\bf Z}^2}$.

There is an important and classical invariant of a quadratic form ${(x,y) \mapsto ax^2 + bxy + c y^2}$, namely the discriminant ${\Delta := b^2 - 4ac}$, which will of course be familiar to most readers via the quadratic formula; among other things, that formula tells us that a quadratic form will be definite precisely when its discriminant is negative (and positive definite when in addition ${a > 0}$). It is not difficult (particularly if one exploits the multiplicativity of the determinant of ${2 \times 2}$ matrices) to show that two equivalent quadratic forms have the same discriminant. Thus for instance any quadratic form equivalent to (1) has discriminant ${-4d}$, while any quadratic form equivalent to (2) has discriminant ${-d}$. Thus we see that each ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ of quadratic integers is associated with a certain negative discriminant ${D}$, defined to equal ${-4d}$ when ${d=1,2\ (4)}$ and ${-d}$ when ${d=3\ (4)}$.
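To make the change of variables concrete, here is a small Python sketch that composes a quadratic form with an integer linear map and compares discriminants. The encoding of a form as a coefficient triple `(a, b, c)` and of a map as a pair of rows are conventions assumed for this illustration.

```python
def transform(form, T):
    """Coefficients of Q o T, where form = (a, b, c) means a x^2 + b x y + c y^2
    and T = ((p, q), (r, s)) sends (x', y') to (x, y) = (p x' + q y', r x' + s y')."""
    a, b, c = form
    (p, q), (r, s) = T
    return (a * p * p + b * p * r + c * r * r,
            2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s,
            a * q * q + b * q * s + c * s * s)

def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c

def det(T):
    (p, q), (r, s) = T
    return p * s - q * r

# The substitution (x, y) = (x', x' + y') from the text.
T = ((1, 0), (1, 1))
Q = (1, 0, 1)                      # x^2 + y^2
print(transform(Q, T))             # coefficients of 2 x'^2 + 2 x' y' + y'^2
print(discriminant(Q), discriminant(transform(Q, T)))
```

More generally one has ${\hbox{disc}(Q \circ T) = \det(T)^2\, \hbox{disc}(Q)}$, which gives the invariance of the discriminant since an invertible ${T: {\bf Z}^2 \rightarrow {\bf Z}^2}$ has determinant ${\pm 1}$.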

Exercise 6 (Geometric interpretation of discriminant) Let ${Q: {\bf Z}^2 \rightarrow {\bf Z}}$ be a quadratic form of negative discriminant ${D}$, and extend it to a real form ${Q: {\bf R}^2 \rightarrow {\bf R}}$ in the obvious fashion. Show that for any ${X>0}$, the set ${\{ (x,y) \in {\bf R}^2: Q(x,y) \leq X \}}$ is an ellipse of area ${2\pi X / \sqrt{|D|}}$.

It is natural to ask the converse question: if two quadratic forms have the same discriminant, are they necessarily equivalent? For certain choices of discriminant, this is the case:

Exercise 7 Show that any quadratic form ${ax^2+bxy+cy^2}$ of discriminant ${-4}$ is equivalent to the form ${x^2+y^2}$, and any quadratic form of discriminant ${-3}$ is equivalent to ${x^2+xy+y^2}$. (Hint: use elementary transformations to try to make ${|b|}$ as small as possible, to the point where one only has to check a finite number of cases; this argument is due to Legendre.) More generally, show that for any negative discriminant ${D}$, there are only finitely many quadratic forms of that discriminant up to equivalence (a result first established by Gauss).
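The finiteness claim can be explored numerically by listing forms satisfying the normalisation ${0 \leq b \leq a \leq c}$, which forces ${3a^2 \leq |D|}$. The sketch below assumes (without proof) that this normalisation, a variant of the classical reduction conditions adapted to equivalence under all invertible integer maps, picks out one representative per equivalence class; it also does not discard non-primitive forms. The function name `reduced_forms` is a choice made for this illustration.

```python
def reduced_forms(D):
    """Forms (a, b, c) with b^2 - 4ac = D < 0 and 0 <= b <= a <= c."""
    assert D < 0 and D % 4 in (0, 1), "not the discriminant of an integral form"
    forms = []
    a = 1
    while 3 * a * a <= -D:             # b <= a <= c forces 3a^2 <= 4ac - b^2 = |D|
        for b in range(0, a + 1):
            four_ac = b * b - D        # this must equal 4ac
            if four_ac % (4 * a) == 0:
                c = four_ac // (4 * a)
                if c >= a:
                    forms.append((a, b, c))
        a += 1
    return forms

for D in (-3, -4, -20):
    print(D, reduced_forms(D))
```

Under the stated assumption, the length of the returned list is the number of equivalence classes of positive definite forms of discriminant ${D}$.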

Unfortunately, for most choices of discriminant, the answer to the converse question is negative; for instance, the quadratic forms ${x^2+5y^2}$ and ${2x^2+2xy+3y^2}$ both have discriminant ${-20}$, but are not equivalent (Exercise 38). This particular failure of equivalence turns out to be intimately related to the failure of unique factorisation in the ring ${{\bf Z}[\sqrt{-5}]}$.
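One way to see the inequivalence numerically is to note that equivalent forms take the same set of values on ${{\bf Z}^2}$, and then to compare small values. The Python sketch below does this; the helper `small_values` and the coefficient-triple encoding are assumptions of this illustration.

```python
from math import isqrt

def small_values(form, bound):
    """Values Q(x, y) <= bound taken at integer points (x, y) != (0, 0),
    for a positive definite form Q = (a, b, c)."""
    a, b, c = form
    disc = 4 * a * c - b * b           # positive for a positive definite form
    # 4a*Q = (2ax + by)^2 + disc*y^2 and 4c*Q = (2cy + bx)^2 + disc*x^2,
    # so Q <= bound forces y^2 <= 4a*bound/disc and x^2 <= 4c*bound/disc.
    ymax = isqrt(4 * a * bound // disc)
    xmax = isqrt(4 * c * bound // disc)
    values = set()
    for x in range(-xmax, xmax + 1):
        for y in range(-ymax, ymax + 1):
            if (x, y) != (0, 0):
                v = a * x * x + b * x * y + c * y * y
                if v <= bound:
                    values.add(v)
    return sorted(values)

print(small_values((1, 0, 5), 10))
print(small_values((2, 2, 3), 10))
```

The first form represents ${1}$ while the second does not, so no invertible integer change of variables can carry one to the other.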

It turns out that there is a fundamental connection between quadratic fields, equivalence classes of quadratic forms of a given discriminant, and real Dirichlet characters, thus connecting the material discussed above with the last section of the previous set of notes. Here is a typical instance of this connection:

Proposition 8 Let ${\chi_4: {\bf N} \rightarrow {\bf R}}$ be the real non-principal Dirichlet character of modulus ${4}$, or more explicitly ${\chi_4(n)}$ is equal to ${+1}$ when ${n = 1\ (4)}$, ${-1}$ when ${n = 3\ (4)}$, and ${0}$ when ${n = 0,2\ (4)}$.

• (i) For any natural number ${n}$, the number of Gaussian integers ${m \in {\bf Z}[\sqrt{-1}]}$ with norm ${N(m)=n}$ is equal to ${4(1 * \chi_4)(n)}$. Equivalently, the number of solutions to the equation ${n = x^2+y^2}$ with ${x,y \in{\bf Z}}$ is ${4(1*\chi_4)(n)}$. (Here, as in the previous post, the symbol ${*}$ denotes Dirichlet convolution.)
• (ii) For any natural number ${n}$, the number of Gaussian integers ${m \in {\bf Z}[\sqrt{-1}]}$ that divide ${n}$ (thus ${n = dm}$ for some ${d \in {\bf Z}[\sqrt{-1}]}$) is ${4(1*1*1*\mu\chi_4)(n)}$.

We will prove this proposition later in these notes. We observe that as a special case of part (i) of this proposition, we recover the Fermat two-square theorem: an odd prime ${p}$ is expressible as the sum of two squares if and only if ${p = 1\ (4)}$. This proposition should also be compared with the fact, used crucially in the previous post to prove Dirichlet’s theorem, that ${1*\chi(n)}$ is non-negative for any ${n}$, and at least one when ${n}$ is a square, for any quadratic character ${\chi}$.
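Part (i) of the proposition is easy to test by brute force for small ${n}$. The following Python sketch compares a direct count of representations ${n = x^2+y^2}$ with ${4(1*\chi_4)(n)}$; the function names `chi4`, `one_conv_chi4`, `r2` are chosen for this illustration.

```python
from math import isqrt

def chi4(n):
    """The non-principal Dirichlet character of modulus 4."""
    return (0, 1, 0, -1)[n % 4]

def one_conv_chi4(n):
    """Dirichlet convolution (1 * chi4)(n) = sum of chi4(d) over divisors d of n."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

def r2(n):
    """Number of (x, y) in Z^2 with x^2 + y^2 = n."""
    count = 0
    for x in range(-isqrt(n), isqrt(n) + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            count += 1 if y == 0 else 2   # y and -y coincide only when y = 0
    return count

mismatches = [n for n in range(1, 500) if r2(n) != 4 * one_conv_chi4(n)]
print(mismatches)
```

For an odd prime ${p}$ one has ${(1*\chi_4)(p) = 1 + \chi_4(p)}$, which is how the two-square theorem appears as a special case.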

As an illustration of the relevance of such connections to analytic number theory, let us now explicitly compute ${L(1,\chi_4)}$.

Corollary 9 ${L(1,\chi_4) = \frac{\pi}{4}}$.

This particular identity is also known as the Leibniz formula.

Proof: For a large number ${x}$, consider the quantity

$\displaystyle \sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1$

counting all the Gaussian integers of norm at most ${x}$. On the one hand, this is the same as the number of lattice points of ${{\bf Z}^2}$ in the disk ${\{ (a,b) \in {\bf R}^2: a^2+b^2 \leq x \}}$ of radius ${\sqrt{x}}$. Placing a unit square centred at each such lattice point, we obtain a region which differs from the disk by a region contained in an annulus of area ${O(\sqrt{x})}$. As the area of the disk is ${\pi x}$, we conclude the Gauss bound

$\displaystyle \sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1 = \pi x + O(\sqrt{x}).$

On the other hand, by Proposition 8(i) (and removing the ${n=0}$ contribution), we see that

$\displaystyle \sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1 = 1 + 4 \sum_{n \leq x} 1 * \chi_4(n).$

Now we use the Dirichlet hyperbola method to expand the right-hand side sum, first expressing

$\displaystyle \sum_{n \leq x} 1 * \chi_4(n) = \sum_{d \leq \sqrt{x}} \chi_4(d) \sum_{m \leq x/d} 1 + \sum_{m \leq \sqrt{x}} \sum_{d \leq x/m} \chi_4(d)$

$\displaystyle - (\sum_{d \leq \sqrt{x}} \chi_4(d)) (\sum_{m \leq \sqrt{x}} 1)$

and then using the bounds ${\sum_{d \leq y} \chi_4(d) = O(1)}$, ${\sum_{m \leq y} 1 = y + O(1)}$, ${\sum_{d \leq \sqrt{x}} \frac{\chi_4(d)}{d} = L(1,\chi_4) + O(\frac{1}{\sqrt{x}})}$ from the previous set of notes to conclude that

$\displaystyle \sum_{n \leq x} 1 * \chi_4(n) = x L(1,\chi_4) + O(\sqrt{x}).$

Comparing the two formulae for ${\sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1}$ and sending ${x \rightarrow \infty}$, we obtain the claim. $\Box$
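The hyperbola identity used in this proof, and the resulting approximation to ${\pi}$, can be checked numerically. In the Python sketch below, `chi4_sum` uses the fact that the partial sums of ${\chi_4}$ are periodic of period ${4}$; all function names are choices made for this illustration, and ${x}$ is taken to be a natural number.

```python
from math import isqrt, pi

def chi4(n):
    return (0, 1, 0, -1)[n % 4]

def chi4_sum(y):
    """sum_{d <= y} chi4(d) for an integer y >= 0; the values cycle 0, 1, 1, 0."""
    return 1 if y % 4 in (1, 2) else 0

def direct(x):
    """sum_{n <= x} (1 * chi4)(n), computed as sum_d chi4(d) * floor(x/d)."""
    return sum(chi4(d) * (x // d) for d in range(1, x + 1))

def hyperbola(x):
    """The same sum via the Dirichlet hyperbola identity, using about sqrt(x) terms."""
    r = isqrt(x)
    first = sum(chi4(d) * (x // d) for d in range(1, r + 1))
    second = sum(chi4_sum(x // m) for m in range(1, r + 1))
    return first + second - chi4_sum(r) * r

x = 10**6
print(hyperbola(1000) == direct(1000))
print((1 + 4 * hyperbola(x)) / x, pi)   # lattice point count over x, compared with pi
```

The quantity `1 + 4 * hyperbola(x)` is the number of Gaussian integers of norm at most ${x}$, so the last line compares the two sides of the Gauss bound after dividing by ${x}$.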

Exercise 10 Give an alternate proof of Corollary 9 that relies on obtaining asymptotics for the Dirichlet series ${\sum_{n \in {\bf N}} \frac{1 * \chi_4(n)}{n^s}}$ as ${s \rightarrow 1^+}$, rather than using the Dirichlet hyperbola method.

Exercise 11 Give a direct proof of Corollary 9 that does not use Proposition 8, instead using Taylor expansion of the complex logarithm ${\log(1+z)}$. (One can also use Taylor expansions of some other functions related to the complex logarithm here, such as the arctangent function.)

More generally, one can relate ${L(1,\chi)}$ for a real Dirichlet character ${\chi}$ with the number of inequivalent quadratic forms of a certain discriminant, via the famous class number formula; we will give a special case of this formula below the fold.

The material here is only a very rudimentary introduction to algebraic number theory, and is not essential to the rest of the course. A slightly expanded version of the material here, from the perspective of analytic number theory, may be found in Sections 5 and 6 of Davenport’s book. A more in-depth treatment of algebraic number theory may be found in a number of texts, e.g. Fröhlich and Taylor.

In analytic number theory, an arithmetic function is simply a function ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers ${{\bf N} = \{1,2,3,\dots\}}$ to the real or complex numbers. (One occasionally also considers arithmetic functions taking values in more general rings than ${{\bf R}}$ or ${{\bf C}}$, as in this previous blog post, but we will restrict attention here to the classical situation of real or complex arithmetic functions.) Experience has shown that a particularly tractable and relevant class of arithmetic functions for analytic number theory are the multiplicative functions, which are arithmetic functions ${f: {\bf N} \rightarrow {\bf C}}$ with the additional property that

$\displaystyle f(nm) = f(n) f(m) \ \ \ \ \ (1)$

whenever ${n,m \in{\bf N}}$ are coprime. (One also considers arithmetic functions, such as the logarithm function ${L(n) := \log n}$ or the von Mangoldt function, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as “derived” versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function

$\displaystyle \tau(n) := \sum_{d|n} 1 \ \ \ \ \ (2)$

that counts the number of divisors of a natural number ${n}$. (The divisor function ${n \mapsto \tau(n)}$ is also denoted ${n \mapsto d(n)}$ in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.
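As a small illustration of the definition, the following Python sketch checks the multiplicativity property (1) for the divisor function (2) on coprime pairs, and shows that it can fail when ${n,m}$ share a factor. The naive implementation `tau` is only meant for small inputs.

```python
from math import gcd

def tau(n):
    """Number of positive divisors of n, by trial division."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

coprime_ok = all(tau(n * m) == tau(n) * tau(m)
                 for n in range(1, 60) for m in range(1, 60)
                 if gcd(n, m) == 1)
print(coprime_ok)
print(tau(4), tau(2) * tau(2))   # 2 and 2 are not coprime, and the values differ
```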

There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions ${f}$. In elementary multiplicative number theory, which is the focus of this set of notes, particular emphasis is placed on the following two statistics of a given arithmetic function ${f: {\bf N} \rightarrow {\bf C}}$:

1. The summatory functions

$\displaystyle \sum_{n \leq x} f(n)$

of an arithmetic function ${f}$, as well as the associated natural density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} f(n)$

(if it exists).

2. The logarithmic sums

$\displaystyle \sum_{n\leq x} \frac{f(n)}{n}$

of an arithmetic function ${f}$, as well as the associated logarithmic density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}$

(if it exists).

Here, we are normalising the arithmetic function ${f}$ being studied to be of roughly unit size up to logarithms, obeying bounds such as ${f(n)=O(1)}$, ${f(n) = O(\log^{O(1)} n)}$, or at worst

$\displaystyle f(n) = O(n^{o(1)}). \ \ \ \ \ (3)$

A classical case of interest is when ${f}$ is an indicator function ${f=1_A}$ of some set ${A}$ of natural numbers, in which case we also refer to the natural or logarithmic density of ${f}$ as the natural or logarithmic density of ${A}$ respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function ${1_{\mathcal P}}$ of the primes is often replaced with the von Mangoldt function ${\Lambda}$.

Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 7 below.
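To get a feel for the two statistics, one can compute both normalised sums for the von Mangoldt function ${\Lambda}$ at a moderate value of ${x}$. The Python sketch below does this with a simple sieve; the function names and the choice ${x = 10^5}$ are assumptions of this illustration, and a single value of ${x}$ says nothing rigorous about the limits.

```python
from math import log

def von_mangoldt_table(x):
    """lam[n] = log p if n is a power of a prime p, and 0 otherwise, for n <= x."""
    lam = [0.0] * (x + 1)
    is_composite = [False] * (x + 1)
    for p in range(2, x + 1):
        if not is_composite[p]:
            for k in range(p * p, x + 1, p):
                is_composite[k] = True
            q = p
            while q <= x:              # mark p, p^2, p^3, ...
                lam[q] = log(p)
                q *= p
    return lam

def normalised_sums(x):
    lam = von_mangoldt_table(x)
    summatory = sum(lam[1:]) / x
    logarithmic = sum(lam[n] / n for n in range(1, x + 1)) / log(x)
    return summatory, logarithmic

print(normalised_sums(10**5))
```

Both outputs should be close to ${1}$; the logarithmic ratio approaches ${1}$ rather slowly, because the logarithmic sum differs from ${\log x}$ by a bounded amount which is then divided by ${\log x}$.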

If an arithmetic function ${f}$ is multiplicative (or closely related to a multiplicative function), then there is an important further statistic of ${f}$ beyond the summatory function and the logarithmic sum, namely the Dirichlet series

$\displaystyle {\mathcal D}f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s} \ \ \ \ \ (4)$

for various real or complex numbers ${s}$. Under the hypothesis (3), this series is absolutely convergent for real numbers ${s>1}$, or more generally for complex numbers ${s}$ with ${\hbox{Re}(s)>1}$. As we will see below the fold, when ${f}$ is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.

In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers ${s>1}$ (and focusing particularly on the asymptotic behaviour as ${s \rightarrow 1^+}$); in later notes we will focus instead on the important complex-analytic approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.

Remark 1 The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more “Fourier-analytic” approach, in which one studies convolution-type statistics such as

$\displaystyle \sum_n \frac{f(n)}{n} G( t - \log n ) \ \ \ \ \ (5)$

as ${t \rightarrow \infty}$ for various cutoff functions ${G: {\bf R} \rightarrow {\bf C}}$, such as smooth, compactly supported functions. See for instance this previous blog post for an instance of such an approach. Another related approach is the “pretentious” approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.

To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow ${s}$ to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get upper bounds on such sums, known as Rankin’s trick, which we will discuss later in these notes.

The basic strategy of elementary multiplicative theory is to first gather useful estimates on the statistics of “smooth” or “non-oscillatory” functions, such as the constant function ${n \mapsto 1}$, the harmonic function ${n \mapsto \frac{1}{n}}$, or the logarithm function ${n \mapsto \log n}$; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function ${\Lambda}$ that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.

This is only an introduction to elementary multiplicative number theory techniques. More in-depth treatments may be found in this text of Montgomery-Vaughan, or this text of Bateman-Diamond.

Many problems and results in analytic prime number theory can be formulated in the following general form: given a collection of (affine-)linear forms ${L_1(n),\dots,L_k(n)}$, none of which is a multiple of any other, find a number ${n}$ such that a certain property ${P( L_1(n),\dots,L_k(n) )}$ of the linear forms ${L_1(n),\dots,L_k(n)}$ holds. For instance:

• For the twin prime conjecture, one can use the linear forms ${L_1(n) := n}$, ${L_2(n) := n+2}$, and the property ${P( L_1(n), L_2(n) )}$ in question is the assertion that ${L_1(n)}$ and ${L_2(n)}$ are both prime.
• For the even Goldbach conjecture, the claim is similar but one uses the linear forms ${L_1(n) := n}$, ${L_2(n) := N-n}$ for some even integer ${N}$.
• For Chen’s theorem, we use the same linear forms ${L_1(n),L_2(n)}$ as in the previous two cases, but now ${P(L_1(n), L_2(n))}$ is the assertion that ${L_1(n)}$ is prime and ${L_2(n)}$ is an almost prime (in the sense that there are at most two prime factors).
• In the recent results establishing bounded gaps between primes, we use the linear forms ${L_i(n) = n + h_i}$ for some admissible tuple ${h_1,\dots,h_k}$, and take ${P(L_1(n),\dots,L_k(n))}$ to be the assertion that at least two of ${L_1(n),\dots,L_k(n)}$ are prime.

For these sorts of results, one can try a sieve-theoretic approach, which can broadly be formulated as follows:

1. First, one chooses a carefully selected sieve weight ${\nu: {\bf N} \rightarrow {\bf R}^+}$, which could for instance be a non-negative function having a divisor sum form

$\displaystyle \nu(n) := \sum_{d_1|L_1(n), \dots, d_k|L_k(n); d_1 \dots d_k \leq x^{1-\varepsilon}} \lambda_{d_1,\dots,d_k}$

for some coefficients ${\lambda_{d_1,\dots,d_k}}$, where ${x}$ is a natural scale parameter. The precise choice of sieve weight is often quite a delicate matter, but will not be discussed here. (In some cases, one may work with multiple sieve weights ${\nu_1, \nu_2, \dots}$.)

2. Next, one uses tools from analytic number theory (such as the Bombieri-Vinogradov theorem) to obtain upper and lower bounds for sums such as

$\displaystyle \sum_n \nu(n) \ \ \ \ \ (1)$

or

$\displaystyle \sum_n \nu(n) 1_{L_i(n) \hbox{ prime}} \ \ \ \ \ (2)$

or more generally of the form

$\displaystyle \sum_n \nu(n) f(L_i(n)) \ \ \ \ \ (3)$

where ${f(L_i(n))}$ is some “arithmetic” function involving the prime factorisation of ${L_i(n)}$ (we will be a bit vague about what this means precisely, but a typical choice of ${f}$ might be a Dirichlet convolution ${\alpha*\beta(L_i(n))}$ of two other arithmetic functions ${\alpha,\beta}$).

3. Using some combinatorial arguments, one manipulates these upper and lower bounds, together with the non-negative nature of ${\nu}$, to conclude the existence of an ${n}$ in the support of ${\nu}$ (or of at least one of the sieve weights ${\nu_1, \nu_2, \dots}$ being considered) for which ${P( L_1(n), \dots, L_k(n) )}$ holds.

For instance, in the recent results on bounded gaps between primes, one selects a sieve weight ${\nu}$ for which one has upper bounds on

$\displaystyle \sum_n \nu(n)$

and lower bounds on

$\displaystyle \sum_n \nu(n) 1_{n+h_i \hbox{ prime}}$

so that one can show that the expression

$\displaystyle \sum_n \nu(n) (\sum_{i=1}^k 1_{n+h_i \hbox{ prime}} - 1)$

is strictly positive, which implies the existence of an ${n}$ in the support of ${\nu}$ such that at least two of ${n+h_1,\dots,n+h_k}$ are prime. As another example, to prove Chen’s theorem to find ${n}$ such that ${L_1(n)}$ is prime and ${L_2(n)}$ is almost prime, one uses a variety of sieve weights to produce a lower bound for

$\displaystyle S_1 := \sum_{n \leq x} 1_{L_1(n) \hbox{ prime}} 1_{L_2(n) \hbox{ rough}}$

and an upper bound for

$\displaystyle S_2 := \sum_{z \leq p < x^{1/3}} \sum_{n \leq x} 1_{L_1(n) \hbox{ prime}} 1_{p|L_2(n)} 1_{L_2(n) \hbox{ rough}}$

and

$\displaystyle S_3 := \sum_{n \leq x} 1_{L_1(n) \hbox{ prime}} 1_{L_2(n)=pqr \hbox{ for some } z \leq p \leq x^{1/3} < q \leq r},$

where ${z}$ is some parameter between ${1}$ and ${x^{1/3}}$, and “rough” means that all prime factors are at least ${z}$. One can observe that if ${S_1 - \frac{1}{2} S_2 - \frac{1}{2} S_3 > 0}$, then there must be at least one ${n}$ for which ${L_1(n)}$ is prime and ${L_2(n)}$ is almost prime, since for any rough number ${m}$, the quantity

$\displaystyle 1 - \frac{1}{2} \sum_{z \leq p < x^{1/3}} 1_{p|m} - \frac{1}{2} \sum_{z \leq p \leq x^{1/3} < q \leq r} 1_{m = pqr}$

is only positive when ${m}$ is an almost prime (if ${m}$ has three or more factors, then either it has at least two factors less than ${x^{1/3}}$, or it is of the form ${pqr}$ for some ${p \leq x^{1/3} < q \leq r}$). The upper and lower bounds on ${S_1,S_2,S_3}$ are ultimately produced via asymptotics for expressions of the form (1), (2), (3) for various divisor sums ${\nu}$ and various arithmetic functions ${f}$.
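The pointwise inequality behind this argument can be tested by brute force. The Python sketch below evaluates twice the displayed quantity for rough numbers ${m \leq x}$ and checks that it is positive only when ${m}$ has at most two prime factors. It makes some simplifying assumptions of its own: ${m}$ is restricted to squarefree numbers (the sum over ${p}$ counts distinct prime divisors, and repeated prime factors are not treated here), ${x}$ is not a perfect cube, and the values ${x = 10^4}$, ${z = 3}$ are arbitrary.

```python
def prime_factors(m):
    """Prime factors of m with multiplicity, in increasing order."""
    out, p = [], 2
    while p * p <= m:
        while m % p == 0:
            out.append(p)
            m //= p
        p += 1
    if m > 1:
        out.append(m)
    return out

def twice_chen_weight(m, z, x):
    """2 - #{p | m : z <= p < x^(1/3)} - [m = pqr with z <= p <= x^(1/3) < q <= r]."""
    fs = prime_factors(m)
    small = {p for p in fs if z <= p and p ** 3 < x}
    is_pqr = len(fs) == 3 and z <= fs[0] and fs[0] ** 3 <= x and fs[1] ** 3 > x
    return 2 - len(small) - int(is_pqr)

def violations(x, z):
    """Squarefree rough m <= x with positive weight but three or more prime factors."""
    bad = []
    for m in range(1, x + 1):
        fs = prime_factors(m)
        rough = all(p >= z for p in fs)
        squarefree = len(set(fs)) == len(fs)
        if rough and squarefree and twice_chen_weight(m, z, x) > 0 and len(fs) > 2:
            bad.append(m)
    return bad

print(violations(10**4, 3))
```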

Unfortunately, there is an obstruction to sieve-theoretic techniques working for certain types of properties ${P(L_1(n),\dots,L_k(n))}$, which Zeb Brady and I recently formalised at an AIM workshop this week. To state the result, we recall the Liouville function ${\lambda(n)}$, defined by setting ${\lambda(n) = (-1)^j}$ whenever ${n}$ is the product of exactly ${j}$ primes (counting multiplicity). Define a sign pattern to be an element ${(\epsilon_1,\dots,\epsilon_k)}$ of the discrete cube ${\{-1,+1\}^k}$. Given a property ${P(l_1,\dots,l_k)}$ of ${k}$ natural numbers ${l_1,\dots,l_k}$, we say that a sign pattern ${(\epsilon_1,\dots,\epsilon_k)}$ is forbidden by ${P}$ if there do not exist any natural numbers ${l_1,\dots,l_k}$ obeying ${P(l_1,\dots,l_k)}$ for which

$\displaystyle (\lambda(l_1),\dots,\lambda(l_k)) = (\epsilon_1,\dots,\epsilon_k).$

Example 1 Let ${P(l_1,l_2,l_3)}$ be the property that at least two of ${l_1,l_2,l_3}$ are prime. Then the sign patterns ${(+1,+1,+1)}$, ${(+1,+1,-1)}$, ${(+1,-1,+1)}$, ${(-1,+1,+1)}$ are forbidden, because prime numbers have a Liouville function of ${-1}$, so that ${P(l_1,l_2,l_3)}$ can only occur when at least two of ${\lambda(l_1),\lambda(l_2), \lambda(l_3)}$ are equal to ${-1}$.

Example 2 Let ${P(l_1,l_2)}$ be the property that ${l_1}$ is prime and ${l_2}$ is almost prime. Then the only forbidden sign patterns are ${(+1,+1)}$ and ${(+1,-1)}$.

Example 3 Let ${P(l_1,l_2)}$ be the property that ${l_1}$ and ${l_2}$ are both prime. Then ${(+1,+1), (+1,-1), (-1,+1)}$ are all forbidden sign patterns.
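The forbidden sign patterns in these three examples can be recovered experimentally by listing which sign patterns are actually realised by small tuples. In the Python sketch below, the search range `limit` and all function names are assumptions of this illustration; a bounded search can only certify that a pattern is realised, so the patterns it reports as forbidden are merely those not seen in the range searched (here they agree with the examples).

```python
from itertools import product

def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count + (1 if n > 1 else 0)

def liouville(n):
    return -1 if big_omega(n) % 2 else 1

def is_prime(n):
    return big_omega(n) == 1

def is_almost_prime(n):
    return n > 1 and big_omega(n) <= 2

def unrealised_patterns(P, k, limit=30):
    """Sign patterns not attained by any tuple in [1, limit]^k satisfying P."""
    realised = {tuple(liouville(l) for l in ls)
                for ls in product(range(1, limit + 1), repeat=k) if P(*ls)}
    return sorted(set(product((1, -1), repeat=k)) - realised)

example1 = unrealised_patterns(lambda a, b, c: is_prime(a) + is_prime(b) + is_prime(c) >= 2, 3)
example2 = unrealised_patterns(lambda a, b: is_prime(a) and is_almost_prime(b), 2)
example3 = unrealised_patterns(lambda a, b: is_prime(a) and is_prime(b), 2)
print(example1, example2, example3, sep="\n")
```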

We then have a parity obstruction as soon as ${P}$ has “too many” forbidden sign patterns, in the following (slightly informal) sense:

Claim 1 (Parity obstruction) Suppose ${P(l_1,\dots,l_k)}$ is such that the convex hull of the forbidden sign patterns of ${P}$ contains the origin. Then one cannot use the above sieve-theoretic approach to establish the existence of an ${n}$ such that ${P(L_1(n),\dots,L_k(n))}$ holds.

Thus for instance, the property in Example 3 is subject to the parity obstruction since ${0}$ is a convex combination of ${(+1,-1)}$ and ${(-1,+1)}$, whereas the properties in Examples 1, 2 are not. One can also check that the property “at least ${j}$ of the ${k}$ numbers ${l_1,\dots,l_k}$ are prime” is subject to the parity obstruction as soon as ${j \geq \frac{k}{2}+1}$. Thus, the largest number of elements of a ${k}$-tuple that one can force to be prime by purely sieve-theoretic methods is ${k/2}$, rounded up.
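The convex hull condition in Claim 1 can be tested exactly for small ${k}$. The following is a rough sketch rather than an efficient algorithm: it uses Carathéodory's theorem (if the origin is in the hull, it is a convex combination of an affinely independent subset of at most ${k+1}$ points, for which the weights are unique) and solves each small linear system over the rationals. The function names are hypothetical, and the forbidden patterns for "at least ${j}$ of ${k}$ are prime" are taken to be those with fewer than ${j}$ entries equal to ${-1}$.

```python
from fractions import Fraction
from itertools import combinations, product

def solve_unique(rows):
    """Gauss-Jordan elimination over the rationals on an augmented matrix.

    Returns the unique solution as a list, or None if the system is
    inconsistent or has a free variable."""
    m = [[Fraction(x) for x in row] for row in rows]
    unknowns = len(m[0]) - 1
    rank = 0
    for col in range(unknowns):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            return None  # free variable, so no unique solution
        m[rank], m[pivot] = m[pivot], m[rank]
        lead = m[rank][col]
        m[rank] = [x / lead for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    if any(row[-1] != 0 for row in m[rank:]):
        return None  # inconsistent system
    return [m[i][-1] for i in range(unknowns)]

def origin_in_hull(points):
    """Exact test of whether 0 lies in the convex hull of the given points."""
    points = list(points)
    if not points:
        return False
    k = len(points[0])
    for size in range(1, min(k + 1, len(points)) + 1):
        for subset in combinations(points, size):
            # Unknowns: one weight per point. Equations: each coordinate of
            # the weighted sum vanishes, and the weights sum to 1.
            rows = [[p[i] for p in subset] + [0] for i in range(k)]
            rows.append([1] * size + [1])
            weights = solve_unique(rows)
            if weights is not None and all(w >= 0 for w in weights):
                return True
    return False

def at_least_j_prime_forbidden(k, j):
    """Patterns with fewer than j minus signs (assumed forbidden set)."""
    return [eps for eps in product((1, -1), repeat=k) if eps.count(-1) < j]

print(origin_in_hull([(1, 1), (1, -1), (-1, 1)]))  # Example 3
```

Running the same test on the forbidden patterns of Examples 1 and 2 gives a way to confirm that those properties escape the obstruction.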

This claim is not precisely a theorem, because it presumes a certain “Liouville pseudorandomness conjecture” (a very close cousin of the more well known “Möbius pseudorandomness conjecture”) which is a bit difficult to formalise precisely. However, this conjecture is widely believed by analytic number theorists, see e.g. this blog post for a discussion. (Note though that there are scenarios, most notably the “Siegel zero” scenario, in which there is a severe breakdown of this pseudorandomness conjecture, and the parity obstruction then disappears. A typical instance of this is Heath-Brown’s proof of the twin prime conjecture (which would ordinarily be subject to the parity obstruction) under the hypothesis of a Siegel zero.) The obstruction also does not prevent the establishment of an ${n}$ such that ${P(L_1(n),\dots,L_k(n))}$ holds by introducing additional sieve axioms beyond upper and lower bounds on quantities such as (1), (2), (3). The proof of the Friedlander-Iwaniec theorem is a good example of this latter scenario.

Now we give a (slightly nonrigorous) proof of the claim.

Proof: (Nonrigorous) Suppose that the convex hull of the forbidden sign patterns contains the origin. Then we can find non-negative numbers ${p_{\epsilon_1,\dots,\epsilon_k}}$ for sign patterns ${(\epsilon_1,\dots,\epsilon_k)}$, which sum to ${1}$, are non-zero only for forbidden sign patterns, and which have mean zero in the sense that

$\displaystyle \sum_{(\epsilon_1,\dots,\epsilon_k)} p_{\epsilon_1,\dots,\epsilon_k} \epsilon_i = 0$

for all ${i=1,\dots,k}$. By Fourier expansion (or Lagrange interpolation), one can then write ${p_{\epsilon_1,\dots,\epsilon_k}}$, after multiplying by the normalising factor ${2^k}$, as a polynomial

$\displaystyle 2^k p_{\epsilon_1,\dots,\epsilon_k} = 1 + Q( \epsilon_1,\dots,\epsilon_k)$

where ${Q(t_1,\dots,t_k)}$ is a polynomial in ${k}$ variables that is a linear combination of monomials ${t_{i_1} \dots t_{i_r}}$ with ${i_1 < \dots < i_r}$ and ${r \geq 2}$ (thus ${Q}$ has no constant or linear terms, and no monomials with repeated terms). The point is that the mean zero condition allows one to eliminate the linear terms. If we now consider the weight function

$\displaystyle w(n) := 1 + Q( \lambda(L_1(n)), \dots, \lambda(L_k(n)) )$

then ${w}$ is non-negative, is supported solely on ${n}$ for which ${(\lambda(L_1(n)),\dots,\lambda(L_k(n)))}$ is a forbidden pattern, and is equal to ${1}$ plus a linear combination of monomials ${\lambda(L_{i_1}(n)) \dots \lambda(L_{i_r}(n))}$ with ${r \geq 2}$.
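To make this concrete, here is a worked instance based on Example 3 (an illustration, with the choice of weights being an assumption of the example). Take ${k=2}$ and place weight ${1/2}$ on each of the forbidden patterns ${(+1,-1)}$ and ${(-1,+1)}$. These weights have mean zero in each coordinate, and after normalisation one obtains ${Q(t_1,t_2) = -t_1 t_2}$, so that

$\displaystyle w(n) = 1 - \lambda(L_1(n)) \lambda(L_2(n)).$

This weight equals ${2}$ when ${\lambda(L_1(n)) \neq \lambda(L_2(n))}$ and ${0}$ otherwise; in particular it vanishes whenever ${L_1(n)}$ and ${L_2(n)}$ are both prime.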

The Liouville pseudorandomness principle then predicts that sums of the form

$\displaystyle \sum_n \nu(n) Q( \lambda(L_1(n)), \dots, \lambda(L_k(n)) )$

and

$\displaystyle \sum_n \nu(n) Q( \lambda(L_1(n)), \dots, \lambda(L_k(n)) ) 1_{L_i(n) \hbox{ prime}}$

or more generally

$\displaystyle \sum_n \nu(n) Q( \lambda(L_1(n)), \dots, \lambda(L_k(n)) ) f(L_i(n))$

should be asymptotically negligible; intuitively, the point here is that the prime factorisation of ${L_i(n)}$ should not influence the Liouville function of ${L_j(n)}$, even on the short arithmetic progressions that the divisor sum ${\nu}$ is built out of, and so any monomial ${\lambda(L_{i_1}(n)) \dots \lambda(L_{i_r}(n))}$ occurring in ${Q( \lambda(L_1(n)), \dots, \lambda(L_k(n)) )}$ should exhibit strong cancellation for any of the above sums. If one accepts this principle, then all the expressions (1), (2), (3) should be essentially unchanged when ${\nu(n)}$ is replaced by ${\nu(n) w(n)}$.

Suppose now for sake of contradiction that one could use sieve-theoretic methods to locate an ${n}$ in the support of some sieve weight ${\nu(n)}$ obeying ${P( L_1(n),\dots,L_k(n))}$. Then, by reweighting all sieve weights by the additional multiplicative factor of ${w(n)}$, the same arguments should also be able to locate ${n}$ in the support of ${\nu(n) w(n)}$ for which ${P( L_1(n),\dots,L_k(n))}$ holds. But ${w}$ is only supported on those ${n}$ whose Liouville sign pattern is forbidden, a contradiction. $\Box$

Claim 1 is sharp in the following sense: if the convex hull of the forbidden sign patterns of ${P}$ does not contain the origin, then by the Hahn-Banach theorem (in the hyperplane separation form), there exist real coefficients ${c_1,\dots,c_k}$ such that

$\displaystyle c_1 \epsilon_1 + \dots + c_k \epsilon_k < -c$

for all forbidden sign patterns ${(\epsilon_1,\dots,\epsilon_k)}$ and some ${c>0}$. On the other hand, from Liouville pseudorandomness one expects that

$\displaystyle \sum_n \nu(n) (c_1 \lambda(L_1(n)) + \dots + c_k \lambda(L_k(n)))$

is negligible (as compared against ${\sum_n \nu(n)}$) for any reasonable sieve weight ${\nu}$. We conclude that, for some ${n}$ in the support of ${\nu}$,

$\displaystyle c_1 \lambda(L_1(n)) + \dots + c_k \lambda(L_k(n)) > -c \ \ \ \ \ (4)$

and hence ${(\lambda(L_1(n)),\dots,\lambda(L_k(n)))}$ is not a forbidden sign pattern. This does not actually imply that ${P(L_1(n),\dots,L_k(n))}$ holds, but it does not prevent ${P(L_1(n),\dots,L_k(n))}$ from holding purely from parity considerations. Thus, we do not expect a parity obstruction of the type in Claim 1 to hold when the convex hull of forbidden sign patterns does not contain the origin.

Example 4 Let ${G}$ be a graph on ${k}$ vertices ${\{1,\dots,k\}}$, and let ${P(l_1,\dots,l_k)}$ be the property that one can find an edge ${\{i,j\}}$ of ${G}$ with ${l_i,l_j}$ both prime. We claim that this property is subject to the parity problem precisely when ${G}$ is two-colourable. Indeed, if ${G}$ is two-colourable, then we can colour ${\{1,\dots,k\}}$ into two colours (say, red and green) such that all edges in ${G}$ connect a red vertex to a green vertex. If we then consider the two sign patterns in which all the red vertices have one sign and the green vertices have the opposite sign, these are two forbidden sign patterns which contain the origin in the convex hull, and so the parity problem applies. Conversely, if ${G}$ is not two-colourable, then it contains an odd cycle. Any forbidden sign pattern then must contain more ${+1}$s on this odd cycle than ${-1}$s (since otherwise two of the ${-1}$s are adjacent on this cycle by the pigeonhole principle, and this is not forbidden), and so by convexity any tuple in the convex hull of the forbidden sign patterns has a positive sum on this odd cycle. Hence the origin is not in the convex hull, and the parity obstruction does not apply. (See also this previous post for a similar obstruction ultimately coming from two-colourability).

Example 5 An example of a parity-obstructed property (supplied by Zeb Brady) that does not come from two-colourability: we let ${P( l_{\{1,2\}}, l_{\{1,3\}}, l_{\{1,4\}}, l_{\{2,3\}}, l_{\{2,4\}}, l_{\{3,4\}} )}$ be the property that ${l_{A_1},\dots,l_{A_r}}$ are prime for some collection ${A_1,\dots,A_r}$ of pair sets that cover ${\{1,\dots,4\}}$. For instance, this property holds if ${l_{\{1,2\}}, l_{\{3,4\}}}$ are both prime, or if ${l_{\{1,2\}}, l_{\{1,3\}}, l_{\{1,4\}}}$ are all prime, but not if ${l_{\{1,2\}}, l_{\{1,3\}}, l_{\{2,3\}}}$ are the only primes. An example of a forbidden sign pattern is the pattern where ${\{1,2\}, \{2,3\}, \{1,3\}}$ are given the sign ${-1}$, and the other three pairs are given ${+1}$. Averaging over permutations of ${1,2,3,4}$ we see that zero lies in the convex hull, and so this example is blocked by parity. However, there is no sign pattern such that it and its negation are both forbidden, which is another formulation of two-colourability.

Of course, the absence of a parity obstruction does not automatically mean that the desired claim is true. For instance, given an admissible ${5}$-tuple ${h_1,\dots,h_5}$, parity obstructions do not prevent one from establishing the existence of infinitely many ${n}$ such that at least three of ${n+h_1,\dots,n+h_5}$ are prime; however, we are not yet able to actually establish this, even assuming strong sieve-theoretic hypotheses such as the generalised Elliott-Halberstam hypothesis. (However, the argument giving (4) does easily give the far weaker claim that there exist infinitely many ${n}$ such that at least three of ${n+h_1,\dots,n+h_5}$ have a Liouville function of ${-1}$.)

Remark 1 Another way to get past the parity problem in some cases is to take advantage of linear forms that are constant multiples of each other (which correlates the Liouville functions to each other). For instance, on GEH we can find two ${E_3}$ numbers (products of exactly three primes) that differ by exactly ${60}$; a direct sieve approach using the linear forms ${n,n+60}$ fails due to the parity obstruction, but instead one can first find ${n}$ such that two of ${n,n+4,n+10}$ are prime, and then among the pairs of linear forms ${(15n,15n+60)}$, ${(6n,6n+60)}$, ${(10n+40,10n+100)}$ one can find a pair of ${E_3}$ numbers that differ by exactly ${60}$. See this paper of Goldston, Graham, Pintz, and Yildirim for more examples of this type.
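The arithmetic in this remark is easy to check by machine. The sketch below (with hypothetical helper names) takes an ${n}$ for which at least two of ${n,n+4,n+10}$ are prime and returns the corresponding pair from the three listed; it says nothing about the existence of such ${n}$, which is the part that requires GEH.

```python
def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

def is_prime(n):
    return big_omega(n) == 1

def e3_pair_from(n):
    """Return a pair of E_3 numbers differing by 60 built from n, or None
    if fewer than two of n, n+4, n+10 are prime."""
    a, b, c = is_prime(n), is_prime(n + 4), is_prime(n + 10)
    if a and b:
        return (15 * n, 15 * n + 60)        # 3*5*n and 3*5*(n+4)
    if a and c:
        return (6 * n, 6 * n + 60)          # 2*3*n and 2*3*(n+10)
    if b and c:
        return (10 * n + 40, 10 * n + 100)  # 2*5*(n+4) and 2*5*(n+10)
    return None

for n in range(1, 40):
    pair = e3_pair_from(n)
    if pair is not None:
        print(n, pair, [big_omega(m) for m in pair])
```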

I thank John Friedlander and Sid Graham for helpful discussions and encouragement.

In the winter quarter (starting January 5) I will be teaching a graduate topics course entitled “An introduction to analytic prime number theory”. As the name suggests, this is a course covering many of the analytic number theory techniques used to study the distribution of the prime numbers ${{\mathcal P} = \{2,3,5,7,11,\dots\}}$. I will list the topics I intend to cover in this course below the fold. As with my previous courses, I will place lecture notes online on my blog in advance of the physical lectures.

The type of results about primes that one aspires to prove here is well captured by Landau’s classical list of problems:

1. Even Goldbach conjecture: every even number ${N}$ greater than two is expressible as the sum of two primes.
2. Twin prime conjecture: there are infinitely many pairs ${n,n+2}$ which are simultaneously prime.
3. Legendre’s conjecture: for every natural number ${N}$, there is a prime between ${N^2}$ and ${(N+1)^2}$.
4. There are infinitely many primes of the form ${n^2+1}$.

All four of Landau’s problems remain open, but we have convincing heuristic evidence that they are all true, and in each of the four cases we have some highly non-trivial partial results, some of which will be covered in this course. We also now have some understanding of the barriers we are facing to fully resolving each of these problems, such as the parity problem; this will also be discussed in the course.

One of the main reasons that the prime numbers ${{\mathcal P}}$ are so difficult to deal with rigorously is that they have very little usable algebraic or geometric structure that we know how to exploit; for instance, we do not have any useful prime generating functions. One of course can create non-useful functions of this form, such as the ordered parameterisation ${n \mapsto p_n}$ that maps each natural number ${n}$ to the ${n^{th}}$ prime ${p_n}$, or one could invoke Matiyasevich’s theorem to produce a polynomial of many variables whose only positive values are prime, but these sorts of functions have no usable structure to exploit (for instance, they give no insight into any of the Landau problems listed above; see also Remark 2 below). The various primality tests in the literature, while useful for practical applications (e.g. cryptography) involving primes, have also proven to be of little utility for these sorts of problems; again, see Remark 2. In fact, in order to make plausible heuristic predictions about the primes, it is best to take almost the opposite point of view to the structured viewpoint, using as a starting point the belief that the primes exhibit strong pseudorandomness properties that are largely incompatible with the presence of rigid algebraic or geometric structure. We will discuss such heuristics later in this course.

It may be in the future that some usable structure to the primes (or related objects) will eventually be located (this is for instance one of the motivations in developing a rigorous theory of the “field with one element”, although this theory is far from being fully realised at present). For now, though, analytic and combinatorial methods have proven to be the most effective way forward, as they can often be used even in the near-complete absence of structure.

In this course, we will not discuss combinatorial approaches (such as the deployment of tools from additive combinatorics) in depth, but instead focus on the analytic methods. The basic principles of this approach can be summarised as follows:

1. Rather than try to isolate individual primes ${p}$ in ${{\mathcal P}}$, one works with the set of primes ${{\mathcal P}}$ in aggregate, focusing in particular on asymptotic statistics of this set. For instance, rather than try to find a single pair ${n,n+2}$ of twin primes, one can focus instead on the count ${|\{ n \leq x: n,n+2 \in {\mathcal P} \}|}$ of twin primes up to some threshold ${x}$. Similarly, one can focus on counts such as ${|\{ n \leq N: n, N-n \in {\mathcal P} \}|}$, ${|\{ p \in {\mathcal P}: N^2 < p < (N+1)^2 \}|}$, or ${|\{ n \leq x: n^2 + 1 \in {\mathcal P} \}|}$, which are the natural counts associated to the other three Landau problems. In all four of Landau’s problems, the basic task is now to obtain non-trivial lower bounds on these counts.
2. If one wishes to proceed analytically rather than combinatorially, one should convert all these counts into sums, using the fundamental identity

$\displaystyle |A| = \sum_n 1_A(n),$

(or variants thereof) for the cardinality ${|A|}$ of subsets ${A}$ of the natural numbers ${{\bf N}}$, where ${1_A}$ is the indicator function of ${A}$ (and ${n}$ ranges over ${{\bf N}}$). Thus we are now interested in estimating (and particularly in lower bounding) sums such as

$\displaystyle \sum_{n \leq N} 1_{{\mathcal P}}(n) 1_{{\mathcal P}}(N-n),$

$\displaystyle \sum_{n \leq x} 1_{{\mathcal P}}(n) 1_{{\mathcal P}}(n+2),$

$\displaystyle \sum_{N^2 < n < (N+1)^2} 1_{{\mathcal P}}(n),$

or

$\displaystyle \sum_{n \leq x} 1_{{\mathcal P}}(n^2+1).$

3. Once one expresses number-theoretic problems in this fashion, we are naturally led to the more general question of how to accurately estimate (or, less ambitiously, to lower bound or upper bound) sums such as

$\displaystyle \sum_n f(n)$

or more generally bilinear or multilinear sums such as

$\displaystyle \sum_n \sum_m f(n,m)$

or

$\displaystyle \sum_{n_1,\dots,n_k} f(n_1,\dots,n_k)$

for various functions ${f}$ of arithmetic interest. (Importantly, one should also generalise to include integrals as well as sums, particularly contour integrals or integrals over the unit circle or real line, but we postpone discussion of these generalisations to later in the course.) Indeed, a huge portion of modern analytic number theory is devoted to precisely this sort of question. In many cases, we can predict an expected main term for such sums, and then the task is to control the error term between the true sum and its expected main term. It is often convenient to normalise the expected main term to be zero or negligible (e.g. by subtracting a suitable constant from ${f}$), so that one is now trying to show that a sum of signed real numbers (or perhaps complex numbers) is small. In other words, the question becomes one of rigorously establishing a significant amount of cancellation in one’s sums (also referred to as a gain or savings over a benchmark “trivial bound”). Or to phrase it negatively, the task is to rigorously prevent a conspiracy of non-cancellation, caused for instance by two factors in the summand ${f(n)}$ exhibiting an unexpectedly large correlation with each other.

4. It is often difficult to discern cancellation (or to prevent conspiracy) directly for a given sum (such as ${\sum_n f(n)}$) of interest. However, analytic number theory has developed a large number of techniques to relate one sum to another, and then the strategy is to keep transforming the sum into more and more analytically tractable expressions, until one arrives at a sum for which cancellation can be directly exhibited. (Note though that there is often a short-term tradeoff between analytic tractability and algebraic simplicity; in a typical analytic number theory argument, the sums will get expanded and decomposed into many quite messy-looking sub-sums, until at some point one applies some crude estimation to replace these messy sub-sums by tractable ones again.) There are many transformations available, ranging from such basic tools as the triangle inequality, pointwise domination, or the Cauchy-Schwarz inequality to key identities such as multiplicative number theory identities (such as the Vaughan identity and the Heath-Brown identity), Fourier-analytic identities (e.g. Fourier inversion, Poisson summation, or more advanced trace formulae), or complex analytic identities (e.g. the residue theorem, Perron’s formula, or Jensen’s formula). The sheer range of transformations available can be intimidating at first; there is no shortage of transformations and identities in this subject, and if one applies them randomly then one will typically just transform a difficult sum into an even more difficult and intractable expression. However, one can make progress if one is guided by the strategy of isolating and enhancing a desired cancellation (or conspiracy) to the point where it can be easily established (or dispelled), or alternatively to reach the point where no deep cancellation is needed for the application at hand (or equivalently, that no deep conspiracy can disrupt the application).
5. One particularly powerful technique (albeit one which, ironically, can be highly “ineffective” in a certain technical sense to be discussed later) is to use one potential conspiracy to defeat another, a technique I refer to as the “dueling conspiracies” method. This technique may be unable to prevent a single strong conspiracy, but it can sometimes be used to prevent two or more such conspiracies from occurring, which is particularly useful if conspiracies come in pairs (e.g. through complex conjugation symmetry, or a functional equation). A related (but more “effective”) strategy is to try to “disperse” a single conspiracy into several distinct conspiracies, which can then be used to defeat each other.
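The conversion of counts into sums of indicator functions described in items 1 and 2 above can be sketched numerically as follows. This is only an illustration for small parameters (the function names are hypothetical, and a sieve of Eratosthenes stands in for the indicator ${1_{\mathcal P}}$); it gives no information about the asymptotic lower bounds that the Landau problems actually require.

```python
from math import isqrt

def prime_indicator(limit):
    """List ind with ind[n] = 1 if n is prime and 0 otherwise, 0 <= n <= limit.
    Assumes limit >= 1."""
    ind = [1] * (limit + 1)
    ind[0] = ind[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if ind[p]:
            for m in range(p * p, limit + 1, p):
                ind[m] = 0
    return ind

def goldbach_sum(N):
    """sum_{n <= N} 1_P(n) 1_P(N-n)."""
    ind = prime_indicator(N)
    return sum(ind[n] * ind[N - n] for n in range(1, N + 1))

def twin_sum(x):
    """sum_{n <= x} 1_P(n) 1_P(n+2)."""
    ind = prime_indicator(x + 2)
    return sum(ind[n] * ind[n + 2] for n in range(1, x + 1))

def legendre_sum(N):
    """sum_{N^2 < n < (N+1)^2} 1_P(n)."""
    ind = prime_indicator((N + 1) ** 2)
    return sum(ind[n] for n in range(N * N + 1, (N + 1) ** 2))

def landau4_sum(x):
    """sum_{n <= x} 1_P(n^2+1)."""
    ind = prime_indicator(x * x + 1)
    return sum(ind[n * n + 1] for n in range(1, x + 1))

print(goldbach_sum(100), twin_sum(1000), legendre_sum(30), landau4_sum(100))
```

Note that `goldbach_sum` counts ordered representations, matching the sum as written.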

As stated before, the above strategy has not been able to establish any of the four Landau problems as stated. However, it can come close to such problems (and we now have some understanding as to why these problems remain out of reach of current methods). For instance, by using these techniques (and a lot of additional effort) one can obtain the following sample partial results in the Landau problems:

1. Chen’s theorem: every sufficiently large even number ${N}$ is expressible as the sum of a prime and an almost prime (the product of at most two primes). The proof proceeds by finding a nontrivial lower bound on ${\sum_{n \leq N} 1_{\mathcal P}(n) 1_{{\mathcal E}_2}(N-n)}$, where ${{\mathcal E}_2}$ is the set of almost primes.
2. Zhang’s theorem: There exist infinitely many pairs ${p_n, p_{n+1}}$ of consecutive primes with ${p_{n+1} - p_n \leq 7 \times 10^7}$. The proof proceeds by giving a positive lower bound on (a suitably weighted version of) the quantity ${\sum_{x \leq n \leq 2x} (\sum_{i=1}^k 1_{\mathcal P}(n+h_i) - 1)}$ for large ${x}$ and certain distinct integers ${h_1,\dots,h_k}$ between ${0}$ and ${7 \times 10^7}$. (The bound ${7 \times 10^7}$ has since been lowered to ${246}$.)
3. The Baker-Harman-Pintz theorem: for sufficiently large ${x}$, there is a prime between ${x}$ and ${x + x^{0.525}}$. Proven by finding a nontrivial lower bound on ${\sum_{x \leq n \leq x+x^{0.525}} 1_{\mathcal P}(n)}$.
4. The Friedlander-Iwaniec theorem: There are infinitely many primes of the form ${n^2+m^4}$. Proven by finding a nontrivial lower bound on ${\sum_{n,m: n^2+m^4 \leq x} 1_{{\mathcal P}}(n^2+m^4)}$.

We will discuss (simpler versions of) several of these results in this course.

Of course, for the above general strategy to have any chance of succeeding, one must at some point use some information about the set ${{\mathcal P}}$ of primes. As stated previously, usefully structured parametric descriptions of ${{\mathcal P}}$ do not appear to be available. However, we do have two other fundamental and useful ways to describe ${{\mathcal P}}$:

1. (Sieve theory description) The primes ${{\mathcal P}}$ consist of those numbers greater than one, that are not divisible by any smaller prime.
2. (Multiplicative number theory description) The primes ${{\mathcal P}}$ are the multiplicative generators of the natural numbers ${{\bf N}}$: every natural number is uniquely factorisable (up to permutation) into the product of primes (the fundamental theorem of arithmetic).

The sieve-theoretic description and its variants lead one to a good understanding of the almost primes, which turn out to be excellent tools for controlling the primes themselves, although there are known limitations as to how much information on the primes one can extract from sieve-theoretic methods alone, which we will discuss later in this course. The multiplicative number theory methods lead one (after some complex or Fourier analysis) to the Riemann zeta function (and other L-functions, particularly the Dirichlet L-functions), with the distribution of zeroes (and poles) of these functions playing a particularly decisive role in the multiplicative methods.

Many of our strongest results in analytic prime number theory are ultimately obtained by incorporating some combination of the above two fundamental descriptions of ${{\mathcal P}}$ (or variants thereof) into the general strategy described above. In contrast, more advanced descriptions of ${{\mathcal P}}$, such as those coming from the various primality tests available, have (until now, at least) been surprisingly ineffective in practice for attacking problems such as Landau’s problems. One reason for this is that such tests generally involve operations such as exponentiation ${a \mapsto a^n}$ or the factorial function ${n \mapsto n!}$, which grow too quickly to be amenable to the analytic techniques discussed above.

To give a simple illustration of these two basic approaches to the primes, let us first give two variants of the usual proof of Euclid’s theorem:

Theorem 1 (Euclid’s theorem) There are infinitely many primes.

Proof: (Multiplicative number theory proof) Suppose for contradiction that there were only finitely many primes ${p_1,\dots,p_n}$. Then, by the fundamental theorem of arithmetic, every natural number is expressible as the product of the primes ${p_1,\dots,p_n}$. But the natural number ${p_1 \dots p_n + 1}$ is larger than one, but not divisible by any of the primes ${p_1,\dots,p_n}$, a contradiction.

(Sieve-theoretic proof) Suppose for contradiction that there were only finitely many primes ${p_1,\dots,p_n}$. Then, by the Chinese remainder theorem, the set ${A}$ of natural numbers that are not divisible by any of ${p_1,\dots,p_n}$ has density ${\prod_{i=1}^n (1-\frac{1}{p_i})}$, that is to say

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} | A \cap \{1,\dots,N\} | = \prod_{i=1}^n (1-\frac{1}{p_i}).$

In particular, ${A}$ has positive density and thus contains an element larger than ${1}$. But the least such element is one further prime in addition to ${p_1,\dots,p_n}$, a contradiction. $\Box$
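The density computation in the sieve-theoretic proof can be checked exactly on a full period. In the sketch below (hypothetical function names, and an arbitrary finite list of primes standing in for the supposed complete list), the number of integers in ${\{1,\dots,N\}}$ coprime to ${p_1 \dots p_n}$ is compared with ${N \prod_i (1-\frac{1}{p_i})}$ when ${N}$ is a multiple of ${p_1 \dots p_n}$.

```python
from fractions import Fraction
from math import prod

def sifted(primes, N):
    """Elements of {1,...,N} not divisible by any of the given primes."""
    return [m for m in range(1, N + 1) if all(m % p for p in primes)]

def sieve_density(primes):
    """The product of (1 - 1/p) over the given primes, as an exact fraction."""
    return prod((Fraction(p - 1, p) for p in primes), start=Fraction(1))

primes = [2, 3, 5]
N = 4 * prod(primes)          # a multiple of the period 2*3*5 = 30
survivors = sifted(primes, N)
print(Fraction(len(survivors), N), sieve_density(primes))
print(min(m for m in survivors if m > 1))  # least survivor beyond 1
```

The least survivor larger than ${1}$ has no prime factor from the list, which is the step that produces the "one further prime" in the proof.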

Remark 1 One can also phrase the proof of Euclid’s theorem in a fashion that largely avoids the use of contradiction; see this previous blog post for more discussion.

Both proofs in fact extend to give a stronger result:

Theorem 2 (Euler’s theorem) The sum ${\sum_{p \in {\mathcal P}} \frac{1}{p}}$ is divergent.

Proof: (Multiplicative number theory proof) By the fundamental theorem of arithmetic, every natural number is expressible uniquely as the product ${p_1^{a_1} \dots p_n^{a_n}}$ of primes in increasing order. In particular, we have the identity

$\displaystyle \sum_{n=1}^\infty \frac{1}{n} = \prod_{p \in {\mathcal P}} ( 1 + \frac{1}{p} + \frac{1}{p^2} + \dots )$

(both sides make sense in ${[0,+\infty]}$ as everything is unsigned). Since the left-hand side is divergent, the right-hand side is as well. But

$\displaystyle ( 1 + \frac{1}{p} + \frac{1}{p^2} + \dots ) = \exp( \frac{1}{p} + O( \frac{1}{p^2} ) )$

and ${\sum_{p \in {\mathcal P}} \frac{1}{p^2}\leq \sum_{n=1}^\infty \frac{1}{n^2} < \infty}$, so ${\sum_{p \in {\mathcal P}} \frac{1}{p}}$ must be divergent.

(Sieve-theoretic proof) Suppose for contradiction that the sum ${\sum_{p \in {\mathcal P}} \frac{1}{p}}$ is convergent. For each natural number ${k}$, let ${A_k}$ be the set of natural numbers not divisible by the first ${k}$ primes ${p_1,\dots,p_k}$, and let ${A}$ be the set of numbers not divisible by any prime in ${{\mathcal P}}$. As in the previous proof, each ${A_k}$ has density ${\prod_{i=1}^k (1-\frac{1}{p_i})}$. Also, since ${\{1,\dots,N\}}$ contains at most ${\frac{N}{p}}$ multiples of ${p}$, we have from the union bound that

$\displaystyle | A \cap \{1,\dots,N \}| = |A_k \cap \{1,\dots,N\}| - O( N \sum_{i > k} \frac{1}{p_i} ).$

Since ${\sum_{i=1}^\infty \frac{1}{p_i}}$ is assumed to be convergent, we conclude that the density of ${A_k}$ converges to the density of ${A}$; thus ${A}$ has density ${\prod_{i=1}^\infty (1-\frac{1}{p_i})}$, which is non-zero by the hypothesis that ${\sum_{i=1}^\infty \frac{1}{p_i}}$ converges. On the other hand, since the primes are the only numbers greater than one not divisible by smaller primes, ${A}$ is just ${\{1\}}$, which has density zero, giving the desired contradiction. $\Box$
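The divergence in Theorem 2 is very slow, which a short numerical experiment makes visible. The sketch below tabulates ${\sum_{p \leq x} \frac{1}{p}}$ against ${\log\log x + M}$, where ${M \approx 0.2615}$ is the Meissel-Mertens constant; this comparison comes from Mertens' second theorem, which is not proved in this post and is used here only as an outside reference point. The function names are illustrative.

```python
import math

def primes_up_to(x):
    """Primes <= x via the sieve of Eratosthenes (assumes x >= 2)."""
    flags = [True] * (x + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            for m in range(p * p, x + 1, p):
                flags[m] = False
    return [n for n, is_p in enumerate(flags) if is_p]

def reciprocal_prime_sum(x):
    """Partial sum of 1/p over primes p <= x."""
    return sum(1.0 / p for p in primes_up_to(x))

MEISSEL_MERTENS = 0.2614972128  # outside input, not derived in the text

for x in (10**2, 10**3, 10**4, 10**5):
    s = reciprocal_prime_sum(x)
    print(x, round(s, 4), round(math.log(math.log(x)) + MEISSEL_MERTENS, 4))
```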

Remark 2 We have seen how easy it is to prove Euler’s theorem by analytic methods. In contrast, there does not seem to be any known proof of this theorem that proceeds by using any sort of prime-generating formula or a primality test, which is further evidence that such tools are not the most effective way to make progress on problems such as Landau’s problems. (But the weaker theorem of Euclid, Theorem 1, can sometimes be proven by such devices.)

The two proofs of Theorem 2 given above are essentially the same proof, as is hinted at by the geometric series identity

$\displaystyle 1 + \frac{1}{p} + \frac{1}{p^2} + \dots = (1 - \frac{1}{p})^{-1}.$

One can also see the Riemann zeta function begin to make an appearance in both proofs. Once one goes beyond Euler’s theorem, though, the sieve-theoretic and multiplicative methods begin to diverge significantly. On one hand, sieve theory can still handle to some extent sets such as twin primes, despite the lack of multiplicative structure (one simply has to sieve out two residue classes per prime, rather than one); on the other, multiplicative number theory can attain results such as the prime number theorem, which purely sieve-theoretic techniques have not been able to establish. The deepest results in analytic number theory will typically require a combination of both sieve-theoretic methods and multiplicative methods in conjunction with the many transforms discussed earlier (and, in many cases, additional inputs from other fields of mathematics such as arithmetic geometry, ergodic theory, or additive combinatorics).

The wave equation is usually expressed in the form

$\displaystyle \partial_{tt} u - \Delta u = 0$

where ${u \colon {\bf R} \times {\bf R}^d \rightarrow {\bf C}}$ is a function of both time ${t \in {\bf R}}$ and space ${x \in {\bf R}^d}$, with ${\Delta}$ being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain ${{\bf R}^d}$ with some other manifold and replacing the Laplacian ${\Delta}$ with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on ${{\bf R}^d}$. We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc. One then has a conserved energy

$\displaystyle \int_{{\bf R}^d} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

which we can rewrite using integration by parts and the ${L^2}$ inner product ${\langle, \rangle}$ on ${{\bf R}^d}$ as

$\displaystyle \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle \partial_t u(t), \partial_t u(t) \rangle.$

A key feature of the wave equation is finite speed of propagation: if, at time ${t=0}$ (say), the initial position ${u(0)}$ and initial velocity ${\partial_t u(0)}$ are both supported in a ball ${B(x_0,R) := \{ x \in {\bf R}^d: |x-x_0| \leq R \}}$, then at any later time ${t>0}$, the position ${u(t)}$ and velocity ${\partial_t u(t)}$ are supported in the larger ball ${B(x_0,R+t)}$. This can be seen for instance (formally, at least) by inspecting the exterior energy

$\displaystyle \int_{|x-x_0| > R+t} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time ${t=0}$.

The wave equation is second order in time, but one can turn it into a first order system by working with the pair ${(u(t),v(t))}$ rather than just the single field ${u(t)}$, where ${v(t) := \partial_t u(t)}$ is the velocity field. The system is then

$\displaystyle \partial_t u(t) = v(t)$

$\displaystyle \partial_t v(t) = \Delta u(t)$

and the conserved energy is now

$\displaystyle \frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (1)$

Finite speed of propagation then tells us that if ${u(0),v(0)}$ are both supported on ${B(x_0,R)}$, then ${u(t),v(t)}$ are supported on ${B(x_0,R+t)}$ for all ${t>0}$. One also has time reversal symmetry: if ${t \mapsto (u(t),v(t))}$ is a solution, then ${t \mapsto (u(-t), -v(-t))}$ is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times ${t<0}$ using this symmetry.

If one has an eigenfunction

$\displaystyle -\Delta \phi = \lambda^2 \phi$

of the Laplacian, then we have the explicit solutions

$\displaystyle u(t) = e^{\pm it \lambda} \phi$

$\displaystyle v(t) = \pm i \lambda e^{\pm it \lambda} \phi$

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.

When one has vanishing initial velocity ${v(0)=0}$, the solution ${u(t)}$ is given via functional calculus by

$\displaystyle u(t) = \cos(t \sqrt{-\Delta}) u(0)$

and the propagator ${\cos(t \sqrt{-\Delta})}$ can be expressed as the average of half-wave operators:

$\displaystyle \cos(t \sqrt{-\Delta}) = \frac{1}{2} ( e^{it\sqrt{-\Delta}} + e^{-it\sqrt{-\Delta}} ).$

One can view ${\cos(t \sqrt{-\Delta} )}$ as a minor of the full wave propagator

$\displaystyle U(t) := \exp \begin{pmatrix} 0 & t \\ t\Delta & 0 \end{pmatrix}$

$\displaystyle = \begin{pmatrix} \cos(t \sqrt{-\Delta}) & \frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}} \\ -\sin(t\sqrt{-\Delta}) \sqrt{-\Delta} & \cos(t \sqrt{-\Delta} ) \end{pmatrix}$

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

$\displaystyle \begin{pmatrix} u(t) \\ v(t) \end{pmatrix} = U(t) \begin{pmatrix} u(0) \\ v(0) \end{pmatrix}. \ \ \ \ \ (2)$
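One way to sanity-check the propagator is to restrict to a single eigenfunction, on which ${-\Delta}$ acts as multiplication by ${\lambda^2}$, so that everything reduces to ${2 \times 2}$ real matrices. The sketch below (hypothetical helper names, a truncated Taylor series in place of a library matrix exponential) compares the exponential of the generator with the closed form obtained by solving ${\partial_t u = v}$, ${\partial_t v = -\lambda^2 u}$ directly.

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(M, terms=40):
    """Truncated Taylor series for exp(M); adequate for small 2x2 inputs."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in mat_mul(term, M)]
        result = [[a + b for a, b in zip(r, s)] for r, s in zip(result, term)]
    return result

def propagator_series(t, lam):
    """exp of the generator [[0, t], [t*Delta, 0]] with Delta = -lam^2."""
    return mat_exp([[0.0, t], [-t * lam * lam, 0.0]])

def propagator_closed_form(t, lam):
    """Solution operator of u' = v, v' = -lam^2 u on a single mode."""
    c, s = math.cos(t * lam), math.sin(t * lam)
    return [[c, s / lam], [-lam * s, c]]

print(propagator_series(1.3, 2.0))
print(propagator_closed_form(1.3, 2.0))
```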

Viewing the contraction ${\cos(t\sqrt{-\Delta})}$ as a minor of a unitary operator is an instance of the “dilation trick”.

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable ${t}$ now lives on the integers ${{\bf Z}}$ rather than on ${{\bf R}}$, and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

$\displaystyle u(t+1) = P u(t) + v(t) \ \ \ \ \ (3)$

$\displaystyle v(t+1) = P v(t) - (1-P^2) u(t)$

where ${t}$ is now an integer, ${u(t), v(t)}$ take values in some Hilbert space (e.g. ${\ell^2}$ functions on a graph ${G}$), and ${P}$ is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

$\displaystyle u(t+\varepsilon) = P_\varepsilon u(t) + \varepsilon v(t)$

$\displaystyle v(t+\varepsilon) = P_\varepsilon v(t) - \frac{1}{\varepsilon} (1-P_\varepsilon^2) u(t)$

where ${\varepsilon>0}$ is a small parameter (representing the discretised time step), ${t}$ now takes values in the integer multiples ${\varepsilon {\bf Z}}$ of ${\varepsilon}$, and ${P_\varepsilon}$ is the wave propagator operator ${P_\varepsilon := \cos( \varepsilon \sqrt{-\Delta} )}$ or the heat propagator ${P_\varepsilon := \exp( \varepsilon^2 \Delta/2 )}$ (the two operators are different, but agree up to an error of fourth order in ${\varepsilon}$). One can then formally verify that the wave equation emerges from this rescaled system in the limit ${\varepsilon \rightarrow 0}$. (Thus, ${P}$ is not exactly the direct analogue of the Laplacian ${\Delta}$, but can be viewed as something like ${P_\varepsilon = 1 + \frac{\varepsilon^2}{2} \Delta + O( \varepsilon^4 )}$ in the case of small ${\varepsilon}$, or ${P = 1 + \frac{1}{2}\Delta + O(\Delta^2)}$ if we are not rescaling to the small ${\varepsilon}$ case. The operator ${P}$ is sometimes known as the diffusion operator.)
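
To sketch this formal verification in the case ${P_\varepsilon = \cos(\varepsilon \sqrt{-\Delta})}$ (ignoring all issues of domains and convergence), one can rearrange the first equation of the rescaled system as

$\displaystyle \frac{u(t+\varepsilon) - u(t)}{\varepsilon} = v(t) + \frac{P_\varepsilon - 1}{\varepsilon} u(t) = v(t) + O(\varepsilon),$

since ${P_\varepsilon - 1 = O(\varepsilon^2)}$. Similarly, using ${1 - P_\varepsilon^2 = \sin^2(\varepsilon \sqrt{-\Delta}) = -\varepsilon^2 \Delta + O(\varepsilon^4)}$, the second equation becomes

$\displaystyle \frac{v(t+\varepsilon) - v(t)}{\varepsilon} = \frac{P_\varepsilon - 1}{\varepsilon} v(t) - \frac{1-P_\varepsilon^2}{\varepsilon^2} u(t) = \Delta u(t) + O(\varepsilon).$

Sending ${\varepsilon \rightarrow 0}$, one formally recovers the first-order system ${\partial_t u = v}$, ${\partial_t v = \Delta u}$, which is the wave equation written in terms of the position ${u}$ and the velocity ${v}$.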

Assuming ${P}$ is self-adjoint, solutions to the system (3) formally conserve the energy

$\displaystyle \frac{1}{2} \langle (1-P^2) u(t), u(t) \rangle + \frac{1}{2} \langle v(t), v(t) \rangle. \ \ \ \ \ (4)$

This energy is positive semi-definite if ${P}$ is a contraction. We have the same time reversal symmetry as before: if ${t \mapsto (u(t),v(t))}$ solves the system (3), then so does ${t \mapsto (u(-t), -v(-t))}$. If one has an eigenfunction

$\displaystyle P \phi = \cos(\lambda) \phi$

to the operator ${P}$, then one has an explicit solution

$\displaystyle u(t) = e^{\pm it \lambda} \phi$

$\displaystyle v(t) = \pm i \sin(\lambda) e^{\pm it \lambda} \phi$

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.
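
As a numerical sanity check of the energy conservation and the explicit eigenfunction solutions, here is a minimal sketch in Python. It assumes a specific toy model not discussed above: the Hilbert space is the space of functions on the cycle ${{\bf Z}/N{\bf Z}}$, and ${P}$ is the averaging operator over the two neighbours, whose eigenfunctions are the characters ${x \mapsto e^{2\pi i k x/N}}$ with eigenvalue ${\cos(2\pi k/N)}$. The helper names (`apply_P`, `step`, `energy`, `evolve`) are illustrative only.

```python
import cmath
import math

def apply_P(f):
    """Averaging operator on the cycle Z/N (assumed toy model): a self-adjoint contraction."""
    n = len(f)
    return [(f[(x - 1) % n] + f[(x + 1) % n]) / 2 for x in range(n)]

def step(u, v):
    """One time step of system (3)."""
    Pu, Pv = apply_P(u), apply_P(v)
    PPu = apply_P(Pu)
    u_next = [pu + b for pu, b in zip(Pu, v)]                    # P u + v
    v_next = [pv - (a - ppu) for pv, a, ppu in zip(Pv, u, PPu)]  # P v - (1 - P^2) u
    return u_next, v_next

def inner(f, g):
    """Standard inner product on functions on the cycle."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g))

def energy(u, v):
    """The energy (4)."""
    PPu = apply_P(apply_P(u))
    one_minus_P2_u = [a - b for a, b in zip(u, PPu)]
    return 0.5 * (inner(one_minus_P2_u, u) + inner(v, v)).real

def evolve(u, v, steps):
    for _ in range(steps):
        u, v = step(u, v)
    return u, v

N = 12

# Energy conservation for arbitrary (here: deterministic but unstructured) data.
u0 = [math.sin(x * x) for x in range(N)]
v0 = [math.cos(3 * x) for x in range(N)]
u50, v50 = evolve(u0, v0, 50)
energy_drift = abs(energy(u50, v50) - energy(u0, v0))

# Explicit solution u(t) = e^{it lam} phi, v(t) = i sin(lam) e^{it lam} phi.
lam = 2 * math.pi / N
phi = [cmath.exp(1j * lam * x) for x in range(N)]
t = 7
ut, vt = evolve(phi, [1j * math.sin(lam) * p for p in phi], t)
eigen_error = max(abs(a - cmath.exp(1j * t * lam) * p) for a, p in zip(ut, phi))

print(energy_drift, eigen_error)
```

Both printed quantities should be at the level of floating-point rounding error. Note that the position ${u}$ itself can grow linearly in time on the eigenspaces where ${P^2 = 1}$, on which the energy form (4) degenerates; only the energy is conserved.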

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field ${v}$ by one unit. Suppose we know that ${P}$ has unit speed in the sense that whenever ${f}$ is supported in a ball ${B(x,R)}$, then ${Pf}$ is supported in the ball ${B(x,R+1)}$. Then an easy induction shows that if ${u(0), v(0)}$ are supported in ${B(x_0,R), B(x_0,R+1)}$ respectively, then ${u(t), v(t)}$ are supported in ${B(x_0,R+t), B(x_0, R+t+1)}$ respectively for all ${t \geq 0}$.
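
The induction can be checked by direct computation in a toy case. The following sketch assumes, purely for illustration, that ${P}$ is the averaging operator on a cycle ${{\bf Z}/N{\bf Z}}$ (which has unit speed with respect to the cycle distance), and uses exact rational arithmetic so that the supports are computed without rounding issues. All names and the initial data are hypothetical.

```python
from fractions import Fraction

def apply_P(f):
    """Averaging operator on the cycle Z/N (assumed toy model); it has unit speed."""
    n = len(f)
    return [(f[(x - 1) % n] + f[(x + 1) % n]) / 2 for x in range(n)]

def step(u, v):
    """One time step of system (3)."""
    Pu, Pv = apply_P(u), apply_P(v)
    PPu = apply_P(Pu)
    u_next = [pu + b for pu, b in zip(Pu, v)]                    # P u + v
    v_next = [pv - (a - ppu) for pv, a, ppu in zip(Pv, u, PPu)]  # P v - (1 - P^2) u
    return u_next, v_next

def cycle_distance(x, y, n):
    return min((x - y) % n, (y - x) % n)

def support_radius(f, x0):
    """Largest distance from x0 to a point where f is nonzero (-1 if f vanishes)."""
    n = len(f)
    return max((cycle_distance(x, x0, n) for x in range(n) if f[x] != 0), default=-1)

N, x0, R = 61, 30, 2

# Arbitrary initial data: u(0) supported in B(x0, R), v(0) supported in B(x0, R + 1).
u = [Fraction(1 + x, 7) if cycle_distance(x, x0, N) <= R else Fraction(0)
     for x in range(N)]
v = [Fraction(2 * x - 5, 3) if cycle_distance(x, x0, N) <= R + 1 else Fraction(0)
     for x in range(N)]

support_ok = []
for t in range(21):
    # Claim: u(t) lives in B(x0, R + t) and v(t) lives in B(x0, R + t + 1).
    support_ok.append(support_radius(u, x0) <= R + t
                      and support_radius(v, x0) <= R + t + 1)
    u, v = step(u, v)

print(all(support_ok))
```

The time range is kept short enough that the balls do not wrap around the cycle, so the cycle distance agrees with the distance on ${{\bf Z}}$ throughout.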

The fundamental solution ${U(t) = U^t}$ to the discretised wave equation (3), in the sense of (2), is given by the formula

$\displaystyle U(t) = U^t = \begin{pmatrix} P & 1 \\ P^2-1 & P \end{pmatrix}^t$

$\displaystyle = \begin{pmatrix} T_t(P) & U_{t-1}(P) \\ (P^2-1) U_{t-1}(P) & T_t(P) \end{pmatrix}$

where ${T_t}$ and ${U_t}$ are the Chebyshev polynomials of the first and second kind, thus

$\displaystyle T_t( \cos \theta ) = \cos(t\theta)$

and

$\displaystyle U_t( \cos \theta ) = \frac{\sin((t+1)\theta)}{\sin \theta}.$

In particular, ${P}$ is now a minor of ${U(1) = U}$, and can also be viewed as an average of ${U}$ with its inverse ${U^{-1}}$:

$\displaystyle \begin{pmatrix} P & 0 \\ 0 & P \end{pmatrix} = \frac{1}{2} (U + U^{-1}). \ \ \ \ \ (5)$

As before, ${U}$ is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers ${P^n}$ and ${U^n}$ are discrete analogues of the heat propagators ${e^{t\Delta/2}}$ and wave propagators ${U(t)}$ respectively.
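
Since both sides of the Chebyshev formula for ${U^t}$ and of the identity (5) are polynomial in ${P}$, it suffices (at least formally) to verify them when ${P}$ is replaced by a scalar ${p}$. Here is a small sketch that does this in exact rational arithmetic; the particular value ${p = 3/5}$ and the helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(A, t):
    """A^t for an integer t >= 0, by repeated multiplication."""
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(t):
        result = mat_mul(result, A)
    return result

def chebyshev_T(t, x):
    """T_t(x) via the recurrence T_{n+1} = 2x T_n - T_{n-1}."""
    a, b = Fraction(1), x              # T_0, T_1
    for _ in range(t):
        a, b = b, 2 * x * b - a
    return a

def chebyshev_U(t, x):
    """U_t(x) for t >= -1 via the same recurrence, with U_{-1} = 0 and U_0 = 1."""
    a, b = Fraction(0), Fraction(1)    # U_{-1}, U_0
    for _ in range(t + 1):
        a, b = b, 2 * x * b - a
    return a

def step_matrix(p):
    """The matrix U with the operator P replaced by the scalar p."""
    return [[p, Fraction(1)], [p * p - 1, p]]

def chebyshev_form(t, p):
    """The claimed closed form for U^t."""
    T, Um = chebyshev_T(t, p), chebyshev_U(t - 1, p)
    return [[T, Um], [(p * p - 1) * Um, T]]

p = Fraction(3, 5)
U = step_matrix(p)
U_inv = [[p, Fraction(-1)], [1 - p * p, p]]   # inverse of U, since det U = 1

powers_match = all(mat_pow(U, t) == chebyshev_form(t, p) for t in range(9))
average = [[(U[i][j] + U_inv[i][j]) / 2 for j in range(2)] for i in range(2)]

print(powers_match, average == [[p, 0], [0, p]])
```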

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality) Let ${G}$ be a (possibly infinite) regular graph, let ${n \geq 1}$, and let ${x, y}$ be vertices in ${G}$. Then the probability that the simple random walk at ${x}$ lands at ${y}$ at time ${n}$ is at most ${2 \exp( - d(x,y)^2 / 2n )}$, where ${d}$ is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers ${{\bf Z}}$. Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length ${n}$ concentrate on the ball of radius ${O(\sqrt{n})}$ or so centred at the origin of the random walk.
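
On the integers ${{\bf Z}}$ (a ${2}$-regular graph) the transition probabilities are explicit binomial probabilities, so the inequality can be compared against exact values. The sketch below does this; the function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math

def walk_probability(n, k):
    """Exact probability that the simple random walk on Z started at 0 is at k at time n."""
    if abs(k) > n or (n + k) % 2 != 0:
        return 0.0
    return math.comb(n, (n + k) // 2) / 2 ** n

def varopoulos_carne_bound(n, k):
    """Right-hand side of Theorem 1 with d(x, y) = |k|."""
    return 2 * math.exp(-k * k / (2 * n))

def worst_ratio(n):
    """Largest value of (probability / bound) over all endpoints k at time n."""
    return max(walk_probability(n, k) / varopoulos_carne_bound(n, k)
               for k in range(-n, n + 1))

for n in (10, 100, 1000):
    print(n, worst_ratio(n))
```

The ratio never exceeds ${1}$, as the theorem predicts. The Gaussian factor ${\exp(-k^2/2n)}$ is what one expects from the central limit behaviour of the binomial distribution when ${k}$ is small compared to ${n}$, so any slack in this example should come mostly from the constant in front rather than from the exponent.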

Proof: Let ${P \colon \ell^2(G) \rightarrow \ell^2(G)}$ be the diffusion operator (the transition operator of the simple random walk), thus

$\displaystyle Pf(x) = \frac{1}{D} \sum_{y \sim x} f(y)$

for any ${f \in \ell^2(G)}$, where ${D}$ is the degree of the regular graph and the sum is over the ${D}$ vertices ${y}$ that are adjacent to ${x}$. This is a self-adjoint contraction of unit speed, and the probability that the random walk at ${x}$ lands at ${y}$ at time ${n}$ is

$\displaystyle \langle P^n \delta_x, \delta_y \rangle$

where ${\delta_x, \delta_y}$ are the Dirac deltas at ${x,y}$. Using (5), we can rewrite this as

$\displaystyle \langle (\frac{1}{2} (U + U^{-1}))^n \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle$

where we are now using the energy form (4). We can write

$\displaystyle (\frac{1}{2} (U + U^{-1}))^n = {\bf E} U^{S_n}$

where ${S_n}$ is the simple random walk of length ${n}$ on the integers, that is to say ${S_n = \xi_1 + \dots + \xi_n}$ where ${\xi_1,\dots,\xi_n = \pm 1}$ are independent uniform Bernoulli signs. Thus we wish to show that

$\displaystyle {\bf E} \langle U^{S_n} \begin{pmatrix} 0 \\ \delta_x\end{pmatrix}, \begin{pmatrix} 0 \\ \delta_y\end{pmatrix} \rangle \leq 2 \exp(-d(x,y)^2 / 2n ).$

By finite speed of propagation, the inner product here vanishes if ${|S_n| < d(x,y)}$. For ${|S_n| \geq d(x,y)}$ we can use Cauchy-Schwarz and the unitary nature of ${U}$ to bound the inner product by ${1}$. Thus the left-hand side may be upper bounded by

$\displaystyle {\bf P}( |S_n| \geq d(x,y) )$

and the claim now follows from the Chernoff inequality. $\Box$
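
For completeness, here is a sketch of the form of the Chernoff inequality being used (this is the standard exponential moment argument, which is not spelled out in the proof above). For any ${\theta > 0}$, the independence of the signs ${\xi_i}$ and the elementary bound ${\cosh \theta \leq e^{\theta^2/2}}$ give

$\displaystyle {\bf E} e^{\theta S_n} = (\cosh \theta)^n \leq e^{n\theta^2/2},$

so by Markov's inequality

$\displaystyle {\bf P}( S_n \geq r ) \leq e^{-\theta r} e^{n \theta^2/2}$

for any ${r > 0}$. Choosing ${\theta := r/n}$ gives ${{\bf P}(S_n \geq r) \leq e^{-r^2/2n}}$, and by symmetry ${{\bf P}(|S_n| \geq r) \leq 2 e^{-r^2/2n}}$. Setting ${r = d(x,y)}$ gives the bound in Theorem 1 (the case ${d(x,y)=0}$ being trivial).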

This inequality has many applications, particularly with regard to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.

For the sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality) Let ${t > 0}$, and let ${f,g \in L^2({\bf R}^d)}$ be supported on compact sets ${F,G}$ respectively. Then

$\displaystyle |\langle e^{t\Delta/2} f, g \rangle| \leq \sqrt{\frac{2t}{\pi d(F,G)^2}} \exp( - d(F,G)^2 / 2t ) \|f\|_{L^2} \|g\|_{L^2}$

where ${d(F,G)}$ is the Euclidean distance between ${F}$ and ${G}$.

Proof: By Fourier inversion one has

$\displaystyle e^{-t\xi^2/2} = \frac{1}{\sqrt{2\pi t}} \int_{\bf R} e^{-s^2/2t} e^{is\xi}\ ds$

$\displaystyle = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \cos(s \xi )\ ds$

for any real ${\xi}$, and thus

$\displaystyle \langle e^{t\Delta/2} f, g\rangle = \sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-s^2/2t} \langle \cos(s \sqrt{-\Delta} ) f, g \rangle\ ds.$

By finite speed of propagation, the inner product ${\langle \cos(s \sqrt{-\Delta} ) f, g \rangle}$ vanishes when ${s < d(F,G)}$; otherwise, we can use Cauchy-Schwarz and the contractive nature of ${\cos(s \sqrt{-\Delta} )}$ to bound this inner product by ${\|f\|_{L^2} \|g\|_{L^2}}$. Thus

$\displaystyle |\langle e^{t\Delta/2} f, g\rangle| \leq \sqrt{\frac{2}{\pi t}} \|f\|_{L^2} \|g\|_{L^2} \int_{d(F,G)}^\infty e^{-s^2/2t}\ ds.$

Bounding ${e^{-s^2/2t}}$ by ${e^{-d(F,G)^2/2t} e^{-d(F,G) (s-d(F,G))/t}}$, we obtain the claim. $\Box$
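
In more detail, the last step can be carried out as follows. Write ${\rho := d(F,G)}$ (which we may assume to be positive) and ${s = \rho + r}$ with ${r \geq 0}$. Then ${s^2 = \rho^2 + 2\rho r + r^2 \geq \rho^2 + 2\rho r}$, and hence

$\displaystyle \int_\rho^\infty e^{-s^2/2t}\ ds \leq e^{-\rho^2/2t} \int_0^\infty e^{-\rho r/t}\ dr = \frac{t}{\rho} e^{-\rho^2/2t}.$

Multiplying by the prefactor ${\sqrt{\frac{2}{\pi t}}}$ gives ${\sqrt{\frac{2}{\pi t}} \cdot \frac{t}{\rho} = \sqrt{\frac{2t}{\pi \rho^2}}}$, which is the constant appearing in Theorem 2.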

Observe that the argument is quite general and can be applied for instance to Riemannian manifolds other than ${{\bf R}^d}$.