We will shortly turn to the complex-analytic approach to multiplicative number theory, which relies on the basic properties of complex analytic functions. In this supplement to the main notes, we quickly review the portions of complex analysis that we will be using in this course. We will not attempt a comprehensive review of this subject; for instance, we will completely neglect the conformal geometry or Riemann surface aspect of complex analysis, and we will also avoid using the various boundary convergence theorems for Taylor series or Dirichlet series (the latter type of result is traditionally utilised in multiplicative number theory, but I personally find them a little unintuitive to use, and will instead rely on a slightly different set of complex-analytic tools). We will also focus on the “local” structure of complex analytic functions, in particular adopting the philosophy that such functions behave locally like complex polynomials; the classical “global” theory of entire functions, while traditionally used in the theory of the Riemann zeta function, will be downplayed in these notes. On the other hand, we will play up the relationship between complex analysis and Fourier analysis, as we will incline to using the latter tool over the former in some of the subsequent material. (In the traditional approach to the subject, the Mellin transform is used in place of the Fourier transform, but we will not emphasise the role of the Mellin transform here.)

We begin by recalling the notion of a holomorphic function, which will later be shown to be essentially synonymous with that of a complex analytic function.

Definition 1 (Holomorphic function) Let ${\Omega}$ be an open subset of ${{\bf C}}$, and let ${f: \Omega \rightarrow {\bf C}}$ be a function. If ${z \in \Omega}$, we say that ${f}$ is complex differentiable at ${z}$ if the limit

$\displaystyle f'(z) := \lim_{h \rightarrow 0; h \in {\bf C} \backslash \{0\}} \frac{f(z+h)-f(z)}{h}$

exists, in which case we refer to ${f'(z)}$ as the (complex) derivative of ${f}$ at ${z}$. If ${f}$ is differentiable at every point ${z}$ of ${\Omega}$, and the derivative ${f': \Omega \rightarrow {\bf C}}$ is continuous, we say that ${f}$ is holomorphic on ${\Omega}$.

Exercise 2 Show that a function ${f: \Omega \rightarrow {\bf C}}$ is holomorphic if and only if the two-variable function ${(x,y) \mapsto f(x+iy)}$ is continuously differentiable on ${\{ (x,y) \in {\bf R}^2: x+iy \in \Omega\}}$ and obeys the Cauchy-Riemann equation

$\displaystyle \frac{\partial}{\partial x} f(x+iy) = \frac{1}{i} \frac{\partial}{\partial y} f(x+iy). \ \ \ \ \ (1)$
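
As a quick numerical sanity check of (1) (a sketch only; the helper name `cr_defect` and the step size are our own choices, not part of the notes), one can compare finite-difference approximations to the two sides of the Cauchy-Riemann equation. The defect should be negligible for a holomorphic function such as ${\exp}$, and visibly non-zero for a non-holomorphic function such as complex conjugation.

```python
import cmath

def cr_defect(f, z, h=1e-6):
    """Finite-difference estimate of d/dx f - (1/i) d/dy f at z.

    For a holomorphic f this should be close to zero, by equation (1).
    """
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # partial derivative in x
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial derivative in y
    return dfdx - dfdy / 1j

z = 0.3 + 0.7j  # an arbitrary test point
holo_defect = abs(cr_defect(cmath.exp, z))                 # exp is holomorphic
conj_defect = abs(cr_defect(lambda w: w.conjugate(), z))   # conjugation is not
print(holo_defect, conj_defect)
```

For conjugation, the two sides of (1) are ${1}$ and ${-1}$ respectively, so the defect has magnitude ${2}$.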

Basic examples of holomorphic functions include complex polynomials

$\displaystyle P(z) = a_n z^n + \dots + a_1 z + a_0$

as well as the complex exponential function

$\displaystyle \exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!}$

which are holomorphic on the entire complex plane ${{\bf C}}$ (i.e., they are entire functions). The sum or product of two holomorphic functions is again holomorphic; the quotient of two holomorphic functions is holomorphic so long as the denominator is non-zero. Finally, the composition of two holomorphic functions is holomorphic wherever the composition is defined.

Exercise 3

• (i) Establish Euler’s formula

$\displaystyle \exp(x+iy) = e^x (\cos y + i \sin y)$

for all ${x,y \in {\bf R}}$. (Hint: it is a bit tricky to do this starting from the trigonometric definitions of sine and cosine; I recommend either using the Taylor series formulations of these functions instead, or alternatively relying on the ordinary differential equations obeyed by sine and cosine.)

• (ii) Show that every non-zero complex number ${z}$ has a complex logarithm ${\log(z)}$ such that ${\exp(\log(z))=z}$, and that this logarithm is unique up to integer multiples of ${2\pi i}$.
• (iii) Show that there exists a unique principal branch ${\hbox{Log}(z)}$ of the complex logarithm in the region ${{\bf C} \backslash (-\infty,0]}$, defined by requiring ${\hbox{Log}(z)}$ to be a logarithm of ${z}$ with imaginary part between ${-\pi}$ and ${\pi}$. Show that this principal branch is holomorphic with derivative ${1/z}$.

In real analysis, we have the fundamental theorem of calculus, which asserts that

$\displaystyle \int_a^b F'(t)\ dt = F(b) - F(a)$

whenever ${[a,b]}$ is a real interval and ${F: [a,b] \rightarrow {\bf R}}$ is a continuously differentiable function. The complex analogue of this fact is that

$\displaystyle \int_\gamma F'(z)\ dz = F(\gamma(1)) - F(\gamma(0)) \ \ \ \ \ (2)$

whenever ${F: \Omega \rightarrow {\bf C}}$ is a holomorphic function, and ${\gamma: [0,1] \rightarrow \Omega}$ is a contour in ${\Omega}$, by which we mean a piecewise continuously differentiable function, and the contour integral ${\int_\gamma f(z)\ dz}$ for a continuous function ${f}$ is defined via change of variables as

$\displaystyle \int_\gamma f(z)\ dz := \int_0^1 f(\gamma(t)) \gamma'(t)\ dt.$

The complex fundamental theorem of calculus (2) follows easily from the real fundamental theorem and the chain rule.
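
To make the definition of the contour integral concrete, here is a minimal numerical sketch. The function `contour_integral`, the midpoint rule, and the quarter-circle test contour are illustrative assumptions of ours. It approximates ${\int_0^1 f(\gamma(t)) \gamma'(t)\ dt}$ and checks (2) for ${F(z) = z^3}$.

```python
import cmath

def contour_integral(f, gamma, dgamma, n=20000):
    """Midpoint-rule approximation of int_0^1 f(gamma(t)) * gamma'(t) dt."""
    h = 1.0 / n
    return h * sum(f(gamma((k + 0.5) * h)) * dgamma((k + 0.5) * h)
                   for k in range(n))

# Hypothetical test contour: the quarter-circle arc from 1 to i.
gamma = lambda t: cmath.exp(0.5j * cmath.pi * t)
dgamma = lambda t: 0.5j * cmath.pi * cmath.exp(0.5j * cmath.pi * t)

F = lambda z: z ** 3        # a holomorphic primitive
dF = lambda z: 3 * z ** 2   # its complex derivative

lhs = contour_integral(dF, gamma, dgamma)   # left-hand side of (2)
rhs = F(gamma(1)) - F(gamma(0))             # right-hand side of (2)
print(lhs, rhs)
```

Both sides should agree with ${i^3 - 1^3 = -1 - i}$ up to a small discretisation error.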

In real analysis, we have the rather trivial fact that the integral of a continuous function on a closed contour is always zero:

$\displaystyle \int_a^b f(t)\ dt + \int_b^a f(t)\ dt = 0.$

In complex analysis, the analogous fact is significantly more powerful, and is known as Cauchy’s theorem:

Theorem 4 (Cauchy’s theorem) Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function in a simply connected open set ${\Omega}$, and let ${\gamma: [0,1] \rightarrow \Omega}$ be a closed contour in ${\Omega}$ (thus ${\gamma(1)=\gamma(0)}$). Then ${\int_\gamma f(z)\ dz = 0}$.

Exercise 5 Use Stokes’ theorem to give a proof of Cauchy’s theorem.

A useful reformulation of Cauchy’s theorem is that of contour shifting: if ${f: \Omega \rightarrow {\bf C}}$ is a holomorphic function on an open set ${\Omega}$, and ${\gamma, \tilde \gamma}$ are two contours in ${\Omega}$ with ${\gamma(0)=\tilde \gamma(0)}$ and ${\gamma(1) = \tilde \gamma(1)}$, such that ${\gamma}$ can be continuously deformed into ${\tilde \gamma}$ within ${\Omega}$ while keeping the endpoints fixed, then ${\int_\gamma f(z)\ dz = \int_{\tilde \gamma} f(z)\ dz}$. A basic application of contour shifting is the Cauchy integral formula:

Theorem 6 (Cauchy integral formula) Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function in a simply connected open set ${\Omega}$, and let ${\gamma: [0,1] \rightarrow \Omega}$ be a closed contour which is simple (thus ${\gamma}$ does not traverse any point more than once, with the exception of the endpoint ${\gamma(0)=\gamma(1)}$ that is traversed twice), and which encloses a bounded region ${U}$ in the anticlockwise direction. Then for any ${z_0 \in U}$, one has

$\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz= 2\pi i f(z_0).$

Proof: Let ${\varepsilon > 0}$ be a sufficiently small quantity. By contour shifting, one can replace the contour ${\gamma}$ by the sum (concatenation) of three contours: a contour ${\rho}$ from ${\gamma(0)}$ to ${z_0+\varepsilon}$, a contour ${C_\varepsilon}$ traversing the circle ${\{z: |z-z_0|=\varepsilon\}}$ once anticlockwise, and the reversal ${-\rho}$ of the contour ${\rho}$ that goes from ${z_0+\varepsilon}$ to ${\gamma(0)}$. The contributions of the contours ${\rho, -\rho}$ cancel each other, thus

$\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz = \int_{C_\varepsilon} \frac{f(z)}{z-z_0}\ dz.$

By a change of variables, the right-hand side can be expanded as

$\displaystyle 2\pi i \int_0^1 f(z_0 + \varepsilon e^{2\pi i t})\ dt.$

Sending ${\varepsilon \rightarrow 0}$, we obtain the claim. $\Box$
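
The Cauchy integral formula is easy to test numerically in the special case of a circular contour. The sketch below is our own illustration: the name `cauchy_integral`, the trapezoidal rule, and the test point are assumptions. It approximates ${\frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z-z_0}\ dz}$ for ${f = \exp}$ on the unit circle and compares the result with ${f(z_0)}$ at an off-centre point ${z_0}$.

```python
import cmath

def cauchy_integral(f, z0, centre=0.0, radius=1.0, n=2000):
    """Approximate (1/(2 pi i)) * int f(z)/(z - z0) dz over an anticlockwise circle.

    Uses n equally spaced sample points (trapezoidal rule in the parameter t).
    """
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)   # point on the unit circle
        z = centre + radius * w                # point on the contour
        dz = 2j * cmath.pi * radius * w / n    # gamma'(t) dt
        total += f(z) / (z - z0) * dz
    return total / (2j * cmath.pi)

z0 = 0.3 - 0.2j                      # a point inside the unit disk
approx = cauchy_integral(cmath.exp, z0)
print(approx, cmath.exp(z0))
```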

The Cauchy integral formula has many consequences. Specialising to the case when ${\gamma}$ traverses a circle ${\{ z: |z-z_0|=r\}}$ around ${z_0}$, we conclude the mean value property

$\displaystyle f(z_0) = \int_0^1 f(z_0 + re^{2\pi i t})\ dt \ \ \ \ \ (3)$

whenever ${f}$ is holomorphic in a neighbourhood of the disk ${\{ z: |z-z_0| \leq r \}}$. In a similar spirit, we have the maximum principle for holomorphic functions:

Lemma 7 (Maximum principle) Let ${\Omega}$ be a simply connected open set, and let ${\gamma}$ be a simple closed contour in ${\Omega}$ enclosing a bounded region ${U}$ anticlockwise. Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function. If we have the bound ${|f(z)| \leq M}$ for all ${z}$ on the contour ${\gamma}$, then we also have the bound ${|f(z_0)| \leq M}$ for all ${z_0 \in U}$.

Proof: We use an argument of Landau. Fix ${z_0 \in U}$. From the Cauchy integral formula and the triangle inequality we have the bound

$\displaystyle |f(z_0)| \leq C_{z_0,\gamma} M$

for some constant ${C_{z_0,\gamma} > 0}$ depending on ${z_0}$ and ${\gamma}$. This ostensibly looks like a weaker bound than what we want, but we can miraculously make the constant ${C_{z_0,\gamma}}$ disappear by the “tensor power trick”. Namely, observe that if ${f}$ is a holomorphic function bounded in magnitude by ${M}$ on ${\gamma}$, and ${n}$ is a natural number, then ${f^n}$ is a holomorphic function bounded in magnitude by ${M^n}$ on ${\gamma}$. Applying the preceding argument with ${f, M}$ replaced by ${f^n, M^n}$ we conclude that

$\displaystyle |f(z_0)|^n \leq C_{z_0,\gamma} M^n$

and hence

$\displaystyle |f(z_0)| \leq C_{z_0,\gamma}^{1/n} M.$

Sending ${n \rightarrow \infty}$, we obtain the claim. $\Box$

Another basic application of the integral formula is

Corollary 8 Every holomorphic function ${f: \Omega \rightarrow {\bf C}}$ is complex analytic, thus it has a convergent Taylor series around every point ${z_0}$ in the domain. In particular, holomorphic functions are smooth, and the derivative of a holomorphic function is again holomorphic.

Conversely, it is easy to see that complex analytic functions are holomorphic. Thus, the terms “complex analytic” and “holomorphic” are synonymous, at least when working on open domains. (On a non-open set ${\Omega}$, saying that ${f}$ is analytic on ${\Omega}$ is equivalent to asserting that ${f}$ extends to a holomorphic function on an open neighbourhood of ${\Omega}$.) This is in marked contrast to real analysis, in which a function can be continuously differentiable, or even smooth, without being real analytic.

Proof: By translation, we may suppose that ${z_0=0}$. Let ${C_r}$ be a contour traversing the circle ${\{ z: |z|=r\}}$ that is contained in the domain ${\Omega}$; then by the Cauchy integral formula one has

$\displaystyle f(z) = \frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w-z}\ dw$

for all ${z}$ in the disk ${\{ z: |z| < r \}}$. As ${f}$ is continuously differentiable (and hence continuous) on ${C_r}$, it is bounded. From the geometric series formula

$\displaystyle \frac{1}{w-z} = \frac{1}{w} + \frac{1}{w^2} z + \frac{1}{w^3} z^2 + \dots$

and dominated convergence, we conclude that

$\displaystyle f(z) = \sum_{n=0}^\infty (\frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w^{n+1}}\ dw) z^n$

with the right-hand side an absolutely convergent series for ${|z| < r}$, and the claim follows. $\Box$

Exercise 9 Establish the generalised Cauchy integral formulae

$\displaystyle f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\gamma \frac{f(z)}{(z-z_0)^{k+1}}\ dz$

for any non-negative integer ${k}$, where ${f^{(k)}}$ is the ${k}$-fold complex derivative of ${f}$.
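
The generalised Cauchy integral formulae can also be checked numerically on a circle centred at ${z_0}$. In the sketch below (our own illustration; the function name and sample count are assumptions), we substitute ${z = z_0 + r e^{2\pi i t}}$, which turns the formula into ${f^{(k)}(z_0) = k! \int_0^1 f(z_0 + r e^{2\pi i t}) (r e^{2\pi i t})^{-k}\ dt}$, and approximate the integral by an average over equally spaced points.

```python
import cmath
from math import factorial

def derivative_via_cauchy(f, z0, k, radius=1.0, n=512):
    """Approximate f^(k)(z0) = k!/(2 pi i) * int f(z)/(z - z0)^(k+1) dz.

    The contour is the anticlockwise circle of the given radius about z0.
    With z = z0 + radius*w, dz = 2 pi i * radius * w dt, so one factor of
    (z - z0) cancels and the integrand becomes f(z) / (radius*w)^k.
    """
    total = 0j
    for j in range(n):
        w = cmath.exp(2j * cmath.pi * j / n)
        total += f(z0 + radius * w) / (radius * w) ** k
    return factorial(k) * total / n

fifth = derivative_via_cauchy(cmath.exp, 0.0, 5)   # every derivative of exp at 0 is 1
third = derivative_via_cauchy(cmath.sin, 0.5, 3)   # third derivative of sin is -cos
print(fifth, third)
```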

This in turn leads to a converse to Cauchy’s theorem, known as Morera’s theorem:

Corollary 10 (Morera’s theorem) Let ${f: \Omega \rightarrow {\bf C}}$ be a continuous function on an open set ${\Omega}$ with the property that ${\int_\gamma f(z)\ dz = 0}$ for all closed contours ${\gamma: [0,1] \rightarrow \Omega}$. Then ${f}$ is holomorphic.

Proof: We can of course assume ${\Omega}$ to be non-empty and connected (hence path-connected). Fix a point ${z_0 \in \Omega}$, and define a “primitive” ${F: \Omega \rightarrow {\bf C}}$ of ${f}$ by defining ${F(z_1) = \int_\gamma f(z)\ dz}$, with ${\gamma: [0,1] \rightarrow \Omega}$ being any contour from ${z_0}$ to ${z_1}$ (this is well defined by hypothesis). By mimicking the proof of the real fundamental theorem of calculus, we see that ${F}$ is holomorphic with ${F'=f}$, and the claim now follows from Corollary 8. $\Box$

An important consequence of Morera’s theorem for us is

Corollary 11 (Locally uniform limit of holomorphic functions is holomorphic) Let ${f_n: \Omega \rightarrow {\bf C}}$ be holomorphic functions on an open set ${\Omega}$ which converge locally uniformly to a function ${f: \Omega \rightarrow {\bf C}}$. Then ${f}$ is also holomorphic on ${\Omega}$.

Proof: By working locally we may assume that ${\Omega}$ is a ball, and in particular simply connected. By Cauchy’s theorem, ${\int_\gamma f_n(z)\ dz = 0}$ for all closed contours ${\gamma}$ in ${\Omega}$. By local uniform convergence, this implies that ${\int_\gamma f(z)\ dz = 0}$ for all such contours, and the claim then follows from Morera’s theorem. $\Box$

Now we study the zeroes of complex analytic functions. If a complex analytic function ${f}$ vanishes at a point ${z_0}$, but is not identically zero in a neighbourhood of that point, then by Taylor expansion we see that ${f}$ factors in a sufficiently small neighbourhood of ${z_0}$ as

$\displaystyle f(z) = (z-z_0)^n g(z) \ \ \ \ \ (4)$

for some natural number ${n}$ (which we call the order or multiplicity of the zero of ${f}$ at ${z_0}$) and some function ${g}$ that is complex analytic and non-zero near ${z_0}$; this generalises the factor theorem for polynomials. In particular, the zero ${z_0}$ is isolated if ${f}$ does not vanish identically near ${z_0}$. We conclude that if ${\Omega}$ is connected and ${f}$ vanishes on a neighbourhood of some point ${z_0}$ in ${\Omega}$, then it must vanish on all of ${\Omega}$ (since the maximal connected neighbourhood of ${z_0}$ in ${\Omega}$ on which ${f}$ vanishes cannot have any boundary point in ${\Omega}$). This implies unique continuation of analytic functions: if two complex analytic functions on a connected open set ${\Omega}$ agree on a non-empty open subset, then they agree everywhere. In particular, if a complex analytic function on a connected open set ${\Omega}$ does not vanish everywhere, then all of its zeroes are isolated, so in particular it has only finitely many zeroes on any given compact subset of ${\Omega}$.

Recall that a rational function is a function ${f}$ which is a quotient ${g/h}$ of two polynomials (at least outside of the set where ${h}$ vanishes). Analogously, let us define a meromorphic function on an open set ${\Omega}$ to be a function ${f: \Omega \backslash S \rightarrow {\bf C}}$ defined outside of a discrete subset ${S}$ of ${\Omega}$ (the singularities of ${f}$), which is locally the quotient ${g/h}$ of holomorphic functions, in the sense that for every ${z_0 \in \Omega}$, one has ${f=g/h}$ in a neighbourhood of ${z_0}$ excluding ${S}$, with ${g, h}$ holomorphic near ${z_0}$ and with ${h}$ non-vanishing outside of ${S}$. If ${z_0 \in S}$ and ${g}$ has a zero of equal or higher order than ${h}$ at ${z_0}$, then the singularity is removable and one can extend the meromorphic function holomorphically across ${z_0}$ (by the holomorphic factor theorem (4)); otherwise, the singularity is non-removable and is known as a pole, whose order is equal to the difference between the order of ${h}$ and the order of ${g}$ at ${z_0}$. (If one wished, one could extend meromorphic functions to the poles by embedding ${{\bf C}}$ in the Riemann sphere ${{\bf C} \cup \{\infty\}}$ and mapping each pole to ${\infty}$, but we will not do so here. One could also consider non-meromorphic functions with essential singularities at various points, but we will have no need to analyse such singularities in this course.) If the order of a pole or zero is one, we say that it is simple; if it is two, we say it is double; and so forth.

Exercise 12 Show that the space of meromorphic functions on a non-empty open set ${\Omega}$, quotiented by almost everywhere equivalence, forms a field.

By quotienting two Taylor series, we see that if a meromorphic function ${f}$ has a pole of order ${n}$ at some point ${z_0}$, then it has a Laurent expansion

$\displaystyle f(z) = \sum_{m=-n}^\infty a_m (z-z_0)^m,$

absolutely convergent in a neighbourhood of ${z_0}$ excluding ${z_0}$ itself, and with ${a_{-n}}$ non-zero. The Laurent coefficient ${a_{-1}}$ has a special significance, and is called the residue of the meromorphic function ${f}$ at ${z_0}$, which we will denote as ${\hbox{Res}(f;z_0)}$. The importance of this coefficient comes from the following significant generalisation of the Cauchy integral formula, known as the residue theorem:

Exercise 13 (Residue theorem) Let ${f}$ be a meromorphic function on a simply connected domain ${\Omega}$, and let ${\gamma}$ be a closed contour in ${\Omega}$ enclosing a bounded region ${U}$ anticlockwise, and avoiding all the singularities of ${f}$. Show that

$\displaystyle \int_\gamma f(z)\ dz = 2\pi i \sum_\rho \hbox{Res}(f;\rho)$

where ${\rho}$ is summed over all the poles of ${f}$ that lie in ${U}$.
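
As a small numerical illustration of the residue theorem (a sketch under our own choice of test function and helper name `circle_integral`), consider ${f(z) = \frac{1}{z(z-2)}}$, which has simple poles at ${0}$ and ${2}$ with residues ${-1/2}$ and ${+1/2}$ respectively. A circle of radius ${1}$ about the origin encloses only the first pole, while a circle of radius ${3}$ encloses both.

```python
import cmath

def circle_integral(f, radius, n=4000):
    """Approximate int f(z) dz over the anticlockwise circle |z| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        total += f(radius * w) * (2j * cmath.pi * radius * w / n)
    return total

f = lambda z: 1 / (z * (z - 2))    # residues: -1/2 at z = 0, +1/2 at z = 2

small = circle_integral(f, 1.0)    # expected 2 pi i * (-1/2) = -pi i
large = circle_integral(f, 3.0)    # expected 2 pi i * (-1/2 + 1/2) = 0
print(small, large)
```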

The residue theorem is particularly useful when applied to logarithmic derivatives ${f'/f}$ of meromorphic functions ${f}$, because the residue is of a specific form:

Exercise 14 Let ${f}$ be a meromorphic function on an open set ${\Omega}$ that does not vanish identically. Show that the only poles of ${f'/f}$ are simple poles (poles of order ${1}$), occurring at the poles and zeroes of ${f}$ (after all removable singularities have been removed). Furthermore, the residue of ${f'/f}$ at a pole ${z_0}$ is an integer, equal to the order of the zero of ${f}$ at ${z_0}$ if ${f}$ has a zero at ${z_0}$, or equal to the negative of the order of the pole of ${f}$ at ${z_0}$ if ${f}$ has a pole at ${z_0}$.
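
Combining this exercise with the residue theorem, ${\frac{1}{2\pi i} \int_\gamma \frac{f'}{f}(z)\ dz}$ should equal the number of zeroes of ${f}$ inside ${\gamma}$ minus the number of poles, counted with multiplicity. The following sketch tests this on the unit circle; the helper name, the test function, and the use of a finite difference in place of an exact derivative are all assumptions made purely for illustration.

```python
import cmath

def log_derivative_count(f, radius=1.0, n=4000, h=1e-6):
    """Approximate (1/(2 pi i)) * int f'(z)/f(z) dz over the circle |z| = radius.

    f' is estimated by a central difference, so f is only sampled, never
    differentiated symbolically.
    """
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        z = radius * w
        df = (f(z + h) - f(z - h)) / (2 * h)
        total += df / f(z) * (2j * cmath.pi * radius * w / n)
    return total / (2j * cmath.pi)

# Inside the unit disk: a triple zero at 0 and a simple pole at 0.5.
# The simple zero at 3 lies outside the contour and should not be counted.
f = lambda z: z ** 3 * (z - 3) / (z - 0.5)

count = log_derivative_count(f)
print(count)
```

Here the expected value is ${3 - 1 = 2}$, up to a small numerical error.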

Remark 15 The fact that residues of logarithmic derivatives of meromorphic functions are automatically integers is a remarkable feature of the complex analytic approach to multiplicative number theory, which is difficult (though not entirely impossible) to duplicate in other approaches to the subject. Here is a sample application of this integrality, which is challenging to reproduce by non-complex-analytic means: if ${f}$ is meromorphic near ${z_0}$, and one has the bound ${|\frac{f'}{f}(z_0+t)| \leq \frac{0.9}{t} + O(1)}$ as ${t \rightarrow 0^+}$, then ${\frac{f'}{f}}$ must in fact stay bounded near ${z_0}$, because the only integer of magnitude less than ${0.9}$ is zero.

Analytic number theory is only one of many different approaches to number theory. Another important branch of the subject is algebraic number theory, which studies algebraic structures (e.g. groups, rings, and fields) of number-theoretic interest. With this perspective, the classical field of rationals ${{\bf Q}}$, and the classical ring of integers ${{\bf Z}}$, are placed inside the much larger field ${\overline{{\bf Q}}}$ of algebraic numbers, and the much larger ring ${{\mathcal A}}$ of algebraic integers, respectively. Recall that an algebraic number is a root of a polynomial with integer coefficients, and an algebraic integer is a root of a monic polynomial with integer coefficients; thus for instance ${\sqrt{2}}$ is an algebraic integer (a root of ${x^2-2}$), while ${\sqrt{2}/2}$ is merely an algebraic number (a root of ${4x^2-2}$). For the purposes of this post, we will adopt the concrete (but somewhat artificial) perspective of viewing algebraic numbers and integers as lying inside the complex numbers ${{\bf C}}$, thus ${{\mathcal A} \subset \overline{{\bf Q}} \subset {\bf C}}$. (From a modern algebraic perspective, it is better to think of ${\overline{{\bf Q}}}$ as existing as an abstract field separate from ${{\bf C}}$, but which has a number of embeddings into ${{\bf C}}$ (as well as into other fields, such as the completed p-adics ${{\bf C}_p}$), no one of which should be considered favoured over any other; cf. this mathOverflow post. But for the rudimentary algebraic number theory in this post, we will not need to work at this level of abstraction.) In particular, we identify the algebraic integer ${\sqrt{-d}}$ with the complex number ${\sqrt{d} i}$ for any natural number ${d}$.

Exercise 1 Show that the field of algebraic numbers ${\overline{{\bf Q}}}$ is indeed a field, and that the ring of algebraic integers ${{\mathcal A}}$ is indeed a ring, and is in fact an integral domain. Also, show that ${{\bf Z} = {\mathcal A} \cap {\bf Q}}$, that is to say the ordinary integers are precisely the algebraic integers that are also rational. Because of this, we will sometimes refer to elements of ${{\bf Z}}$ as rational integers.

In practice, the field ${\overline{{\bf Q}}}$ is too big to conveniently work with directly, having infinite dimension (as a vector space) over ${{\bf Q}}$. Thus, algebraic number theory generally restricts attention to intermediate fields ${{\bf Q} \subset F \subset \overline{{\bf Q}}}$ between ${{\bf Q}}$ and ${\overline{{\bf Q}}}$, which are of finite dimension over ${{\bf Q}}$; that is to say, finite degree extensions of ${{\bf Q}}$. Such fields are known as algebraic number fields, or number fields for short. Apart from ${{\bf Q}}$ itself, the simplest examples of such number fields are the quadratic fields, which have dimension exactly two over ${{\bf Q}}$.

Exercise 2 Show that if ${\alpha}$ is a rational number that is not a perfect square, then the field ${{\bf Q}(\sqrt{\alpha})}$ generated by ${{\bf Q}}$ and either of the square roots of ${\alpha}$ is a quadratic field. Conversely, show that all quadratic fields arise in this fashion. (Hint: show that every element of a quadratic field is a root of a quadratic polynomial over the rationals.)

The ring of algebraic integers ${{\mathcal A}}$ is similarly too large to conveniently work with directly, so in algebraic number theory one usually works with the rings ${{\mathcal O}_F := {\mathcal A} \cap F}$ of algebraic integers inside a given number field ${F}$. One can (and does) study this situation in great generality, but for the purposes of this post we shall restrict attention to a simple but illustrative special case, namely the quadratic fields with a certain type of negative discriminant. (The positive discriminant case will be briefly discussed in Remark 42 below.)

Exercise 3 Let ${d}$ be a square-free natural number with ${d=1\ (4)}$ or ${d=2\ (4)}$. Show that the ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ of algebraic integers in ${{\bf Q}(\sqrt{-d})}$ is given by

$\displaystyle {\mathcal O} = {\bf Z}[\sqrt{-d}] = \{ a + b \sqrt{-d}: a,b \in {\bf Z} \}.$

If instead ${d}$ is square-free with ${d=3\ (4)}$, show that the ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ is instead given by

$\displaystyle {\mathcal O} = {\bf Z}[\frac{1+\sqrt{-d}}{2}] = \{ a + b \frac{1+\sqrt{-d}}{2}: a,b \in {\bf Z} \}.$

What happens if ${d}$ is not square-free, or negative?

Remark 4 In the case ${d=3\ (4)}$, it may naively appear more natural to work with the ring ${{\bf Z}[\sqrt{-d}]}$, which is an index two subring of ${{\mathcal O}}$. However, because this ring only captures some of the algebraic integers in ${{\bf Q}(\sqrt{-d})}$ rather than all of them, the algebraic properties of these rings are somewhat worse than those of ${{\mathcal O}}$ (in particular, they generally fail to be Dedekind domains) and so are not convenient to work with in algebraic number theory.

We refer to fields of the form ${{\bf Q}(\sqrt{-d})}$ for natural square-free numbers ${d}$ as quadratic fields of negative discriminant, and similarly refer to ${{\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ as a ring of quadratic integers of negative discriminant. Quadratic fields and quadratic integers of positive discriminant are just as important to analytic number theory as their negative discriminant counterparts, but we will restrict attention to the latter here for simplicity of discussion.

Thus, for instance, when ${d=1}$, the ring of integers in ${{\bf Q}(\sqrt{-1})}$ is the ring of Gaussian integers

$\displaystyle {\bf Z}[\sqrt{-1}] = \{ x + y \sqrt{-1}: x,y \in {\bf Z} \}$

and when ${d=3}$, the ring of integers in ${{\bf Q}(\sqrt{-3})}$ is the ring of Eisenstein integers

$\displaystyle {\bf Z}[\omega] := \{ x + y \omega: x,y \in {\bf Z} \}$

where ${\omega := e^{2\pi i /3}}$ is a cube root of unity.

As these examples illustrate, the additive structure of a ring ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ of quadratic integers is that of a two-dimensional lattice in ${{\bf C}}$, which is isomorphic as an additive group to ${{\bf Z}^2}$. Thus, from an additive viewpoint, one can view quadratic integers as “two-dimensional” analogues of rational integers. From a multiplicative viewpoint, however, the quadratic integers (and more generally, integers in a number field) behave very similarly to the rational integers (as opposed to being some sort of “higher-dimensional” version of such integers). Indeed, a large part of basic algebraic number theory is devoted to treating the multiplicative theory of integers in number fields in a unified fashion, that naturally generalises the classical multiplicative theory of the rational integers.

For instance, every rational integer ${n \in {\bf Z}}$ has an absolute value ${|n| \in {\bf N} \cup \{0\}}$, with the multiplicativity property ${|nm| = |n| |m|}$ for ${n,m \in {\bf Z}}$, and the positivity property ${|n| > 0}$ for all ${n \neq 0}$. Among other things, the absolute value detects units: ${|n| = 1}$ if and only if ${n}$ is a unit in ${{\bf Z}}$ (that is to say, it is multiplicatively invertible in ${{\bf Z}}$). Similarly, in any ring of quadratic integers ${{\mathcal O} = {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ with negative discriminant, we can assign a norm ${N(n) \in {\bf N} \cup \{0\}}$ to any quadratic integer ${n \in {\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ by the formula

$\displaystyle N(n) = n \overline{n}$

where ${\overline{n}}$ is the complex conjugate of ${n}$. (When working with other number fields than quadratic fields of negative discriminant, one instead defines ${N(n)}$ to be the product of all the Galois conjugates of ${n}$.) Thus for instance, when ${d=1,2\ (4)}$ one has

$\displaystyle N(x + y \sqrt{-d}) = x^2 + dy^2 \ \ \ \ \ (1)$

and when ${d=3\ (4)}$ one has

$\displaystyle N(x + y \frac{1+\sqrt{-d}}{2}) = x^2 + xy + \frac{d+1}{4} y^2. \ \ \ \ \ (2)$

Analogously to the rational integers, we have the multiplicativity property ${N(nm) = N(n) N(m)}$ for ${n,m \in {\mathcal O}}$ and the positivity property ${N(n) > 0}$ for ${n \neq 0}$, and the units in ${{\mathcal O}}$ are precisely the elements of norm one.

Exercise 5 Establish the three claims of the previous paragraph. Conclude that the units (invertible elements) of ${{\mathcal O}}$ consist of the four elements ${\pm 1, \pm i}$ if ${d=1}$, the six elements ${\pm 1, \pm \omega, \pm \omega^2}$ if ${d=3}$, and the two elements ${\pm 1}$ if ${d \neq 1,3}$.
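
The count of units in Exercise 5 can be sanity-checked by counting integer solutions to ${N = 1}$ using the norm forms (1) and (2). The code below is a sketch of ours (the function names and the search box are assumptions); it is a finite search, not a proof.

```python
def norm_form(x, y, d):
    """Norm of x + y*sqrt(-d) if d = 1, 2 mod 4, via formula (1),
    or of x + y*(1 + sqrt(-d))/2 if d = 3 mod 4, via formula (2)."""
    if d % 4 == 3:
        return x * x + x * y + (d + 1) // 4 * y * y
    return x * x + d * y * y

def count_units(d, box=2):
    """Count elements of norm one with coordinates in [-box, box].

    Since 4N = (2x + y)^2 + d*y^2 in the second case and N = x^2 + d*y^2 in
    the first, norm one forces |x|, |y| <= 1, so box=2 misses nothing.
    """
    return sum(1 for x in range(-box, box + 1) for y in range(-box, box + 1)
               if norm_form(x, y, d) == 1)

unit_counts = {d: count_units(d) for d in (1, 2, 3, 5, 7)}
print(unit_counts)
```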

For the rational integers, we of course have the fundamental theorem of arithmetic, which asserts that every non-zero rational integer can be uniquely factored (up to permutation and units) as the product of irreducible integers, that is to say non-zero, non-unit integers that cannot be factored into the product of integers of strictly smaller norm. As it turns out, the same claim is true for a few additional rings of quadratic integers, such as the Gaussian integers and Eisenstein integers, but fails in general; for instance, in the ring ${{\bf Z}[\sqrt{-5}]}$, we have the famous counterexample

$\displaystyle 6 = 2 \times 3 = (1+\sqrt{-5}) (1-\sqrt{-5})$

that decomposes ${6}$ non-uniquely into the product of irreducibles in ${{\bf Z}[\sqrt{-5}]}$. Nevertheless, it is an important fact that the fundamental theorem of arithmetic can be salvaged if one uses an “idealised” notion of a number in a ring of integers ${{\mathcal O}}$, now known in modern language as an ideal of that ring. For instance, in ${{\bf Z}[\sqrt{-5}]}$, the principal ideal ${(6)}$ turns out to uniquely factor into the product of (non-principal) ideals ${(2) + (1+\sqrt{-5}), (2) + (1-\sqrt{-5}), (3) + (1+\sqrt{-5}), (3) + (1-\sqrt{-5})}$; see Exercise 27. We will review the basic theory of ideals in number fields (focusing primarily on quadratic fields of negative discriminant) below the fold.
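
To see concretely why the factors in the counterexample above are irreducible, note that ${2, 3, 1 \pm \sqrt{-5}}$ have norms ${4, 9, 6, 6}$, so any factorisation of one of them into non-units would require an element of norm ${2}$ or ${3}$. The sketch below (with elements stored as integer pairs ${(a,b)}$ for ${a + b\sqrt{-5}}$, a representation we choose for illustration) confirms the product identity and that no such norms occur.

```python
def mul(u, v, d=5):
    """Multiply a + b*sqrt(-d) and c + e*sqrt(-d), stored as integer pairs."""
    (a, b), (c, e) = u, v
    return (a * c - d * b * e, a * e + b * c)

def norm(u, d=5):
    """Norm a^2 + d*b^2 of a + b*sqrt(-d)."""
    a, b = u
    return a * a + d * b * b

product = mul((1, 1), (1, -1))     # (1 + sqrt(-5)) (1 - sqrt(-5))
factor_norms = [norm(u) for u in [(2, 0), (3, 0), (1, 1), (1, -1)]]

# a^2 + 5*b^2 <= 3 forces b == 0 and |a| <= 1, so this small search is exhaustive.
small_norms = {norm((a, b)) for a in range(-2, 3) for b in range(-2, 3)}
print(product, factor_norms, 2 in small_norms, 3 in small_norms)
```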

The norm forms (1), (2) can be viewed as examples of positive definite quadratic forms ${Q: {\bf Z}^2 \rightarrow {\bf Z}}$ over the integers, by which we mean a polynomial of the form

$\displaystyle Q(x,y) = ax^2 + bxy + cy^2$

for some integer coefficients ${a,b,c}$. One can declare two quadratic forms ${Q, Q': {\bf Z}^2 \rightarrow {\bf Z}}$ to be equivalent if one can transform one to the other by an invertible linear transformation ${T: {\bf Z}^2 \rightarrow {\bf Z}^2}$, so that ${Q' = Q \circ T}$. For example, the quadratic forms ${(x,y) \mapsto x^2 + y^2}$ and ${(x',y') \mapsto 2 (x')^2 + 2 x' y' + (y')^2}$ are equivalent, as can be seen by using the invertible linear transformation ${(x,y) = (x',x'+y')}$. Such equivalences correspond to the different choices of basis available when expressing a ring such as ${{\mathcal O}}$ (or an ideal thereof) additively as a copy of ${{\bf Z}^2}$.

There is an important and classical invariant of a quadratic form ${(x,y) \mapsto ax^2 + bxy + c y^2}$, namely the discriminant ${\Delta := b^2 - 4ac}$, which will of course be familiar to most readers via the quadratic formula; among other things, that formula tells us that a quadratic form with ${a > 0}$ is positive definite precisely when its discriminant is negative. It is not difficult (particularly if one exploits the multiplicativity of the determinant of ${2 \times 2}$ matrices) to show that two equivalent quadratic forms have the same discriminant. Thus for instance any quadratic form equivalent to (1) has discriminant ${-4d}$, while any quadratic form equivalent to (2) has discriminant ${-d}$. Thus we see that each ring ${{\mathcal O}_{{\bf Q}(\sqrt{-d})}}$ of quadratic integers is associated with a certain negative discriminant ${D}$, defined to equal ${-4d}$ when ${d=1,2\ (4)}$ and ${-d}$ when ${d=3\ (4)}$.
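
The invariance of the discriminant is easy to experiment with. In the sketch below (the function names and the tuple encoding ${(a,b,c)}$ of a form are our own conventions), `transform` computes the coefficients of ${Q \circ T}$ for an integer matrix ${T}$, and we verify that the discriminant is multiplied by ${\det(T)^2}$, which equals ${1}$ for invertible ${T}$.

```python
def transform(form, T):
    """Coefficients of Q o T, where Q = (a, b, c) encodes a x^2 + b xy + c y^2
    and T = ((p, q), (r, s)) sends (x, y) to (p x + q y, r x + s y)."""
    a, b, c = form
    (p, q), (r, s) = T
    return (a * p * p + b * p * r + c * r * r,
            2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s,
            a * q * q + b * q * s + c * s * s)

def disc(form):
    """Discriminant b^2 - 4ac."""
    a, b, c = form
    return b * b - 4 * a * c

Q = (1, 0, 1)                           # x^2 + y^2
Q2 = transform(Q, ((1, 0), (1, 1)))     # substitute (x, y) = (x', x' + y')
Q3 = transform(Q, ((2, 0), (0, 1)))     # determinant 2, so not an equivalence
print(Q2, disc(Q), disc(Q2), disc(Q3))
```

The first substitution recovers the form ${2x^2 + 2xy + y^2}$ from the example above, with the same discriminant ${-4}$; the determinant-two substitution scales the discriminant by ${4}$.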

Exercise 6 (Geometric interpretation of discriminant) Let ${Q: {\bf Z}^2 \rightarrow {\bf Z}}$ be a quadratic form of negative discriminant ${D}$, and extend it to a real form ${Q: {\bf R}^2 \rightarrow {\bf R}}$ in the obvious fashion. Show that for any ${X>0}$, the set ${\{ (x,y) \in {\bf R}^2: Q(x,y) \leq X \}}$ is an ellipse of area ${2\pi X / \sqrt{|D|}}$.

It is natural to ask the converse question: if two quadratic forms have the same discriminant, are they necessarily equivalent? For certain choices of discriminant, this is the case:

Exercise 7 Show that any quadratic form ${ax^2+bxy+cy^2}$ of discriminant ${-4}$ is equivalent to the form ${x^2+y^2}$, and any quadratic form of discriminant ${-3}$ is equivalent to ${x^2+xy+y^2}$. (Hint: use elementary transformations to try to make ${|b|}$ as small as possible, to the point where one only has to check a finite number of cases; this argument is due to Legendre.) More generally, show that for any negative discriminant ${D}$, there are only finitely many quadratic forms of that discriminant up to equivalence (a result first established by Gauss).
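
The finiteness claim can be explored by machine. The sketch below relies on a standard fact from reduction theory that we take on trust here rather than prove: every positive definite form is equivalent, under invertible integer linear transformations, to exactly one form ${ax^2+bxy+cy^2}$ with ${0 \leq b \leq a \leq c}$. Such a form obeys ${3a^2 \leq 4ac - b^2 = |D|}$, so only finitely many candidates need to be examined. The function name `reduced_forms` is ours.

```python
def reduced_forms(D):
    """List all (a, b, c) with b*b - 4*a*c == D and 0 <= b <= a <= c."""
    assert D < 0 and D % 4 in (0, 1), "D must be a negative discriminant"
    forms = []
    a = 1
    while 3 * a * a <= -D:              # reducedness forces 3a^2 <= |D|
        for b in range(0, a + 1):
            four_ac = b * b - D         # must equal 4*a*c
            if four_ac % (4 * a) == 0 and four_ac // (4 * a) >= a:
                forms.append((a, b, four_ac // (4 * a)))
        a += 1
    return forms

for D in (-3, -4, -20):
    print(D, reduced_forms(D))
```

Under the stated assumption, the output gives a single class for each of the discriminants ${-3}$ and ${-4}$, and two classes for discriminant ${-20}$.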

Unfortunately, for most choices of discriminant, the answer to the converse question is negative; for instance, the quadratic forms ${x^2+5y^2}$ and ${2x^2+2xy+3y^2}$ both have discriminant ${-20}$, but are not equivalent (Exercise 38). This particular failure of equivalence turns out to be intimately related to the failure of unique factorisation in the ring ${{\bf Z}[\sqrt{-5}]}$.

It turns out that there is a fundamental connection between quadratic fields, equivalence classes of quadratic forms of a given discriminant, and real Dirichlet characters, thus connecting the material discussed above with the last section of the previous set of notes. Here is a typical instance of this connection:

Proposition 8 Let ${\chi_4: {\bf N} \rightarrow {\bf R}}$ be the real non-principal Dirichlet character of modulus ${4}$, or more explicitly ${\chi_4(n)}$ is equal to ${+1}$ when ${n = 1\ (4)}$, ${-1}$ when ${n = 3\ (4)}$, and ${0}$ when ${n = 0,2\ (4)}$.

• (i) For any natural number ${n}$, the number of Gaussian integers ${m \in {\bf Z}[\sqrt{-1}]}$ with norm ${N(m)=n}$ is equal to ${4(1 * \chi_4)(n)}$. Equivalently, the number of solutions to the equation ${n = x^2+y^2}$ with ${x,y \in{\bf Z}}$ is ${4(1*\chi_4)(n)}$. (Here, as in the previous post, the symbol ${*}$ denotes Dirichlet convolution.)
• (ii) For any natural number ${n}$, the number of Gaussian integers ${m \in {\bf Z}[\sqrt{-1}]}$ that divide ${n}$ (thus ${n = dm}$ for some ${d \in {\bf Z}[\sqrt{-1}]}$) is ${4(1*1*1*\mu\chi_4)(n)}$.

We will prove this proposition later in these notes. We observe that as a special case of part (i) of this proposition, we recover the Fermat two-square theorem: an odd prime ${p}$ is expressible as the sum of two squares if and only if ${p = 1\ (4)}$. This proposition should also be compared with the fact, used crucially in the previous post to prove Dirichlet’s theorem, that ${1*\chi(n)}$ is non-negative for any ${n}$, and at least one when ${n}$ is a square, for any quadratic character ${\chi}$.
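Before the proof, part (i) can be tested by brute force for small ${n}$. The following is only a sketch; the helper names `chi4`, `one_conv_chi4` and `r2` are illustrative and do not appear elsewhere in these notes.

```python
from math import isqrt

def chi4(n):
    """Non-principal Dirichlet character modulo 4."""
    return (0, 1, 0, -1)[n % 4]

def one_conv_chi4(n):
    """Dirichlet convolution (1 * chi4)(n): the sum of chi4(d) over divisors d of n."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

def r2(n):
    """Number of (x, y) in Z^2 with x^2 + y^2 = n, counted by brute force."""
    r = isqrt(n)
    return sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
               if x * x + y * y == n)

# Check Proposition 8(i) for small n.
assert all(r2(n) == 4 * one_conv_chi4(n) for n in range(1, 500))

# Fermat two-square theorem: odd primes below 100 that are sums of two squares.
odd_primes = [p for p in range(3, 100)
              if all(p % q for q in range(2, isqrt(p) + 1))]
print([p for p in odd_primes if r2(p) > 0])
```

The printed primes should be exactly those congruent to ${1}$ modulo ${4}$.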

As an illustration of the relevance of such connections to analytic number theory, let us now explicitly compute ${L(1,\chi_4)}$.

Corollary 9 ${L(1,\chi_4) = \frac{\pi}{4}}$.

This particular identity is also known as the Leibniz formula.

Proof: For a large number ${x}$, consider the quantity

$\displaystyle \sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1$

which counts the Gaussian integers of norm at most ${x}$. On the one hand, this is the same as the number of lattice points of ${{\bf Z}^2}$ in the disk ${\{ (a,b) \in {\bf R}^2: a^2+b^2 \leq x \}}$ of radius ${\sqrt{x}}$. Placing a unit square centred at each such lattice point, we obtain a region which differs from the disk by a region contained in an annulus of area ${O(\sqrt{x})}$. As the area of the disk is ${\pi x}$, we conclude the Gauss bound

$\displaystyle \sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1 = \pi x + O(\sqrt{x}).$

On the other hand, by Proposition 8(i) (and removing the ${n=0}$ contribution), we see that

$\displaystyle \sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1 = 1 + 4 \sum_{n \leq x} 1 * \chi_4(n).$

Now we use the Dirichlet hyperbola method to expand the right-hand side sum, first expressing

$\displaystyle \sum_{n \leq x} 1 * \chi_4(n) = \sum_{d \leq \sqrt{x}} \chi_4(d) \sum_{m \leq x/d} 1 + \sum_{m \leq \sqrt{x}} \sum_{d \leq x/m} \chi_4(d)$

$\displaystyle - (\sum_{d \leq \sqrt{x}} \chi_4(d)) (\sum_{m \leq \sqrt{x}} 1)$

and then using the bounds ${\sum_{d \leq y} \chi_4(d) = O(1)}$, ${\sum_{m \leq y} 1 = y + O(1)}$, ${\sum_{d \leq \sqrt{x}} \frac{\chi_4(d)}{d} = L(1,\chi_4) + O(\frac{1}{\sqrt{x}})}$ from the previous set of notes to conclude that

$\displaystyle \sum_{n \leq x} 1 * \chi_4(n) = x L(1,\chi_4) + O(\sqrt{x}).$

Comparing the two formulae for ${\sum_{n \in {\bf Z}[\sqrt{-1}]: N(n) \leq x} 1}$ and sending ${x \rightarrow \infty}$, we obtain the claim. $\Box$
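The steps of this proof can be checked numerically. The sketch below (with illustrative function names, and an arbitrary choice of ${x = 10^4}$) counts lattice points in the disk directly, evaluates ${\sum_{n \leq x} 1*\chi_4(n)}$ both directly and through the three terms of the hyperbola method, and compares the resulting ratios with ${\pi}$ and ${\pi/4}$.

```python
from math import isqrt, pi, sqrt

def chi4(n):
    """Non-principal Dirichlet character modulo 4."""
    return (0, 1, 0, -1)[n % 4]

def lattice_count(x):
    """Number of (a, b) in Z^2 with a^2 + b^2 <= x."""
    r = isqrt(x)
    return sum(2 * isqrt(x - a * a) + 1 for a in range(-r, r + 1))

def conv_sum_direct(x):
    """sum_{n <= x} (1 * chi4)(n), written as sum_d chi4(d) * floor(x / d)."""
    return sum(chi4(d) * (x // d) for d in range(1, x + 1))

def conv_sum_hyperbola(x):
    """The same sum, assembled from the three terms of the hyperbola method."""
    y = isqrt(x)
    partial = [0] * (x + 1)          # partial[t] = sum_{d <= t} chi4(d)
    for d in range(1, x + 1):
        partial[d] = partial[d - 1] + chi4(d)
    term1 = sum(chi4(d) * (x // d) for d in range(1, y + 1))
    term2 = sum(partial[x // m] for m in range(1, y + 1))
    term3 = partial[y] * y
    return term1 + term2 - term3

x = 10_000
count = lattice_count(x)
print(count, 1 + 4 * conv_sum_direct(x), 1 + 4 * conv_sum_hyperbola(x))
print(count / x, pi)                    # ratio should approach pi
print(conv_sum_direct(x) / x, pi / 4)   # ratio should approach L(1, chi4)
```

The three integers on the first printed line should agree exactly, while the two ratios should agree with ${\pi}$ and ${\pi/4}$ up to an error of size ${O(1/\sqrt{x})}$.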

Exercise 10 Give an alternate proof of Corollary 9 that relies on obtaining asymptotics for the Dirichlet series ${\sum_{n \in {\bf N}} \frac{1 * \chi_4(n)}{n^s}}$ as ${s \rightarrow 1^+}$, rather than using the Dirichlet hyperbola method.

Exercise 11 Give a direct proof of Corollary 9 that does not use Proposition 8, instead using Taylor expansion of the complex logarithm ${\log(1+z)}$. (One can also use Taylor expansions of some other functions related to the complex logarithm here, such as the arctangent function.)

More generally, one can relate ${L(1,\chi)}$ for a real Dirichlet character ${\chi}$ with the number of inequivalent quadratic forms of a certain discriminant, via the famous class number formula; we will give a special case of this formula below the fold.

The material here is only a very rudimentary introduction to algebraic number theory, and is not essential to the rest of the course. A slightly expanded version of the material here, from the perspective of analytic number theory, may be found in Sections 5 and 6 of Davenport’s book. A more in-depth treatment of algebraic number theory may be found in a number of texts, e.g. Fröhlich and Taylor.

In analytic number theory, an arithmetic function is simply a function ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers ${{\bf N} = \{1,2,3,\dots\}}$ to the real or complex numbers. (One occasionally also considers arithmetic functions taking values in more general rings than ${{\bf R}}$ or ${{\bf C}}$, as in this previous blog post, but we will restrict attention here to the classical situation of real or complex arithmetic functions.) Experience has shown that a particularly tractable and relevant class of arithmetic functions for analytic number theory are the multiplicative functions, which are arithmetic functions ${f: {\bf N} \rightarrow {\bf C}}$ with the additional property that

$\displaystyle f(nm) = f(n) f(m) \ \ \ \ \ (1)$

whenever ${n,m \in{\bf N}}$ are coprime. (One also considers arithmetic functions, such as the logarithm function ${L(n) := \log n}$ or the von Mangoldt function, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as “derived” versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function

$\displaystyle \tau(n) := \sum_{d|n} 1 \ \ \ \ \ (2)$

that counts the number of divisors of a natural number ${n}$. (The divisor function ${n \mapsto \tau(n)}$ is also denoted ${n \mapsto d(n)}$ in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.
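As a quick illustration of (1) and (2), the following sketch (a naive implementation, intended only for small inputs) checks the multiplicativity of ${\tau}$ on coprime pairs, and shows that the coprimality hypothesis cannot simply be dropped.

```python
from math import gcd

def tau(n):
    """Divisor function: the number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# Multiplicativity (1) holds for coprime pairs...
assert all(tau(n * m) == tau(n) * tau(m)
           for n in range(1, 60) for m in range(1, 60) if gcd(n, m) == 1)

# ...but can fail without coprimality: compare tau(4) with tau(2) * tau(2).
print(tau(4), tau(2) * tau(2))
```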

There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions ${f}$. In elementary multiplicative number theory, which is the focus of this set of notes, particular emphasis is placed on the following two statistics of a given arithmetic function ${f: {\bf N} \rightarrow {\bf C}}$:

1. The summatory functions

$\displaystyle \sum_{n \leq x} f(n)$

of an arithmetic function ${f}$, as well as the associated natural density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} f(n)$

(if it exists).

2. The logarithmic sums

$\displaystyle \sum_{n\leq x} \frac{f(n)}{n}$

of an arithmetic function ${f}$, as well as the associated logarithmic density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}$

(if it exists).

Here, we are normalising the arithmetic function ${f}$ being studied to be of roughly unit size up to logarithms, obeying bounds such as ${f(n)=O(1)}$, ${f(n) = O(\log^{O(1)} n)}$, or at worst

$\displaystyle f(n) = O(n^{o(1)}). \ \ \ \ \ (3)$

A classical case of interest is when ${f}$ is an indicator function ${f=1_A}$ of some set ${A}$ of natural numbers, in which case we also refer to the natural or logarithmic density of ${f}$ as the natural or logarithmic density of ${A}$ respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function ${1_{\mathcal P}}$ of the primes is often replaced with the von Mangoldt function ${\Lambda}$.

Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 7 below.
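To make the two statistics concrete, here is a small numerical sketch for the indicator function of the squarefree numbers. This example is our own choice rather than one taken from these notes; it uses the standard fact that the squarefree numbers have natural density ${6/\pi^2}$, and all function names are illustrative.

```python
from math import log, pi

def squarefree_indicator(N):
    """Return a list f with f[n] = 1 if n is squarefree and 0 otherwise (1 <= n <= N)."""
    f = [1] * (N + 1)
    f[0] = 0
    d = 2
    while d * d <= N:
        for m in range(d * d, N + 1, d * d):
            f[m] = 0
        d += 1
    return f

def natural_average(f, x):
    """(1/x) * sum_{n <= x} f(n)."""
    return sum(f[1:x + 1]) / x

def logarithmic_average(f, x):
    """(1/log x) * sum_{n <= x} f(n)/n."""
    return sum(f[n] / n for n in range(1, x + 1)) / log(x)

N = 10**5
f = squarefree_indicator(N)
for x in (10**3, 10**4, 10**5):
    print(x, natural_average(f, x), logarithmic_average(f, x), 6 / pi**2)
```

Note that the normalisation by ${\log x}$ means that a bounded error in the logarithmic sum only decays like ${1/\log x}$ in the logarithmic average, so one should not expect the third column to match the limiting value to many digits at these ranges.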

If an arithmetic function ${f}$ is multiplicative (or closely related to a multiplicative function), then there is an important further statistic on an arithmetic function ${f}$ beyond the summatory function and the logarithmic sum, namely the Dirichlet series

$\displaystyle {\mathcal D}f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s} \ \ \ \ \ (4)$

for various real or complex numbers ${s}$. Under the hypothesis (3), this series is absolutely convergent for real numbers ${s>1}$, or more generally for complex numbers ${s}$ with ${\hbox{Re}(s)>1}$. As we will see below the fold, when ${f}$ is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.

In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers ${s>1}$ (and focusing particularly on the asymptotic behaviour as ${s \rightarrow 1^+}$); in later notes we will focus instead on the important complex-analytic approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.

Remark 1 The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more “Fourier-analytic” approach, in which one studies convolution-type statistics such as

$\displaystyle \sum_n \frac{f(n)}{n} G( t - \log n ) \ \ \ \ \ (5)$

as ${t \rightarrow \infty}$ for various cutoff functions ${G: {\bf R} \rightarrow {\bf C}}$, such as smooth, compactly supported functions. See for instance this previous blog post for an instance of such an approach. Another related approach is the “pretentious” approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.

To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow ${s}$ to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get upper bounds on such sums, known as Rankin’s trick, which we will discuss later in these notes.

The basic strategy of elementary multiplicative number theory is to first gather useful estimates on the statistics of “smooth” or “non-oscillatory” functions, such as the constant function ${n \mapsto 1}$, the harmonic function ${n \mapsto \frac{1}{n}}$, or the logarithm function ${n \mapsto \log n}$; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function ${\Lambda}$ that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.
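As a small illustration of how Dirichlet convolution interacts with Dirichlet series, the sketch below builds the divisor function as ${\tau = 1 * 1}$ and compares a truncation of ${{\mathcal D}\tau(2)}$ with ${\zeta(2)^2}$. It relies on the standard identity ${{\mathcal D}(f*g) = {\mathcal D}f \cdot {\mathcal D}g}$ and the value ${\zeta(2) = \pi^2/6}$, neither of which has been established at this point in the notes; the function names and the truncation point ${N}$ are arbitrary choices.

```python
from math import pi

def dirichlet_convolve(f, g, N):
    """Return h = f * g on 1..N, where h(n) = sum_{d | n} f(d) g(n/d).

    f and g are lists indexed from 1 (index 0 is unused).
    """
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h

def dirichlet_series(f, s, N):
    """Truncation of D f(s) = sum_n f(n) / n^s to the range n <= N."""
    return sum(f[n] / n**s for n in range(1, N + 1))

N = 20_000
one = [0] + [1] * N
tau = dirichlet_convolve(one, one, N)     # tau = 1 * 1

zeta2 = pi**2 / 6                         # zeta(2), used here as a known value
print(tau[1:13])
print(dirichlet_series(tau, 2, N), zeta2**2)
```

The truncated series should fall slightly short of ${\zeta(2)^2}$, since the discarded tail is positive.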

This is only an introduction to elementary multiplicative number theory techniques. More in-depth treatments may be found in this text of Montgomery-Vaughan, or this text of Bateman-Diamond.

In the winter quarter (starting January 5) I will be teaching a graduate topics course entitled “An introduction to analytic prime number theory”. As the name suggests, this is a course covering many of the analytic number theory techniques used to study the distribution of the prime numbers ${{\mathcal P} = \{2,3,5,7,11,\dots\}}$. I will list the topics I intend to cover in this course below the fold. As with my previous courses, I will place lecture notes online on my blog in advance of the physical lectures.

The type of results about primes that one aspires to prove here is well captured by Landau’s classical list of problems:

1. Even Goldbach conjecture: every even number ${N}$ greater than two is expressible as the sum of two primes.
2. Twin prime conjecture: there are infinitely many pairs ${n,n+2}$ which are simultaneously prime.
3. Legendre’s conjecture: for every natural number ${N}$, there is a prime between ${N^2}$ and ${(N+1)^2}$.
4. There are infinitely many primes of the form ${n^2+1}$.

All four of Landau’s problems remain open, but we have convincing heuristic evidence that they are all true, and in each of the four cases we have some highly non-trivial partial results, some of which will be covered in this course. We also now have some understanding of the barriers we are facing to fully resolving each of these problems, such as the parity problem; this will also be discussed in the course.

One of the main reasons that the prime numbers ${{\mathcal P}}$ are so difficult to deal with rigorously is that they have very little usable algebraic or geometric structure that we know how to exploit; for instance, we do not have any useful prime generating functions. One of course can create non-useful functions of this form, such as the ordered parameterisation ${n \mapsto p_n}$ that maps each natural number ${n}$ to the ${n^{th}}$ prime ${p_n}$, or one could invoke Matiyasevich’s theorem to produce a polynomial of many variables whose only positive values are prime, but these sorts of functions have no usable structure to exploit (for instance, they give no insight into any of the Landau problems listed above; see also Remark 2 below). The various primality tests in the literature, while useful for practical applications (e.g. cryptography) involving primes, have also proven to be of little utility for these sorts of problems; again, see Remark 2. In fact, in order to make plausible heuristic predictions about the primes, it is best to take almost the opposite point of view to the structured viewpoint, using as a starting point the belief that the primes exhibit strong pseudorandomness properties that are largely incompatible with the presence of rigid algebraic or geometric structure. We will discuss such heuristics later in this course.

It may be in the future that some usable structure to the primes (or related objects) will eventually be located (this is for instance one of the motivations in developing a rigorous theory of the “field with one element”, although this theory is far from being fully realised at present). For now, though, analytic and combinatorial methods have proven to be the most effective way forward, as they can often be used even in the near-complete absence of structure.

In this course, we will not discuss combinatorial approaches (such as the deployment of tools from additive combinatorics) in depth, but instead focus on the analytic methods. The basic principles of this approach can be summarised as follows:

1. Rather than try to isolate individual primes ${p}$ in ${{\mathcal P}}$, one works with the set of primes ${{\mathcal P}}$ in aggregate, focusing in particular on asymptotic statistics of this set. For instance, rather than try to find a single pair ${n,n+2}$ of twin primes, one can focus instead on the count ${|\{ n \leq x: n,n+2 \in {\mathcal P} \}|}$ of twin primes up to some threshold ${x}$. Similarly, one can focus on counts such as ${|\{ n \leq N: n, N-n \in {\mathcal P} \}|}$, ${|\{ p \in {\mathcal P}: N^2 < p < (N+1)^2 \}|}$, or ${|\{ n \leq x: n^2 + 1 \in {\mathcal P} \}|}$, which are the natural counts associated to the other three Landau problems. In all four of Landau’s problems, the basic task is now to obtain non-trivial lower bounds on these counts.
2. If one wishes to proceed analytically rather than combinatorially, one should convert all these counts into sums, using the fundamental identity

$\displaystyle |A| = \sum_n 1_A(n),$

(or variants thereof) for the cardinality ${|A|}$ of subsets ${A}$ of the natural numbers ${{\bf N}}$, where ${1_A}$ is the indicator function of ${A}$ (and ${n}$ ranges over ${{\bf N}}$). Thus we are now interested in estimating (and particularly in lower bounding) sums such as

$\displaystyle \sum_{n \leq N} 1_{{\mathcal P}}(n) 1_{{\mathcal P}}(N-n),$

$\displaystyle \sum_{n \leq x} 1_{{\mathcal P}}(n) 1_{{\mathcal P}}(n+2),$

$\displaystyle \sum_{N^2 < n < (N+1)^2} 1_{{\mathcal P}}(n),$

or

$\displaystyle \sum_{n \leq x} 1_{{\mathcal P}}(n^2+1).$

3. Once one expresses number-theoretic problems in this fashion, we are naturally led to the more general question of how to accurately estimate (or, less ambitiously, to lower bound or upper bound) sums such as

$\displaystyle \sum_n f(n)$

or more generally bilinear or multilinear sums such as

$\displaystyle \sum_n \sum_m f(n,m)$

or

$\displaystyle \sum_{n_1,\dots,n_k} f(n_1,\dots,n_k)$

for various functions ${f}$ of arithmetic interest. (Importantly, one should also generalise to include integrals as well as sums, particularly contour integrals or integrals over the unit circle or real line, but we postpone discussion of these generalisations to later in the course.) Indeed, a huge portion of modern analytic number theory is devoted to precisely this sort of question. In many cases, we can predict an expected main term for such sums, and then the task is to control the error term between the true sum and its expected main term. It is often convenient to normalise the expected main term to be zero or negligible (e.g. by subtracting a suitable constant from ${f}$), so that one is now trying to show that a sum of signed real numbers (or perhaps complex numbers) is small. In other words, the question becomes one of rigorously establishing a significant amount of cancellation in one’s sums (also referred to as a gain or savings over a benchmark “trivial bound”). Or to phrase it negatively, the task is to rigorously prevent a conspiracy of non-cancellation, caused for instance by two factors in the summand ${f(n)}$ exhibiting an unexpectedly large correlation with each other.

4. It is often difficult to discern cancellation (or to prevent conspiracy) directly for a given sum (such as ${\sum_n f(n)}$) of interest. However, analytic number theory has developed a large number of techniques to relate one sum to another, and then the strategy is to keep transforming the sum into more and more analytically tractable expressions, until one arrives at a sum for which cancellation can be directly exhibited. (Note though that there is often a short-term tradeoff between analytic tractability and algebraic simplicity; in a typical analytic number theory argument, the sums will get expanded and decomposed into many quite messy-looking sub-sums, until at some point one applies some crude estimation to replace these messy sub-sums by tractable ones again.) There are many transformations available, ranging from such basic tools as the triangle inequality, pointwise domination, or the Cauchy-Schwarz inequality to key identities such as multiplicative number theory identities (such as the Vaughan identity and the Heath-Brown identity), Fourier-analytic identities (e.g. Fourier inversion, Poisson summation, or more advanced trace formulae), or complex analytic identities (e.g. the residue theorem, Perron’s formula, or Jensen’s formula). The sheer range of transformations available can be intimidating at first; there is no shortage of transformations and identities in this subject, and if one applies them randomly then one will typically just transform a difficult sum into an even more difficult and intractable expression. However, one can make progress if one is guided by the strategy of isolating and enhancing a desired cancellation (or conspiracy) to the point where it can be easily established (or dispelled), or alternatively to reach the point where no deep cancellation is needed for the application at hand (or equivalently, that no deep conspiracy can disrupt the application).
5. One particularly powerful technique (albeit one which, ironically, can be highly “ineffective” in a certain technical sense to be discussed later) is to use one potential conspiracy to defeat another, a technique I refer to as the “dueling conspiracies” method. This technique may be unable to prevent a single strong conspiracy, but it can sometimes be used to prevent two or more such conspiracies from occurring, which is particularly useful if conspiracies come in pairs (e.g. through complex conjugation symmetry, or a functional equation). A related (but more “effective”) strategy is to try to “disperse” a single conspiracy into several distinct conspiracies, which can then be used to defeat each other.
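For small thresholds, the four sums appearing in the second principle above can simply be evaluated by machine. The following sketch does this with a sieve of Eratosthenes; the function names and the thresholds are our own illustrative choices, and of course such computations give no information about the asymptotic regime.

```python
def prime_indicator(limit):
    """Sieve of Eratosthenes: list ind with ind[n] = 1 if n is prime, else 0."""
    ind = [1] * (limit + 1)
    ind[0:2] = [0, 0]
    p = 2
    while p * p <= limit:
        if ind[p]:
            for m in range(p * p, limit + 1, p):
                ind[m] = 0
        p += 1
    return ind

def goldbach_sum(N, ind):
    """sum_{n <= N} 1_P(n) 1_P(N - n); counts ordered representations N = p + q."""
    return sum(ind[n] * ind[N - n] for n in range(1, N))

def twin_sum(x, ind):
    """sum_{n <= x} 1_P(n) 1_P(n + 2)."""
    return sum(ind[n] * ind[n + 2] for n in range(1, x + 1))

def legendre_sum(N, ind):
    """sum_{N^2 < n < (N+1)^2} 1_P(n)."""
    return sum(ind[n] for n in range(N * N + 1, (N + 1) ** 2))

def n2_plus_1_sum(x, ind):
    """sum_{n <= x} 1_P(n^2 + 1)."""
    return sum(ind[n * n + 1] for n in range(1, x + 1))

x = 100
ind = prime_indicator(x * x + 1)   # large enough for all four sums below
print(goldbach_sum(100, ind), twin_sum(x, ind),
      legendre_sum(10, ind), n2_plus_1_sum(x, ind))
```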

As stated before, the above strategy has not been able to resolve any of the four Landau problems as stated. However, it can come close to doing so (and we now have some understanding as to why these problems remain out of reach of current methods). For instance, by using these techniques (and a lot of additional effort) one can obtain the following sample partial results in the Landau problems:

1. Chen’s theorem: every sufficiently large even number ${N}$ is expressible as the sum of a prime and an almost prime (the product of at most two primes). The proof proceeds by finding a nontrivial lower bound on ${\sum_{n \leq N} 1_{\mathcal P}(n) 1_{{\mathcal E}_2}(N-n)}$, where ${{\mathcal E}_2}$ is the set of almost primes.
2. Zhang’s theorem: There exist infinitely many pairs ${p_n, p_{n+1}}$ of consecutive primes with ${p_{n+1} - p_n \leq 7 \times 10^7}$. The proof proceeds by giving a positive lower bound on the quantity ${\sum_{x \leq n \leq 2x} (\sum_{i=1}^k 1_{\mathcal P}(n+h_i) - 1)}$ for large ${x}$ and certain distinct integers ${h_1,\dots,h_k}$ between ${0}$ and ${7 \times 10^7}$. (The bound ${7 \times 10^7}$ has since been lowered to ${246}$.)
3. The Baker-Harman-Pintz theorem: for sufficiently large ${x}$, there is a prime between ${x}$ and ${x + x^{0.525}}$. Proven by finding a nontrivial lower bound on ${\sum_{x \leq n \leq x+x^{0.525}} 1_{\mathcal P}(n)}$.
4. The Friedlander-Iwaniec theorem: There are infinitely many primes of the form ${n^2+m^4}$. Proven by finding a nontrivial lower bound on ${\sum_{n,m: n^2+m^4 \leq x} 1_{{\mathcal P}}(n^2+m^4)}$.

We will discuss (simpler versions of) several of these results in this course.

Of course, for the above general strategy to have any chance of succeeding, one must at some point use some information about the set ${{\mathcal P}}$ of primes. As stated previously, usefully structured parametric descriptions of ${{\mathcal P}}$ do not appear to be available. However, we do have two other fundamental and useful ways to describe ${{\mathcal P}}$:

1. (Sieve theory description) The primes ${{\mathcal P}}$ consist of those numbers greater than one, that are not divisible by any smaller prime.
2. (Multiplicative number theory description) The primes ${{\mathcal P}}$ are the multiplicative generators of the natural numbers ${{\bf N}}$: every natural number is uniquely factorisable (up to permutation) into the product of primes (the fundamental theorem of arithmetic).

The sieve-theoretic description and its variants lead one to a good understanding of the almost primes, which turn out to be excellent tools for controlling the primes themselves, although there are known limitations as to how much information on the primes one can extract from sieve-theoretic methods alone, which we will discuss later in this course. The multiplicative number theory methods lead one (after some complex or Fourier analysis) to the Riemann zeta function (and other L-functions, particularly the Dirichlet L-functions), with the distribution of zeroes (and poles) of these functions playing a particularly decisive role in the multiplicative methods.

Many of our strongest results in analytic prime number theory are ultimately obtained by incorporating some combination of the above two fundamental descriptions of ${{\mathcal P}}$ (or variants thereof) into the general strategy described above. In contrast, more advanced descriptions of ${{\mathcal P}}$, such as those coming from the various primality tests available, have (until now, at least) been surprisingly ineffective in practice for attacking problems such as Landau’s problems. One reason for this is that such tests generally involve operations such as exponentiation ${a \mapsto a^n}$ or the factorial function ${n \mapsto n!}$, which grow too quickly to be amenable to the analytic techniques discussed above.
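To illustrate the kind of operations involved, here is a sketch of two classical tests of this type: Wilson's theorem (which uses the factorial) and the Fermat test (which uses modular exponentiation). The notes above do not single out specific tests, so this choice of examples is our own.

```python
from math import factorial

def is_prime_wilson(n):
    """Wilson's theorem: n > 1 is prime if and only if (n-1)! = -1 (mod n)."""
    return n > 1 and factorial(n - 1) % n == n - 1

def passes_fermat(n, a=2):
    """Fermat test to base a: every prime n not dividing a has a^(n-1) = 1 (mod n)."""
    return pow(a, n - 1, n) == 1

print([n for n in range(2, 40) if is_prime_wilson(n)])
# 341 = 11 * 31 is composite, yet it passes the Fermat test to base 2.
print(passes_fermat(341), is_prime_wilson(341))
# The factorial grows very quickly: number of decimal digits of 100!.
print(len(str(factorial(100))))
```

The rapid growth of ${n!}$ visible in the last line is one way to see why such criteria, while exact, are awkward inputs for the estimation techniques described above.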

To give a simple illustration of these two basic approaches to the primes, let us first give two variants of the usual proof of Euclid’s theorem:

Theorem 1 (Euclid’s theorem) There are infinitely many primes.

Proof: (Multiplicative number theory proof) Suppose for contradiction that there were only finitely many primes ${p_1,\dots,p_n}$. Then, by the fundamental theorem of arithmetic, every natural number is expressible as the product of the primes ${p_1,\dots,p_n}$. But the natural number ${p_1 \dots p_n + 1}$ is larger than one, but not divisible by any of the primes ${p_1,\dots,p_n}$, a contradiction.

(Sieve-theoretic proof) Suppose for contradiction that there were only finitely many primes ${p_1,\dots,p_n}$. Then, by the Chinese remainder theorem, the set ${A}$ of natural numbers that are not divisible by any of the ${p_1,\dots,p_n}$ has density ${\prod_{i=1}^n (1-\frac{1}{p_i})}$, that is to say

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} | A \cap \{1,\dots,N\} | = \prod_{i=1}^n (1-\frac{1}{p_i}).$

In particular, ${A}$ has positive density and thus contains an element larger than ${1}$. But the least such element is one further prime in addition to ${p_1,\dots,p_n}$, a contradiction. $\Box$
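The density claim in the sieve-theoretic proof is easy to test for a short list of primes. The sketch below (with illustrative names, and the arbitrary choice of the primes ${2,3,5}$) compares the sifted count with ${N \prod_i (1-\frac{1}{p_i})}$; when ${N}$ is a multiple of ${p_1 \dots p_n}$, the Chinese remainder theorem makes the two agree exactly.

```python
from fractions import Fraction

def sifted_count(primes, N):
    """Number of n in {1, ..., N} not divisible by any of the given primes."""
    return sum(1 for n in range(1, N + 1) if all(n % p for p in primes))

def predicted_density(primes):
    """The product of (1 - 1/p) over the given primes, as an exact fraction."""
    density = Fraction(1)
    for p in primes:
        density *= 1 - Fraction(1, p)
    return density

primes = [2, 3, 5]
for N in (30, 300, 3000, 1000):
    print(N, sifted_count(primes, N), float(predicted_density(primes) * N))
```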

Remark 1 One can also phrase the proof of Euclid’s theorem in a fashion that largely avoids the use of contradiction; see this previous blog post for more discussion.

Both proofs in fact extend to give a stronger result:

Theorem 2 (Euler’s theorem) The sum ${\sum_{p \in {\mathcal P}} \frac{1}{p}}$ is divergent.

Proof: (Multiplicative number theory proof) By the fundamental theorem of arithmetic, every natural number is expressible uniquely as the product ${p_1^{a_1} \dots p_n^{a_n}}$ of primes in increasing order. In particular, we have the identity

$\displaystyle \sum_{n=1}^\infty \frac{1}{n} = \prod_{p \in {\mathcal P}} ( 1 + \frac{1}{p} + \frac{1}{p^2} + \dots )$

(both sides make sense in ${[0,+\infty]}$ as everything is unsigned). Since the left-hand side is divergent, the right-hand side is as well. But

$\displaystyle ( 1 + \frac{1}{p} + \frac{1}{p^2} + \dots ) = \exp( \frac{1}{p} + O( \frac{1}{p^2} ) )$

and ${\sum_{p \in {\mathcal P}} \frac{1}{p^2}\leq \sum_{n=1}^\infty \frac{1}{n^2} < \infty}$, so ${\sum_{p \in {\mathcal P}} \frac{1}{p}}$ must be divergent.

(Sieve-theoretic proof) Suppose for contradiction that the sum ${\sum_{p \in {\mathcal P}} \frac{1}{p}}$ is convergent. For each natural number ${k}$, let ${A_k}$ be the set of natural numbers not divisible by the first ${k}$ primes ${p_1,\dots,p_k}$, and let ${A}$ be the set of numbers not divisible by any prime in ${{\mathcal P}}$. As in the previous proof, each ${A_k}$ has density ${\prod_{i=1}^k (1-\frac{1}{p_i})}$. Also, since ${\{1,\dots,N\}}$ contains at most ${\frac{N}{p}}$ multiples of ${p}$, we have from the union bound that

$\displaystyle | A \cap \{1,\dots,N \}| = |A_k \cap \{1,\dots,N\}| - O( N \sum_{i > k} \frac{1}{p_i} ).$

Since ${\sum_{i=1}^\infty \frac{1}{p_i}}$ is assumed to be convergent, we conclude that the density of ${A_k}$ converges to the density of ${A}$; thus ${A}$ has density ${\prod_{i=1}^\infty (1-\frac{1}{p_i})}$, which is non-zero by the hypothesis that ${\sum_{i=1}^\infty \frac{1}{p_i}}$ converges. On the other hand, since the primes are the only numbers greater than one not divisible by smaller primes, ${A}$ is just ${\{1\}}$, which has density zero, giving the desired contradiction. $\Box$
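The quantities in the multiplicative proof can also be tabulated numerically. In the sketch below (function names are illustrative), the truncated Euler product ${\prod_{p \leq x} (1-\frac{1}{p})^{-1}}$ dominates the harmonic sum ${\sum_{n \leq x} \frac{1}{n}}$, because every ${n \leq x}$ is a product of primes at most ${x}$. The final column ${\log\log x}$ is included for comparison with the known growth rate of ${\sum_{p \leq x} \frac{1}{p}}$ (Mertens' theorem), which is a standard fact not proven in the argument above.

```python
from math import isqrt, log

def primes_up_to(x):
    """All primes <= x, via the sieve of Eratosthenes."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, isqrt(x) + 1):
        if is_prime[p]:
            for m in range(p * p, x + 1, p):
                is_prime[m] = False
    return [n for n in range(2, x + 1) if is_prime[n]]

def harmonic_sum(x):
    """sum_{n <= x} 1/n."""
    return sum(1 / n for n in range(1, x + 1))

def truncated_euler_product(x):
    """prod_{p <= x} (1 - 1/p)^{-1}."""
    prod = 1.0
    for p in primes_up_to(x):
        prod *= 1 / (1 - 1 / p)
    return prod

def prime_reciprocal_sum(x):
    """sum_{p <= x} 1/p."""
    return sum(1 / p for p in primes_up_to(x))

for x in (10**2, 10**3, 10**4, 10**5):
    print(x, harmonic_sum(x), truncated_euler_product(x),
          prime_reciprocal_sum(x), log(log(x)))
```

The very slow growth of the fourth column shows why the divergence of ${\sum_p \frac{1}{p}}$ is hard to detect by computation alone.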

Remark 2 We have seen how easy it is to prove Euler’s theorem by analytic methods. In contrast, there does not seem to be any known proof of this theorem that proceeds by using any sort of prime-generating formula or a primality test, which is further evidence that such tools are not the most effective way to make progress on problems such as Landau’s problems. (But the weaker theorem of Euclid, Theorem 1, can sometimes be proven by such devices.)

The two proofs of Theorem 2 given above are essentially the same proof, as is hinted at by the geometric series identity

$\displaystyle 1 + \frac{1}{p} + \frac{1}{p^2} + \dots = (1 - \frac{1}{p})^{-1}.$

One can also see the Riemann zeta function begin to make an appearance in both proofs. Once one goes beyond Euler’s theorem, though, the sieve-theoretic and multiplicative methods begin to diverge significantly. On one hand, sieve theory can still handle to some extent sets such as twin primes, despite the lack of multiplicative structure (one simply has to sieve out two residue classes per prime, rather than one); on the other, multiplicative number theory can attain results such as the prime number theorem which purely sieve-theoretic techniques have not been able to establish. The deepest results in analytic number theory will typically require a combination of both sieve-theoretic methods and multiplicative methods in conjunction with the many transforms discussed earlier (and, in many cases, additional inputs from other fields of mathematics such as arithmetic geometry, ergodic theory, or additive combinatorics).

Things are pretty quiet here during the holiday season, but one small thing I have been working on recently is a set of notes on special relativity that I will be working through in a few weeks with some bright high school students here at our local math circle. I have only two hours to spend with this group, and it is unlikely that we will reach the end of the notes (in which I derive the famous mass-energy equivalence relation ${E=mc^2}$, largely following Einstein’s original derivation as discussed in this previous blog post); instead we will probably spend a fair chunk of time on related topics which do not actually require special relativity per se, such as spacetime diagrams, the Doppler shift effect, and an analysis of my airport puzzle. This will be my first time doing something of this sort (in which I will be spending as much time interacting directly with the students as I would lecturing); I’m not sure exactly how it will play out, being a little outside of my usual comfort zone of undergraduate and graduate teaching, but am looking forward to finding out how it goes. (In particular, it may end up that the discussion deviates somewhat from my prepared notes.)

The material covered in my notes is certainly not new, but I ultimately decided that it was worth putting up here in case some readers here had any corrections or other feedback to contribute (which, as always, would be greatly appreciated).

[Dec 24 and then Jan 21: notes updated, in response to comments.]

I recently finished the first draft of the first of my books, entitled “Hilbert’s fifth problem and related topics”, based on the lecture notes for my graduate course of the same name. The PDF of this draft is available here. As always, comments and corrections are welcome.

This is an addendum to last quarter’s course notes on Hilbert’s fifth problem, which I am in the process of reviewing in order to transcribe them into a book (as was done similarly for several other sets of lecture notes on this blog). When reviewing the zeroth set of notes in particular, I found that I had made a claim (Proposition 11 from those notes) which asserted, roughly speaking, that any sufficiently large nilprogression was an approximate group, and promised to prove it later in the course when we had developed the ability to calculate efficiently in nilpotent groups. As it turned out, I managed to finish the course without the need to develop these calculations, and so the proposition remained unproven. In order to rectify this, I will use this post to lay out some of the basic algebra of nilpotent groups, and use it to prove the above proposition, which turns out to be a bit tricky. (In my paper with Breuillard and Green, we avoid the need for this proposition by restricting attention to a special type of nilprogression, which we call a nilprogression in ${C}$-normal form, for which the computations are simpler.)

There are several ways to think about nilpotent groups; for instance one can use the model example of the Heisenberg group

$\displaystyle H(R) :=\begin{pmatrix} 1 & R & R \\ 0 & 1 & R\\ 0 & 0 & 1 \end{pmatrix}$

over an arbitrary ring ${R}$ (which need not be commutative), or more generally any matrix group consisting of unipotent upper triangular matrices, and view a general nilpotent group as being an abstract generalisation of such concrete groups. (In the case of nilpotent Lie groups, at least, this is quite an accurate intuition, thanks to Engel’s theorem.) Or, one can adopt a Lie-theoretic viewpoint and try to think of nilpotent groups as somehow arising from nilpotent Lie algebras; this intuition is rigorous when working with nilpotent Lie groups (at least when the characteristic is large, in order to avoid issues coming from the denominators in the Baker-Campbell-Hausdorff formula), but also retains some conceptual value in the non-Lie setting. In particular, nilpotent groups (particularly finitely generated ones) can be viewed in some sense as “nilpotent Lie groups over ${{\bf Z}}$“, even though Lie theory does not quite work perfectly when the underlying scalars merely form an integral domain instead of a field.
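To make the Heisenberg model concrete, here is a minimal Python sketch over the ring ${R = {\bf Z}}$. The helper names (`heisenberg`, `commutator`, and so on) are hypothetical and chosen only for this illustration. It checks the basic step-two phenomenon: the commutator of two elements of ${H({\bf Z})}$ only has a non-trivial entry in the top right corner, and is therefore central.

```python
def mat_mul(A, B):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def heisenberg(x, y, z):
    """The element of H(Z) with x, y on the superdiagonal and z in the corner."""
    return [[1, x, z], [0, 1, y], [0, 0, 1]]

def heisenberg_inverse(M):
    """Inverse of an element of H(Z), read off from the group law."""
    x, y, z = M[0][1], M[1][2], M[0][2]
    return heisenberg(-x, -y, x * y - z)

def commutator(A, B):
    """The group commutator [A, B] = A B A^{-1} B^{-1}."""
    return mat_mul(mat_mul(A, B),
                   mat_mul(heisenberg_inverse(A), heisenberg_inverse(B)))

g = heisenberg(1, 0, 0)
h = heisenberg(0, 1, 0)
print(commutator(g, h))  # [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
```

In general the commutator of `heisenberg(x, y, z)` and `heisenberg(x2, y2, z2)` is `heisenberg(0, 0, x*y2 - x2*y)`, so that all double commutators are trivial.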

Another point of view, which arises naturally both in analysis and in algebraic geometry, is to view nilpotent groups as modeling “infinitesimal” perturbations of the identity, where the infinitesimals have a certain finite order. For instance, given a (not necessarily commutative) ring ${R}$ without identity (representing all the “small” elements of some larger ring or algebra), we can form the powers ${R^j}$ for ${j=1,2,\ldots}$, defined as the ring generated by ${j}$-fold products ${r_1 \ldots r_j}$ of elements ${r_1,\ldots,r_j}$ in ${R}$; this is an ideal of ${R}$ which represents the elements which are “${j^{th}}$ order” in some sense. If one then formally adjoins an identity ${1}$ onto the ring ${R}$, then for any ${s \geq 1}$, the multiplicative group ${G := 1+R \hbox{ mod } R^{s+1}}$ is a nilpotent group of step at most ${s}$. For instance, if ${R}$ is the ring of strictly upper triangular ${s \times s}$ matrices (over some base ring), then ${R^{s+1}}$ vanishes and ${G}$ becomes the group of unipotent upper triangular matrices over the same ring, thus recovering the previous matrix-based example. In analysis applications, ${R}$ might be a ring of operators which are somehow of “order” ${O(\epsilon)}$ or ${O(\hbar)}$ for some small parameter ${\epsilon}$ or ${\hbar}$, and one wishes to perform Taylor expansions up to order ${O(\epsilon^s)}$ or ${O(\hbar^s)}$, thus discarding (i.e. quotienting out) all errors in ${R^{s+1}}$.

From a dynamical or group-theoretic perspective, one can also view nilpotent groups as towers of central extensions of a trivial group. Finitely generated nilpotent groups can also be profitably viewed as a special type of polycyclic group; this is the perspective taken in this previous blog post. Last, but not least, one can view nilpotent groups from a combinatorial group theory perspective, as being words from some set of generators of various “degrees” subject to some commutation relations, with commutators of two low-degree generators being expressed in terms of higher degree objects, and all commutators of a sufficiently high degree vanishing. In particular, generators of a given degree can be moved freely around a word, as long as one is willing to generate commutator errors of higher degree.

With this last perspective, in particular, one can start computing in nilpotent groups by adopting the philosophy that the lowest order terms should be attended to first, without much initial concern for the higher order errors generated in the process of organising the lower order terms. Only after the lower order terms are in place should attention then turn to higher order terms, working successively up the hierarchy of degrees until all terms are dealt with. This turns out to be a relatively straightforward philosophy to implement in many cases (particularly if one is not interested in explicit expressions and constants, being content instead with qualitative expansions of controlled complexity), but the arguments are necessarily recursive in nature and as such can become a bit messy, and require a fair amount of notation to express precisely. So, unfortunately, the arguments here will be somewhat cumbersome and notation-heavy, even if the underlying methods of proof are relatively simple.

In this final set of course notes, we discuss how (a generalisation of) the expansion results obtained in the preceding notes can be used for some number-theoretic applications, and in particular to locate almost primes inside orbits of thin groups, following the work of Bourgain, Gamburd, and Sarnak. We will not attempt here to obtain the sharpest or most general results in this direction, but instead focus on the simplest instances of these results which are still illustrative of the ideas involved.

One of the basic general problems in analytic number theory is to locate tuples of primes of a certain form; for instance, the famous (and still unsolved) twin prime conjecture asserts that there are infinitely many pairs ${(n_1,n_2)}$ in the line ${\{ (n_1,n_2) \in {\bf Z}^2: n_2-n_1=2\}}$ in which both entries are prime. In a similar spirit, one of the Landau conjectures (also still unsolved) asserts that there are infinitely many primes in the set ${\{ n^2+1: n \in {\bf Z} \}}$. The Mersenne prime conjecture (also unsolved) asserts that there are infinitely many primes in the set ${\{ 2^n - 1: n \in {\bf Z} \}}$, and so forth.

More generally, given some explicit subset ${V}$ in ${{\bf R}^d}$ (or ${{\bf C}^d}$, if one wishes), such as an algebraic variety, one can ask the question of whether there are infinitely many integer lattice points ${(n_1,\ldots,n_d)}$ in ${V \cap {\bf Z}^d}$ in which all the coefficients ${n_1,\ldots,n_d}$ are simultaneously prime; let us refer to such points as prime points.

At this level of generality, this problem is impossibly difficult. Indeed, even the much simpler problem of deciding whether the set ${V \cap {\bf Z}^d}$ is non-empty (let alone containing prime points) when ${V}$ is a hypersurface ${\{ x \in {\bf R}^d: P(x) = 0 \}}$ cut out by a polynomial ${P}$ is essentially Hilbert’s tenth problem, which is known to be undecidable in general by Matiyasevich’s theorem. So one needs to restrict attention to a more special class of sets ${V}$, in which the question of finding integer points is not so difficult. One model case is to consider orbits ${V = \Gamma b}$, where ${b \in {\bf Z}^d}$ is a fixed lattice vector and ${\Gamma}$ is some discrete group that acts on ${{\bf Z}^d}$ somehow (e.g. ${\Gamma}$ might be embedded as a subgroup of the special linear group ${SL_d({\bf Z})}$, or of the affine group ${SL_d({\bf Z}) \ltimes {\bf Z}^d}$). In such a situation it is then quite easy to show that ${V = V \cap {\bf Z}^d}$ is large; for instance, ${V}$ will be infinite precisely when the stabiliser of ${b}$ in ${\Gamma}$ has infinite index in ${\Gamma}$.

Even in this simpler setting, the question of determining whether an orbit ${V = \Gamma b}$ contains infinitely many prime points is still extremely difficult; indeed the three examples given above of the twin prime conjecture, Landau conjecture, and Mersenne prime conjecture are essentially of this form (possibly after some slight modification of the underlying ring ${{\bf Z}}$, see this paper of Bourgain-Gamburd-Sarnak for details), and are all unsolved (and generally considered well out of reach of current technology). Indeed, the list of non-trivial orbits ${V = \Gamma b}$ which are known to contain infinitely many prime points is quite slim; Euclid’s theorem on the infinitude of primes handles the case ${V = {\bf Z}}$, Dirichlet’s theorem handles infinite arithmetic progressions ${V = a{\bf Z} + r}$, and a somewhat complicated result of Green, Tao, and Ziegler handles “non-degenerate” affine lattices in ${{\bf Z}^d}$ of rank two or more (such as the lattice of length ${d}$ arithmetic progressions), but there are few other positive results known that are not based on the above cases (though we will note the remarkable theorem of Friedlander and Iwaniec that there are infinitely many primes of the form ${a^2+b^4}$, and the related result of Heath-Brown that there are infinitely many primes of the form ${a^3+2b^3}$, as being in a kindred spirit to the above results, though they are not explicitly associated to an orbit of a reasonable action as far as I know).

On the other hand, much more is known if one is willing to replace the primes by the larger set of almost primes, that is, integers with a small number of prime factors (counting multiplicity). Specifically, for any ${r \geq 1}$, let us call an ${r}$-almost prime an integer which is the product of at most ${r}$ primes, possibly multiplied by the unit ${-1}$ as well. Many of the above sorts of questions which are open for primes are known for ${r}$-almost primes for ${r}$ sufficiently large. For instance, with regards to the twin prime conjecture, it is a result of Chen that there are infinitely many pairs ${p,p+2}$ where ${p}$ is a prime and ${p+2}$ is a ${2}$-almost prime; in a similar vein, it is a result of Iwaniec that there are infinitely many ${2}$-almost primes of the form ${n^2+1}$. On the other hand, it is still open for any fixed ${r}$ whether there are infinitely many Mersenne numbers ${2^n-1}$ which are ${r}$-almost primes. (For the superficially similar situation with the numbers ${2^n+1}$, it is in fact believed (but again unproven) that there are only finitely many ${r}$-almost primes for any fixed ${r}$ (the Fermat prime conjecture).)

The main tool that allows one to count almost primes in orbits is sieve theory. The reason for this lies in the simple observation that in order to ensure that an integer ${n}$ of magnitude at most ${x}$ is an ${r}$-almost prime, it suffices to guarantee that ${n}$ is not divisible by any prime less than ${x^{1/(r+1)}}$. Thus, to create ${r}$-almost primes, one can start with the integers up to some large threshold ${x}$ and remove (or “sieve out”) all the integers that are multiples of any prime ${p}$ less than ${x^{1/(r+1)}}$. The difficulty is then to ensure that a sufficiently non-trivial quantity of integers remain after this process, for the purposes of finding points in the given set ${V}$.
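The observation above is easy to test numerically. The following Python sketch (with hypothetical helper names, and the arbitrary choices ${x = 10^4}$, ${r = 2}$) removes from ${\{2,\ldots,x\}}$ every multiple of a prime ${p}$ with ${p^{r+1} < x}$, and then counts the prime factors of the survivors. Integer arithmetic is used for the threshold to avoid floating point roots. One should keep in mind the boundary case in which ${x}$ is itself the ${(r+1)^{th}}$ power of a prime, which this sketch does not treat specially.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

def small_primes(x, r):
    """Primes p with p^(r+1) < x, i.e. p < x^(1/(r+1)), computed exactly."""
    bound = 1
    while (bound + 1) ** (r + 1) < x:
        bound += 1
    return primes_up_to(bound)

def sieve_survivors(x, r):
    """Integers 2 <= n <= x not divisible by any prime below x^(1/(r+1))."""
    ps = small_primes(x, r)
    return [n for n in range(2, x + 1) if all(n % p for p in ps)]

def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count + (1 if n > 1 else 0)

survivors = sieve_survivors(10_000, 2)
print(max(big_omega(n) for n in survivors))  # 2
```

Here the sieving primes are ${2,3,\ldots,19}$, and any survivor has all prime factors at least ${23}$; since ${23^3 > 10^4}$, each survivor is a ${2}$-almost prime.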

The most basic sieve of this form is the sieve of Eratosthenes, which when combined with the inclusion-exclusion principle gives the Legendre sieve (or exact sieve), which gives an exact formula for quantities such as the number ${\pi(x,z)}$ of natural numbers less than or equal to ${x}$ that are not divisible by any prime less than or equal to a given threshold ${z}$. Unfortunately, when one tries to evaluate this formula, one encounters error terms which grow exponentially in ${z}$, rendering this sieve useful only for very small thresholds ${z}$ (of logarithmic size in ${x}$). To improve the sieve level up to a small power of ${x}$ such as ${x^{1/(r+1)}}$, one has to replace the exact sieve by upper bound sieves and lower bound sieves which only seek to obtain upper or lower bounds on quantities such as ${\pi(x,z)}$, but contain a polynomial number of terms rather than an exponential number. There are a variety of such sieves, with the two most common such sieves being combinatorial sieves (such as the beta sieve), based on various combinatorial truncations of the inclusion-exclusion formula, and the Selberg upper bound sieve, based on upper bounds that are the square of a divisor sum. (There is also the large sieve, which is somewhat different in nature and based on ${L^2}$ almost orthogonality considerations, rather than on any actual sieving, to obtain upper bounds.) We will primarily work with a specific sieve in these notes, namely the beta sieve, and we will not attempt to optimise all the parameters of this sieve (which ultimately means that the almost primality parameter ${r}$ in our results will be somewhat large). For a more detailed study of sieve theory, see the classic text of Halberstam and Richert, or the more recent texts of Iwaniec-Kowalski or of Friedlander-Iwaniec.
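For concreteness, the Legendre sieve can be written as ${\pi(x,z) = \sum_{d | P(z)} \mu(d) \lfloor x/d \rfloor}$, where ${P(z)}$ is the product of the primes up to ${z}$ and ${\mu}$ is the Möbius function. The following Python sketch (function names are hypothetical) evaluates this sum directly and compares it with a brute-force count. Note that the sum has ${2^k}$ terms when there are ${k}$ sieving primes, which is the exponential growth referred to above.

```python
from itertools import combinations
from math import prod

def legendre_count(x, primes):
    """Inclusion-exclusion count of 1 <= n <= x coprime to all given primes.

    Each subset of the primes gives a squarefree d, weighted by
    mu(d) = (-1)^(number of prime factors of d).
    """
    total = 0
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            d = prod(subset)  # the empty product is 1
            total += (-1) ** k * (x // d)
    return total

def brute_count(x, primes):
    """Direct count of the same quantity, for comparison."""
    return sum(1 for n in range(1, x + 1) if all(n % p for p in primes))

print(legendre_count(100, [2, 3, 5, 7]))  # 22
print(brute_count(100, [2, 3, 5, 7]))     # 22
```

The value ${22}$ accounts for the number ${1}$ together with the ${21}$ primes between ${11}$ and ${97}$.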

Very roughly speaking, the end result of sieve theory is that excepting some degenerate and “exponentially thin” settings (such as those associated with the Mersenne primes), all the orbits which are expected to have a large number of primes, can be proven to at least have a large number of ${r}$-almost primes for some finite ${r}$. (Unfortunately, there is a major obstruction, known as the parity problem, which prevents sieve theory from lowering ${r}$ all the way to ${1}$; see this blog post for more discussion.) One formulation of this principle was established by Bourgain, Gamburd, and Sarnak:

Theorem 1 (Bourgain-Gamburd-Sarnak) Let ${\Gamma}$ be a subgroup of ${SL_2({\bf Z})}$ which is not virtually solvable. Let ${f: {\bf Z}^4 \rightarrow {\bf Z}}$ be a polynomial with integer coefficients obeying the following primitivity condition: for any positive integer ${q}$, there exists ${A \in \Gamma \subset {\bf Z}^4}$ such that ${f(A)}$ is coprime to ${q}$. Then there exists an ${r \geq 1}$ such that there are infinitely many ${A \in \Gamma}$ with ${f(A)}$ non-zero and ${r}$-almost prime.

This is not the strongest version of the Bourgain-Gamburd-Sarnak theorem, but it captures the general flavour of their results. Note that the theorem immediately implies an analogous result for orbits ${\Gamma b \subset {\bf Z}^2}$, in which ${f}$ is now a polynomial from ${{\bf Z}^2}$ to ${{\bf Z}}$, and one uses ${f(Ab)}$ instead of ${f(A)}$. It is in fact conjectured that one can set ${r=1}$ here, but this is well beyond current technology. For the purpose of reaching ${r=1}$, it is very natural to impose the primitivity condition, but as long as one is content with larger values of ${r}$, it is possible to relax the primitivity condition somewhat; see the paper of Bourgain, Gamburd, and Sarnak for more discussion.

By specialising to the polynomial ${f: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \rightarrow abcd}$, we conclude as a corollary that as long as ${\Gamma}$ is primitive in the sense that it contains matrices with all coefficients coprime to ${q}$ for any given ${q}$, then ${\Gamma}$ contains infinitely many matrices whose elements are all ${r}$-almost primes for some ${r}$ depending only on ${\Gamma}$. For further applications of these sorts of results, for instance to Apollonian packings, see the paper of Bourgain, Gamburd, and Sarnak.

It turns out that to prove Theorem 1, the Cayley expansion results in ${SL_2(F_p)}$ from the previous set of notes are not quite enough; one needs a more general Cayley expansion result in ${SL_2({\bf Z}/q{\bf Z})}$ where ${q}$ is square-free but not necessarily prime. The proof of this expansion result uses the same basic methods as in the ${SL_2(F_p)}$ case, but is significantly more complicated technically, and we will only discuss it briefly here. As such, we do not give a complete proof of Theorem 1, but hopefully the portion of the argument presented here is still sufficient to give an impression of the ideas involved.

In the last three notes, we discussed the Bourgain-Gamburd expansion machine and two of its three ingredients, namely quasirandomness and product theorems, leaving only the non-concentration ingredient to discuss. We can summarise the results of the last three notes, in the case of fields of prime order, as the following theorem.

Theorem 1 (Non-concentration implies expansion in ${SL_d}$) Let ${p}$ be a prime, let ${d \geq 1}$, and let ${S}$ be a symmetric set of elements in ${G := SL_d(F_p)}$ of cardinality ${|S|=k}$ not containing the identity. Write ${\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}$, and suppose that one has the non-concentration property

$\displaystyle \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa} \ \ \ \ \ (1)$

for some ${\kappa>0}$ and some even integer ${n \leq \Lambda \log |G|}$. Then ${Cay(G,S)}$ is a two-sided ${\epsilon}$-expander for some ${\epsilon>0}$ depending only on ${k, d, \kappa,\Lambda}$.

Proof: From (1) we see that ${\mu^{(n)}}$ is not supported in any proper subgroup ${H}$ of ${G}$, which implies that ${S}$ generates ${G}$. The claim now follows from the Bourgain-Gamburd expansion machine (Theorem 2 of Notes 4), the product theorem (Theorem 1 of Notes 5), and quasirandomness (Exercise 8 of Notes 3). $\Box$

Remark 1 The same argument also works if we replace ${F_p}$ by the field ${F_{p^j}}$ of order ${p^j}$ for some bounded ${j}$. However, there is a difficulty in the regime when ${j}$ is unbounded, because the quasirandomness property becomes too weak for the Bourgain-Gamburd expansion machine to be directly applicable. On the other hand, the above type of theorem was generalised to the setting of cyclic groups ${{\bf Z}/q{\bf Z}}$ with ${q}$ square-free by Varju, to arbitrary ${q}$ by Bourgain and Varju, and to more general algebraic groups than ${SL_d}$ and square-free ${q}$ by Salehi Golsefidy and Varju. It may be that some modification of the proof techniques in these papers may also be able to handle the field case ${F_{p^j}}$ with unbounded ${j}$.

It thus remains to construct tools that can establish the non-concentration property (1). The situation is particularly simple in ${SL_2(F_p)}$, as we have a good understanding of the subgroups of that group. Indeed, from Theorem 14 from Notes 5, we obtain the following corollary to Theorem 1:

Corollary 2 (Non-concentration implies expansion in ${SL_2}$) Let ${p}$ be a prime, and let ${S}$ be a symmetric set of elements in ${G := SL_2(F_p)}$ of cardinality ${|S|=k}$ not containing the identity. Write ${\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}$, and suppose that one has the non-concentration property

$\displaystyle \sup_{B}\mu^{(n)}(B) < |G|^{-\kappa} \ \ \ \ \ (2)$

for some ${\kappa>0}$ and some even integer ${n \leq \Lambda \log |G|}$, where ${B}$ ranges over all Borel subgroups of ${SL_2(\overline{F_p})}$. Then, if ${|G|}$ is sufficiently large depending on ${k,\kappa,\Lambda}$, ${Cay(G,S)}$ is a two-sided ${\epsilon}$-expander for some ${\epsilon>0}$ depending only on ${k, \kappa,\Lambda}$.

It turns out (2) can be verified in many cases by exploiting the solvable nature of the Borel subgroups ${B}$. We give two examples of this in these notes. The first result, due to Bourgain and Gamburd (with earlier partial results by Gamburd and by Shalom) generalises Selberg’s expander construction to the case when ${S}$ generates a thin subgroup of ${SL_2({\bf Z})}$:

Theorem 3 (Expansion in thin subgroups) Let ${S}$ be a symmetric subset of ${SL_2({\bf Z})}$ not containing the identity, and suppose that the group ${\langle S \rangle}$ generated by ${S}$ is not virtually solvable. Then as ${p}$ ranges over all sufficiently large primes, the Cayley graphs ${Cay(SL_2(F_p), \pi_p(S))}$ form a two-sided expander family, where ${\pi_p: SL_2({\bf Z}) \rightarrow SL_2(F_p)}$ is the usual projection.

Remark 2 One corollary of Theorem 3 (or of the non-concentration estimate (3) below) is that ${\pi_p(S)}$ generates ${SL_2(F_p)}$ for all sufficiently large ${p}$, if ${\langle S \rangle}$ is not virtually solvable. This is a special case of a much more general result, known as the strong approximation theorem, although this is certainly not the most direct way to prove such a theorem. Conversely, the strong approximation property is used in generalisations of this result to higher rank groups than ${SL_2}$.

Exercise 1 In the converse direction, if ${\langle S\rangle}$ is virtually solvable, show that for sufficiently large ${p}$, ${\pi_p(S)}$ fails to generate ${SL_2(F_p)}$. (Hint: use Theorem 14 from Notes 5 to prevent ${SL_2(F_p)}$ from having bounded index solvable subgroups.)

Exercise 2 (Lubotzky’s 1-2-3 problem) Let ${S := \{ \begin{pmatrix}1 & \pm 3 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix}1 & 0 \\ \pm 3 & 1 \end{pmatrix} \}}$.

• (i) Show that ${S}$ generates a free subgroup of ${SL_2({\bf Z})}$. (Hint: use a ping-pong argument, as in Exercise 23 of Notes 2.)
• (ii) Show that if ${v, w}$ are two distinct elements of the sector ${\{ (x,y) \in {\bf R}^2_+: x/2 < y < 2x \}}$, then there is no element ${g \in \langle S \rangle}$ for which ${gv = w}$. (Hint: this is another ping-pong argument.) Conclude that ${\langle S \rangle}$ has infinite index in ${SL_2({\bf Z})}$. (Contrast this with the situation in which the ${3}$ coefficients in ${S}$ are replaced by ${1}$ or ${2}$, in which case ${\langle S \rangle}$ is either all of ${SL_2({\bf Z})}$, or a finite index subgroup, as demonstrated in Exercise 23 of Notes 2.)
• (iii) Show that ${Cay(SL_2(F_p), \pi_p(S))}$ for sufficiently large primes ${p}$ form a two-sided expander family.
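As a small numerical illustration of Remark 2 (and not a substitute for the exercise), one can check by breadth-first search that the projection ${\pi_p(S)}$ of the set ${S}$ from Exercise 2 generates all of ${SL_2(F_p)}$ for a few small primes ${p \neq 3}$. The Python sketch below uses hypothetical helper names and encodes a matrix ${\begin{pmatrix} a & b \\ c & d \end{pmatrix}}$ as the tuple `(a, b, c, d)`. Since ${SL_2(F_p)}$ is finite, the semigroup generated by ${\pi_p(S)}$ is already a group, so forward multiplication suffices.

```python
from collections import deque

def mul_mod(A, B, p):
    """Product of two 2x2 matrices (a, b, c, d) with entries reduced mod p."""
    a, b, c, d = A
    e, f, g, h = B
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)

def projected_S(p):
    """Reduction mod p of the four generators with off-diagonal entry +-3."""
    return [(1, 3 % p, 0, 1), (1, -3 % p, 0, 1),
            (1, 0, 3 % p, 1), (1, 0, -3 % p, 1)]

def generated_subgroup_size(gens, p):
    """Size of the subgroup of SL_2(F_p) generated by gens, found by BFS."""
    identity = (1, 0, 0, 1)
    seen = {identity}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = mul_mod(g, s, p)
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return len(seen)

for p in (5, 7, 11):
    # |SL_2(F_p)| = p (p^2 - 1)
    print(p, generated_subgroup_size(projected_S(p), p), p * (p * p - 1))
```

For ${p=3}$ the generators all reduce to the identity, so the generated subgroup is trivial; this is consistent with the restriction to sufficiently large primes.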

Remark 3 Theorem 3 has been generalised to arbitrary linear groups, and with ${F_p}$ replaced by ${{\bf Z}/q{\bf Z}}$ for square-free ${q}$; see this paper of Salehi Golsefidy and Varju. In this more general setting, the condition of virtual solvability must be replaced by the condition that the connected component of the Zariski closure of ${\langle S \rangle}$ is perfect. An effective version of Theorem 3 (with completely explicit constants) was recently obtained by Kowalski.

The second example concerns Cayley graphs constructed using random elements of ${SL_2(F_p)}$.

Theorem 4 (Random generators expand) Let ${p}$ be a prime, and let ${x,y}$ be two elements of ${SL_2(F_p)}$ chosen uniformly at random. Then with probability ${1-o_{p \rightarrow \infty}(1)}$, ${Cay(SL_2(F_p), \{x,x^{-1},y,y^{-1}\})}$ is a two-sided ${\epsilon}$-expander for some absolute constant ${\epsilon}$.

Remark 4 As with Theorem 3, Theorem 4 has also been extended to a number of other groups, such as the Suzuki groups (in this paper of Breuillard, Green, and Tao), and more generally to finite simple groups of Lie type of bounded rank (in forthcoming work of Breuillard, Green, Guralnick, and Tao). There are a number of other constructions of expanding Cayley graphs in such groups (and in other interesting groups, such as the alternating groups) beyond those discussed in these notes; see this recent survey of Lubotzky for further discussion. It has been conjectured by Lubotzky and Weiss that any pair ${x,y}$ of (say) ${SL_2(F_p)}$ that generates the group, is a two-sided ${\epsilon}$-expander for an absolute constant ${\epsilon}$: in the case of ${SL_2(F_p)}$, this has been established for a density one set of primes by Breuillard and Gamburd.

— 1. Expansion in thin subgroups —

We now prove Theorem 3. The first observation is that the expansion property is monotone in the group ${\langle S \rangle}$:

Exercise 3 Let ${S, S'}$ be symmetric subsets of ${SL_2({\bf Z})}$ not containing the identity, such that ${\langle S \rangle \subset \langle S' \rangle}$. Suppose that ${Cay(SL_2(F_p), \pi_p(S))}$ is a two-sided expander family for sufficiently large primes ${p}$. Show that ${Cay(SL_2(F_p), \pi_p(S'))}$ is also a two-sided expander family.

As a consequence, Theorem 3 follows from the following two statements:

Theorem 5 (Tits alternative) Let ${\Gamma \subset SL_2({\bf Z})}$ be a group. Then exactly one of the following statements holds:

• (i) ${\Gamma}$ is virtually solvable.
• (ii) ${\Gamma}$ contains a copy of the free group ${F_2}$ of two generators as a subgroup.

Theorem 6 (Expansion in free groups) Let ${x,y \in SL_2({\bf Z})}$ be generators of a free subgroup of ${SL_2({\bf Z})}$. Then as ${p}$ ranges over all sufficiently large primes, the Cayley graphs ${Cay(SL_2(F_p), \pi_p(\{x,y,x^{-1},y^{-1}\}))}$ form a two-sided expander family.

Theorem 5 is a special case of the famous Tits alternative, which among other things allows one to replace ${SL_2({\bf Z})}$ by ${GL_d(k)}$ for any ${d \geq 1}$ and any field ${k}$ of characteristic zero (and fields of positive characteristic are also allowed, if one adds the requirement that ${\Gamma}$ be finitely generated). We will not prove the full Tits alternative here, but instead just give an ad hoc proof of the special case in Theorem 5 in the following exercise.

Exercise 4 Given any matrix ${g \in SL_2({\bf Z})}$, the singular values are ${\|g\|_{op}}$ and ${\|g\|_{op}^{-1}}$, and we can apply the singular value decomposition to decompose

$\displaystyle g = u_1(g) \|g\|_{op} v_1(g)^* + u_2(g) \|g\|_{op}^{-1} v_2(g)^*$

where ${u_1(g),u_2(g)\in {\bf C}^2}$ and ${v_1(g), v_2(g) \in {\bf C}^2}$ are orthonormal bases. (When ${\|g\|_{op}>1}$, these bases are uniquely determined up to phase rotation.) We let ${\tilde u_1(g) \in {\bf CP}^1}$ be the projection of ${u_1(g)}$ to the projective complex plane, and similarly define ${\tilde v_2(g)}$.

Let ${\Gamma}$ be a subgroup of ${SL_2({\bf Z})}$. Call a pair ${(u,v) \in {\bf CP}^1 \times {\bf CP}^1}$ a limit point of ${\Gamma}$ if there exists a sequence ${g_n \in \Gamma}$ with ${\|g_n\|_{op} \rightarrow \infty}$ and ${(\tilde u_1(g_n), \tilde v_2(g_n)) \rightarrow (u,v)}$.

• (i) Show that if ${\Gamma}$ is infinite, then there is at least one limit point.
• (ii) Show that if ${(u,v)}$ is a limit point, then so is ${(v,u)}$.
• (iii) Show that if there are two limit points ${(u,v), (u',v')}$ with ${\{u,v\} \cap \{u',v'\} = \emptyset}$, then there exist ${g,h \in \Gamma}$ that generate a free group. (Hint: Choose ${(\tilde u_1(g), \tilde v_2(g))}$ close to ${(u,v)}$ and ${(\tilde u_1(h),\tilde v_2(h))}$ close to ${(u',v')}$, and consider the action of ${g}$ and ${h}$ on ${{\bf CP}^1}$, and specifically on small neighbourhoods of ${u,v,u',v'}$, and set up a ping-pong type situation.)
• (iv) Show that if ${g \in SL_2({\bf Z})}$ is hyperbolic (i.e. it has an eigenvalue greater than 1), with eigenvectors ${u,v}$, then the projectivisations ${(\tilde u,\tilde v)}$ of ${u,v}$ form a limit point. Similarly, if ${g}$ is regular parabolic (i.e. it has an eigenvalue at 1, but is not the identity) with eigenvector ${u}$, show that ${(\tilde u,\tilde u)}$ is a limit point.
• (v) Show that if ${\Gamma}$ has no free subgroup of two generators, then all hyperbolic and regular parabolic elements of ${\Gamma}$ have a common eigenvector. Conclude that all such elements lie in a solvable subgroup of ${\Gamma}$.
• (vi) Show that if an element ${g \in SL_2({\bf Z})}$ is neither hyperbolic nor regular parabolic, and is not a multiple of the identity, then ${g}$ is conjugate to a rotation by ${\pi/2}$ (in particular, ${g^2=-1}$).
• (vii) Establish Theorem 5. (Hint: show that two square roots of ${-1}$ in ${SL_2({\bf Z})}$ cannot multiply to another square root of ${-1}$.)

Now we prove Theorem 6. Let ${\Gamma}$ be a free subgroup of ${SL_2({\bf Z})}$ generated by two generators ${x,y}$. Let ${\mu := \frac{1}{4} (\delta_x +\delta_{x^{-1}} + \delta_y + \delta_{y^{-1}})}$ be the probability measure generating a random walk on ${SL_2({\bf Z})}$, thus ${(\pi_p)_* \mu}$ is the corresponding generator on ${SL_2(F_p)}$. By Corollary 2, it thus suffices to show that

$\displaystyle \sup_{B}((\pi_p)_* \mu)^{(n)}(B) < p^{-\kappa} \ \ \ \ \ (3)$

for all sufficiently large ${p}$, some absolute constant ${\kappa>0}$, and some even ${n = O(\log p)}$ (depending on ${p}$, of course), where ${B}$ ranges over Borel subgroups.

As ${\pi_p}$ is a homomorphism, one has ${((\pi_p)_* \mu)^{(n)}(B) = (\pi_p)_* (\mu^{(n)})(B) = \mu^{(n)}(\pi_p^{-1}(B))}$ and so it suffices to show that

$\displaystyle \sup_{B} \mu^{(n)}(\pi_p^{-1}(B)) < p^{-\kappa}.$

To deal with the supremum here, we will use an argument of Bourgain and Gamburd, taking advantage of the fact that all Borel groups of ${SL_2}$ obey a common group law, the point being that free groups such as ${\Gamma}$ obey such laws only very rarely. More precisely, we use the fact that the Borel groups are solvable of derived length two; in particular we have

$\displaystyle [[a,b],[c,d]] = 1 \ \ \ \ \ (4)$

for all ${a,b,c,d \in B}$. Now, ${\mu^{(n)}}$ is supported on matrices in ${SL_2({\bf Z})}$ whose coefficients have size ${O(\exp(O(n)))}$ (where we allow the implied constants to depend on the choice of generators ${x,y}$), and so ${(\pi_p)_*( \mu^{(n)} )}$ is supported on matrices in ${SL_2(F_p)}$ whose coefficients also have size ${O(\exp(O(n)))}$. If ${n}$ is less than a sufficiently small multiple of ${\log p}$, these coefficients are then less than ${p^{1/10}}$ (say). As such, if ${\tilde a,\tilde b,\tilde c,\tilde d \in SL_2({\bf Z})}$ lie in the support of ${\mu^{(n)}}$ and their projections ${a = \pi_p(\tilde a), \ldots, d = \pi_p(\tilde d)}$ obey the word law (4) in ${SL_2(F_p)}$, then the original matrices ${\tilde a, \tilde b, \tilde c, \tilde d}$ obey the word law (4) in ${SL_2({\bf Z})}$. (This lifting of identities from the characteristic ${p}$ setting of ${SL_2(F_p)}$ to the characteristic ${0}$ setting of ${SL_2({\bf Z})}$ is a simple example of the “Lefschetz principle”.)

To summarise, if we let ${E_{n,p,B}}$ be the set of all elements of ${\pi_p^{-1}(B)}$ that lie in the support of ${\mu^{(n)}}$, then (4) holds for all ${a,b,c,d \in E_{n,p,B}}$. This severely limits the size of ${E_{n,p,B}}$ to only be of polynomial size, rather than exponential size:

Proposition 7 Let ${E}$ be a subset of the support of ${\mu^{(n)}}$ (thus, ${E}$ consists of words in ${x,y,x^{-1},y^{-1}}$ of length ${n}$) such that the law (4) holds for all ${a,b,c,d \in E}$. Then ${|E| \ll n^2}$.

The proof of this proposition is laid out in the exercise below.

Exercise 5 Let ${\Gamma}$ be a free group generated by two generators ${x,y}$. Let ${B}$ be the set of all words of length at most ${n}$ in ${x,y,x^{-1},y^{-1}}$.

• (i) Show that if ${a,b \in \Gamma}$ commute, then ${a, b}$ lie in the same cyclic group, thus ${a = c^i, b = c^j}$ for some ${c \in \Gamma}$ and ${i,j \in {\bf Z}}$.
• (ii) Show that if ${a \in \Gamma}$, there are at most ${O(n)}$ elements of ${B}$ that commute with ${a}$.
• (iii) Show that if ${a,c \in \Gamma}$, there are at most ${O(n)}$ elements ${b}$ of ${B}$ with ${[a,b] = c}$.
• (iv) Prove Proposition 7.

Now we can conclude the proof of Theorem 3:

Exercise 6 Let ${\Gamma}$ be a free group generated by two generators ${x,y}$.

• (i) Show that ${\| \mu^{(n)} \|_{\ell^\infty(\Gamma)} \ll c^n}$ for some absolute constant ${0 < c<1}$. (For much more precise information on ${\mu^{(n)}}$, see this paper of Kesten.)
• (ii) Conclude the proof of Theorem 3.
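The exponential decay in Exercise 6(i) can be observed numerically. The sketch below (in Python, with hypothetical function names) uses the fact that the random walk on a free group with two free generators is the simple random walk on the ${4}$-regular tree, so that ${\mu^{(n)}}$ is constant on each sphere of reduced words of a given length ${d}$; it therefore suffices to track the distribution of the word length, which moves away from the identity with probability ${3/4}$ and back with probability ${1/4}$ (except at the identity itself). The comparison with ${(\sqrt{3}/2)^n}$ reflects the spectral radius computed in Kesten's paper; it is included only as a sanity check, not as a proof.

```python
from fractions import Fraction

def distance_distribution(n):
    """dist[d] = probability that the n-step walk ends at a word of length d."""
    dist = [Fraction(1)] + [Fraction(0)] * n
    for _ in range(n):
        new = [Fraction(0)] * (n + 1)
        for d, prob in enumerate(dist):
            if prob == 0:
                continue
            if d == 0:
                new[1] += prob                       # every step leaves the identity
            else:
                new[d + 1] += prob * Fraction(3, 4)  # three letters lengthen the word
                new[d - 1] += prob * Fraction(1, 4)  # one letter cancels
        dist = new
    return dist

def sup_norm(n):
    """The l^infinity norm of mu^(n) on the free group, as an exact fraction."""
    best = Fraction(0)
    for d, prob in enumerate(distance_distribution(n)):
        sphere_size = 1 if d == 0 else 4 * 3 ** (d - 1)
        best = max(best, prob / sphere_size)
    return best

for n in (2, 10, 20, 40):
    print(n, float(sup_norm(n)), (3 ** 0.5 / 2) ** n)
```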

— 2. Random generators expand —

We now prove Theorem 4. Let ${{\bf F}_2}$ be the free group on two formal generators ${a,b}$, and let ${\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}}+ \delta_{b^{-1}})}$ be the generator of the random walk. For any word ${w \in {\bf F}_2}$ and any ${x,y}$ in a group ${G}$, let ${w(x,y) \in G}$ be the element of ${G}$ formed by substituting ${x,y}$ for ${a,b}$ respectively in the word ${w}$; thus ${w}$ can be viewed as a map ${w: G \times G \rightarrow G}$ for any group ${G}$. Observe that if ${w}$ is drawn randomly using the distribution ${\mu^{(n)}}$, and ${x,y \in SL_2(F_p)}$, then ${w(x,y)}$ is distributed according to the law ${\tilde \mu^{(n)}}$, where ${\tilde \mu := \frac{1}{4}(\delta_x + \delta_y + \delta_{x^{-1}}+ \delta_{y^{-1}})}$. Applying Corollary 2, it suffices to show that whenever ${p}$ is a large prime and ${x,y}$ are chosen uniformly and independently at random from ${SL_2(F_p)}$, that with probability ${1-o_{p \rightarrow \infty}(1)}$, one has

$\displaystyle \sup_B {\bf P}_w ( w(x,y) \in B ) \leq p^{-\kappa} \ \ \ \ \ (5)$

for some absolute constant ${\kappa}$, where ${B}$ ranges over all Borel subgroups of ${SL_2(\overline{F_p})}$ and ${w}$ is drawn from the law ${\mu^{(n)}}$ for some even natural number ${n = O(\log p)}$.

Let ${B_n}$ denote the words in ${{\bf F}_2}$ of length at most ${n}$. We may use the law (4) to obtain a good bound on the supremum in (5) assuming a certain non-degeneracy property of the word evaluations ${w(x,y)}$:

Exercise 7 Let ${n}$ be a natural number, and suppose that ${x,y \in SL_2(F_p)}$ is such that ${w(x,y) \neq 1}$ for ${w \in B_{100n} \backslash \{1\}}$. Show that

$\displaystyle \sup_B {\bf P}_w ( w(x,y) \in B ) \ll \exp(-cn)$

for some absolute constant ${c>0}$, where ${w}$ is drawn from the law ${\mu^{(n)}}$. (Hint: use (4) and the hypothesis to lift the problem up to ${{\bf F}_2}$, at which point one can use Proposition 7 and Exercise 6.)

In view of this exercise, it suffices to show that with probability ${1-o_{p \rightarrow\infty}(1)}$, one has ${w(x,y) \neq 1}$ for all ${w \in B_{100n} \backslash \{1\}}$ for some ${n}$ comparable to a small multiple of ${\log p}$. As ${B_{100n}}$ has ${\exp(O(n))}$ elements, it thus suffices by the union bound to show that

$\displaystyle {\bf P}_{x,y}(w(x,y)=1) \leq p^{-\gamma} \ \ \ \ \ (6)$

for some absolute constant ${\gamma > 0}$, and any ${w \in {\bf F}_2 \backslash \{1\}}$ of length less than ${c\log p}$ for some sufficiently small absolute constant ${c>0}$.
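To get a feel for the probability in (6), here is a minimal sketch that computes ${{\bf P}_{x,y}(w(x,y)=1)}$ exactly by brute force for very small ${p}$. The encoding of words as strings over `a`, `b`, `A`, `B` (capitals denoting inverses) and the helper names `sl2`, `evaluate`, `prob_trivial` are illustrative assumptions, not notation from these notes.

```python
from itertools import product

def sl2(p):
    """All matrices (a, b, c, d) over F_p with ad - bc = 1."""
    return [(a, b, c, d)
            for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]

def mul(x, y, p):
    """Product of two 2x2 matrices stored as 4-tuples, reduced mod p."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)

def inv(x, p):
    """Inverse of a determinant-one matrix (its adjugate), reduced mod p."""
    a, b, c, d = x
    return (d % p, (-b) % p, (-c) % p, a % p)

def evaluate(word, x, y, p):
    """Evaluate w(x, y); 'a' -> x, 'b' -> y, 'A' -> x^{-1}, 'B' -> y^{-1}."""
    letters = {'a': x, 'b': y, 'A': inv(x, p), 'B': inv(y, p)}
    result = (1, 0, 0, 1)
    for ch in word:
        result = mul(result, letters[ch], p)
    return result

def prob_trivial(word, p):
    """Return (number of pairs (x, y) with w(x, y) = 1, total number of pairs)."""
    G = sl2(p)
    hits = sum(1 for x in G for y in G
               if evaluate(word, x, y, p) == (1, 0, 0, 1))
    return hits, len(G) ** 2

if __name__ == "__main__":
    for p in (3, 5):
        print(p, prob_trivial("abAB", p))
```

For the commutator word `abAB`, the count is the number of commuting pairs, which by Burnside's lemma is ${|G|}$ times the number of conjugacy classes of ${G = SL_2(F_p)}$; this is of size ${\sim p^4}$ out of ${\sim p^6}$ pairs, consistent with a bound of the shape (6). The brute force is only feasible for tiny ${p}$ and is meant purely as an illustration.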

Let us now fix a non-identity word ${w}$ of length ${|w|}$ less than ${c\log p}$, and consider ${w}$ as a function from ${SL_2(k) \times SL_2(k)}$ to ${SL_2(k)}$ for an arbitrary field ${k}$. We can identify ${SL_2(k)}$ with the set ${\{ (a,b,c,d)\in k^4: ad-bc=1\}}$. A routine induction shows that the expression ${w((a,b,c,d),(a',b',c',d'))}$ is a polynomial in the eight variables ${a,b,c,d,a',b',c',d'}$ of degree ${O(|w|)}$ and coefficients which are integers of size ${O( \exp( O(|w|) ) )}$. Let us then make the additional restriction to the case ${a,a' \neq 0}$, in which case we can write ${d = \frac{bc+1}{a}}$ and ${d' =\frac{b'c'+1}{a'}}$. Then ${w((a,b,c,d),(a',b',c',d'))}$ becomes a rational function of ${a,b,c,a',b',c'}$ whose numerator is a polynomial of degree ${O(|w|)}$ and coefficients of size ${O( \exp( O(|w|) ) )}$, and whose denominator is a monomial in ${a,a'}$ of degree ${O(|w|)}$.

We then specialise this rational function to the field ${k=F_p}$. It is conceivable that when one does so, the rational function collapses to the constant polynomial ${(1,0,0,1)}$, thus ${w((a,b,c,d),(a',b',c',d'))=1}$ for all ${(a,b,c,d),(a',b',c',d') \in SL_2(F_p)}$ with ${a,a' \neq 0}$. (For instance, this would be the case if ${w(x,y) = x^{|SL_2(F_p)|}}$, by Lagrange’s theorem, if it were not for the fact that ${|w|}$ is far too large here.) But suppose that this rational function does not collapse to the constant rational function. Applying the Schwarz-Zippel lemma (Exercise 23 from Notes 5), we then see that the number of pairs ${(a,b,c,d),(a',b',c',d') \in SL_2(F_p)}$ with ${a,a' \neq 0}$ and ${w((a,b,c,d),(a',b',c',d'))=1}$ is at most ${O( |w| p^5 )}$; adding in the ${a=0}$ and ${a'=0}$ cases, one still obtains a bound of ${O(|w|p^5)}$, which is acceptable since ${|SL_2(F_p)|^2 \sim p^6}$ and ${|w| = O( \log p )}$. Thus, the only remaining case to consider is when the rational function ${w((a,b,c,d),(a',b',c',d'))}$ is identically ${1}$ on ${SL_2(F_p)}$ with ${a,a' \neq 0}$.
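As a sanity check on the scale of the Schwarz-Zippel bound, here is a small sketch. It assumes the standard form of the lemma (a non-zero polynomial of degree ${d}$ in ${n}$ variables over ${F_p}$ has at most ${d p^{n-1}}$ zeros in ${F_p^n}$), which may be phrased differently in the cited exercise. The helper `count_zeros` and the test polynomial ${ad-bc-1}$ are illustrative choices only.

```python
from itertools import product

def count_zeros(poly, n_vars, p):
    """Count the points of F_p^n at which poly vanishes mod p (brute force)."""
    return sum(1 for v in product(range(p), repeat=n_vars)
               if poly(*v) % p == 0)

def det_minus_one(a, b, c, d):
    """Degree-2 polynomial whose zero set in F_p^4 is SL_2(F_p)."""
    return a * d - b * c - 1

if __name__ == "__main__":
    for p in (3, 5, 7):
        zeros = count_zeros(det_minus_one, 4, p)
        bound = 2 * p ** 3  # degree * p^(n-1) with degree 2 and n = 4
        print(p, zeros, bound, zeros <= bound)
```

Here the zero set is ${SL_2(F_p)}$ itself, of cardinality ${p^3-p}$, comfortably below the bound ${2p^3}$. In the argument above the same principle is applied with six free variables and degree ${O(|w|)}$, which is where the ${O(|w| p^5)}$ comes from.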

Now we perform another “Lefschetz principle” manoeuvre to change the underlying field. Recall that the denominator of the rational function ${w((a,b,c,d),(a',b',c',d'))}$ is a monomial in ${a,a'}$, and the numerator has coefficients of size ${O(\exp(O(|w|)))}$. If ${|w|}$ is less than ${c\log p}$ for a sufficiently small ${c}$, we conclude in particular (for ${p}$ large enough) that the coefficients all have magnitude less than ${p}$. As such, the only way that this function can be identically ${1}$ on ${SL_2(F_p)}$ is if it is identically ${1}$ on ${SL_2(k)}$ for all ${k}$ with ${a,a' \neq 0}$, and hence for ${a=0}$ or ${a'=0}$ also by taking Zariski closures.

On the other hand, we know that for some choices of ${k}$, e.g. ${k={\bf R}}$, ${SL_2(k)}$ contains a copy ${\Gamma}$ of the free group on two generators (see e.g. Exercise 23 of Notes 2). As such, it is not possible for any non-identity word ${w}$ to be identically trivial on ${SL_2(k) \times SL_2(k)}$. Thus this case cannot actually occur, completing the proof of (6) and hence of Theorem 4.
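To make the last step concrete, here is a minimal sketch that enumerates reduced words and evaluates them on a specific pair of integer matrices. The pair ${\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}}$ is an assumed choice (these are the classical generators of a free subgroup of ${SL_2({\bf Z})}$, by a ping-pong argument); it need not be the construction used in the cited exercise. The names `reduced_words` and `trivial_words` are likewise illustrative.

```python
def mat_mul(x, y):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

# 'a', 'b' are the two generators; capitals denote their inverses.
GENS = {
    'a': ((1, 2), (0, 1)), 'A': ((1, -2), (0, 1)),
    'b': ((1, 0), (2, 1)), 'B': ((1, 0), (-2, 1)),
}
INVERSE = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}
IDENTITY = ((1, 0), (0, 1))

def reduced_words(max_len):
    """Yield (word, matrix) for each non-empty reduced word of length <= max_len."""
    frontier = [(ch, GENS[ch]) for ch in GENS]
    for length in range(1, max_len + 1):
        next_frontier = []
        for word, m in frontier:
            yield word, m
            if length < max_len:
                for ch in GENS:
                    if ch != INVERSE[word[-1]]:  # keep the word reduced
                        next_frontier.append((word + ch, mat_mul(m, GENS[ch])))
        frontier = next_frontier

def trivial_words(max_len, p=None):
    """Reduced words evaluating to the identity, over Z or (if p is given) mod p."""
    out = []
    for word, m in reduced_words(max_len):
        if p is not None:
            m = tuple(tuple(entry % p for entry in row) for row in m)
        if m == IDENTITY:
            out.append(word)
    return out

if __name__ == "__main__":
    print(trivial_words(8))        # over the integers
    print(trivial_words(3, p=3))   # after reduction mod 3
```

Over the integers no non-trivial reduced word evaluates to the identity, whereas after reducing mod ${p}$ words of length comparable to ${p}$ (such as `aaa` for ${p=3}$) can become trivial. This is one way to see why the hypothesis that ${|w|}$ be small compared with ${\log p}$ cannot simply be dropped.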

Remark 5 We see from the above argument that the existence of subgroups ${\Gamma}$ of an algebraic group with good “independence” properties – such as that of generating a free group – can be useful in studying the expansion properties of that algebraic group, even if the field of interest in the latter is distinct from that of the former. For more complicated algebraic groups than ${SL_2}$, in which laws such as (4) are not always available, it turns out to be useful to place further properties on the subgroup ${\Gamma}$, for instance by requiring that all non-abelian subgroups of that group be Zariski dense (a property which has been called strong density), as this turns out to be useful for preventing random walks from concentrating in proper algebraic subgroups. See this paper of Breuillard, Guralnick, Green and Tao for constructions of strongly dense free subgroups of algebraic groups and further discussion.

In the previous set of notes, we saw that one could derive expansion of Cayley graphs from three ingredients: non-concentration, product theorems, and quasirandomness. Quasirandomness was discussed in Notes 3. In the current set of notes, we discuss product theorems. Roughly speaking, these theorems assert that in certain circumstances, a finite subset ${A}$ of a group ${G}$ either exhibits expansion (in the sense that ${A^3}$, say, is significantly larger than ${A}$), or is somehow “close to” or “trapped” by a genuine group.

Theorem 1 (Product theorem in ${SL_d(k)}$) Let ${d \geq 2}$, let ${k}$ be a finite field, and let ${A}$ be a finite subset of ${G := SL_d(k)}$. Let ${\epsilon >0}$ be sufficiently small depending on ${d}$. Then at least one of the following statements holds:

• (Expansion) One has ${|A^3| \geq |A|^{1+\epsilon}}$.
• (Close to ${G}$) One has ${|A| \geq |G|^{1-O_d(\epsilon)}}$.
• (Trapping) ${A}$ is contained in a proper subgroup of ${G}$.

We will prove this theorem (which was proven first in the ${d=2,3}$ cases for fields ${k}$ of prime order by Helfgott, then for ${d=2}$ and general ${k}$ by Dinai, and finally for general ${d}$ and ${k}$ independently by Pyber-Szabo and by Breuillard-Green-Tao) later in these notes. A more qualitative version of this theorem was also previously obtained by Hrushovski. There are also generalisations of the product theorem of importance to number theory, in which the field ${k}$ is replaced by a cyclic ring ${{\bf Z}/q{\bf Z}}$ (with ${q}$ not necessarily prime); this was achieved first for ${d=2}$ and ${q}$ square-free by Bourgain, Gamburd, and Sarnak, by Varju for general ${d}$ and ${q}$ square-free, and finally by this paper of Bourgain and Varju for arbitrary ${d}$ and ${q}$.

Exercise 1 (Diameter bound) Assuming Theorem 1, show that whenever ${S}$ is a symmetric set of generators of ${SL_d(k)}$ for some finite field ${k}$ and some ${d\geq 2}$, then any element of ${SL_d(k)}$ can be expressed as the product of ${O_d( \log^{O_d(1)} |k| )}$ elements from ${S}$. (Equivalently, if we add the identity element to ${S}$, then ${S^m = SL_d(k)}$ for some ${m = O_d( \log^{O_d(1)} |k| )}$.) This is a special case of a conjecture of Babai and Seress, who conjectured that the bound should hold uniformly for all finite simple groups (in particular, the implied constants here should not actually depend on ${d}$). The methods used to handle the ${SL_d}$ case can handle other finite groups of Lie type of bounded rank, but at present we do not have bounds that are independent of the rank. On the other hand, a recent paper of Helfgott and Seress has almost resolved the conjecture for the permutation groups ${A_n}$.

A key tool to establish product theorems is an argument which is sometimes referred to as the pivot argument. To illustrate this argument, let us first discuss a much simpler (and older) theorem, essentially due to Freiman, which has a much weaker conclusion but is valid in any group ${G}$:

Theorem 2 (Baby product theorem) Let ${G}$ be a group, and let ${A}$ be a finite non-empty subset of ${G}$. Then one of the following statements holds:

• (Expansion) One has ${|A^{-1} A| \geq \frac{3}{2} |A|}$.
• (Close to a subgroup) ${A}$ is contained in a left-coset of a group ${H}$ with ${|H| < \frac{3}{2} |A|}$.

To prove this theorem, we suppose that the first conclusion does not hold, thus ${|A^{-1} A| <\frac{3}{2} |A|}$. Our task is then to place ${A}$ inside the left-coset of a fairly small group ${H}$.

To do this, we take a group element ${g \in G}$, and consider the intersection ${A\cap gA}$. A priori, the size of this set could range anywhere from ${0}$ to ${|A|}$. However, we can use the hypothesis ${|A^{-1} A| < \frac{3}{2} |A|}$ to obtain an important dichotomy, reminiscent of the classical fact that two cosets ${gH, hH}$ of a subgroup ${H}$ of ${G}$ are either identical or disjoint:

Proposition 3 (Dichotomy) If ${g \in G}$, then exactly one of the following occurs:

• (Non-involved case) ${A \cap gA}$ is empty.
• (Involved case) ${|A \cap gA| > \frac{|A|}{2}}$.

Proof: Suppose we are not in the non-involved case, so that ${A \cap gA}$ is non-empty. Let ${a}$ be an element of ${A \cap gA}$; then ${a}$ and ${g^{-1} a}$ both lie in ${A}$. The sets ${A^{-1} a}$ and ${A^{-1} g^{-1} a}$ then both lie in ${A^{-1} A}$. As these sets have cardinality ${|A|}$ and lie in ${A^{-1}A}$, which has cardinality less than ${\frac{3}{2}|A|}$, we conclude from the inclusion-exclusion formula that

$\displaystyle |A^{-1} a \cap A^{-1} g^{-1} a| > \frac{|A|}{2}.$

But the left-hand side is equal to ${|A \cap gA|}$, and the claim follows. $\Box$

The above proposition provides a clear separation between two types of elements ${g \in G}$: the “non-involved” elements, which have nothing to do with ${A}$ (in the sense that ${A \cap gA = \emptyset}$), and the “involved” elements, which have a lot to do with ${A}$ (in the sense that ${|A \cap gA| > |A|/2}$). The key point is that there is a significant “gap” between the non-involved and involved elements; there are no elements that are only “slightly involved”, in that ${A}$ and ${gA}$ intersect a little but not a lot. It is this gap that will allow us to upgrade approximate structure to exact structure. Namely,

Proposition 4 The set ${H}$ of involved elements is a finite group, and is equal to ${A A^{-1}}$.

Proof: It is clear that the identity element ${1}$ is involved, and that if ${g}$ is involved then so is ${g^{-1}}$ (since ${A \cap g^{-1} A = g^{-1}(A \cap gA)}$). Now suppose that ${g, h}$ are both involved. Then ${A \cap gA}$ and ${A\cap hA}$ have cardinality greater than ${|A|/2}$ and are both subsets of ${A}$, and so have non-empty intersection. In particular, ${gA \cap hA}$ is non-empty, and so ${A \cap g^{-1} hA}$ is non-empty. By Proposition 3, this makes ${g^{-1} h}$ involved. It is then clear that ${H}$ is a group.

If ${g \in A A^{-1}}$, then ${A \cap gA}$ is non-empty, and so from Proposition 3 ${g}$ is involved. Conversely, if ${g}$ is involved, then ${g \in A A^{-1}}$. Thus we have ${H = A A^{-1}}$ as claimed. In particular, ${H}$ is finite. $\Box$

Now we can quickly wrap up the proof of Theorem 2. By construction, ${|A \cap gA| > |A|/2}$ for all ${g \in H}$, which by double counting shows that ${|H| < 2|A|}$ (indeed, ${\sum_{g \in H} |A \cap gA|}$ counts each pair ${(a,a') \in A \times A}$ exactly once, via ${g = a (a')^{-1}}$, and so equals ${|A|^2}$). As ${H = A A^{-1}}$, we see that ${A}$ is contained in a right coset ${Hg}$ of ${H}$; setting ${H' := g^{-1} H g}$, we conclude that ${A}$ is contained in a left coset ${gH'}$ of ${H'}$. As ${H'}$ is a conjugate of ${H}$, we have ${|H'| < 2|A|}$. If ${h \in H'}$, then ${A}$ and ${Ah}$ both lie in ${gH'}$ and have cardinality ${|A|}$, so must overlap; and so ${h \in A^{-1} A}$. Conversely, ${A^{-1} A \subset (gH')^{-1} gH' = H'}$. Thus ${A^{-1} A = H'}$, and so ${|H'| < \frac{3}{2} |A|}$, and Theorem 2 follows.
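The dichotomy and its consequences can be checked by hand on a small example. The following sketch works in the symmetric group ${S_4}$, with ${A}$ taken to be a left coset of a copy of ${S_3}$ with one element deleted, so that ${|A|=5}$ and ${|A^{-1}A| = 6 < \frac{3}{2}|A|}$. The specific choices of `g0`, `h0` and the helper names are illustrative assumptions.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations: (p o q)(i) = p[q[i]]."""
    return tuple(p[j] for j in q)

def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def translate(g, S):
    """Left translate gS."""
    return {compose(g, s) for s in S}

G = list(permutations(range(4)))       # the symmetric group S_4
H = [h for h in G if h[3] == 3]        # stabiliser of 3, a copy of S_3
g0 = (0, 1, 3, 2)                      # transposition of 2 and 3 (not in H)
h0 = (1, 0, 2, 3)                      # the element deleted from H
A = {compose(g0, h) for h in H if h != h0}

# Size of A ∩ gA for every g in G: Proposition 3 predicts 0 or > |A|/2.
overlap = {g: len(A & translate(g, A)) for g in G}
involved = {g for g, size in overlap.items() if size > 0}

A_Ainv = {compose(a, inverse(b)) for a in A for b in A}
Ainv_A = {compose(inverse(a), b) for a in A for b in A}

if __name__ == "__main__":
    print(sorted(set(overlap.values())))   # only 0 and values above |A|/2
    print(involved == A_Ainv)              # Proposition 4
    print(len(A), len(Ainv_A))             # |A| and |A^{-1} A|
    print(A <= translate(g0, Ainv_A))      # A sits in a left coset
```

In this example the involved elements form the stabiliser of ${2}$ (a conjugate of ${H}$), while ${A^{-1}A}$ is the stabiliser of ${3}$, so the two product sets ${AA^{-1}}$ and ${A^{-1}A}$ are genuinely different groups of the same order.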

Exercise 2 Show that the constant ${3/2}$ in Theorem 2 cannot be replaced by any larger constant.

Exercise 3 Let ${A \subset G}$ be a finite non-empty set such that ${|A^2| < 2|A|}$. Show that ${AA^{-1}=A^{-1} A}$. (Hint: If ${ab^{-1} \in A A^{-1}}$, show that ${ab^{-1} = c^{-1} d}$ for some ${c,d \in A}$.)

Exercise 4 Let ${A \subset G}$ be a finite non-empty set such that ${|A^2| < \frac{3}{2} |A|}$. Show that there is a finite group ${H}$ with ${|H| < \frac{3}{2} |A|}$ and a group element ${g \in G}$ such that ${A \subset Hg \cap gH}$ and ${H = A A^{-1}}$.

Below the fold, we give further examples of the pivot argument in other group-like situations, including Theorem 1 and also the “sum-product theorem” of Bourgain-Katz-Tao and Bourgain-Glibichuk-Konyagin.