In this final lecture in the Marker lecture series, I discuss the recent work of Bourgain, Gamburd, and Sarnak on how arithmetic combinatorics and expander graphs were used to sieve for almost primes in various algebraic sets.

In previous lectures, we considered the problem of detecting tuples of primes in various linear or convex sets; in particular, we considered the size of sets of the form $V \cap {\mathcal P}^k$, where ${\mathcal P} = \{2,3,5,\ldots\}$ is the set of primes, and V is some affine subspace of ${\Bbb R}^k$.  (For instance, the twin prime conjecture would correspond to the case when k=2 and $V = \{ (x,x+2): x \in {\Bbb R}\}$, while the Green-Tao theorem would correspond to the case $V = \{ (x,x+r,\ldots,x+(k-1)r): x,r \in {\Bbb R}\}$.  We refer to elements of ${\mathcal P}^k$ as prime points.  The prime tuples conjecture implies the following qualitative criterion for when such a set of prime points should be “large”:

Qualitative prime tuples conjecture. Let V be an affine subspace of ${\Bbb R}^k$.  Suppose that

1. (No obstructions at infinity)  For any N, $V \cap {\Bbb Z}_{>N}^k$ affinely spans all of V, where ${\Bbb Z}_{>N} := \{ n \in {\Bbb Z}: n > N \}$.  (In particular, $V \cap {\Bbb Z}_{>N}^k$ is non-empty.)
2. (No obstructions at q)  For any $q > 1$, $V \cap ({\Bbb Z}_q^*)^k$ affinely spans all of V, where ${\Bbb Z}_q^* := \{ n \in {\Bbb Z}: (n,q) = 1 \}$.  (In particular,$V \cap ({\Bbb Z}_q^*)^k$ is non-empty.)

Then $V \cap {\mathcal P}^k$ affinely spans all of V.  (In particular, V contains at least one prime point.)

Both of the hypotheses in this conjecture are easily verified for any given V, the first by (integer) linear programming and the second by modular arithmetic.  This conjecture would imply several other results and conjectures in number theory, including the twin prime conjecture and the Green-Tao theorem.  Needless to say, it remains open in general (though the results mentioned in the previous lecture give partial results in the case when V is at least two-dimensional and non-degenerate).

Now we attempt to generalise the above conjecture to the setting in which V is an algebraic variety rather than an affine subspace.  (This would cover some famous open problems in number theory, for instance the Landau problem that asks whether there are infinitely many primes of the form $n^2+1$.) The notion of a set affinely spanning V is then naturally replaced by the notion of a set being Zariski dense in V, which means that the set is not contained in any strictly smaller subvariety of V.  One could then formulate a naive generalisation of the above conjecture by replacing “affine space” and “affinely spans all of” with “algebraic variety” and “is Zariski dense in” respectively.  However, the hypotheses are now no longer easy to verify; indeed, just the problem of determining whether V contains an integer point ${\Bbb Z}^k$ is essentially Hilbert’s tenth problem, which by Matiyasevich’s theorem is known to be undecidable for general V.  Indeed, since one can encode any computable set in terms of the integer points of a variety V, it is not too difficult to see that this conjecture fails in general.  (An amusing historical connection here: one of the first demonstrations of the undecidability of Hilbert’s tenth problem by Robinson, Davis, and Putnam was conditional on the existence of arbitrarily long progressions of primes (i.e. the Green-Tao theorem), although subsequent proofs did not need this fact.)

Since arbitrary algebraic varieties are far too general to have any hope of a reasonable theory, one should look for prime points in much more special sets.  An important class here is that of an orbit $\Lambda b$ in ${\Bbb Z}^k$, where b is some vector in ${\Bbb Z}^k$ and $\Lambda$ is some finitely generated subgroup of $SL_k({\Bbb Z})$.  (One can also consider the slightly more general set of images $F(\Lambda b)$ under a polynomial map, but for simplicity let us stick to just orbits.) Of course one should take b to be primitive (not a multiple of any smaller vector), since one clearly will have a difficult time finding prime points in $\Lambda b$ otherwise.

The orbit $\Lambda b$ will be Zariski dense in some algebraic variety V, and is clearly a collection of integer points (though it may not cover all of $V \cap {\Bbb Z}^k$).  Assuming no local obstructions at infinity or at q (which means that $\Lambda b \cap {\Bbb Z}_{>N}^k$ and $\Lambda b \cap ({\Bbb Z}_q^*)^k)$ are Zariski dense in V), one could then conjecture that $\Lambda b \cap {\Bbb P}^k$ is also Zariski dense in V (which, if V is infinite, would in particular imply that the orbit $\Lambda b$ contains infinitely many prime points).

For simplicity let us restrict attention to the two-dimensional case k=2, which is already highly non-trivial; Bourgain, Gamburd and Sarnak have recently begun to get some preliminary results in k=3 but I will not discuss them here.  Thus $\Lambda$ is now a finitely generated subgroup of $SL_2({\Bbb Z})$.  If this subgroup is elementary – e.g. if it is cyclic – then the orbit $\Lambda b$ can be exponentially sparse (a ball of radius R may only contain $O(\log R)$ points), and it becomes extremely difficult to do any sieving or primality detection.  (In this case, the problem becomes comparable to such notoriously difficult questions as whether there are infinitely many Mersenne primes.)  It thus makes sense to restrict attention to non-elementary subgroups of $SL_2({\Bbb Z})$ – groups which contain a copy of the free non-abelian group on two generators, or equivalently any group whose Zariski closure is all of $SL_2$ (or equivalently yet again, a group whose limit set consists of more than one point).  In this situation, Bourgain, Gamburd, and Sarnak conjectured:

Conjecture 2. Let $\Lambda$ be a non-elementary subgroup of $SL_2({\Bbb Z})$, and let b be a primitive element of ${\Bbb Z}^2$.  Suppose that there are no local obstructions at infinity or at finite places q.  Then $\Lambda b \cap {\mathcal P}^2$ is Zariski dense in the plane (in particular, $\Lambda b \cap {\mathcal P}^2$ is infinite).

This conjecture remains open.  However, as in the linear situation, one can make progress if one replaces primes with almost primes – products of at most r primes for some bounded r.  (There are certainly prior results for nonlinear patterns in the almost primes; for instance, it is a famous result of Iwaniec that there are infinitely many numbers of the form $n^2+1$ that are the product of at most two primes.)  In particular, Bourgain, Gamburd, and Sarnak were able to show

Theorem. Let $\Lambda, b$ be as in Conjecture 2.  Then there exists an r such that $\Lambda b \cap {\mathcal P}_r^2$ is Zariski dense in the plane, where ${\mathcal P}_r$ is the set of numbers that are the product of at most r primes.

Several further generalisations and extensions of this result, with a similar flavour, are known, but will not be discussed here.  There are a number of amusing special cases of these results, for instance one can show that there exist infinitely many Appollonian circle packings of the unit circle by four other mutually tangent circles, all of whose radii is the reciprocal of an almost prime, or infinitely many Pythagorean triples whose area is an almost prime (for a sufficiently large r in the definition of “almost prime”).

Let me now discuss some of the key ideas in the proof of this theorem.  One begins by rephrasing the question in a more quantitative (or finitary) manner.  In the linear case, this would be done by counting the number of points in $\Lambda b \cap {\mathcal P}_r^2$ that lie inside some large Euclidean ball, thus using the Euclidean (or Archimedean) notion of distance to localise the problem.  This can also be done here, but it turns out to be more convenient to instead use the word metric induced by the finite generating set S of $\Lambda$ (which we can take to be symmetric for convenience, thus $S = S^{-1}$).  One thus looks at sets of the form $B_R b \cap {\mathcal P}_r^2$, where $B_R \subset \Lambda$ consists of all words formed by products of at most R elements of S.  A major new difficulty here compared to the linear theory is the exponential growth of $B_R$ (a consequence of the non-elementary nature of $\Lambda$).  (Though it is not immediately apparent, the same problem also arises if one uses Euclidean balls instead of word metric balls, due to the multiplicative rather than additive nature of the group $\Lambda$.)

The next step is to use sieve theory.  Recall the sieve of Eratosthenes, which expresses the set of all (large) primes as the integers, minus the multiples of two, minus the multiples of three, and so forth.  Using the inclusion-exclusion principle, we can thus view the indicator function $1_{{\mathcal P}}$ of the primes, when restricted to an interval such as ${}[N,2N]$, as equal to 1, minus the indicator function $1_{2{\Bbb Z}}$ of the even numbers, minus the indicator function $1_{3{\Bbb Z}}$ of the multiples of three, plus the indicator function $1_{6{\Bbb Z}}$ of the multiples of six, and so forth.  This leads to the Legendre sieve

$\displaystyle 1_{{\mathcal P}} = \sum_d \mu(d) 1_{d {\Bbb Z}}$, (1)

valid in an interval ${}[N,2N]$ as long as one restricts d to those integers which are products of primes less than N.  Here $\mu(d)$ is the Möbius function.

The basic idea of sieve theory is to replace the indicator function of the primes (or almost primes) by a more general divisor sum

$\displaystyle \sum_d c_d 1_{d{\Bbb Z}}$,

where the sieve weights $c_d$ are chosen in order to optimise the final bounds in the sieve (they typically resemble “smoothed out” versions of the Möbius functionin order that these sieves be large on the almost primes and small elsewhere).  In order for the sieve to be practical, one wants to restrict d in this sum to be relatively small, for instance $d \leq N^\theta$ for some absolute constant $0 < \theta < 1$ (values such as $\theta=1/4$ are fairly typical).  The selection of the sieve weights $c_d$ is now a well-developed science (see also my earlier post, and Kowalski’s guest post, on this topic), and Bourgain, Gamburd and Sarnak basically use off-the-shelf sieves (in particular, combinatorial sieves and the Selberg sieve) in their work.  Inserting these standard sieves into the problem at hand, the task of counting almost primes in the finite set $B_R b$ then quickly reduces to the question of getting good estimates on sets such as $B_R b \cap (q {\Bbb Z})^2$ for various q.  This amounts to much the same thing as asking for good equidistribution bounds for $B_R$ modulo q, thus we project the generating set S, and the ball $B_R$ it produces, from $SL_2({\Bbb Z})$ to $SL_2({\Bbb Z}_q)$.  For sieving purposes it turns out to be necessary to consider all squarefree moduli q, but for simplicity we shall only discuss the (massively easier) case when q is prime.

The reduction to an equidistribution problem converts the original sieving problem to a more combinatorial one, involving the Cayley graph G on $SL_2({\Bbb Z}_q)$ induced by the set S, thus two vertices $x, y \in SL_2({\Bbb Z}_q)$ are connected by an edge in G if $y x^{-1}$ lie in S (modulo q).  The image of the ball $B_R$ in $SL_2({\Bbb Z}_q)$ is then the set of points one can reach in the graph G from the origin by walking on a path of length at most R.  The desired equidistribution result one needs can then be viewed as a mixing result for the random walk along the graph G.

Standard graph theory then tells us that the task reduces to showing that the graphs G form a family of expander graphs as $q \to \infty$ (recall we are restricting q to be prime for simplicity).  There are many equivalent definitions of what an expander graph is, but let us give a spectral definition that is specialised to Cayley graphs.  The symmetric generating set S induces a natural measure

$\displaystyle \mu := \frac{1}{|S|} \sum_{s \in S} \delta_s$

that is the uniform distribution on S, which controls the random walk along G; note that if $f: SL_2({\Bbb Z}_q) \to {\Bbb C}$ is a function, then the convolution $f*\mu: SL_2({\Bbb Z}_q) \to {\Bbb C}$ is another function, whose value at any vertex x is the average value of f at all the neighbours of x.  The operation $f \mapsto f*\mu$ is then a self-adjoint contraction on $l^2(SL_2({\Bbb Z}_q))$ which leaves the function 1 invariant, so its largest eigenvalue $\lambda_1$ is equal to 1.  The expander graph condition is then equivalent to the existence of a spectral gap $\lambda_2 \leq 1-c$ for the second largest eigenvalue, where $c > 0$ is a constant independent of q.

Of course, to have a spectral gap, one necessary condition is that $\lambda_2$ be strictly less than 1.  This can be easily seen to be equivalent to the statement that G is connected, which in turn is equivalent to the statement that the projection of S to $SL_2({\Bbb Z}_q)$ generates all of $SL_2({\Bbb Z}_q)$.  This statement can be verified to be true, either by direct consideration of all possible subgroups of $SL_2({\Bbb Z}_q)$, or by the strong approximation property.  However, mere connectedness is not enough to ensure that a Cayley graph is an expander family (which can be viewed as a sort of “robust” version of connectedness, which can survive the deletion of large numbers of edges).  For instance, the Cayley graph of the generating set $\{-1,+1\}$ in ${\Bbb Z}/N{\Bbb Z}$ is connected, but does not form an expander family as $N \to \infty$; the second largest eigenvalue is about $1 - O(1/N^2)$ only.  (One can also see that the random walk on this Cayley graph takes a long time (about O(N^2) steps) before it mixes to be close to the uniform distribution; with expander graphs on a set of N vertices, mixing instead occurs in time $O(\log N)$, thanks to the spectral gap.)

Obtaining the spectral gap property requires more work.  When the original subgroup $\Lambda$ of $SL_2({\Bbb Z})$ is as large as a finite index subgroup (in particular, if it is a congruence subgroup), this gap follows from a celebrated theorem of Selberg providing a similar spectral gap for arithmetic quotients of the upper half-plane.  Smaller examples (in which the index is now infinite) were first constructed by Shalom and by Gamburd, with the latter following the method of Sarnak and Xue.  Then (in the case of prime q), Bourgain and Gamburd extended the method using additional tools from additive combinatorics to handle all non-elementary subgroups.

Let us now describe the method of proof.  As mentioned briefly earlier, the existence of a spectral gap implies a strong mixing property: the iterated convolutions $\mu^{(n)} := \mu * \ldots * \mu$ of the probability measure $\mu$ (which can be interpreted as the probability distribution of a random walk on n steps) converges exponentially fast to the constant distribution on $SL_2({\Bbb Z}_q)$.  Since the latter distribution has an l^2 norm of $O( q^{-3/2} )$, we see in particular that for any fixed $\varepsilon > 0$, we will have

$\displaystyle \| \mu^{(n)}\|_{l^2(SL_2({\Bbb Z}_q))} = O( q^{-3/2+\varepsilon} )$ (2)

once n is a sufficiently large multiple of $\log q$.  This can also be seen explicitly from the trace formula

$\displaystyle \| \mu^{(n)}\|_{l^2(SL_2({\Bbb Z}_q))}^2 = \frac{1}{|SL_2({\Bbb Z}_q)|} \sum_j \lambda_j^n$. (3)

In general, this implication between spectral gap and rapid mixing (2) cannot be reversed; the problem is that $\lambda_2$ only directly influences one term in the summation on the right-hand side of (3), and so upper bounds on the left-hand side do not translate efficiently to upper bounds on $\lambda_2$. However, there is an algebraic miracle that happens in the case of groups such as $SL_2({\Bbb Z}_q)$ that allows one to reverse the implication:

Lemma (Frobenius)  Let q be prime.  Then every non-trivial finite-dimensional unitary representation of $SL_2({\Bbb Z}_q)$ has dimension at least (q-1)/2.

Proof. Observe that $SL_2({\Bbb Z}_q)$ can be generated by parabolic elements, so given a non-trivial representation $\rho: SL_2({\Bbb Z}_q) \to U(V)$, there exists a parabolic element a whose representation $\rho(a)$ is non-trivial.  By a change of basis we may take

$\displaystyle a = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$

On the one hand, we have $a^q=1$ and hence $\rho(a)^q=1$; thus all eigenvalues of $\rho(a)$ are $q^{th}$ roots of unity.  On another hand, $\rho(a)$ is non-trivial, so at least one of the eigenvalues of $\rho(a)$ differs from 1.  Thirdly, conjugating a by diagonal matrices in $SL_2({\Bbb Z}_q)$, we see that a is conjugate to $a^m$ whenever m is a quadratic residue mod q, and so the eigenvalues of $\rho(a)$ must be stable under the operation of taking $m^{th}$ powers.  On the other hand, there are (q-1)/2 quadratic residues.  Putting all this together we see that $\rho(a)$ must take at least (q-1)/2 distinct eigenvalues, and the claim follows. $\Box$

Remark. For our purposes, the exact value of $(q-1)/2$ is irrelevant; any multiplicity which grows like a power of q would suffice. $\diamond$

Applying this lemma to the eigenspace of $\lambda_2$, we obtain

Corollary. The second eigenvalue $\lambda_2$ of the operation $f \mapsto f*\mu$ appears with multiplicity at least $(q-1)/2$.

Combining this corollary with (3), one can now reverse the previous implication and obtain a spectral gap $\lambda_2 \leq 1-c$ as soon as one gets a mixing estimate (2) for some $n = O(\log q)$ and some sufficiently small $\varepsilon$.

The task is now to obtain the mixing estimate (2).  The quantity $\|\mu^{(n)}\|_{l^2(SL_2({\Bbb Z}_q))}$ starts at 1 when $n=0$ and decreases with n.  If we assume (as we may) that S generates a free group, then it is not hard to see that $\mu^{(n)}$ expands rapidly for $n \ll \log q$ (because all the words generated by S will be distinct until one encounters the “wrap-around” effect of taking residues modulo q).  Using this one can get a preliminary mixing bound

$\displaystyle \| \mu^{(n)} \|_{l^2(SL_2({\Bbb Z}_q))} \leq q^{-\delta}$

for some absolute constant $\delta > 0$ and some $n = O(\log q)$.  Also, since S modulo q generates all of $SL_2({\Bbb Z}_q)$, we know that the probability measure $\mu^{(n)}$ is not trapped inside any proper subgroup H of $SL_2({\Bbb Z}_q)$; indeed, using the classification of subgroups of $SL_2({\Bbb Z}_q)$ (or some general “escape from subvarieties” machinery of Eskin, Moses, and Oh) one can show that

$\displaystyle \mu^{(n)}(H) \leq q^{-\delta}$

for any such subgroup, and some $n = O(\log q)$.  The result now follows from iterating the following lemma, which is the heart of the argument:

$l^2$ flattening lemma. Let $\nu$ be a symmetric probability measure on $SL_2({\Bbb Z}_q)$ which is a little bit dispersed in the sense that

$\displaystyle \| \nu \|_{l^2(SL_2({\Bbb Z}_q))} \leq q^{-\delta}$

for some $\delta > 0$, and is not concentrated in a subvariety in the sense that $\nu*\nu(H) \leq q^{-\delta}$ for any proper subgroup H of $SL_2({\Bbb Z}_q)$.  Suppose also that $\nu$ is not entirely flat in the sense that

$\displaystyle \| \nu \|_{l^2(SL_2({\Bbb Z}_q))} \geq q^{-3/2+\delta}$

(note that the minimal $l^2$ norm for a probability measure is comparable to $q^{-3/2}$, attained for the uniform distribution).  Then $\nu * \nu$ is “flatter” than $\nu$ in the sense that

$\displaystyle \| \nu * \nu \|_{l^2(SL_2({\Bbb Z}_q))} \leq q^{-\varepsilon} \|\nu\|_{l^2(SL_2({\Bbb Z}_q))}$

for some $\varepsilon > 0$ depending on $\delta$.

In the special case when $\nu$ is the uniform distribution on some set A, the flattening lemma is very close to the following theorem of Helfgott:

Product theorem. Let q be a prime.  Let A be a subset of $SL_2({\Bbb Z}_q)$ which is not too big in the sense that $|A| \leq q^{3-\delta}$ for some $\delta > 0$, and which is not contained in any proper subgroup H of $SL_2({\Bbb Z}_q)$.  Then $|A \cdot A \cdot A| \geq |A|^{1+\varepsilon}$ for some $\varepsilon > 0$ depending on $\delta$.

Indeed, by using some standard additive combinatorics, in particular a (non-commutative version of) the Balog-Szemerédi-Gowers lemma (which can be found for instance in my book with Van Vu), which connects “statistical” multiplication, such as that provided by convolution $\mu, \nu \mapsto \mu * \nu$, with “combinatorial” multiplication, coming from the product set operation $A, B \mapsto A \cdot B$, one can show that these two statements are in fact equivalent to each other.

The product theorem is a manifestation of certain “nonlinear” or “noncommutative” behaviour in the group $SL_2({\Bbb Z}_p)$; see my Milliman lecture for a bit more discussion on this.  For now, let me just say that Helfgott’s proof on this uses a variety of algebraic and combinatorial computations exploiting the special structure of $SL_2({\Bbb Z}_p)$ (especially how commutativity or non-commutativity of various elements in this group interact with the trace of various combinations of these elements), as well as the following sum-product estimate of Bourgain-Katz-Tao and Bourgain-Glibichuk-Konyagin:

Sum-product theorem. Let q be prime. Let A be a subset of ${\Bbb Z}_q$ which is not too big in the sense that $|A| \leq q^{1-\delta}$ for some $\delta > 0$.  Then $|A+A| + |A \cdot A| \geq |A|^{1+\varepsilon}$ for some $\varepsilon > 0$ depending only on $\delta$.

There are now some quite elementary proofs of this theorem, but I will not discuss them here (see e.g. my earlier blog post on this topic).  I should note, though, that the bulk of the Bourgain-Gamburd-Sarnak work is preoccupied with establishing a suitable extension of this sum-product theorem to the case when q is not prime, in a manner which is uniform in the number of prime factors; this turns out to be a surprisingly difficult task.