In this final set of course notes, we discuss how (a generalisation of) the expansion results obtained in the preceding notes can be used for some number-theoretic applications, and in particular to locate almost primes inside orbits of thin groups, following the work of Bourgain, Gamburd, and Sarnak. We will not attempt here to obtain the sharpest or most general results in this direction, but instead focus on the simplest instances of these results which are still illustrative of the ideas involved.

One of the basic general problems in analytic number theory is to locate tuples of primes of a certain form; for instance, the famous (and still unsolved) twin prime conjecture asserts that there are infinitely many pairs {(n_1,n_2)} in the line {\{ (n_1,n_2) \in {\bf Z}^2: n_2-n_1=2\}} in which both entries are prime. In a similar spirit, one of the Landau conjectures (also still unsolved) asserts that there are infinitely many primes in the set {\{ n^2+1: n \in {\bf Z} \}}. The Mersenne prime conjecture (also unsolved) asserts that there are infinitely many primes in the set {\{ 2^n - 1: n \in {\bf Z} \}}, and so forth.

More generally, given some explicit subset {V} in {{\bf R}^d} (or {{\bf C}^d}, if one wishes), such as an algebraic variety, one can ask the question of whether there are infinitely many integer lattice points {(n_1,\ldots,n_d)} in {V \cap {\bf Z}^d} in which all the coefficients {n_1,\ldots,n_d} are simultaneously prime; let us refer to such points as prime points.

At this level of generality, this problem is impossibly difficult. Indeed, even the much simpler problem of deciding whether the set {V \cap {\bf Z}^d} is non-empty (let alone containing prime points) when {V} is a hypersurface {\{ x \in {\bf R}^d: P(x) = 0 \}} cut out by a polynomial {P} is essentially Hilbert’s tenth problem, which is known to be undecidable in general by Matiyasevich’s theorem. So one needs to restrict attention to a more special class of sets {V}, in which the question of finding integer points is not so difficult. One model case is to consider orbits {V = \Gamma b}, where {b \in {\bf Z}^d} is a fixed lattice vector and {\Gamma} is some discrete group that acts on {{\bf Z}^d} somehow (e.g. {\Gamma} might be embedded as a subgroup of the special linear group {SL_d({\bf Z})}, or on the affine group {SL_d({\bf Z}) \ltimes {\bf Z}^d}). In such a situation it is then quite easy to show that {V = V \cap {\bf Z}^d} is large; for instance, {V} will be infinite precisely when the stabiliser of {b} in {\Gamma} has infinite index in {\Gamma}.

Even in this simpler setting, the question of determining whether an orbit {V = \Gamma b} contains infinitely prime points is still extremely difficult; indeed the three examples given above of the twin prime conjecture, Landau conjecture, and Mersenne prime conjecture are essentially of this form (possibly after some slight modification of the underlying ring {{\bf Z}}, see this paper of Bourgain-Gamburd-Sarnak for details), and are all unsolved (and generally considered well out of reach of current technology). Indeed, the list of non-trivial orbits {V = \Gamma b} which are known to contain infinitely many prime points is quite slim; Euclid’s theorem on the infinitude of primes handles the case {V = {\bf Z}}, Dirichlet’s theorem handles infinite arithmetic progressions {V = a{\bf Z} + r}, and a somewhat complicated result of Green, Tao, and Ziegler handles “non-degenerate” affine lattices in {{\bf Z}^d} of rank two or more (such as the lattice of length {d} arithmetic progressions), but there are few other positive results known that are not based on the above cases (though we will note the remarkable theorem of Friedlander and Iwaniec that there are infinitely many primes of the form {a^2+b^4}, and the related result of Heath-Brown that there are infinitely many primes of the form {a^3+2b^3}, as being in a kindred spirit to the above results, though they are not explicitly associated to an orbit of a reasonable action as far as I know).

On the other hand, much more is known if one is willing to replace the primes by the larger set of almost primes – integers with a small number of prime factors (counting multiplicity). Specifically, for any {r \geq 1}, let us call an {r}-almost prime an integer which is the product of at most {r} primes, and possibly by the unit {-1} as well. Many of the above sorts of questions which are open for primes, are known for {r}-almost primes for {r} sufficiently large. For instance, with regards to the twin prime conjecture, it is a result of Chen that there are infinitely many pairs {p,p+2} where {p} is a prime and {p+2} is a {2}-almost prime; in a similar vein, it is a result of Iwaniec that there are infinitely many {2}-almost primes of the form {n^2+1}. On the other hand, it is still open for any fixed {r} whether there are infinitely many Mersenne numbers {2^n-1} which are {r}-almost primes. (For the superficially similar situation with the numbers {2^n+1}, it is in fact believed (but again unproven) that there are only finitely many {r}-almost primes for any fixed {r} (the Fermat prime conjecture).)

The main tool that allows one to count almost primes in orbits is sieve theory. The reason for this lies in the simple observation that in order to ensure that an integer {n} of magnitude at most {x} is an {r}-almost prime, it suffices to guarantee that {n} is not divisible by any prime less than {x^{1/(r+1)}}. Thus, to create {r}-almost primes, one can start with the integers up to some large threshold {x} and remove (or “sieve out”) all the integers that are multiples of any prime {p} less than {x^{1/(r+1)}}. The difficulty is then to ensure that a sufficiently non-trivial quantity of integers remain after this process, for the purposes of finding points in the given set {V}.

The most basic sieve of this form is the sieve of Eratosthenes, which when combined with the inclusion-exclusion principle gives the Legendre sieve (or exact sieve), which gives an exact formula for quantities such as the number {\pi(x,z)} of natural numbers less than or equal to {x} that are not divisible by any prime less than or equal to a given threshold {z}. Unfortunately, when one tries to evaluate this formula, one encounters error terms which grow exponentially in {z}, rendering this sieve useful only for very small thresholds {z} (of logarithmic size in {x}). To improve the sieve level up to a small power of {x} such as {x^{1/(r+1)}}, one has to replace the exact sieve by upper bound sieves and lower bound sieves which only seek to obtain upper or lower bounds on quantities such as {\pi(x,z)}, but contain a polynomial number of terms rather than an exponential number. There are a variety of such sieves, with the two most common such sieves being combinatorial sieves (such as the beta sieve), based on various combinatorial truncations of the inclusion-exclusion formula, and the Selberg upper bound sieve, based on upper bounds that are the square of a divisor sum. (There is also the large sieve, which is somewhat different in nature and based on {L^2} almost orthogonality considerations, rather than on any actual sieving, to obtain upper bounds.) We will primarily work with a specific sieve in this notes, namely the beta sieve, and we will not attempt to optimise all the parameters of this sieve (which ultimately means that the almost primality parameter {r} in our results will be somewhat large). For a more detailed study of sieve theory, see the classic text of Halberstam and Richert, or the more recent texts of Iwaniec-Kowalski or of Friedlander-Iwaniec.

Very roughly speaking, the end result of sieve theory is that excepting some degenerate and “exponentially thin” settings (such as those associated with the Mersenne primes), all the orbits which are expected to have a large number of primes, can be proven to at least have a large number of {r}-almost primes for some finite {r}. (Unfortunately, there is a major obstruction, known as the parity problem, which prevents sieve theory from lowering {r} all the way to {1}; see this blog post for more discussion.) One formulation of this principle was established by Bourgain, Gamburd, and Sarnak:

Theorem 1 (Bourgain-Gamburd-Sarnak) Let {\Gamma} be a subgroup of {SL_2({\bf Z})} which is not virtually solvable. Let {f: {\bf Z}^4 \rightarrow {\bf Z}} be a polynomial with integer coefficients obeying the following primitivity condition: for any positive integer {q}, there exists {A \in \Gamma \subset {\bf Z}^4} such that {f(A)} is coprime to {q}. Then there exists an {r \geq 1} such that there are infinitely many {A \in \Gamma} with {f(A)} non-zero and {r}-almost prime.

This is not the strongest version of the Bourgain-Gamburd-Sarnak theorem, but it captures the general flavour of their results. Note that the theorem immediately implies an analogous result for orbits {\Gamma b \subset {\bf Z}^2}, in which {f} is now a polynomial from {{\bf Z}^2} to {{\bf Z}}, and one uses {f(Ab)} instead of {f(A)}. It is in fact conjectured that one can set {r=1} here, but this is well beyond current technology. For the purpose of reaching {r=1}, it is very natural to impose the primitivity condition, but as long as one is content with larger values of {r}, it is possible to relax the primitivity condition somewhat; see the paper of Bourgain, Gamburd, and Sarnak for more discussion.

By specialising to the polynomial {f: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \rightarrow abcd}, we conclude as a corollary that as long as {\Gamma} is primitive in the sense that it contains matrices with all coefficients coprime to {q} for any given {q}, then {\Gamma} contains infinitely many matrices whose elements are all {r}-almost primes for some {r} depending only on {\Gamma}. For further applications of these sorts of results, for instance to Appolonian packings, see the paper of Bourgain, Gamburd, and Sarnak.

It turns out that to prove Theorem 1, the Cayley expansion results in {SL_2(F_p)} from the previous set of notes are not quite enough; one needs a more general Cayley expansion result in {SL_2({\bf Z}/q{\bf Z})} where {q} is square-free but not necessarily prime. The proof of this expansion result uses the same basic methods as in the {SL_2(F_p)} case, but is significantly more complicated technically, and we will only discuss it briefly here. As such, we do not give a complete proof of Theorem 1, but hopefully the portion of the argument presented here is still sufficient to give an impression of the ideas involved.

— 1. Combinatorial sieving —

In this section we set up the combinatorial sieve needed to establish Theorem 1. To motivate this sieve, let us focus first on a much simpler model problem, namely the task of estimating the number {\pi(x,z)} of natural numbers less than or equal to a given threshold {x} which are not divisible by any prime less than or equal to {z}. Note that for {z} between {\sqrt{x}} and {x}, {\pi(x,z)} is simply the number of primes in the interval {(z,x]}; but for {z} less than {\sqrt{x}}, {\pi(x,z)} also counts some almost primes in addition to genuine primes. This quantity can be studied quite precisely by a variety of tools, such as those coming from multiplicative number theory; see for instance this paper of Granville and Soundararajan for some of the most precise results in this direction.

The quantity {\pi(x,z)} is easiest to estimate when {z} is small. For instance, {\pi(x,1)} is simply the number of natural numbers less than {x}, and so

\displaystyle \pi(x,1) = x + O(1).

Similarly, {\pi(x,2)} is the number of odd numbers less than {x}, and so

\displaystyle \pi(x,2) = \frac{1}{2} x + O(1).

Carrying this further, {\pi(x,3)} is the number of numbers less than {x} that are coprime to {6}, and so

\displaystyle \pi(x,3) = \frac{1}{3} x + O(1)

(but note that the implied constant in the {O(1)} error is getting increasingly large). Continuing this analysis, it is not hard to see that

\displaystyle \pi(x,z) = (\prod_{p \leq z} (1-\frac{1}{p})) x + O_z(1)

for any fixed {z}; note from Mertens’ theorem that

\displaystyle \prod_{p \leq z} (1-\frac{1}{p}) = \frac{e^{\gamma} + o(1)}{\log z} \ \ \ \ \ (1)

leading to the heuristic approximation

\displaystyle \pi(x,z) \approx e^\gamma \frac{x}{\log z}

where {\gamma= 0.577\ldots} is the Euler-Mascheroni constant. Note though that this heuristic should be treated with caution when {z} is large; for instance, from the prime number theorem we see that we have the conflicting asymptotic

\displaystyle \pi(x,z) = (1+o(1)) \frac{x}{\log x}

when {\sqrt{x} \leq z \leq o(x)}. This is already a strong indication that one needs to pay careful attention to the error terms in this analysis. (Indeed, many false “proofs” of conjectures in analytic number theory, such as the twin prime conjecture, have been based on a cavalier attitude to such error terms, and their asymptotic behaviour under various limiting regimes.)

Let us thus work more carefully to control the error term {O_z(1)}. Write {P(z) := \prod_{p \leq z} p} for the product of all the primes less than or equal to {z} (this quantity is also known as the primorial of {z}). Then we can write

\displaystyle \pi(x,z) = \sum_{n \leq x} 1_{(n,P(z))=1}

where the sum ranges over natural numbers {n} less than {x}, and {(n,P(z))} is the greatest common divisor of {n} and {P(z)}. The function {1_{(n,P(z))=1}} is periodic of period {P(z)}, and is equal to {1} on {(\prod_{p\leq z} (1-\frac{1}{p})) P(z)} of the residue classes modulo {P(z)}, which leads to the crude bound

\displaystyle \pi(x,z) = (\prod_{p \leq z} (1-\frac{1}{p})) x + O(P(z)). \ \ \ \ \ (2)

However, this error term is too large for most applications: from the prime number theorem, we see that {P(z) = \exp((1+o(1)) z)}, so the error term grows exponentially in {z}. In particular, this estimate is only non-trivial in the regime {z = O( \log x )}.

One can do a little better than this by using the inclusion-exclusion principle, which in this context is also known as the Legendre sieve. Consider for instance {\pi(x,3)}, which counts the number of natural numbers {n \leq x} coprime to {P(3) = 2 \times 3}. We can compute this quantity by first counting all numbers less than {x}, then subtracting those numbers divisible by {2} and by {3}, and then adding back those numbers divisible by both {2} and {3}. A convenient way to describe this procedure in general is to introduce the Möbius function {\mu(n)}, defined to equal {(-1)^k} when {n} is the product of {k} distinct primes for some {k \geq 0}. The key point is that

\displaystyle 1_{n=1} = \sum_{d|n} \mu(d) \ \ \ \ \ (3)

for any natural number {n}, where {d} ranges over the divisors of {n}; indeed, this identity can be viewed as an alternate way to define the Möbius function. In particular, {1_{(n,P(z))=1} = \sum_{d|P(z)} \mu(d) 1_{d|n}}, leading to the Legendre identity

\displaystyle \pi(x,z) = \sum_{d|P(z)} \mu(d) \sum_{n \leq x; d|n} 1.

The inner sum can be easily estimated as

\displaystyle \sum_{n \leq x; d|n} 1 = \frac{x}{d} + O(1); \ \ \ \ \ (4)

since {P(z)} has {2^{\pi(z)}} distinct factors, where {\pi(z)} is the number of primes less than or equal to {z}, we conclude that

\displaystyle \pi(x,z) = \sum_{d|P(z)} \mu(d) \frac{x}{d} + O( 2^{\pi(z)} ).

The main term here can be factorised as

\displaystyle \sum_{d|P(z)} \mu(d) \frac{x}{d} = x \prod_{p \leq z} (1 - \frac{1}{p}) \ \ \ \ \ (5)

leading to the following slight improvement

\displaystyle \pi(x,z) = (\prod_{p \leq z} (1-\frac{1}{p})) x + O(2^{\pi(z)})

to (2). Note from the prime number theorem that {2^{\pi(z)} = O( \exp( O( z / \log z ) ) )}, so this error term is asymptotically better than the one in (2); the bound here is now non-trivial in the slightly larger regime {z = O( \log x \log \log x )}. But this is still not good enough for the purposes of counting almost primes, which would require {z} as large as a power of {x}.

To do better, we will replace the exact identity (3) by combinatorial truncations

\displaystyle \sum_{d|n: d \in {\mathcal D}_-} \mu(d) \leq 1_{n=1} \leq \sum_{d|n: d \in {\mathcal D}_+} \mu(d) \ \ \ \ \ (6)

of that identity, where {n} divides {P(z)} and {{\mathcal D}_-, {\mathcal D}_+} are sets to be specified later, leading to the upper bound sieve

\displaystyle \pi(x,z) \leq \sum_{d|P(z); d \in {\mathcal D}_+} \mu(d) \frac{x}{d} + O( |{\mathcal D}_+| ) \ \ \ \ \ (7)

and the lower bound sieve

\displaystyle \pi(x,z) \geq \sum_{d|P(z); d \in {\mathcal D}_-} \mu(d) \frac{x}{d} + O( |{\mathcal D}_-| ). \ \ \ \ \ (8)

The key point will be that {{\mathcal D}_+} and {{\mathcal D}_-} can be chosen to be only polynomially large in {z}, rather than exponentially large, without causing too much damage to the main terms {\sum_{d|P(z); d \in D_\pm} \mu(d) \frac{x}{d}}, which lead to upper and lower bounds on {\pi(x,z)} that remain non-trivial for moderately large values of {z} (e.g. {z = x^{1/(r+1)}} for some fixed {r}).

We now turn to the task of locating reasonably small sets {{\mathcal D}_+, {\mathcal D}_-} obeying (6). We begin with(3), which we rewrite as

\displaystyle 1_{n=1} = \sum_{d|P(z)} \mu(d) 1_{d|n} \ \ \ \ \ (9)

for {n} a divisor of {P(z)}. One can view the divisors of {P(z)} as a {\pi(z)}-dimensional combinatorial cube, with the right-hand side in (9) being a sum over that cube; the idea is then to hack off various subcubes of that cube in a way that only serves to increase (for the upper bound sieve) or decrease (for the lower bound sieve) that sum, until only a relatively small portion of the cube remains.

We turn to the details. Our starting point will be the identity

\displaystyle \sum_{d|P(z): d = p_1 \ldots p_k d', d' | P(p_k)} \mu(d) 1_{d|n} = (-1)^k 1_{n=p_1 \ldots p_k m; (m,P(p_k)=1)} \ \ \ \ \ (10)

whenever {z \geq p_1 > p_2 > \ldots > p_k} are primes, which follows easily from applying (3) to {n / (p_1 \ldots p_k)} when {p_1,\ldots,p_k} divide {n}. One can view the left-hand side of (10) as a subsum of the sum in (9), and (10) implies that this subsum is non-negative when {k} is even and non-positive when {k} is odd. In particular, we see that (6) will hold when {{\mathcal D}_+} is formed from the “cube” {\{ d: d|P(z)\}} by removing some disjoint “subcubes” of the form {\{ d =p_1 \ldots p_k d': d' | P(p_k) \}} for {z \geq p_1 > \ldots > p_k} and {k} odd, and similarly for {{\mathcal D}_-} but with {k} now required to be even instead of odd.

Observe that the subcube {\{ d =p_1 \ldots p_k d': d' | P(p_k) \}} consists precisely of those divisors {d} of {P(z)} whose top {k} prime factors are {p_1,\ldots,p_k}. We now have the following general inequality:

Lemma 2 (Combinatorial sieve) Let {z>0}. For each natural number {k}, let {A_k( p_1,\ldots,p_k)} be a predicate pertaining to {k} decreasing primes {z \geq p_1>\ldots>p_k} (thus {A_k(p_1,\ldots,p_k)} is either true or false for each choice of {p_1,\ldots,p_k}). Let {{\mathcal D}_+} be the set of all natural numbers {n|P(z)} which, when factored as {n = p_1 \ldots p_r} for {z \geq p_1 > \ldots > p_r}, is such that {A_k(p_1,\ldots,p_k)} holds for all odd {1 \leq k \leq r}. Similarly define {{\mathcal D}_-} by requiring {k} to be even instead of odd. Then (6) holds for all {n|P(z)}.

Proof: {{\mathcal D}_+} is formed from {\{ d: d|P(z)\}} by removing those subcubes of the form {\{ d =p_1 \ldots p_k d': d' | P(p_k) \}} for {z \geq p_1 > \ldots > p_k}, {k} odd, and such that {A_{k'}(p_1,\ldots,p_{k'})} holds for all odd {1 \leq k' < k} but fails for {k'=k}. These subcubes are all disjoint, and so the claim for {{\mathcal D}_+} follows from the preceding discussion. Similarly for {{\mathcal D}_-}. \Box

This gives us the upper and lower bounds (7), (8) for {\pi(x,z)}. To make these bounds useful, we need to choose {{\mathcal D}_\pm} so that the partial sums {\sum_{d|P(z); d \in {\mathcal D}_\pm} \mu(d) \frac{x}{d}} are close to

\displaystyle \sum_{d|P(z)} \mu(d) \frac{x}{d} = x \prod_{p \leq z} (1 - \frac{1}{p}).

To do this, one must select the predicates {A_k(p_1,\ldots,p_k)} carefully. The best choices for these predicates are not immediately obvious; but after much trial and error, it was discovered that one fairly efficient choice is to let {A_k(p_1,\ldots,p_k)} be the predicate

\displaystyle p_1 \ldots p_{k-1} p_k^{\beta+1} < y

for some moderately large parameter {\beta \geq 2} (we will eventually take {\beta := 10}) and some parameter {y := z^s} for some {s > \beta} to be optimised in later (we will eventually take it to be almost as large as {x}). The use of this choice is referred to as the beta sieve.

Let us now estimate the errors

\displaystyle |\sum_{d|P(z)} \mu(d) \frac{x}{d} - \sum_{d|P(z): d \in {\mathcal D}_\pm} \mu(d) \frac{x}{d}|. \ \ \ \ \ (11)

For sake of argument let us work with {{\mathcal D}_-}, as the {{\mathcal D}_+} case is almost identical. By the triangle inequality, we can bound this error by

\displaystyle \sum_{k \hbox{ even}} \sum^* |\sum_{d = p_1 \ldots p_k d': d' |P(p_k)} \mu(d) \frac{x}{d}|

where {k} ranges over positive even integers, and {\sum^*} denotes a sum over primes {z \geq p_1 > \ldots > p_k} ranges over primes such that

\displaystyle p_1 \ldots p_{k'-1} p_{k'}^{\beta+1} < y \ \ \ \ \ (12)

for all even {k' < k}, but

\displaystyle p_1 \ldots p_{k-1} p_{k}^{\beta+1} \geq y. \ \ \ \ \ (13)

Since {p_1,\ldots,p_k \leq z} and {y = z^s}, this in particular gives the bound

\displaystyle k \geq s - \beta.

From (12) we have

\displaystyle p_1 \ldots p_{k'-1} p_{k'}^{\beta} < y

for all {1 \leq k' < k} (not necessarily even); note that the case {k'=1} follows from the hypothesis {y > z^\beta}. We can rewrite this inequality as

\displaystyle \frac{y}{p_1 \ldots p_{k'}} > (\frac{y}{p_1 \ldots p_{k'-1}})^{\frac{\beta-1}{\beta}}

and hence by induction

\displaystyle \frac{y}{p_1 \ldots p_{k'}} > y^{(\frac{\beta-1}{\beta})^{k'-1}}

for all {1 \leq k'< k}. From (13) we then have

\displaystyle p_k > y^{\frac{1}{\beta+1} (\frac{\beta-1}{\beta})^{k-1}} > y^{\frac{1}{\beta}(\frac{\beta-1}{\beta})^k} > z^{(\frac{\beta-1}{\beta})^k}.

We conclude that the error (11) is bounded by

\displaystyle \sum_{k=1}^\infty \sum_{z \geq p_1 > \ldots > p_k > z^{(\frac{\beta-1}{\beta})^k}} |\sum_{d = p_1 \ldots p_k d': d' |P(p_k)} \mu(d) \frac{x}{d}|

in the {{\mathcal D}_+}; a similar argument also gives this bound in the {{\mathcal D}_-} case. The inner sum can be computed as

\displaystyle |\sum_{d = p_1 \ldots p_k d': d' |P(p_k)} \mu(d) \frac{x}{d}| = \frac{x}{p_1 \ldots p_k} \prod_{p < p_k} (1 - \frac{1}{p})

and thus by Mertens’ theorem (1) and the bound {p_k > z^{(\frac{\beta-1}{\beta})^k}} we have

\displaystyle |\sum_{d = p_1 \ldots p_k d': d' |P(p_k)} \mu(d) \frac{x}{d}| \ll (\frac{\beta}{\beta-1})^k \frac{x}{p_1 \ldots p_k \log z}.

We have thus bounded (11) by

\displaystyle \ll \frac{x}{\log z} \sum_{k \geq s-\beta} (\frac{\beta}{\beta-1})^k \sum_{z \geq p_1 > \ldots > p_k > z^{(\frac{\beta-1}{\beta})^k}} \frac{1}{p_1 \ldots p_k}.

The inner sum can be bounded by

\displaystyle \frac{1}{k!} (\sum_{z \geq p > z^{(\frac{\beta-1}{\beta})^k}} \frac{1}{p})^k.

By another of Mertens’ theorems (or by taking logarithms of (1)) one has

\displaystyle \sum_{z \geq p > z^{(\frac{\beta-1}{\beta})^k}} \frac{1}{p} \leq k \log \frac{\beta}{\beta-1} + O(1)

and so (11) is bounded by

\displaystyle \ll \frac{x}{\log z} \sum_{k \geq s-\beta} \frac{1}{k!} ( k \frac{\beta}{\beta-1} \log \frac{\beta}{\beta-1} + O(1) )^k.

Using the crude bound {k! \geq \frac{k^k}{e^k}} (as can be seen by considering the {k^{th}} term in the Taylor expansion of {e^k}) we conclude the bound

\displaystyle \ll \frac{x}{\log z} \sum_{k \geq s-\beta} ( e \frac{\beta}{\beta-1} \log \frac{\beta}{\beta-1} + O(\frac{1}{k}) )^k.

If {\beta} is large enough ({\beta=10} will suffice) then the expression {e \frac{\beta}{\beta-1} \log \frac{\beta}{\beta-1}} is less than {1/e}; since {(1 + O(1/k))^k = O(1)}, this leads to the bound

\displaystyle \ll \frac{x}{\log z} \sum_{k \geq s-\beta} e^{-k}

which after summing the geometric series becomes

\displaystyle \ll e^{-s} \frac{x}{\log z}

(allowing implied constants to depend on {\beta}). From this bound on (11) and (5), (1) we have We thus have

\displaystyle \sum_{d|P(z): d \in {\mathcal D}_\pm} \mu(d) \frac{x}{d} = \frac{x}{\log z}( e^\gamma + O( e^{-s} ) + o(1) ).

Finally, if {d = p_1 \ldots p_k} is an element of {{\mathcal D}_\pm}, then by (12) and the hypothesis {\beta \geq 2} we have

\displaystyle d = p_1 \ldots p_k \leq y

and so we have the crude upper bounds {|{\mathcal D}_\pm| \leq y}. From (7), (8) and recalling that {y = z^s}, we thus have

\displaystyle \pi(x,z) = \frac{x}{\log z}( e^\gamma + O( e^{-s} ) + o(1) ) + O(z^s).

If {x > z^{11}}, we may optimise in {s} by setting {s := \frac{\log x}{\log z} - 1} (in order to make the final error term much less than {x}), leading to the bound

\displaystyle \pi(x,z) = \frac{x}{\log z}( e^\gamma + O( e^{-\log x/\log z} ) + o(1) ).

In particular, we have

\displaystyle \frac{x}{\log z} \ll \pi(x,z) \ll \frac{x}{\log z} \ \ \ \ \ (14)

whenever {2 \leq z \leq x^\epsilon} for some sufficiently small absolute constant {\epsilon>0}.

Remark 1 The bound (14) implies, among other things, that there exists an absolute constant {r} such that the number of {r}-almost primes less than {x} is {\gg x/\log x}, which is a very weak version of the prime number theorem. Note though that the upper bound in (14) does not directly imply a corresponding upper bound on this count of {r}-almost primes, because {r}-almost primes are allowed to have prime factors that are less than {x}. Indeed, a routine computation using Mertens’ theorem shows that for any fixed {r}, the number of {r}-almost primes less than {x} is comparable to {\frac{x}{\log x} (\log \log x)^{r-1}}.

We can generalise the above argument as follows:

Exercise 1 (Beta sieve) Let {a_n} be an absolutely convergent sequence of non-negative reals for {n \geq 1}. Let {x > 1}, {\kappa \geq 1}, and {\epsilon > 0}. Let {g: {\bf N} \rightarrow {\bf R}^+} be a multiplicative function taking values between {0} and {1}, with {g(p)<1} for all primes {p}. Assume the following axioms:

  • (i) (Control in arithmetic progressions) For any {d \leq x^\epsilon}, one has

    \displaystyle \sum_{n: d|n} a_n = g(d) x + O( x^{1-\epsilon} ).

  • (ii) (Mertens type theorem) For all {2 \leq z \leq x^\epsilon}, one has

    \displaystyle \frac{1}{\log^\kappa z} \ll \prod_{p \leq z} (1-g(p)) \ll \frac{1}{\log^\kappa z}. \ \ \ \ \ (15)

Conclude that there is an {\epsilon'>0} depending only on {\kappa,\epsilon}, and the implied constants in the above axioms, such that

\displaystyle \frac{x}{\log^\kappa z} \ll \sum_{n: (n,P(z))=1} a_n \ll \frac{x}{\log^\kappa z}

whenever {2 \leq z \leq x^{\epsilon'}}, and the implied constants may depend on {\kappa, \epsilon}, and the implied constants in the above axioms. (Note that (14) corresponds to the case when {\kappa := 1}, {g(n) := 1/n}, and {a_n := 1_{n \leq x}}.)

Exercise 2 Suppose we have the notation and hypotheses of the preceding exercise, except that the estimate (15) is replaced by the weaker bound

\displaystyle g(p) \leq \frac{\kappa}{p} + O(\frac{1}{p^2}) \ \ \ \ \ (16)

for all sufficiently large {p}. (For small {p}, note that we still have the bound {g(p)<1}.) Show that we still have the lower bound

\displaystyle \sum_{n: (n,P(z))=1} a_n \gg \frac{x}{\log^\kappa z}

whenever {2 \leq z \leq x^{\epsilon'}} for sufficiently small {\epsilon'} (which may depend on {g}), where the implied constant is now allowed to depend on {g}. (Hint: the main trick here is to extract out a common factor of {\prod_{p \leq z} (1-g(p))} from the analysis first, and then use the bound (16) to upper bound quantities such as {\prod_{p_k \leq p \leq z} (1-g(p))^{-1}}.)

One can weaken the axioms somewhat and still obtain non-trivial results from the beta sieve, but this somewhat crude version of the sieve will suffice for our purposes. Another, more abstract, formalisation of the above argument (involving a construction of sets {{\mathcal D}_\pm} obeying (6) and a number of other desirable properties) is sometimes referred to as the fundamental lemma of sieve theory.

Exercise 3 (Twin almost primes) Let {\pi_2(x,z)} be the number of integers {n} between {1} and {x} such that {n} and {n+2} are both coprime to {P(z)}.

  • (i) Show that

    \displaystyle \frac{x}{\log^2 z} \ll \pi_2(x,z) \ll \frac{x}{\log^2 z}

    if {2 \leq z \leq x^\epsilon}, and {\epsilon>0} is a sufficiently small absolute constant.

  • (ii) Show that there exists an {r \geq 1} such that there are infinitely many pairs {n,n+2} which are both {r}-almost primes. (Indeed, the argument here allows one to take {r=20} without much effort, and by working considerably harder to optimise everything, one can lower {r} substantially, although the parity problem mentioned earlier prevents one from taking {r} below {2}.)
  • (iii) Establish Brun’s theorem that the sum of reciprocals of the twin primes is convergent.

Exercise 4 (Landau conjecture for almost primes) Let {\pi_*(x,z)} be the number of integers {n} between {1} and {x} such that {n^2+1} is coprime to {P(z)}.

  • (i) Show that

    \displaystyle \frac{x}{\log z} \ll \pi_2(x,z) \ll \frac{x}{\log z}

    if {2 \leq z \leq x^\epsilon}, and {\epsilon>0} is a sufficiently small absolute constant. (Hint: you will need the fact that {-1} is a quadratic residue mod {p} if and only if {p \neq 3 \hbox{ mod } 4}, and Merten’s theorem for arithmetic progressions, which among other things asserts that {\sum_{p \leq x: p=1 \hbox{ mod } 4} \frac{1}{p} = \frac{1}{2} \log x + O(1)}.)

  • (ii) Show that there exists an {r \geq 1} such that there are infinitely natural numbers {n} such that {n^2+1} is an {r}-almost primes.

Exercise 5 Let {P: {\bf Z} \rightarrow {\bf Z}} be a polynomial with integer coefficients and degree {k}. Assume that {P} is primitive in the sense that for each natural number {q}, there exists a natural number {n} such that {P(n)} is coprime to {q}. Show that there exists an {r} depending only on {P} such that for all sufficiently large {x}, there are at least {\gg_P x / \log^k x} natural numbers {n} less than {x} such that {P(n)} is an {r}-almost prime.

In many cases (e.g. if {P} is irreducible) one can decrease the power of {\log x} here (as in Exercise 4), by using tools such as Landau’s prime ideal theorem; see this previous blog post for some related discussion.

Remark 2 The combinatorial sieve is not the only type of sieve used in sieve theory. Another popular choice is the Selberg upper bound sieve, in which the starting point is not the combinatorial inequalities (6), but rather the variant

\displaystyle 1_{n=1} \leq (\sum_{d|n} \lambda_d)^2

where the {\lambda_d} are arbitrary real parameters with {\lambda_1 := 1}, typically supported up to some level {d < y}. By optimising the choice of weights {\lambda_d}, the Selberg sieve can lead to upper bounds on quantities such as {\pi(x,z)} which are competitive with the beta sieve (particularly when {z} is moderately large), although it is more difficult for this sieve to produce matching lower bounds. A somewhat different type of sieve is the large sieve, which does not upper bound or lower bound indicator functions such as {1_{n=1}} directly, but rather controls the size of a function that avoids many residue classes by exploiting the {L^2} properties of these residue classes, such as almost orthogonality phenomena or Fourier uncertainty principles. See this text of Friedlander and Iwaniec for a much more thorough discussion and comparison of these sieves.

— 2. The strong approximation property —

For any natural number {q}, let {\pi_q: SL_2({\bf Z}) \rightarrow SL_2({\bf Z}/q{\bf Z})} be the obvious projection homomorphism. An easy application of Bezout’s theorem (or the Euclidean algorithm) shows that this map is surjective. From the Chinese remainder theorem, we also have {SL_2({\bf Z}/q{\bf Z}) \equiv SL_2({\bf Z}/q_1 {\bf Z}) \times SL_2({\bf Z}/q_2{\bf Z})} whenever {q=q_1q_2} and {q_1,q_2} are coprime.

To set up the sieve needed to establish Theorem 1, we need to understand the images {\pi_q(\Gamma)} of a non-virtually-solvable subgroup {\Gamma} of {SL_2({\bf Z})}. Clearly this is a subgroup of {SL_2({\bf Z}/q{\bf Z})}. Given that {\Gamma} is fairly “large” (in particular, such groups can be easily seen to be Zariski-dense in {SL_2}), we expect that in most cases {\pi_q(\Gamma)} is in fact all of {SL_2({\bf Z}/q{\bf Z})}. This type belief is formalised in general as the strong approximation property. We will not prove the most general instance of this property, but instead focus on the model case of {SL_2({\bf Z}/q{\bf Z})} for {q} square-free, in which one can proceed by ad hoc elementary arguments. The general treatment of the strong approximation property was first achieved by Matthews, Vaserstein, and Weisfeiler using the classification of finite simple groups; a subsequent paper of Nori gave an alternate treatment that avoided the use of this classification.

In the previous set of notes (see Remark 2) it was already observed that {\pi_p(\Gamma) =SL_2({\bf Z}/p{\bf Z})} for all sufficiently large primes {p}. (Indeed, {\Gamma} did not need to be free for this to hold; it was enough that {\Gamma} not be virtually solvable.) To extend from the prime case to the (square-free) composite case, we will need some basic group theory, and in particular the theory of composition factors.

Define a composition series for a group {G} to be a finite sequence

\displaystyle \{1\}= H_0\lhd H_1 \lhd \ldots\lhd H_n = G

of subgroups, where each {H_i} is a normal subgroup of {H_{i+1}}, and the quotients {H_{i+1}/H_i} are all simple. (By convention, we do not consider the trivial group to be simple.) The quotients {H_{i+1}/H_i} for {i=1,\ldots,n-1} are referred to as the composition factors of this series.

Exercise 6 Show that every finite group has at least one composition series.

A key fact about composition factors, known as the Jordan-Holder theorem, asserts that, up to permutation and isomorphism, they are independent of the choice of series:

Theorem 3 (Jordan-Holder theorem) Let

\displaystyle \{1\}= H_0\lhd H_1 \lhd \ldots\lhd H_n = G

and

\displaystyle \{1\}= K_0\lhd K_1 \lhd \ldots\lhd K_m = G

be two composition series of the same group {G}. Then there is a bijection {\sigma: \{0,\ldots,n-1\} \rightarrow\{0,\ldots,m-1\}} such that for each {i=0,\ldots,n-1}, {H_{i+1}/H_{i}} is isomorphic to {K_{\phi(i+1)}/K_{\phi(i)}}. (In particular, {n} and {m} must be equal.)

Proof: By symmetry we may assume that {n \leq m}. Fix {0 \leq i < n}. Let {\pi_i: H_{i+1} \rightarrow H_{i+1}/H_i} be the quotient map, and consider the groups {A_j^{(i)} := \pi_i(H_{i+1} \cap K_j) \equiv (H_{i+1} \cap K_j)/(H_i \cap K_j)} for {j=0,\ldots,m}. These are an increasing family of subgroups of {H_{i+1}/H_i}, with {A^{(i)}_0 = \{1\}} and {A^{(i)}_m = H_{i+1}/H_i}. Since each {K_j} is a normal subgroup of {K_{j+1}}, we see that {A^{(i)}_j} is a normal subgroup of {A^{(i)}_{j+1}}. As {A^{(i)}_m} is simple, this implies that there is a unique element {\sigma(i)} of {\{0,\ldots,m-1\}} such that {A^{(i)}_j} is trivial for {j \leq \sigma(i)} and {A^{(i)}_j} is equal to {H_{i+1}/H_i} for {j > \sigma(i)}.

Now we claim that {\sigma} is a bijection. Suppose this is not the case. Since {n \leq m}, there thus exists {j \in \{0,\ldots,m-1\}} which is not in the range of {\sigma}. This implies that {A^{(i)}_j = A^{(i)}_{j+1}} for all {i}. An induction on {i} then shows that {H_i \cap K_j = H_i \cap K_{j+1}} for all {i}, and thus {K_j = K_{j+1}}, contradicting the assumption that {K_{j+1}/K_j} is simple.

Finally, fix {i_0 \in \{0,\ldots,n-1\}}, and let {j_0 := \sigma(i_0)}. Then we have {A^{(i)}_{j_0} = A^{(i)}_{j_0+1}} for all {i \neq i_0}, while {A^{(i_0)}_{j_0} = \{1\}} and {A^{(i_0)}_{j_0+1} \equiv H_{i_0+1}/H_{i_0}}. From this and induction we see that {(H_i \cap K_{j_0+1})/(H_i \cap K_{j_0})} is trivial for {i \leq i_0} but isomorphic to {H_{i_0+1}/H_{i_0}} for {i>i_0}. (Here we are basically relying on a special case of the Zassenhaus lemma.) In particular, {K_{j_0+1}/K_{j_0}} is isomorphic to {H_{i_0+1}/H_{i_0}}, and the claim follows. \Box

In view of this theorem, we can assign to each finite group a set (or more precisely, multiset) of composition factors of simple groups, which are unique up to permutation and isomorphism. This is somewhat analogous to how the fundamental theorem of arithmetic assigns to each positive integer a multiset of prime numbers, which are unique up to permutation. (Indeed, the latter can be viewed as the special case of the former in the case of cyclic groups.)

Exercise 7 Show that for {p \geq 5} a prime, the composition factors of {SL_2(F_p)} are (up to isomorphism and permutation) the cyclic group {{\bf Z}/2{\bf Z}} and the projective special linear group {PSL_2(F_p)}. What happens instead when {p = 2} or {p=3}?

Also, show that the only normal subgroup of {SL_2(F_p)} (other than the trivial group and all of {SL_2(F_p)}) is the center {Z(SL_2(F_p))\equiv {\bf Z}/2{\bf Z}} of the group. Thus, we see (in contrast with the fundamental theorem of arithmetic) that one cannot permute the composition factors arbitrarily.

Exercise 8 Let {N} be a normal subgroup of a finite group {G}. Show that the set of composition factors of {G} is equal to (up to isomorphism, and counting multiplicity) the union of the set of composition factors of {N}, and the set of composition factors of {G/N}. In particular, the set of composition factors of {N} and of {G/N} are subsets of the set of composition factors of {G} (again up to isomorphism, and counting multiplicity). As another corollary, we see that the composition factors of a direct product {G \times H} or semidirect product {G \ltimes H} of two finite groups {G, H} is the union of the set of composition factors of {G} and {H} separately (again up to isomorphism, and counting multiplicity).

Knowing the composition factors of a group can assist in classifying its subgroups; in particular, groups which are “coprime” in the sense of having no composition factors in common are difficult to “join” together. (Interestingly, the phenomenon of “coprimality” implying “disjointness” also shows up in ergodic theory, in the theory of joinings, but we will not discuss this further here.) Here is an example of this which will be of importance in our application:

Lemma 4 Let {p \geq 5} be a prime, and {G} be a finite group which does not have a copy of {PSL_2(F_p)} amongst its composition factors. Let {H} be a subgroup of {G \times SL_2(F_p)} whose projections to {G} and {SL_2(F_p)} are surjective. Then {H} is all of {G \times SL_2(F_p)}.

Proof: We apply Goursat’s lemma (see Exercise 9 below). Thus if {N_1 := \{ g \in G: (g,1) \in H\}} and {N_2 := \{ h \in SL_2(F_p): (1,h) \in H \}}, then {N_1,N_2} are normal subgroups of {G, SL_2(F_p)} respectively such that {G/N_1} is isomorphic to {SL_2(F_p)/N_2}. From Exercise 7 we see that {N_2} is either trivial, all of {SL_2(F_p)}, or is the center {Z(SL_2(F_p))}.

If {N_2} is trivial, then {SL_2(F_p)} is isomorphic to a quotient of {G}, and thus by Exercise 8 the composition factors of {SL_2(F_p)} are a subset of those of {G}. But this is a contradiction, since {PSL_2(F_p)} is a composition factor of {SL_2(F_p)} but not of {G}. Similarly if {N_2} is the center, since {SL_2(F_p)/N_2} is then isomorphic to {PSL_2(F_p)}. So the only remaining case is when {N_2} is all of {SL_2(F_p)}. But then as {H} surjects onto {G}, we see that {H} is all of {G \times SL_2(F_p)} and we are done. \Box

Exercise 9 (Goursat’s lemma) Let {G_1,G_2} be groups, and let {H} be a subgroup of {G_1 \times G_2} whose projections to {G_1,G_2} are surjective. Let {N_1 := \{ g_1 \in G_1: (g_1,1) \in H \}} and {N_2 := \{ g_2 \in G_2: (1,g_2) \in H \}}. Show that {N_1,N_2} are normal subgroups of {G_1,G_2}, and that {G_1/N_1} and {G_2/N_2} are isomorphic. (Indeed, after quotienting out by {N_1 \times N_2}, {H} becomes a graph of such an isomorphism.) Conclude that the set of composition factors of {H} are a subset of the union of the set of composition factors of {G_1} and the set of composition factors of {G_2} (up to isomorphism and counting multiplicity, as usual).

As such, we have the following satisfactory description of the images {\pi_q(\Gamma)} of a free group {\Gamma}:

Corollary 5 (Strong approximation) Let {\Gamma} be a subgroup of {SL_2({\bf Z})} which is not virtually solvable. Let {M \geq 1} be an integer. Then there exists a multiple {q_1} of {M} with the following property: whenever {q} is of the form {q = d p_1 \ldots p_k} with {d|q_1} and {p_1,\ldots,p_k} distinct and coprime to {q_1}, one has

\displaystyle \pi_q(\Gamma) = \pi_d(\Gamma) \times SL_2(F_{p_1}) \times \ldots\times SL_2(F_{p_k})

(after using the Chinese remainder theorem to identify {SL_2(F_q)} with {SL_2(F_d) \times SL_2(F_{p_1}) \times \ldots \times SL_2(F_{p_k})}). In particular, one has

\displaystyle \pi_q(\Gamma) = \pi_d(\Gamma) \times SL_2({\bf Z}/p_1\ldots p_k{\bf Z}).

The parameter {M} will not actually be needed in our application, but is useful in the more general setting in which {f} has rational coefficients instead of integer coefficients.

Proof: We already know that {\pi_p(\Gamma)= SL_2(F_p)} for all but finitely many primes {p}. Let {q_0} be the product of {M} with all the eceptional primes, as well as {2} and {3}, thus {p \geq 5} and {\pi_p(\Gamma)= SL_2(F_p)} for all {p} coprime to {q_0}. By repeated application of Lemma 4 this implies that {\pi_{p_1 \ldots p_k}(\Gamma) = SL_2(F_{p_1}) \times \ldots \times SL_2(F_{p_k})} for any distinct primes {p_1,\ldots,p_k} coprime to {q_0} (the key point being that the groups {PSL_2(F_p)} for primes {p \geq 5} are all non-isomorphic to each other and to {{\bf Z}/2{\bf Z}} by cardinality considerations).

The finite group {\pi_{q_0}(\Gamma)} may contain copies of {PSL_2(F_p)} amongst their composition factors for a finite number of primes {p} coprime to {q_0}; let {q_1} be the product of {q_0} with all these primes. By many applications of Exercise 9, we see that the set of composition factors of {\pi_{q_1}(\Gamma)} are contained in the union of the set of composition factors of {\pi_{q_0}(\Gamma)}, and the set of composition factors of {\pi_p(\Gamma) = SL_2(F_p)} for all {p} dividing {q_1} but not {q_0}. As a consequence, we see that {PSL_2(F_p)} is not a composition factor of {\pi_{q_1}(\Gamma)} for any {p} coprime to {q_1}; by Exercise 8, {PSL_2(F_p)} is also not a composition factor of {\pi_d(\Gamma)} for any {d} dividing {q_1} and {p} coprime to {q_1}. By many applications of Lemma 4, we then obtain the claim. \Box

As a simple application of the above corollary, we observe that we may reduce Theorem 1 to the case when {\Lambda} is a free group on two generators. Indeed, if {\Lambda} is not virtually solvable, then by the Tits alternative (Theorem 5 from Notes 6), {\Lambda} contains a subgroup {\Lambda'} which is a free group on two generators (and in particular, continues to not be virtually solvable). Now the polynomial {f} need not be primitive on {\Lambda'}, so we cannot deduce Theorem 1 for {\Lambda,f} from its counterpart for {\Lambda',f}. However, by Corollary 5 we have an integer {q_1 \geq 1} such that

\displaystyle \pi_q(\Gamma') = \pi_d(\Gamma') \times SL_2({\bf Z}/p_1\ldots p_k{\bf Z}) \ \ \ \ \ (17)

whenever {q=dp_1\ldots p_k} with {d|q_1} and {p_1,\ldots,p_k} are distinct primes coprime to {q_1}.

As {f} is primitive with respect to {\Lambda}, we may find {a \in \Gamma} such that {f(a)} is coprime to {q_1}. By translating {f} by {a}, we obtain a new polynomial {f'} for which {f'(1)} is coprime to {q_1}. In particular, for any {d|q_1}, we have {f'(1)} coprime to {d}. By (17), this implies that for any square-free {q} (and hence for arbitrary {q}), we can find {a \in \Gamma'} with {f'(a)} coprime to {q}. Thus {f'} is primitive with respect to {\Lambda'}, and so we may deduce Theorem 1 for {\Lambda,f} from its counterpart for {\Lambda',f'}.

— 3. Sieving in thin groups —

We can now deduce Theorem 1 from the following expander result:

Theorem 6 (Uniform expansion) Let {a,b} generate a free group {\Lambda} in {SL_2({\bf Z})}. Then, as {q} runs through the square-free integers, {Cay(\pi_q(\Lambda), \pi_q(\{a,b,a^{-1},b^{-1}\}))} form a two-sided expander family.

When {q} is restricted to be prime, this result follows from Theorem 3 from Notes 6. The extension of this theorem to non-prime {q} is more difficult, and will be discussed later. For now, let us assume Theorem 6 and see how we can use it, together with the beta sieve, to imply Theorem 1.

As discussed in the preceding section, to show Theorem 1 we may assume without loss of generality that {\Lambda} is a free group on two generators {a,b}. Let {\mu := \frac{1}{4} (\delta_a +\delta_b+\delta_{a^{-1}}+\delta_{b^{-1}})} be the generator of the associated random walk, and let {T} be a large integer. Then {\mu^{(T)}} will be supported on elements of {\Lambda} whose coefficients have size {O(\exp(O(T)))}, where we allow implied constants to depend on {a,b}. In particular, for {x} in this support, {f(x)} will be an integer of size {O(\exp(O(T)))}, where we allow implied constants to depend on {f} also. On the other hand, {\mu^{(T)}} has an {\ell^\infty} norm that decreases exponentially in {T} (by Exercise 6 of Notes 6). If we then set {z := \exp(\epsilon' T)} for a sufficiently small absolute constant {\epsilon'>0}, it will then suffice to show that with probability {\gg T^{-O(1)}}, an element {x} drawn from {\Lambda} with distribution {\mu^{(T)}} is such that {f(x)} is non-zero and coprime to {P(z)}.

It will be convenient to knock out a few exceptional primes. From Corollary 5, we may find an integer {q_1} with the property that

\displaystyle \pi_q(\Gamma) = \pi_d(\Gamma) \times SL_2(F_{p_1}) \times \ldots\times SL_2(F_{p_k})

whenever {q = d p_1 \ldots p_k} with {d|q_1} and {p_1,\ldots,p_k} distinct and coprime to {q_1}. As {f} is primitive, we may find a residue class {x_1 \in \pi_{q_1}(\Gamma)} such that {f(x_1)} is coprime to {q_1}. For each integer {n}, let {a_n} denote the quantity

\displaystyle a_n := \sum_{x \in \Lambda: f(x)=n; x = x_1 \mod q_1} \mu^{(T)}(x).

It will suffice to show that

\displaystyle \sum_{n: (n,P(z))=1} a_n \gg T^{-O(1)}.

To do this, we will use the beta sieve. Indeed, by Exercise 2 it suffices to establish a bound of the form

\displaystyle \sum_{n: d|n} a_n = \frac{1}{|\pi_{q_1}(\Lambda)|} g(d) + O( \exp(-\epsilon T) ) \ \ \ \ \ (18)

for all square-free {1 \leq d \leq \exp(\epsilon T)}, some constant {\epsilon>0}, and some multiplicative function {g} obeying the bounds

\displaystyle g(p)<1

and

\displaystyle g(p) \ll 1/p

for all primes {p}.

By choice of {x_1}, the quantity {a_n} vanishes whenever {n} is not coprime to {q_1}. So we will set {g(p)=0} for the primes {p} dividing {q_1}, and it will suffice to establish (18) for {d} coprime to {q_1}. The left-hand side of (18) can then be expressed as

\displaystyle \sum_{x \in \pi_d(\Lambda): f(x)=0} ((\pi_{q_1 d})_* \mu)^{(T)}(x_1,x),

where we descend the polynomial {f: SL_2({\bf Z}) \rightarrow {\bf Z}} to a polynomial {f: SL_2({\bf Z}/d{\bf Z}) \rightarrow {\bf Z}/d{\bf Z}} in the obvious fashion. However, in view of Theorem 6 (and the random walk interpretation of expansion), we have

\displaystyle ((\pi_{q_1 d})_* \mu)^{(T)}(x) = |\pi_{q_1 d}(\Lambda)|^{-1} + O( \exp(-cT) )

for some {c>0} independent of {\epsilon}. Note that

\displaystyle |\pi_d(\Lambda)| \leq |SL_2({\bf Z}/d{\bf Z})| \ll d^{O(1)} \ll \exp(O(\epsilon T))

while from Corollary 5 we have

\displaystyle |\pi_{q_1 d}(\Lambda)| = |\pi_{q_1}(\Lambda)| |\pi_d(\Lambda)|

and thus

\displaystyle \sum_{n: d|n} a_n = \frac{1}{|\pi_{q_1}(\Lambda)|} g(d) + O( \exp(-\epsilon T) )

for {\epsilon>0} small enough, where {g(d)} is defined for {d} coprime to {q_1} as

\displaystyle g(d) := \frac{1}{|\pi_d(\Gamma)|} |\{ x \in SL_2({\bf Z}/d{\bf Z}): f(x)=0 \}|.

As {f} is primitive, we have {g(d)<1} for all such {d}; from Corollary 5 we see that {g} is multiplicative for such {d}. Finally, from the Schwarz-Zippel lemma (see Exercise 23 from Notes 5) we have

\displaystyle g(p) \ll 1/p

and Theorem 1 follows.

Remark 3 One can obtain more precise bounds on {g(p)} using the Lang-Weil theorem, but we will not need this result here. (Such results would however be needed if one wanted more quantitative information than Theorem 1; see the paper of Bourgain, Gamburd, and Sarnak for details.)

It remains to establish Theorem 1. In the case when {q} is prime, this was achieved in previous sections using the ingredients of quasirandomness, product theorems, and non-concentration. In the original paper of Bourgain, Gamburd, and Sarnak, these ingredients were extendedto the square-free case by hand, which led to a fairly lengthy argument. In the subsequent paper of Varju, it was shown that each of these ingredients can in fact be more or less automatically bootstrapped from the prime case to the square-free case by using tools such as the Chinese remainder theorem (or strong approximation) to “factor” the latter case into copies of the former, thus simplifying the extension to the square-free case significantly. We will not give the full argument here, but just to convey a taste of these sorts of product arguments, we will discuss the product structure of just one of the three ingredients, namely quasirandomness. (The extension of this ingredient to the square-free setting was already observed in the Bourgain-Gamburd-Sarnak paper.)

As a consequence of Proposition 4 of Notes 3, the following claim was shown:

Proposition 7 Let {G} be a {|G|^\alpha}-quasirandom finite group, {S} is a symmetric set of generators of {G} not containing the identity of cardinality {k}, and {\mu = \frac{1}{|S|} \sum_{s \in S} \delta_s} be such that

\displaystyle \| \mu^{*n} \|_{\ell^2(G)} \leq |G|^{-1/2+\alpha/4}

(say) for some {n = O(\log |G|)} with {\mu := \frac{1}{|S|} \sum_{s \in S} \delta_s}, then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon} depending only on {\alpha, k} and the implied constants in the {O()} notation.

It turns out that this fact can be extended to product groups:

Proposition 8 Proposition 7 continues to hold if the hypothesis that {G} is {|G|^\alpha}-quasirandom is replaced with the hypothesis that {G = G_1 \times \ldots \times G_n} for some {n \geq 0} and some finite groups {G_1,\ldots,G_n}, with each {G_i} being {|G_i|^\alpha}-quasirandom.

The key point here is that the expansion constant {\epsilon} does not depend on the number {n} of groups in this factorisation.

Proof: (Sketch) For technical reasons it is convenient to allow {S} to have multiplicity and to possibly contain the identity; this will require generalising the notion of Cayley graph, and of expansion in such generalised graphs.

Let {f} be a non-constant eigenfunction of the adjacency operator, thus {f * \mu = \lambda f} for some real {\lambda}. The objective is to prove that {|\lambda| \leq 1-\epsilon} for some sufficiently small {\epsilon>0} independent of {n}.

The claim {n=0} is trivial, so assume inductively that {n \geq 1} and that the claim is proven for all smaller values of {n} (with a fixed choice of {\epsilon}). For each {G_i}, we can partition the eigenspaces of the adjacency operator into those functions which are invariant in the {G_i} direction, and those functions which have mean zero in each coset of {G_i}. These partitions are compatible with each other as {i} varies (basically because the operations of averaging in {G_i} and averaging in {G_j} commute). Thus, without loss of generality, we may assume that the eigenfunction {f} is such that for each {i}, either {f} is {G_i}-invariant, or has mean zero on each {G_i} coset.

Suppose that {f} was {G_n}-invariant, then the eigenvalue {\lambda} would also persist after projecting {S} down from {G} to {G_1 \times G_{n-1}} (possibly picking up the identity or some multiplicity in the process). The claim then follows from the induction hyopthesis. Similarly if {f} was {G_i}-invariant for any other value of {i}. Thus we may assume that {f} has mean zero in each {G_i} coset.

One can show that every irreducible unitary representation of {G} splits as a tensor product of irreducible unitary representations of the {G_i}. If one lets {V} be the subspace of {\ell^2(G)} spanned by {f} and its left translates, we thus see that {V} contains at least one such tensor product; but as every element of {V} will have mean zero in each {G_i} coset, the factors in this tensor product will all be non-trivial. Using quasirandomness, the {i^{th}} factor will have dimension at least {|G_i|^\alpha}, and so {V} must have dimension at least {|G|^\alpha}. At this point, one can use a trace formula to relate {V} to {\|\mu^{*n}\|_{\ell^2(G)}^2} to conclude the argument. \Box

Exercise 10 Develop the above sketch into a complete proof of the proposition.