[This post is authored by Emmanuel Kowalski.]

This post may be seen as complementary to the post “The parity problem in sieve theory“. In addition to a survey of another important sieve technique, it might be interesting as a discussion of some of the foundational issues which were discussed in the comments to that post.

Many readers will certainly have heard already of one form or another of the “large sieve inequality”. The name itself is misleading however, and what is meant by this may be something having very little, if anything, to do with sieves. What I will discuss are genuine sieve situations.

The framework I will describe is explained in the preprint arXiv:math.NT/0610021, and in a forthcoming Cambridge Tract. I started looking at this first to have a common setting for the usual large sieve and a “sieve for Frobenius” I had devised earlier to study some arithmetic properties of families of zeta functions over finite fields. Another version of such a sieve was described by Zywina (“The large sieve and Galois representations”, preprint), and his approach was quite helpful in suggesting more general settings than I had considered at first. The latest generalizations more or less took life naturally when looking at new applications, such as discrete groups.

Unfortunately (maybe), there will be quite a bit of notation involved; hopefully, the illustrations related to the classical case of sieving integers to obtain the primes (or other subsets of integers with special multiplicative features) will clarify the general case, and the “new” examples will motivate readers to find yet more interesting applications of sieves.

— Setting up the sieve —

The objects to sieve will be in a fixed set Y, typically infinite. The “sieve” is related to the assumed existence of a set of surjective maps \rho_{\ell}:Y\rightarrow Y_{\ell}, where {\ell} runs through another (arbitrary) set \Lambda, and Y_{\ell} is a finite set. Combinatorialists can think of these maps as a family of colorings of Y, and number-theorists can think of “reduction” maps modulo {\ell}, in the case where \Lambda is a subset of prime numbers. Indeed, the classical case occurs with Y := {\mathbf Z}, \Lambda the set of primes, and \rho_{\ell} being the reduction map.

We now want to define sifted sets and try to “count” them. This is where classically one would look at a discrete interval of integers, and one could think of taking a finite subset of Y as a generalization. However, we will generalize this by looking at an arbitrary measure space (X,\mu), such that \mu(X) is finite, and assume there is a map (measurable in an obvious sense) x\mapsto F_x from X to Y, and instead of “counting”, we will try to understand the measure (under \mu) of sifted sets, of the following type:

S(X,\Omega,L):=\{x\in X : \rho_{\ell}(F_x)\notin \Omega_{\ell}, {\ell}\in L\}

whenever subsets \Omega_{\ell}\subset Y_{\ell} have been chosen, say for a finite subset L of \Lambda.

Classically, we have X:=\{1,\ldots, N\} for some integer N, \mu is the counting measure, and one might consider \Omega_{\ell}, for {\ell} prime, to be e.g., \{0,-2\} modulo {\ell}: if L is the set of primes \leq \sqrt{N}, we see that S(X,\Omega,L) is (essentially) the set of twin primes between \sqrt{N} and N. It is clear from this that in most applications we will really have a sequence of sets X (with attending \mu and F), depending on some parameter (here N), and it will be when this parameter gets large that the results will be interesting. In particular, we need to be careful about the uniformity of our results if they are going to be useful.

— First examples —

There are a great many different settings that can be phrased in such a way. Here are a few, beyond the case of integers, which are related to existing results and can be described fairly easily:

Example 1. If we have a probability space (\Omega,P), and a (finite) family of events A_i\subset \Omega, i\in I, we can set up a sieve by taking X:=Y:=\Omega, \mu:=P, \Lambda:=I, Y_{i}:=\{0,1\}, and letting \rho_i be the characteristic function of A_i. Then the sifted set S is simply the event which is the intersection of the complements of A_i, if we take \Omega_i=1; the probability of S can be expressed using the usual inclusion-exclusion formula. \diamond

Example 2. Gallagher used the large sieve in several variables to study the typical Galois group of a monic integral polynomial with height < T, T getting large. Here Y can be defined to be the space of monic integral polynomials of degree d, X the subset of those with height <T, \Lambda is the set of primes, Y_{\ell} the space of monic polynomials of degree d over the finite field \mathbf{Z}/{\ell}\mathbf{Z}, and \rho_{\ell} is the obvious restriction map. \diamond

Example 3. Serre used a variant of the large sieve to study the density of quadratic forms over \mathbf{Q} in n variables which have a non-trivial rational zero. Here, Y can be described as the set of coefficients describing the quadratic forms, the sets X are those quadratic forms with height \leq T, with T getting large, \Lambda is the set of primes, and \rho_{\ell} is the reduction modulo {\ell}^2. \diamond

Example 4. Poonen used a very nice sieve to find the density, among homogeneous polynomials f\in \mathbf{F}_q[x_0,...,x_n], with coefficients in a finite field \mathbf{F}_q, of those such that the intersection Z\cap \{f=0\} is smooth, where Z/\mathbf{F}_q is a fixed smooth subvariety Z of projective n-space. One can set up the sieve more or less as follows: Y is the set of all homogeneous polynomials with coefficients in \mathbf{F}_q, for d\geq 1 fixed, X is the subset of polynomials of degree d (and \mu the normalized counting measure on X), \Lambda is the set of closed points of Z (which can be thought-of simply as the set of Galois-orbits of points of Z over an algebraic closure of \mathbf{F}_q) and \rho_x, for x such a point, maps Y to the \mathbf{F}_q-vector space of possible Taylor expansions of order 1. Poonen uses as sieving sets the \Omega_x which characterize x to be a singular point, so that an f survives sieving by all x precisely when Z\cap \{f=0\} is smooth. \diamond

We will give a few more recent examples later in this post.

— Heuristics —

Now, we will describe the general large-sieve inequality which can be used in many cases to obtain an upper bound for the measure of the sifted set. Let us try first to guess what the “right” answer should be, which will highlight the need for a further piece of data, which is usually merely implicit in the classical cases.

The heuristic value for the measure of the sifted set arises from two fairly natural assumptions: first, one might expect that the various \rho_{\ell} should be “independent”, so the measure of the sifted set should be roughly equal to

\mu(X)\prod_{{\ell}\in L}{(1-\tilde{\nu}_{\ell}(\Omega_{\ell}))} (1)

where \tilde{\nu}_{\ell} is the (normalized) image measure on Y_{\ell} of \mu via X\rightarrow Y_{\ell} obtained by composition:

\tilde{\nu}_{\ell}(y)=\frac{\mu(\{x:\rho_{\ell}(F_x)=y\})}{\mu(X)}.

This measure depends on X (hence on some unspecified asymptotic parameter, such as the length of the interval of integers considered) and may be hard to understand; however, the second natural assumption is that this measure should be close to some “natural” measure \nu_{\ell} on Y_{\ell} which is independent of X (if Y has some natural measure, it could be the image of this measure under \rho_{\ell}). In other words, we may hope that \rho_{\ell}(F_x), for x\in X, becomes equidistributed with respect to such a measure.

In the classical case, integers in large intervals are uniformly distributed modulo any fixed prime, so \nu_{\ell}(y)=1/\ell in this case. Note that assuming some form of asymptotic equidistribution is typically the content of standard “sieve axioms”, introducing sieve remainder terms measuring the default of exact equidistribution. In all the examples above, the target set Y_{\ell} are finite abelian groups, and the natural probability density is also the normalized counting measure, except in the case of inclusion-exclusion; there the natural measure is the distribution law of the characteristic function of A_i: it maps 1\mapsto P(A_i), 0\mapsto 1-P(A_i).

— Further examples —

Here are now the promised “new” examples, which in particular show further examples of non-trivial sieve settings, and less trivial measures on the finite sets Y_{\ell}.

Example 5. Let E/\mathbf{Q} be an elliptic curve defined over the rationals, given in Weierstrass form y^2=x^3+Ax+B with A, B integers. We can take Y:=E(\mathbf{Q}), the finitely generated Mordell-Weil group of rational points on E, \Lambda the set of primes (not dividing the discriminant for simplicity), Y_p the image of Y under the reduction map modulo p. In general, this image is badly understood (it certainly is not equal to the group of points on the reduction of E modulo p, in most cases, for instance because the latter grows with p, whereas Y may well be finite). If we take \Omega_p to be the origin of the group law, the sifted set has an interesting interpretation: it is the set of S-integer solutions of the equation which “is” E, where S is the set of primes dividing the discriminant (i.e., it is the set of solutions to the equation where the denominators of x and y are divisible only by those primes). In particular, in contrast with classical sieves, those points are examples of interesting elements which survive after “infinite” sieving (though a famous theorem of Siegel states there are only finitely many). \diamond

Example 6. Let Y:=SL(n,\mathbf{Z}), and S a fixed finite symmetric set of generators (e.g., S is the set of elementary matrices with \pm 1 off the diagonal). For a prime {\ell}, let Y\rightarrow Y_{\ell} be reduction modulo {\ell}; it is not obvious, but true, that the image is Y_{\ell}=SL(n,\mathbf{Z}/{\ell}\mathbf{Z}) for all primes {\ell}. The natural measure is again the counting measure on Y_{\ell}. We can define different types of sets to sieve (balls in word length or archimedean metric for instance), but one that has interest as being quite different-looking from the standard ones is the following: take X to be an abstract probability space, and F=X_k a G-valued random variable X\rightarrow Y obtained as the k-th step of a random walk defined by X_0=1, X_{k+1}=X_k\xi_{k+1}, where the steps \xi_k are independent, uniformly distributed S-valued random variables (with P(\xi_k=s)>0 for all s); for instance, \xi_k may be chosen independently and uniformly at random in S. In this setting, interesting sifted sets might be those where X_k is a matrix with prime or almost prime entries (in this case the setup is closely related to ongoing work of Bourgain, Gamburd, and Sarnak, and other collaborators such as J. Liu, A. Nevo), or those X_k has for instance a reducible characteristic polynomial, or one with small splitting field. This type of setting was considered first by I. Rivin, with fairly remarkable applications in low-dimensional topology and for the theory of free-group automorphisms. One can do this type of things also for other discrete groups, of course. \diamond

Example 7. The last example is studied by Zywina in his preprint “The large sieve and Galois representations”, and is a number field analogue of the sieve for Frobenius over function fields considered earlier by myself (the latter is notationally a bit more complicated). We select a special case again for concreteness, and disregard for simplicity some issues of ramification. Let E/\mathbf{Q} be an elliptic curve without complex multiplication. We consider for Y the set of conjugacy classes in the group G which is the image of the Galois group of \mathbf{Q} under the natural action on all torsion points of E; by work of Serre, it is known that G is naturally isomorphic to an open subgroup of GL(2,\hat{\mathbf{Z}}). We take \Lambda to be the set of primes such that G_{\ell}, the Galois group G_{\ell} of the action on {\ell}-torsion points, is isomorphic to GL(2,\mathbf{F}_{\ell}): the result of Serre mentioned above implies that \Lambda contains all primes except finitely many.There is a surjection G\rightarrow G_{\ell}, and correspondingly a surjective map Y\rightarrow Y_{\ell} where Y_{\ell} is the set of conjugacy classes in G_{\ell}. On Y_{\ell}, the natural measure \nu_{\ell} maps a conjugacy class C to |C|/|G_{\ell}|; since G_{\ell} is not abelian, this is not counting measure on Y_{\ell}. Now we take X to be the set of prime numbers p\leq T, for some T, which do not divide the discriminant of E. The map F from X to Y maps p to the conjugacy class of the Frobenius automorphism at p (it is here that some care with ramification is needed to be precise).

Note that the equidistribution statement for a fixed {\ell} is valid, and is a special case of the Chebotarev Density Theorem: for any conjugacy class C in Y_{\ell}, we have \lim_{T\rightarrow +\infty} \frac{|\{p\leq T : \rho_{\ell}(F_p)=C \}|}{\pi(T)} =\frac{|C|}{|G_{\ell}|}.

What are the applications of such a sieve setting? They arise because F_p has the following properties: the trace of \rho_{\ell}(F_p) (seen as an element of GL(2,\mathbf{F}_{\ell})) is the residue class modulo {\ell} of the integer a_p such that

|E(\mathbf{F}_p)|=p+1-a_p

In particular, controlling the conjugacy class of F_p implies that the order of the curve modulo p is (somehow) controlled, and these orders (or a_p equivalently) control (at least conjecturally!) much of the arithmetic properties of the elliptic curve. In particular, recall that the Birch and Swinnerton-Dyer conjecture states that the rank of E(\mathbf{Q}) should be the order at s=1 of the Euler product

\prod_{p}{(1-a_p p^{-s}+p^{1-2s})^{-1}}.

Certainly, one can see how sieving in this context will lead to information on, say, almost-prime values of |E(\mathbf{F}_p)|, and many other arithmetic properties of the a_p which are of great importance for instance in understanding some algorithms such as elliptic curve primality proving, elliptic curve cryptography, etc. \diamond

— Two sieve inequalities —

Coming back to the “large sieve inequalities”, we mentioned that they will provide one possible way to get upper bounds for the measure of S(X,\Omega,L), which are often comparable in size with the “expected” measure (1) above. There are two forms, which are inspired by the corresponding inequalities for integers due to Renyi and Montgomery respectively. Both depend on bounding a further quantity \Delta, which is the crucial ingredient, and which is explained below (without knowing something about \Delta, the inequalities are useless; they should be read assuming that \Delta is close to \mu(X), possibly up to a constant multiplicative factor).

I. Renyi’s sieve takes the form

\int_{X}{\Bigl( N(x,L)-M(L) \Bigr)^2d\mu(x)}\leq \Delta M(L)

where N(x,L) is the number of {\ell}\in L such that \rho_{\ell}(F_x)\in \Omega_{\ell}, while M(L) is the “expected” value of this quantity, namely the sum over {\ell}\in L of \nu_{\ell}(\Omega_{\ell}).

So there is a clear interpretation: if \Delta is indeed close to \mu(X), this means that N(x,L) is close to M(L) for “most” x (in mean-square sense). In the case of counting primes, this inequality becomes the Turán inequality

\sum_{n\leq N}{(\omega(n)-\log\log N)}^2 \ll N(\log\log N)^{-1},

which is known to be sharp (in fact there is an asymptotic formula of this size for the left-hand side).

In the case of Example 5 (elliptic curves), this gives some lower bound for the number of prime divisors of the denominator of “most” rational points (not without some extra work; and note that the proof require the use of Siegel’s Theorem: it does not provide another approach to it).

In general, note that we obtain the bound

\mu(S)\leq \Delta M(L)^{-1}

for the measure of the sifted set, using positivity. This is weak; in the case of primes, when \nu(\Omega_{\ell}) tends to zero (it gives the bound N(\log\log N)^{-1} for the number of primes \leq N), but quite strong when \nu(\Omega_{\ell}) is fairly large (bounded below); indeed, these are the situations which characterize what Linnik called the “large” sieve. \diamond

II. The second sieve inequality is stronger in case the local densities \nu_{\ell}(\Omega_{\ell}) are small, in fact as strong qualitatively as the best “small” sieves. (But of course small sieve theory is developed a lot looking at lower bounds, which the large sieve does not approach directly — see the former post on the parity problem).

Here the idea is to combine the Y_{\ell} to look at the possible “multi-colorings” Y\rightarrow Y_{{\ell}_1}\times Y_{{\ell}_2}\times \cdots, which classically means looking at the reductions modulo square-free integers, not only modulo primes.

Abstractly, this is done as follows: let Y_m, for m a finite subset of \Lambda, be the product of Y_{\ell} for {\ell}\in m. Define \rho_m in the obvious manner, but notice it may not be surjective. If it is, this means some analogue of the Chinese Remainder Theorem holds, and this may be very difficult to show (think about the case of elliptic curves). Similarly, having chosen \Omega_{\ell}, we can define \Omega_m as the product over elements of m. Now choose an arbitrary finite set M of finite subsets of L; e.g., M could correspond by unique factorization to the set of all square-free integers \leq Q, if L is the set of primes at most Q.

We now have

\mu(S)\leq \Delta H^{-1}

where \Delta is as above, and will be explained below, and

H := \sum_{m\in M}{\prod_{{\ell}\in m}{\frac{\nu_m(\Omega_{\ell})}{1-\nu_{\ell}(\Omega_{\ell})}}}.

Now this may look strange, but observe the following: if M is the set of all subsets of L, a moment’s thought reveals that

H^{-1}=\prod_{{\ell}\in L}{(1-\nu(\Omega_{\ell}))},

i.e., for \Delta=\mu(X), the right-hand side of the large-sieve inequality is exactly the expected value in (1). The problem is that, typically, \Delta would be too large if M is that big (this is shown for instance by the problem of explosion of the remainder term in applying the Eratosthenes-Legendre sieve to compute the number of primes less than X). The interpretation of this is then similar to that of the earlier sieves by V. Brun: using all subsets of L (all integers divisible by primes \leq L) is too costly, but a suitable “cutoff” yields approximations to the truth which can remain very useful. (One case where one can take M so large is for Example 1, if the event A_i are independent so that by the very definition of independence, the probability of the sifted set S is exactly

P(S)=\prod_{i}{(1-P(A_i))}=H^{-1},

and it turns out that in this case one has exactly \Delta=1, so that
the large-sieve inequality is sharp in the full generality considered). \diamond

Now we need finally to define \Delta. The reason it was delayed is
that it will need yet more notation…

First, for each {\ell} in \Lambda, let V_{\ell} be the finite dimensional Hilbert space of functions of mean-zero on Y_{\ell}, i.e., of those f such that

\sum_{y\in Y_{\ell}}{\nu_{\ell}(y)f(y)}=0

(with the inner product defined using the density \nu_{\ell}).

Select (arbitrarily) an orthonormal basis B_{\ell} of V_{\ell}. Then consider the set B_m of functions on Y_m of the type

(y_{\ell})\mapsto \prod_{{\ell}\in m}{\varphi_{\ell}(y_{\ell})}

where \varphi_{\ell} is an element of the chosen basis of V_{\ell}. Those functions form themselves an orthonormal basis for a “primitive” subspace of the space of functions on Y_m, with respect to the “product” inner-product on Y_m.

Finally, \Delta is defined to be the least non-negative real number such that the inequality

\sum_{m\in M}{\sum_{\varphi\in B_m}{\Bigl|\int_{X}{\alpha(x)\varphi(\rho_{m}(F_x))d\mu(x)}\Bigr|^2}} \leq \Delta \|\alpha\|^2,

holds for any square-integrable function \alpha on X. For Renyi’s inequality, M is simply the set of singletons \{{\ell}\} for {\ell}\in L.

One can see quite easily that \Delta is independent of the choice of basis (but practical estimates may well depend on a clever selection) by interpreting it as the norm of some operator between Hilbert spaces.

Let’s look at the classical example of integers: there, V_{\ell} is the space of functions on \mathbf{Z}/{\ell}\mathbf{Z} which sum to zero, and a particular basis is given by the additive characters

x\mapsto \exp(2i\pi ax/{\ell})

for all non-zero a\in \mathbf{Z}/{\ell}\mathbf{Z}. Extending to all $m$, by the Chinese Remainder Theorem, amounts to looking at characters

x\mapsto \exp(2i\pi ax/m)

of \mathbf{Z}/m\mathbf{Z} with a coprime with m. So the defining inequality for sieving \{1,\ldots, N\} using square-free numbers up to L is

\sum_{m\leq M}{\sum_{(a,m)=1}{|\sum_{1\leq n\leq N}{a_n \exp(2i\pi an/m)}|^2}}\leq \Delta \sum_n{|a_n|^2}.

It is classical, by now, that \Delta\leq N-1+L^2. The first bound of this strength being due to Bombieri, though this particular value is due to Montgomery-Vaughan and Selberg independently. In fact, stronger analytic inequalities are known, there it is not necessary to restrict to m square-free on the left-hand side, and in fact one can look at arbitrary sets of well-spaced points in \mathbf{R}/\mathbf{Z}. This classical theory is very-well described in Montgomery’s survey paper. [Incidentally, reading this paper was the subject of my first-year project at E.N.S. Lyon under the direction of É. Fouvry and H. Daboussi, and it is quite remarkable how much of my mathematical work has involved the large-sieve in one way or another...]

— An approach to concrete sieve estimates —

Here is the most standard way to estimate directly \Delta. It relies on duality: the defining inequality is equivalent with

\int_X{\Bigl|\sum_{m\in M}{\sum_{\varphi\in B_m}{\beta(m,\varphi)\varphi(\rho_{m}(F_x))}} \Bigr|^2} \leq \Delta \sum_{m,\varphi}{|\beta(m,\varphi)|^2}

for arbitrary complex numbers \beta(m,\pi). Expanding the left-hand side, we find a quadratic form with coefficients

W(\varphi,\psi) := \int_X{\varphi(\rho_{m}(F_x)) \overline{\psi(\rho_{n}(F_x))}d\mu(x)}

with \varphi\in B_m and \psi\in B_n. Those correlation coefficients then must be estimated. One expects that

W(\varphi,\psi)=\delta(\varphi,\psi)\mu(X)+R(\varphi,\psi)

where the remainder R is estimated uniformly and explicitly, and is small compared with \mu(X), at least if m and n are not “too big” (in the appropriate sense). Then we have

\Delta\leq \max_{m,\varphi}\sum_{n,\psi}{|W(\varphi,\psi)|} \leq \mu(X)+\max_{m,\varphi}\sum_{n,\psi}{|R(\varphi,\psi)|},

and this will have the desired feature of being close to \mu(X) if L and M are suitably chosen.

Here are the two most important new examples where this approach gives strong results, and leads to links with very deep principles of harmonic analysis and number theory.

In the setting of Example 6 (and its generalizations), i.e., sieve for discrete groups and random walks, a natural basis B_{\ell} is given by the matrix coefficients of irreducible unitary representations of the finite groups G_{\ell}=SL(n,\mathbf{Z}/{\ell}\mathbf{Z}). Then estimating W(\varphi,\psi) turns out to be very cleanly linked with Property (\tau) for the congruence quotients of SL(n,\mathbf{Z}) (or Property (T), though that applies only for n at least 3). Recall that this property states that there is a positive \delta>0 such that, for any m\geq 1, any map

\pi : SL(n,\mathbf{Z})\rightarrow SL(n,\mathbf{Z}/m\mathbf{Z}) \rightarrow U(r)

where U(r) is the unitary group of a finite-dimensional Hilbert space of dimension r, either there is a vector v\in \mathbf{C}^r invariant under all \pi(g), g\in SL(n,\mathbf{Z}), or for any unit vector v\in\mathbf{C}^r, there exists s\in S (recall S is a generating set of SL(n,\mathbf{Z})) such that

\|\pi(s)v-v\|\geq \delta.

In other words, if a vector is not invariant under all s\in S, it is uniformly non-invariant. This property is also equivalent with the fact that the Cayley graphs of the congruence quotients form a family of expanders.

(Note: we do not currently necessarily need fine knowledge of the character table of G_{\ell}, as described at least partly through Deligne-Lusztig characters, though, when n is large, this can lead to small improvements in estimates, and interesting problems).

In Example 7, since we work with conjugacy classes, the natural basis B_{\ell} is the set of characters of irreducible unitary representations of G_{\ell}. Estimating W(\varphi,\psi) is then a problem of analytic number theory, amounting to uniform quantitative estimates for partial sums of coefficients (at primes) of Artin
L-functions
. As such, there exist unconditional, but fairly weak, results, and much stronger ones when assuming the Generalized Riemann Hypothesis.

When dealing with the function field analogue, the Riemann Hypothesis is known, due to the amazing work of Deligne [in my mind, the most important number-theory result of the 20th century], building on the foundations of Grothendieck. However, this only gives a (very strong) result for fixed \varphi, \psi, a priori; getting uniform bounds is not trivial, and because the remainder terms after applying the Grothendieck-Lefschetz trace formula and the Riemann Hypothesis involve the dimension of some étale cohomology groups depending on \varphi and \psi, some uniform estimates for those so-called Betti numbers are required. They are proved either by analyzing the ramification behavior of the analogues of the Artin representations, or (for higher-dimensional parameter spaces) by adapting an induction method of Katz.

In applying concretely the sieve for Frobenius, it should be mentioned
that one of the main issues is in fact to compute Y_{\ell} precisely! This is a completely new phenomenon compared with the classical sieves, where Y_{\ell} is perfectly known and understood. Here, this amounts to computing certain Galois groups, such as those of torsion points on elliptic curves, or their analogues for function fields (which are often called monodromy groups because of their geometric origin). Serre’s theorem quoted above for non CM elliptic curves is one of the most famous examples, but the “baby” example of this is the fact that cyclotomic polynomials are irreducible over $\mathbf{Q}$ (since this means that the Galois groups of cyclotomic extensions, generated by roots of unity, are “as big as possible”, which is usually the desired outcome). In the first application of the sieve for Frobenius, which concerned a conjecture of Katz already considered by N. Chavdarov, I used an unpublished result of J-K. Yu which has recently been given a new proof by C. Hall.