(Emmanuel Kowalski) The large sieve inequalities

8 August, 2007 in expository, guest blog, math.NT | Tags: elliptic curves, Emmanuel Kowalski, property T, sieve theory | by Terence Tao

[This post is authored by Emmanuel Kowalski.]

This post may be seen as complementary to the post “The parity problem in sieve theory”. In addition to a survey of another important sieve technique, it might be interesting as a discussion of some of the foundational issues which were discussed in the comments to that post.

Many readers will certainly have heard already of one form or another of the “large sieve inequality”. The name itself is misleading, however, and what is meant by it may have very little, if anything, to do with sieves. What I will discuss are genuine sieve situations.

The framework I will describe is explained in the preprint arXiv:math.NT/0610021, and in a forthcoming Cambridge Tract. I started looking at this first to have a common setting for the usual large sieve and a “sieve for Frobenius” I had devised earlier to study some arithmetic properties of families of zeta functions over finite fields. Another version of such a sieve was described by Zywina (“The large sieve and Galois representations”, preprint), and his approach was quite helpful in suggesting more general settings than I had considered at first. The latest generalizations more or less took life naturally when looking at new applications, such as discrete groups.

Unfortunately (maybe), there will be quite a bit of notation involved; hopefully, the illustrations related to the classical case of sieving integers to obtain the primes (or other subsets of integers with special multiplicative features) will clarify the general case, and the “new” examples will motivate readers to find yet more interesting applications of sieves.

— Setting up the sieve —

The objects to sieve will be in a fixed set $Y$ , typically infinite. The “sieve” is related to the assumed existence of a set of surjective maps $\rho_{\ell}:Y\rightarrow Y_{\ell}$ , where ${\ell}$ runs through another (arbitrary) set $\Lambda$ , and $Y_{\ell}$ is a finite set. Combinatorialists can think of these maps as a family of colorings of $Y$ , and number-theorists can think of “reduction” maps modulo ${\ell}$ , in the case where $\Lambda$ is a subset of prime numbers. Indeed, the classical case occurs with $Y := {\mathbf Z}$ , $\Lambda$ the set of primes, and $\rho_{\ell}$ being the reduction map.

We now want to define sifted sets and try to “count” them. This is where classically one would look at a discrete interval of integers, and one could think of taking a finite subset of $Y$ as a generalization. However, we will generalize this by looking at an arbitrary measure space $(X,\mu)$ , such that $\mu(X)$ is finite, and assume there is a map (measurable in an obvious sense) $x\mapsto F_x$ from $X$ to $Y$ , and instead of “counting”, we will try to understand the measure (under $\mu$ ) of sifted sets, of the following type:

$S(X,\Omega,L):=\{x\in X : \rho_{\ell}(F_x)\notin \Omega_{\ell} \text{ for all } {\ell}\in L\}$

whenever subsets $\Omega_{\ell}\subset Y_{\ell}$ have been chosen, say for a finite subset $L$ of $\Lambda$ .

Classically, we have $X:=\{1,\ldots, N\}$ for some integer $N$ , $\mu$ is the counting measure, and one might consider $\Omega_{\ell}$ , for ${\ell}$ prime, to be e.g., $\{0,-2\}$ modulo ${\ell}$ : if $L$ is the set of primes $\leq \sqrt{N}$ , we see that $S(X,\Omega,L)$ is (essentially) the set of twin primes between $\sqrt{N}$ and $N$ . It is clear from this that in most applications we will really have a sequence of sets $X$ (with attending $\mu$ and $F$ ), depending on some parameter (here $N$ ), and it will be when this parameter gets large that the results will be interesting. In particular, we need to be careful about the uniformity of our results if they are going to be useful.
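To make the classical example concrete, here is a minimal Python sketch (not part of the original setup; the function names `primes_up_to` and `sifted_set` are mine) which computes $S(X,\Omega,L)$ for $X=\{1,\ldots,N\}$ , $\Omega_{\ell}=\{0,-2\}$ modulo ${\ell}$ and $L$ the primes $\leq \sqrt{N}$ .

```python
from math import isqrt

def primes_up_to(n):
    """Return the list of primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            for q in range(p * p, n + 1, p):
                flags[q] = False
    return [p for p in range(2, n + 1) if flags[p]]

def sifted_set(N):
    """S(X, Omega, L) for X = {1..N}, Omega_l = {0, -2} mod l, L = primes <= sqrt(N)."""
    L = primes_up_to(isqrt(N))
    omega = {l: {0, (-2) % l} for l in L}
    # Keep n when rho_l(n) = n mod l avoids Omega_l for every l in L.
    return [n for n in range(1, N + 1)
            if all(n % l not in omega[l] for l in L)]

print(sifted_set(100))  # [11, 17, 29, 41, 59, 71]
```

For $N=100$ the survivors are exactly the smaller members of the twin prime pairs between $10$ and $100$ , as expected.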

— First examples —

There are a great many different settings that can be phrased in such a way. Here are a few, beyond the case of integers, which are related to existing results and can be described fairly easily:

Example 1. If we have a probability space $(\Omega,P)$ , and a (finite) family of events $A_i\subset \Omega$ , $i\in I$ , we can set up a sieve by taking $X:=Y:=\Omega$ , $\mu:=P$ , $\Lambda:=I$ , $Y_{i}:=\{0,1\}$ , and letting $\rho_i$ be the characteristic function of $A_i$ . Then, if we take $\Omega_i:=\{1\}$ , the sifted set $S$ is simply the event which is the intersection of the complements of the $A_i$ ; the probability of $S$ can be expressed using the usual inclusion-exclusion formula. $\diamond$

Example 2. Gallagher used the large sieve in several variables to study the typical Galois group of a monic integral polynomial with height $< T$ , $T$ getting large. Here $Y$ can be defined to be the space of monic integral polynomials of degree $d$ , $X$ the subset of those with height $<T$ , $\Lambda$ is the set of primes, $Y_{\ell}$ the space of monic polynomials of degree $d$ over the finite field $\mathbf{Z}/{\ell}\mathbf{Z}$ , and $\rho_{\ell}$ is the obvious reduction map. $\diamond$

Example 3. Serre used a variant of the large sieve to study the density of quadratic forms over $\mathbf{Q}$ in $n$ variables which have a non-trivial rational zero. Here, $Y$ can be described as the set of coefficients describing the quadratic forms, the sets $X$ are those quadratic forms with height $\leq T$ , with $T$ getting large, $\Lambda$ is the set of primes, and $\rho_{\ell}$ is the reduction modulo ${\ell}^2$ . $\diamond$

Example 4. Poonen used a very nice sieve to find the density, among homogeneous polynomials $f\in \mathbf{F}_q[x_0,...,x_n]$ with coefficients in a finite field $\mathbf{F}_q$ , of those such that the intersection $Z\cap \{f=0\}$ is smooth, where $Z/\mathbf{F}_q$ is a fixed smooth subvariety of projective $n$ -space. One can set up the sieve more or less as follows: $Y$ is the set of all homogeneous polynomials with coefficients in $\mathbf{F}_q$ ; for $d\geq 1$ fixed, $X$ is the subset of polynomials of degree $d$ (and $\mu$ the normalized counting measure on $X$ ); $\Lambda$ is the set of closed points of $Z$ (which can be thought of simply as the set of Galois orbits of points of $Z$ over an algebraic closure of $\mathbf{F}_q$ ); and $\rho_x$ , for $x$ such a point, maps $Y$ to the $\mathbf{F}_q$ -vector space of possible Taylor expansions of order 1 at $x$ . Poonen uses as sieving sets the $\Omega_x$ which characterize $x$ being a singular point, so that an $f$ survives sieving by all $x$ precisely when $Z\cap \{f=0\}$ is smooth. $\diamond$

We will give a few more recent examples later in this post.

— Heuristics —

Now, we will describe the general large-sieve inequality which can be used in many cases to obtain an upper bound for the measure of the sifted set. Let us try first to guess what the “right” answer should be, which will highlight the need for a further piece of data, which is usually merely implicit in the classical cases.

The heuristic value for the measure of the sifted set arises from two fairly natural assumptions: first, one might expect that the various $\rho_{\ell}$ should be “independent”, so the measure of the sifted set should be roughly equal to

$\mu(X)\prod_{{\ell}\in L}{(1-\tilde{\nu}_{\ell}(\Omega_{\ell}))}$ (1)

where $\tilde{\nu}_{\ell}$ is the (normalized) image measure on $Y_{\ell}$ of $\mu$ via $X\rightarrow Y_{\ell}$ obtained by composition:

$\tilde{\nu}_{\ell}(y)=\frac{\mu(\{x:\rho_{\ell}(F_x)=y\})}{\mu(X)}.$

This measure depends on $X$ (hence on some unspecified asymptotic parameter, such as the length of the interval of integers considered) and may be hard to understand; however, the second natural assumption is that this measure should be close to some “natural” measure $\nu_{\ell}$ on $Y_{\ell}$ which is independent of $X$ (if $Y$ has some natural measure, it could be the image of this measure under $\rho_{\ell}$ ). In other words, we may hope that $\rho_{\ell}(F_x)$ , for $x\in X$ , becomes equidistributed with respect to such a measure.

In the classical case, integers in large intervals are uniformly distributed modulo any fixed prime, so $\nu_{\ell}(y)=1/\ell$ in this case. Note that assuming some form of asymptotic equidistribution is typically the content of standard “sieve axioms”, introducing sieve remainder terms measuring the failure of exact equidistribution. In all the examples above, the target sets $Y_{\ell}$ are finite abelian groups, and the natural probability density is also the normalized counting measure, except in the case of inclusion-exclusion; there the natural measure is the distribution law of the characteristic function of $A_i$ : it maps $1\mapsto P(A_i)$ , $0\mapsto 1-P(A_i)$ .
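As an illustration of the second assumption in the classical case, the following Python sketch (my own helper names, using exact rational arithmetic) computes the normalized image measure $\tilde{\nu}_{\ell}$ of the counting measure on $\{1,\ldots,N\}$ and its largest deviation from the uniform density $1/\ell$ .

```python
from fractions import Fraction

def image_measure(N, l):
    """Normalized image of counting measure on {1..N} under reduction mod l.

    Entry y of the returned list is tilde-nu_l(y) for the residue class y.
    """
    counts = [0] * l
    for n in range(1, N + 1):
        counts[n % l] += 1
    return [Fraction(c, N) for c in counts]

def max_deviation(N, l):
    """Largest gap between tilde-nu_l and the uniform density 1/l."""
    return max(abs(v - Fraction(1, l)) for v in image_measure(N, l))

print(image_measure(100, 7)[1])   # 3/20, i.e. 15 of the 100 integers
print(max_deviation(10**4, 7))    # smaller than 1/10**4
```

Each residue class contains either $\lfloor N/\ell\rfloor$ or $\lceil N/\ell\rceil$ integers, so the deviation is less than $1/N$ ; this is the kind of explicit remainder term that the sieve axioms are meant to record.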

— Further examples —

Here are now the promised “new” examples, which in particular show further examples of non-trivial sieve settings, and less trivial measures on the finite sets $Y_{\ell}$ .

Example 5. Let $E/\mathbf{Q}$ be an elliptic curve defined over the rationals, given in Weierstrass form $y^2=x^3+Ax+B$ with $A$ , $B$ integers. We can take $Y:=E(\mathbf{Q})$ , the finitely generated Mordell-Weil group of rational points on $E$ , $\Lambda$ the set of primes (not dividing the discriminant for simplicity), $Y_p$ the image of $Y$ under the reduction map modulo $p$ . In general, this image is badly understood (it certainly is not equal to the group of points on the reduction of $E$ modulo $p$ , in most cases, for instance because the latter grows with $p$ , whereas $Y$ may well be finite). If we take $\Omega_p$ to be the origin of the group law, the sifted set has an interesting interpretation: it is the set of $S$ -integer solutions of the equation which “is” $E$ , where $S$ is the set of primes dividing the discriminant (i.e., it is the set of solutions to the equation where the denominators of $x$ and $y$ are divisible only by those primes). In particular, in contrast with classical sieves, those points are examples of interesting elements which survive after “infinite” sieving (though a famous theorem of Siegel states there are only finitely many). $\diamond$

Example 6. Let $Y:=SL(n,\mathbf{Z})$ , and $S$ a fixed finite symmetric set of generators (e.g., $S$ is the set of elementary matrices with $\pm 1$ off the diagonal). For a prime ${\ell}$ , let $Y\rightarrow Y_{\ell}$ be reduction modulo ${\ell}$ ; it is not obvious, but true, that the image is $Y_{\ell}=SL(n,\mathbf{Z}/{\ell}\mathbf{Z})$ for all primes ${\ell}$ . The natural measure is again the counting measure on $Y_{\ell}$ . We can define different types of sets to sieve (balls in word length or archimedean metric, for instance), but one that is of interest as being quite different-looking from the standard ones is the following: take $X$ to be an abstract probability space, and $F=X_k$ a $Y$ -valued random variable $X\rightarrow Y$ obtained as the $k$ -th step of a random walk defined by $X_0=1$ , $X_{k+1}=X_k\xi_{k+1}$ , where the steps $\xi_k$ are independent, identically distributed $S$ -valued random variables (with $P(\xi_k=s)>0$ for all $s$ ); for instance, $\xi_k$ may be chosen independently and uniformly at random in $S$ . In this setting, interesting sifted sets might be those where $X_k$ is a matrix with prime or almost prime entries (in this case the setup is closely related to ongoing work of Bourgain, Gamburd, and Sarnak, and other collaborators such as J. Liu, A. Nevo), or those where $X_k$ has, for instance, a reducible characteristic polynomial, or one with small splitting field. This type of setting was considered first by I. Rivin, with fairly remarkable applications in low-dimensional topology and for the theory of free-group automorphisms. One can do this type of thing also for other discrete groups, of course. $\diamond$
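For readers who like to experiment, here is a small Python sketch of the random walk of Example 6 in the simplest case $n=2$ (this choice, the generating set below, and all function names are assumptions made for illustration only). It computes exactly the law of $\rho_{\ell}(X_k)$ when the steps are uniform on the four elementary matrices, and its total variation distance to the normalized counting measure on $SL(2,\mathbf{Z}/{\ell}\mathbf{Z})$ .

```python
from fractions import Fraction

# Assumed generating set of SL(2, Z): elementary matrices with +-1 off the
# diagonal, stored as tuples (a, b, c, d) for the matrix [[a, b], [c, d]].
GENERATORS = [(1, 1, 0, 1), (1, -1, 0, 1), (1, 0, 1, 1), (1, 0, -1, 1)]

def mul_mod(g, h, l):
    """Product of two 2x2 matrices, reduced modulo l."""
    a, b, c, d = g
    e, f, u, v = h
    return ((a * e + b * u) % l, (a * f + b * v) % l,
            (c * e + d * u) % l, (c * f + d * v) % l)

def walk_distribution(k, l):
    """Exact law of rho_l(X_k), with X_0 = 1 and uniform steps in GENERATORS."""
    dist = {(1, 0, 0, 1): Fraction(1)}
    step = Fraction(1, len(GENERATORS))
    for _ in range(k):
        nxt = {}
        for g, p in dist.items():
            for s in GENERATORS:
                h = mul_mod(g, s, l)          # X_{k+1} = X_k * xi_{k+1}
                nxt[h] = nxt.get(h, 0) + p * step
        dist = nxt
    return dist

def distance_to_uniform(k, l):
    """Total variation distance from the law of rho_l(X_k) to counting measure."""
    size = l * (l * l - 1)                    # order of SL(2, Z/lZ), l prime
    dist = walk_distribution(k, l)
    uniform = Fraction(1, size)
    missing = size - len(dist)                # elements not yet reached
    return (sum(abs(p - uniform) for p in dist.values()) + missing * uniform) / 2

for k in (5, 20, 40):
    print(k, float(distance_to_uniform(k, 3)))
```

The printed distances can only decrease with $k$ , since the counting measure is invariant under the walk; how fast they decrease, uniformly in ${\ell}$ , is precisely the kind of question where expansion properties enter.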

Example 7. The last example is studied by Zywina in his preprint “The large sieve and Galois representations”, and is a number field analogue of the sieve for Frobenius over function fields considered earlier by myself (the latter is notationally a bit more complicated). We select a special case again for concreteness, and disregard for simplicity some issues of ramification. Let $E/\mathbf{Q}$ be an elliptic curve without complex multiplication. We consider for $Y$ the set of conjugacy classes in the group $G$ which is the image of the Galois group of $\mathbf{Q}$ under the natural action on all torsion points of $E$ ; by work of Serre, it is known that $G$ is naturally isomorphic to an open subgroup of $GL(2,\hat{\mathbf{Z}})$ . We take $\Lambda$ to be the set of primes ${\ell}$ such that $G_{\ell}$ , the Galois group of the action on ${\ell}$ -torsion points, is isomorphic to $GL(2,\mathbf{F}_{\ell})$ : the result of Serre mentioned above implies that $\Lambda$ contains all primes except finitely many. There is a surjection $G\rightarrow G_{\ell}$ , and correspondingly a surjective map $Y\rightarrow Y_{\ell}$ where $Y_{\ell}$ is the set of conjugacy classes in $G_{\ell}$ . On $Y_{\ell}$ , the natural measure $\nu_{\ell}$ maps a conjugacy class $C$ to $|C|/|G_{\ell}|$ ; since $G_{\ell}$ is not abelian, this is not counting measure on $Y_{\ell}$ . Now we take $X$ to be the set of prime numbers $p\leq T$ , for some $T$ , which do not divide the discriminant of $E$ . The map $F$ from $X$ to $Y$ maps $p$ to the conjugacy class of the Frobenius automorphism at $p$ (it is here that some care with ramification is needed to be precise).

Note that the equidistribution statement for a fixed ${\ell}$ is valid, and is a special case of the Chebotarev Density Theorem: for any conjugacy class $C$ in $Y_{\ell}$ , we have $\lim_{T\rightarrow +\infty} \frac{|\{p\leq T : \rho_{\ell}(F_p)=C \}|}{\pi(T)} =\frac{|C|}{|G_{\ell}|}.$

What are the applications of such a sieve setting? They arise because $F_p$ has the following properties: the trace of $\rho_{\ell}(F_p)$ (seen as an element of $GL(2,\mathbf{F}_{\ell})$ ) is the residue class modulo ${\ell}$ of the integer $a_p$ such that

$|E(\mathbf{F}_p)|=p+1-a_p$

In particular, controlling the conjugacy class of $F_p$ implies that the order of the curve modulo $p$ is (somehow) controlled, and these orders (or $a_p$ equivalently) control (at least conjecturally!) many of the arithmetic properties of the elliptic curve. For instance, recall that the Birch and Swinnerton-Dyer conjecture states that the rank of $E(\mathbf{Q})$ should be the order of vanishing at $s=1$ of (the analytic continuation of) the Euler product

$\prod_{p}{(1-a_p p^{-s}+p^{1-2s})^{-1}}.$

Certainly, one can see how sieving in this context will lead to information on, say, almost-prime values of $|E(\mathbf{F}_p)|$ , and many other arithmetic properties of the $a_p$ which are of great importance for instance in understanding some algorithms such as elliptic curve primality proving, elliptic curve cryptography, etc. $\diamond$
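The integers $a_p$ of Example 7 are easy to compute for small $p$ . The following Python sketch does so by brute-force point counting, for a curve $y^2=x^3+Ax+B$ and an odd prime $p$ of good reduction (the function names and the sample curve $y^2=x^3+x+1$ are chosen only for illustration, and this naive method is of course hopeless for large $p$ ).

```python
def count_points(A, B, p):
    """|E(F_p)| for E: y^2 = x^3 + A x + B, including the point at infinity.

    Assumes p is an odd prime of good reduction; brute force in O(p) steps.
    """
    # square_roots[v] = number of y in F_p with y^2 = v.
    square_roots = [0] * p
    for y in range(p):
        square_roots[y * y % p] += 1
    affine = sum(square_roots[(x ** 3 + A * x + B) % p] for x in range(p))
    return affine + 1

def frobenius_trace(A, B, p):
    """The integer a_p defined by |E(F_p)| = p + 1 - a_p."""
    return p + 1 - count_points(A, B, p)

print(count_points(1, 1, 5))     # 9
print(frobenius_trace(1, 1, 5))  # -3

# Residues of a_p modulo l = 3: these are the traces of rho_3(F_p).
print([frobenius_trace(1, 1, p) % 3 for p in (5, 7, 11, 13, 17, 19, 23)])
```

One can check on such data the Hasse bound $|a_p|\leq 2\sqrt{p}$ , which is not needed above but is a useful sanity test.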

— Two sieve inequalities —

Coming back to the “large sieve inequalities”, we mentioned that they will provide one possible way to get upper bounds for the measure of $S(X,\Omega,L)$ , which are often comparable in size with the “expected” measure (1) above. There are two forms, which are inspired by the corresponding inequalities for integers due to Renyi and Montgomery respectively. Both depend on bounding a further quantity $\Delta$ , which is the crucial ingredient, and which is explained below (without knowing something about $\Delta$ , the inequalities are useless; they should be read assuming that $\Delta$ is close to $\mu(X)$ , possibly up to a constant multiplicative factor).

I. Renyi’s sieve takes the form

$\int_{X}{\Bigl( N(x,L)-M(L) \Bigr)^2d\mu(x)}\leq \Delta M(L)$

where $N(x,L)$ is the number of ${\ell}\in L$ such that $\rho_{\ell}(F_x)\in \Omega_{\ell}$ , while $M(L)$ is the “expected” value of this quantity, namely the sum over ${\ell}\in L$ of $\nu_{\ell}(\Omega_{\ell})$ .

So there is a clear interpretation: if $\Delta$ is indeed close to $\mu(X)$ , this means that $N(x,L)$ is close to $M(L)$ for “most” $x$ (in mean-square sense). In the case of counting primes, this inequality becomes the Turán inequality

$\sum_{n\leq N}{(\omega(n)-\log\log N)}^2 \ll N\log\log N,$

which is known to be sharp (in fact there is an asymptotic formula of this size for the left-hand side).
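Here is a quick numerical look at this special case, as a Python sketch with my own function names. It computes the mean-square sum $\sum_{n\leq N}(\omega(n)-\log\log N)^2$ and compares it with $N\log\log N$ , which is what the right-hand side $\Delta M(L)$ of Renyi's inequality becomes if one takes $\Delta$ of size $N$ and $M(L)$ of size $\log\log N$ .

```python
from math import log

def omega_table(N):
    """omega[n] = number of distinct prime divisors of n, for 0 <= n <= N."""
    omega = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:            # no smaller prime divides p, so p is prime
            for multiple in range(p, N + 1, p):
                omega[multiple] += 1
    return omega

def turan_sides(N):
    """Return (sum_{n<=N} (omega(n) - log log N)^2, N * log log N)."""
    omega = omega_table(N)
    loglog = log(log(N))
    lhs = sum((omega[n] - loglog) ** 2 for n in range(1, N + 1))
    return lhs, N * loglog

for N in (10**3, 10**4, 10**5):
    lhs, rhs = turan_sides(N)
    print(N, lhs / rhs)
```

The ratio printed is only meaningful up to the implied constant, and $\log\log N$ grows so slowly that such experiments say little about the asymptotic behaviour.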

In the case of Example 5 (elliptic curves), this gives some lower bound for the number of prime divisors of the denominator of “most” rational points (not without some extra work; and note that the proof requires the use of Siegel’s Theorem: it does not provide another approach to it).

In general, note that we obtain the bound

$\mu(S)\leq \Delta M(L)^{-1}$

for the measure of the sifted set, using positivity. This is weak; in the case of primes, when $\nu(\Omega_{\ell})$ tends to zero (it gives the bound $N(\log\log N)^{-1}$ for the number of primes $\leq N$ ), but quite strong when $\nu(\Omega_{\ell})$ is fairly large (bounded below); indeed, these are the situations which characterize what Linnik called the “large” sieve. $\diamond$

II. The second sieve inequality is stronger in case the local densities $\nu_{\ell}(\Omega_{\ell})$ are small, in fact as strong qualitatively as the best “small” sieves. (But of course much of small sieve theory has been developed with lower bounds in mind, which the large sieve does not approach directly; see the earlier post on the parity problem.)

Here the idea is to combine the $Y_{\ell}$ to look at the possible “multi-colorings” $Y\rightarrow Y_{{\ell}_1}\times Y_{{\ell}_2}\times \cdots$ , which classically means looking at the reductions modulo square-free integers, not only modulo primes.

Abstractly, this is done as follows: let $Y_m$ , for $m$ a finite subset of $\Lambda$ , be the product of $Y_{\ell}$ for ${\ell}\in m$ . Define $\rho_m$ in the obvious manner, but notice it may not be surjective. If it is, this means some analogue of the Chinese Remainder Theorem holds, and this may be very difficult to show (think about the case of elliptic curves). Similarly, having chosen $\Omega_{\ell}$ , we can define $\Omega_m$ as the product over elements of $m$ . Now choose an arbitrary finite set $M$ of finite subsets of $L$ ; e.g., $M$ could correspond by unique factorization to the set of all square-free integers $\leq Q$ , if $L$ is the set of primes at most $Q$ .

We now have

$\mu(S)\leq \Delta H^{-1}$

where $\Delta$ is as above, and will be explained below, and

$H := \sum_{m\in M}{\prod_{{\ell}\in m}{\frac{\nu_{\ell}(\Omega_{\ell})}{1-\nu_{\ell}(\Omega_{\ell})}}}.$

Now this may look strange, but observe the following: if $M$ is the set of all subsets of $L$ , a moment’s thought reveals that

$H^{-1}=\prod_{{\ell}\in L}{(1-\nu_{\ell}(\Omega_{\ell}))},$

i.e., for $\Delta=\mu(X)$ , the right-hand side of the large-sieve inequality is exactly the expected value in (1). The problem is that, typically, $\Delta$ would be too large if $M$ is that big (this is shown for instance by the problem of explosion of the remainder term in applying the Eratosthenes-Legendre sieve to compute the number of primes up to a given bound). The interpretation of this is then similar to that of the earlier sieves by V. Brun: using all subsets of $L$ (all square-free integers composed of primes in $L$ ) is too costly, but a suitable “cutoff” yields approximations to the truth which can remain very useful. (One case where one can take $M$ so large is for Example 1, if the events $A_i$ are independent so that by the very definition of independence, the probability of the sifted set $S$ is exactly

$P(S)=\prod_{i}{(1-P(A_i))}=H^{-1},$

and it turns out that in this case one has exactly $\Delta=1$ , so that the large-sieve inequality is sharp in the full generality considered). $\diamond$
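The “moment’s thought” behind the identity for $H^{-1}$ when $M$ is the set of all subsets of $L$ is just the expansion of $\prod_{\ell}(1+\frac{\nu_{\ell}(\Omega_{\ell})}{1-\nu_{\ell}(\Omega_{\ell})})$ . It can also be checked mechanically, as in this Python sketch with exact fractions (the function names and the sample densities, which are those of the twin prime sieve for ${\ell}\leq 7$ , are illustrative assumptions).

```python
from fractions import Fraction
from itertools import combinations

def H_all_subsets(densities):
    """H when M is the set of all subsets of L.

    densities maps each l in L to nu_l(Omega_l), assumed to be < 1.
    """
    labels = list(densities)
    total = Fraction(0)
    for size in range(len(labels) + 1):
        for m in combinations(labels, size):
            term = Fraction(1)               # empty product for m = empty set
            for l in m:
                term *= densities[l] / (1 - densities[l])
            total += term
    return total

def expected_proportion(densities):
    """Product over l of (1 - nu_l(Omega_l)), i.e. (1) with mu(X) = 1."""
    result = Fraction(1)
    for nu in densities.values():
        result *= 1 - nu
    return result

twin = {2: Fraction(1, 2), 3: Fraction(2, 3), 5: Fraction(2, 5), 7: Fraction(2, 7)}
print(1 / H_all_subsets(twin), expected_proportion(twin))  # 1/14 1/14
```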

Now we need finally to define $\Delta$ . The reason it was delayed is that it will need yet more notation…

First, for each ${\ell}$ in $\Lambda$ , let $V_{\ell}$ be the finite dimensional Hilbert space of functions of mean-zero on $Y_{\ell}$ , i.e., of those $f$ such that

$\sum_{y\in Y_{\ell}}{\nu_{\ell}(y)f(y)}=0$

(with the inner product defined using the density $\nu_{\ell}$ ).

Select (arbitrarily) an orthonormal basis $B_{\ell}$ of $V_{\ell}$ . Then consider the set $B_m$ of functions on $Y_m$ of the type

$(y_{\ell})\mapsto \prod_{{\ell}\in m}{\varphi_{\ell}(y_{\ell})}$

where $\varphi_{\ell}$ is an element of the chosen basis of $V_{\ell}$ . Those functions form themselves an orthonormal basis for a “primitive” subspace of the space of functions on $Y_m$ , with respect to the “product” inner-product on $Y_m$ .

Finally, $\Delta$ is defined to be the least non-negative real number such that the inequality

$\sum_{m\in M}{\sum_{\varphi\in B_m}{\Bigl|\int_{X}{\alpha(x)\varphi(\rho_{m}(F_x))d\mu(x)}\Bigr|^2}} \leq \Delta \|\alpha\|^2,$

holds for any square-integrable function $\alpha$ on $X$ . For Renyi’s inequality, $M$ is simply the set of singletons $\{{\ell}\}$ for ${\ell}\in L$ .

One can see quite easily that $\Delta$ is independent of the choice of basis (but practical estimates may well depend on a clever selection) by interpreting it as the norm of some operator between Hilbert spaces.

Let’s look at the classical example of integers: there, $V_{\ell}$ is the space of functions on $\mathbf{Z}/{\ell}\mathbf{Z}$ which sum to zero, and a particular basis is given by the additive characters

$x\mapsto \exp(2i\pi ax/{\ell})$

for all non-zero $a\in \mathbf{Z}/{\ell}\mathbf{Z}$ . Extending to all square-free $m$ , by the Chinese Remainder Theorem, amounts to looking at characters

$x\mapsto \exp(2i\pi ax/m)$

of $\mathbf{Z}/m\mathbf{Z}$ with $a$ coprime with $m$ . So the defining inequality for sieving $\{1,\ldots, N\}$ using square-free numbers up to $L$ is

$\sum_{m\leq L}{\sum_{(a,m)=1}{|\sum_{1\leq n\leq N}{a_n \exp(2i\pi an/m)}|^2}}\leq \Delta \sum_n{|a_n|^2}.$

It is classical, by now, that $\Delta\leq N-1+L^2$ , the first bound of this strength being due to Bombieri, though this particular value is due to Montgomery-Vaughan and Selberg independently. In fact, stronger analytic inequalities are known: there, it is not necessary to restrict to $m$ square-free on the left-hand side, and in fact one can look at arbitrary sets of well-spaced points in $\mathbf{R}/\mathbf{Z}$ . This classical theory is very well described in Montgomery’s survey paper. [Incidentally, reading this paper was the subject of my first-year project at E.N.S. Lyon under the direction of É. Fouvry and H. Daboussi, and it is quite remarkable how much of my mathematical work has involved the large sieve in one way or another…]
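The classical inequality with $\Delta=N-1+L^2$ is easy to test numerically on any given sequence. The Python sketch below (function names are mine; the outer sum runs over square-free $m\leq L$ , with the convention that $m=1$ contributes the single term $a=1$ ) returns both sides for a list of coefficients $a_1,\ldots,a_N$ .

```python
import cmath
from math import gcd

def is_squarefree(m):
    """True when no square d^2 > 1 divides m."""
    d = 2
    while d * d <= m:
        if m % (d * d) == 0:
            return False
        d += 1
    return True

def large_sieve_sides(a, L):
    """Both sides of the classical large sieve inequality.

    a is the list [a_1, ..., a_N]; the right-hand side uses the bound
    Delta <= N - 1 + L^2 quoted in the text.
    """
    N = len(a)
    lhs = 0.0
    for m in range(1, L + 1):
        if not is_squarefree(m):
            continue
        for r in range(1, m + 1):
            if gcd(r, m) != 1:
                continue
            s = sum(a[n - 1] * cmath.exp(2j * cmath.pi * r * n / m)
                    for n in range(1, N + 1))
            lhs += abs(s) ** 2
    rhs = (N - 1 + L * L) * sum(abs(x) ** 2 for x in a)
    return lhs, rhs

lhs, rhs = large_sieve_sides([1] * 30, 5)
print(round(lhs), rhs)  # 900 1620
```

For the constant sequence only $m=1$ contributes, which shows that the term $N$ in $\Delta$ cannot be dispensed with; a check of this kind of course says nothing about the true value of $\Delta$ , which is a supremum over all sequences.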

— An approach to concrete sieve estimates —

Here is the most standard way to estimate $\Delta$ directly. It relies on duality: the defining inequality is equivalent to

$\int_X{\Bigl|\sum_{m\in M}{\sum_{\varphi\in B_m}{\beta(m,\varphi)\varphi(\rho_{m}(F_x))}} \Bigr|^2 d\mu(x)} \leq \Delta \sum_{m,\varphi}{|\beta(m,\varphi)|^2}$

for arbitrary complex numbers $\beta(m,\varphi)$ . Expanding the left-hand side, we find a quadratic form with coefficients

$W(\varphi,\psi) := \int_X{\varphi(\rho_{m}(F_x)) \overline{\psi(\rho_{n}(F_x))}d\mu(x)}$

with $\varphi\in B_m$ and $\psi\in B_n$ . Those correlation coefficients then must be estimated. One expects that

$W(\varphi,\psi)=\delta(\varphi,\psi)\mu(X)+R(\varphi,\psi)$

where the remainder $R$ is estimated uniformly and explicitly, and is small compared with $\mu(X)$ , at least if $m$ and $n$ are not “too big” (in the appropriate sense). Then we have

$\Delta\leq \max_{m,\varphi}\sum_{n,\psi}{|W(\varphi,\psi)|} \leq \mu(X)+\max_{m,\varphi}\sum_{n,\psi}{|R(\varphi,\psi)|},$

and this will have the desired feature of being close to $\mu(X)$ if $L$ and $M$ are suitably chosen.
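In the classical case the correlation coefficients are geometric sums, and the row-sum bound for $\Delta$ can be evaluated directly. The Python sketch below does this (with my own function names, and, as a simplification, using all denominators $m\leq L$ rather than only the square-free ones).

```python
import cmath
from math import gcd

def farey_points(L):
    """Pairs (r, m) with m <= L, 1 <= r <= m and gcd(r, m) = 1.

    Each pair stands for the additive character n -> exp(2 i pi r n / m).
    """
    return [(r, m) for m in range(1, L + 1)
            for r in range(1, m + 1) if gcd(r, m) == 1]

def correlation(point1, point2, N):
    """W(phi, psi) = sum over n = 1..N of phi(n) * conj(psi(n))."""
    (r, m), (s, k) = point1, point2
    theta = r / m - s / k
    return sum(cmath.exp(2j * cmath.pi * theta * n) for n in range(1, N + 1))

def row_sum_bound(N, L):
    """max over phi of the sum over psi of |W(phi, psi)|: an upper bound for Delta."""
    pts = farey_points(L)
    return max(sum(abs(correlation(p, q, N)) for q in pts) for p in pts)

for L in (2, 4, 8):
    print(L, row_sum_bound(50, L))
```

The diagonal terms contribute exactly $\mu(X)=N$ , and the off-diagonal terms play the role of the remainders $R(\varphi,\psi)$ . This crude bound is weaker than $N-1+L^2$ as $L$ grows, which is one reason the sharp classical results require finer arguments.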

Here are the two most important new examples where this approach gives strong results, and leads to links with very deep principles of harmonic analysis and number theory.

In the setting of Example 6 (and its generalizations), i.e., sieve for discrete groups and random walks, a natural basis $B_{\ell}$ is given by the matrix coefficients of irreducible unitary representations of the finite groups $G_{\ell}=SL(n,\mathbf{Z}/{\ell}\mathbf{Z})$ . Then estimating $W(\varphi,\psi)$ turns out to be very cleanly linked with Property $(\tau)$ for the congruence quotients of $SL(n,\mathbf{Z})$ (or Property (T), though that applies only for $n$ at least $3$ ). Recall that this property states that there is a $\delta>0$ such that, for any $m\geq 1$ and any homomorphism

$\pi : SL(n,\mathbf{Z})\rightarrow SL(n,\mathbf{Z}/m\mathbf{Z}) \rightarrow U(r)$

where $U(r)$ is the unitary group of a finite-dimensional Hilbert space of dimension $r$ , either there is a non-zero vector $v\in \mathbf{C}^r$ invariant under all $\pi(g)$ , $g\in SL(n,\mathbf{Z})$ , or for any unit vector $v\in\mathbf{C}^r$ , there exists $s\in S$ (recall $S$ is a generating set of $SL(n,\mathbf{Z})$ ) such that

$\|\pi(s)v-v\|\geq \delta.$

In other words, if a vector is not invariant under all $s\in S$ , it is uniformly non-invariant. This property is also equivalent to the fact that the Cayley graphs of the congruence quotients form a family of expanders.

(Note: fine knowledge of the character table of $G_{\ell}$ , as described at least partly through Deligne-Lusztig characters, is not strictly needed at present, though, when $n$ is large, it can lead to small improvements in estimates, and to interesting problems.)

In Example 7, since we work with conjugacy classes, the natural basis $B_{\ell}$ is the set of characters of irreducible unitary representations of $G_{\ell}$ . Estimating $W(\varphi,\psi)$ is then a problem of analytic number theory, amounting to uniform quantitative estimates for partial sums of coefficients (at primes) of Artin $L$ -functions. As such, there exist unconditional, but fairly weak, results, and much stronger ones when assuming the Generalized Riemann Hypothesis.

When dealing with the function field analogue, the Riemann Hypothesis is known, due to the amazing work of Deligne [in my mind, the most important number-theory result of the 20th century], building on the foundations of Grothendieck. However, this only gives a (very strong) result for fixed $\varphi$ , $\psi$ , a priori; getting uniform bounds is not trivial, and because the remainder terms after applying the Grothendieck-Lefschetz trace formula and the Riemann Hypothesis involve the dimension of some étale cohomology groups depending on $\varphi$ and $\psi$ , some uniform estimates for those so-called Betti numbers are required. They are proved either by analyzing the ramification behavior of the analogues of the Artin representations, or (for higher-dimensional parameter spaces) by adapting an induction method of Katz.

In applying concretely the sieve for Frobenius, it should be mentioned that one of the main issues is in fact to compute $Y_{\ell}$ precisely! This is a completely new phenomenon compared with the classical sieves, where $Y_{\ell}$ is perfectly known and understood. Here, this amounts to computing certain Galois groups, such as those of torsion points on elliptic curves, or their analogues for function fields (which are often called monodromy groups because of their geometric origin). Serre’s theorem quoted above for non-CM elliptic curves is one of the most famous examples, but the “baby” example of this is the fact that cyclotomic polynomials are irreducible over $\mathbf{Q}$ (since this means that the Galois groups of cyclotomic extensions, generated by roots of unity, are “as big as possible”, which is usually the desired outcome). In the first application of the sieve for Frobenius, which concerned a conjecture of Katz already considered by N. Chavdarov, I used an unpublished result of J-K. Yu which has recently been given a new proof by C. Hall.

6 comments

9 August, 2007 at 6:15 am

Jordan Ellenberg

Emmanuel,

I’ve only just started reading this enjoyable post! But let me make a couple of comments off the bat.

1. In Poonen’s theorem (your example 4), the “sieve-style” method really only works for the points of small degree — there is a very clever trick involving Frobenius to deal with the points of large degree (section 2.3 of Poonen’s paper.) But maybe this is what you meant by “more or less” — or maybe to you this material in section 2.3 can be thought of as “sieve-style” as well, in which case I’d like to hear more about it!

2. Yu’s unpublished theorem was also reproved by Jeff Achter and Rachel Pries, independently of Hall.

9 August, 2007 at 11:06 am

Emmanuel Kowalski

Hello Jordan !

(1) This is absolutely correct; what I wanted to say was simply that the framework of Poonen’s work, which is quite unusual by the usual standards of classical analytic number theory, does fall within the general framework I described (a good test that it is at least close to the “right” one, although whether one can expect a perfect abstract theory for something like this is not clear at all). In fact, in sieve-theoretic terms, Poonen’s application is something like a “zero-dimensional sieve”, which is quite far from the standard “large sieve” (very few residue classes are excluded, instead of a positive proportion). This means in particular that it is possible to expect an asymptotic formula with a positive density, though proving such a formula is of course quite difficult usually (see for instance the remarkable recent work of Helfgott proving that a cubic polynomial evaluated at primes takes the right number of squarefree values, when there is no trivial obstruction). For a zero-dimensional sieve, the large sieve gives useless results in general.

(2) I know of this work of Achter and Pries, but actually I was thinking of the stronger version of Yu’s theorem for one-parameter families $y^2=f(x)(x-t)$ , and as far as I understood their paper, this does not follow from their approach (but I may be mistaken…)

15 August, 2007 at 9:57 am

Jesse Kass

A) It should be noted that (as I understand it) Yu’s proof of this monodromy result uses a technique that is entirely different from the technique used both in Hall’s paper and in the Achter-Pries paper. For the families considered by Achter and Pries, his proof is fundamentally topological and uses the description of the fundamental group of the parameter space as a braid group. He handles the case of an arbitrary base field by proving a comparison result.

B) Kowalski is correct that Achter and Pries do not compute the monodromy for the family $y^2 = f(x)(x-t)$ (considered as a family in the variable t). Furthermore, their technique cannot be extended to handle this family without some additional work.

Achter and Pries study the family

$U_d: y^2 = (x-r_1) \ldots (x-r_{d})$

given by allowing $(r_1,\ldots, r_d)$ to be any d-tuple of distinct points. Their technique is to partially compactify the family by adding in curves of compact type (curves with compact Jacobian). The boundary looks like a product of spaces of the same type but of smaller dimension, so the monodromy group can be computed inductively. Achter and Pries then use a result about semi-continuity of monodromy under degeneration to compare the monodromy group of the boundary to the monodromy group of $U_d$ .

One way of extending this technique would be to put the family $y^2=f(x)(x-t)$ into a larger family of smooth curves by allowing the coefficients of f(x) to vary in some continuous manner. One could try to do this in such a way that the monodromy group of $y^2=f(x)(x-t)$ is the same as the monodromy group of the larger family. This larger family would then be a family to which it is natural to try to apply the technique from the Achter-Pries paper.

The difficulty is that it is not immediately clear to me how to do this in such a manner that the larger family admits a partial compactification by curves of compact type. The wandering point (t,0) on the curve is the main source of difficulties.

For a partial compactification by curves that are not of compact type, it may still be possible to extend the method, but the difficulty lies in proving an appropriate semi-continuity result. When all the curves in the family are of compact type, the monodromy group is the group associated to a sheaf (the sheaf $R^1f_{*} Z/\ell$ ) that is locally constant on the entire partial compactification. This fact is used in a fundamental way by Achter and Pries and is no longer true when the family includes curves that are not of compact type.

16 August, 2007 at 2:01 am

Emmanuel Kowalski

Hello,

Thanks for the information on the methods of Achter and Pries, which are a bit far away from my current knowledge of algebraic geometry.

Incidentally, there’s another question I’ve been wondering about concerning the families $y^2=f(x)(x-t)$ , which is whether the curves in the family are generically ordinary. (This is partly motivated by applications where both big monodromy and ordinarity are useful, e.g. this paper). I used to think this would probably be the case, but it’s not so clear since Achter and Pries have extended their results to get big monodromy results for the moduli spaces of curves with given $p$ -rank (arXiv:0707.2110), which implies in particular the existence of one-parameter families of curves with big monodromy that are not ordinary. (Though, of course, not necessarily of the form above.) (Katz had mentioned having found such families, but without details.)

16 August, 2007 at 8:29 am

Jesse Kass

Hello,
I think that the curves in the family $y^2 = f(x)(x-t)$ are generically ordinary, at least for $f(x)$ a general polynomial. I know that the generic hyperelliptic curve of genus $g$ is ordinary, and I think that the same kind of technique can be applied to your question.

Here is a sketch of the argument: Fix an integer d and consider the family

$V_d: y^2 = f(x)(x-t)$

considered as a family over an open subset of $A^{d-1} \times A^1$ by varying both $t$ and the coefficients of $f(x)$ in a continuous manner. If this family is generically ordinary, then it follows that the family $y^2 = g(x)(x-t)$ is generically ordinary for $g(x)$ a general polynomial.

Now to prove that the larger family is generically ordinary, we induct on $d$. For $d$ small ($d \le 6$), the family $V_d$ contains every smooth curve of appropriate genus, so we are ok.
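
To make the base case concrete, here is the genus count that seems to lie behind it. This is a reading of the setup, not something stated in the sketch: it assumes that $f$ is monic of degree $d-1$ (as suggested by the parameter space $A^{d-1} \times A^1$) and that the characteristic is odd. Then $f(x)(x-t)$ has degree $d$, and a smooth curve $y^2 = h(x)$ with $h$ squarefree of degree $d$ has genus

$g = \lfloor (d-1)/2 \rfloor$ ,

so $d = 3, 4$ gives $g = 1$ and $d = 5, 6$ gives $g = 2$. Over an algebraically closed field, every curve of genus 1 or 2 is a double cover of the projective line, which is presumably why this range of $d$ can serve as the start of the induction.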

For the inductive step, first compactify $V_d$ by adding in nodal hyperelliptic curves. Now one can construct lots of ordinary curves that lie on the boundary of $V_d$ by taking two curves in $V_{d'}$ with $d' < d$ and gluing them together at a point. A deformation theory argument shows that these nodal, ordinary curves can be deformed into smooth, ordinary curves that lie in the interior of $V_d$. This proves that the generic element of $V_d$ is ordinary.

Technical Note: For the inductive step, it is probably better to work with a finite quotient of $V_d$ that has a nice moduli theoretic interpretation and then use a standard compactification of that space rather than to work with $V_d$ directly.
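
For readers who want to experiment with the ordinarity question numerically, here is a minimal Python sketch. It is not part of the argument above and proves nothing about the generic fibre. It uses the standard Cartier-Manin (Hasse-Witt) criterion: for an odd prime $p$ and a squarefree $h$ over the prime field $F_p$, the curve $y^2 = h(x)$ of genus $g$ is ordinary exactly when the $g \times g$ matrix with entries $c_{ip-j}$ is invertible, where the $c_k$ are the coefficients of $h^{(p-1)/2}$. All function names are hypothetical, polynomials are coefficient lists in increasing degree, and `pow(x, -1, p)` needs Python 3.8 or later.

```python
def trim(a):
    """Drop trailing zero coefficients in place and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_mul(a, b, p):
    """Product of two polynomials with coefficients reduced mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_mod(a, b, p):
    """Remainder of a divided by b over F_p (b trimmed and nonzero)."""
    a = trim([c % p for c in a])
    inv = pow(b[-1], -1, p)
    while len(a) >= len(b):
        factor = a[-1] * inv % p
        shift = len(a) - len(b)
        for i, bi in enumerate(b):
            a[shift + i] = (a[shift + i] - factor * bi) % p
        trim(a)  # the leading coefficient is now zero
    return a

def poly_gcd(a, b, p):
    """Euclidean algorithm over F_p; the result is not normalised."""
    a, b = trim([c % p for c in a]), trim([c % p for c in b])
    while b:
        a, b = b, poly_mod(a, b, p)
    return a

def is_squarefree(h, p):
    """h is squarefree iff gcd(h, h') is a nonzero constant."""
    dh = [(i * c) % p for i, c in enumerate(h)][1:]
    return len(poly_gcd(h, dh, p)) == 1

def det_mod_p(m, p):
    """Determinant mod p by Gaussian elimination."""
    m = [row[:] for row in m]
    n, det = len(m), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det = det * m[c][c] % p
        inv = pow(m[c][c], -1, p)
        for r in range(c + 1, n):
            factor = m[r][c] * inv % p
            for k in range(c, n):
                m[r][k] = (m[r][k] - factor * m[c][k]) % p
    return det % p

def cartier_manin(h, p):
    """Matrix (c_{ip-j}), 1 <= i, j <= g, from h^((p-1)/2) mod p."""
    h = trim([c % p for c in h])
    g = (len(h) - 2) // 2  # genus floor((deg h - 1) / 2)
    power = [1]
    for _ in range((p - 1) // 2):
        power = poly_mul(power, h, p)
    coeff = lambda k: power[k] if 0 <= k < len(power) else 0
    return [[coeff(i * p - j) for j in range(1, g + 1)]
            for i in range(1, g + 1)]

def is_ordinary(h, p):
    """Ordinarity of y^2 = h(x) over F_p, assuming h is squarefree."""
    return det_mod_p(cartier_manin(h, p), p) != 0

def count_ordinary_fibres(f, p):
    """Return (smooth, ordinary) counts for y^2 = f(x)(x - t), t in F_p."""
    smooth = ordinary = 0
    for t in range(p):
        h = poly_mul(f, [-t % p, 1], p)
        if not is_squarefree(h, p):
            continue  # singular fibre, skip it
        smooth += 1
        ordinary += is_ordinary(h, p)
    return smooth, ordinary

# Legendre-type example f(x) = x(x - 1) over F_5.
print(count_ordinary_fibres([0, -1, 1], 5))  # (3, 3)
```

The sketch only looks at fibres over the prime field, where the Frobenius twist in the Hasse-Witt criterion is trivial and a single determinant suffices. Over a larger finite field one would have to multiply the Frobenius twists of the matrix, which is not done here.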

20 November, 2008 at 1:13 pm

Marker lecture IV: sieving for almost primes and expanders « What’s new

[…] The selection of the sieve weights is now a well-developed science (see also my earlier post, and Kowalski’s guest post, on this topic), and Bourgain, Gamburd and Sarnak basically use off-the-shelf sieves (in […]

