You are currently browsing the monthly archive for March 2012.
A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).
As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:
Theorem 1 (Szemerédi’s theorem) Let $A$ be a set of integers of positive upper density, thus $\limsup_{N \to \infty} \frac{|A \cap [-N,N]|}{2N+1} > 0$. Then $A$ contains an arithmetic progression of length $k$ for any $k$.
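To see the content of the theorem concretely, here is a small brute-force computation (an illustration added for this discussion, with hypothetical helper names; it plays no role in the actual proof): a set of density one third contains very long progressions.

```python
def longest_ap_length(A):
    """Length of the longest arithmetic progression contained in the set A."""
    A = set(A)
    best = 1 if A else 0
    for a in A:
        for d in range(1, max(A) - min(A) + 1):
            k = 1
            while a + k * d in A:
                k += 1
            best = max(best, k)
    return best

# The multiples of 3 below 100 have density 1/3 and contain long progressions,
# consistent with what Szemeredi's theorem predicts for sets of positive density.
dense_set = [n for n in range(100) if n % 3 == 0]
print(longest_ap_length(dense_set))
```

Of course, the theorem is only non-trivial because a positive-density set need not have any such obvious structure.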
Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.
In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occurring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemerédi’s argument at some future juncture.)
Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions, but also progressions of progressions, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set $A$ (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length $k$ that consists entirely of elements of $A$.
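For concreteness, here is a quick sketch (names and parameters are illustrative only, not taken from the proof) of what a generalised arithmetic progression of small rank looks like as a set of integers:

```python
from itertools import product

def gap(base, steps, lengths):
    """Generalised arithmetic progression of rank len(steps):
    all sums base + n_1*q_1 + ... + n_r*q_r with 0 <= n_i < L_i."""
    return {base + sum(n * q for n, q in zip(ns, steps))
            for ns in product(*(range(L) for L in lengths))}

rank1 = gap(5, [3], [4])            # ordinary progression {5, 8, 11, 14}
rank2 = gap(0, [1, 100], [10, 10])  # a "progression of progressions"
print(sorted(rank1))
```

The rank-two example decomposes into ten blocks of ten consecutive integers, with the blocks themselves in arithmetic progression — the “progression of progressions” picture used below.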
To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression $P_1, \dots, P_k$ of progressions of length $k$, with each progression $P_i$ being quite long, and containing a near-maximal amount of elements of $A$, thus
$$|A \cap P_i| \approx \delta |P_i| \qquad (1)$$
where $\delta$ is the “maximal density” of $A$ along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use imprecise symbols such as $\approx$ instead.) By hypothesis, $\delta$ is positive. The objective is then to locate a progression $a_1, \dots, a_k$ in $A$, with each $a_i$ in $P_i$ for $i = 1, \dots, k$. It may help to view the progression of progressions $P_1, \dots, P_k$ as a tall thin rectangle.
If we write $A_i$ for $A \cap P_i$, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression $a_1, \dots, a_k$, with each $a_i$ in $A_i$.
To proceed, we will assume a “weak mixing” hypothesis, namely that
$$|A \cap E| \approx \delta |E| \qquad (2)$$
for “most” subsets $E$ of $P_1 \cup \dots \cup P_k$ of appreciable density, of a certain form to be specified shortly. This is a plausible type of assumption if one believes $A$ to behave like a random set, and if the sets $E$ are constructed “independently” of $A$ in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions of the desired form.
We will inductively consider the following (nonrigorously defined) sequence of claims $C_i$ for each $1 \le i \le k$:
- $C_i$: For most choices of $a_i \in P_i$, a fraction $\approx \delta^{i-1}$ of the arithmetic progressions $a_1, \dots, a_k$ (with each $a_j \in P_j$) passing through the specified choice of $a_i$ are such that $a_j \in A \cap P_j$ for all $1 \le j < i$.
(Actually, to avoid boundary issues one should restrict $a_i$ to lie in the middle third of $P_i$, rather than near the edges, but let us ignore this minor technical detail.) The quantity $\delta^{i-1}$ is natural here, given that each arithmetic progression passing through $a_i$ in the $i^{th}$ position ought to have a probability of $\delta^{i-1}$ or so that the events $a_j \in A$ for $1 \le j < i$ simultaneously hold. If one has the claim $C_k$, then by selecting a typical $a_k$ in $A \cap P_k$, we obtain a progression $a_1, \dots, a_k$ with $a_j \in A$ for all $j$, as required. (In fact, we obtain a large number of such progressions by this method.)
We can heuristically justify the claims $C_i$ by induction on $i$. For $i = 1$, the claims are clear just from direct counting of progressions (as long as we keep $a_1$ away from the edges of $P_1$). Now suppose that $1 < i \le k$, and the claims $C_1, \dots, C_{i-1}$ have already been proven. For any $a_i \in P_i$, we have from hypothesis that a fraction $\approx \delta^{i-2}$ of the progressions through $a_i$ obey $a_j \in A \cap P_j$ for all $1 \le j < i-1$. Let $E$ be the set of all the values of $a_{i-1}$ attained by these progressions; then $E$ has density $\approx \delta^{i-2}$ in $P_{i-1}$. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of $a_i$, we have
$$|A \cap E| \approx \delta |E|,$$
which then gives the desired claim $C_i$.
The observant reader will note that we only needed the claim $C_i$ in the case $i = k$ for the above argument, but for technical reasons, the full proof requires one to work with more general values of $i$ (also the claim needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).
We now return to the question of how to justify the weak mixing hypothesis (2). For a single block $P_i$ of the progression of progressions, one can easily concoct a scenario in which this hypothesis fails, by choosing $E$ to overlap with $A$ too strongly, or to be too disjoint from $A$. However, one can do better if one can select $P_i$ from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:
Proposition 2 (Single upper bound) Let $P_1, \dots, P_M$ be a progression of progressions for some large $M$. Suppose that for each $i$, the set $A \cap P_i$ has density $\approx \delta$ in $P_i$ (i.e. (1) holds). Let $E$ be a subset of $P_1 \cup \dots \cup P_M$ of density $\sigma$. Then (if $M$ is large enough) one can find an $i$ such that
$$|A \cap E \cap P_i| \lessapprox \delta \sigma |P_i|. \qquad (3)$$
Proof: The key is the double counting identity
Because $A$ has maximal density $\delta$ and $M$ is large, we have
for each $i$, and thus
The claim then follows from the pigeonhole principle.
Applying this to each of the sets $E_j$ from the previous discussion, we would like to have
$$|A \cap E_j \cap P_i| \lessapprox \delta \sigma_j |P_i|$$
for all $1 \le j \le k$, where $\sigma_j$ is the density of $E_j$ in $P_1 \cup \dots \cup P_M$. The above proposition gives, for each $j$, a choice of $i$ for which (3) holds, but it could be a different $i$ for each $j$, and so it is not immediately obvious how to use Proposition 2 to find an $i$ for which (3) holds simultaneously for all $j$. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:
Proposition 3 (Multiple upper bound) Let $P_1, \dots, P_M$ be a progression of progressions for some large $M$. Suppose that for each $i$, the set $A \cap P_i$ has density $\approx \delta$ in $P_i$ (i.e. (1) holds). For each $1 \le j \le k$, let $E_j$ be a subset of $P_1 \cup \dots \cup P_M$ of density $\sigma_j$. Then (if $M$ is large enough depending on $k$) one can find an $i$ such that (3) holds simultaneously for all $1 \le j \le k$.
Proof: Suppose that the claim failed (for some suitably large $M$). Then, for each $i$, there exists an index $j(i)$ with $1 \le j(i) \le k$ such that (3) fails for the pair $(i, j(i))$. This can be viewed as a colouring of the interval $\{1, \dots, M\}$ by $k$ colours. If we take $M$ large compared to $k$, van der Waerden’s theorem allows us to then find a long subprogression of $\{1, \dots, M\}$ which is monochromatic, so that $j(i)$ is constant on this progression. But the blocks $P_i$ indexed by this subprogression form a progression of progressions in their own right, on which (3) fails for a single set $E_{j(i)}$ at every block, and so this will furnish a counterexample to Proposition 2.
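The finite form of van der Waerden’s theorem used in this proof can be explored by brute force; the following sketch (a hypothetical helper, exhaustive search — nothing like a proof of the theorem) locates a monochromatic progression in a given colouring:

```python
def monochromatic_ap(colouring, k):
    """Search a colouring (list of colours, indexed 0..N-1) for a
    monochromatic arithmetic progression of length k; return its
    indices, or None if there is none."""
    N = len(colouring)
    for a in range(N):
        for d in range(1, (N - 1 - a) // (k - 1) + 1):
            idx = [a + j * d for j in range(k)]
            if len({colouring[i] for i in idx}) == 1:
                return idx
    return None

# van der Waerden's theorem: for any fixed number of colours and any k,
# every long enough colouring contains such a progression.  The van der
# Waerden number W(2,3) equals 9, so every 2-colouring of 9 consecutive
# integers already has a monochromatic 3-term progression.
print(monochromatic_ap([0, 1, 1, 0, 0, 1, 1, 0, 1], 3))
```

In the proof above, the colours are the indices $j(i)$ of the failing sets, and the monochromatic progression supplies the contradiction.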
One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:
Proposition 4 (Multiple mixing) Let $P_1, \dots, P_M$ be a progression of progressions for some large $M$. Suppose that for each $i$, the set $A \cap P_i$ has density $\approx \delta$ in $P_i$ (i.e. (1) holds). For each $1 \le j \le k$, let $E_j$ be a subset of $P_1 \cup \dots \cup P_M$ of density $\sigma_j$. Then (if $M$ is large enough depending on $k$) one can find an $i$ such that
$$|A \cap E_j \cap P_i| \approx \delta \sigma_j |P_i|$$
simultaneously for all $1 \le j \le k$.
Proof: By applying the previous proposition to the collection of sets $E_j$ and their complements (thus replacing $k$ by $2k$), one can find an $i$ for which the upper bound (3) holds simultaneously for each $E_j$ and for each complement; the upper bounds for the complements yield the matching lower bounds for the $E_j$, which gives the claim.
However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number $k$ of sets $E_j$ for which mixing is established is too small compared with the length $M$ of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of sets $E_j$ to be mixed (at the cost of excluding a small fraction of exceptions):
Proposition 5 (Really multiple mixing) Let $P_1, \dots, P_M$ be a progression of progressions for some large $M$. Suppose that for each $i$, the set $A \cap P_i$ has density $\approx \delta$ in $P_i$ (i.e. (1) holds). For each $j$ in some (large) finite set $J$, let $E_j$ be a subset of $P_1 \cup \dots \cup P_M$ of density $\sigma_j$. Then (if $M$ is large enough, but not dependent on the size of $J$) one can find an $i$ such that
$$|A \cap E_j \cap P_i| \approx \delta \sigma_j |P_i|$$
simultaneously for almost all $j \in J$.
Proof: We build a bipartite graph $G$ connecting the progression $P_1 \cup \dots \cup P_M$ to the finite set $J$ by placing an edge between an element $x$ of $P_1 \cup \dots \cup P_M$ and an element $j$ of $J$ whenever $x \in E_j$. The number $|E_j|$ can then be interpreted as the degree of $j$ in this graph, while the number $|E_j \cap P_i|$ is the number of neighbours of $j$ that land in $P_i$.
We now apply the regularity lemma to this graph $G$. Roughly speaking, what this lemma does is to partition $P_1 \cup \dots \cup P_M$ and $J$ into almost equally sized cells $X_1, \dots, X_a$ and $Y_1, \dots, Y_b$, such that for most pairs $(X_s, Y_t)$ of cells, the graph $G$ resembles a random bipartite graph of some density $d_{st}$ between these two cells. The key point is that the number of cells here is bounded uniformly in the size of $P_1 \cup \dots \cup P_M$ and $J$. As a consequence of this lemma, one can show that for most vertices $j$ in a typical cell $Y_t$, the number $|E_j|$ is approximately equal to
$$\sum_{s=1}^a d_{st} |X_s|$$
and the number $|E_j \cap P_i|$ is approximately equal to
$$\sum_{s=1}^a d_{st} |X_s \cap P_i|.$$
The point here is that the huge number of different statistics $|E_j \cap P_i|$ are now controlled by a mere bounded number of statistics $d_{st}$ (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4, applied to the boundedly many cells $X_s$, to find an $i$ for which the mixing asymptotic holds simultaneously for all of these cells, and the claim follows.
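The “resembles a random bipartite graph” behaviour that the regularity lemma extracts can be sampled numerically: in a genuinely random bipartite graph, edge densities between large vertex subsets concentrate around the global density. A small seeded sketch (parameters arbitrary, added for illustration):

```python
import random

random.seed(0)
m, n, p = 200, 200, 0.3
# adjacency of a random bipartite graph between X = [m] and Y = [n]
edge = [[random.random() < p for _ in range(n)] for _ in range(m)]

def density(S, T):
    """Edge density d(S,T) = e(S,T)/(|S||T|) between vertex subsets."""
    e = sum(edge[x][y] for x in S for y in T)
    return e / (len(S) * len(T))

# For a random graph, any fixed large subsets have density close to p --
# the behaviour that the regularity lemma extracts from *arbitrary* graphs,
# after discarding a bounded number of exceptional cell pairs.
S = range(0, 100)
T = range(50, 200)
print(abs(density(S, T) - p) < 0.05)
```

The force of the lemma, of course, is that it applies to arbitrary graphs, not just random ones, with a number of cells independent of the size of the graph.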
This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions of progressions containing a near-maximal density of elements of $A$, we will now have to work with a family of such progressions of progressions, indexed by some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given index, it should be possible to find many reasonably large subfamilies of this family for which the terms of the progressions of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets $E_j$ in Proposition 5 are not allowed to depend on $i$, one also needs the progressions in any given subfamily to be “similar” in the sense that they intersect $A$ in the same fashion (thus the relevant sets need to be translates of each other as one moves along the subfamily).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set in order to gain good mixing properties for at least one of the sets in the progression of progressions.
Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank than the rank two progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length $k$ progressions, one has to consider generalised progressions of rank exponentially large in $k$. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by $A$ in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate, inside a suitably large generalised progression of some rank, a family of large generalised progressions of one lower rank which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of the index $i$. (But getting this sort of property for all values of $i$ simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index $i$, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)
This is an addendum to last quarter’s course notes on Hilbert’s fifth problem, which I am in the process of reviewing in order to transcribe them into a book (as was done similarly for several other sets of lecture notes on this blog). When reviewing the zeroth set of notes in particular, I found that I had made a claim (Proposition 11 from those notes) which asserted, roughly speaking, that any sufficiently large nilprogression was an approximate group, and promised to prove it later in the course when we had developed the ability to calculate efficiently in nilpotent groups. As it turned out, I managed to finish the course without the need to develop these calculations, and so the proposition remained unproven. In order to rectify this, I will use this post to lay out some of the basic algebra of nilpotent groups, and use it to prove the above proposition, which turns out to be a bit tricky. (In my paper with Breuillard and Green, we avoid the need for this proposition by restricting attention to a special type of nilprogression, which we call a nilprogression in $C$-normal form, for which the computations are simpler.)
There are several ways to think about nilpotent groups; for instance one can use the model example of the Heisenberg group
$$\begin{pmatrix} 1 & R & R \\ 0 & 1 & R \\ 0 & 0 & 1 \end{pmatrix}$$
over an arbitrary ring $R$ (which need not be commutative), or more generally any matrix group consisting of unipotent upper triangular matrices, and view a general nilpotent group as being an abstract generalisation of such concrete groups. (In the case of nilpotent Lie groups, at least, this is quite an accurate intuition, thanks to Engel’s theorem.) Or, one can adopt a Lie-theoretic viewpoint and try to think of nilpotent groups as somehow arising from nilpotent Lie algebras; this intuition is rigorous when working with nilpotent Lie groups (at least when the characteristic is large, in order to avoid issues coming from the denominators in the Baker-Campbell-Hausdorff formula), but also retains some conceptual value in the non-Lie setting. In particular, nilpotent groups (particularly finitely generated ones) can be viewed in some sense as “nilpotent Lie groups over ${\bf Z}$”, even though Lie theory does not quite work perfectly when the underlying scalars merely form an integral domain instead of a field.
Another point of view, which arises naturally both in analysis and in algebraic geometry, is to view nilpotent groups as modeling “infinitesimal” perturbations of the identity, where the infinitesimals have a certain finite order. For instance, given a (not necessarily commutative) ring $R$ without identity (representing all the “small” elements of some larger ring or algebra), we can form the powers $R^j$ for $j = 1, 2, \dots$, defined as the ring generated by $j$-fold products of elements in $R$; this is an ideal of $R$ which represents the elements which are “$j^{th}$ order” in some sense. If one then formally adjoins an identity $1$ onto the ring $R$, then for any $s \ge 1$, the multiplicative group $1 + R \pmod{R^{s+1}}$ is a nilpotent group of step at most $s$. For instance, if $R$ is the ring of strictly upper triangular $n \times n$ matrices (over some base ring), then $R^n$ vanishes and $1 + R$ becomes the group of unipotent upper triangular matrices over the same ring, thus recovering the previous matrix-based example. In analysis applications, $R$ might be a ring of operators which are somehow of “order” $O(\epsilon)$ or $O(\hbar)$ for some small parameter $\epsilon$ or $\hbar$, and one wishes to perform Taylor expansions up to order $O(\epsilon^s)$ or $O(\hbar^s)$, thus discarding (i.e. quotienting out) all errors in $R^{s+1}$.
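The strictly-upper-triangular example can be checked directly in a small case; the following sketch (3×3 integer matrices, added here for illustration) verifies that 3-fold products of “infinitesimals” vanish and that $1 + R$ is then inverted by a finite geometric series:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# R: a strictly upper triangular 3x3 matrix -- an "infinitesimal" in the
# sense above, for which the cube R^3 must vanish.
R = [[0, 2, 5],
     [0, 0, 3],
     [0, 0, 0]]
R2 = mat_mul(R, R)
R3 = mat_mul(R2, R)
print(R3)  # the zero matrix: 3-fold products of "small" elements vanish

# Adjoining the identity, 1 + R is a unipotent upper triangular matrix,
# and its inverse is the *finite* series 1 - R + R^2.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
one_plus_R = [[I[i][j] + R[i][j] for j in range(3)] for i in range(3)]
inverse = [[I[i][j] - R[i][j] + R2[i][j] for j in range(3)] for i in range(3)]
print(mat_mul(one_plus_R, inverse))  # the identity matrix
```

The same finite geometric series trick works in any ring with $R^{s+1} = 0$, which is one reason the “infinitesimal” viewpoint is computationally pleasant.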
From a dynamical or group-theoretic perspective, one can also view nilpotent groups as towers of central extensions of a trivial group. Finitely generated nilpotent groups can also be profitably viewed as a special type of polycyclic group; this is the perspective taken in this previous blog post. Last, but not least, one can view nilpotent groups from a combinatorial group theory perspective, as being words from some set of generators of various “degrees” subject to some commutation relations, with commutators of two low-degree generators being expressed in terms of higher degree objects, and all commutators of a sufficiently high degree vanishing. In particular, generators of a given degree can be moved freely around a word, as long as one is willing to generate commutator errors of higher degree.
With this last perspective, in particular, one can start computing in nilpotent groups by adopting the philosophy that the lowest order terms should be attended to first, without much initial concern for the higher order errors generated in the process of organising the lower order terms. Only after the lower order terms are in place should attention then turn to higher order terms, working successively up the hierarchy of degrees until all terms are dealt with. This turns out to be a relatively straightforward philosophy to implement in many cases (particularly if one is not interested in explicit expressions and constants, being content instead with qualitative expansions of controlled complexity), but the arguments are necessarily recursive in nature and as such can become a bit messy, and require a fair amount of notation to express precisely. So, unfortunately, the arguments here will be somewhat cumbersome and notation-heavy, even if the underlying methods of proof are relatively simple.
I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues $\lambda_1(M_n) \le \dots \le \lambda_n(M_n)$ of a random Wigner matrix $M_n$ in the limit $n \to \infty$, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. on eigenvalues $\lambda_i(M_n)$ with $\epsilon n \le i \le (1-\epsilon) n$ for some fixed $\epsilon > 0$.
The location of an individual eigenvalue $\lambda_i(M_n)$ is by now quite well understood. If we normalise the entries of the matrix to have mean zero and variance $1$, then in the asymptotic limit $n \to \infty$, the Wigner semicircle law tells us that with probability $1 - o(1)$ one has
$$\lambda_i(M_n) = (\gamma_{i/n} + o(1)) \sqrt{n}$$
where the classical location $\gamma_{i/n} \in [-2, 2]$ of the $i^{th}$ eigenvalue is given by the formula
$$\int_{-2}^{\gamma_{i/n}} \rho_{sc}(x)\, dx = \frac{i}{n}$$
and the semicircular distribution $\rho_{sc}(x)\,dx$ is given by the formula
$$\rho_{sc}(x) := \frac{1}{2\pi} (4 - x^2)_+^{1/2}.$$
Actually, one can improve the error term here from $o(\sqrt{n})$ to $O(n^\epsilon/\sqrt{n})$ for any $\epsilon > 0$ (see this recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).
From the semicircle law (and the fundamental theorem of calculus), one expects the eigenvalue spacing $\lambda_{i+1}(M_n) - \lambda_i(M_n)$ to have an average size of $\frac{1}{\sqrt{n} \rho_{sc}(\gamma_{i/n})}$. It is thus natural to introduce the normalised eigenvalue spacing
$$g_i := \sqrt{n}\, \rho_{sc}(\gamma_{i/n})\, (\lambda_{i+1}(M_n) - \lambda_i(M_n))$$
and ask what the distribution of $g_i$ is.
As mentioned previously, we will focus on the bulk case $\epsilon n \le i \le (1-\epsilon) n$, and begin with the model case when $M_n$ is drawn from GUE. (In the edge case when $i/n$ is close to $0$ or to $1$, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to $g_i$, namely the probability
$${\bf P}( N_I = 0 ) \qquad (1)$$
that an interval $I$ near $\sqrt{n}\,\gamma_{i/n}$ of length comparable to the expected eigenvalue spacing is devoid of eigenvalues (here $N_I$ denotes the number of eigenvalues in $I$). For $i$ in the bulk and fixed $s > 0$, they showed that this probability is equal to
$$\det(1 - 1_{[0,s]} P 1_{[0,s]}) + o(1),$$
where $P$ is the Dyson projection
$$P f(x) := \int_{-1/2}^{1/2} \hat f(\xi) e^{2\pi i x \xi}\, d\xi$$
to Fourier modes in $[-1/2, 1/2]$, and $\det$ is the Fredholm determinant. As shown by Jimbo, Miwa, Môri, and Sato, this determinant can also be expressed in terms of a solution to a Painlevé V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that $g_i$ will be asymptotically distributed according to the Gaudin-Mehta distribution $p(s)\,ds$, where
$$p(s) := \frac{d^2}{ds^2} \det(1 - 1_{[0,s]} P 1_{[0,s]}).$$
A reasonably accurate approximation for $p$ is given by the Wigner surmise $p(s) \approx \frac{32}{\pi^2} s^2 e^{-4s^2/\pi}$, which was presciently proposed by Wigner as early as 1957; it is exact for $n = 2$ but not in the asymptotic limit $n \to \infty$.
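Taking the GUE form of the surmise, $\frac{32}{\pi^2} s^2 e^{-4s^2/\pi}$, one can confirm numerically that it is a probability density with unit mean spacing (a stdlib sketch added here; the quadrature routine is my own):

```python
import math

def wigner_surmise(s):
    """Wigner's surmise for GUE spacings: p(s) = (32/pi^2) s^2 exp(-4 s^2/pi)."""
    return 32 / math.pi**2 * s**2 * math.exp(-4 * s**2 / math.pi)

def integrate(f, a, b, n=100000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# The surmise integrates to 1 (a probability density) and has mean
# spacing 1, matching the normalisation of the gaps g_i.
total = integrate(wigner_surmise, 0, 20)
mean = integrate(lambda s: s * wigner_surmise(s), 0, 20)
print(round(total, 6), round(mean, 6))
```

Both printed values are 1 up to quadrature error, consistent with the surmise being only an approximation in shape, not in normalisation.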
Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap $g_i$, but rather an ensemble of gaps $g_i$, where $i$ is drawn from an interval of some moderate size; see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices $i$). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdős, Ramírez, Schlein, Vu, and Yau.)
The problem is that when one specifies a given window of spectrum, one cannot quite pin down in advance which eigenvalues are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of $O(\sqrt{\log n})$ or so in the index $i$ (as can be quantified quite precisely by this central limit theorem of Gustavsson).
The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to be devoid of eigenvalues when the number of eigenvalues to its left is even, but not when that number is odd. In this sort of situation, the gaps $g_i$ may have different behaviour for even $i$ than for odd $i$, and such anomalies would not be picked up in the averaged statistics in which $i$ is allowed to range over some moderately large interval.
The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the $i$ parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels than on specific eigenvalue indices $i$. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)
The main task in the proof is to show that the random variable counting the eigenvalues to the left of the interval is largely decoupled from the event in (1) when $M_n$ is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert $1 - 1_I P 1_I$, where $P$ is the Dyson projection and $I$ is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.
In this final set of course notes, we discuss how (a generalisation of) the expansion results obtained in the preceding notes can be used for some number-theoretic applications, and in particular to locate almost primes inside orbits of thin groups, following the work of Bourgain, Gamburd, and Sarnak. We will not attempt here to obtain the sharpest or most general results in this direction, but instead focus on the simplest instances of these results which are still illustrative of the ideas involved.
One of the basic general problems in analytic number theory is to locate tuples of primes of a certain form; for instance, the famous (and still unsolved) twin prime conjecture asserts that there are infinitely many pairs in the line $\{(n, n+2) : n \in {\bf Z}\}$ in which both entries are prime. In a similar spirit, one of the Landau conjectures (also still unsolved) asserts that there are infinitely many primes in the set $\{n^2 + 1 : n \in {\bf Z}\}$. The Mersenne prime conjecture (also unsolved) asserts that there are infinitely many primes in the set $\{2^n - 1 : n \in {\bf N}\}$, and so forth.
More generally, given some explicit subset $V$ of ${\bf R}^d$ (or ${\bf C}^d$, if one wishes), such as an algebraic variety, one can ask the question of whether there are infinitely many integer lattice points in $V \cap {\bf Z}^d$ in which all the coefficients are simultaneously prime; let us refer to such points as prime points.
At this level of generality, this problem is impossibly difficult. Indeed, even the much simpler problem of deciding whether the set $V \cap {\bf Z}^d$ is non-empty (let alone containing prime points) when $V$ is a hypersurface cut out by a polynomial is essentially Hilbert’s tenth problem, which is known to be undecidable in general by Matiyasevich’s theorem. So one needs to restrict attention to a more special class of sets $V$, in which the question of finding integer points is not so difficult. One model case is to consider orbits $\Lambda b$, where $b \in {\bf Z}^d$ is a fixed lattice vector and $\Lambda$ is some discrete group that acts on ${\bf Z}^d$ somehow (e.g. $\Lambda$ might be embedded as a subgroup of the special linear group $SL_d({\bf Z})$, or might act through an affine group). In such a situation it is then quite easy to show that $\Lambda b$ is large; for instance, $\Lambda b$ will be infinite precisely when the stabiliser of $b$ in $\Lambda$ has infinite index in $\Lambda$.
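To get a feel for how quickly such orbits produce lattice points, one can enumerate a ball of an orbit by breadth-first search. The generators below are an illustrative choice in $SL_2({\bf Z})$ (not the specific thin groups studied in these notes):

```python
from collections import deque

# Two standard generators of SL_2(Z), acting on column vectors in Z^2.
S = [[0, -1], [1, 0]]
T = [[1, 1], [0, 1]]

def apply(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def orbit(b, depth):
    """Breadth-first enumeration of the orbit of b under words of
    bounded length in the generators and their inverses."""
    Sinv = [[0, 1], [-1, 0]]
    Tinv = [[1, -1], [0, 1]]
    seen, queue = {b}, deque([(b, 0)])
    while queue:
        v, d = queue.popleft()
        if d == depth:
            continue
        for M in (S, T, Sinv, Tinv):
            w = apply(M, v)
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    return seen

pts = orbit((1, 0), 6)
print(len(pts))  # the orbit grows quickly with the word length
```

The number-theoretic difficulty is then not in producing lattice points in the orbit, but in showing that infinitely many of them have all coordinates prime (or almost prime).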
Even in this simpler setting, the question of determining whether an orbit contains infinitely many prime points is still extremely difficult; indeed the three examples given above of the twin prime conjecture, Landau conjecture, and Mersenne prime conjecture are essentially of this form (possibly after some slight modification of the underlying ring, see this paper of Bourgain-Gamburd-Sarnak for details), and are all unsolved (and generally considered well out of reach of current technology). Indeed, the list of non-trivial orbits which are known to contain infinitely many prime points is quite slim; Euclid’s theorem on the infinitude of primes handles the case of the integers themselves, Dirichlet’s theorem handles infinite arithmetic progressions, and a somewhat complicated result of Green, Tao, and Ziegler handles “non-degenerate” affine lattices in ${\bf Z}^d$ of rank two or more (such as the lattice of length-$k$ arithmetic progressions), but there are few other positive results known that are not based on the above cases (though we will note the remarkable theorem of Friedlander and Iwaniec that there are infinitely many primes of the form $a^2 + b^4$, and the related result of Heath-Brown that there are infinitely many primes of the form $a^3 + 2b^3$, as being in a kindred spirit to the above results, though they are not explicitly associated to an orbit of a reasonable action as far as I know).
On the other hand, much more is known if one is willing to replace the primes by the larger set of almost primes – integers with a small number of prime factors (counting multiplicity). Specifically, for any $r \ge 1$, let us call an $r$-almost prime an integer which is the product of at most $r$ primes, and possibly by the unit $-1$ as well. Many of the above sorts of questions which are open for primes, are known for $r$-almost primes for $r$ sufficiently large. For instance, with regards to the twin prime conjecture, it is a result of Chen that there are infinitely many pairs $(p, p+2)$ where $p$ is a prime and $p+2$ is a $2$-almost prime; in a similar vein, it is a result of Iwaniec that there are infinitely many $2$-almost primes of the form $n^2 + 1$. On the other hand, it is still open for any fixed $r$ whether there are infinitely many Mersenne numbers $2^n - 1$ which are $r$-almost primes. (For the superficially similar situation with the numbers $2^n + 1$, it is in fact believed (but again unproven) that there are only finitely many $r$-almost primes for any fixed $r$ (the Fermat prime conjecture).)
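Almost primality is easy to test by trial division; the sketch below (helper names are mine, added for illustration) counts prime factors with multiplicity and lists small Chen-type pairs:

```python
def num_prime_factors(n):
    """Number of prime factors of n > 1, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

def is_almost_prime(n, r):
    """n is an r-almost prime: a product of at most r primes."""
    return 1 <= num_prime_factors(n) <= r

# Chen-style pairs below 100: p prime and p + 2 a 2-almost prime.
# Twin primes appear as the special case where p + 2 is itself prime.
chen_pairs = [(p, p + 2) for p in range(2, 100)
              if is_almost_prime(p, 1) and is_almost_prime(p + 2, 2)]
print(chen_pairs[:5])
```

Chen’s theorem asserts that this list never terminates; lowering the “2” to “1” is precisely the twin prime conjecture.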
The main tool that allows one to count almost primes in orbits is sieve theory. The reason for this lies in the simple observation that in order to ensure that an integer of magnitude at most $x$ is an $r$-almost prime, it suffices to guarantee that it is not divisible by any prime less than $x^{1/r}$. Thus, to create $r$-almost primes, one can start with the integers up to some large threshold $x$ and remove (or “sieve out”) all the integers that are multiples of any prime less than $x^{1/r}$. The difficulty is then to ensure that a sufficiently non-trivial quantity of integers remain after this process, for the purposes of finding points in the given set $V$.
The most basic sieve of this form is the sieve of Eratosthenes, which when combined with the inclusion-exclusion principle gives the Legendre sieve (or exact sieve), which gives an exact formula for quantities such as the number $\Phi(x,z)$ of natural numbers less than or equal to $x$ that are not divisible by any prime less than or equal to a given threshold $z$. Unfortunately, when one tries to evaluate this formula, one encounters error terms which grow exponentially in $z$, rendering this sieve useful only for very small thresholds $z$ (of logarithmic size in $x$). To improve the sieve level $z$ up to a small power of $x$, one has to replace the exact sieve by upper bound sieves and lower bound sieves which only seek to obtain upper or lower bounds on quantities such as $\Phi(x,z)$, but contain a polynomial number of terms rather than an exponential number. There are a variety of such sieves, with the two most common being combinatorial sieves (such as the beta sieve), based on various combinatorial truncations of the inclusion-exclusion formula, and the Selberg upper bound sieve, based on upper bounds that are the square of a divisor sum. (There is also the large sieve, which is somewhat different in nature, being based on almost orthogonality considerations rather than on any actual sieving, to obtain upper bounds.) We will primarily work with a specific sieve in these notes, namely the beta sieve, and we will not attempt to optimise all the parameters of this sieve (which ultimately means that the almost primality parameter $r$ in our results will be somewhat large). For a more detailed study of sieve theory, see the classic text of Halberstam and Richert, or the more recent texts of Iwaniec-Kowalski or of Friedlander-Iwaniec.
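The Legendre sieve itself is simple enough to state in code. The sketch below (an illustration of the inclusion-exclusion formula only, not of the beta sieve used later; all names are mine) also makes the blow-up visible: the sum ranges over all $2^{\pi(z)}$ subsets of the primes up to $z$.

```python
from itertools import combinations

def primes_up_to(z):
    """Primes <= z by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (z + 1)
    for p in range(2, int(z ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [p for p in range(2, z + 1) if sieve[p]]

def legendre_count(x, z):
    """Count of 1 <= n <= x with no prime factor <= z, by inclusion-exclusion."""
    ps = primes_up_to(z)
    total = 0
    for k in range(len(ps) + 1):          # 2^pi(z) terms in all
        for subset in combinations(ps, k):
            d = 1
            for p in subset:
                d *= p
            total += (-1) ** k * (x // d)  # mu(d) = (-1)^k for squarefree d
    return total

def direct_count(x, z):
    """The same count by brute force, for comparison."""
    ps = primes_up_to(z)
    return sum(1 for n in range(1, x + 1) if all(n % p for p in ps))

assert legendre_count(1000, 10) == direct_count(1000, 10)
```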
Very roughly speaking, the end result of sieve theory is that, excepting some degenerate and "exponentially thin" settings (such as those associated with the Mersenne primes), all the orbits which are expected to have a large number of primes can be proven to at least have a large number of $r$-almost primes for some finite $r$. (Unfortunately, there is a major obstruction, known as the parity problem, which prevents sieve theory from lowering $r$ all the way to $1$; see this blog post for more discussion.) One formulation of this principle was established by Bourgain, Gamburd, and Sarnak:
Theorem 1 (Bourgain-Gamburd-Sarnak) Let $\Lambda$ be a subgroup of $SL_2({\bf Z})$ which is not virtually solvable. Let $f$ be a polynomial with integer coefficients obeying the following primitivity condition: for any positive integer $q$, there exists $A \in \Lambda$ such that $f(A)$ is coprime to $q$. Then there exists an $r \geq 1$ such that there are infinitely many $A \in \Lambda$ with $f(A)$ non-zero and $r$-almost prime.
This is not the strongest version of the Bourgain-Gamburd-Sarnak theorem, but it captures the general flavour of their results. Note that the theorem immediately implies an analogous result for orbits $\Lambda b \subset {\bf Z}^2$, in which $f$ is now a polynomial from ${\bf Z}^2$ to ${\bf Z}$, and one uses $f(Ab)$ instead of $f(A)$. It is in fact conjectured that one can take $r = 1$ here, but this is well beyond current technology. For the purpose of reaching $r = 1$, it is very natural to impose the primitivity condition, but as long as one is content with larger values of $r$, it is possible to relax the primitivity condition somewhat; see the paper of Bourgain, Gamburd, and Sarnak for more discussion.
By specialising to the polynomial $f: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto abcd$, we conclude as a corollary that as long as $\Lambda$ is primitive in the sense that it contains matrices with all coefficients coprime to $q$ for any given $q$, then $\Lambda$ contains infinitely many matrices whose elements are all $r$-almost primes for some $r$ depending only on $\Lambda$. For further applications of these sorts of results, for instance to Apollonian packings, see the paper of Bourgain, Gamburd, and Sarnak.
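As a toy numerical illustration of this sort of statement (entirely my own, and of course far short of the actual theorem), one can search short words in the Sanov generators of a free subgroup of $SL_2({\bf Z})$ for matrices all of whose entries are $2$-almost primes:

```python
import itertools

# Sanov generators: these generate a free subgroup of SL_2(Z),
# which in particular is not virtually solvable.
A = ((1, 2), (0, 1))
B = ((1, 0), (2, 1))

def mul(M, N):
    """2x2 integer matrix product."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def omega(n):
    """Prime factors of |n| counted with multiplicity (0 for n = 1)."""
    n, count, d = abs(n), 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

r = 2
found = []
for word in itertools.product([A, B], repeat=4):
    M = ((1, 0), (0, 1))
    for g in word:
        M = mul(M, g)
    entries = [e for row in M for e in row]
    if all(e != 0 and omega(e) <= r for e in entries):
        found.append(M)
```

For instance the word $A^2 B^2$ evaluates to the matrix with rows $(17, 4)$ and $(4, 1)$, all of whose entries are $2$-almost primes (or units); the theorem asserts, much more deeply, that such matrices occur infinitely often for a suitable fixed $r$.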
It turns out that to prove Theorem 1, the Cayley expansion results in $SL_2({\bf F}_p)$ from the previous set of notes are not quite enough; one needs a more general Cayley expansion result in $SL_2({\bf Z}/q{\bf Z})$ where $q$ is square-free but not necessarily prime. The proof of this expansion result uses the same basic methods as in the $SL_2({\bf F}_p)$ case, but is significantly more complicated technically, and we will only discuss it briefly here. As such, we do not give a complete proof of Theorem 1, but hopefully the portion of the argument presented here is still sufficient to give an impression of the ideas involved.