You are currently browsing the category archive for the ‘math.CO’ category.

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity {r_4(F^n)} for a vector space {F^n} over a finite field {F} of characteristic greater than {4}, where {r_4(F^n)} is defined as the cardinality of the largest subset of {F^n} that does not contain an arithmetic progression of length {4}. In our earlier paper, we gave two arguments that bounded {r_4(F^n)} in the regime when the field {F} was fixed and {n} was large. The first “cheap” argument gave the bound

\displaystyle  r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )

and the more complicated “expensive” argument gave the improvement

\displaystyle  r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)

for some constant {c>0} depending only on {F}.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset {A} of {F^n} that has no arithmetic progressions of length {4}, and seeks to locate a subspace on which {A} has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse {U^3} theorem of Ben and myself (and also independently by Samorodnitsky), one approximates {A} by a “quadratically structured” function {f}, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because {A} has no progressions of length {4}, the count of progressions of length {4} weighted by {f} will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which {f} has increased density.

The error in the paper was to conclude from this that the original function {1_A} also had increased density on the same subspace; it turns out that the manner in which {f} approximates {1_A} is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on {r_4(F^n)} one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of {c = 2^{-22}} for the exponent {c} in (1). In fact, it gives the following stronger result:

Theorem 1 Let {A} be a subset of {F^n} of density at least {\alpha}, and let {\epsilon>0}. Then there is a subspace {W} of {F^n} of codimension {O( \epsilon^{-2^{20}})} such that the number of (possibly degenerate) progressions {a, a+r, a+2r, a+3r} in {A \cap W} is at least {(\alpha^4-\epsilon)|W|^2}.

The bound (1) is an easy consequence of this theorem after choosing {\epsilon := \alpha^4/2} and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to {1_A} with a significantly stronger local approximation to {1_A} on a subspace {W}. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse {U^3} theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace {W} on which {A} is quite dense (of density {\alpha-O(\epsilon)}) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem) Let {A} be a set of integers of positive upper density, thus {\lim \sup_{N \rightarrow\infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0}, where {[-N,N] := \{-N, -N+1,\ldots,N\}}. Then {A} contains an arithmetic progression of length {k} for any {k>1}.

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occuring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemeredi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions {P_1 = \{ a, a+r_1, a+2r_1, \ldots, a+(k_1-1)r_1\}}, but also progressions of progressions {P_2 := \{ P_1, P_1 + r_2, P_1+2r_2, \ldots, P_1+(k_2-1)r_2\}}, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set {A} (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length {k} that consists entirely of elements of {A}.

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression {P, P + r, \ldots, P+(k-1)r} of progressions of length {k}, with each progression {P+ir}, {i=0,\ldots,k-1} being quite long, and containing a near-maximal amount of elements of {A}, thus

\displaystyle  |A \cap (P+ir)| \approx \delta |P|

where {\delta := \lim \sup_{|P| \rightarrow \infty} \frac{|A \cap P|}{|P|}} is the “maximal density” of {A} along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as {\approx} instead.) By hypothesis, {\delta} is positive. The objective is then to locate a progression {a, a+r', \ldots,a+(k-1)r'} in {A}, with each {a+ir} in {P+ir} for {i=0,\ldots,k-1}. It may help to view the progression of progressions {P, P + r, \ldots, P+(k-1)r} as a tall thin rectangle {P \times \{0,\ldots,k-1\}}.

If we write {A_i := \{ a \in P: a+ir \in A \}} for {i=0,\ldots,k-1}, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression {a_0,a_1,\ldots,a_{k-1}}, with each {a_i} in {A_i}.

By hypothesis, we know already that each set {A_i} has density about {\delta} in {P}:

\displaystyle  |A_i \cap P| \approx \delta |P|. \ \ \ \ \ (1)

Let us now make a “weakly mixing” assumption on the {A_i}, which roughly speaking asserts that

\displaystyle  |A_i \cap E| \approx \delta \sigma |P| \ \ \ \ \ (2)

for “most” subsets {E} of {P} of density {\approx \sigma} of a certain form to be specified shortly. This is a plausible type of assumption if one believes {A_i} to behave like a random set, and if the sets {E} are constructed “independently” of the {A_i} in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions {a_0,\ldots,a_{k-1}} of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims {C(i,j)} for each {0 \leq i \leq j < k}:

  • {C(i,j)}: For most choices of {a_j \in P}, there are {\sim \delta^i |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} with the specified choice of {a_j}, such that {a_l \in A_l} for all {l=0,\ldots,i-1}.

(Actually, to avoid boundary issues one should restrict {a_j} to lie in the middle third of {P}, rather than near the edges, but let us ignore this minor technical detail.) The quantity {\delta^i |P|} is natural here, given that there are {\sim |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} that pass through {a_i} in the {i^{th}} position, and that each one ought to have a probability of {\delta^i} or so that the events {a_0 \in A_0, \ldots, a_{i-1} \in A_{i-1}} simultaneously hold.) If one has the claim {C(k-1,k-1)}, then by selecting a typical {a_{k-1}} in {A_{k-1}}, we obtain a progression {a_0,\ldots,a_{k-1}} with {a_i \in A_i} for all {i=0,\ldots,k-1}, as required. (In fact, we obtain about {\delta^k |P|^2} such progressions by this method.)

We can heuristically justify the claims {C(i,j)} by induction on {i}. For {i=0}, the claims {C(0,j)} are clear just from direct counting of progressions (as long as we keep {a_j} away from the edges of {P}). Now suppose that {i>0}, and the claims {C(i-1,j)} have already been proven. For any {i \leq j < k} and for most {a_j \in P}, we have from hypothesis that there are {\sim \delta^{i-1} |P|} progressions {a_0,\ldots,a_{k-1}} in {P} through {a_j} with {a_0 \in A_0,\ldots,a_{i-2}\in A_{i-2}}. Let {E = E(a_j)} be the set of all the values of {a_{i-1}} attained by these progressions, then {|E| \sim \delta^{i-1} |P|}. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of {a_j}, we have

\displaystyle  |A_{i-1} \cap E| \sim \delta^i |P|

which then gives the desired claim {C(i,j)}.

The observant reader will note that we only needed the claim {C(i,j)} in the case {j=k-1} for the above argument, but for technical reasons, the full proof requires one to work with more general values of {j} (also the claim {C(i,j)} needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block {A_i} of {A}, one can easily concoct a scenario in which this hypothesis fails, by choosing {E} to overlap with {A_i} too strongly, or to be too disjoint from {A_i}. However, one can do better if one can select {A_i} from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). Let {E} be a subset of {P} of density {\approx \sigma}. Then (if {M} is large enough) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E| \lessapprox \delta \sigma |P|.

Proof: The key is the double counting identity

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| = \sum_{a \in E} |A \cap \{ a, a+r, \ldots, a+(M-1) r\}|.

Because {A} has maximal density {\delta} and {M} is large, we have

\displaystyle  |A \cap \{ a, a+r, \ldots, a+(M-1) r\}| \lessapprox \delta M

for each {a}, and thus

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| \lessapprox \delta M |E|.

The claim then follows from the pigeonhole principle. \Box

Now suppose we want to obtain weak mixing not just for a single set {E}, but for a small number {E_1,\ldots,E_m} of such sets, i.e. we wish to find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|. \ \ \ \ \ (3)

for all {j=1,\ldots,m}, where {\sigma_j} is the density of {E_j} in {P}. The above proposition gives, for each {j}, a choice of {i} for which (3) holds, but it could be a different {i} for each {j}, and so it is not immediately obvious how to use Proposition 2 to find an {i} for which (3) holds simultaneously for all {j}. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_j} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {j}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: Suppose that the claim failed (for some suitably large {M}). Then, for each {i = 0,\ldots,M-1}, there exists {j \in \{1,\ldots,m\}} such that

\displaystyle  |A_i \cap E_j| \gg \delta \sigma_j |P|.

This can be viewed as a colouring of the interval {\{1,\ldots,M\}} by {m} colours. If we take {M} large compared to {m}, van der Waerden’s theorem allows us to then find a long subprogression of {\{1,\ldots,M\}} which is monochromatic, so that {j} is constant on this progression. But then this will furnish a counterexample to Proposition 2. \Box

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_j} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {m}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \approx \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: By applying the previous proposition to the collection of sets {E_1,\ldots,E_m} and their complements {P\backslash E_1,\ldots,P \backslash E_m} (thus replacing {m} with {2m}, one can find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

and

\displaystyle  |A_i \cap (P \backslash E_j)| \lessapprox \delta (1-\sigma_j) |P|

which gives the claim. \Box

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number {m} of sets {E_1,\ldots,E_m} for which mixing is established is too small compared with the length {M} of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of {E_i} to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {v} in some (large) finite set {V}, let {E_v} be a subset of {P} of density {\approx \sigma_v}. Then (if {M} is large enough, but not dependent on the size of {V}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_v| \approx \delta \sigma_v |P|

simultaneously for almost all {v \in V}.

Proof: We build a bipartite graph {G = (P, V, E)} connecting the progression {P} to the finite set {V} by placing an edge {(a,v)} between an element {a \in P} and an element {v \in V} whenever {a \in E_v}. The number {|E_v| \approx \sigma_v |P|} can then be interpreted as the degree of {v} in this graph, while the number {|A_i \cap E_v|} is the number of neighbours of {v} that land in {A_i}.

We now apply the regularity lemma to this graph {G}. Roughly speaking, what this lemma does is to partition {P} and {V} into almost equally sized cells {P = P_1 \cup \ldots P_m} and {V = V_1 \cup \ldots V_m} such that for most pairs {P_j, V_k} of cells, the graph {G} resembles a random bipartite graph of some density {d_{jk}} between these two cells. The key point is that the number {m} of cells here is bounded uniformly in the size of {P} and {V}. As a consequence of this lemma, one can show that for most vertices {v} in a typical cell {V_k}, the number {|E_v|} is approximately equal to

\displaystyle  |E_v| \approx \sum_{j=1}^m d_{ij} |P_j|

and the number {|A_i \cap E_v|} is approximately equal to

\displaystyle  |A_i \cap E_v| \approx \sum_{j=1}^m d_{ij} |A_i \cap P_j|.

The point here is that the {|V|} different statistics {|A_i \cap E_v|} are now controlled by a mere {m} statistics {|A_i \cap P_j|} (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an {i} for which

\displaystyle  |A_i \cap P_j| \approx \delta |P_j|

simultaneously for all {j=1,\ldots,m}, and the claim follows. \Box

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions {P, P+r, \ldots, P+(k-1)r} of progressions containing a near-maximal density of elements of {A}, we will now have to work with a family {(P_\lambda, P_\lambda+r_\lambda,\ldots,P_\lambda+(k-1)r_\lambda)_{\lambda \in \Lambda}} of such progression of progressions, where {\Lambda} ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given {i}, it should be possible to find many reasonably large subfamilies of this family for which the {i^{th}} terms {P_\lambda + i r_\lambda} of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets {E_v} in Proposition 5 are not allowed to depend on {i}, one also needs the progressions {P_\lambda + i' r_\lambda} for any given {0 \leq i' < i} to be “similar” in the sense that they intersect {A} in the same fashion (thus the sets {A \cap (P_\lambda + i' r_\lambda)} as {\lambda} varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set {\Lambda} in order to gain good mixing properties for at least one of the sets {P_\lambda +i r_\lambda} in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank {d} than the rank {2} progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length {k} progressions, one has to consider generalised progressions of rank as high as {2^k+1}. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by {A} in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank {d}, a family of large generalised progressions of rank {d-1} which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of {i}. (But getting this sort of property for all values of {i} simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index {i}, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank {2^k+1} that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

This is an addendum to last quarter’s course notes on Hilbert’s fifth problem, which I am in the process of reviewing in order to transcribe them into a book (as was done similarly for several other sets of lecture notes on this blog). When reviewing the zeroth set of notes in particular, I found that I had made a claim (Proposition 11 from those notes) which asserted, roughly speaking, that any sufficiently large nilprogression was an approximate group, and promised to prove it later in the course when we had developed the ability to calculate efficiently in nilpotent groups. As it turned out, I managed finish the course without the need to develop these calculations, and so the proposition remained unproven. In order to rectify this, I will use this post to lay out some of the basic algebra of nilpotent groups, and use it to prove the above proposition, which turns out to be a bit tricky. (In my paper with Breuillard and Green, we avoid the need for this proposition by restricting attention to a special type of nilprogression, which we call a nilprogression in {C}-normal form, for which the computations are simpler.)

There are several ways to think about nilpotent groups; for instance one can use the model example of the Heisenberg group

\displaystyle  H(R) :=\begin{pmatrix} 1 & R & R \\ 0 & 1 & R\\ 0 & 0 & 1 \end{pmatrix}

over an arbitrary ring {R} (which need not be commutative), or more generally any matrix group consisting of unipotent upper triangular matrices, and view a general nilpotent group as being an abstract generalisation of such concrete groups. (In the case of nilpotent Lie groups, at least, this is quite an accurate intuition, thanks to Engel’s theorem.) Or, one can adopt a Lie-theoretic viewpoint and try to think of nilpotent groups as somehow arising from nilpotent Lie algebras; this intuition is rigorous when working with nilpotent Lie groups (at least when the characteristic is large, in order to avoid issues coming from the denominators in the Baker-Campbell-Hausdorff formula), but also retains some conceptual value in the non-Lie setting. In particular, nilpotent groups (particularly finitely generated ones) can be viewed in some sense as “nilpotent Lie groups over {{\bf Z}}“, even though Lie theory does not quite work perfectly when the underlying scalars merely form an integral domain instead of a field.

Another point of view, which arises naturally both in analysis and in algebraic geometry, is to view nilpotent groups as modeling “infinitesimal” perturbations of the identity, where the infinitesimals have a certain finite order. For instance, given a (not necessarily commutative) ring {R} without identity (representing all the “small” elements of some larger ring or algebra), we can form the powers {R^j} for {j=1,2,\ldots}, defined as the ring generated by {j}-fold products {r_1 \ldots r_j} of elements {r_1,\ldots,r_j} in {R}; this is an ideal of {R} which represents the elements which are “{j^{th}} order” in some sense. If one then formally adjoins an identity {1} onto the ring {R}, then for any {s \geq 1}, the multiplicative group {G := 1+R \hbox{ mod } R^{s+1}} is a nilpotent group of step at most {s}. For instance, if {R} is the ring of strictly upper {s \times s} matrices (over some base ring), then {R^{s+1}} vanishes and {G} becomes the group of unipotent upper triangular matrices over the same ring, thus recovering the previous matrix-based example. In analysis applications, {R} might be a ring of operators which are somehow of “order” {O(\epsilon)} or {O(\hbar)} for some small parameter {\epsilon} or {\hbar}, and one wishes to perform Taylor expansions up to order {O(\epsilon^s)} or {O(\hbar^s)}, thus discarding (i.e. quotienting out) all errors in {R^{s+1}}.

From a dynamical or group-theoretic perspective, one can also view nilpotent groups as towers of central extensions of a trivial group. Finitely generated nilpotent groups can also be profitably viewed as a special type of polycylic group; this is the perspective taken in this previous blog post. Last, but not least, one can view nilpotent groups from a combinatorial group theory perspective, as being words from some set of generators of various “degrees” subject to some commutation relations, with commutators of two low-degree generators being expressed in terms of higher degree objects, and all commutators of a sufficiently high degree vanishing. In particular, generators of a given degree can be moved freely around a word, as long as one is willing to generate commutator errors of higher degree.

With this last perspective, in particular, one can start computing in nilpotent groups by adopting the philosophy that the lowest order terms should be attended to first, without much initial concern for the higher order errors generated in the process of organising the lower order terms. Only after the lower order terms are in place should attention then turn to higher order terms, working successively up the hierarchy of degrees until all terms are dealt with. This turns out to be a relatively straightforward philosophy to implement in many cases (particularly if one is not interested in explicit expressions and constants, being content instead with qualitative expansions of controlled complexity), but the arguments are necessarily recursive in nature and as such can become a bit messy, and require a fair amount of notation to express precisely. So, unfortunately, the arguments here will be somewhat cumbersome and notation-heavy, even if the underlying methods of proof are relatively simple.

Read the rest of this entry »

In this final set of course notes, we discuss how (a generalisation of) the expansion results obtained in the preceding notes can be used for some nnumber-theoretic applications, and in particular to locate almost primes inside orbits of thin groups, following the work of Bourgain, Gamburd, and Sarnak. We will not attempt here to obtain the sharpest or most general results in this direction, but instead focus on the simplest instances of these results which are still illustrative of the ideas involved.

One of the basic general problems in analytic number theory is to locate tuples of primes of a certain form; for instance, the famous (and still unsolved) twin prime conjecture asserts that there are infinitely many pairs {(n_1,n_2)} in the line {\{ (n_1,n_2) \in {\bf Z}^2: n_2-n_1=2\}} in which both entries are prime. In a similar spirit, one of the Landau conjectures (also still unsolved) asserts that there are infinitely many primes in the set {\{ n^2+1: n \in {\bf Z} \}}. The Mersenne prime conjecture (also unsolved) asserts that there are infinitely many primes in the set {\{ 2^n - 1: n \in {\bf Z} \}}, and so forth.

More generally, given some explicit subset {V} in {{\bf R}^d} (or {{\bf C}^d}, if one wishes), such as an algebraic variety, one can ask the question of whether there are infinitely many integer lattice points {(n_1,\ldots,n_d)} in {V \cap {\bf Z}^d} in which all the coefficients {n_1,\ldots,n_d} are simultaneously prime; let us refer to such points as prime points.

At this level of generality, this problem is impossibly difficult. Indeed, even the much simpler problem of deciding whether the set {V \cap {\bf Z}^d} is non-empty (let alone containing prime points) when {V} is a hypersurface {\{ x \in {\bf R}^d: P(x) = 0 \}} cut out by a polynomial {P} is essentially Hilbert’s tenth problem, which is known to be undecidable in general by Matiyasevich’s theorem. So one needs to restrict attention to a more special class of sets {V}, in which the question of finding integer points is not so difficult. One model case is to consider orbits {V = \Gamma b}, where {b \in {\bf Z}^d} is a fixed lattice vector and {\Gamma} is some discrete group that acts on {{\bf Z}^d} somehow (e.g. {\Gamma} might be embedded as a subgroup of the special linear group {SL_d({\bf Z})}, or on the affine group {SL_d({\bf Z}) \ltimes {\bf Z}^d}). In such a situation it is then quite easy to show that {V = V \cap {\bf Z}^d} is large; for instance, {V} will be infinite precisely when the stabiliser of {b} in {\Gamma} has infinite index in {\Gamma}.

Even in this simpler setting, the question of determining whether an orbit {V = \Gamma b} contains infinitely prime points is still extremely difficult; indeed the three examples given above of the twin prime conjecture, Landau conjecture, and Mersenne prime conjecture are essentially of this form (possibly after some slight modification of the underlying ring {{\bf Z}}, see this paper of Bourgain-Gamburd-Sarnak for details), and are all unsolved (and generally considered well out of reach of current technology). Indeed, the list of non-trivial orbits {V = \Gamma b} which are known to contain infinitely many prime points is quite slim; Euclid’s theorem on the infinitude of primes handles the case {V = {\bf Z}}, Dirichlet’s theorem handles infinite arithmetic progressions {V = a{\bf Z} + r}, and a somewhat complicated result of Green, Tao, and Ziegler handles “non-degenerate” affine lattices in {{\bf Z}^d} of rank two or more (such as the lattice of length {d} arithmetic progressions), but there are few other positive results known that are not based on the above cases (though we will note the remarkable theorem of Friedlander and Iwaniec that there are infinitely many primes of the form {a^2+b^4}, and the related result of Heath-Brown that there are infinitely many primes of the form {a^3+2b^3}, as being in a kindred spirit to the above results, though they are not explicitly associated to an orbit of a reasonable action as far as I know).

On the other hand, much more is known if one is willing to replace the primes by the larger set of almost primes – integers with a small number of prime factors (counting multiplicity). Specifically, for any {r \geq 1}, let us call an {r}-almost prime an integer which is the product of at most {r} primes, and possibly by the unit {-1} as well. Many of the above sorts of questions which are open for primes, are known for {r}-almost primes for {r} sufficiently large. For instance, with regards to the twin prime conjecture, it is a result of Chen that there are infinitely many pairs {p,p+2} where {p} is a prime and {p+2} is a {2}-almost prime; in a similar vein, it is a result of Iwaniec that there are infinitely many {2}-almost primes of the form {n^2+1}. On the other hand, it is still open for any fixed {r} whether there are infinitely many Mersenne numbers {2^n-1} which are {r}-almost primes. (For the superficially similar situation with the numbers {2^n+1}, it is in fact believed (but again unproven) that there are only finitely many {r}-almost primes for any fixed {r} (the Fermat prime conjecture).)

The main tool that allows one to count almost primes in orbits is sieve theory. The reason for this lies in the simple observation that in order to ensure that an integer {n} of magnitude at most {x} is an {r}-almost prime, it suffices to guarantee that {n} is not divisible by any prime less than {x^{1/(r+1)}}. Thus, to create {r}-almost primes, one can start with the integers up to some large threshold {x} and remove (or “sieve out”) all the integers that are multiples of any prime {p} less than {x^{1/(r+1)}}. The difficulty is then to ensure that a sufficiently non-trivial quantity of integers remain after this process, for the purposes of finding points in the given set {V}.

The most basic sieve of this form is the sieve of Eratosthenes, which when combined with the inclusion-exclusion principle gives the Legendre sieve (or exact sieve), which gives an exact formula for quantities such as the number {\pi(x,z)} of natural numbers less than or equal to {x} that are not divisible by any prime less than or equal to a given threshold {z}. Unfortunately, when one tries to evaluate this formula, one encounters error terms which grow exponentially in {z}, rendering this sieve useful only for very small thresholds {z} (of logarithmic size in {x}). To improve the sieve level up to a small power of {x} such as {x^{1/(r+1)}}, one has to replace the exact sieve by upper bound sieves and lower bound sieves which only seek to obtain upper or lower bounds on quantities such as {\pi(x,z)}, but contain a polynomial number of terms rather than an exponential number. There are a variety of such sieves, with the two most common such sieves being combinatorial sieves (such as the beta sieve), based on various combinatorial truncations of the inclusion-exclusion formula, and the Selberg upper bound sieve, based on upper bounds that are the square of a divisor sum. (There is also the large sieve, which is somewhat different in nature and based on {L^2} almost orthogonality considerations, rather than on any actual sieving, to obtain upper bounds.) We will primarily work with a specific sieve in this notes, namely the beta sieve, and we will not attempt to optimise all the parameters of this sieve (which ultimately means that the almost primality parameter {r} in our results will be somewhat large). For a more detailed study of sieve theory, see the classic text of Halberstam and Richert, or the more recent texts of Iwaniec-Kowalski or of Friedlander-Iwaniec.

Very roughly speaking, the end result of sieve theory is that excepting some degenerate and “exponentially thin” settings (such as those associated with the Mersenne primes), all the orbits which are expected to have a large number of primes, can be proven to at least have a large number of {r}-almost primes for some finite {r}. (Unfortunately, there is a major obstruction, known as the parity problem, which prevents sieve theory from lowering {r} all the way to {1}; see this blog post for more discussion.) One formulation of this principle was established by Bourgain, Gamburd, and Sarnak:

Theorem 1 (Bourgain-Gamburd-Sarnak) Let {\Gamma} be a subgroup of {SL_2({\bf Z})} which is not virtually solvable. Let {f: {\bf Z}^4 \rightarrow {\bf Z}} be a polynomial with integer coefficients obeying the following primitivity condition: for any positive integer {q}, there exists {A \in \Gamma \subset {\bf Z}^4} such that {f(A)} is coprime to {q}. Then there exists an {r \geq 1} such that there are infinitely many {A \in \Gamma} with {f(A)} non-zero and {r}-almost prime.

This is not the strongest version of the Bourgain-Gamburd-Sarnak theorem, but it captures the general flavour of their results. Note that the theorem immediately implies an analogous result for orbits {\Gamma b \subset {\bf Z}^2}, in which {f} is now a polynomial from {{\bf Z}^2} to {{\bf Z}}, and one uses {f(Ab)} instead of {f(A)}. It is in fact conjectured that one can set {r=1} here, but this is well beyond current technology. For the purpose of reaching {r=1}, it is very natural to impose the primitivity condition, but as long as one is content with larger values of {r}, it is possible to relax the primitivity condition somewhat; see the paper of Bourgain, Gamburd, and Sarnak for more discussion.

By specialising to the polynomial {f: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \rightarrow abcd}, we conclude as a corollary that as long as {\Gamma} is primitive in the sense that it contains matrices with all coefficients coprime to {q} for any given {q}, then {\Gamma} contains infinitely many matrices whose elements are all {r}-almost primes for some {r} depending only on {\Gamma}. For further applications of these sorts of results, for instance to Appolonian packings, see the paper of Bourgain, Gamburd, and Sarnak.

It turns out that to prove Theorem 1, the Cayley expansion results in {SL_2(F_p)} from the previous set of notes are not quite enough; one needs a more general Cayley expansion result in {SL_2({\bf Z}/q{\bf Z})} where {q} is square-free but not necessarily prime. The proof of this expansion result uses the same basic methods as in the {SL_2(F_p)} case, but is significantly more complicated technically, and we will only discuss it briefly here. As such, we do not give a complete proof of Theorem 1, but hopefully the portion of the argument presented here is still sufficient to give an impression of the ideas involved.

Read the rest of this entry »

In the previous set of notes, we saw that one could derive expansion of Cayley graphs from three ingredients: non-concentration, product theorems, and quasirandomness. Quasirandomness was discussed in Notes 3. In the current set of notes, we discuss product theorems. Roughly speaking, these theorems assert that in certain circumstances, a finite subset {A} of a group {G} either exhibits expansion (in the sense that {A^3}, say, is significantly larger than {A}), or is somehow “close to” or “trapped” by a genuine group.

Theorem 1 (Product theorem in {SL_d(k)}) Let {d \geq 2}, let {k} be a finite field, and let {A} be a finite subset of {G := SL_d(k)}. Let {\epsilon >0} be sufficiently small depending on {d}. Then at least one of the following statements holds:

  • (Expansion) One has {|A^3| \geq |A|^{1+\epsilon}}.
  • (Close to {G}) One has {|A| \geq |G|^{1-O_d(\epsilon)}}.
  • (Trapping) {A} is contained in a proper subgroup of {G}.

We will prove this theorem (which was proven first in the {d=2,3} cases for fields {F} of prime order by Helfgott, and then for {d=2} and general {F} by Dinai, and finally to general {d} and {F} independently by Pyber-Szabo and by Breuillard-Green-Tao) later in this notes. A more qualitative version of this proposition was also previously obtained by Hrushovski. There are also generalisations of the product theorem of importance to number theory, in which the field {k} is replaced by a cyclic ring {{\bf Z}/q{\bf Z}} (with {q} not necessarily prime); this was achieved first for {d=2} and {q} square-free by Bourgain, Gamburd, and Sarnak, by Varju for general {d} and {q} square-free, and finally by this paper of Bourgain and Varju for arbitrary {d} and {q}.

Exercise 1 (Girth bound) Assuming Theorem 1, show that whenever {S} is a symmetric set of generators of {SL_d(k)} for some finite field {k} and some {d\geq 2}, then any element of {SL_d(k)} can be expressed as the product of {O_d( \log^{O_d(1)} |k| )} elements from {S}. (Equivalently, if we add the identity element to {S}, then {S^m = SL_d(k)} for some {m = O_d( \log^{O_d(1)} |k| )}.) This is a special case of a conjecture of Babai and Seress, who conjectured that the bound should hold uniformly for all finite simple groups (in particular, the implied constants here should not actually depend on {d}. The methods used to handle the {SL_d} case can handle other finite groups of Lie type of bounded rank, but at present we do not have bounds that are independent of the rank. On the other hand, a recent paper of Helfgott and Seress has almost resolved the conjecture for the permutation groups {A_n}.

A key tool to establish product theorems is an argument which is sometimes referred to as the pivot argument. To illustrate this argument, let us first discuss a much simpler (and older) theorem, essentially due to Freiman, which has a much weaker conclusion but is valid in any group {G}:

Theorem 2 (Baby product theorem) Let {G} be a group, and let {A} be a finite non-empty subset of {G}. Then one of the following statements hold:

  • (Expansion) One has {|A^{-1} A| \geq \frac{3}{2} |A|}.
  • (Close to a subgroup) {A} is contained in a left-coset of a group {H} with {|H| < \frac{3}{2} |A|}.

To prove this theorem, we suppose that the first conclusion does not hold, thus {|A^{-1} A| <\frac{3}{2} |A|}. Our task is then to place {A} inside the left-coset of a fairly small group {H}.

To do this, we take a group element {g \in G}, and consider the intersection {A\cap gA}. A priori, the size of this set could range from anywhere from {0} to {|A|}. However, we can use the hypothesis {|A^{-1} A| < \frac{3}{2} |A|} to obtain an important dichotomy, reminiscent of the classical fact that two cosets {gH, hH} of a subgroup {H} of {G} are either identical or disjoint:

Proposition 3 (Dichotomy) If {g \in G}, then exactly one of the following occurs:

  • (Non-involved case) {A \cap gA} is empty.
  • (Involved case) {|A \cap gA| > \frac{|A|}{2}}.

Proof: Suppose we are not in the pivot case, so that {A \cap gA} is non-empty. Let {a} be an element of {A \cap gA}, then {a} and {g^{-1} a} both lie in {A}. The sets {A^{-1} a} and {A^{-1} g^{-1} a} then both lie in {A^{-1} A}. As these sets have cardinality {|A|} and lie in {A^{-1}A}, which has cardinality less than {\frac{3}{2}|A|}, we conclude from the inclusion-exclusion formula that

\displaystyle |A^{-1} a \cap A^{-1} g^{-1} a| > \frac{|A|}{2}.

But the left-hand side is equal to {|A \cap gA|}, and the claim follows. \Box

The above proposition provides a clear separation between two types of elements {g \in G}: the “non-involved” elements, which have nothing to do with {A} (in the sense that {A \cap gA = \emptyset}, and the “involved” elements, which have a lot to do with {A} (in the sense that {|A \cap gA| > |A|/2}. The key point is that there is a significant “gap” between the non-involved and involved elements; there are no elements that are only “slightly involved”, in that {A} and {gA} intersect a little but not a lot. It is this gap that will allow us to upgrade approximate structure to exact structure. Namely,

Proposition 4 The set {H} of involved elements is a finite group, and is equal to {A A^{-1}}.

Proof: It is clear that the identity element {1} is involved, and that if {g} is involved then so is {g^{-1}} (since {A \cap g^{-1} A = g^{-1}(A \cap gA)}. Now suppose that {g, h} are both involved. Then {A \cap gA} and {A\cap hA} have cardinality greater than {|A|/2} and are both subsets of {A}, and so have non-empty intersection. In particular, {gA \cap hA} is non-empty, and so {A \cap g^{-1} hA} is non-empty. By Proposition 3, this makes {g^{-1} h} involved. It is then clear that {H} is a group.

If {g \in A A^{-1}}, then {A \cap gA} is non-empty, and so from Proposition 3 {g} is involved. Conversely, if {g} is involved, then {g \in A A^{-1}}. Thus we have {H = A A^{-1}} as claimed. In particular, {H} is finite. \Box

Now we can quickly wrap up the proof of Theorem 2. By construction, {A \cap gA| > |A|/2} for all {g \in H},which by double counting shows that {|H| < 2|A|}. As {H = A A^{-1}}, we see that {A} is contained in a right coset {Hg} of {H}; setting {H' := g^{-1} H g}, we conclude that {A} is contained in a left coset {gH'} of {H'}. {H'} is a conjugate of {H}, and so {|H'| < 2|A|}. If {h \in H'}, then {A} and {Ah} both lie in {H'} and have cardinality {|A|}, so must overlap; and so {h \in A A^{-1}}. Thus {A A^{-1} = H'}, and so {|H'| < \frac{3}{2} |A|}, and Theorem 2 follows.

Exercise 2 Show that the constant {3/2} in Theorem 2 cannot be replaced by any larger constant.

Exercise 3 Let {A \subset G} be a finite non-empty set such that {|A^2| < 2|A|}. Show that {AA^{-1}=A^{-1} A}. (Hint: If {ab^{-1} \in A A^{-1}}, show that {ab^{-1} = c^{-1} d} for some {c,d \in A}.)

Exercise 4 Let {A \subset G} be a finite non-empty set such that {|A^2| < \frac{3}{2} |A|}. Show that there is a finite group {H} with {|H| < \frac{3}{2} |A|} and a group element {g \in G} such that {A \subset Hg \cap gH} and {H = A A^{-1}}.

Below the fold, we give further examples of the pivot argument in other group-like situations, including Theorem 2 and also the “sum-product theorem” of Bourgain-Katz-Tao and Bourgain-Glibichuk-Konyagin.

Read the rest of this entry »

We have now seen two ways to construct expander Cayley graphs {Cay(G,S)}. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group {G} with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)

for some {n = O(\log |G|)}, where {\mu} is the uniform probability measure on the generating set {S}.

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group {G} is if the random walk gets trapped in some proper subgroup {H} of {G} (or perhaps in some coset {xH} of such a subgroup), so that {\mu^{(n)}(xH)} remains large for some moderately large {n}. Note that

\displaystyle  \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,

since {\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}, {H = (H x^{-1}) \cdot (xH)}, and {\mu^{(n)}} is symmetric. By iterating this observation, we seethat if {\mu^{(n)}(xH)} is too large (e.g. of size {|G|^{-o(1)}} for some {n} comparable to {\log |G|}), then it is not possible for the random walk {\mu^{(n)}} to converge to the uniform distribution in time {O(\log |G|)}, and so expansion does not occur.

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group {H}. Recall that a {K}-approximate group is a subset {H} of a group {G} which is symmetric, contains the identity, and is such that {H \cdot H} can be covered by at most {K} left-translates (or equivalently, right-translates) of {H}. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if {\mu^{(n)}(xH)} is too large for some coset {xH} of an approximate group.

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let {G} be a group, let {\nu} be a finitely supported probability measure on {G} which is symmetric (thus {\nu(g)=\nu(g^{-1})} for all {g \in G}), and let {K \geq 1}. Then one of the following statements hold:

  • (i) (Flattening) One has {\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}.
  • (ii) (Concentration in an approximate group) There exists an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when {\mu} is the uniform distribution on some set {A}), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.

The lemma is particularly useful when the group {G} in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of {G} are trapped inside genuine proper subgroups of {G} (or, contrapositively, medium-sized sets that generate the entire group {G} cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that {G} is a finite group, that {S \subseteq G} is a symmetric set of {k} generators, and that there are constants {0 < \kappa < 1 < \Lambda} with the following properties.

  1. (Quasirandomness). The smallest dimension of a nontrivial representation {\rho: G \rightarrow GL_d({\bf C})} of {G} is at least {|G|^{\kappa}};
  2. (Product theorem). For all {\delta > 0} there is some {\delta' = \delta'(\delta) > 0} such that the following is true. If {H \subseteq G} is a {|G|^{\delta'}}-approximate subgroup with {|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}} then {H} generates a proper subgroup of {G};
  3. (Non-concentration estimate). There is some even number {n \leq \Lambda\log |G|} such that

    \displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa},

    where the supremum is over all proper subgroups {H < G}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon > 0} depending only on {k,\kappa, \Lambda}, and the function {\delta'(\cdot )} (and this constant {\epsilon} is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in {SL_2(F_p)} for prime {p}. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.

Read the rest of this entry »

Let {\alpha \in {\bf R}/{\bf Z}} be an element of the unit circle, let {N \geq 1}, and let {\rho > 0}. We define the (rank one) Bohr set {B_N(\alpha;\rho)} to be the set

\displaystyle B_N(\alpha;\rho) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha\|_{{\bf R}/{\bf Z}} \leq \rho \}

where {\|x\|_{{\bf R}/{\bf Z}}} is the distance to the origin in the unit circle (or equivalently, the distance to the nearest integer, after lifting up to {{\bf R}}). These sets play an important role in additive combinatorics and in additive number theory. For instance, they arise naturally when applying the circle method, because Bohr sets describe the oscillation of exponential phases such as {n \mapsto e^{2\pi i n \alpha}}.

Observe that Bohr sets enjoy the doubling property

\displaystyle B_N(\alpha;\rho) + B_N(\alpha;\rho) \subset B_{2N}(\alpha;2\rho),

thus doubling the Bohr set doubles both the length parameter {N} and the radius parameter {\rho}. As such, these Bohr sets resemble two-dimensional balls (or boxes). Indeed, one can view {B_N(\alpha;\rho)} as the preimage of the two-dimensional box {[-1,1] \times [-\rho,\rho] \subset {\bf R} \times {\bf R}/{\bf Z}} under the homomorphism {n \mapsto (n/N, \alpha n \hbox{ mod } 1)}.

Another class of finite set with two-dimensional behaviour is the class of (rank two) generalised arithmetic progressions

\displaystyle P( a_1,a_2; N_1,N_2 ) := \{ n_1 a_1 + n_2 a_2: n_1,n_2 \in {\bf Z}; |n_1| \leq N_1, |n_2| \leq N_2 \}

with {a_1,a_2 \in {\bf Z}} and {N_1,N_2 > 0} Indeed, we have

\displaystyle P( a_1,a_2; N_1,N_2 ) + P( a_1,a_2; N_1,N_2 ) \subset P( a_1,a_2; 2N_1, 2N_2 )

and so we see, as with the Bohr set, that doubling the generalised arithmetic progressions doubles the two defining parameters of that progression.

More generally, there is an analogy between rank {r} Bohr sets

\displaystyle B_N(\alpha_1,\ldots,\alpha_r; \rho_1,\ldots,\rho_r) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha_i\|_{{\bf R}/{\bf Z}} \leq \rho_i

\displaystyle \hbox{ for all } 1 \leq i \leq r \}

and the rank {r+1} generalised arithmetic progressions

\displaystyle P( a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1} ) := \{ n_1 a_1 + \ldots + n_{r+1} a_{r+1}:

\displaystyle n_1,\ldots,n_{r+1} \in {\bf Z}; |n_i| \leq N_i \hbox{ for all } 1 \leq i \leq r+1 \}.

One of the aims of additive combinatorics is to formalise analogies such as the one given above. By using some arguments from the geometry of numbers, for instance, one can show that for any rank {r} Bohr set {B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)}, there is a rank {r+1} generalised arithmetic progression {P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})} for which one has the containments

\displaystyle B_N(\alpha_1,\ldots,\alpha_r;\epsilon \rho_1,\ldots,\epsilon \rho_r) \subset P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})

\displaystyle \subset B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)

for some explicit {\epsilon>0} depending only on {r} (in fact one can take {\epsilon = (r+1)^{-2(r+1)}}); this is (a slight modification of) Lemma 4.22 of my book with Van Vu.

In the special case when {r=1}, one can make a significantly more detailed description of the link between rank one Bohr sets and rank two generalised arithmetic progressions, by using the classical theory of continued fractions, which among other things gives a fairly precise formula for the generators {a_1,a_2} and lengths {N_1,N_2} of the generalised arithmetic progression associated to a rank one Bohr set {B_N(\alpha;\rho)}. While this connection is already implicit in the continued fraction literature (for instance, in the classic text of Hardy and Wright), I thought it would be a good exercise to work it out explicitly and write it up, which I will do below the fold.

It is unfortunate that the theory of continued fractions is restricted to the rank one setting (it relies very heavily on the total ordering of one-dimensional sets such as {{\bf Z}} or {{\bf R}}). A higher rank version of the theory could potentially help with questions such as the Littlewood conjecture, which remains open despite a substantial amount of effort and partial progress on the problem. At the end of this post I discuss how one can use the rank one theory to rephrase the Littlewood conjecture as a conjecture about a doubly indexed family of rank four progressions, which can be used to heuristically justify why this conjecture should be true, but does not otherwise seem to shed much light on the problem.

Read the rest of this entry »

In 1964, Kemperman established the following result:

Theorem 1 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B} be compact subsets of {G}. Then

\displaystyle  \mu(AB) \geq \min( \mu(A) + \mu(B), 1 ).

Remark 1 The estimate is sharp, as can be seen by considering the case when {G} is a unit circle, and {A, B} are arcs; similarly if {G} is any compact connected group that projects onto the circle. The connectedness hypothesis is essential, as can be seen by considering what happens if {A} and {B} are a non-trivial open subgroup of {G}. For locally compact connected groups which are unimodular but not compact, there is an analogous statement, but with {\mu} now a Haar measure instead of a Haar probability measure, and the right-hand side {\min(\mu(A)+\mu(B),1)} replaced simply by {\mu(A)+\mu(B)}. The case when {G} is a torus is due to Macbeath, and the case when {G} is a circle is due to Raikov. The theorem is closely related to the Cauchy-Davenport inequality; indeed, it is not difficult to use that inequality to establish the circle case, and the circle case can be used to deduce the torus case by considering increasingly dense circle subgroups of the torus (alternatively, one can also use Kneser’s theorem).

By inner regularity, the hypothesis that {A,B} are compact can be replaced with Borel measurability, so long as one then adds the additional hypothesis that {A+B} is also Borel measurable.

A short proof of Kemperman’s theorem was given by Ruzsa. In this post I wanted to record how this argument can be used to establish the following more “robust” version of Kemperman’s theorem, which not only lower bounds {AB}, but gives many elements of {AB} some multiplicity:

Theorem 2 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B} be compact subsets of {G}. Then for any {0 \leq t \leq \min(\mu(A),\mu(B))}, one has

\displaystyle  \int_G \min(1_A*1_B, t)\ d\mu \geq t \min(\mu(A)+\mu(B) - t,1). \ \ \ \ \ (1)

Indeed, Theorem 1 can be deduced from Theorem 2 by dividing (1) by {t} and then taking limits as {t \rightarrow 0}. The bound in (1) is sharp, as can again be seen by considering the case when {A,B} are arcs in a circle. The analogous claim for cyclic groups for prime order was established by Pollard, and for general abelian groups by Green and Ruzsa.

Let us now prove Theorem 2. It uses a submodularity argument related to one discussed in this previous post. We fix {B} and {t} with {0 \leq t \leq \mu(B)}, and define the quantity

\displaystyle  c(A) := \int_G \min(1_A*1_B, t)\ d\mu - t (\mu(A)+\mu(B)-t).

for any compact set {A}. Our task is to establish that {c(A) \geq 0} whenever {t \leq \mu(A) \leq 1-\mu(B)+t}.

We first verify the extreme cases. If {\mu(A) = t}, then {1_A*1_B \leq t}, and so {c(A)=0} in this case. At the other extreme, if {\mu(A) = 1-\mu(B)+t}, then from the inclusion-exclusion principle we see that {1_A * 1_B \geq t}, and so again {c(A)=0} in this case (since {\int_G 1_A*1_B = \mu(A)\mu(B) = t \mu(B)}).

To handle the intermediate regime when {\mu(A)} lies between {t} and {1-\mu(B)+t}, we rely on the submodularity inequality

\displaystyle  c(A_1) + c(A_2) \geq c(A_1 \cap A_2) + c(A_1 \cup A_2) \ \ \ \ \ (2)

for arbitrary compact {A_1,A_2}. This inequality comes from the obvious pointwise identity

\displaystyle  1_{A_1} + 1_{A_2} = 1_{A_1 \cap A_2} + 1_{A_1 \cup A_2}

whence

\displaystyle  1_{A_1}*1_B + 1_{A_2}*1_B = 1_{A_1 \cap A_2}*1_B + 1_{A_1 \cup A_2}*1_B

and thus (noting that the quantities on the left are closer to each other than the quantities on the right)

\displaystyle  \min(1_{A_1}*1_B,t) + \min(1_{A_2}*1_B,t)

\displaystyle  \geq \min(1_{A_1 \cap A_2}*1_B,t) + \min(1_{A_1 \cup A_2}*1_B,t)

at which point (2) follows by integrating over {G} and then using the inclusion-exclusion principle.

Now introduce the function

\displaystyle  f(a) := \inf \{ c(A) : \mu(A) = a \}

for {t \leq a \leq 1-\mu(B)+t}. From the preceding discussion {f(a)} vanishes at the endpoints {a = t, 1-\mu(B)+t}; our task is to show that {f(a)} is non-negative in the interior region {t < a < 1-\mu(B)+t}. Suppose for contradiction that this was not the case. It is easy to see that {f} is continuous (indeed, it is even Lipschitz continuous), so there must be {t < a < 1-\mu(B)+t} at which {f} is a local minimum and not locally constant. In particular, {0 <a <1}. But for any {A} with {\mu(A) = a}, we have the translation-invariance

\displaystyle  c(gA) = c(A) \ \ \ \ \ (3)

for any {g \in G}, and hence by (2)

\displaystyle  c(A) \geq \frac{1}{2} c(A \cap gA) + \frac{1}{2} c(A \cup gA ).

Note that {\mu(A \cap gA)} depends continuously on {g}, equals {a} when {g} is the identity, and has an average value of {a^2}. As {G} is connected, we thus see from the intermediate value theorem that for any {0 < \epsilon < a-a^2}, we can find {g} such that {\mu(A \cap gA) = a-\epsilon}, and thus by inclusion-exclusion {\mu(A \cup gA) = a+\epsilon}. By definition of {f}, we thus have

\displaystyle  c(A) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon).

Taking infima in {A} (and noting that the hypotheses on {\epsilon} are independent of {A}) we conclude that

\displaystyle  f(a) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon)

for all {0 < \epsilon < a-a^2}. As {f} is a local minimum and {\epsilon} is arbitrarily small, this implies that {f} is locally constant, a contradiction. This establishes Theorem 2.

We observe the following corollary:

Corollary 3 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B, C} be compact subsets of {G}, and let {\delta := \min(\mu(A),\mu(B),\mu(C))}. Then one has the pointwise estimate

\displaystyle  1_A * 1_B * 1_C \geq \frac{1}{4} (\mu(A)+\mu(B)+\mu(C)-1)_+^2

if {\mu(A)+\mu(B)+\mu(C)-1 \leq 2 \delta}, and

\displaystyle  1_A * 1_B * 1_C \geq \delta (\mu(A)+\mu(B)+\mu(C)-1-\delta)

if {\mu(A)+\mu(B)+\mu(C)-1 \geq 2 \delta}.

Once again, the bounds are completely sharp, as can be seen by computing {1_A*1_B*1_C} when {A,B,C} are arcs of a circle. For quasirandom {G}, one can do much better than these bounds, as discussed in this recent blog post; thus, the abelian case is morally the worst case here, although it seems difficult to convert this intuition into a rigorous reduction.

Proof: By cyclic permutation we may take {\delta = \mu(C)}. For any

\displaystyle  (\mu(A)+\mu(B)-1)_+ \leq t \leq \min(\mu(A),\mu(B)),

we can bound

\displaystyle  1_A*1_B*1_C \geq \min(1_A*1_B,t)*1_C

\displaystyle  \geq \int_G \min(1_A*1_B,t)\ d\mu - t (1-\mu(C))

\displaystyle  \geq t (\mu(A)+\mu(B)-t) - t (1-\mu(C))

\displaystyle  = t \min( \mu(A)+\mu(B)+\mu(C)-1-t )

where we used Theorem 2 to obtain the third line. Optimising in {t}, we obtain the claim. \Box

Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune.  This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:

Freiman’s lemma.  Let A be a finite subset of a Euclidean space with |A+A| \leq K|A|.  Then A is contained in an affine subspace of dimension at most {}\lfloor K-1 \rfloor.

This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression.  The advantage here is that the bound on the dimension is extremely explicit.

Our main result is

Theorem.  Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and A \cdot A can be covered by up to K left translates of A.  Then A can be covered by at most K^{2+29K^9} left-translates of a closed connected Lie subgroup of dimension at most K^9.

We remark that our previous paper established a similar result, in which the dimension bound was improved to 6\log_2 K, but at the cost of worsening the covering number to O_K(1), and with a much more complicated proof  (91 pages instead of 8). Furthermore, the bound on O_K(1) is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything).  Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite.  A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.

To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:

Gleason Lemma (special case).  Let A be a finite symmetric subset of a Euclidean space, and let 0 = H_0 \subset H_1 \subset \ldots \subset H_k be a sequence of subspaces in this space, such that the sets 2A \cap H_i are strictly increasing in i for i=0,\ldots,k.  Then |5A| \geq k|A|, where 5A = A+A+A+A+A.

Proof.    By hypothesis, for each i=1,\ldots,k, the projection B_i of 2A \cap H_i to H_i / H_{i-1} is non-trivial, finite, and symmetric.   In particular, since the vector space H_i/H_{i-1} is torsion-free, B_i+B_i is strictly larger than B_i.  Equivalently, one can find a_i in (2A \cap H_i) + (2A \cap H_i) that does not lie in (2A \cap H_i) + H_{i-1}; in particular, a_i \in 4A \cap H_i and a_i+A is disjoint from H_{i-1}+A.  As a consequence, the a_i+A are disjoint and lie in 5A, whence the claim. \Box

Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most K^4, which is a weak version of Freiman’s lemma.

To extend the argument to the nilpotent setting we use the following idea.  Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series G = G_1 \geq G_2 \geq G_3 \geq \ldots of G, and considering the last intersection H \cap G_k which is non-trivial, one obtains the claim.  It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A.  Passing to this large fraction and quotienting out the central element, we obtain a new approximate group.    If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done.  If, however, the process continues, then by using some Lie group theory, one can find a long sequence H_0 \leq H_1 \leq \ldots \leq H_k of connected Lie subgroups of G, such that the sets A^2 \cap H_i are strictly increasing in i.   Using some Lie group theory and the hypotheses on G, one can deduce that the group \langle A^2 \cap H_{i+1}\rangle generated by A^2 \cap H_{i+1} is much larger than \langle A^2 \cap H_i \rangle, in the sense that the latter group has infinite index in the former.   It then turns out that the Gleason argument mentioned above can be adapted to this setting.

The objective of this course is to present a number of recent constructions of expander graphs, which are a type of sparse but “pseudorandom” graph of importance in computer science, the theory of random walks, geometric group theory, and in number theory. The subject of expander graphs and their applications is an immense one, and we will not possibly be able to cover it in full in this course. In particular, we will say almost nothing about the important applications of expander graphs to computer science, for instance in constructing good pseudorandom number generators, derandomising a probabilistic algorithm, constructing error correcting codes, or in building probabilistically checkable proofs. For such topics, I recommend the survey of Hoory-Linial-Wigderson. We will also only pass very lightly over the other applications of expander graphs, though if time permits I may discuss at the end of the course the application of expander graphs in finite groups such as {SL_2(F_p)} to certain sieving problems in analytic number theory, following the work of Bourgain, Gamburd, and Sarnak.

Instead of focusing on applications, this course will concern itself much more with the task of constructing expander graphs. This is a surprisingly non-trivial problem. On one hand, an easy application of the probabilistic method shows that a randomly chosen (large, regular, bounded-degree) graph will be an expander graph with very high probability, so expander graphs are extremely abundant. On the other hand, in many applications, one wants an expander graph that is more deterministic in nature (requiring either no or very few random choices to build), and of a more specialised form. For the applications to number theory or geometric group theory, it is of particular interest to determine the expansion properties of a very symmetric type of graph, namely a Cayley graph; we will also occasionally work with the more general concept of a Schreier graph. It turns out that such questions are related to deep properties of various groups {G} of Lie type (such as {SL_2({\bf R})} or {SL_2({\bf Z})}), such as Kazhdan’s property (T), the first nontrivial eigenvalue of a Laplacian on a symmetric space {G/\Gamma} associated to {G}, the quasirandomness of {G} (as measured by the size of irreducible representations), and the product theory of subsets of {G}. These properties are of intrinsic interest to many other fields of mathematics (e.g. ergodic theory, operator algebras, additive combinatorics, representation theory, finite group theory, number theory, etc.), and it is quite remarkable that a single problem – namely the construction of expander graphs – is so deeply connected with such a rich and diverse array of mathematical topics. (Perhaps this is because so many of these fields are all grappling with aspects of a single general problem in mathematics, namely when to determine whether a given mathematical object or process of interest “behaves pseudorandomly” or not, and how this is connected with the symmetry group of that object or process.)

(There are also other important constructions of expander graphs that are not related to Cayley or Schreier graphs, such as those graphs constructed by the zigzag product construction, but we will not discuss those types of graphs in this course, again referring the reader to the survey of Hoory, Linial, and Wigderson.)

Read the rest of this entry »

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 2,330 other followers