
In the last three notes, we discussed the Bourgain-Gamburd expansion machine and two of its three ingredients, namely quasirandomness and product theorems, leaving only the non-concentration ingredient to discuss. We can summarise the results of the last three notes, in the case of fields of prime order, as the following theorem.

Theorem 1 (Non-concentration implies expansion in {SL_d}) Let {p} be a prime, let {d \geq 1}, and let {S} be a symmetric set of elements in {G := SL_d(F_p)} of cardinality {|S|=k} not containing the identity. Write {\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}, and suppose that one has the non-concentration property

\displaystyle  \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa} \ \ \ \ \ (1)

for some {\kappa>0} and some even integer {n \leq \Lambda \log |G|}. Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon>0} depending only on {k, d, \kappa,\Lambda}.

Proof: From (1) we see that {\mu^{(n)}} is not supported in any proper subgroup {H} of {G}, which implies that {S} generates {G}. The claim now follows from the Bourgain-Gamburd expansion machine (Theorem 2 of Notes 4), the product theorem (Theorem 1 of Notes 5), and quasirandomness (Exercise 8 of Notes 3). \Box

Remark 1 The same argument also works if we replace {F_p} by the field {F_{p^j}} of order {p^j} for some bounded {j}. However, there is a difficulty in the regime when {j} is unbounded, because the quasirandomness property becomes too weak for the Bourgain-Gamburd expansion machine to be directly applicable. On the other hand, the above type of theorem was generalised to the setting of cyclic groups {{\bf Z}/q{\bf Z}} with {q} square-free by Varju, to arbitrary {q} by Bourgain and Varju, and to more general algebraic groups than {SL_d} and square-free {q} by Salehi Golsefidy and Varju. It may be possible to modify the proof techniques in these papers to also handle the field case {F_{p^j}} with unbounded {j}.

It thus remains to construct tools that can establish the non-concentration property (1). The situation is particularly simple in {SL_2(F_p)}, as we have a good understanding of the subgroups of that group. Indeed, from Theorem 14 of Notes 5, we obtain the following corollary to Theorem 1:

Corollary 2 (Non-concentration implies expansion in {SL_2}) Let {p} be a prime, and let {S} be a symmetric set of elements in {G := SL_2(F_p)} of cardinality {|S|=k} not containing the identity. Write {\mu := \frac{1}{|S|} \sum_{s\in S}\delta_s}, and suppose that one has the non-concentration property

\displaystyle  \sup_{B}\mu^{(n)}(B) < |G|^{-\kappa} \ \ \ \ \ (2)

for some {\kappa>0} and some even integer {n \leq \Lambda \log |G|}, where {B} ranges over all Borel subgroups of {SL_2(\overline{F_p})}. Then, if {|G|} is sufficiently large depending on {k,\kappa,\Lambda}, {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon>0} depending only on {k, \kappa,\Lambda}.

It turns out that (2) can be verified in many cases by exploiting the solvable nature of the Borel subgroups {B}. We give two examples of this in these notes. The first result, due to Bourgain and Gamburd (with earlier partial results by Gamburd and by Shalom), generalises Selberg’s expander construction to the case when {S} generates a thin subgroup of {SL_2({\bf Z})}:

Theorem 3 (Expansion in thin subgroups) Let {S} be a symmetric subset of {SL_2({\bf Z})} not containing the identity, and suppose that the group {\langle S \rangle} generated by {S} is not virtually solvable. Then as {p} ranges over all sufficiently large primes, the Cayley graphs {Cay(SL_2(F_p), \pi_p(S))} form a two-sided expander family, where {\pi_p: SL_2({\bf Z}) \rightarrow SL_2(F_p)} is the usual projection.

Remark 2 One corollary of Theorem 3 (or of the non-concentration estimate (3) below) is that {\pi_p(S)} generates {SL_2(F_p)} for all sufficiently large {p}, if {\langle S \rangle} is not virtually solvable. This is a special case of a much more general result, known as the strong approximation theorem, although this is certainly not the most direct way to prove such a theorem. Conversely, the strong approximation property is used in generalisations of this result to higher rank groups than {SL_2}.

Exercise 1 In the converse direction, if {\langle S\rangle} is virtually solvable, show that for sufficiently large {p}, {\pi_p(S)} fails to generate {SL_2(F_p)}. (Hint: use Theorem 14 from Notes 5 to rule out bounded index solvable subgroups of {SL_2(F_p)}.)

Exercise 2 (Lubotzky’s 1-2-3 problem) Let {S := \{ \begin{pmatrix}1 & \pm 3 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix}1 & 0 \\ \pm 3 & 1 \end{pmatrix} \}}.

  • (i) Show that {S} generates a free subgroup of {SL_2({\bf Z})}. (Hint: use a ping-pong argument, as in Exercise 23 of Notes 2.)
  • (ii) Show that if {v, w} are two distinct elements of the sector {\{ (x,y) \in {\bf R}^2_+: x/2 < y < 2x \}}, then there is no element {g \in \langle S \rangle} for which {gv = w}. (Hint: this is another ping-pong argument.) Conclude that {\langle S \rangle} has infinite index in {SL_2({\bf Z})}. (Contrast this with the situation in which the {3} coefficients in {S} are replaced by {1} or {2}, in which case {\langle S \rangle} is either all of {SL_2({\bf Z})}, or a finite index subgroup, as demonstrated in Exercise 23 of Notes 2.)
  • (iii) Show that the Cayley graphs {Cay(SL_2(F_p), \pi_p(S))} form a two-sided expander family as {p} ranges over all sufficiently large primes. (A numerical illustration of this expansion for a single small prime is sketched just after this exercise.)
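
Although the expansion in part (iii) of the above exercise (or in Theorem 3) is an asymptotic statement about a family of primes, one can already see a healthy spectral gap for a single small prime by direct computation. The following sketch is purely illustrative and is not part of any proof: the choice {p=11}, the brute-force enumeration of {SL_2(F_p)}, and the use of dense linear algebra are all just conveniences at this tiny scale. It builds the normalised adjacency operator of {Cay(SL_2(F_p), \pi_p(S))} for the generators of Exercise 2 and reports the extreme non-trivial eigenvalues (two-sided expansion means that these stay bounded away from {\pm 1}).

    import itertools
    import numpy as np

    p = 11   # a small prime, chosen only so that the dense computation below is quick

    def mul(g, h):
        a, b, c, d = g
        e, f, r, s = h
        return ((a*e + b*r) % p, (a*f + b*s) % p, (c*e + d*r) % p, (c*f + d*s) % p)

    def inv(g):
        a, b, c, d = g              # the inverse of a determinant-one matrix
        return (d, (-b) % p, (-c) % p, a)

    # enumerate SL_2(F_p) by brute force (fine for tiny p)
    G = [m for m in itertools.product(range(p), repeat=4) if (m[0]*m[3] - m[1]*m[2]) % p == 1]
    index = {g: i for i, g in enumerate(G)}

    # the projected generators of Exercise 2 and their inverses
    x, y = (1, 3 % p, 0, 1), (1, 0, 3 % p, 1)
    S = [x, inv(x), y, inv(y)]

    # normalised adjacency operator of the Cayley graph (symmetric, since S is symmetric)
    A = np.zeros((len(G), len(G)))
    for i, g in enumerate(G):
        for s in S:
            A[i, index[mul(g, s)]] += 1.0 / len(S)

    eig = np.sort(np.linalg.eigvalsh(A))
    print("|G| =", len(G))
    print("top eigenvalue (should be 1):", round(eig[-1], 4))
    print("largest non-trivial eigenvalue:", round(eig[-2], 4), " smallest:", round(eig[0], 4))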

Remark 3 Theorem 3 has been generalised to arbitrary linear groups, and with {F_p} replaced by {{\bf Z}/q{\bf Z}} for square-free {q}; see this paper of Salehi Golsefidy and Varju. In this more general setting, the condition of virtual solvability must be replaced by the condition that the connected component of the Zariski closure of {\langle S \rangle} is perfect. An effective version of Theorem 3 (with completely explicit constants) was recently obtained by Kowalski.

The second example concerns Cayley graphs constructed using random elements of {SL_2(F_p)}.

Theorem 4 (Random generators expand) Let {p} be a prime, and let {x,y} be two elements of {SL_2(F_p)} chosen uniformly at random. Then with probability {1-o_{p \rightarrow \infty}(1)}, {Cay(SL_2(F_p), \{x,x^{-1},y,y^{-1}\})} is a two-sided {\epsilon}-expander for some absolute constant {\epsilon}.

Remark 4 As with Theorem 3, Theorem 4 has also been extended to a number of other groups, such as the Suzuki groups (in this paper of Breuillard, Green, and Tao), and more generally to finite simple groups of Lie type of bounded rank (in forthcoming work of Breuillard, Green, Guralnick, and Tao). There are a number of other constructions of expanding Cayley graphs in such groups (and in other interesting groups, such as the alternating groups) beyond those discussed in these notes; see this recent survey of Lubotzky for further discussion. It has been conjectured by Lubotzky and Weiss that for any pair {x,y} of elements of (say) {SL_2(F_p)} that generates the group, the Cayley graph {Cay(SL_2(F_p), \{x,x^{-1},y,y^{-1}\})} is a two-sided {\epsilon}-expander for an absolute constant {\epsilon}: in the case of {SL_2(F_p)}, this has been established for a density one set of primes by Breuillard and Gamburd.

— 1. Expansion in thin subgroups —

We now prove Theorem 3. The first observation is that the expansion property is monotone in the group {\langle S \rangle}:

Exercise 3 Let {S, S'} be symmetric subsets of {SL_2({\bf Z})} not containing the identity, such that {\langle S \rangle \subset \langle S' \rangle}. Suppose that the Cayley graphs {Cay(SL_2(F_p), \pi_p(S))} form a two-sided expander family for sufficiently large primes {p}. Show that the Cayley graphs {Cay(SL_2(F_p), \pi_p(S'))} also form a two-sided expander family.

As a consequence, Theorem 3 follows from the following two statements:

Theorem 5 (Tits alternative) Let {\Gamma \subset SL_2({\bf Z})} be a group. Then exactly one of the following statements holds:

  • (i) {\Gamma} is virtually solvable.
  • (ii) {\Gamma} contains a copy of the free group {F_2} on two generators as a subgroup.

Theorem 6 (Expansion in free groups) Let {x,y \in SL_2({\bf Z})} be generators of a free subgroup of {SL_2({\bf Z})}. Then as {p} ranges over all sufficiently large primes, the Cayley graphs {Cay(SL_2(F_p), \pi_p(\{x,y,x^{-1},y^{-1}\}))} form a two-sided expander family.

Theorem 5 is a special case of the famous Tits alternative, which among other things allows one to replace {SL_2({\bf Z})} by {GL_d(k)} for any {d \geq 1} and any field {k} of characteristic zero (and fields of positive characteristic are also allowed, if one adds the requirement that {\Gamma} be finitely generated). We will not prove the full Tits alternative here, but instead just give an ad hoc proof of the special case in Theorem 5 in the following exercise.

Exercise 4 Given any matrix {g \in SL_2({\bf Z})}, the singular values are {\|g\|_{op}} and {\|g\|_{op}^{-1}}, and we can apply the singular value decomposition to decompose

\displaystyle  g = u_1(g) \|g\|_{op} v_1(g)^* + u_2(g) \|g\|_{op}^{-1} v_2(g)^*

where {u_1(g),u_2(g)\in {\bf C}^2} and {v_1(g), v_2(g) \in {\bf C}^2} form orthonormal bases of {{\bf C}^2}. (When {\|g\|_{op}>1}, these bases are uniquely determined up to phase rotation.) We let {\tilde u_1(g) \in {\bf CP}^1} be the projection of {u_1(g)} to the complex projective line {{\bf CP}^1}, and similarly define {\tilde v_2(g)}.

Let {\Gamma} be a subgroup of {SL_2({\bf Z})}. Call a pair {(u,v) \in {\bf CP}^1 \times {\bf CP}^1} a limit point of {\Gamma} if there exists a sequence {g_n \in \Gamma} with {\|g_n\|_{op} \rightarrow \infty} and {(\tilde u_1(g_n), \tilde v_2(g_n)) \rightarrow (u,v)}.

  • (i) Show that if {\Gamma} is infinite, then there is at least one limit point.
  • (ii) Show that if {(u,v)} is a limit point, then so is {(v,u)}.
  • (iii) Show that if there are two limit points {(u,v), (u',v')} with {\{u,v\} \cap \{u',v'\} = \emptyset}, then there exist {g,h \in \Gamma} that generate a free group. (Hint: Choose {(\tilde u_1(g), \tilde v_2(g))} close to {(u,v)} and {(\tilde u_1(h),\tilde v_2(h))} close to {(u',v')}, and consider the action of {g} and {h} on {{\bf CP}^1}, and specifically on small neighbourhoods of {u,v,u',v'}, and set up a ping-pong type situation.)
  • (iv) Show that if {g \in SL_2({\bf Z})} is hyperbolic (i.e. it has an eigenvalue greater than 1), with eigenvectors {u,v}, then the projectivisations {(\tilde u,\tilde v)} of {u,v} form a limit point. Similarly, if {g} is regular parabolic (i.e. it has an eigenvalue at 1, but is not the identity) with eigenvector {u}, show that {(\tilde u,\tilde u)} is a limit point. (A numerical illustration of the hyperbolic case appears after this exercise.)
  • (v) Show that if {\Gamma} has no free subgroup of two generators, then all hyperbolic and regular parabolic elements of {\Gamma} have a common eigenvector. Conclude that all such elements lie in a solvable subgroup of {\Gamma}.
  • (vi) Show that if an element {g \in SL_2({\bf Z})} is neither hyperbolic nor regular parabolic, and is not a multiple of the identity, then {g} is conjugate to a rotation by {\pi/2} (in particular, {g^2=-1}).
  • (vii) Establish Theorem 5. (Hint: show that two square roots of {-1} in {SL_2({\bf Z})} cannot multiply to another square root of {-1}.)
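
To make the singular value decomposition in the above exercise concrete, one can take a single hyperbolic element {g} and watch the singular directions of its powers {g^n} converge to the eigendirections, as in part (iv). The sketch below is purely illustrative: the matrix is an arbitrary hyperbolic element of {SL_2({\bf Z})}, and the computation is done over the reals (so {{\bf CP}^1} is replaced by the real projective line).

    import numpy as np

    # an arbitrary hyperbolic element of SL_2(Z): determinant 1, trace 4 > 2
    g = np.array([[2.0, 3.0],
                  [1.0, 2.0]])

    def proj(v):
        # a normalised representative of the line through v (sign fixed by the first coordinate)
        v = np.real(v) / np.linalg.norm(np.real(v))
        return v if v[0] >= 0 else -v

    evals, evecs = np.linalg.eig(g)
    order = np.argsort(-np.abs(evals))
    u_expand = proj(evecs[:, order[0]])     # eigendirection of the eigenvalue > 1
    u_contract = proj(evecs[:, order[1]])   # eigendirection of the eigenvalue < 1

    gn = np.eye(2)
    for n in range(1, 13):
        gn = gn @ g
        U, s, Vt = np.linalg.svd(gn)        # gn = U diag(s) Vt, with s[0] = ||g^n||_op
        u1 = proj(U[:, 0])                  # representative of \tilde u_1(g^n)
        v2 = proj(Vt[1, :])                 # representative of \tilde v_2(g^n)
        print(n, round(np.linalg.norm(u1 - u_expand), 6), round(np.linalg.norm(v2 - u_contract), 6))
    # both columns tend to zero: (\tilde u_1(g^n), \tilde v_2(g^n)) converges to the pair of
    # eigendirections of g, exhibiting the limit point described in part (iv)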

Now we prove Theorem 6. Let {\Gamma} be a free subgroup of {SL_2({\bf Z})} generated by two generators {x,y}. Let {\mu := \frac{1}{4} (\delta_x +\delta_{x^{-1}} + \delta_y + \delta_{y^{-1}})} be the probability measure generating a random walk on {SL_2({\bf Z})}, thus {(\pi_p)_* \mu} is the corresponding generator on {SL_2(F_p)}. By Corollary 2, it thus suffices to show that

\displaystyle  \sup_{B}((\pi_p)_* \mu)^{(n)}(B) < p^{-\kappa} \ \ \ \ \ (3)

for all sufficiently large {p}, some absolute constant {\kappa>0}, and some even {n = O(\log p)} (depending on {p}, of course), where {B} ranges over Borel subgroups.

As {\pi_p} is a homomorphism, one has {((\pi_p)_* \mu)^{(n)}(B) = (\pi_p)_* (\mu^{(n)})(B) = \mu^{(n)}(\pi_p^{-1}(B))} and so it suffices to show that

\displaystyle  \sup_{B} \mu^{(n)}(\pi_p^{-1}(B)) < p^{-\kappa}.

To deal with the supremum here, we will use an argument of Bourgain and Gamburd, taking advantage of the fact that all Borel subgroups of {SL_2} obey a common group law, the point being that free groups such as {\Gamma} obey such laws only very rarely. More precisely, we use the fact that the Borel subgroups are solvable of derived length two; in particular we have

\displaystyle  [[a,b],[c,d]] = 1 \ \ \ \ \ (4)

for all {a,b,c,d \in B}. Now, {\mu^{(n)}} is supported on matrices in {SL_2({\bf Z})} whose coefficients have size {O(\exp(O(n)))} (where we allow the implied constants to depend on the choice of generators {x,y}), and so {(\pi_p)_*( \mu^{(n)} )} is supported on matrices in {SL_2(F_p)} whose coefficients also have size {O(\exp(O(n)))}. If {n} is less than a sufficiently small multiple of {\log p}, these coefficients are then less than {p^{1/10}} (say). As such, if {\tilde a,\tilde b,\tilde c,\tilde d \in SL_2({\bf Z})} lie in the support of {\mu^{(n)}} and their projections {a = \pi_p(\tilde a), \ldots, d = \pi_p(\tilde d)} obey the word law (4) in {SL_2(F_p)}, then the original matrices {\tilde a, \tilde b, \tilde c, \tilde d} obey the word law (4) in {SL_2({\bf Z})}. (This lifting of identities from the characteristic {p} setting of {SL_2(F_p)} to the characteristic {0} setting of {SL_2({\bf Z})} is a simple example of the “Lefschetz principle”.)
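
The quantitative point in the preceding paragraph is that a word of length {n} in fixed generators has integer entries of size {O(\exp(O(n)))}, so that for {n} a sufficiently small multiple of {\log p} the reduction modulo {p} cannot disturb an identity such as (4); this is easy to observe experimentally. The sketch below is only an illustration: it uses the generators from Exercise 2 as a concrete example of a free pair, and measures the entry growth of random words with exact integer arithmetic.

    import math
    import random

    # concrete generators (those of Exercise 2), with exact integer arithmetic throughout
    X = (1, 3, 0, 1)
    Y = (1, 0, 3, 1)

    def mul(A, B):
        a, b, c, d = A
        e, f, g, h = B
        return (a*e + b*g, a*f + b*h, c*e + d*g, c*f + d*h)

    def inv(A):
        a, b, c, d = A             # determinant one
        return (d, -b, -c, a)

    gens = [X, inv(X), Y, inv(Y)]

    def max_entry_of_random_word(n):
        W = (1, 0, 0, 1)
        for _ in range(n):
            W = mul(W, random.choice(gens))
        return max(abs(e) for e in W)

    random.seed(0)
    for n in [10, 20, 40, 80]:
        m = max(max_entry_of_random_word(n) for _ in range(200))
        print(n, m, round(math.log(m) / n, 2))   # log(entry size)/n stays bounded: exp(O(n)) growth
    # consequently, if n is at most a small multiple of log p, all entries are below p^{1/10},
    # and an identity such as (4) that holds mod p must already hold over the integers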

To summarise, if we let {E_{n,p,B}} be the set of all elements of {\pi_p^{-1}(B)} that lie in the support of {\mu^{(n)}}, then (4) holds for all {a,b,c,d \in E_{n,p,B}}. This severely limits the size of {E_{n,p,B}}: it can only be of polynomial size in {n}, rather than exponential size:

Proposition 7 Let {E} be a subset of the support of {\mu^{(n)}} (thus, {E} consists of words in {x,y,x^{-1},y^{-1}} of length {n}) such that the law (4) holds for all {a,b,c,d \in E}. Then {|E| \ll n^2}.

The proof of this proposition is laid out in the exercise below.

Exercise 5 Let {\Gamma} be a free group generated by two generators {x,y}. Let {B} be the set of all words of length at most {n} in {x,y,x^{-1},y^{-1}}.

  • (i) Show that if {a,b \in \Gamma} commute, then {a, b} lie in the same cyclic group, thus {a = c^i, b = c^j} for some {c \in \Gamma} and {i,j \in {\bf Z}}.
  • (ii) Show that if {a \in \Gamma} is not the identity, then there are at most {O(n)} elements of {B} that commute with {a}.
  • (iii) Show that if {a \in \Gamma} is not the identity and {c \in \Gamma}, then there are at most {O(n)} elements {b} of {B} with {[a,b] = c}.
  • (iv) Prove Proposition 7.
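
The mechanism behind Proposition 7 (and behind part (ii) of the above exercise) can be checked by brute force for small lengths: in a free group, the number of words of length at most {n} commuting with a fixed non-identity word grows only linearly in {n}, while the total number of words grows exponentially. The sketch below is a bare-hands illustration, representing reduced words directly as tuples of letters.

    # letters: 1 = x, -1 = x^{-1}, 2 = y, -2 = y^{-1}; group elements are reduced tuples

    def reduce_word(letters):
        out = []
        for l in letters:
            if out and out[-1] == -l:
                out.pop()                  # free cancellation
            else:
                out.append(l)
        return tuple(out)

    def mul(w1, w2):
        return reduce_word(w1 + w2)

    def words_up_to(n):
        # all reduced words of length at most n, generated without backtracking
        words, frontier = [()], [()]
        for _ in range(n):
            new = []
            for w in frontier:
                for l in (1, -1, 2, -2):
                    if w and w[-1] == -l:
                        continue
                    new.append(w + (l,))
            words.extend(new)
            frontier = new
        return words

    a = (1, 2)                             # the fixed word a = xy
    for n in range(2, 9):
        B = words_up_to(n)
        commuting = sum(1 for b in B if mul(a, b) == mul(b, a))
        print(n, len(B), commuting)
    # len(B) grows exponentially in n, but the number of words commuting with a
    # (namely the powers of xy of length at most n) grows only linearly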

Now we can conclude the proof of Theorem 3:

Exercise 6 Let {\Gamma} be a free group generated by two generators {x,y}.

  • (i) Show that {\| \mu^{(n)} \|_{\ell^\infty(\Gamma)} \ll c^n} for some absolute constant {0 < c<1}. (For much more precise information on {\mu^{(n)}}, see this paper of Kesten; a quick numerical illustration of this decay is sketched after this exercise.)
  • (ii) Conclude the proof of Theorem 3.
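
Part (i) of the above exercise is a statement about the random walk on the free group. One instance of the decay is the return probability {\mu^{(n)}(1)}, which can be computed exactly by tracking the distribution of the reduced word length (a birth-death chain), and which by Kesten's theorem decays like {(\sqrt{3}/2)^n} up to polynomial factors. The sketch below is only a sanity check of that exponential decay, not a proof of the full {\ell^\infty} bound.

    import math

    # distribution of the reduced word length of the simple random walk on the free group F_2:
    # from length 0 the walk moves to length 1; from length l >= 1 it moves to l+1 with
    # probability 3/4 (three non-cancelling letters out of four) and to l-1 with probability 1/4
    def length_distribution(n):
        dist = {0: 1.0}
        for _ in range(n):
            new = {}
            for l, prob in dist.items():
                if l == 0:
                    new[1] = new.get(1, 0.0) + prob
                else:
                    new[l + 1] = new.get(l + 1, 0.0) + 0.75 * prob
                    new[l - 1] = new.get(l - 1, 0.0) + 0.25 * prob
            dist = new
        return dist

    for n in [4, 8, 16, 32, 64]:
        ret = length_distribution(n).get(0, 0.0)        # the return probability mu^{(n)}(1)
        print(n, ret, (math.sqrt(3) / 2) ** n)          # compare with Kesten's base (sqrt(3)/2)^n
    # the first column stays within a polynomial factor of the second, consistent with
    # exponential decay of mu^{(n)} at the identity (and, with more work, at every group element)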

— 2. Random generators expand —

We now prove Theorem 4. Let {{\bf F}_2} be the free group on two formal generators {a,b}, and let {\mu := \frac{1}{4}(\delta_a + \delta_b + \delta_{a^{-1}}+ \delta_{b^{-1}})} be the generator of the random walk. For any word {w \in {\bf F}_2} and any {x,y} in a group {G}, let {w(x,y) \in G} be the element of {G} formed by substituting {x,y} for {a,b} respectively in the word {w}; thus {w} can be viewed as a map {w: G \times G \rightarrow G} for any group {G}. Observe that if {w} is drawn randomly using the distribution {\mu^{(n)}}, and {x,y \in SL_2(F_p)}, then {w(x,y)} is distributed according to the law {\tilde \mu^{(n)}}, where {\tilde \mu := \frac{1}{4}(\delta_x + \delta_y + \delta_{x^{-1}}+ \delta_{y^{-1}})}. Applying Corollary 2, it suffices to show that whenever {p} is a large prime and {x,y} are chosen uniformly and independently at random from {SL_2(F_p)}, then with probability {1-o_{p \rightarrow \infty}(1)} one has

\displaystyle  \sup_B {\bf P}_w ( w(x,y) \in B ) \leq p^{-\kappa} \ \ \ \ \ (5)

for some absolute constant {\kappa}, where {B} ranges over all Borel subgroups of {SL_2(\overline{F_p})} and {w} is drawn from the law {\mu^{(n)}} for some even natural number {n = O(\log p)}.

Let {B_n} denote the words in {{\bf F}_2} of length at most {n}. We may use the law (4) to obtain a good bound on the supremum in (5), assuming a certain non-degeneracy property of the word evaluations {w(x,y)}:

Exercise 7 Let {n} be a natural number, and suppose that {x,y \in SL_2(F_p)} are such that {w(x,y) \neq 1} for all {w \in B_{100n} \backslash \{1\}}. Show that

\displaystyle  \sup_B {\bf P}_w ( w(x,y) \in B ) \ll \exp(-cn)

for some absolute constant {c>0}, where {w} is drawn from the law {\mu^{(n)}}. (Hint: use (4) and the hypothesis to lift the problem up to {{\bf F}_2}, at which point one can use Proposition 7 and Exercise 6.)

In view of this exercise, it suffices to show that with probability {1-o_{p \rightarrow\infty}(1)}, one has {w(x,y) \neq 1} for all {w \in B_{100n} \backslash \{1\}} for some {n} comparable to a small multiple of {\log p}. As {B_{100n}} has {\exp(O(n))} elements, it thus suffices by the union bound to show that

\displaystyle  {\bf P}_{x,y}(w(x,y)=1) \leq p^{-\gamma} \ \ \ \ \ (6)

for some absolute constant {\gamma > 0}, and any {w \in {\bf F}_2 \backslash \{1\}} of length less than {c\log p} for some sufficiently small absolute constant {c>0}.
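
Before turning to the algebraic argument, it is easy to convince oneself of (6) numerically: for a fixed short non-identity word, the proportion of pairs {(x,y)} with {w(x,y)=1} is already tiny for a moderate prime. The sketch below is purely illustrative (an arbitrary choice of word and of prime, with crude rejection sampling standing in for the uniform distribution on {SL_2(F_p)}).

    import random

    p = 101                                # an illustrative small prime
    random.seed(0)
    I = (1, 0, 0, 1)

    def mul(A, B):
        a, b, c, d = A
        e, f, g, h = B
        return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

    def inv(A):
        a, b, c, d = A                     # determinant one mod p
        return (d % p, (-b) % p, (-c) % p, a % p)

    def random_sl2():
        # rejection sampling of the uniform distribution on SL_2(F_p) (acceptance rate about 1/p)
        while True:
            a, b, c, d = [random.randrange(p) for _ in range(4)]
            if (a * d - b * c) % p == 1:
                return (a, b, c, d)

    def evaluate(word, x, y):
        # word is a string in the letters X, Y (lower case denotes the inverse generator)
        table = {'X': x, 'x': inv(x), 'Y': y, 'y': inv(y)}
        M = I
        for ch in word:
            M = mul(M, table[ch])
        return M

    w = "XYxyXXY"                          # an arbitrary non-identity reduced word of length 7
    trials = 20000
    hits = sum(1 for _ in range(trials) if evaluate(w, random_sl2(), random_sl2()) == I)
    print(hits, "out of", trials)          # consistent with a probability of size O(|w|/p), i.e. small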

Let us now fix a non-identity word {w} of length {|w|} less than {c\log p}, and consider {w} as a function from {SL_2(k) \times SL_2(k)} to {SL_2(k)} for an arbitrary field {k}. We can identify {SL_2(k)} with the set {\{ (a,b,c,d)\in k^4: ad-bc=1\}}. A routine induction then shows that the expression {w((a,b,c,d),(a',b',c',d'))} is a polynomial in the eight variables {a,b,c,d,a',b',c',d'} of degree {O(|w|)}, whose coefficients are integers of size {O( \exp( O(|w|) ) )}. Let us then make the additional restriction to the case {a,a' \neq 0}, in which case we can write {d = \frac{bc+1}{a}} and {d' =\frac{b'c'+1}{a'}}. Then {w((a,b,c,d),(a',b',c',d'))} becomes a rational function of {a,b,c,a',b',c'} whose numerator is a polynomial of degree {O(|w|)} with coefficients of size {O( \exp( O(|w|) ) )}, and whose denominator is a monomial in {a,a'} of degree {O(|w|)}.

We then specialise this rational function to the field {k=F_p}. It is conceivable that when one does so, the rational function collapses to the constant function {(1,0,0,1)}, thus {w((a,b,c,d),(a',b',c',d'))=1} for all {(a,b,c,d),(a',b',c',d') \in SL_2(F_p)} with {a,a' \neq 0}. (For instance, this would be the case if {w(x,y) = x^{|SL_2(F_p)|}}, by Lagrange’s theorem, if it were not for the fact that {|w|} is far too large here.) But suppose that this rational function does not collapse to the constant rational function. Applying the Schwarz-Zippel lemma (Exercise 23 from Notes 5), we then see that the set of pairs {(a,b,c,d),(a',b',c',d') \in SL_2(F_p)} with {a,a' \neq 0} and {w((a,b,c,d),(a',b',c',d'))=1} has cardinality at most {O( |w| p^5 )}; adding in the {a=0} and {a'=0} cases, one still obtains a bound of {O(|w|p^5)}, which is acceptable since {|SL_2(F_p)|^2 \sim p^6} and {|w| = O( \log p )}. Thus, the only remaining case to consider is when the rational function {w((a,b,c,d),(a',b',c',d'))} is identically {1} on {SL_2(F_p)} with {a,a' \neq 0}.
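
The dichotomy used above, namely that the word map is either a nonconstant rational function (so that Schwarz-Zippel applies) or is identically the identity, can be inspected directly with a computer algebra system. The following sketch uses sympy and the commutator word as an arbitrary example; it builds the word map in the coordinates {a,b,c,a',b',c'} described above and confirms that it is not the constant function {1}.

    import sympy as sp

    a, b, c, a2, b2, c2 = sp.symbols("a b c a2 b2 c2")

    # points of SL_2 with nonzero upper-left entry, in the coordinates used above:
    # the lower-right entry is determined by the determinant equation ad - bc = 1
    X = sp.Matrix([[a, b], [c, (b * c + 1) / a]])
    Y = sp.Matrix([[a2, b2], [c2, (b2 * c2 + 1) / a2]])

    # an arbitrary non-identity word: the commutator x y x^{-1} y^{-1}
    W = (X * Y * X.inv() * Y.inv()).applyfunc(sp.simplify)
    D = (W - sp.eye(2)).applyfunc(sp.simplify)

    print("word map identically 1?", D == sp.zeros(2, 2))      # False: the map is nonconstant
    num, den = sp.fraction(sp.together(D[0, 1]))
    print("degree of the numerator of one entry:",
          sp.Poly(num, a, b, c, a2, b2, c2).total_degree())
    # since the word map is a nonconstant rational function, the Schwarz-Zippel lemma bounds
    # the number of pairs (x, y) over F_p with w(x, y) = 1 by O(|w| p^5)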

Now we perform another “Lefschetz principle” maneuvre to change the underlying field. Recall that the denominator of the rational function {w((a,b,c,d),(a',b',c',d'))} is a monomial in {a,a'}, and the numerator has coefficients of size {O(\exp(O(|w|)))}. If {|w|} is less than {c\log p} for a sufficiently small {c}, we conclude in particular (for {p} large enough) that the coefficients all have magnitude less than {p}. As such, the only way that this function can be identically {1} on {SL_2(F_p)} is if it is identically {1} on {SL_2(k)} for all {k} with {a,a' \neq 0}, and hence for {a=0} or {a'=0} also by taking Zariski closures.

On the other hand, we know that for some choices of {k}, e.g. {k={\bf R}}, {SL_2(k)} contains a copy {\Gamma} of the free group on two generators (see e.g. Exercise 23 of Notes 2). As such, it is not possible for any non-identity word {w} to be identically trivial on {SL_2(k) \times SL_2(k)}. Thus this case cannot actually occur, completing the proof of (6) and hence of Theorem 4.

Remark 5 We see from the above argument that the existence of subgroups {\Gamma} of an algebraic group with good “independence” properties – such as that of generating a free group – can be useful in studying the expansion properties of that algebraic group, even if the field of interest in the latter is distinct from that of the former. For more complicated algebraic groups than {SL_2}, in which laws such as (4) are not always available, it turns out to be useful to place further properties on the subgroup {\Gamma}, for instance by requiring that all non-abelian subgroups of that group be Zariski dense (a property which has been called strong density), as this helps prevent random walks from concentrating in proper algebraic subgroups. See this paper of Breuillard, Green, Guralnick, and Tao for constructions of strongly dense free subgroups of algebraic groups and further discussion.

It has been a little over two weeks now since the protest site at thecostofknowledge.com was set up to register declarations of non-cooperation with Reed Elsevier in protest of their research publishing practices, inspired by this blog post of Tim Gowers.   Awareness of the protest has certainly grown in these two weeks; the number of signatories is now well over four thousand, across a broad array of academic disciplines, and the protest has been covered by many blogs and also the mainstream media (e.g. the Guardian, the Economist, Forbes, etc.), and even by Elsevier stock analysts.    (Elsevier itself released an open letter responding to the protest here.)  My interpretation of events is that there was a significant amount of latent or otherwise undisclosed dissatisfaction already with the publishing practices of Elsevier (and, to a lesser extent, some other commercial academic publishers), and a desire to support alternatives such as university or society publishers, and the more recent open access journals; and that this protest (and parallel protests, such as the movement to oppose the Research Works Act) served to drive these feelings out into the open.

The statement of the protest itself, though, is rather brief, reflecting the improvised manner in which the site was created.  A group of mathematicians including myself therefore decided to write and sign a more detailed explanation of why we supported this protest, giving more background and references to support our position.   The 34 signatories are Scott Aaronson, Douglas N. Arnold, Artur Avila, John Baez, Folkmar Bornemann, Danny Calegari, Henry Cohn, Ingrid Daubechies, Jordan Ellenberg, Matthew Emerton, Marie Farge, David Gabai, Timothy Gowers, Ben Green, Martin Grotschel, Michael Harris, Frederic Helein, Rob Kirby, Vincent Lafforgue, Gregory F. Lawler, Randall J. LeVeque, Laszlo Lovasz, Peter J. Olver, Olof Sisask, Richard Taylor, Bernard Teissier, Burt Totaro, Lloyd N. Trefethen, Takashi Tsuboi, Marie-France Vigneras, Wendelin Werner, Amie Wilkinson, Gunter M. Ziegler, and myself.  (Note that while Daubechies is current president of the International Mathematical Union, Lovasz is a past president, and Grotschel is the current secretary, they are signing this letter as individuals and not as representatives of the IMU. Similarly for Trefethen and Arnold (current and past president of SIAM).)

Of course, the 34 of us do not presume to speak for the remaining four thousand signatories to the protest, but I hope that our statement is somewhat representative of the position of many of its supporters.

Further discussion of this statement can be found at this blog post of Tim Gowers.

EDIT: I think it is appropriate to quote the following excerpt from our statement:

All mathematicians must decide for themselves whether, or to what extent, they wish to participate in the boycott. Senior mathematicians who have signed the boycott bear some responsibility towards junior colleagues who are forgoing the option of publishing in Elsevier journals, and should do their best to help minimize any negative career consequences.

Whether or not you decide to join the boycott, there are some simple actions that everyone can take, which seem to us to be uncontroversial:

  1. Make sure that the final versions of all your papers, particularly new ones, are freely available online, ideally both on the arXiv and on your home page.
  2.  If you are submitting a paper and there is a choice between an expensive journal and a cheap (or free) journal of the same standard, then always submit to the cheap one.

In the previous set of notes, we saw that one could derive expansion of Cayley graphs from three ingredients: non-concentration, product theorems, and quasirandomness. Quasirandomness was discussed in Notes 3. In the current set of notes, we discuss product theorems. Roughly speaking, these theorems assert that in certain circumstances, a finite subset {A} of a group {G} either exhibits expansion (in the sense that {A^3}, say, is significantly larger than {A}), or is somehow “close to” or “trapped” by a genuine group.

Theorem 1 (Product theorem in {SL_d(k)}) Let {d \geq 2}, let {k} be a finite field, and let {A} be a finite subset of {G := SL_d(k)}. Let {\epsilon >0} be sufficiently small depending on {d}. Then at least one of the following statements holds:

  • (Expansion) One has {|A^3| \geq |A|^{1+\epsilon}}.
  • (Close to {G}) One has {|A| \geq |G|^{1-O_d(\epsilon)}}.
  • (Trapping) {A} is contained in a proper subgroup of {G}.

We will prove this theorem (which was proven first in the {d=2,3} cases for fields {F} of prime order by Helfgott, and then for {d=2} and general {F} by Dinai, and finally for general {d} and {F} independently by Pyber-Szabo and by Breuillard-Green-Tao) later in these notes. A more qualitative version of this proposition was also previously obtained by Hrushovski. There are also generalisations of the product theorem of importance to number theory, in which the field {k} is replaced by a cyclic ring {{\bf Z}/q{\bf Z}} (with {q} not necessarily prime); this was achieved first for {d=2} and {q} square-free by Bourgain, Gamburd, and Sarnak, then by Varju for general {d} and {q} square-free, and finally by this paper of Bourgain and Varju for arbitrary {d} and {q}.

Exercise 1 (Diameter bound) Assuming Theorem 1, show that whenever {S} is a symmetric set of generators of {SL_d(k)} for some finite field {k} and some {d\geq 2}, then any element of {SL_d(k)} can be expressed as the product of {O_d( \log^{O_d(1)} |k| )} elements from {S}. (Equivalently, if we add the identity element to {S}, then {S^m = SL_d(k)} for some {m = O_d( \log^{O_d(1)} |k| )}.) This is a special case of a conjecture of Babai and Seress, who conjectured that the bound should hold uniformly for all finite simple groups (in particular, the implied constants here should not actually depend on {d}). The methods used to handle the {SL_d} case can handle other finite groups of Lie type of bounded rank, but at present we do not have bounds that are independent of the rank. On the other hand, a recent paper of Helfgott and Seress has almost resolved the conjecture for the alternating groups {A_n}.
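
For very small fields the quantity in the above exercise can be computed exactly by breadth-first search, and the slow growth of the diameter relative to the group is already visible. The sketch below is purely illustrative: the elementary unipotent matrices are an arbitrary choice of symmetric generating set {S}, and the primes are kept tiny so that the whole group can be enumerated.

    import math
    from collections import deque

    def diameter(p):
        # an arbitrary symmetric generating set: the elementary matrices and their inverses
        S = [(1, 1, 0, 1), (1, p - 1, 0, 1), (1, 0, 1, 1), (1, 0, p - 1, 1)]

        def mul(A, B):
            a, b, c, d = A
            e, f, g, h = B
            return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

        start = (1, 0, 0, 1)
        dist = {start: 0}
        queue = deque([start])
        while queue:                          # breadth-first search on the Cayley graph
            g = queue.popleft()
            for s in S:
                h = mul(g, s)
                if h not in dist:
                    dist[h] = dist[g] + 1
                    queue.append(h)
        assert len(dist) == p * (p * p - 1)   # S does generate all of SL_2(F_p)
        return max(dist.values())

    for p in [3, 5, 7, 11, 13, 17]:
        size = p * (p * p - 1)
        print("p =", p, " |SL_2(F_p)| =", size, " diameter =", diameter(p),
              " log|G| ~", round(math.log(size), 1))
    # the diameter stays very small compared with |SL_2(F_p)|, consistent with the
    # polylogarithmic bound asserted in the exercise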

A key tool to establish product theorems is an argument which is sometimes referred to as the pivot argument. To illustrate this argument, let us first discuss a much simpler (and older) theorem, essentially due to Freiman, which has a much weaker conclusion but is valid in any group {G}:

Theorem 2 (Baby product theorem) Let {G} be a group, and let {A} be a finite non-empty subset of {G}. Then one of the following statements holds:

  • (Expansion) One has {|A^{-1} A| \geq \frac{3}{2} |A|}.
  • (Close to a subgroup) {A} is contained in a left-coset of a group {H} with {|H| < \frac{3}{2} |A|}.

To prove this theorem, we suppose that the first conclusion does not hold, thus {|A^{-1} A| <\frac{3}{2} |A|}. Our task is then to place {A} inside the left-coset of a fairly small group {H}.

To do this, we take a group element {g \in G}, and consider the intersection {A\cap gA}. A priori, the size of this set could be anywhere from {0} to {|A|}. However, we can use the hypothesis {|A^{-1} A| < \frac{3}{2} |A|} to obtain an important dichotomy, reminiscent of the classical fact that two cosets {gH, hH} of a subgroup {H} of {G} are either identical or disjoint:

Proposition 3 (Dichotomy) If {g \in G}, then exactly one of the following occurs:

  • (Non-involved case) {A \cap gA} is empty.
  • (Involved case) {|A \cap gA| > \frac{|A|}{2}}.

Proof: Suppose we are not in the non-involved case, so that {A \cap gA} is non-empty. Let {a} be an element of {A \cap gA}; then {a} and {g^{-1} a} both lie in {A}. The sets {A^{-1} a} and {A^{-1} g^{-1} a} are then both subsets of {A^{-1} A}. As these sets have cardinality {|A|} and lie in {A^{-1}A}, which has cardinality less than {\frac{3}{2}|A|}, we conclude from the inclusion-exclusion formula that

\displaystyle |A^{-1} a \cap A^{-1} g^{-1} a| > \frac{|A|}{2}.

But the left-hand side is equal to {|A \cap gA|}, and the claim follows. \Box

The above proposition provides a clear separation between two types of elements {g \in G}: the “non-involved” elements, which have nothing to do with {A} (in the sense that {A \cap gA = \emptyset}), and the “involved” elements, which have a lot to do with {A} (in the sense that {|A \cap gA| > |A|/2}). The key point is that there is a significant “gap” between the non-involved and involved elements; there are no elements that are only “slightly involved”, in that {A} and {gA} intersect a little but not a lot. It is this gap that will allow us to upgrade approximate structure to exact structure. Namely,

Proposition 4 The set {H} of involved elements is a finite group, and is equal to {A A^{-1}}.

Proof: It is clear that the identity element {1} is involved, and that if {g} is involved then so is {g^{-1}} (since {A \cap g^{-1} A = g^{-1}(A \cap gA)}). Now suppose that {g, h} are both involved. Then {A \cap gA} and {A\cap hA} have cardinality greater than {|A|/2} and are both subsets of {A}, and so have non-empty intersection. In particular, {gA \cap hA} is non-empty, and so {A \cap g^{-1} hA} is non-empty. By Proposition 3, this makes {g^{-1} h} involved. It is then clear that {H} is a group.

If {g \in A A^{-1}}, then {A \cap gA} is non-empty, and so from Proposition 3 {g} is involved. Conversely, if {g} is involved, then {g \in A A^{-1}}. Thus we have {H = A A^{-1}} as claimed. In particular, {H} is finite. \Box

Now we can quickly wrap up the proof of Theorem 2. By construction, {|A \cap gA| > |A|/2} for all {g \in H}, which by double counting shows that {|H| < 2|A|}. As {H = A A^{-1}}, we see that {A} is contained in a right coset {Hg} of {H}; setting {H' := g^{-1} H g}, we conclude that {A} is contained in a left coset {gH'} of {H'}. {H'} is a conjugate of {H}, and so {|H'| < 2|A|}. If {h \in H'}, then {A} and {Ah} both lie in {gH'} and have cardinality {|A|}, so must overlap; and so {h \in A^{-1} A}. Thus {A^{-1} A = H'}, and so {|H'| < \frac{3}{2} |A|}, and Theorem 2 follows.
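
The proof just given is completely constructive, and it can be instructive to watch the dichotomy and the resulting group structure emerge on a toy example. The sketch below is an arbitrary illustration (not taken from the notes): inside the symmetric group {S_4} it takes {A} to be a left coset of a copy of {S_3} with one element removed, verifies that the expansion alternative fails, and confirms that the involved elements form exactly the group {A A^{-1}}.

    from itertools import permutations

    # a toy instance of the pivot argument inside the symmetric group S_4,
    # with permutations stored as tuples (the images of 0,1,2,3)
    def compose(p, q):                     # (p*q)(i) = p(q(i))
        return tuple(p[q[i]] for i in range(4))

    def inverse(p):
        out = [0] * 4
        for i, pi in enumerate(p):
            out[pi] = i
        return tuple(out)

    G = list(permutations(range(4)))
    H0 = [h for h in G if h[3] == 3]                 # a copy of S_3 (permutations fixing 3)
    g = (1, 2, 3, 0)                                 # an element outside H0
    A = [compose(g, h) for h in H0][:-1]             # a left coset of H0 with one element removed

    AinvA = {compose(inverse(u), v) for u in A for v in A}
    assert len(AinvA) < 1.5 * len(A)                 # the expansion alternative of Theorem 2 fails

    def overlap(x):                                  # |A intersect xA|
        return len(set(A) & {compose(x, a) for a in A})

    involved = {x for x in G if overlap(x) > len(A) / 2}
    AAinv = {compose(u, inverse(v)) for u in A for v in A}

    print(sorted(set(overlap(x) for x in G)))        # only 0 and values > |A|/2: the dichotomy
    print(involved == AAinv, len(involved))          # the involved elements = A A^{-1}, a group of order 6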

Exercise 2 Show that the constant {3/2} in Theorem 2 cannot be replaced by any larger constant.

Exercise 3 Let {A \subset G} be a finite non-empty set such that {|A^2| < 2|A|}. Show that {AA^{-1}=A^{-1} A}. (Hint: If {ab^{-1} \in A A^{-1}}, show that {ab^{-1} = c^{-1} d} for some {c,d \in A}.)

Exercise 4 Let {A \subset G} be a finite non-empty set such that {|A^2| < \frac{3}{2} |A|}. Show that there is a finite group {H} with {|H| < \frac{3}{2} |A|} and a group element {g \in G} such that {A \subset Hg \cap gH} and {H = A A^{-1}}.

Below the fold, we give further examples of the pivot argument in other group-like situations, including Theorem 1 and also the “sum-product theorem” of Bourgain-Katz-Tao and Bourgain-Glibichuk-Konyagin.


Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles“. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as the definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)

Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the heat flow method, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would be of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the swapping method, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as four moment theorems, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well. Our initial four moment theorems were focused on individual eigenvalues (and were later also extended to eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Green's function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.

Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).

Besides the heat flow and swapping methods, there are a number of other basic tools that are needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for a wide variety of spectral statistics; the {k}-point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.
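
None of the following is needed for the survey, but the phenomenon for one of the simplest such statistics, the bulk eigenvalue gaps, is easy to observe numerically: after rescaling, the gap statistics of a GUE matrix and of a Hermitian Wigner matrix with (say) random sign entries look essentially identical. Here is a minimal sketch (the matrix size, the number of samples, and the choice of comparison ensemble are all arbitrary).

    import numpy as np

    rng = np.random.default_rng(0)
    N = 400

    def bulk_gaps(H):
        # normalised eigenvalue gaps taken from the middle half of the spectrum
        ev = np.sort(np.linalg.eigvalsh(H))
        gaps = np.diff(ev[N // 4: 3 * N // 4])
        return gaps / gaps.mean()

    def gue():
        A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
        return (A + A.conj().T) / 2

    def sign_wigner():
        # a Hermitian Wigner matrix whose entries have random-sign real and imaginary parts,
        # i.e. decidedly non-Gaussian, but in the same (Hermitian) symmetry class as GUE
        X = rng.choice([-1.0, 1.0], size=(N, N)) + 1j * rng.choice([-1.0, 1.0], size=(N, N))
        H = np.triu(X, 1)
        return H + H.conj().T + np.diag(rng.choice([-1.0, 1.0], size=N))

    g1 = np.concatenate([bulk_gaps(gue()) for _ in range(20)])
    g2 = np.concatenate([bulk_gaps(sign_wigner()) for _ in range(20)])
    for q in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(q, round(float(np.quantile(g1, q)), 3), round(float(np.quantile(g2, q)), 3))
    # the two empirical gap distributions are close, illustrating (though certainly not proving)
    # the universality of bulk gap statistics across Hermitian Wigner ensembles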

In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.

We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.

Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)

I’ve just uploaded to the arXiv my paper “Every odd number greater than 1 is the sum of at most five primes“, submitted to Mathematics of Computation. The main result of the paper is as stated in the title, and is in the spirit of (though significantly weaker than) the even Goldbach conjecture (every even natural number is the sum of at most two primes) and odd Goldbach conjecture (every odd natural number greater than 1 is the sum of at most three primes). It also improves on a result of Ramaré that every even natural number is the sum of at most six primes. This result had previously also been established by Kaniecki under the additional assumption of the Riemann hypothesis, so one can view the main result here as an unconditional version of Kaniecki’s result.

The method used is the Hardy-Littlewood circle method, which was for instance also used to prove Vinogradov’s theorem that every sufficiently large odd number is the sum of three primes. Let’s quickly recall how this argument works. It is convenient to use a proxy for the primes, such as the von Mangoldt function {\Lambda}, which is mostly supported on the primes. To represent a large number {x} as the sum of three primes, it suffices to obtain a good lower bound for the sum

\displaystyle  \sum_{n_1,n_2,n_3: n_1+n_2+n_3=x} \Lambda(n_1) \Lambda(n_2) \Lambda(n_3).

By Fourier analysis, one can rewrite this sum as an integral

\displaystyle  \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha

where

\displaystyle  S(x,\alpha) := \sum_{n \leq x} \Lambda(n) e(n\alpha)

and {e(\theta) :=e^{2\pi i \theta}}. To control this integral, one then needs good bounds on {S(x,\alpha)} for various values of {\alpha}. To do this, one first approximates {\alpha} by a rational {a/q} with controlled denominator {q} (using a tool such as the Dirichlet approximation theorem). The analysis then broadly bifurcates into the major arc case when {q} is small, and the minor arc case when {q} is large. In the major arc case, the problem more or less boils down to understanding sums such as

\displaystyle  \sum_{n\leq x} \Lambda(n) e(an/q),

which in turn is almost equivalent to understanding the prime number theorem in arithmetic progressions modulo {q}. In the minor arc case, the prime number theorem is not strong enough to give good bounds (unless one is using some extremely strong hypotheses, such as the generalised Riemann hypothesis), so instead one uses a rather different method, using truncated versions of divisor sum identities such as {\Lambda(n) =\sum_{d|n} \mu(d) \log\frac{n}{d}} to split {S(x,\alpha)} into a collection of linear and bilinear sums that are more tractable to bound, typical examples of which (after using a particularly simple truncated divisor sum identity known as Vaughan’s identity) include the “Type I sum”

\displaystyle \sum_{d \leq U} \mu(d) \sum_{n \leq x/d} \log(n) e(\alpha dn)

and the “Type II sum”

\displaystyle  \sum_{d > U} \sum_{w > V} \mu(d) (\sum_{b|w: b > V} \Lambda(b)) e(\alpha dw) 1_{dw \leq x}.

After using tools such as the triangle inequality or Cauchy-Schwarz inequality to eliminate arithmetic functions such as {\mu(d)} or {\sum_{b|w: b>V}\Lambda(b)}, one ends up controlling plain exponential sums such as {\sum_{V < w < x/d} e(\alpha dw)}, which can be efficiently controlled in the minor arc case.
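
To get a concrete feel for the exponential sum {S(x,\alpha)} and the major/minor arc dichotomy, here is a small numerical sketch (illustrative only; the cutoff {x}, the sample values of {\alpha}, and the FFT-based computation of the triple sum are all just choices of convenience). It tabulates the von Mangoldt function by sieving, compares {|S(x,\alpha)|} at a major arc point with its value at a generic {\alpha}, and evaluates the weighted count of representations of {x} as a sum of three numbers weighted by {\Lambda}, the quantity the circle method is designed to bound from below.

    import math
    import numpy as np

    x = 100001                            # an odd illustrative cutoff

    # tabulate the von Mangoldt function Lambda(n) for n <= x via a smallest-prime-factor sieve
    spf = [0] * (x + 1)
    for i in range(2, x + 1):
        if spf[i] == 0:
            for j in range(i, x + 1, i):
                if spf[j] == 0:
                    spf[j] = i
    Lam = np.zeros(x + 1)
    for n in range(2, x + 1):
        p, m = spf[n], n
        while m % p == 0:
            m //= p
        if m == 1:                        # n is a power of the prime p
            Lam[n] = math.log(p)

    def S(alpha):                         # the exponential sum S(x, alpha)
        return np.sum(Lam * np.exp(2j * np.pi * alpha * np.arange(x + 1)))

    print("|S(x,0)|        =", round(abs(S(0.0))), " (about x, by the prime number theorem)")
    print("|S(x,1/3)|      =", round(abs(S(1 / 3))), " (a major arc point: still of size about x/2)")
    print("|S(x,sqrt2-1)|  =", round(abs(S(math.sqrt(2) - 1))), " (a generic alpha: noticeably smaller)")

    # the weighted count of representations x = n1 + n2 + n3, computed by an FFT of Lambda
    L = 1
    while L < 3 * (x + 1):
        L *= 2
    F = np.fft.rfft(Lam, L)
    triple = np.fft.irfft(F ** 3, L)[x]
    print("sum of Lambda(n1)Lambda(n2)Lambda(n3) over n1+n2+n3 = x :", round(triple),
          " (positive and of size comparable to x^2)")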

This argument works well when {x} is extremely large, but starts running into problems for moderate sized {x}, e.g. {x \sim 10^{30}}. The first issue is that of logarithmic losses in the minor arc estimates. A typical minor arc estimate takes the shape

\displaystyle  |S(x,\alpha)| \ll (\frac{x}{\sqrt{q}}+\frac{x}{\sqrt{x/q}} + x^{4/5}) \log^3 x \ \ \ \ \ (1)

when {\alpha} is close to {a/q} for some {1\leq q\leq x}. This only improves upon the trivial estimate {|S(x,\alpha)| \ll x} from the prime number theorem when {\log^6 x \ll q \ll x/\log^6 x}. As a consequence, it becomes necessary to obtain an accurate prime number theorem in arithmetic progressions with modulus as large as {\log^6 x}. However, with current technology, the error term in such theorems is quite poor (terms such as {O(\exp(-c\sqrt{\log x}) x)} for some small {c>0} are typical, and there is also a notorious “Siegel zero” problem), and as a consequence, the method is generally only applicable for very large {x}. For instance, the best explicit result of Vinogradov type known currently is due to Liu and Wang, who established that all odd numbers larger than {10^{1340}} are the sum of three odd primes. (However, on the assumption of the GRH, the full odd Goldbach conjecture is known to be true; this is a result of Deshouillers, Effinger, te Riele, and Zinoviev.)

In this paper, we make a number of refinements to the general scheme, each one of which is individually rather modest and not all that novel, but which when added together turn out to be enough to resolve the five primes problem (though many more ideas would still be needed to tackle the three primes problem, and as is well known the circle method is very unlikely to be the route to make progress on the two primes problem). The first refinement, which is only available in the five primes case, is to take advantage of the numerical verification of the even Goldbach conjecture up to some large {N_0} (we take {N_0=4\times 10^{14}}, using a verification of Richstein, although there are now much larger values of {N_0}, as high as {2.6 \times 10^{18}}, for which the conjecture has been verified). As such, instead of trying to represent an odd number {x} as the sum of five primes, we can represent it as the sum of three odd primes and a natural number between {2} and {N_0}. This effectively brings us back to the three primes problem, but with the significant additional boost that one can essentially restrict the frequency variable {\alpha} to be of size {O(1/N_0)}. In practice, this eliminates all of the major arcs except for the principal arc around {0}. This is a significant simplification, in particular avoiding the need to deal with the prime number theorem in arithmetic progressions (and all the attendant theory of L-functions, Siegel zeroes, etc.).

In a similar spirit, by taking advantage of the numerical verification of the Riemann hypothesis up to some height {T_0}, and using the explicit formula relating the von Mangoldt function with the zeroes of the zeta function, one can safely deal with the principal major arc {\{ \alpha = O( T_0 / x ) \}}. For our specific application, we use the value {T_0= 3.29 \times 10^9}, arising from the verification of the Riemann hypothesis for the first {10^{10}} zeroes by van de Lune (unpublished) and Wedeniwski. (Such verifications have since been extended further, the latest being that the first {10^{13}} zeroes lie on the line.)

To make the contribution of the major arc as efficient as possible, we borrow an idea from a paper of Bourgain, and restrict one of the three primes in the three-primes problem to a somewhat shorter range than the other two (of size {O(x/K)} instead of {O(x)}, where we take {K} to be something like {10^3}), as this largely eliminates the “Archimedean” losses coming from trying to use Fourier methods to control convolutions on {{\bf R}}. In our paper, we set the scale parameter {K} to be {10^3} (basically, anything that is much larger than {1} but much less than {T_0} will work), but we found that an additional gain (which we ended up not using) could be obtained by averaging {K} over a range of scales, say between {10^3} and {10^6}. This sort of averaging could be a useful trick in future work on Goldbach-type problems.

It remains to treat the contribution of the “minor arc” {T_0/x \ll |\alpha| \ll 1/N_0}. To do this, one needs good {L^2} and {L^\infty} type estimates on the exponential sum {S(x,\alpha)}. Plancherel’s theorem gives an {L^2} estimate which loses a logarithmic factor, but it turns out that on this particular minor arc one can use tools from the theory of the large sieve (such as Montgomery’s uncertainty principle) to eliminate this logarithmic loss almost completely; it turns out that the most efficient way to do this is to use an effective upper bound of Siebert on the number of prime pairs {(p,p+h)} less than {x} to obtain an {L^2} bound that only loses a factor of {8} (or of {7}, once one cuts out the major arc).

For {L^\infty} estimates, it turns out that existing effective versions of (1) (in particular, the bound given by Chen and Wang) are insufficient, due to the three logarithmic factors of {\log x} in the bound. By using a smoothed out version {S_\eta(x,\alpha) :=\sum_{n}\Lambda(n) e(n\alpha) \eta(n/x)} of the sum {S(x,\alpha)}, for some suitable cutoff function {\eta}, one can save one factor of a logarithm, obtaining a bound of the form

\displaystyle  |S_\eta(x,\alpha)| \ll (\frac{x}{\sqrt{q}}+\frac{x}{\sqrt{x/q}} + x^{4/5}) \log^2 x

with effective constants. One can improve the constants further by restricting all summations to odd integers (which barely affects {S_\eta(x,\alpha)}, since {\Lambda} is mostly supported on odd numbers anyway), which in practice reduces the effective constants by a factor of two or so. One can also make further improvements in the constants by using the very sharp large sieve inequality to control the “Type II” sums that arise from Vaughan’s identity, and by using integration by parts to improve the bounds on the “Type I” sums. A final gain can then be extracted by optimising the cutoff parameters {U, V} appearing in Vaughan’s identity to minimise the contribution of the Type II sums (which, in practice, are the dominant term). Combining all these improvements, one ends up with bounds of the shape

\displaystyle  |S_\eta(x,\alpha)| \ll \frac{x}{q} \log^2 x + \frac{x}{\sqrt{q}} \log^2 q

when {q} is small (say {1 < q < x^{1/3}}) and

\displaystyle  |S_\eta(x,\alpha)| \ll \frac{x}{(x/q)^2} \log^2 x + \frac{x}{\sqrt{x/q}} \log^2(x/q)

when {q} is large (say {x^{2/3} < q < x}). (See the paper for more explicit versions of these estimates.) The point here is that the {\log x} factors have been partially replaced by smaller logarithmic factors such as {\log q} or {\log(x/q)}. Putting together all of these improvements, one can finally obtain a satisfactory bound on the minor arc. (There are still some terms with a {\log x} factor in them, but we use the effective Vinogradov theorem of Liu and Wang to upper bound {\log x} by {3100}, which ends up making the remaining terms involving {\log x} manageable.)
