You are currently browsing the monthly archive for January 2012.

A few days ago, inspired by this recent post of Tim Gowers, a web page entitled “the cost of knowledge” has been set up as a location for mathematicians and other academics to declare a protest against the academic publishing practices of Reed Elsevier, in particular with regard to their exceptionally high journal prices, their policy of “bundling” journals together so that libraries are forced to purchase subscriptions to large numbers of low-quality journals in order to gain access to a handful of high-quality journals, and their opposition to the open access movement (as manifested, for instance, in their lobbying in support of legislation such as the Stop Online Piracy Act (SOPA) and the Research Works Act (RWA)).   [These practices have been documented in a number of places; this wiki page, which was set up in response to Tim’s post, collects several relevant links for this purpose.  Some of the other commercial publishers have  exhibited similar behaviour, though usually not to the extent that Elsevier has, which is why this particular publisher is the focus of this protest.]  At the protest site, one can publicly declare a refusal to either publish at an Elsevier journal, referee for an Elsevier journal, or join the board of an Elsevier journal.

(In the past, the editorial boards of several Elsevier journals have resigned over the pricing policies of the journal, most famously the board of Topology in 2006, but also the Journal of Algorithms in 2003, and a number of journals in other sciences as well.  Several libraries, such as those of Harvard and Cornell, have also managed to negotiate an unbundling of Elsevier journals, but most libraries are still unable to subscribe to such journals individually.)

For a more thorough discussion as to why such a protest is warranted, please see Tim’s post on the matter (and the 100+ comments to that post).   Many of the issues regarding Elsevier were already known to some extent to many mathematicians (particularly those who have served on departmental library committees), several of whom had already privately made the decision to boycott Elsevier; but nevertheless it is important to bring these issues out into the open, to make them commonly known as opposed to merely mutually known.  (Amusingly, this distinction is also of crucial importance in my favorite logic puzzle, but that’s another story.)   One can also see Elsevier’s side of the story in this response to Tim’s post by David Clark (the Senior Vice President for Physical Sciences at Elsevier).

For my own part, though I have sent about 9% of my papers in the past to Elsevier journals (with one or two still in press), I have now elected not to submit any further papers to these journals, nor to serve on their editorial boards, though I will continue refereeing some papers from these journals.  As of this time of writing, over five hundred mathematicians and other academics have also signed on to the protest in the four days that the site has been active.

Admittedly, I am fortunate enough to be at a stage of career in which I am not pressured to publish in a very specific set of journals, and as such, I am not making a recommendation as to what anyone else should do or not do regarding this protest.  However, I do feel that it is worth spreading awareness, at least, of the fact that such protests exist (and some additional petitions on related issues can be found at the previously mentioned wiki page).

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of a random Wigner matrix ${M_n}$ (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues ${\lambda_i(M_n)}$ with ${\delta n \leq i \leq (1-\delta) i n}$ for some fixed ${\delta>0}$, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix ${M_n}$ to have mean zero and variance ${1/n}$, then in the asymptotic limit ${n \rightarrow \infty}$, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution ${\rho_{sc}(x)\ dx}$, where

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

An essentially equivalent way of saying this is that for large ${n}$, we expect the ${i^{th}}$ eigenvalue ${\lambda_i(M_n)}$ of ${M_n}$ to stay close to the classical location ${\gamma_i \in [-2,2]}$, defined by the formula

$\displaystyle \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.$

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

$\displaystyle \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)$

for all ${1 \leq i \leq n}$.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape ${O(n^{-c})}$ for some absolute constant ${c>0}$; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that ${\lambda_i(M_n)}$ has variance comparable to ${\frac{\log n}{n^2}}$ in the bulk, so that the optimal error term in (1) should be about ${O(\log^{1/2} n/n)}$. (One may think that if one wanted bounds on (1) that were uniform in ${i}$, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the ${\lambda_i}$; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order ${O(\log^{1/2} n/n)}$.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

$\displaystyle N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx$

with exponentially high probability for intervals ${I}$ in the bulk that were as short as ${n^{-1+\epsilon}}$ for some ${\epsilon>0}$, where ${N(I)}$ is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues ${\lambda_i}$ (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

$\displaystyle \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )$

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor ${\log^{O(\log\log n)} n}$. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

$\displaystyle \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )$

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function ${F(X_1,\ldots,X_n)}$ of many independent random variables ${X_1,\ldots,X_n}$ with the same function ${F(Y_1,\ldots,Y_n)}$ of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

$\displaystyle {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))$

for various smooth test functions ${\phi}$, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, ${\phi}$ was a bounded test function, which allowed one to get control of the bulk of the distribution of ${F(X_1,\ldots,X_n)}$, and in particular in controlling probabilities such as

$\displaystyle {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )$

for various thresholds ${a}$ and ${b}$, but did not give good control on the tail as the error terms tended to be polynomially decaying in ${n}$ rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

$\displaystyle {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k$

for various moderately large ${k}$ (e.g. of size comparable to ${\log n}$), obtaining results such as

$\displaystyle {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k$

after performing all the relevant exchanges. As such, one can then use large deviation estimates on ${F(X_1,\ldots,X_n)}$ to deduce large deviation estimates on ${F(Y_1,\ldots,Y_n)}$.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents ${(M_n-z)^{-1}}$ (and the closely related Stieltjes transform ${s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}$) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

$\displaystyle \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta$

for any energy level ${E \in {\bf R}}$, which can be verified from elementary calculus. (In practice, we would truncate ${\eta}$ near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions ${N((-\infty,E))}$, which in turn can be used to deduce concentration results for individual eigenvalues ${\lambda_i(M_n)}$ by some basic combinatorial manipulations.

We have now seen two ways to construct expander Cayley graphs ${Cay(G,S)}$. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group ${G}$ with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

$\displaystyle \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)$

for some ${n = O(\log |G|)}$, where ${\mu}$ is the uniform probability measure on the generating set ${S}$.

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group ${G}$ is if the random walk gets trapped in some proper subgroup ${H}$ of ${G}$ (or perhaps in some coset ${xH}$ of such a subgroup), so that ${\mu^{(n)}(xH)}$ remains large for some moderately large ${n}$. Note that

$\displaystyle \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,$

since ${\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}$, ${H = (H x^{-1}) \cdot (xH)}$, and ${\mu^{(n)}}$ is symmetric. By iterating this observation, we seethat if ${\mu^{(n)}(xH)}$ is too large (e.g. of size ${|G|^{-o(1)}}$ for some ${n}$ comparable to ${\log |G|}$), then it is not possible for the random walk ${\mu^{(n)}}$ to converge to the uniform distribution in time ${O(\log |G|)}$, and so expansion does not occur.

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group ${H}$. Recall that a ${K}$-approximate group is a subset ${H}$ of a group ${G}$ which is symmetric, contains the identity, and is such that ${H \cdot H}$ can be covered by at most ${K}$ left-translates (or equivalently, right-translates) of ${H}$. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if ${\mu^{(n)}(xH)}$ is too large for some coset ${xH}$ of an approximate group.

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let ${G}$ be a group, let ${\nu}$ be a finitely supported probability measure on ${G}$ which is symmetric (thus ${\nu(g)=\nu(g^{-1})}$ for all ${g \in G}$), and let ${K \geq 1}$. Then one of the following statements hold:

• (i) (Flattening) One has ${\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}$.
• (ii) (Concentration in an approximate group) There exists an ${O(K^{O(1)})}$-approximate group ${H}$ in ${G}$ with ${|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2}$ and an element ${x \in G}$ such that ${\nu(xH) \gg K^{-O(1)}}$.

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when ${\mu}$ is the uniform distribution on some set ${A}$), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.

The lemma is particularly useful when the group ${G}$ in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of ${G}$ are trapped inside genuine proper subgroups of ${G}$ (or, contrapositively, medium-sized sets that generate the entire group ${G}$ cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that ${G}$ is a finite group, that ${S \subseteq G}$ is a symmetric set of ${k}$ generators, and that there are constants ${0 < \kappa < 1 < \Lambda}$ with the following properties.

1. (Quasirandomness). The smallest dimension of a nontrivial representation ${\rho: G \rightarrow GL_d({\bf C})}$ of ${G}$ is at least ${|G|^{\kappa}}$;
2. (Product theorem). For all ${\delta > 0}$ there is some ${\delta' = \delta'(\delta) > 0}$ such that the following is true. If ${H \subseteq G}$ is a ${|G|^{\delta'}}$-approximate subgroup with ${|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}}$ then ${H}$ generates a proper subgroup of ${G}$;
3. (Non-concentration estimate). There is some even number ${n \leq \Lambda\log |G|}$ such that

$\displaystyle \sup_{H < G}\mu^{(n)}(H) < |G|^{-\kappa},$

where the supremum is over all proper subgroups ${H < G}$.

Then ${Cay(G,S)}$ is a two-sided ${\epsilon}$-expander for some ${\epsilon > 0}$ depending only on ${k,\kappa, \Lambda}$, and the function ${\delta'(\cdot )}$ (and this constant ${\epsilon}$ is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in ${SL_2(F_p)}$ for prime ${p}$. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.

Let ${\alpha \in {\bf R}/{\bf Z}}$ be an element of the unit circle, let ${N \geq 1}$, and let ${\rho > 0}$. We define the (rank one) Bohr set ${B_N(\alpha;\rho)}$ to be the set

$\displaystyle B_N(\alpha;\rho) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha\|_{{\bf R}/{\bf Z}} \leq \rho \}$

where ${\|x\|_{{\bf R}/{\bf Z}}}$ is the distance to the origin in the unit circle (or equivalently, the distance to the nearest integer, after lifting up to ${{\bf R}}$). These sets play an important role in additive combinatorics and in additive number theory. For instance, they arise naturally when applying the circle method, because Bohr sets describe the oscillation of exponential phases such as ${n \mapsto e^{2\pi i n \alpha}}$.

Observe that Bohr sets enjoy the doubling property

$\displaystyle B_N(\alpha;\rho) + B_N(\alpha;\rho) \subset B_{2N}(\alpha;2\rho),$

thus doubling the Bohr set doubles both the length parameter ${N}$ and the radius parameter ${\rho}$. As such, these Bohr sets resemble two-dimensional balls (or boxes). Indeed, one can view ${B_N(\alpha;\rho)}$ as the preimage of the two-dimensional box ${[-1,1] \times [-\rho,\rho] \subset {\bf R} \times {\bf R}/{\bf Z}}$ under the homomorphism ${n \mapsto (n/N, \alpha n \hbox{ mod } 1)}$.

Another class of finite set with two-dimensional behaviour is the class of (rank two) generalised arithmetic progressions

$\displaystyle P( a_1,a_2; N_1,N_2 ) := \{ n_1 a_1 + n_2 a_2: n_1,n_2 \in {\bf Z}; |n_1| \leq N_1, |n_2| \leq N_2 \}$

with ${a_1,a_2 \in {\bf Z}}$ and ${N_1,N_2 > 0}$ Indeed, we have

$\displaystyle P( a_1,a_2; N_1,N_2 ) + P( a_1,a_2; N_1,N_2 ) \subset P( a_1,a_2; 2N_1, 2N_2 )$

and so we see, as with the Bohr set, that doubling the generalised arithmetic progressions doubles the two defining parameters of that progression.

More generally, there is an analogy between rank ${r}$ Bohr sets

$\displaystyle B_N(\alpha_1,\ldots,\alpha_r; \rho_1,\ldots,\rho_r) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha_i\|_{{\bf R}/{\bf Z}} \leq \rho_i$

$\displaystyle \hbox{ for all } 1 \leq i \leq r \}$

and the rank ${r+1}$ generalised arithmetic progressions

$\displaystyle P( a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1} ) := \{ n_1 a_1 + \ldots + n_{r+1} a_{r+1}:$

$\displaystyle n_1,\ldots,n_{r+1} \in {\bf Z}; |n_i| \leq N_i \hbox{ for all } 1 \leq i \leq r+1 \}.$

One of the aims of additive combinatorics is to formalise analogies such as the one given above. By using some arguments from the geometry of numbers, for instance, one can show that for any rank ${r}$ Bohr set ${B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)}$, there is a rank ${r+1}$ generalised arithmetic progression ${P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})}$ for which one has the containments

$\displaystyle B_N(\alpha_1,\ldots,\alpha_r;\epsilon \rho_1,\ldots,\epsilon \rho_r) \subset P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})$

$\displaystyle \subset B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)$

for some explicit ${\epsilon>0}$ depending only on ${r}$ (in fact one can take ${\epsilon = (r+1)^{-2(r+1)}}$); this is (a slight modification of) Lemma 4.22 of my book with Van Vu.

In the special case when ${r=1}$, one can make a significantly more detailed description of the link between rank one Bohr sets and rank two generalised arithmetic progressions, by using the classical theory of continued fractions, which among other things gives a fairly precise formula for the generators ${a_1,a_2}$ and lengths ${N_1,N_2}$ of the generalised arithmetic progression associated to a rank one Bohr set ${B_N(\alpha;\rho)}$. While this connection is already implicit in the continued fraction literature (for instance, in the classic text of Hardy and Wright), I thought it would be a good exercise to work it out explicitly and write it up, which I will do below the fold.

It is unfortunate that the theory of continued fractions is restricted to the rank one setting (it relies very heavily on the total ordering of one-dimensional sets such as ${{\bf Z}}$ or ${{\bf R}}$). A higher rank version of the theory could potentially help with questions such as the Littlewood conjecture, which remains open despite a substantial amount of effort and partial progress on the problem. At the end of this post I discuss how one can use the rank one theory to rephrase the Littlewood conjecture as a conjecture about a doubly indexed family of rank four progressions, which can be used to heuristically justify why this conjecture should be true, but does not otherwise seem to shed much light on the problem.