A few days ago, inspired by this recent post of Tim Gowers, a web page entitled “the cost of knowledge” has been set up as a location for mathematicians and other academics to declare a protest against the academic publishing practices of Reed Elsevier, in particular with regard to their exceptionally high journal prices, their policy of “bundling” journals together so that libraries are forced to purchase subscriptions to large numbers of low-quality journals in order to gain access to a handful of high-quality journals, and their opposition to the open access movement (as manifested, for instance, in their lobbying in support of legislation such as the Stop Online Piracy Act (SOPA) and the Research Works Act (RWA)).   [These practices have been documented in a number of places; this wiki page, which was set up in response to Tim's post, collects several relevant links for this purpose.  Some of the other commercial publishers have  exhibited similar behaviour, though usually not to the extent that Elsevier has, which is why this particular publisher is the focus of this protest.]  At the protest site, one can publicly declare a refusal to either publish at an Elsevier journal, referee for an Elsevier journal, or join the board of an Elsevier journal.

(In the past, the editorial boards of several Elsevier journals have resigned over the pricing policies of the journal, most famously the board of Topology in 2006, but also the Journal of Algorithms in 2003, and a number of journals in other sciences as well.  Several libraries, such as those of Harvard and Cornell, have also managed to negotiate an unbundling of Elsevier journals, but most libraries are still unable to subscribe to such journals individually.)

For a more thorough discussion as to why such a protest is warranted, please see Tim’s post on the matter (and the 100+ comments to that post).   Many of the issues regarding Elsevier were already known to some extent to many mathematicians (particularly those who have served on departmental library committees), several of whom had already privately made the decision to boycott Elsevier; but nevertheless it is important to bring these issues out into the open, to make them commonly known as opposed to merely mutually known.  (Amusingly, this distinction is also of crucial importance in my favorite logic puzzle, but that’s another story.)   One can also see Elsevier’s side of the story in this response to Tim’s post by David Clark (the Senior Vice President for Physical Sciences at Elsevier).

For my own part, though I have sent about 9% of my papers in the past to Elsevier journals (with one or two still in press), I have now elected not to submit any further papers to these journals, nor to serve on their editorial boards, though I will continue refereeing some papers from these journals.  As of this time of writing, over five hundred mathematicians and other academics have also signed on to the protest in the four days that the site has been active.

Admittedly, I am fortunate enough to be at a stage of career in which I am not pressured to publish in a very specific set of journals, and as such, I am not making a recommendation as to what anyone else should do or not do regarding this protest.  However, I do feel that it is worth spreading awareness, at least, of the fact that such protests exist (and some additional petitions on related issues can be found at the previously mentioned wiki page).

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} of a random Wigner matrix {M_n} (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues {\lambda_i(M_n)} with {\delta n \leq i \leq (1-\delta) i n} for some fixed {\delta>0}, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix {M_n} to have mean zero and variance {1/n}, then in the asymptotic limit {n \rightarrow \infty}, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution {\rho_{sc}(x)\ dx}, where

\displaystyle  \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.

An essentially equivalent way of saying this is that for large {n}, we expect the {i^{th}} eigenvalue {\lambda_i(M_n)} of {M_n} to stay close to the classical location {\gamma_i \in [-2,2]}, defined by the formula

\displaystyle  \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

\displaystyle  \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)

for all {1 \leq i \leq n}.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape {O(n^{-c})} for some absolute constant {c>0}; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that {\lambda_i(M_n)} has variance comparable to {\frac{\log n}{n^2}} in the bulk, so that the optimal error term in (1) should be about {O(\log^{1/2} n/n)}. (One may think that if one wanted bounds on (1) that were uniform in {i}, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the {\lambda_i}; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order {O(\log^{1/2} n/n)}.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of http://www.ams.org/mathscinet-getitem?mr=2587574{papers} where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

\displaystyle  N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx

with exponentially high probability for intervals {I} in the bulk that were as short as {n^{-1+\epsilon}} for some {\epsilon>0}, where {N(I)} is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues {\lambda_i} (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor {\log^{O(\log\log n)} n}. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

\displaystyle  \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function {F(X_1,\ldots,X_n)} of many independent random variables {X_1,\ldots,X_n} with the same function {F(Y_1,\ldots,Y_n)} of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

\displaystyle  {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))

for various smooth test functions {\phi}, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, {\phi} was a bounded test function, which allowed one to get control of the bulk of the distribution of {F(X_1,\ldots,X_n)}, and in particular in controlling probabilities such as

\displaystyle  {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )

for various thresholds {a} and {b}, but did not give good control on the tail as the error terms tended to be polynomially decaying in {n} rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

\displaystyle  {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

for various moderately large {k} (e.g. of size comparable to {\log n}), obtaining results such as

\displaystyle  {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k

after performing all the relevant exchanges. As such, one can then use large deviation estimates on {F(X_1,\ldots,X_n)} to deduce large deviation estimates on {F(Y_1,\ldots,Y_n)}.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents {(M_n-z)^{-1}} (and the closely related Stieltjes transform {s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

\displaystyle  \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta

for any energy level {E \in {\bf R}}, which can be verified from elementary calculus. (In practice, we would truncate {\eta} near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions {N((-\infty,E))}, which in turn can be used to deduce concentration results for individual eigenvalues {\lambda_i(M_n)} by some basic combinatorial manipulations.

We have now seen two ways to construct expander Cayley graphs {Cay(G,S)}. The first, discussed in Notes 2, is to use Cayley graphs that are projections of an infinite Cayley graph on a group with Kazhdan’s property (T). The second, discussed in Notes 3, is to combine a quasirandomness property of the group {G} with a flattening hypothesis for the random walk.

We now pursue the second approach more thoroughly. The main difficulty here is to figure out how to ensure flattening of the random walk, as it is then an easy matter to use quasirandomness to show that the random walk becomes mixing soon after it becomes flat. In the case of Selberg’s theorem, we achieved this through an explicit formula for the heat kernel on the hyperbolic plane (which is a proxy for the random walk). However, in most situations such an explicit formula is not available, and one must develop some other tool for forcing flattening, and specifically an estimate of the form

\displaystyle  \| \mu^{(n)} \|_{\ell^2(G)} \ll |G|^{-1/2+\epsilon} \ \ \ \ \ (1)

for some {n = O(\log |G|)}, where {\mu} is the uniform probability measure on the generating set {S}.

In 2006, Bourgain and Gamburd introduced a general method for achieving this goal. The intuition here is that the main obstruction that prevents a random walk from spreading out to become flat over the entire group {G} is if the random walk gets trapped in some proper subgroup {H} of {G} (or perhaps in some coset {xH} of such a subgroup), so that {\mu^{(n)}(xH)} remains large for some moderately large {n}. Note that

\displaystyle  \mu^{(2n)}(H) \geq \mu^{(n)}(H x^{-1}) \mu^{(n)}(xH) = \mu^{(n)}(xH)^2,

since {\mu^{(2n)} = \mu^{(n)} * \mu^{(n)}}, {H = (H x^{-1}) \cdot (xH)}, and {\mu^{(n)}} is symmetric. By iterating this observation, we seethat if {\mu^{(n)}(xH)} is too large (e.g. of size {|G|^{-o(1)}} for some {n} comparable to {\log |G|}), then it is not possible for the random walk {\mu^{(n)}} to converge to the uniform distribution in time {O(\log |G|)}, and so expansion does not occur.

A potentially more general obstruction of this type would be if the random walk gets trapped in (a coset of) an approximate group {H}. Recall that a {K}-approximate group is a subset {H} of a group {G} which is symmetric, contains the identity, and is such that {H \cdot H} can be covered by at most {K} left-translates (or equivalently, right-translates) of {H}. Such approximate groups were studied extensively in last quarter’s course. A similar argument to the one given previously shows (roughly speaking) that expansion cannot occur if {\mu^{(n)}(xH)} is too large for some coset {xH} of an approximate group.

It turns out that this latter observation has a converse: if a measure does not concentrate in cosets of approximate groups, then some flattening occurs. More precisely, one has the following combinatorial lemma:

Lemma 1 (Weighted Balog-Szemerédi-Gowers lemma) Let {G} be a group, let {\mu} be a finitely supported probability measure on {G} which is symmetric (thus {\nu(g)=\nu(g^{-1})} for all {g \in G}), and let {K \geq 1}. Then one of the following statements hold:

  • (i) (Flattening) One has {\| \nu * \nu \|_{\ell^2(G)} \leq \frac{1}{K} \|\nu\|_{\ell^2(G)}}.
  • (ii) (Concentration in an approximate group) There exists an {O(K^{O(1)})}-approximate group {H} in {G} with {|H| \ll K^{O(1)} / \| \nu \|_{\ell^2(G)}^2} and an element {x \in G} such that {\nu(xH) \gg K^{-O(1)}}.

This lemma is a variant of the more well-known Balog-Szemerédi-Gowers lemma in additive combinatorics due to Gowers (which roughly speaking corresponds to the case when {\mu} is the uniform distribution on some set {A}), which in turn is a polynomially quantitative version of an earlier lemma of Balog and Szemerédi. We will prove it below the fold.

The lemma is particularly useful when the group {G} in question enjoys a product theorem, which roughly speaking says that the only medium-sized approximate subgroups of {G} are trapped inside genuine proper subgroups of {G} (or, contrapositively, medium-sized sets that generate the entire group {G} cannot be approximate groups). The fact that some finite groups (and specifically, the bounded rank finite simple groups of Lie type) enjoy product theorems is a non-trivial fact, and will be discussed in later notes. For now, we simply observe that the presence of the product theorem, together with quasirandomness and a non-concentration hypothesis, can be used to demonstrate expansion:

Theorem 2 (Bourgain-Gamburd expansion machine) Suppose that {G} is a finite group, that {S \subseteq G} is a symmetric set of {k} generators, and that there are constants {0 < \kappa < 1 < \Lambda} with the following properties.

  1. (Quasirandomness). The smallest dimension of a nontrivial representation {\rho: G \rightarrow GL_d({\bf C})} of {G} is at least {|G|^{\kappa}};
  2. (Product theorem). For all {\delta > 0} there is some {\delta' = \delta'(\delta) > 0} such that the following is true. If {H \subseteq G} is a {|G|^{\delta'}}-approximate subgroup with {|G|^{\delta} \leq |H| \leq |G|^{1 - \delta}} then {H} generates a proper subgroup of {G};
  3. (Non-concentration estimate). There is some even number {n \leq \Lambda\log |G|} such that

    \displaystyle  \sup_{H < G, x \in G}\mu^{(n)}(H) < |G|^{-\kappa},

    where the supremum is over all proper subgroups {H < G}.

Then {Cay(G,S)} is a two-sided {\epsilon}-expander for some {\epsilon > 0} depending only on {k,\kappa, \Lambda}, and the function {\delta'(\cdot )} (and this constant {\epsilon} is in principle computable in terms of these constants).

This criterion for expansion is implicitly contained in this paper of Bourgain and Gamburd, who used it to establish the expansion of various Cayley graphs in {SL_2(F_p)} for prime {p}. This criterion has since been applied (or modified) to obtain expansion results in many other groups, as will be discussed in later notes.

Read the rest of this entry »

Let {\alpha \in {\bf R}/{\bf Z}} be an element of the unit circle, let {N \geq 1}, and let {\rho > 0}. We define the (rank one) Bohr set {B_N(\alpha;\rho)} to be the set

\displaystyle  B_N(\alpha;\rho) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha\|_{{\bf R}/{\bf Z}} \leq \rho \}

where {\|x\|_{{\bf R}/{\bf Z}}} is the distance to the origin in the unit circle (or equivalently, the distance to the nearest integer, after lifting up to {{\bf R}}). These sets play an important role in additive combinatorics and in additive number theory. For instance, they arise naturally when applying the circle method, because Bohr sets describe the oscillation of exponential phases such as {n \mapsto e^{2\pi i n \alpha}}.

Observe that Bohr sets enjoy the doubling property

\displaystyle  B_N(\alpha;\rho) + B_N(\alpha;\rho) \subset B_{2N}(\alpha;2\rho),

thus doubling the Bohr set doubles both the length parameter {N} and the radius parameter {\rho}. As such, these Bohr sets resemble two-dimensional balls (or boxes). Indeed, one can view {B_N(\alpha;\rho)} as the preimage of the two-dimensional box {[-1,1] \times [-\rho,\rho] \subset {\bf R} \times {\bf R}/{\bf Z}} under the homomorphism {n \mapsto (n/N, \alpha n \hbox{ mod } 1)}.

Another class of finite set with two-dimensional behaviour is the class of (rank two) generalised arithmetic progressions

\displaystyle  P( a_1,a_2; N_1,N_2 ) := \{ n_1 a_1 + n_2 a_2: n_1,n_2 \in {\bf Z}; |n_1| \leq N_1, |n_2| \leq N_2 \}

with {a_1,a_2 \in {\bf Z}} and {N_1,N_2 > 0} Indeed, we have

\displaystyle  P( a_1,a_2; N_1,N_2 ) + P( a_1,a_2; N_1,N_2 ) \subset P( a_1,a_2; 2N_1, 2N_2 )

and so we see, as with the Bohr set, that doubling the generalised arithmetic progressions doubles the two defining parameters of that progression.

More generally, there is an analogy between rank {r} Bohr sets

\displaystyle  B_N(\alpha_1,\ldots,\alpha_r; \rho_1,\ldots,\rho_r) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha_i\|_{{\bf R}/{\bf Z}} \leq \rho_i

\displaystyle  \hbox{ for all } 1 \leq i \leq r \}

and the rank {r+1} generalised arithmetic progressions

\displaystyle  P( a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1} ) := \{ n_1 a_1 + \ldots + n_{r+1} a_{r+1}:

\displaystyle  n_1,\ldots,n_{r+1} \in {\bf Z}; |n_i| \leq N_i \hbox{ for all } 1 \leq i \leq r+1 \}.

One of the aims of additive combinatorics is to formalise analogies such as the one given above. By using some arguments from the geometry of numbers, for instance, one can show that for any rank {r} Bohr set {B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)}, there is a rank {r+1} generalised arithmetic progression {P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})} for which one has the containments

\displaystyle  B_N(\alpha_1,\ldots,\alpha_r;\epsilon \rho_1,\ldots,\epsilon \rho_r) \subset P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})

\displaystyle \subset B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)

for some explicit {\epsilon>0} depending only on {r} (in fact one can take {\epsilon = (r+1)^{-2(r+1)}}); this is (a slight modification of) Lemma 4.22 of my book with Van Vu.

In the special case when {r=1}, one can make a significantly more detailed description of the link between rank one Bohr sets and rank two generalised arithmetic progressions, by using the classical theory of continued fractions, which among other things gives a fairly precise formula for the generators {a_1,a_2} and lengths {N_1,N_2} of the generalised arithmetic progression associated to a rank one Bohr set {B_N(\alpha;\rho)}. While this connection is already implicit in the continued fraction literature (for instance, in the classic text of Hardy and Wright), I thought it would be a good exercise to work it out explicitly and write it up, which I will do below the fold.

It is unfortunate that the theory of continued fractions is restricted to the rank one setting (it relies very heavily on the total ordering of one-dimensional sets such as {{\bf Z}} or {{\bf R}}). A higher rank version of the theory could potentially help with questions such as the Littlewood conjecture, which remains open despite a substantial amount of effort and partial progress on the problem. At the end of this post I discuss how one can use the rank one theory to rephrase the Littlewood conjecture as a conjecture about a doubly indexed family of rank four progressions, which can be used to heuristically justify why this conjecture should be true, but does not otherwise seem to shed much light on the problem.

Read the rest of this entry »

One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function {f} is restricted to a narrow region of physical space, then its Fourier transform {\hat f} must be necessarily “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.

In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, which is useful in analytic number theory contexts. Specifically, suppose we are given a complex-valued function {f: {\bf Z} \rightarrow {\bf C}} on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support {\hbox{supp}(f) := \{ n \in {\bf Z}: f(n) \neq 0 \}} of this function is finite (in practice, we will only work with functions that are supported in an interval {[M+1,M+N]} for some natural numbers {M,N}). Then we can define the Fourier transform {\hat f: {\bf R}/{\bf Z} \rightarrow {\bf C}} by the formula

\displaystyle  \hat f(\xi) := \sum_{n \in {\bf Z}} f(n) e(-n\xi)

where {e(x) := e^{2\pi i x}}. (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)

The classical uncertainty principle, in this context, asserts that if {f} is localised in an interval of length {N}, then {\hat f} must be “smeared out” at a scale of at least {1/N} (and essentially constant at scales less than {1/N}). For instance, if {f} is supported in {[M+1,M+N]}, then we have the Plancherel identity

\displaystyle  \int_{{\bf R}/{\bf Z}} |\hat f(\xi)|^2\ d\xi = \| f \|_{\ell^2({\bf Z})}^2 \ \ \ \ \ (1)

while from the Cauchy-Schwarz inequality we have

\displaystyle  |\hat f(\xi)| \leq N^{1/2} \| f \|_{\ell^2({\bf Z})}

for each frequency {\xi}, and in particular that

\displaystyle  \int_I |\hat f(\xi)|^2\ d\xi \leq N |I| \| f \|_{\ell^2({\bf Z})}^2

for any arc {I} in the unit circle (with {|I|} denoting the length of {I}). In particular, an interval of length significantly less than {1/N} can only capture a fraction of the {L^2} energy of the Fourier transform of {\hat f}, which is consistent with the above informal statement of the uncertainty principle.

Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan and Selberg: if {f} is supported in {[M+1,M+N]}, and {\xi_1,\ldots,\xi_J} are frequencies in {{\bf R}/{\bf Z}} that are {\delta}-separated for some {\delta>0}, thus {\| \xi_i-\xi_j\|_{{\bf R}/{\bf Z}} \geq \delta} for all {1 \leq i<j \leq J} (where {\|\xi\|_{{\bf R}/{\bf Z}}} denotes the distance of {\xi} to the origin in {{\bf R}/{\bf Z}}), then

\displaystyle  \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2. \ \ \ \ \ (2)

The reader is encouraged to see how this inequality is consistent with the Plancherel identity (1) and the intuition that {\hat f} is essentially constant at scales less than {1/N}. The factor {N + \frac{1}{\delta}} can in fact be amplified a little bit to {N + \frac{1}{\delta} - 1}, which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates {[M+1,M+N]} to {[MK+K, MK+NK]} (and replaces each frequency {\xi_j} by their {K^{th}} roots), and then sending {K \rightarrow \infty} (cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.

In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric {d_\infty(n,m) := |n-m|} on the integers {{\bf Z}} (in particular, the parameter {N} is essentially the Archimedean diameter of the support of {f}). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the {p}-adic metrics play an equally important role; indeed, it is common to unify the Archimedean and {p}-adic perspectives together into a unified adelic perspective. In the {p}-adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of {p}. Intersecting these balls from different {p}-adic metrics together, we obtain residue classes with respect to various moduli {q} (which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo {q}“. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).

In this context, the uncertainty principle is this: the more residue classes modulo {q} that {f} avoids, the more the Fourier transform {\hat f} must spread out along multiples of {1/q}. To illustrate a very simple example of this principle, let us take {q=2}, and suppose that {f} is supported only on odd numbers (thus it completely avoids the residue class {0 \mod 2}). We write out the formulae for {\hat f(\xi)} and {\hat f(\xi+1/2)}:

\displaystyle  \hat f(\xi) = \sum_n f(n) e(-n\xi)

\displaystyle  \hat f(\xi+1/2) = \sum_n f(n) e(-n\xi) e(-n/2).

If {f} is supported on the odd numbers, then {e(-n/2)} is always equal to {-1} on the support of {f}, and so we have {\hat f(\xi+1/2)=-\hat f(\xi)}. Thus, whenever {\hat f} has a significant presence at a frequency {\xi}, it also must have an equally significant presence at the frequency {\xi+1/2}; there is a spreading out across multiples of {1/2}. Note that one has a similar effect if {f} was supported instead on the even integers instead of the odd integers.

A little more generally, suppose now that {f} avoids a single residue class modulo a prime {p}; for sake of argument let us say that it avoids the zero residue class {0 \mod p}, although the situation for the other residue classes is similar. For any frequency {\xi} and any {j=0,\ldots,p-1}, one has

\displaystyle  \hat f(\xi+j/p) = \sum_n f(n) e(-n\xi) e(-jn/p).

From basic Fourier analysis, we know that the phases {e(-jn/p)} sum to zero as {j} ranges from {0} to {p-1} whenever {n} is not a multiple of {p}. We thus have

\displaystyle  \sum_{j=0}^{p-1} \hat f(\xi+j/p) = 0.

In particular, if {\hat f(\xi)} is large, then one of the other {\hat f(\xi+j/p)} has to be somewhat large as well; using the Cauchy-Schwarz inequality, we can quantify this assertion in an {\ell^2} sense via the inequality

\displaystyle  \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{1}{p-1} |\hat f(\xi)|^2. \ \ \ \ \ (3)

Let us continue this analysis a bit further. Now suppose that {f} avoids {\omega(p)} residue classes modulo a prime {p}, for some {0 \leq \omega(p) < p}. (We exclude the case {\omega(p)=p} as it is clearly degenerates by forcing {f} to be identically zero.) Let {\nu(n)} be the function that equals {1} on these residue classes and zero away from these residue classes, then

\displaystyle  \sum_n f(n) e(-n\xi) \nu(n) = 0.

Using the periodic Fourier transform, we can write

\displaystyle  \nu(n) = \sum_{j=0}^{p-1} c(j) e(-jn/p)

for some coefficients {c(j)}, thus

\displaystyle  \sum_{j=0}^{p-1} \hat f(\xi+j/p) c(j) = 0.

Some Fourier-analytic computations reveal that

\displaystyle  c(0) = \frac{\omega(p)}{p}

and

\displaystyle  \sum_{j=0}^{p-1} |c(j)|^2 = \frac{\omega(p)}{p}

and so after some routine algebra and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):

\displaystyle  \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{\omega(p)}{p-\omega(p)} |\hat f(\xi)|^2.

Thus we see that the more residue classes mod {p} we exclude, the more Fourier energy has to disperse along multiples of {1/p}. It is also instructive to consider the extreme case {\omega(p)=p-1}, in which {f} is supported on just a single residue class {a \mod p}; in this case, one clearly has {\hat f(\xi+j/p) = e(-aj/p) \hat f(\xi)}, and so {\hat f} spreads its energy completely evenly along multiples of {1/p}.

In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:

Proposition 1 (Montgomery’s uncertainty principle) Let {f: {\bf Z} \rightarrow{\bf C}} be a finitely supported function which, for each prime {p}, avoids {\omega(p)} residue classes modulo {p} for some {0 \leq \omega(p) < p}. Then for each natural number {q}, one has

\displaystyle  \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi+\frac{a}{q})|^2 \geq h(q) |\hat f(\xi)|^2

where {h(q)} is the quantity

\displaystyle  h(q) := \mu(q)^2 \prod_{p|q} \frac{\omega(p)}{p-\omega(p)} \ \ \ \ \ (4)

where {\mu} is the Möbius function.

We give a proof of this proposition below the fold.

Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the {p}-adic senses. This leads to the following corollary:

Corollary 2 (Arithmetic large sieve inequality) Let {f: {\bf Z} \rightarrow {\bf C}} be a function supported on an interval {[M+1,M+N]} which, for each prime {p}, avoids {\omega(p)} residue classes modulo {p} for some {0 \leq \omega(p) < p}. Let {\xi_1,\ldots,\xi_J \in {\bf R}/{\bf Z}}, and let {{\mathcal Q}} be a finite set of natural numbers. Suppose that the frequencies {\xi_j + \frac{a}{q}} with {1 \leq j \leq J}, {q \in {\mathcal Q}}, and {(a,q)=1} are {\delta}-separated. Then one has

\displaystyle  \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq \frac{N + \frac{1}{\delta}}{\sum_{q \in {\mathcal Q}} h(q)} \| f \|_{\ell^2({\bf Z})}^2

where {h(q)} was defined in (4).

Indeed, from the large sieve inequality one has

\displaystyle  \sum_{j=1}^J \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2

while from Proposition 1 one has

\displaystyle  \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \geq (\sum_{q \in {\mathcal Q}} h(q)) |\hat f(\xi_j)|^2

whence the claim.

There is a great deal of flexibility in the above inequality, due to the ability to select the set {{\mathcal Q}}, the frequencies {\xi_1,\ldots,\xi_J}, the omitted classes {\omega(p)}, and the separation parameter {\delta}. Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:

Corollary 3 (Large sieve) Let {A} be a set of integers contained in {[M+1,M+N]} which avoids {\omega(p)} residue classes modulo {p} for each prime {p}, and let {R>0}. Then

\displaystyle  |A| \leq \frac{N+R^2}{G(R)}

where

\displaystyle  G(R) := \sum_{q \leq R} h(q).

Proof: We apply Corollary 2 with {f = 1_A}, {J=1}, {\delta=1/R^2}, {\xi_1=0}, and {{\mathcal Q} := \{ q: q \leq R\}}. The key point is that the Farey sequence of fractions {a/q} with {q \leq R} and {(a,q)=1} is {1/R^2}-separated, since

\displaystyle  \| \frac{a}{q}-\frac{a'}{q'} \|_{{\bf R}/{\bf Z}} \geq \frac{1}{qq'} \geq \frac{1}{R^2}

whenever {a/q, a'/q'} are distinct fractions in this sequence. \Box

If, for instance, {A} is the set of all primes in {[M+1,M+N]} larger than {R}, then one can set {\omega(p)=1} for all {p \leq R}, which makes {h(q) = \frac{\mu^2(q)}{\phi(q)}}, where {\phi} is the Euler totient function. It is a classical estimate that

\displaystyle  G(R) = \log R + O(1).

Using this fact and optimising in {R}, we obtain (a special case of) the Brun-Titchmarsh inequality

\displaystyle  \pi(M+N)-\pi(M) \leq (2+o_{N \rightarrow \infty}(1)) \frac{N}{\log N}

where {\pi(x)} is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality

\displaystyle  \pi(M+N;a,q)-\pi(M;a,q) \leq (2+o_{N \rightarrow \infty;q}(1)) \frac{q}{\phi(q)} \frac{N}{\log N}

for any primitive residue class {a \mod q}, where {\pi(M;a,q)} is the number of primes less than or equal to {M} that are congruent to {a \mod q}. By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality

\displaystyle  \pi(M+N;a,q)-\pi(M;a,q) \leq 2 \frac{q}{\phi(q)} \frac{N}{\log N}

for any natural numbers {M,N,a,q} with {N>1}. This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.

I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.

Read the rest of this entry »

In 1964, Kemperman established the following result:

Theorem 1 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B} be compact subsets of {G}. Then

\displaystyle  \mu(AB) \geq \min( \mu(A) + \mu(B), 1 ).

Remark 1 The estimate is sharp, as can be seen by considering the case when {G} is a unit circle, and {A, B} are arcs; similarly if {G} is any compact connected group that projects onto the circle. The connectedness hypothesis is essential, as can be seen by considering what happens if {A} and {B} are a non-trivial open subgroup of {G}. For locally compact connected groups which are unimodular but not compact, there is an analogous statement, but with {\mu} now a Haar measure instead of a Haar probability measure, and the right-hand side {\min(\mu(A)+\mu(B),1)} replaced simply by {\mu(A)+\mu(B)}. The case when {G} is a torus is due to Macbeath, and the case when {G} is a circle is due to Raikov. The theorem is closely related to the Cauchy-Davenport inequality; indeed, it is not difficult to use that inequality to establish the circle case, and the circle case can be used to deduce the torus case by considering increasingly dense circle subgroups of the torus (alternatively, one can also use Kneser’s theorem).

By inner regularity, the hypothesis that {A,B} are compact can be replaced with Borel measurability, so long as one then adds the additional hypothesis that {A+B} is also Borel measurable.

A short proof of Kemperman’s theorem was given by Ruzsa. In this post I wanted to record how this argument can be used to establish the following more “robust” version of Kemperman’s theorem, which not only lower bounds {AB}, but gives many elements of {AB} some multiplicity:

Theorem 2 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B} be compact subsets of {G}. Then for any {0 \leq t \leq \min(\mu(A),\mu(B))}, one has

\displaystyle  \int_G \min(1_A*1_B, t)\ d\mu \geq t \min(\mu(A)+\mu(B) - t,1). \ \ \ \ \ (1)

Indeed, Theorem 1 can be deduced from Theorem 2 by dividing (1) by {t} and then taking limits as {t \rightarrow 0}. The bound in (1) is sharp, as can again be seen by considering the case when {A,B} are arcs in a circle. The analogous claim for cyclic groups for prime order was established by Pollard, and for general abelian groups by Green and Ruzsa.

Let us now prove Theorem 2. It uses a submodularity argument related to one discussed in this previous post. We fix {B} and {t} with {0 \leq t \leq \mu(B)}, and define the quantity

\displaystyle  c(A) := \int_G \min(1_A*1_B, t)\ d\mu - t (\mu(A)+\mu(B)-t).

for any compact set {A}. Our task is to establish that {c(A) \geq 0} whenever {t \leq \mu(A) \leq 1-\mu(B)+t}.

We first verify the extreme cases. If {\mu(A) = t}, then {1_A*1_B \leq t}, and so {c(A)=0} in this case. At the other extreme, if {\mu(A) = 1-\mu(B)+t}, then from the inclusion-exclusion principle we see that {1_A * 1_B \geq t}, and so again {c(A)=0} in this case (since {\int_G 1_A*1_B = \mu(A)\mu(B) = t \mu(B)}).

To handle the intermediate regime when {\mu(A)} lies between {t} and {1-\mu(B)+t}, we rely on the submodularity inequality

\displaystyle  c(A_1) + c(A_2) \geq c(A_1 \cap A_2) + c(A_1 \cup A_2) \ \ \ \ \ (2)

for arbitrary compact {A_1,A_2}. This inequality comes from the obvious pointwise identity

\displaystyle  1_{A_1} + 1_{A_2} = 1_{A_1 \cap A_2} + 1_{A_1 \cup A_2}

whence

\displaystyle  1_{A_1}*1_B + 1_{A_2}*1_B = 1_{A_1 \cap A_2}*1_B + 1_{A_1 \cup A_2}*1_B

and thus (noting that the quantities on the left are closer to each other than the quantities on the right)

\displaystyle  \min(1_{A_1}*1_B,t) + \min(1_{A_2}*1_B,t)

\displaystyle  \geq \min(1_{A_1 \cap A_2}*1_B,t) + \min(1_{A_1 \cup A_2}*1_B,t)

at which point (2) follows by integrating over {G} and then using the inclusion-exclusion principle.

Now introduce the function

\displaystyle  f(a) := \inf \{ c(A) : \mu(A) = a \}

for {t \leq a \leq 1-\mu(B)+t}. From the preceding discussion {f(a)} vanishes at the endpoints {a = t, 1-\mu(B)+t}; our task is to show that {f(a)} is non-negative in the interior region {t < a < 1-\mu(B)+t}. Suppose for contradiction that this was not the case. It is easy to see that {f} is continuous (indeed, it is even Lipschitz continuous), so there must be {t < a < 1-\mu(B)+t} at which {f} is a local minimum and not locally constant. In particular, {0 <a <1}. But for any {A} with {\mu(A) = a}, we have the translation-invariance

\displaystyle  c(gA) = c(A) \ \ \ \ \ (3)

for any {g \in G}, and hence by (2)

\displaystyle  c(A) \geq \frac{1}{2} c(A \cap gA) + \frac{1}{2} c(A \cup gA ).

Note that {\mu(A \cap gA)} depends continuously on {g}, equals {a} when {g} is the identity, and has an average value of {a^2}. As {G} is connected, we thus see from the intermediate value theorem that for any {0 < \epsilon < a-a^2}, we can find {g} such that {\mu(A \cap gA) = a-\epsilon}, and thus by inclusion-exclusion {\mu(A \cup gA) = a+\epsilon}. By definition of {f}, we thus have

\displaystyle  c(A) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon).

Taking infima in {A} (and noting that the hypotheses on {\epsilon} are independent of {A}) we conclude that

\displaystyle  f(a) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon)

for all {0 < \epsilon < a-a^2}. As {f} is a local minimum and {\epsilon} is arbitrarily small, this implies that {f} is locally constant, a contradiction. This establishes Theorem 2.

We observe the following corollary:

Corollary 3 Let {G} be a compact connected group, with a Haar probability measure {\mu}. Let {A, B, C} be compact subsets of {G}, and let {\delta := \min(\mu(A),\mu(B),\mu(C))}. Then one has the pointwise estimate

\displaystyle  1_A * 1_B * 1_C \geq \frac{1}{4} (\mu(A)+\mu(B)+\mu(C)-1)_+^2

if {\mu(A)+\mu(B)+\mu(C)-1 \leq 2 \delta}, and

\displaystyle  1_A * 1_B * 1_C \geq \delta (\mu(A)+\mu(B)+\mu(C)-1-\delta)

if {\mu(A)+\mu(B)+\mu(C)-1 \geq 2 \delta}.

Once again, the bounds are completely sharp, as can be seen by computing {1_A*1_B*1_C} when {A,B,C} are arcs of a circle. For quasirandom {G}, one can do much better than these bounds, as discussed in this recent blog post; thus, the abelian case is morally the worst case here, although it seems difficult to convert this intuition into a rigorous reduction.

Proof: By cyclic permutation we may take {\delta = \mu(C)}. For any

\displaystyle  (\mu(A)+\mu(B)-1)_+ \leq t \leq \min(\mu(A),\mu(B)),

we can bound

\displaystyle  1_A*1_B*1_C \geq \min(1_A*1_B,t)*1_C

\displaystyle  \geq \int_G \min(1_A*1_B,t)\ d\mu - t (1-\mu(C))

\displaystyle  \geq t (\mu(A)+\mu(B)-t) - t (1-\mu(C))

\displaystyle  = t \min( \mu(A)+\mu(B)+\mu(C)-1-t )

where we used Theorem 2 to obtain the third line. Optimising in {t}, we obtain the claim. \Box

Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune.  This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:

Freiman’s lemma.  Let A be a finite subset of a Euclidean space with |A+A| \leq K|A|.  Then A is contained in an affine subspace of dimension at most {}\lfloor K-1 \rfloor.

This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression.  The advantage here is that the bound on the dimension is extremely explicit.

Our main result is

Theorem.  Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and A \cdot A can be covered by up to K left translates of A.  Then A can be covered by at most K^{2+29K^9} left-translates of a closed connected Lie subgroup of dimension at most K^9.

We remark that our previous paper established a similar result, in which the dimension bound was improved to 6\log_2 K, but at the cost of worsening the covering number to O_K(1), and with a much more complicated proof  (91 pages instead of 8). Furthermore, the bound on O_K(1) is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything).  Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite.  A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.

To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:

Gleason Lemma (special case).  Let A be a finite symmetric subset of a Euclidean space, and let 0 = H_0 \subset H_1 \subset \ldots \subset H_k be a sequence of subspaces in this space, such that the sets 2A \cap H_i are strictly increasing in i for i=0,\ldots,k.  Then |5A| \geq k|A|, where 5A = A+A+A+A+A.

Proof.    By hypothesis, for each i=1,\ldots,k, the projection B_i of 2A \cap H_i to H_i / H_{i-1} is non-trivial, finite, and symmetric.   In particular, since the vector space H_i/H_{i-1} is torsion-free, B_i+B_i is strictly larger than B_i.  Equivalently, one can find a_i in (2A \cap H_i) + (2A \cap H_i) that does not lie in (2A \cap H_i) + H_{i-1}; in particular, a_i \in 4A \cap H_i and a_i+A is disjoint from H_{i-1}+A.  As a consequence, the a_i+A are disjoint and lie in 5A, whence the claim. \Box

Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most K^4, which is a weak version of Freiman’s lemma.

To extend the argument to the nilpotent setting we use the following idea.  Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series G = G_1 \geq G_2 \geq G_3 \geq \ldots of G, and considering the last intersection H \cap G_k which is non-trivial, one obtains the claim.  It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A.  Passing to this large fraction and quotienting out the central element, we obtain a new approximate group.    If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done.  If, however, the process continues, then by using some Lie group theory, one can find a long sequence H_0 \leq H_1 \leq \ldots \leq H_k of connected Lie subgroups of G, such that the sets A^2 \cap H_i are strictly increasing in i.   Using some Lie group theory and the hypotheses on G, one can deduce that the group \langle A^2 \cap H_{i+1}\rangle generated by A^2 \cap H_{i+1} is much larger than \langle A^2 \cap H_i \rangle, in the sense that the latter group has infinite index in the former.   It then turns out that the Gleason argument mentioned above can be adapted to this setting.

Let {L: H \rightarrow H} be a self-adjoint operator on a finite-dimensional Hilbert space {H}. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence {\lambda_1,\ldots,\lambda_n \in {\bf R}} of eigenvalues and an orthonormal basis {e_1,\ldots,e_n} of eigenfunctions such that {L e_i = \lambda_i e_i} for all {i=1,\ldots,n}. In particular, given any function {m: \sigma(L) \rightarrow {\bf C}} on the spectrum {\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}} of {L}, one can then define the linear operator {m(L): H \rightarrow H} by the formula

\displaystyle  m(L) e_i := m(\lambda_i) e_i,

which then gives a functional calculus, in the sense that the map {m \mapsto m(L)} is a {C^*}-algebra isometric homomorphism from the algebra {BC(\sigma(L) \rightarrow {\bf C})} of bounded continuous functions from {\sigma(L)} to {{\bf C}}, to the algebra {B(H \rightarrow H)} of bounded linear operators on {H}. Thus, for instance, one can define heat operators {e^{-tL}} for {t>0}, Schrödinger operators {e^{itL}} for {t \in {\bf R}}, resolvents {\frac{1}{L-z}} for {z \not \in \sigma(L)}, and (if {L} is positive) wave operators {e^{it\sqrt{L}}} for {t \in {\bf R}}. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with {L} positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

\displaystyle  u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)

the Schrödinger equation

\displaystyle  u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)

the wave equation

\displaystyle  u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)

or the Helmholtz equation

\displaystyle  (L-z) u = f. \ \ \ \ \ (4)

The functional calculus can also be associated to a spectral measure. Indeed, for any vectors {f, g \in H}, there is a complex measure {\mu_{f,g}} on {\sigma(L)} with the property that

\displaystyle  \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);

indeed, one can set {\mu_{f,g}} to be the discrete measure on {\sigma(L)} defined by the formula

\displaystyle  \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.

One can also view this complex measure as a coefficient

\displaystyle  \mu_{f,g} = \langle \mu f, g \rangle_H

of a projection-valued measure {\mu} on {\sigma(L)}, defined by setting

\displaystyle  \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.

Finally, one can view {L} as unitarily equivalent to a multiplication operator {M: f \mapsto g f} on {\ell^2(\{1,\ldots,n\})}, where {g} is the real-valued function {g(i) := \lambda_i}, and the intertwining map {U: \ell^2(\{1,\ldots,n\}) \rightarrow H} is given by

\displaystyle  U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,

so that {L = U M U^{-1}}.

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space {H}, so long as one one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators {L: H \rightarrow H}. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators {L: D \rightarrow H}, which (initially, at least), are only defined on a dense subspace {D} of the Hilbert space {H}. A very typical situation arises when {H = L^2(\Omega)} is the square-integrable functions on some domain or manifold {\Omega} (which may have a boundary or be otherwise “incomplete”), and {D = C^\infty_c(\Omega)} are the smooth compactly supported functions on {\Omega}, and {L} is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one build operators such as {e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}} or to solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator {L: D \rightarrow H} must be imposed. The most obvious is that of symmetry, which asserts that

\displaystyle  \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)

for all {f, g \in D}. In some applications, one also wants to impose positiveness, which asserts that

\displaystyle  \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)

for all {f \in D}. These hypotheses are sufficient in the case when {L} is bounded, and in particular when {H} is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents {(L-z)^{-1}} are well-defined for any strictly complex {z}, which by duality implies that the image of {L-z} should be dense in {H}. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when {H} is the Hilbert space {H := L^2((0,1))}, {D := C^\infty_c((0,1))} is the space of test functions, and {L} is the one-dimensional Laplacian {L := -\frac{d^2}{dx^2}}. Then {L} is symmetric and positive, but the operator {L-k^2} does not have dense image for any complex {k}, since

\displaystyle  \langle (L-\overline{k}^2) f, e^{\overline{k}x} \rangle_H = 0

for all test functions {f \in C^\infty_c((0,1))}, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).

Another example occurs when {H := L^2((0,+\infty))}, {D := C^\infty_c((0,+\infty))}, {L} is the momentum operator {L := i \frac{d}{dx}}. Then the resolvent {(L-z)^{-1}} can be uniquely defined for {z} in the upper half-plane, but not in the lower half-plane, due to the obstruction

\displaystyle  \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0

for all test functions {f} (note that the function {e^{i\bar{z} x}} lies in {L^2((0,+\infty))} when {z} is in the lower half-plane). For related reasons, the translation operators {e^{itL}} have a problem with either uniqueness or existence (depending on whether {t} is positive or negative), due to the unspecified boundary behaviour at the origin.

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once {L} is essentially self-adjoint, then spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positive condition (6), it is not a “local” condition that can be easily verified just by testing {L} on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of a partial converses to the spectral theorem, which roughly speaking asserts that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator {L}, then {L} is self-adjoint. Examples of “expected consequences” include:

  • Existence of resolvents {(L-z)^{-1}} (or equivalently, dense image for {L-z});
  • Existence of a contractive heat propagator semigroup {e^{tL}} (in the positive case);
  • Existence of a unitary Schrödinger propagator group {e^{itL}};
  • Existence of a unitary wave propagator group {e^{it\sqrt{L}}} (in the positive case);
  • Existence of a “reasonable” functional calculus.
  • Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator {L = -\Delta_g} on a smooth Riemannian manifold {(M,g)} (using {C^\infty_c(M)} as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains {(0,1)} and {(0,+\infty)} in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

Read the rest of this entry »

In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as {SL_d({\bf Z})} or {SL_d({\bf R})}, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.

The definition of quasirandomness is easy enough to state:

Definition 1 (Quasirandom groups) Let {G} be a finite group, and let {D \geq 1}. We say that {G} is {D}-quasirandom if all non-trivial unitary representations {\rho: G \rightarrow U(H)} of {G} have dimension at least {D}. (Recall a representation is trivial if {\rho(g)} is the identity for all {g \in G}.)

Exercise 1 Let {G} be a finite group, and let {D \geq 1}. A unitary representation {\rho: G \rightarrow U(H)} is said to be irreducible if {H} has no {G}-invariant subspaces other than {\{0\}} and {H}. Show that {G} is {D}-quasirandom if and only if every non-trivial irreducible representation of {G} has dimension at least {D}.

Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group {SL_2({\bf R})} to obtain mixing estimates in that group.)

Quasirandomness behaves fairly well with respect to quotients and short exact sequences:

Exercise 2 Let {0 \rightarrow H \rightarrow G \rightarrow K \rightarrow 0} be a short exact sequence of finite groups {H,G,K}.

  • (i) If {G} is {D}-quasirandom, show that {K} is {D}-quasirandom also. (Equivalently: any quotient of a {D}-quasirandom finite group is again a {D}-quasirandom finite group.)
  • (ii) Conversely, if {H} and {K} are both {D}-quasirandom, show that {G} is {D}-quasirandom also. (In particular, the direct or semidirect product of two {D}-quasirandom finite groups is again a {D}-quasirandom finite group.)

Informally, we will call {G} quasirandom if it is {D}-quasirandom for some “large” {D}, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “{D \geq |G|^c} for some constant {c>0} independent of the size of {G}“, but other regimes of {D} are certainly of interest.

The way we have set things up, the trivial group {G = \{1\}} is infinitely quasirandom (i.e. it is {D}-quasirandom for every {D}). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:

Exercise 3 Let {D \geq 1}, and let {G} be a finite {D}-quasirandom group.

  • (i) Show that if {G} is non-trivial, then {|G| \geq D+1}. (Hint: use the mean zero component {\tau\downharpoonright_{\ell^2(G)_0}} of the regular representation {\tau: G \rightarrow U(\ell^2(G))}.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
  • (ii) Show that any proper subgroup {H} of {G} has index {[G:H] \geq D+1}. (Hint: use the mean zero component of the quasiregular representation.)

The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:

Exercise 4 (Quasirandomness, abelianness, and perfection) Let {G} be a finite group.

  • (i) If {G} is abelian and non-trivial, show that {G} is not {2}-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
  • (ii) Show that {G} is {2}-quasirandom if and only if it is not perfect, i.e. the commutator group {[G,G]} is a proper subgroup of {G}. (Equivalently, {G} is {2}-quasirandom if and only if it has no non-trivial abelian quotients.)

Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.

Exercise 5 Let {G} be a finite {D}-quasirandom group. Show that for any subgroup {G'} of {G}, {G'} is {D/[G:G']}-quasirandom, where {[G:G'] := |G|/|G'|} is the index of {G'} in {G}. (Hint: use induced representations.)

Now we give an example of a more quasirandom group.

Lemma 2 (Frobenius lemma) If {F_p} is a field of some prime order {p}, then {SL_2(F_p)} is {\frac{p-1}{2}}-quasirandom.

This should be compared with the cardinality {|SL_2(F_p)|} of the special linear group, which is easily computed to be {(p^2-1) \times p = p^3 - p}.

Proof: We may of course take {p} to be odd. Suppose for contradiction that we have {\rho: SL_2(F_p) \rightarrow U_d({\bf C})} be a non-trivial representation on a unitary group of some dimension {d} with {d < \frac{p-1}{2}}. Set {a} to be the group element

\displaystyle  a := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},

and suppose first that {\rho(a)} is non-trivial. Since {a^p=1}, we have {\rho(a)^p=1}; thus all the eigenvalues of {\rho(a)} are {p^{th}} roots of unity. On the other hand, by conjugating {a} by diagonal matrices in {SL_2(F_p)}, we see that {a} is conjugate to {a^m} (and hence {\rho(a)} conjugate to {\rho(a)^m}) whenever {m} is a quadratic residue mod {p}. As such, the eigenvalues of {\rho(a)} must be permuted by the operation {x \mapsto x^m} for any quadratic residue mod {p}. Since {\rho(a)} has at least one non-trivial eigenvalue, and there are {\frac{p-1}{2}} distinct quadratic residues, we conclude that {\rho(a)} has at least {\frac{p-1}{2}} distinct eigenvalues. But {\rho(a)} is a {d \times d} matrix with {d < \frac{p-1}{2}}, a contradiction. Thus {a} lies in the kernel of {\rho}. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate {SL_2(F_p)} (see exercise below), and so {\rho} is trivial, a contradiction. \Box

Exercise 6 Show that for any prime {p}, the unipotent matrices

\displaystyle  \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}

for any {t \in F_p} generate {SL_2(F_p)} as a group.

Exercise 7 Let {G} be a finite group, and let {D \geq 1}. If {G} is generated by a collection {G_1,\ldots,G_k} of {D}-quasirandom subgroups, show that {G} is itself {D}-quasirandom.

Exercise 8 Show that {SL_d(F_p)} is {\frac{p-1}{2}}-quasirandom for any {d \geq 2} and any prime {p}. (This is not sharp; the optimal bound here is {\gg_d p^{d-1}}, which follows from the results of Landazuri and Seitz.)

As a corollary of the above results and Exercise 2, we see that the projective special linear group {PSL_d(F_p)} is also {\frac{p-1}{2}}-quasirandom.

Remark 2 One can ask whether the bound {\frac{p-1}{2}} in Lemma 2 is sharp, assuming of course that {p} is odd. Noting that {SL_2(F_p)} acts linearly on the plane {F_p^2}, we see that it also acts projectively on the projective line {PF_p^1 := (F_p^2 \backslash \{0\}) / F_p^\times}, which has {p+1} elements. Thus {SL_2(F_p)} acts via the quasiregular representation on the {p+1}-dimensional space {\ell^2(PF_p^1)}, and also on the {p}-dimensional subspace {\ell^2(PF_p^1)_0}; this latter representation (known as the Steinberg representation) is irreducible. This shows that the {\frac{p-1}{2}} bound cannot be improved beyond {p}. More generally, given any character {\chi: F_p^\times \rightarrow S^1}, {SL_2(F_p)} acts on the {p+1}-dimensional space {V_\chi} of functions {f \in \ell^2( F_p^2 \backslash \{0\} )} that obey the twisted dilation invariance {f(tx) = \chi(t) f(x)} for all {t \in F_p^\times} and {x \in F_p^2 \backslash \{0\}}; these are known as the principal series representations. When {\chi} is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when {\chi} is the quadratic representation (thus taking values in {\{-1,+1\}} while being non-trivial), the principal series representation splits into the direct sum of two {\frac{p+1}{2}}-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed {F_p} in a quadratic extension {F_{p^2}} and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension {\frac{p-1}{2}}, showing that the bound in Lemma 2 is in fact exactly sharp. These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Luzstig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).

Exercise 9 Let {p} be an odd prime. Show that for any {n \geq p+2}, the alternating group {A_n} is {p-1}-quasirandom. (Hint: show that all cycles of order {p} in {A_n} are conjugate to each other in {A_n} (and not just in {S_n}); in particular, a cycle is conjugate to its {j^{th}} power for all {j=1,\ldots,p-1}. Also, as {n \geq 5}, {A_n} is simple, and so the cycles of order {p} generate the entire group.)

Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that {A_n} is {n-1}-quasirandom for {n \geq 6} (but is only {3}-quasirandom for {n=5} due to isocahedral symmetry, and {1}-quasirandom for {n \leq 4} due to lack of perfectness). Using Exercise 3 with the index {n} subgroup {A_{n-1}}, we see that the bound {n-1} cannot be improved. Thus, {A_n} (for large {n}) is not as quasirandom as the special linear groups {SL_d(F_p)} (for {p} large and {d} bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.

If one replaces the alternating group {A_n} with the slightly larger symmetric group {S_n}, then quasirandomness is destroyed (since {S_n}, having the abelian quotient {S_n/A_n}, is not perfect); indeed, {S_n} is {1}-quasirandom and no better.

Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group {{\bf Z}/p{\bf Z}}, an alternating group {A_n}, or is a finite simple group of Lie type such as {PSL_d(F_p)}. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance {PSL_d(F_p)} is constructed from {SL_d} in characteristic {p}.) In the case of finite simple groups {G} of Lie type with bounded rank {r=O(1)}, it is known from the work of Landazuri and Seitz that such groups are {\gg |G|^c}-quasirandom for some {c>0} depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group {G} is {|G|^c}-quasirandom for some {c>0} and {|G|} is sufficiently large depending on {c}, then {G} is a finite simple group of Lie type with rank {O_c(1)}. It would be of interest to see if there was an alternate way to establish this fact that did not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).

A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in {SL_2({\bf Z}/N{\bf Z})} (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in {SL_d({\bf Z}/N{\bf Z})} for any fixed {d \geq 3}).

Read the rest of this entry »

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year.  This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices.  In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries.  We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

RSS Buzz feed

Follow

Get every new post delivered to your Inbox.

Join 942 other followers