You are currently browsing the monthly archive for December 2010.

One of my favourite unsolved problems in harmonic analysis is the restriction problem. This problem, first posed explicitly by Elias Stein, can take many equivalent forms, but one of them is this: one starts with a smooth compact hypersurface ${S}$ (possibly with boundary) in ${{\bf R}^d}$, such as the unit sphere ${S = S^2}$ in ${{\bf R}^3}$, and equips it with surface measure ${d\sigma}$. One then takes a bounded measurable function ${f \in L^\infty(S,d\sigma)}$ on this surface, and then computes the (inverse) Fourier transform

$\displaystyle \widehat{fd\sigma}(x) = \int_S e^{2\pi i x \cdot \omega} f(\omega) d\sigma(\omega)$

of the measure ${fd\sigma}$. As ${f}$ is bounded and ${d\sigma}$ is a finite measure, this is a bounded function on ${{\bf R}^d}$; from the dominated convergence theorem, it is also continuous. The restriction problem asks whether this Fourier transform also decays in space, and specifically whether ${\widehat{fd\sigma}}$ lies in ${L^q({\bf R}^d)}$ for some ${q < \infty}$. (This is a natural space to control decay because it is translation invariant, which is compatible on the frequency space side with the modulation invariance of ${L^\infty(S,d\sigma)}$.) By the closed graph theorem, this is the case if and only if there is an estimate of the form

$\displaystyle \| \widehat{f d\sigma} \|_{L^q({\bf R}^d)} \leq C_{q,d,S} \|f\|_{L^\infty(S,d\sigma)} \ \ \ \ \ (1)$

for some constant ${C_{q,d,S}}$ that can depend on ${q,d,S}$ but not on ${f}$. By a limiting argument, to provide such an estimate, it suffices to prove such an estimate under the additional assumption that ${f}$ is smooth.

Strictly speaking, the above problem should be called the extension problem, but it is dual to the original formulation of the restriction problem, which asks to find those exponents ${1 \leq q' \leq \infty}$ for which the Fourier transform of an ${L^{q'}({\bf R}^d)}$ function ${g}$ can be meaningfully restricted to a hypersurface ${S}$, in the sense that the map ${g \mapsto \hat g|_{S}}$ can be continuously defined from ${L^{q'}({\bf R}^d)}$ to, say, ${L^1(S,d\sigma)}$. A duality argument shows that the exponents ${q'}$ for which the restriction property holds are the dual exponents to the exponents ${q}$ for which the extension problem holds.

There are several motivations for studying the restriction problem. The problem is connected to the classical question of determining the nature of the convergence of various Fourier summation methods (and specifically, Bochner-Riesz summation); very roughly speaking, if one wishes to perform a partial Fourier transform by restricting the frequencies (possibly using a well-chosen weight) to some region ${B}$ (such as a ball), then one expects this operation to be well behaved if the boundary ${\partial B}$ of this region has good restriction (or extension) properties. More generally, the restriction problem for a surface ${S}$ is connected to the behaviour of Fourier multipliers whose symbols are singular at ${S}$. The problem is also connected to the analysis of various linear PDE such as the Helmholtz equation, Schrödinger equation, wave equation, and the (linearised) Korteweg-de Vries equation, because solutions to such equations can be expressed via the Fourier transform in the form ${fd\sigma}$ for various surfaces ${S}$ (the sphere, paraboloid, light cone, and cubic for the Helmholtz, Schrödinger, wave, and linearised Korteweg-de Vries equation respectively). A particular family of restriction-type theorems for such surfaces, known as Strichartz estimates, plays a foundational role in the nonlinear perturbations of these linear equations (e.g. the nonlinear Schrödinger equation, the nonlinear wave equation, and the Korteweg-de Vries equation). Last, but not least, there is a fundamental connection between the restriction problem and the Kakeya problem, which roughly speaking concerns how tubes that point in different directions can overlap.
Indeed, by superimposing special functions of the type ${\widehat{fd\sigma}}$, known as wave packets, and which are concentrated on tubes in various directions, one can “encode” the Kakeya problem inside the restriction problem; in particular, the conjectured solution to the restriction problem implies the conjectured solution to the Kakeya problem. Finally, the restriction problem serves as a simplified toy model for studying discrete exponential sums whose coefficients do not have a well controlled phase; this perspective was, for instance, used by Ben Green when he established Roth’s theorem in the primes by Fourier-analytic methods, which was in turn one of the main inspirations for our later work establishing arbitrarily long progressions in the primes, although we ended up using ergodic-theoretic arguments instead of Fourier-analytic ones and so did not directly use restriction theory in that paper.

The estimate (1) is trivial for ${q=\infty}$ and becomes harder for smaller ${q}$. The geometry, and more precisely the curvature, of the surface ${S}$ plays a key role: if ${S}$ contains a portion which is completely flat, then it is not difficult to concoct an ${f}$ for which ${\widehat{f d\sigma}}$ fails to decay in the normal direction to this flat portion, and so there are no restriction estimates for any finite ${q}$. Conversely, if ${S}$ is not infinitely flat at any point, then from the method of stationary phase, the Fourier transform ${\widehat{d\sigma}}$ can be shown to decay at a power rate at infinity, and this together with a standard method known as the ${TT^*}$ argument can be used to give non-trivial restriction estimates for finite ${q}$. However, these arguments fall somewhat short of obtaining the best possible exponents ${q}$. For instance, in the case of the sphere ${S = S^{d-1} \subset {\bf R}^d}$, the Fourier transform ${\widehat{d\sigma}(x)}$ is known to decay at the rate ${O(|x|^{-(d-1)/2})}$ and no better as ${|x| \rightarrow \infty}$, which shows that the condition ${q > \frac{2d}{d-1}}$ is necessary in order for (1) to hold for this surface. The restriction conjecture for ${S^{d-1}}$ asserts that this necessary condition is also sufficient. However, the ${TT^*}$-based argument gives only the Tomas-Stein theorem, which in this context gives (1) in the weaker range ${q \geq \frac{2(d+1)}{d-1}}$. (On the other hand, by the nature of the ${TT^*}$ method, the Tomas-Stein theorem does allow the ${L^\infty(S,d\sigma)}$ norm on the right-hand side to be relaxed to ${L^2(S,d\sigma)}$, at which point the Tomas-Stein exponent ${\frac{2(d+1)}{d-1}}$ becomes best possible. The fact that the Tomas-Stein theorem has an ${L^2}$ norm on the right-hand side is particularly valuable for applications to PDE, leading in particular to the Strichartz estimates mentioned earlier.)
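For the sphere ${S^2 \subset {\bf R}^3}$ the decay of ${\widehat{d\sigma}}$ can be seen explicitly. Integrating in the polar angle gives the classical formula ${\widehat{d\sigma}(x) = 2\sin(2\pi|x|)/|x|}$, which is ${O(|x|^{-1}) = O(|x|^{-(d-1)/2})}$ and no better. The following Python sketch evaluates the defining integral with ${f \equiv 1}$ by quadrature and compares the result with this formula. The function names are hypothetical, and the sketch is only a numerical illustration.

```python
import numpy as np

def sphere_extension(x, n_u=200, n_phi=400):
    """Quadrature approximation of the Fourier extension of d sigma on S^2 at x in R^3.

    Parametrise S^2 by u = cos(theta) in [-1, 1] and phi in [0, 2 pi), so d sigma = du dphi.
    Gauss-Legendre nodes are used in u, and the uniform (spectrally accurate) rule in periodic phi.
    """
    x = np.asarray(x, dtype=float)
    u, w = np.polynomial.legendre.leggauss(n_u)          # nodes and weights on [-1, 1]
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    s = np.sqrt(1.0 - u**2)
    omega = np.stack([s[:, None] * np.cos(phi)[None, :],
                      s[:, None] * np.sin(phi)[None, :],
                      np.broadcast_to(u[:, None], (n_u, n_phi))], axis=-1)  # points on S^2
    integrand = np.exp(2j * np.pi * (omega @ x))         # e^{2 pi i x . omega}
    return np.sum(w[:, None] * integrand) * (2.0 * np.pi / n_phi)

def sphere_extension_exact(r):
    """Closed form 2 sin(2 pi r) / r for |x| = r > 0 (the limit at r = 0 is 4 pi)."""
    return 2.0 * np.sin(2.0 * np.pi * r) / r

for r in (0.5, 2.25, 10.7):
    x = r * np.array([1.0, 2.0, 2.0]) / 3.0              # a point with |x| = r
    approx = sphere_extension(x)
    # columns: r, quadrature value, closed form, and r * |value| (which stays <= 2)
    print(r, approx.real, sphere_extension_exact(r), r * abs(approx))
```

The last column illustrates the ${|x|^{-1}}$ decay: ${|x| \cdot |\widehat{d\sigma}(x)|}$ stays bounded but does not tend to zero.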

Over the last two decades, there was a fair amount of work in pushing past the Tomas-Stein barrier. For sake of concreteness let us work just with the restriction problem for the unit sphere ${S^2}$ in ${{\bf R}^3}$. Here, the restriction conjecture asserts that (1) holds for all ${q > 3}$, while the Tomas-Stein theorem gives only ${q \geq 4}$. By combining a multiscale analysis approach with some new progress on the Kakeya conjecture, Bourgain was able to obtain the first improvement on this range, establishing the restriction conjecture for ${q > 4 - \frac{2}{15}}$. The methods were steadily refined over the years; until recently, the best result (due to myself) was that the conjecture held for all ${q > 3 \frac{1}{3}}$, which proceeded by analysing a “bilinear ${L^2}$” variant of the problem studied previously by Bourgain and by Wolff. This is essentially the limit of that method; the relevant bilinear ${L^2}$ estimate fails for ${q < 3 + \frac{1}{3}}$. (This estimate was recently established at the endpoint ${q=3+\frac{1}{3}}$ by Jungjin Lee (personal communication), though this does not quite improve the range of exponents in (1) due to a logarithmic inefficiency in converting the bilinear estimate to a linear one.)

On the other hand, the full range ${q>3}$ of exponents in (1) was obtained by Bennett, Carbery, and myself (with an alternate proof later given by Guth), but only under the additional assumption of non-coplanar interactions. In three dimensions, this assumption was enforced by replacing (1) with the weaker trilinear (and localised) variant

$\displaystyle \| \widehat{f_1 d\sigma_1} \widehat{f_2 d\sigma_2} \widehat{f_3 d\sigma_3} \|_{L^{q/3}(B(0,R))} \leq C_{q,d,S_1,S_2,S_3,\epsilon} R^\epsilon \|f_1\|_{L^\infty(S_1,d\sigma_1)} \|f_2\|_{L^\infty(S_2,d\sigma_2)} \|f_3\|_{L^\infty(S_3,d\sigma_3)} \ \ \ \ \ (2)$

where ${\epsilon>0}$ and ${R \geq 1}$ are arbitrary, ${B(0,R)}$ is the ball of radius ${R}$ in ${{\bf R}^3}$, and ${S_1,S_2,S_3}$ are compact portions of ${S}$ whose unit normals ${n_1,n_2,n_3}$ are never coplanar, thus there is a uniform lower bound

$\displaystyle |n_1(\omega_1) \wedge n_2(\omega_2) \wedge n_3(\omega_3)| \geq c$

for some ${c>0}$ and all ${\omega_1 \in S_1, \omega_2 \in S_2, \omega_3 \in S_3}$. If it were not for this non-coplanarity restriction, (2) would be equivalent to (1) (by setting ${S_1=S_2=S_3}$ and ${f_1=f_2=f_3}$, with the converse implication coming from Hölder’s inequality; the ${R^\epsilon}$ loss can be removed by a lemma from a paper of mine). At the time we wrote this paper, we tried fairly hard to remove this non-coplanarity restriction in order to make progress on the original restriction conjecture, but without much success.
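For the record, both implications are short computations with Hölder’s inequality. They are included here only for the reader's convenience, with ${F_i := \widehat{f_i d\sigma_i}}$ and each ${f_i}$ extended by zero from ${S_i}$ to ${S}$. Since ${\frac{3}{q} = \frac{1}{q}+\frac{1}{q}+\frac{1}{q}}$, the estimate (1) gives

$\displaystyle \| F_1 F_2 F_3 \|_{L^{q/3}(B(0,R))} \leq \prod_{i=1}^3 \| F_i \|_{L^q({\bf R}^3)} \leq C_{q,3,S}^3 \prod_{i=1}^3 \|f_i\|_{L^\infty(S_i,d\sigma_i)}.$

In the other direction, taking ${S_1=S_2=S_3=S}$ and ${f_1=f_2=f_3=f}$ in (2) gives

$\displaystyle \| \widehat{f d\sigma} \|_{L^q(B(0,R))} = \| (\widehat{f d\sigma})^3 \|_{L^{q/3}(B(0,R))}^{1/3} \leq (C_{q,3,S,S,S,\epsilon} R^\epsilon)^{1/3} \|f\|_{L^\infty(S,d\sigma)}.$

This is (1) on the ball ${B(0,R)}$, up to an ${R^{\epsilon/3}}$ loss.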

A few weeks ago, though, Bourgain and Guth found a new way to use multiscale analysis to “interpolate” between the result of Bennett, Carbery and myself (which has optimal exponents, but requires non-coplanar interactions) and a more classical square function estimate of Córdoba that handles the coplanar case. A direct application of this interpolation method already ties with the previous best known result in three dimensions (i.e. that (1) holds for ${q > 3 \frac{1}{3}}$). But it also allows for the insertion of additional input, such as the best Kakeya estimate currently known in three dimensions, due to Wolff. This enlarges the range slightly to ${q > 3.3}$. The method also extends to variable-coefficient settings, and in some of these cases (where there is so much “compression” going on that no additional Kakeya estimates are available) the estimates are best possible.
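The exponents quoted above for ${S^2 \subset {\bf R}^3}$ can be collected with exact rational arithmetic, which makes the small gaps between successive results easier to see. This Python sketch is only bookkeeping of the thresholds stated in the text, and its function names are hypothetical. Note that Tomas-Stein gives the closed range ${q \geq 4}$, whereas the other entries are open conditions of the form ${q > \cdot}$.

```python
from fractions import Fraction

def conjectured_threshold(d):
    """Restriction conjecture for S^{d-1} in R^d: (1) should hold for q > 2d/(d-1)."""
    return Fraction(2 * d, d - 1)

def tomas_stein_exponent(d):
    """Tomas-Stein theorem: (1) holds for q >= 2(d+1)/(d-1)."""
    return Fraction(2 * (d + 1), d - 1)

# Thresholds for S^2 in R^3 quoted in the text, in chronological order.
milestones_d3 = [
    ("Tomas-Stein", tomas_stein_exponent(3)),          # q >= 4 (closed range)
    ("Bourgain", 4 - Fraction(2, 15)),                 # q > 58/15
    ("bilinear L^2 method", 3 + Fraction(1, 3)),       # q > 10/3
    ("Bourgain-Guth with Wolff", Fraction(33, 10)),    # q > 3.3
    ("conjecture", conjectured_threshold(3)),          # q > 3
]

for name, q in milestones_d3:
    print(f"{name:28s} {str(q):>6s} = {float(q):.4f}")
```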

As is often the case in this field, there is a lot of technical book-keeping and juggling of parameters in the formal arguments of Bourgain and Guth, but the main ideas and “numerology” can be expressed fairly readily. (In mathematics, numerology refers to the empirically observed relationships between various key exponents and other numerical parameters; in many cases, one can use shortcuts such as dimensional analysis or informal heuristics to compute these exponents long before the formal argument is completely in place.) Below the fold, I would like to record this numerology for the simplest of the Bourgain-Guth arguments, namely a reproof of (1) for ${q > 3 \frac{1}{3}}$. This is primarily for my own benefit, but may be of interest to other experts in this particular topic. (See also my 2003 lecture notes on the restriction conjecture.)

In order to focus on the ideas in the paper (rather than on the technical details), I will adopt an informal, heuristic approach, for instance by interpreting the uncertainty principle and the pigeonhole principle rather liberally, and by focusing on main terms in a decomposition and ignoring secondary terms. I will also be somewhat vague with regard to asymptotic notation such as ${\ll}$. Making the arguments rigorous requires a certain amount of standard but tedious effort (and is one of the main reasons why the Bourgain-Guth paper is as long as it is), which I will not focus on here.

I’ve just uploaded to the arXiv my paper “Outliers in the spectrum of iid matrices with bounded rank perturbations”, submitted to Probability Theory and Related Fields. This paper is concerned with outliers to the circular law for iid random matrices. Recall that if ${X_n}$ is an ${n \times n}$ matrix whose entries are iid complex random variables with mean zero and variance one, then the ${n}$ complex eigenvalues of the normalised matrix ${\frac{1}{\sqrt{n}} X_n}$ will almost surely be distributed according to the circular law distribution ${\frac{1}{\pi} 1_{|z| \leq 1} d^2 z}$ in the limit ${n \rightarrow \infty}$. (See these lecture notes for further discussion of this law.)

The circular law is also stable under bounded rank perturbations: if ${C_n}$ is a deterministic rank ${O(1)}$ matrix of polynomial size (i.e. of operator norm ${O(n^{O(1)})}$), then the circular law also holds for ${\frac{1}{\sqrt{n}} X_n + C_n}$ (this is proven in a paper of myself, Van Vu, and Manjunath Krishnapur). In particular, the bulk of the eigenvalues (i.e. ${(1-o(1)) n}$ of the ${n}$ eigenvalues) will lie inside the unit disk ${\{ z \in {\bf C}: |z| \leq 1 \}}$.

However, this leaves open the possibility of one or more outlier eigenvalues that lie significantly outside the unit disk. The arguments in the paper cited above give some upper bound on the number of such eigenvalues (of the form ${O(n^{1-c})}$ for some absolute constant ${c>0}$) but do not exclude them entirely. And indeed, numerical data shows that such outliers can exist for certain bounded rank perturbations.

In this paper, some results are given as to when outliers exist, and how they are distributed. The easiest case is of course when there is no bounded rank perturbation: ${C_n=0}$. In that case, an old result of Bai and Yin and of Geman shows that the spectral radius of ${\frac{1}{\sqrt{n}} X_n}$ is almost surely ${1+o(1)}$, thus all eigenvalues will be contained in a ${o(1)}$ neighbourhood of the unit disk, and so there are no significant outliers. The proof is based on the moment method.
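Here is a quick numerical illustration of the circular law, and of the absence of outliers when ${C_n=0}$. Purely for concreteness, the following Python sketch assumes complex Gaussian entries (the statements above hold for general iid entries with mean zero and variance one), along with an arbitrary size ${n=500}$ and random seed. At finite ${n}$ one should only expect approximate agreement.

```python
import numpy as np

def complex_iid_matrix(n, rng):
    """n x n matrix of iid complex Gaussians with mean zero and variance one."""
    return (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)

rng = np.random.default_rng(0)
n = 500
eigs = np.linalg.eigvals(complex_iid_matrix(n, rng) / np.sqrt(n))

# Circular law: the proportion of eigenvalues with |z| <= r should be close to r^2.
for r in (0.25, 0.5, 0.75):
    print(r, np.mean(np.abs(eigs) <= r), r**2)

# No perturbation: the spectral radius should be 1 + o(1), so no significant outliers.
print("spectral radius:", np.abs(eigs).max())
```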

Now we consider a bounded rank perturbation ${C_n}$ which is nonzero, but which has a bounded operator norm: ${\|C_n\|_{op} = O(1)}$. In this case, it turns out that the matrix ${\frac{1}{\sqrt{n}} X_n + C_n}$ will have outliers if the deterministic component ${C_n}$ has outliers. More specifically (and under the technical hypothesis that the entries of ${X_n}$ have bounded fourth moment), if ${\lambda}$ is an eigenvalue of ${C_n}$ with ${|\lambda| > 1}$, then (for ${n}$ large enough), ${\frac{1}{\sqrt{n}} X_n + C_n}$ will almost surely have an eigenvalue at ${\lambda+o(1)}$, and furthermore these will be the only outlier eigenvalues of ${\frac{1}{\sqrt{n}} X_n + C_n}$.

Thus, for instance, adding a bounded nilpotent low rank matrix to ${\frac{1}{\sqrt{n}} X_n}$ will not create any outliers, because the nilpotent matrix only has eigenvalues at zero. On the other hand, adding a bounded Hermitian low rank matrix will create outliers as soon as this matrix has an operator norm greater than ${1}$.
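The contrast between nilpotent and Hermitian perturbations is easy to see numerically. This sketch makes several arbitrary choices for illustration: real Gaussian entries, ${n=400}$, the perturbations ${3 e_1 e_1^*}$ and ${3 e_1 e_2^*}$, and the cut-off radius ${1.5}$. One expects the Hermitian rank one perturbation with eigenvalue ${3}$ to produce a single outlier near ${3}$. The nilpotent one has the same operator norm, but one expects it to produce none.

```python
import numpy as np

def outliers(M, radius=1.5):
    """Eigenvalues of M lying outside the disk {|z| <= radius}."""
    z = np.linalg.eigvals(M)
    return z[np.abs(z) > radius]

rng = np.random.default_rng(1)
n = 400
Y = rng.standard_normal((n, n)) / np.sqrt(n)   # (1/sqrt(n)) X_n, real Gaussian entries

C_herm = np.zeros((n, n)); C_herm[0, 0] = 3.0  # Hermitian, rank one, eigenvalue 3, norm 3
C_nil = np.zeros((n, n)); C_nil[0, 1] = 3.0    # nilpotent, rank one, only eigenvalue 0, norm 3

print(outliers(Y + C_herm))   # expected: a single eigenvalue near 3
print(outliers(Y + C_nil))    # expected: empty
```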

When I first thought about this problem (which was communicated to me by Larry Abbott), I believed that it was quite difficult, because I knew that the eigenvalues of non-Hermitian matrices were quite unstable with respect to general perturbations (as discussed in this previous blog post), and that there were no interlacing inequalities in this case to control bounded rank perturbations (as discussed in this post). However, as it turns out I had arrived at the wrong conclusion, especially in the exterior of the unit disk in which the resolvent is actually well controlled and so there is no pseudospectrum present to cause instability. This was pointed out to me by Alice Guionnet at an AIM workshop last week, after I had posed the above question during an open problems session. Furthermore, at the same workshop, Percy Deift emphasised the point that the basic determinantal identity

$\displaystyle \det(1 + AB) = \det(1 + BA) \ \ \ \ \ (1)$

for ${n \times k}$ matrices ${A}$ and ${k \times n}$ matrices ${B}$ was a particularly useful identity in random matrix theory, as it converted problems about large (${n \times n}$) matrices into problems about small (${k \times k}$) matrices, which was particularly convenient in the regime when ${n \rightarrow \infty}$ and ${k}$ was fixed. (Percy was speaking in the context of invariant ensembles, but the point is in fact more general than this.)
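Here is a numerical sanity check of (1) with random rectangular factors; the sizes and seed are chosen arbitrarily. The left-hand side is an ${n \times n}$ determinant, while the right-hand side is only a ${k \times k}$ one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
A = rng.standard_normal((n, k))            # n x k
B = rng.standard_normal((k, n))            # k x n

lhs = np.linalg.det(np.eye(n) + A @ B)     # an n x n determinant
rhs = np.linalg.det(np.eye(k) + B @ A)     # only a k x k determinant
print(lhs, rhs)                            # agree up to rounding error
```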

From this, it turned out to be a relatively simple matter to transform what appeared to be an intractable ${n \times n}$ matrix problem into quite a well-behaved ${k \times k}$ matrix problem for bounded ${k}$. Specifically, suppose that ${C_n}$ had rank ${k}$, so that one can factor ${C_n = A_n B_n}$ for some (deterministic) ${n \times k}$ matrix ${A_n}$ and ${k \times n}$ matrix ${B_n}$. To find an eigenvalue ${z}$ of ${\frac{1}{\sqrt{n}} X_n + C_n}$, one has to solve the characteristic polynomial equation

$\displaystyle \det( \frac{1}{\sqrt{n}} X_n + A_n B_n - z ) = 0.$

This is an ${n \times n}$ determinantal equation, which looks difficult to control analytically. But we can manipulate it using (1). If we make the assumption that ${z}$ is outside the spectrum of ${\frac{1}{\sqrt{n}} X_n}$ (which we can do as long as ${z}$ is well away from the unit disk, as the unperturbed matrix ${\frac{1}{\sqrt{n}} X_n}$ has no outliers), we can divide by ${\frac{1}{\sqrt{n}} X_n - z}$ to arrive at

$\displaystyle \det( 1 + (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n B_n ) = 0.$

Now we apply the crucial identity (1) to rearrange this as

$\displaystyle \det( 1 + B_n (\frac{1}{\sqrt{n}} X_n-z)^{-1} A_n ) = 0.$

The crucial point is that this is now an equation involving only a ${k \times k}$ determinant, rather than an ${n \times n}$ one, and is thus much easier to solve. The situation is particularly simple for rank one perturbations

$\displaystyle \frac{1}{\sqrt{n}} X_n + u_n v_n^*$

in which case the eigenvalue equation is now just a scalar equation

$\displaystyle 1 + \langle (\frac{1}{\sqrt{n}} X_n-z)^{-1} u_n, v_n \rangle = 0$

that involves what is basically a single coefficient of the resolvent ${(\frac{1}{\sqrt{n}} X_n-z)^{-1}}$. (It is also an instructive exercise to derive this eigenvalue equation directly, rather than through (1).) There is by now a very well-developed theory for how to control such coefficients (particularly for ${z}$ in the exterior of the unit disk, in which case such basic tools as Neumann series work just fine); in particular, one has precise enough control on these coefficients to obtain the result on outliers mentioned above.
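The reduction to a ${k \times k}$ determinant can be checked directly. The following Python sketch computes an outlier eigenvalue ${z}$ of ${\frac{1}{\sqrt{n}} X_n + A_n B_n}$ by brute force. It then confirms that the small determinant ${\det(1 + B_n (\frac{1}{\sqrt{n}} X_n - z)^{-1} A_n)}$ vanishes there up to rounding error, and likewise, in the rank one case, the scalar ${1 + \langle (\frac{1}{\sqrt{n}} X_n-z)^{-1} u_n, v_n \rangle}$. The Gaussian entries, the diagonal perturbations, and the helper names small_determinant and largest_outlier are illustrative assumptions. The resolvent is applied via a linear solve rather than an explicit inverse.

```python
import numpy as np

def small_determinant(Y, A, B, z):
    """det(1_k + B (Y - z)^{-1} A): the k x k determinant obtained from identity (1)."""
    n, k = A.shape
    return np.linalg.det(np.eye(k) + B @ np.linalg.solve(Y - z * np.eye(n), A))

def largest_outlier(M):
    """The eigenvalue of M of largest modulus."""
    z = np.linalg.eigvals(M)
    return z[np.argmax(np.abs(z))]

rng = np.random.default_rng(3)
n = 300
Y = rng.standard_normal((n, n)) / np.sqrt(n)           # (1/sqrt(n)) X_n

# Rank two: C = A B = diag(3, -2.5, 0, ..., 0), with A of size n x 2 and B of size 2 x n.
A = np.zeros((n, 2)); A[0, 0], A[1, 1] = 3.0, -2.5
B = np.zeros((2, n)); B[0, 0], B[1, 1] = 1.0, 1.0
z2 = largest_outlier(Y + A @ B)                        # expected near 3
print(z2, abs(small_determinant(Y, A, B, z2)))

# Rank one: u v^* with u = 3 e_1, v = e_1; the 1 x 1 determinant is 1 + <(Y - z)^{-1} u, v>.
u, v = A[:, 0], B[0, :]
z1 = largest_outlier(Y + np.outer(u, v))
scalar = 1 + v @ np.linalg.solve(Y - z1 * np.eye(n), u)
print(z1, abs(scalar))
```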

The same method can handle some other bounded rank perturbations. One basic example comes from looking at iid matrices with a non-zero mean ${\mu}$ and variance ${1}$; this can be modeled by ${\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \phi_n^*}$ where ${\phi_n}$ is the unit vector ${\phi_n := \frac{1}{\sqrt{n}} (1,\ldots,1)^*}$. Here, the bounded rank perturbation ${\mu \sqrt{n} \phi_n \phi_n^*}$ has a large operator norm (equal to ${|\mu| \sqrt{n}}$), so the previous result does not directly apply. Nevertheless, the self-adjoint nature of the perturbation has a stabilising effect, and I was able to show that there is still only one outlier, and that it is at the expected location of ${\mu \sqrt{n}+o(1)}$.
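Here is a numerical sketch of the non-zero mean model. Real Gaussian ${X_n}$, ${\mu=1}$ and ${n=400}$ are all arbitrary choices. The rank one term ${\mu \sqrt{n} \phi_n \phi_n^*}$ equals ${\frac{\mu}{\sqrt{n}} J}$, where ${J}$ is the all-ones matrix, so the matrix is just ${\frac{1}{\sqrt{n}}(X_n + \mu J)}$. One expects a single eigenvalue near ${\mu \sqrt{n} = 20}$, with the rest near the unit disk.

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu = 400, 1.0
phi = np.ones(n) / np.sqrt(n)                        # unit vector (1, ..., 1)/sqrt(n)
Y = rng.standard_normal((n, n)) / np.sqrt(n)
perturbation = mu * np.sqrt(n) * np.outer(phi, phi)  # equals (mu / sqrt(n)) * all-ones matrix
M = Y + perturbation

z = np.linalg.eigvals(M)
order = np.argsort(-np.abs(z))
print(z[order[0]], mu * np.sqrt(n))                  # expected: one outlier close to 20
print(abs(z[order[1]]))                              # expected: back near the unit disk
```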

If one moves away from the case of self-adjoint perturbations, though, the situation changes. Let us now consider a matrix of the form ${\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*}$, where ${\psi_n}$ is a randomised version of ${\phi_n}$, e.g. ${\psi_n := \frac{1}{\sqrt{n}} (\pm 1, \ldots, \pm 1)^*}$, where the ${\pm 1}$ are iid Bernoulli signs; such models were proposed recently by Rajan and Abbott as a model for neural networks in which some nodes are excitatory (and give columns with positive mean) and some are inhibitory (leading to columns with negative mean). Despite the superficial similarity with the previous example, the outlier behaviour is now quite different. Instead of having one extremely large outlier (of size ${\sim\sqrt{n}}$) at an essentially deterministic location, we now have a number of eigenvalues of size ${O(1)}$, scattered according to a random process. Indeed, (in the case when the entries of ${X_n}$ were real and bounded) I was able to show that the outlier point process converged (in the sense of converging ${k}$-point correlation functions) to the zeroes of a random Laurent series

$\displaystyle g(z) = 1 - \mu \sum_{j=0}^\infty \frac{g_j}{z^{j+1}}$

where ${g_0,g_1,g_2,\ldots \equiv N(0,1)}$ are iid real Gaussians. This is basically because the coefficients of the resolvent ${(\frac{1}{\sqrt{n}} X_n - zI)^{-1}}$ have a Neumann series whose coefficients enjoy a central limit theorem.
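One way to visualise the limiting point process is to sample the zeros of ${g}$ directly. The following Python sketch is an illustration, not taken from the paper. It truncates the series after ${J}$ terms, so that ${z^J g(z)}$ becomes a polynomial whose roots NumPy can compute. It then keeps only roots with ${|z|}$ above a cut-off of ${1.2}$. The Gaussian series converges only for ${|z|>1}$, so roots of the truncation near the unit circle are artefacts of the truncation rather than zeros of ${g}$. The truncation level, the cut-off, ${\mu=2}$ and the helper names are arbitrary choices.

```python
import numpy as np

def laurent_outlier_sample(mu, rng, n_terms=200, min_radius=1.2):
    """Sample approximate zeros of g(z) = 1 - mu * sum_j g_j / z^(j+1) with |z| > min_radius.

    The series is truncated after J = n_terms terms; multiplying by z^J gives the
    polynomial z^J - mu * sum_{j<J} g_j z^(J-1-j), whose roots np.roots computes.
    """
    g = rng.standard_normal(n_terms)                  # iid real N(0,1) coefficients g_0, g_1, ...
    coeffs = np.concatenate(([1.0], -mu * g))         # highest degree first, as np.roots expects
    roots = np.roots(coeffs)
    return roots[np.abs(roots) > min_radius], g

def truncated_g(z, mu, g):
    """Evaluate the truncated series 1 - mu * sum_{j<J} g_j / z^(j+1) at a point z."""
    j = np.arange(len(g))
    return 1.0 - mu * np.sum(g / z ** (j + 1))

rng = np.random.default_rng(8)
mu = 2.0
roots, g = laurent_outlier_sample(mu, rng)
print(np.sort_complex(roots))                         # one sample of the limiting outlier process
```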

On the other hand, as already observed numerically (and rigorously, in the gaussian case) by Rajan and Abbott, if one projects such matrices to have row sum zero, then the outliers all disappear. This can be explained by another appeal to (1); this projection amounts to right-multiplying ${\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*}$ by the projection matrix ${P}$ to the zero-sum vectors. But by (1), the non-zero eigenvalues of the resulting matrix ${(\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)P}$ are the same as those for ${P (\frac{1}{\sqrt{n}} X_n + \mu \sqrt{n} \phi_n \psi_n^*)}$. Since ${P}$ annihilates ${\phi_n}$, we thus see that in this case the bounded rank perturbation plays no role, and the question reduces to obtaining a circular law with no outliers for ${P \frac{1}{\sqrt{n}} X_n}$. As it turns out, this can be done by invoking the machinery of Van Vu and myself that we used to prove the circular law for various random matrix models.
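The projection argument can also be checked numerically. In this sketch, Gaussian ${X_n}$, random signs for ${\psi_n}$, ${\mu=1}$ and ${n=400}$ are all chosen arbitrarily, and ${P := 1 - \phi_n \phi_n^*}$ is the orthogonal projection onto zero-sum vectors. In accordance with (1), the eigenvalues of ${MP}$ and ${PM}$ coincide. Since ${PM = P \frac{1}{\sqrt{n}} X_n}$, one expects no eigenvalues far outside the unit disk.

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu = 400, 1.0
phi = np.ones(n) / np.sqrt(n)
psi = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)   # randomised version of phi (iid signs)
Y = rng.standard_normal((n, n)) / np.sqrt(n)          # (1/sqrt(n)) X_n with real Gaussian entries
M = Y + mu * np.sqrt(n) * np.outer(phi, psi)
P = np.eye(n) - np.outer(phi, phi)                   # orthogonal projection onto zero-sum vectors

z_MP = np.linalg.eigvals(M @ P)                      # M P has zero row sums
z_PM = np.linalg.eigvals(P @ M)                      # P M = P Y, since P phi = 0
print(np.abs(z_MP).max(), np.abs(z_PM).max())        # expected: both close to 1
```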

The first volume of my 2009 blog book, “An epsilon of room”, has now been published by the AMS, as part of the Graduate Studies in Mathematics series. (So I finally have a book whose cover is at least partially in yellow, which for some reason seems to be the traditional colour for mathematics texts.) This volume contains the material from my 245B and 245C classes, and can thus be viewed as a second text in graduate real analysis. (I plan to devote one volume of the 2010 blog book to the material for the 245A class I just taught, which would thus serve as a first text in graduate real analysis to complement this volume.)

The second volume, which covers a wide range of other topics, should also be published in the near future.

This week I am at the American Institute of Mathematics, as an organiser of a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when ${A, B}$ are invertible square matrices of the same size, ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.
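In the invertible case the conjugation is explicit, and the padding step can be written out as a routine verification. If ${A}$ is invertible then

$\displaystyle 1 + BA = A^{-1} (1 + AB) A,$

so that ${\hbox{det}(1+BA) = \hbox{det}(1+AB)}$. If instead ${A}$ is ${n \times k}$ and ${B}$ is ${k \times n}$ with ${k < n}$, one pads to the ${n \times n}$ matrices ${\tilde A := (A \ \ 0)}$ and ${\tilde B := \begin{pmatrix} B \\ 0 \end{pmatrix}}$, for which

$\displaystyle \tilde A \tilde B = AB, \quad \tilde B \tilde A = \begin{pmatrix} BA & 0 \\ 0 & 0 \end{pmatrix}.$

Applying the square case to ${\tilde A, \tilde B}$ then gives ${\hbox{det}(1+AB) = \hbox{det}(1+\tilde A \tilde B) = \hbox{det}(1+\tilde B \tilde A) = \hbox{det}(1+BA)}$, where the last determinant is ${k \times k}$.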

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
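Both consequences of the rescaled identity can be checked numerically on small random matrices; the sizes and seed are arbitrary. Note that NumPy’s np.poly returns the coefficients of ${\hbox{det}(z - M)}$ rather than ${\hbox{det}(z+M)}$. Replacing ${z}$ by ${-z}$ shows that the identity is equivalent to ${\hbox{det}(z - AB) = z^{n-k} \hbox{det}(z - BA)}$. The Cauchy-Binet check sums over all ${k}$-element subsets ${S}$ of ${\{1,\ldots,n\}}$, pairing the columns of ${B}$ indexed by ${S}$ with the rows of ${A}$ indexed by ${S}$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(6)
n, k = 6, 3
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, n))

# np.poly(M) returns the coefficients of det(z - M), highest degree first.
p_AB = np.poly(A @ B)     # degree n
p_BA = np.poly(B @ A)     # degree k
# det(z - AB) = z^(n-k) det(z - BA): pad p_BA with n-k trailing zeros.
print(np.allclose(p_AB, np.concatenate([p_BA, np.zeros(n - k)])))

# Cauchy-Binet: det(BA) = sum over k-subsets S of det(B[:, S]) * det(A[S, :]).
cb = sum(np.linalg.det(B[:, list(S)]) * np.linalg.det(A[list(S), :])
         for S in itertools.combinations(range(n), k))
print(np.isclose(cb, np.linalg.det(B @ A)))
```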

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i < (1-\epsilon) n}$ for some fixed ${\epsilon > 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and variance ${\frac{\log n}{2\pi^2} \times (\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})})^2}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$ is the mean eigenvalue spacing.
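The classical locations are easy to compute. Since ${\rho_{sc}}$ is supported on ${[-2,2]}$, one has ${\int_{-2}^x \rho_{sc}(t)\ dt = \frac12 + \frac{x\sqrt{4-x^2}}{4\pi} + \frac{\arcsin(x/2)}{\pi}}$ for ${-2 \leq x \leq 2}$. The following Python sketch inverts this by bisection. It then samples a GUE matrix normalised as in the Hamiltonian ${H}$ above (so that the spectrum fills ${[-2\sqrt{n}, 2\sqrt{n}]}$) and compares the sorted eigenvalues with ${\gamma_i^{cl}}$ roughly in the window ${n/10 < i \leq 9n/10}$. The size ${n=500}$, the window, the seed and the helper names are arbitrary. The deviations should be tiny compared with the macroscopic scale ${\sqrt{n}}$.

```python
import numpy as np

def semicircle_cdf(x):
    """Integral of rho_sc from -2 to x, for x in [-2, 2]."""
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def classical_locations(n, iters=60):
    """gamma_i^cl for i = 1..n: vectorised bisection on [-2, 2], then rescaling by sqrt(n)."""
    targets = np.arange(1, n + 1) / n
    lo, hi = np.full(n, -2.0), np.full(n, 2.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = semicircle_cdf(mid) < targets
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return np.sqrt(n) * 0.5 * (lo + hi)

def sample_gue(n, rng):
    """GUE with density proportional to exp(-tr H^2 / 2): N(0,1) diagonal, E|h_ij|^2 = 1 off it."""
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (G + G.conj().T) / np.sqrt(2)

rng = np.random.default_rng(7)
n = 500
lam = np.linalg.eigvalsh(sample_gue(n, rng))           # eigenvalues in increasing order
gamma = classical_locations(n)
bulk = slice(n // 10, 9 * n // 10)
print(np.max(np.abs(lam[bulk] - gamma[bulk])))          # small compared with sqrt(n) ~ 22
```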

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but that even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.
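Here is the consistency check spelled out. Heuristically, treat all the variances as comparable to a common value ${v \sim \frac{\log n}{n}}$, and write ${\bar \rho}$ for the average correlation over pairs ${i \neq j}$. Then

$\displaystyle n = \hbox{Var}(\lambda_1+\ldots+\lambda_n) = \sum_{i=1}^n \hbox{Var}(\lambda_i) + \sum_{i \neq j} \hbox{Cov}(\lambda_i,\lambda_j) \approx n v + n^2 \bar \rho v.$

Since ${nv \sim \log n}$ is much smaller than ${n}$, this forces ${\bar \rho \approx \frac{1}{nv} \sim \frac{1}{\log n}}$.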

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.
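The heuristic above is phrased in terms of the minimiser ${\gamma}$ of ${H}$. As a numerical aside, not part of the heuristic itself, one can locate ${\gamma}$ using a classical observation of Stieltjes. The critical point equation ${\lambda_i = 2 \sum_{j \neq i} \frac{1}{\lambda_i - \lambda_j}}$ is solved by the zeros of the probabilists’ Hermite polynomial ${He_n}$. The following Python sketch (helper names hypothetical, ${n=20}$ arbitrary) checks two things. First, the gradient of ${H}$ vanishes at these zeros. Second, the Hessian, which is the identity plus a weighted graph Laplacian, has smallest eigenvalue exactly ${1}$, attained on ${(1,\ldots,1)}$. This is consistent with the convexity of ${H}$ used in the Taylor expansion.

```python
import numpy as np

def grad_H(lam):
    """Gradient of H(lam) = sum_j lam_j^2 / 2 - 2 sum_{i<j} log|lam_i - lam_j|."""
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)                  # drop i = j terms (1/inf = 0)
    return lam - 2.0 * np.sum(1.0 / diff, axis=1)

def hess_H(lam):
    """Hessian of H: identity plus a weighted graph Laplacian with weights 2/(lam_i - lam_j)^2."""
    diff = lam[:, None] - lam[None, :]
    np.fill_diagonal(diff, np.inf)
    w = 2.0 / diff**2                               # zero on the diagonal
    return np.eye(len(lam)) + np.diag(w.sum(axis=1)) - w

n = 20
gamma, _ = np.polynomial.hermite_e.hermegauss(n)    # zeros of the probabilists' Hermite polynomial He_n
print(np.abs(grad_H(gamma)).max())                  # ~ 0: gamma is a critical point of H
print(np.linalg.eigvalsh(hess_H(gamma)).min())      # ~ 1: the smallest Hessian eigenvalue
```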


Tanja Eisner and I have just uploaded to the arXiv our paper “Large values of the Gowers-Host-Kra seminorms”, submitted to Journal d’Analyse Mathematique. This paper is concerned with the properties of three closely related families of (semi)norms, indexed by a positive integer ${k}$:

• The Gowers uniformity norms ${\|f\|_{U^k(G)}}$ of a (bounded, measurable, compactly supported) function ${f: G \rightarrow {\bf C}}$ defined on a locally compact abelian group ${G}$, equipped with a Haar measure ${\mu}$;
• The Gowers uniformity norms ${\|f\|_{U^k([N])}}$ of a function ${f: [N] \rightarrow {\bf C}}$ on a discrete interval ${\{1,\ldots,N\}}$; and
• The Gowers-Host-Kra seminorms ${\|f\|_{U^k(X)}}$ of a function ${f \in L^\infty(X)}$ on an ergodic measure-preserving system ${X = (X,{\mathcal X},\mu,T)}$.

These norms have been discussed in depth in previous blog posts, so I will just quickly review the definition of the first norm here (the other two (semi)norms are defined similarly). The ${U^k(G)}$ norm is defined recursively by setting

$\displaystyle \| f \|_{U^1(G)} := |\int_G f\ d\mu|$

and

$\displaystyle \|f\|_{U^k(G)}^{2^k} := \int_G \| \Delta_h f \|_{U^{k-1}(G)}^{2^{k-1}}\ d\mu(h)$

where ${\Delta_h f(x) := f(x+h) \overline{f(x)}}$. Equivalently, one has

$\displaystyle \|f\|_{U^k(G)} := (\int_G \ldots \int_G \Delta_{h_1} \ldots \Delta_{h_k} f(x)\ d\mu(x) d\mu(h_1) \ldots d\mu(h_k))^{1/2^k}.$

Informally, the Gowers uniformity norm ${\|f\|_{U^k(G)}}$ measures the extent to which the phase of ${f}$ behaves like a polynomial of degree less than ${k}$. Indeed, if ${\|f\|_{L^\infty(G)} \leq 1}$ and ${G}$ is compact with normalised Haar measure ${\mu(G)=1}$, it is not difficult to show that ${\|f\|_{U^k(G)}}$ is at most ${1}$, with equality if and only if ${f}$ takes the form ${f = e(P) := e^{2\pi iP}}$ almost everywhere, where ${P: G \rightarrow {\bf R}/{\bf Z}}$ is a polynomial of degree less than ${k}$ (which means that ${\partial_{h_1} \ldots \partial_{h_k} P(x) = 0}$ for all ${x,h_1,\ldots,h_k \in G}$, where ${\partial_h P(x) := P(x+h) - P(x)}$).
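For concreteness, here is a direct Python implementation of the recursive definition on the cyclic group ${G = {\bf Z}/N{\bf Z}}$ with normalised counting measure; its cost grows exponentially in ${k}$. The choice of group and the function name gowers_norm are illustrative assumptions. Take ${N}$ to be an odd prime and consider the quadratic phase ${f(x) = e(x^2/N)}$, which is well defined on ${{\bf Z}/N{\bf Z}}$ since ${x^2/N}$ is well defined modulo ${1}$. It has the form ${e(P)}$ with ${P}$ of degree ${2}$, so ${\|f\|_{U^3} = 1}$, while a short Gauss sum computation gives ${\|f\|_{U^2} = N^{-1/4}}$.

```python
import numpy as np

def gowers_norm(f, k):
    """U^k norm on Z/NZ with normalised counting measure, via the recursive definition."""
    f = np.asarray(f, dtype=complex)
    if k == 1:
        return abs(f.mean())                               # ||f||_{U^1} = |E f|
    N = len(f)
    total = 0.0
    for h in range(N):
        delta_h = np.roll(f, -h) * np.conj(f)              # Delta_h f(x) = f(x+h) conj(f(x))
        total += gowers_norm(delta_h, k - 1) ** (2 ** (k - 1))
    return (total / N) ** (1.0 / 2 ** k)

N = 17
x = np.arange(N)
f = np.exp(2j * np.pi * x**2 / N)                          # e(P) with P(x) = x^2 / N, degree 2
print(gowers_norm(f, 1), N ** -0.5)                        # Gauss sum: |E f| = N^{-1/2}
print(gowers_norm(f, 2), N ** -0.25)                       # U^2 norm equals N^{-1/4}
print(gowers_norm(f, 3))                                   # U^3 norm equals 1
```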

Our first result is to show that this result is robust, uniformly over all choices of group ${G}$:

Theorem 1 (${L^\infty}$-near extremisers) Let ${G}$ be a compact abelian group with normalised Haar measure ${\mu(G)=1}$, and let ${f \in L^\infty(G)}$ be such that ${\|f\|_{L^\infty(G)} \leq 1}$ and ${\|f\|_{U^k(G)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$ and ${k \geq 1}$. Then there exists a polynomial ${P: G \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${k-1}$ such that ${\|f-e(P)\|_{L^1(G)} = o(1)}$, where ${o(1)}$ is bounded by a quantity ${c_k(\epsilon)}$ that goes to zero as ${\epsilon \rightarrow 0}$ for fixed ${k}$.

The quantity ${o(1)}$ can be described effectively (it is of polynomial size in ${\epsilon}$), but we did not seek to optimise it here. This result was already known in the case of vector spaces ${G = {\bf F}_p^n}$ over a fixed finite field ${{\bf F}_p}$ (where it is essentially equivalent to the assertion that the property of being a polynomial of degree at most ${k-1}$ is locally testable); the extension to general groups ${G}$ turns out to be fairly routine. The basic idea is to use the recursive structure of the Gowers norms, which tells us in particular that if ${\|f\|_{U^k(G)}}$ is close to one, then ${\|\Delta_h f\|_{U^{k-1}(G)}}$ is close to one for most ${h}$, which by induction implies that ${\Delta_h f}$ is close to ${e(Q_h)}$ for some polynomials ${Q_h}$ of degree at most ${k-2}$ and for most ${h}$. (Actually, it is not difficult to use cocycle equations such as ${\Delta_{h+k} f = \Delta_h f \times T^h \Delta_k f}$ (when ${|f|=1}$) to upgrade “for most ${h}$” to “for all ${h}$”.) To finish the job, one would like to express the ${Q_h}$ as derivatives ${Q_h = \partial_h P}$ of a polynomial ${P}$ of degree at most ${k-1}$. This turns out to be equivalent to requiring that the ${Q_h}$ obey the cocycle equation

$\displaystyle Q_{h+k} = Q_h + T^h Q_k$

where ${T^h F(x) := F(x+h)}$ is the translate of ${F}$ by ${h}$. (In the paper, the sign conventions are reversed, so that ${T^h F(x) := F(x-h)}$, in order to be compatible with ergodic theory notation, but this makes no substantial difference to the arguments or results.) However, one does not quite get this right away; instead, by using some separation properties of polynomials, one can show the weaker statement that

$\displaystyle Q_{h+k} = Q_h + T^h Q_k + c_{h,k} \ \ \ \ \ (1)$

where the ${c_{h,k}}$ are small real constants. To eliminate these constants, one exploits the trivial cohomology of the real line. From (1) one soon concludes that the ${c_{h,k}}$ obey the ${2}$-cocycle equation

$\displaystyle c_{h,k} + c_{h+k,l} = c_{h,k+l} + c_{k,l}$

and an averaging argument then shows that ${c_{h,k}}$ is a ${2}$-coboundary in the sense that

$\displaystyle c_{h,k} = b_{h+k} - b_h - b_k$

for some small scalar ${b_h}$ depending on ${h}$. Subtracting ${b_h}$ from ${Q_h}$ then gives the claim.
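One way to carry out the averaging step is sketched here under the simplifying assumption that ${G = {\bf Z}/N{\bf Z}}$ with normalised counting measure, rather than a general compact group. Integrating the ${2}$-cocycle equation in ${l}$ and using the translation invariance ${\int_G c_{h,k+l}\ d\mu(l) = \int_G c_{h,l}\ d\mu(l)}$ gives ${c_{h,k} = a_h + a_k - a_{h+k}}$ with ${a_h := \int_G c_{h,l}\ d\mu(l)}$, so one can take ${b_h := -a_h}$; since ${b_h}$ is an average of the ${c_{h,l}}$, it is no larger than ${\sup |c|}$, which is why it stays small. The input cocycle below is manufactured from hidden random scalars, and the function names are hypothetical.

```python
import random

def is_two_cocycle(c, N, tol=1e-12):
    """Check c_{h,k} + c_{h+k,l} = c_{h,k+l} + c_{k,l} for all h, k, l in Z/N."""
    return all(
        abs(c[h][k] + c[(h + k) % N][l] - c[h][(k + l) % N] - c[k][l]) < tol
        for h in range(N) for k in range(N) for l in range(N)
    )

def coboundary_by_averaging(c, N):
    """Return b with c_{h,k} = b_{h+k} - b_h - b_k, via b_h := -(mean over l of c_{h,l})."""
    return [-sum(c[h][l] for l in range(N)) / N for h in range(N)]

# Manufacture a small real 2-cocycle on Z/N from hidden random scalars.
N = 9
rng = random.Random(0)
hidden = [rng.uniform(-1e-3, 1e-3) for _ in range(N)]
c = [[hidden[(h + k) % N] - hidden[h] - hidden[k] for k in range(N)]
     for h in range(N)]

b = coboundary_by_averaging(c, N)
```

The averaging formula itself never looks at `hidden`; it only uses the cocycle identity, which is what makes the argument work on any compact group.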

Similar results and arguments also hold for the ${U^k([N])}$ and ${U^k(X)}$ norms, which we will not detail here.

Dimensional analysis reveals that the ${L^\infty}$ norm is not actually the most natural norm against which to compare the ${U^k}$ norms. In fact, an application of Young’s convolution inequality yields the inequality

$\displaystyle \|f\|_{U^k(G)} \leq \|f\|_{L^{p_k}(G)} \ \ \ \ \ (2)$

where ${p_k}$ is the critical exponent ${p_k := 2^k/(k+1)}$, without any compactness or normalisation hypothesis on the group ${G}$ and the Haar measure ${\mu}$. This allows us to extend the ${U^k(G)}$ norm to all of ${L^{p_k}(G)}$. There is then a stronger inverse theorem available:

Theorem 2 (${L^{p_k}}$-near extremisers) Let ${G}$ be a locally compact abelian group, and let ${f \in L^{p_k}(G)}$ be such that ${\|f\|_{L^{p_k}(G)} \leq 1}$ and ${\|f\|_{U^k(G)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$ and ${k \geq 1}$. Then there exist a coset ${H}$ of a compact open subgroup of ${G}$ and a polynomial ${P: H \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${k-1}$ such that ${\|f-e(P) 1_H\|_{L^{p_k}(G)} = o(1)}$.

Conversely, it is not difficult to show that equality in (2) is attained when ${f}$ takes the form ${e(P) 1_H}$ as above. The main idea of the proof is to use an inverse theorem for Young’s inequality due to Fournier to reduce matters to the ${L^\infty}$ case that was already established. An analogous result is also obtained for the ${U^k(X)}$ norm on an ergodic system, but for technical reasons, the methods do not seem to apply easily to the ${U^k([N])}$ norm. (This norm is essentially equivalent to the ${U^k({\bf Z}/\tilde N{\bf Z})}$ norm up to constants, with ${\tilde N}$ comparable to ${N}$, but when working with near-extremisers, norms that are only equivalent up to constants can have quite different near-extremal behaviour.)
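The inequality (2) and its equality case can be probed numerically on the finite group ${{\bf Z}/12{\bf Z}}$, viewed as a locally compact abelian group with counting measure (so no normalisation). The sketch below, whose helper names are hypothetical, uses the coset ${1 + \{0,3,6,9\}}$ and the quadratic phase ${e(x^2/12)}$, which restricts to a polynomial of degree at most ${2}$ on that coset. Equality should therefore hold in (2) for ${k=3}$, where ${p_3 = 2}$, while other test functions should satisfy the inequality for each ${k}$.

```python
import cmath

def delta(f, h):
    """(Delta_h f)(x) = f(x+h) * conj(f(x)) on Z/N."""
    N = len(f)
    return [f[(x + h) % N] * f[x].conjugate() for x in range(N)]

def gowers_counting(f, k):
    """U^k(Z/N) norm with counting measure (no 1/N normalisation)."""
    N = len(f)
    if k == 1:
        return abs(sum(f))
    total = sum(gowers_counting(delta(f, h), k - 1) ** (2 ** (k - 1))
                for h in range(N))
    return total ** (1 / 2 ** k)

def lp_counting(f, p):
    """L^p norm with counting measure."""
    return sum(abs(v) ** p for v in f) ** (1 / p)

def critical_exponent(k):
    """p_k = 2^k / (k+1)."""
    return 2 ** k / (k + 1)

N = 12
coset = {1, 4, 7, 10}  # 1 + H, where H = {0, 3, 6, 9} is a subgroup of Z/12
extremiser = [cmath.exp(2j * cmath.pi * x * x / N) if x in coset else 0j
              for x in range(N)]
```

For this extremiser both sides of (2) with ${k=3}$ equal ${2}$, since ${f}$ takes four unimodular values and vanishes elsewhere.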

In the case when ${G}$ is a Euclidean group ${{\bf R}^d}$, it is possible to use the sharp Young inequality of Beckner and of Brascamp-Lieb to improve (2) somewhat. For instance, when ${k=3}$, one has

$\displaystyle \|f\|_{U^3({\bf R}^d)} \leq 2^{-d/8} \|f\|_{L^2({\bf R}^d)}$

with equality attained if and only if ${f}$ is a gaussian modulated by a quadratic polynomial phase. This additional gain of ${2^{-d/8}}$ allows one to pinpoint the threshold ${1-\epsilon}$ for the previous near-extremiser results in the case of ${U^3}$ norms. For instance, by using the Host-Kra machinery of characteristic factors for the ${U^3(X)}$ norm, combined with an explicit and concrete analysis of the ${2}$-step nilsystems generated by that machinery, we can show that

$\displaystyle \|f\|_{U^3(X)} \leq 2^{-1/8} \|f\|_{L^2(X)}$

whenever ${X}$ is a totally ergodic system and ${f}$ is orthogonal to all linear and quadratic eigenfunctions (which would otherwise form immediate counterexamples to the above inequality), with the factor ${2^{-1/8}}$ being best possible. We can also establish analogous results for the ${U^3([N])}$ and ${U^3({\bf Z}/N{\bf Z})}$ norms (using the inverse ${U^3}$ theorem of Ben Green and myself, in place of the Host-Kra machinery), although it is not clear to us whether the ${2^{-1/8}}$ threshold remains best possible in this case.
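As a consistency check on the constant ${2^{-d/8}}$ in the Euclidean ${U^3}$ inequality above, one can evaluate both sides for the gaussian ${f(x) = e^{-\pi x^2}}$ on ${{\bf R}}$. Since ${f}$ is real and positive, ${\|f\|_{U^k({\bf R})}^{2^k} = \int \exp(-\pi \sum_{\omega \in \{0,1\}^k} (x + \omega \cdot h)^2)\ dx\, dh = \det(A)^{-1/2}}$, where ${A := \sum_\omega w_\omega w_\omega^T}$ with ${w_\omega := (1,\omega_1,\ldots,\omega_k)}$; also ${\|f\|_{L^p({\bf R})} = p^{-1/(2p)}}$. This sketch (hypothetical function names, ${d=1}$ only) computes ${\det A}$ exactly and the resulting ratio of norms; it only evaluates this one gaussian and does not prove the inequality. For ${k=3}$ one gets ${\det A = 64}$ and a ratio of ${2^{-1/8}}$, and since both norms factorise over coordinates, the gaussian on ${{\bf R}^d}$ gives ${2^{-d/8}}$.

```python
from fractions import Fraction
from itertools import product

def gram_matrix(k):
    """A = sum over omega in {0,1}^k of w w^T, with w = (1, omega_1, ..., omega_k)."""
    n = k + 1
    A = [[0] * n for _ in range(n)]
    for omega in product((0, 1), repeat=k):
        w = (1,) + omega
        for i in range(n):
            for j in range(n):
                A[i][j] += w[i] * w[j]
    return A

def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return det

def gaussian_uk_norm(k):
    """||f||_{U^k(R)} for f(x) = exp(-pi x^2), using ||f||^{2^k} = det(A)^{-1/2}."""
    return float(determinant(gram_matrix(k))) ** (-1 / (2 * 2 ** k))

def gaussian_lp_norm(p):
    """||f||_{L^p(R)} for f(x) = exp(-pi x^2), since int exp(-pi p x^2) dx = p^{-1/2}."""
    return p ** (-1 / (2 * p))

def ratio(k):
    """||f||_{U^k} / ||f||_{L^{p_k}} with the critical exponent p_k = 2^k / (k+1)."""
    p_k = 2 ** k / (k + 1)
    return gaussian_uk_norm(k) / gaussian_lp_norm(p_k)
```

A short computation with the Schur complement shows ${\det A = 2^{k(k-1)}}$ in general, which the exact determinant reproduces for small ${k}$.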