You are currently browsing the monthly archive for August 2008.

This month I am at MSRI, for the programs of Ergodic Theory and Additive Combinatorics, and Analysis on Singular Spaces, that are currently ongoing here.  This week I am giving three lectures on the correspondence principle, and on finitary versions of ergodic theory, for the introductory workshop in the former program.  The article here is broadly describing the content of these talks (which are slightly different in theme from that announced in the abstract, due to some recent developments).  [These lectures were also recorded on video and should be available from the MSRI web site within a few months.]

As many readers may already know, my good friend and fellow mathematical blogger Tim Gowers, having wrapped up work on the Princeton Companion to Mathematics (which I believe is now in press), has begun another mathematical initiative, namely a “Tricks Wiki” to act as a repository for mathematical tricks and techniques.    Tim has already started the ball rolling with several seed articles on his own blog, and asked me to also contribute some articles.  (As I understand it, these articles will be migrated to the Wiki in a few months, once it is fully set up, and then they will evolve with edits and contributions by anyone who wishes to pitch in, in the spirit of Wikipedia; in particular, articles are not intended to be permanently authored or signed by any single contributor.)

So today I’d like to start by extracting some material from an old post of mine on “Amplification, arbitrage, and the tensor power trick” (as well as from some of the comments), and converting it to the Tricks Wiki format, while also taking the opportunity to add a few more examples.

Title: The tensor power trick

Quick description: If one wants to prove an inequality $X \leq Y$ for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality $X \leq CY$ that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers.  If all goes well, one can show that $X^M \leq C Y^M$ for all $M \geq 1$, with a constant C which is independent of M, which implies that $X \leq Y$ as desired by taking $M^{th}$ roots and then taking limits as $M \to \infty$.
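
To spell out the final limiting step (a short worked version, assuming only that X, Y are non-negative and that $C > 0$ does not depend on M): taking $M^{th}$ roots of $X^M \leq C Y^M$ gives

$\displaystyle X \leq C^{1/M} Y$ for all $M \geq 1$,

and since

$\displaystyle C^{1/M} = \exp( \frac{\log C}{M} ) \to 1$ as $M \to \infty$,

the loss disappears in the limit, leaving $X \leq Y$.  For instance, with $C = 100$ the factor $C^{1/M}$ is about 1.585 when $M = 10$, about 1.047 when $M = 100$, and about 1.005 when $M = 1000$.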

Jim Colliander, Mark Keel, Gigliola Staffilani, Hideo Takaoka, and I have just uploaded to the arXiv the paper “Weakly turbulent solutions for the cubic defocusing nonlinear Schrödinger equation”, which we have submitted to Inventiones Mathematicae. This paper concerns the numerically observed phenomenon of weak turbulence for the periodic defocusing cubic non-linear Schrödinger equation

$-i u_t + \Delta u = |u|^2 u$ (1)

in two spatial dimensions, thus u is a function from ${\Bbb R} \times {\Bbb T}^2$ to ${\Bbb C}$.  This equation has three important conserved quantities: the mass

$M(u) = M(u(t)) := \int_{{\Bbb T}^2} |u(t,x)|^2\ dx$

the momentum

$\vec p(u) = \vec p(u(t)) = \int_{{\Bbb T}^2} \hbox{Im}( \nabla u(t,x) \overline{u(t,x)} )\ dx$

and the energy

$E(u) = E(u(t)) := \int_{{\Bbb T}^2} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{4} |u(t,x)|^4\ dx$.

(These conservation laws, incidentally, are related to the basic symmetries of phase rotation, spatial translation, and time translation, via Noether’s theorem.) Using these conservation laws and some standard PDE technology (specifically, some Strichartz estimates for the periodic Schrödinger equation), one can establish global wellposedness for the initial value problem for this equation in (say) the smooth category; thus for every smooth $u_0: {\Bbb T}^2 \to {\Bbb C}$ there is a unique global smooth solution $u: {\Bbb R} \times {\Bbb T}^2 \to {\Bbb C}$ to (1) with initial data $u(0,x) = u_0(x)$, whose mass, momentum, and energy remain constant for all time.

However, the mass, momentum, and energy only control three of the infinitely many degrees of freedom available to a function on the torus, and so the above result does not fully describe the dynamics of solutions over time.  In particular, the three conserved quantities inhibit, but do not fully prevent the possibility of a low-to-high frequency cascade, in which the mass, momentum, and energy of the solution remain conserved, but shift to increasingly higher frequencies (or equivalently, to finer spatial scales) as time goes to infinity.  This phenomenon has been observed numerically, and is sometimes referred to as weak turbulence (in contrast to strong turbulence, which is similar but happens within a finite time span rather than asymptotically).

To illustrate how this can happen, let us normalise the torus as ${\Bbb T}^2 = ({\Bbb R}/2\pi {\Bbb Z})^2$.  A simple example of a frequency cascade would be a scenario in which a solution $u(t,x) = u(t,x_1,x_2)$ starts off at a low frequency at time zero, e.g. $u(0,x) = A e^{i x_1}$ for some constant amplitude A, and ends up at a high frequency at a later time T, e.g. $u(T,x) = A e^{i N x_1}$ for some large frequency N. This scenario is consistent with conservation of mass, but not conservation of energy or momentum and thus does not actually occur for solutions to (1).  A more complicated example would be a solution which is supported on two low frequencies at time zero, e.g. $u(0,x) = A e^{ix_1} + A e^{-ix_1}$, and ends up at two high frequencies later, e.g. $u(T,x) = A e^{iNx_1} + A e^{-iNx_1}$.  This scenario is consistent with conservation of mass and momentum, but not energy.  Finally, consider the scenario which starts off at $u(0,x) = A e^{i Nx_1} + A e^{iNx_2}$ and ends up at $u(T,x) = A + A e^{i(N x_1 + N x_2)}$.  This scenario is consistent with all three conservation laws, and exhibits a mild example of a low-to-high frequency cascade, in which the solution starts off at frequency N and ends up with half of its mass at the slightly higher frequency $\sqrt{2} N$, with the other half of its mass at the zero frequency.  More generally, given four frequencies $n_1, n_2, n_3, n_4 \in {\Bbb Z}^2$ which form the four vertices of a rectangle in order, one can concoct a similar scenario, compatible with all conservation laws, in which the solution starts off at frequencies $n_1, n_3$ and propagates to frequencies $n_2, n_4$.
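
One way to see why rectangles are the relevant configuration (a short worked computation, under the assumption that all four modes carry the same amplitude A): the points $n_1, n_2, n_3, n_4$ form a rectangle in order exactly when the diagonals bisect each other and have equal length, that is to say

$\displaystyle n_1 + n_3 = n_2 + n_4$ and $\displaystyle |n_1 - n_3|^2 = |n_2 - n_4|^2$.

Using the identity $|a-b|^2 = 2|a|^2 + 2|b|^2 - |a+b|^2$ together with the first equation, the second equation is equivalent to

$\displaystyle |n_1|^2 + |n_3|^2 = |n_2|^2 + |n_4|^2$.

The first condition expresses conservation of momentum and the last one expresses conservation of the kinetic part of the energy, when mass moves from the pair $n_1, n_3$ to the pair $n_2, n_4$.  In the example above, $n_1 = (N,0)$, $n_2 = (0,0)$, $n_3 = (0,N)$, $n_4 = (N,N)$, and both sides of the last equation equal $2N^2$.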

One way to measure a frequency cascade quantitatively is to use the Sobolev norms $H^s({\Bbb T}^2)$ for $s > 1$; roughly speaking, a low-to-high frequency cascade occurs precisely when these Sobolev norms get large.  (Note that mass and energy conservation ensure that the $H^s({\Bbb T}^2)$ norms stay bounded for $0 \leq s \leq 1$.)  For instance, in the cascade from $u(0,x) = A e^{i Nx_1} + A e^{iNx_2}$ to $u(T,x) = A + A e^{i(N x_1 + N x_2)}$, the $H^s({\Bbb T}^2)$ norm is roughly $2^{1/2} A N^s$ at time zero and $2^{s/2} A N^s$ at time T, leading to a slight increase in that norm for $s > 1$.  Numerical evidence then suggests the following
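
The bookkeeping in the three scenarios above can be checked mechanically on the Fourier side.  The following is only an illustrative sketch: the function names are hypothetical, a trigonometric polynomial is stored as a dictionary from frequencies to coefficients, integrals are taken with respect to the normalised measure $dx/(2\pi)^2$, and the Sobolev norm uses the weight $(1+|n|^2)^s$, which is one of several equivalent conventions.  It compares the conserved quantities of the initial and final states only, and says nothing about whether the equation (1) actually connects them.

```python
from collections import defaultdict
from math import isclose, sqrt

# A trigonometric polynomial on T^2 is stored as {(n1, n2): coefficient}.
# Assumption: all integrals use the normalised measure dx / (2 pi)^2.

def mass(u):
    return sum(abs(a) ** 2 for a in u.values())

def momentum(u):
    px = sum(n[0] * abs(a) ** 2 for n, a in u.items())
    py = sum(n[1] * abs(a) ** 2 for n, a in u.items())
    return (px, py)

def energy(u):
    kinetic = 0.5 * sum((n[0] ** 2 + n[1] ** 2) * abs(a) ** 2 for n, a in u.items())
    # Fourier coefficients of |u|^2; by Parseval, the integral of |u|^4
    # is the sum of their squared magnitudes.
    density = defaultdict(complex)
    for n, a in u.items():
        for m, b in u.items():
            density[(n[0] - m[0], n[1] - m[1])] += a * b.conjugate()
    quartic = 0.25 * sum(abs(c) ** 2 for c in density.values())
    return kinetic + quartic

def sobolev_norm(u, s):
    return sqrt(sum((1 + n[0] ** 2 + n[1] ** 2) ** s * abs(a) ** 2 for n, a in u.items()))

def compare(u, v):
    """Report which conserved quantities agree between states u and v."""
    pu, pv = momentum(u), momentum(v)
    return {
        "mass": isclose(mass(u), mass(v)),
        "momentum": isclose(pu[0], pv[0]) and isclose(pu[1], pv[1]),
        "energy": isclose(energy(u), energy(v)),
    }

A, N = 1 + 0j, 10  # illustrative amplitude and frequency

one_mode = ({(1, 0): A}, {(N, 0): A})
two_modes = ({(1, 0): A, (-1, 0): A}, {(N, 0): A, (-N, 0): A})
rectangle = ({(N, 0): A, (0, N): A}, {(0, 0): A, (N, N): A})

for name, (start, end) in [("one mode", one_mode),
                           ("two modes", two_modes),
                           ("rectangle", rectangle)]:
    print(name, compare(start, end))

start, end = rectangle
for s in (0, 1, 2):
    print("s =", s, sobolev_norm(start, s), sobolev_norm(end, s))
```

With this choice of weight, the $H^0$ and $H^1$ norms of the two rectangle states coincide (they are determined by the mass and the kinetic energy), while the $H^2$ norm grows by a factor close to $2^{1/2}$, in line with the ratio $2^{s/2}/2^{1/2}$ computed above.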

Conjecture. (Weak turbulence) There exist smooth solutions $u(t,x)$ to (1) such that $\|u(t)\|_{H^s({\Bbb T}^2)}$ goes to infinity as $t \to \infty$ for any $s > 1$.

We were not able to establish this conjecture, but we have the following partial result (“weak weak turbulence”, if you will):

Theorem. Given any $\varepsilon > 0, K > 0, s > 1$, there exists a smooth solution $u(t,x)$ to (1) such that $\|u(0)\|_{H^s({\Bbb T}^2)} \leq \varepsilon$ and $\|u(T)\|_{H^s({\Bbb T}^2)} > K$ for some time T.

This is in marked contrast to (1) in one spatial dimension ${\Bbb T}$, which is completely integrable and has an infinite number of conservation laws beyond the mass, energy, and momentum which serve to keep all $H^s({\Bbb T})$ norms bounded in time.  It is also in contrast to the linear Schrödinger equation, in which all Sobolev norms are preserved, and to the non-periodic analogue of (1), which is conjectured to disperse to a linear solution (i.e. to scatter) from any finite mass data (see this earlier post for the current status of that conjecture).  Thus our theorem can be viewed as evidence that the 2D periodic cubic NLS does not behave at all like a completely integrable system or a linear solution, even for small data.  (An earlier result of Kuksin gives (in our notation) the weaker result that the ratio $\|u(T)\|_{H^s({\Bbb T}^2)} / \|u(0)\|_{H^s({\Bbb T}^2)}$ can be made arbitrarily large when $s > 1$, thus showing that large initial data can exhibit movement to higher frequencies; the point of our paper is that we can achieve the same for arbitrarily small data.) Intuitively, the problem is that the torus is compact and so there is no place for the solution to disperse its mass; instead, it must continually interact nonlinearly with itself, which is what eventually causes the weak turbulence.

I can maybe make some unorganised comments, though. Firstly, I am very lucky to have some excellent collaborators who put a lot of effort into our joint papers; many of the papers appearing recently on this blog, for instance, were to a large extent handled by co-authors. Generally, I find that papers written in collaboration take longer than singly-authored papers, but the net effort expended per author is significantly less (and the quality of writing higher). Also, I find that I can work on many joint papers in parallel (since the ball is often in another co-author’s court, or is pending some other development), but only on one single-authored paper at a time.

[For reasons having to do with the academic calendar, many more of these papers get finished during the summer than any other time of year, but many of these projects have actually been gestating for quite some time. (There should be a joint paper appearing shortly which we have been working on for about three or four years, for instance; and I have been thinking about the global regularity problem for wave maps on and off (mostly off) since about 2000.) So a paper being released every week does not actually correspond to a week being the time needed to conceive and then write up a paper; there is in fact quite a long pipeline of development which mostly happens out of public view.]

I have just uploaded to the arXiv the third installment of my “heatwave” project, entitled “Global regularity of wave maps V.  Large data local well-posedness in the energy class”. This (rather technical) paper establishes another of the key ingredients necessary to establish the global existence of smooth wave maps from 2+1-dimensional spacetime ${\Bbb R}^{1+2}$ to hyperbolic space $\mathbf{H} = \mathbf{H}^m$.  Specifically, a large data local well-posedness result is established, constructing a local solution from any initial data with finite (but possibly quite large) energy, and showing furthermore that the solution depends continuously on the initial data in the energy topology.  (This topology was constructed in my previous paper.)  Once one has this result, the only remaining task is to show a “Palais-Smale property” for wave maps, in that if singularities form in the wave maps equation, then there exists a non-trivial minimal-energy blowup solution, whose orbit is almost periodic modulo the symmetries of the equation.  I anticipate this to be the most difficult component of the whole project, and it is the subject of the fourth (and hopefully final) installment of this series.

This local result is closely related to the small energy global regularity theory developed in recent years by myself, by Krieger, and by Tataru.  In particular, it relies on the complicated function spaces used in that theory (which ultimately originate from a precursor paper of Tataru).  The main new difficulties here are to extend the small energy theory to large energy (by localising time suitably), and to establish continuous dependence on the data (i.e. two solutions which are initially close in the energy topology need to stay close in that topology).  The former difficulty is in principle manageable by exploiting finite speed of propagation (exploiting the fact (arising from the monotone convergence theorem) that large energy data becomes small energy data at sufficiently small spatial scales), but for technical reasons (having to do with my choice of gauge) I was not able to do this and had to deal with the large energy case directly (and in any case, a genuinely large energy theory is going to be needed to construct the minimal energy blowup solution in the next paper).  The latter difficulty is in principle resolvable by adapting the existence theory to differences of solutions, rather than to individual solutions, but the nonlinear choice of gauge adds a rather tedious amount of complexity to the task of making this rigorous.  (It may be that simpler gauges, such as the Coulomb gauge, are usable here, at least in the case $m=2$ of the hyperbolic plane (cf. the work of Krieger), but such gauges cause additional analytic problems as they do not renormalise the nonlinearity as strongly as the caloric gauge.  The paper of Tataru establishes these goals, but assumes an isometric embedding of the target manifold into a Euclidean space, which is unfortunately not available for hyperbolic space targets.)

The main technical difficulty that had to be overcome in the paper was that there were two different time variables t, s (one for the wave maps equation and one for the heat flow), and three types of PDE (hyperbolic, parabolic, and ODE) that one has to solve forward in t, forward in s, and backwards in s respectively.  In order to close the argument in the large energy case, this necessitated a rather complicated iteration-type scheme, in which one solved for the caloric gauge, established parabolic regularity estimates for that gauge, propagated a “wave-tension field” by the heat flow, and then solved a wave maps type equation using that field as a forcing term.  The argument can eventually be closed using mostly “off-the-shelf” function space estimates from previous papers, but is remarkably lengthy, especially when analysing differences of two solutions.  (One drawback of using off-the-shelf estimates, though, is that one does not get particularly good control of the solution over extended periods of time; in particular, the spaces used here cannot detect the decay of the solution over extended periods of time (unlike, say, Strichartz spaces $L^q_t L^r_x$ for $q < \infty$) and so will not be able to supply the long-time perturbation theory that will be needed in the next paper in this series.  I believe I know how to re-engineer these spaces to achieve this, though, and the details should follow in the forthcoming paper.)

Van Vu and I have just uploaded to the arXiv our new paper, “Random matrices: Universality of ESDs and the circular law”, with an appendix by Manjunath Krishnapur (and some numerical data and graphs by Philip Wood).  One of the things we do in this paper (which was our original motivation for this project) is to finally establish the endpoint case of the circular law (in both strong and weak forms) for random iid matrices $A_n = (a_{ij})_{1 \leq i,j \leq n}$, where the coefficients $a_{ij}$ are iid random variables with mean zero and unit variance.  (The strong circular law says that with probability 1, the empirical spectral distribution (ESD) of the normalised eigenvalues $\frac{1}{\sqrt{n}} \lambda_1, \ldots, \frac{1}{\sqrt{n}} \lambda_n$ of the matrix $A_n$ converges to the uniform distribution on the unit disk as $n \to \infty$.  The weak circular law asserts the same thing, but with convergence in probability rather than almost sure convergence; this is in complete analogy with the weak and strong law of large numbers, and in fact this law is used in the proof.)  In a previous paper, we had established the same claim but under the additional assumption that the $(2+\eta)^{th}$ moment ${\Bbb E} |a_{ij}|^{2+\eta}$ was finite for some $\eta > 0$; this builds upon a significant body of earlier work by Mehta, Girko, Bai, Bai-Silverstein, Götze-Tikhomirov, and Pan-Zhou, as discussed in the blog article for the previous paper.

As it turned out, though, in the course of this project we found a more general universality principle (or invariance principle) which implied our results about the circular law, but is perhaps more interesting in its own right.  Observe that the statement of the circular law can be split into two sub-statements:

1. (Universality for iid ensembles) In the asymptotic limit $n \to \infty$, the ESD of the random matrix $A_n$ is independent of the choice of distribution of the coefficients, so long as they are normalised in mean and variance.  In particular, the ESD of such a matrix is asymptotically the same as that of a (real or complex) gaussian matrix $G_n$ with the same mean and variance.
2. (Circular law for gaussian matrices) In the asymptotic limit $n \to \infty$, the ESD of a gaussian matrix $G_n$ converges to the circular law.

The reason we single out the gaussian matrix ensemble $G_n$ is that it has a much richer algebraic structure (for instance, the real (resp. complex) gaussian ensemble is invariant under right and left multiplication by the orthogonal group O(n) (resp. the unitary group U(n))).  Because of this, it is possible to compute the eigenvalue distribution very explicitly by algebraic means (for instance, using the machinery of orthogonal polynomials).  In particular, the circular law for complex gaussian matrices (Statement 2 above) was established all the way back in 1967 by Mehta, using an explicit formula for the distribution of the ESD in this case due to Ginibre.

These highly algebraic techniques completely break down for more general iid ensembles, such as the Bernoulli ensemble of matrices whose entries are +1 or -1 with an equal probability of each.  Nevertheless, it is a remarkable phenomenon – which has been referred to as universality in the literature, for instance in this survey by Deift – that the spectral properties of random matrices for non-algebraic ensembles are in many cases asymptotically indistinguishable in the limit $n \to \infty$ from those of algebraic ensembles with the same mean and variance (i.e. Statement 1 above).  One might view this as a sort of “non-Hermitian, non-commutative” analogue of the universality phenomenon represented by the central limit theorem, in which the limiting distribution of a normalised average

$\displaystyle \overline{X}_n := \frac{1}{\sqrt{n}} (X_1 + \ldots + X_n )$ (1)

of an iid sequence depends only on the mean and variance of the elements of that sequence (assuming of course that these quantities are finite), and not on the underlying distribution.  (The Hermitian non-commutative analogue of the CLT is known as Wigner’s semicircular law.)
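
The classical (commutative) universality phenomenon can be seen in a small exact computation.  The sketch below is only an illustration and is not taken from the paper: it picks two hypothetical integer-valued step distributions with mean zero and unit variance (the Bernoulli signs, and a three-point law on $\{-2, 0, 2\}$), computes the exact law of $X_1 + \ldots + X_n$ by repeated convolution, and measures the largest gap between the distribution function of the normalised average and that of the standard gaussian.

```python
from math import erf, sqrt

def convolve(p, q):
    """Law of the sum of two independent integer-valued random variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

def law_of_sum(step, n):
    """Exact law of X_1 + ... + X_n for iid steps with the given law."""
    law = {0: 1.0}
    for _ in range(n):
        law = convolve(law, step)
    return law

def gaussian_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def cdf_gap(step, n):
    """Largest gap between the CDF of (X_1 + ... + X_n)/sqrt(n) and the
    standard gaussian CDF, checked just before and at each jump."""
    law = law_of_sum(step, n)
    gap, running = 0.0, 0.0
    for k in sorted(law):
        phi = gaussian_cdf(k / sqrt(n))
        gap = max(gap, abs(running - phi))  # left limit at the jump
        running += law[k]
        gap = max(gap, abs(running - phi))  # value at the jump
    return gap

# Two illustrative step laws, both with mean 0 and variance 1.
bernoulli = {-1: 0.5, 1: 0.5}
three_point = {-2: 0.125, 0: 0.75, 2: 0.125}

for n in (25, 100, 400):
    print(n, cdf_gap(bernoulli, n), cdf_gap(three_point, n))
```

Both columns shrink as n grows even though the two step laws look quite different, which is the behaviour that the Berry-Esséen theorem quantifies at rate $O(1/\sqrt{n})$ for steps with a finite third moment.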

Previous approaches to the circular law did not build upon the gaussian case, but instead proceeded directly, in particular controlling the ESD of a random matrix $A_n$ via estimates on the log-determinant (a close relative of the Stieltjes transform)

$\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )|$ (2)

of that matrix for complex numbers z.  This method required a combination of delicate analysis (in particular, a bound on the least singular values of $\frac{1}{\sqrt{n}} A_n - zI$), and algebra (in order to compute and then invert the Stieltjes transform).  [As a general rule, and oversimplifying somewhat, algebra tends to be used to control main terms in a computation, while analysis is used to control error terms.]
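
To see why the least singular value enters, it may help to write the quantity (2) in two standard ways (a short worked identity, using only the facts that the determinant is the product of the eigenvalues and that its magnitude is the product of the singular values):

$\displaystyle \frac{1}{n} \log |\det( \frac{1}{\sqrt{n}} A_n - zI )| = \frac{1}{n} \sum_{j=1}^n \log | \frac{1}{\sqrt{n}} \lambda_j - z | = \frac{1}{n} \sum_{j=1}^n \log \sigma_j( \frac{1}{\sqrt{n}} A_n - zI )$,

where $\sigma_1(M) \geq \ldots \geq \sigma_n(M) \geq 0$ denote the singular values of a matrix M.  The middle expression is the integral of $w \mapsto \log |w-z|$ against the ESD of $\frac{1}{\sqrt{n}} A_n$, so controlling (2) for enough values of z controls the ESD.  The right-hand expression shows where the difficulty lies: the logarithm is unbounded near zero, so a single extremely small singular value $\sigma_n$ could dominate the sum unless one has a lower bound on it.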

What we discovered while working on our paper was that the algebra and analysis could be largely decoupled from each other: that one could establish a universality principle (Statement 1 above) by relying primarily on tools from analysis (most notably the bound on least singular values mentioned earlier, but also Talagrand’s concentration of measure inequality, and a universality principle for the singular value distribution of random matrices due to Dozier and Silverstein), so that the algebraic heavy lifting only needs to be done in the gaussian case (Statement 2 above) where the task is greatly simplified by all the additional algebraic structure available in that setting.   This suggests a possible strategy for proving other conjectures in random matrices (for instance concerning the eigenvalue spacing distribution of random iid matrices), by first establishing universality to swap the general random matrix ensemble with an algebraic ensemble (without fully understanding the limiting behaviour of either), and then using highly algebraic tools to understand the latter ensemble.  (There is now a sophisticated theory in place to deal with the latter task, but the former task – understanding universality – is still only poorly understood in many cases.)