
As we are all now very much aware, tsunamis are water waves that start in the deep ocean, usually because of an underwater earthquake (though tsunamis can also be caused by underwater landslides or volcanoes), and then propagate towards shore. Initially, tsunamis have relatively small amplitude (a metre or so is typical), which would seem to render them as harmless as wind waves. And indeed, tsunamis often pass by ships in deep ocean without anyone on board even noticing.

However, being generated by an event as large as an earthquake, the wavelength of the tsunami is huge – 200 kilometres is typical (in contrast with wind waves, whose wavelengths are typically closer to 100 metres). In particular, the wavelength of the tsunami is far greater than the depth of the ocean (which is typically 2-3 kilometres). As such, even in the deep ocean, the dynamics of tsunamis are essentially governed by the shallow water equations. One consequence of these equations is that the speed of propagation ${v}$ of a tsunami can be approximated by the formula

$\displaystyle v \approx \sqrt{g b} \ \ \ \ \ (1)$

where ${b}$ is the depth of the ocean, and ${g \approx 9.8 ms^{-2}}$ is the acceleration due to gravity. As such, tsunamis in deep water move very fast – speeds such as 500 kilometres per hour (300 miles per hour) are quite typical; enough to travel from Japan to the US, for instance, in less than a day. Ultimately, this is due to the incompressibility of water (and conservation of mass); the massive net pressure (or more precisely, spatial variations in this pressure) of a very broad and deep wave of water forces the profile of the wave to move horizontally at vast speeds. (Note though that this is the phase velocity of the tsunami wave, and not the velocity of the water molecules themselves, which are far slower.)

As the tsunami approaches shore, the depth ${b}$ of course decreases, causing the tsunami to slow down, at a rate proportional to the square root of the depth, as per (1). Unfortunately, wave shoaling then forces the amplitude ${A}$ to increase at an inverse rate governed by Green’s law,

$\displaystyle A \propto \frac{1}{b^{1/4}} \ \ \ \ \ (2)$

at least until the amplitude becomes comparable to the water depth (at which point the assumptions that underlie the above approximate results break down; also, in two (horizontal) spatial dimensions there will be some decay of amplitude as the tsunami spreads outwards). If one starts with a tsunami whose initial amplitude was ${A_0}$ at depth ${b_0}$ and computes the point at which the amplitude ${A}$ and depth ${b}$ become comparable using the proportionality relationship (2), some high school algebra then reveals that at this point, the amplitude of the tsunami (and the depth of the water) is about ${A_0^{4/5} b_0^{1/5}}$. Thus, for instance, a tsunami with initial amplitude of one metre at a depth of 2 kilometres can end up with a final amplitude of about 5 metres near shore, while still traveling at about ten metres per second (35 kilometres per hour, or 22 miles per hour), and we have all now seen the impact that can have when it hits shore.
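
As a quick sanity check on these figures, here is a short Python sketch that simply evaluates the propagation formula (1), Green’s law (2), and the crossover depth ${A_0^{4/5} b_0^{1/5}}$ for the illustrative values used above (a one metre wave over a two kilometre deep ocean); it produces numbers of the same rough size as those quoted in the preceding paragraph.

```python
import math

g = 9.8        # acceleration due to gravity (m/s^2)
b0 = 2000.0    # initial ocean depth (m)
A0 = 1.0       # initial tsunami amplitude (m)

# Deep-water propagation speed from the shallow water approximation (1)
v0 = math.sqrt(g * b0)
print(f"deep-water speed: {v0:.0f} m/s, i.e. about {v0 * 3.6:.0f} km/h")

# Green's law (2): the amplitude grows like b^{-1/4} as the depth b decreases
def amplitude(b):
    return A0 * (b0 / b) ** 0.25

# Depth at which amplitude and depth become comparable: A0^{4/5} b0^{1/5}
b_crit = A0 ** 0.8 * b0 ** 0.2
print(f"amplitude ~ depth at about {b_crit:.1f} m "
      f"(Green's law gives amplitude {amplitude(b_crit):.1f} m there),"
      f" still moving at about {math.sqrt(g * b_crit):.1f} m/s")
```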

While tsunamis are far too massive an event to control (at least in the deep ocean), we can at least model them mathematically, allowing one to predict their impact at various places along the coast with high accuracy. (For instance, here is a video of the NOAA’s model of the March 11 tsunami, which has matched up very well with subsequent measurements.) The full equations and numerical methods used in such models are somewhat sophisticated, but by making a large number of simplifying assumptions, it is relatively easy to come up with a rough model that already predicts the basic features of tsunami propagation, such as the velocity formula (1) and the amplitude proportionality law (2). I give this (standard) derivation below the fold. The argument will largely be heuristic in nature; there are very interesting analytic issues in actually justifying many of the steps below rigorously, but I will not discuss these matters here.

This week I am at the American Institute of Mathematics, as an organiser of a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when ${A, B}$ are invertible square matrices of the same size, ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
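
As an aside, both identities are easy to check numerically; here is a small sketch using numpy (the dimensions ${n=6}$, ${k=2}$ and the random complex entries are arbitrary choices, purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))  # n x k
B = rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))  # k x n

# det(1 + AB) is an n x n determinant, det(1 + BA) only a k x k one
lhs = np.linalg.det(np.eye(n) + A @ B)
rhs = np.linalg.det(np.eye(k) + B @ A)
print(np.allclose(lhs, rhs))  # True

# The rescaled variant det(z + AB) = z^{n-k} det(z + BA)
z = 1.7 - 0.3j
lhs_z = np.linalg.det(z * np.eye(n) + A @ B)
rhs_z = z ** (n - k) * np.linalg.det(z * np.eye(k) + B @ A)
print(np.allclose(lhs_z, rhs_z))  # True
```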

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i < (1-\epsilon) n}$ for some fixed ${\epsilon > 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and standard deviation ${\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl}/\sqrt{n})}}$ is the mean eigenvalue spacing.
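
For concreteness, here is a short numerical sketch that computes a few of these classical locations by bisection; the closed-form antiderivative of ${\rho_{sc}}$ used below is a routine calculus computation, and the choice ${n=1000}$ is arbitrary.

```python
import numpy as np

# Semicircle CDF: F(s) = integral of rho_sc(x) from -2 to s, in closed form
def F(s):
    return 0.5 + s * np.sqrt(4 - s**2) / (4 * np.pi) + np.arcsin(s / 2) / np.pi

# Classical location gamma_i^cl: solve F(s) = i/n on [-2, 2], then rescale by sqrt(n)
def classical_location(i, n, tol=1e-12):
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) < i / n else (lo, mid)
    return np.sqrt(n) * 0.5 * (lo + hi)

n = 1000
for i in (1, 250, 500, 750, 999):
    print(i, classical_location(i, n))   # e.g. i = n/2 gives ~0 by symmetry
```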

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, which involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here, there is a surprising phenomenon: there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.
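
One can see this long-range correlation numerically without much effort. The following Monte Carlo sketch (in numpy) samples GUE matrices with the normalisation described above and estimates both the trace variance and the correlation between ${\lambda_i}$ and ${\lambda_{i+m}}$ for a few gaps ${m}$; the matrix size and number of samples are arbitrary (and far too small to see the precise ${1-\theta}$ asymptotics), so one should only expect the qualitative trend of slowly decaying correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 500
eigs = np.empty((trials, n))

for t in range(trials):
    # GUE: N(0,1)_C strictly upper-triangular entries, N(0,1)_R diagonal entries
    G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    H = (G + G.conj().T) / np.sqrt(2)
    eigs[t] = np.linalg.eigvalsh(H)          # eigenvalues, sorted increasingly

# Trace = sum of n iid N(0,1)_R diagonal entries, so its variance should be ~ n
print("trace variance / n:", eigs.sum(axis=1).var() / n)

# Correlation between lambda_i and lambda_{i+m}: decays slowly as the gap m grows
i = n // 4
for m in (1, 10, 100):
    c = np.corrcoef(eigs[:, i], eigs[:, i + m])[0, 1]
    print(f"corr(lambda_{i}, lambda_{i + m}) = {c:.2f}")
```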

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.

As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.

Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and supremas, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.

Hans Lindblad and I have just uploaded to the arXiv our joint paper “Asymptotic decay for a one-dimensional nonlinear wave equation“, submitted to Analysis & PDE.  This paper is, to our knowledge, the first to analyse the asymptotic behaviour of the one-dimensional defocusing nonlinear wave equation

${}-u_{tt}+u_{xx} = |u|^{p-1} u$ (1)

where $u: {\bf R} \times {\bf R} \to {\bf R}$ is the solution and $p>1$ is a fixed exponent.  Nowadays, this type of equation is considered a very simple example of a non-linear wave equation (there is only one spatial dimension, the equation is semilinear, the conserved energy is positive definite and coercive, and there are no derivatives in the nonlinear term), and indeed it is not difficult to show that any solution whose conserved energy

$E[u] := \int_{{\bf R}} \frac{1}{2} |u_t|^2 + \frac{1}{2} |u_x|^2 + \frac{1}{p+1} |u|^{p+1}\ dx$

is finite, will exist globally for all time (and remain finite energy, of course).  In particular, from the one-dimensional Gagliardo-Nirenberg inequality (a variant of the Sobolev embedding theorem), such solutions will remain uniformly bounded in $L^\infty_x({\bf R})$ for all time.
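
As an aside, this global-in-time boundedness is easy to observe numerically. Here is a crude finite-difference sketch (a standard second-order leapfrog scheme on a large periodic interval; the exponent, grid, and gaussian initial data are all arbitrary choices) which tracks the supremum norm and a discretisation of the conserved energy; it is only meant to illustrate the bounded, globally defined dynamics, and of course proves nothing.

```python
import numpy as np

# Crude explicit scheme for -u_tt + u_xx = |u|^{p-1} u on a large periodic interval
p = 3.0
L, N = 100.0, 2000                     # half-length of the interval, number of grid points
x = np.linspace(-L, L, N, endpoint=False)
dx = x[1] - x[0]
dt = 0.5 * dx                          # CFL-stable time step

lap = lambda w: (np.roll(w, -1) - 2 * w + np.roll(w, 1)) / dx**2

u_prev = np.exp(-x**2)                 # initial position: a gaussian bump
v0 = np.zeros_like(x)                  # initial velocity: zero
# One Taylor step to get the solution at time dt
u = u_prev + dt * v0 + 0.5 * dt**2 * (lap(u_prev) - np.abs(u_prev)**(p - 1) * u_prev)

def energy(u_new, u_old):
    ut = (u_new - u_old) / dt
    ux = (np.roll(u_new, -1) - np.roll(u_new, 1)) / (2 * dx)
    return np.sum(0.5 * ut**2 + 0.5 * ux**2 + np.abs(u_new)**(p + 1) / (p + 1)) * dx

for step in range(1, 4001):
    u_next = 2 * u - u_prev + dt**2 * (lap(u) - np.abs(u)**(p - 1) * u)
    u_prev, u = u, u_next
    if step % 1000 == 0:
        print(f"t = {step * dt:6.1f}   sup|u| = {np.abs(u).max():.3f}"
              f"   energy ~ {energy(u, u_prev):.4f}")
```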

However, this leaves open the question of the asymptotic behaviour of such solutions in the limit as $t \to \infty$.  In higher dimensions, there are a variety of scattering and asymptotic completeness results which show that solutions to nonlinear wave equations such as (1) decay asymptotically in various senses, at least if one is in the perturbative regime in which the solution is assumed small in some sense (e.g. small energy).  For instance, a typical result might be that spatial norms such as $\|u(t)\|_{L^q({\bf R})}$ might go to zero (in an average sense, at least).   In general, such results for nonlinear wave equations are ultimately based on the fact that the linear wave equation in higher dimensions also enjoys an analogous decay as $t \to +\infty$, as linear waves in higher dimensions spread out and disperse over time.  (This can be formalised by decay estimates on the fundamental solution of the linear wave equation, or by basic estimates such as the (long-time) Strichartz estimates and their relatives.)  The idea is then to view the nonlinear wave equation as a perturbation of the linear one.

On the other hand, the solution to the linear one-dimensional wave equation

$-u_{tt} + u_{xx} = 0$ (2)

does not exhibit any decay in time; as one learns in an undergraduate PDE class, the general (finite energy) solution to such an equation is given by the superposition of two travelling waves,

$u(t,x) = f(x+t) + g(x-t)$ (3)

where $f$ and $g$ also have finite energy, so in particular norms such as $\|u(t)\|_{L^\infty_x({\bf R})}$ cannot decay to zero as $t \to \infty$ unless the solution is completely trivial.

Nevertheless, we were able to establish a nonlinear decay effect for equation (1), caused more by the nonlinear right-hand side of (1) than by the linear left-hand side, to obtain $L^\infty_x({\bf R})$ decay on the average:

Theorem 1. (Average $L^\infty_x$ decay) If $u$ is a finite energy solution to (1), then $\frac{1}{2T} \int_{-T}^T \|u(t)\|_{L^\infty_x({\bf R})}\ dt$ tends to zero as $T \to \infty$.

Actually we prove a slightly stronger statement than Theorem 1, in that the decay is uniform among all solutions with a given energy bound, but I will stick to the above formulation of the main result for simplicity.

Informally, the reason for the nonlinear decay is as follows.  The linear evolution tries to force waves to move at constant velocity (indeed, from (3) we see that linear waves move at the speed of light $c=1$).  But the defocusing nature of the nonlinearity will spread out any wave that is propagating along a constant velocity worldline.  This intuition can be formalised by a Morawetz-type energy estimate that shows that the nonlinear potential energy must decay along any rectangular slab of spacetime (that represents the neighbourhood of a constant velocity worldline).

Now, just because the linear wave equation propagates along constant velocity worldlines, this does not mean that the nonlinear wave equation does too; one could imagine that a wave packet could propagate along a more complicated trajectory $t \mapsto x(t)$ in which the velocity $x'(t)$ is not constant.  However, energy methods still force the solution of the nonlinear wave equation to obey finite speed of propagation, which in the wave packet context means (roughly speaking) that the nonlinear trajectory $t \mapsto x(t)$ is a Lipschitz continuous function (with Lipschitz constant at most $1$).

And now we deploy a trick which appears to be new to the field of nonlinear wave equations: we invoke the Rademacher differentiation theorem (or Lebesgue differentiation theorem), which asserts that Lipschitz continuous functions are almost everywhere differentiable.  (By coincidence, I am teaching this theorem in my current course, both in one dimension (which is the case of interest here) and in higher dimensions.)  A compactness argument allows one to extract a quantitative estimate from this theorem (cf. this earlier blog post of mine) which, roughly speaking, tells us that there are large portions of the trajectory $t \mapsto x(t)$ which behave approximately linearly at an appropriate scale.  This turns out to be a good enough control on the trajectory that one can apply the Morawetz inequality and rule out the existence of persistent wave packets over long periods of time, which is what leads to Theorem 1.

There is still scope for further work to be done on the asymptotics.  In particular, we still do not have a good understanding of what the asymptotic profile of the solution should be, even in the perturbative regime; standard nonlinear geometric optics methods do not appear to work very well due to the extremely weak decay.

In my previous post, I briefly discussed the work of the four Fields medalists of 2010 (Lindenstrauss, Ngo, Smirnov, and Villani). In this post I will discuss the work of Dan Spielman (winner of the Nevanlinna prize), Yves Meyer (winner of the Gauss prize), and Louis Nirenberg (winner of the Chern medal). Again by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise, so I will be able to discuss a sample contribution for each of them. Again, my choice of contribution is somewhat idiosyncratic and is not intended to represent the “best” work of each of the awardees.

As is now widely reported, the Fields medals for 2010 have been awarded to Elon Lindenstrauss, Ngo Bao Chau, Stas Smirnov, and Cedric Villani. Concurrently, the Nevanlinna prize (for outstanding contributions to mathematical aspects of information science) was awarded to Dan Spielman, the Gauss prize (for outstanding mathematical contributions that have found significant applications outside of mathematics) to Yves Meyer, and the Chern medal (for lifelong achievement in mathematics) to Louis Nirenberg. All of the recipients are of course exceptionally qualified for, and deserving of, these awards; congratulations to all of them. (I should mention that I myself was only very tangentially involved in the awards selection process, and like everyone else, had to wait until the ceremony to find out the winners. I imagine that the work of the prize committees must have been extremely difficult.)

Today, I thought I would mention one result of each of the Fields medalists; by chance, three of the four medalists work in areas reasonably close to my own. (Ngo is rather more distant from my areas of expertise, but I will give it a shot anyway.) This will of course only be a tiny sample of each of their work, and I do not claim to be necessarily describing their “best” achievement, as I only know a portion of the research of each of them, and my selection choice may be somewhat idiosyncratic. (I may discuss the work of Spielman, Meyer, and Nirenberg in a later post.)

A recurring theme in mathematics is that of duality: a mathematical object ${X}$ can either be described internally (or in physical space, or locally), by describing what ${X}$ physically consists of (or what kind of maps exist into ${X}$), or externally (or in frequency space, or globally), by describing what ${X}$ globally interacts or resonates with (or what kind of maps exist out of ${X}$). These two fundamentally opposed perspectives on the object ${X}$ are often dual to each other in various ways: performing an operation on ${X}$ may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being an “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:

1. Vector space duality A vector space ${V}$ over a field ${F}$ can be described either by the set of vectors inside ${V}$, or dually by the set of linear functionals ${\lambda: V \rightarrow F}$ from ${V}$ to the field ${F}$ (or equivalently, the set of vectors inside the dual space ${V^*}$). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
2. Vector subspace duality In a similar spirit, a subspace ${W}$ of ${V}$ can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement ${W^\perp := \{ \lambda \in V^*: \lambda(w)=0 \hbox{ for all } w \in W \}}$). Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
3. Convex duality More generally, a (closed, bounded) convex body ${K}$ in a vector space ${V}$ can be described either by listing a set of (extreme) points whose convex hull is ${K}$, or else by listing a set of (irreducible) linear inequalities that cut out ${K}$. The fundamental connection between the two is given by the Farkas lemma.
4. Ideal-variety duality In a slightly different direction, an algebraic variety ${V}$ in an affine space ${A^n}$ can be viewed either “in physical space” or “internally” as a collection of points in ${V}$, or else “in frequency space” or “externally” as a collection of polynomials on ${A^n}$ whose simultaneous zero locus cuts out ${V}$. The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
5. Hilbert space duality An element ${v}$ in a Hilbert space ${H}$ can either be thought of in physical space as a vector in that space, or in momentum space as a covector ${w \mapsto \langle v, w \rangle}$ on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
6. Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
7. Intrinsic-extrinsic duality A (Riemannian) manifold ${M}$ can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives include the Nash embedding theorem and the theorema egregium.
8. Group duality A group ${G}$ can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
9. Pontryagin group duality A (locally compact Hausdorff) abelian group ${G}$ can be described either by listing its elements ${g \in G}$, or by listing the characters ${\chi: G \rightarrow {\bf R}/{\bf Z}}$ (i.e. continuous homomorphisms from ${G}$ to the unit circle, or equivalently elements of ${\hat G}$). The connection between the two is the focus of abstract harmonic analysis.
10. Pontryagin subgroup duality A subgroup ${H}$ of a locally compact abelian group ${G}$ can be described either by generators in ${H}$, or generators in the orthogonal complement ${H^\perp := \{ \xi \in \hat G: \xi \cdot h = 0 \hbox{ for all } h \in H \}}$. One of the fundamental connections between the two is the Poisson summation formula.
11. Fourier duality A (sufficiently nice) function ${f: G \rightarrow {\bf C}}$ on a locally compact abelian group ${G}$ (equipped with a Haar measure ${\mu}$) can either be described in physical space (by its values ${f(x)}$ at each element ${x}$ of ${G}$) or in frequency space (by the values ${\hat f(\xi) = \int_G f(x) e( - \xi \cdot x )\ d\mu(x)}$ at elements ${\xi}$ of the Pontryagin dual ${\hat G}$). The fundamental connection between the two is the Fourier inversion formula.
12. The uncertainty principle The behaviour of a function ${f}$ at physical scales above (resp. below) a certain scale ${R}$ is almost completely controlled by the behaviour of its Fourier transform ${\hat f}$ at frequency scales below (resp. above) the dual scale ${1/R}$ and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
13. Stone/Gelfand duality A (locally compact Hausdorff) topological space ${X}$ can be viewed in physical space (as a collection of points), or dually, via the ${C^*}$ algebra ${C(X)}$ of continuous complex-valued functions on that space, or (in the case when ${X}$ is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of ${C(X)}$). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.

I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, that describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle – but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:

1. A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
2. A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
3. Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
4. Projecting a function to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
5. The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simultaneously localised to both very fine scales and very low frequencies.
6. To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
7. To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
8. Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
9. Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
10. The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
11. If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
12. If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
13. Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
14. In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
15. Etc., etc.
16. Almost all of the above statements generalise to other locally compact abelian groups than ${{\bf R}}$ or ${{\bf R}^n}$, in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)

I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is at the centre of this cloud, but is by no means the only aspect of it.
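
As a small concrete illustration of the inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$, the following numpy sketch measures the spatial and frequency spreads of gaussians of various widths via the discrete Fourier transform (the grid parameters are arbitrary, and the frequency variable is measured in cycles per unit length, so in this normalisation the product comes out to a fixed constant of about ${1/4\pi}$ rather than ${1}$): squeezing the gaussian in physical space broadens it by exactly the dual amount in frequency space.

```python
import numpy as np

N = 4096
x = np.linspace(-50, 50, N, endpoint=False)
dx = x[1] - x[0]
xi = np.fft.fftshift(np.fft.fftfreq(N, d=dx))     # frequency variable dual to x

def spread(grid, weights):
    # standard deviation of `grid` with respect to the (normalised) weights
    w = weights / weights.sum()
    mean = (grid * w).sum()
    return np.sqrt(((grid - mean) ** 2 * w).sum())

for sigma in (0.25, 1.0, 4.0):
    f = np.exp(-x**2 / (2 * sigma**2))            # gaussian of width sigma
    fhat = np.fft.fftshift(np.fft.fft(f))
    dX, dXi = spread(x, np.abs(f)**2), spread(xi, np.abs(fhat)**2)
    print(f"sigma = {sigma}: dx = {dX:.3f}, dxi = {dXi:.3f}, product = {dX * dXi:.3f}")
```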

The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, to the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; understanding this principle is a key piece of intuition in the subject that one has to internalise before one can really get to grips with this subject (and also with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present this principle in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of this principle.

A (smooth) Riemannian manifold is a smooth manifold ${M}$ without boundary, equipped with a Riemannian metric ${{\rm g}}$, which assigns a length ${|v|_{{\rm g}(x)} \in {\bf R}^+}$ to every tangent vector ${v \in T_x M}$ at a point ${x \in M}$, and more generally assigns an inner product

$\displaystyle \langle v, w \rangle_{{\rm g}(x)} \in {\bf R}$

to every pair of tangent vectors ${v, w \in T_x M}$ at a point ${x \in M}$. (We use Roman font for ${g}$ here, as we will need to use ${g}$ to denote group elements later in this post.) This inner product is assumed to be symmetric, positive definite, and smoothly varying in ${x}$, and the length is then given in terms of the inner product by the formula

$\displaystyle |v|_{{\rm g}(x)}^2 := \langle v, v \rangle_{{\rm g}(x)}.$

In coordinates (and also using abstract index notation), the metric ${{\rm g}}$ can be viewed as an invertible symmetric rank ${(0,2)}$ tensor ${{\rm g}_{ij}(x)}$, with

$\displaystyle \langle v, w \rangle_{{\rm g}(x)} = {\rm g}_{ij}(x) v^i w^j.$

One can also view the Riemannian metric as providing a (self-adjoint) identification between the tangent bundle ${TM}$ of the manifold and the cotangent bundle ${T^* M}$; indeed, every tangent vector ${v \in T_x M}$ is then identified with the cotangent vector ${\iota_{TM \rightarrow T^* M}(v) \in T_x^* M}$, defined by the formula

$\displaystyle \iota_{TM \rightarrow T^* M}(v)(w) := \langle v, w \rangle_{{\rm g}(x)}.$

In coordinates, ${\iota_{TM \rightarrow T^* M}(v)_i = {\rm g}_{ij} v^j}$.

A fundamental dynamical system on the tangent bundle (or equivalently, the cotangent bundle, using the above identification) of a Riemannian manifold is that of geodesic flow. Recall that geodesics are smooth curves ${\gamma: [a,b] \rightarrow M}$ that minimise the length

$\displaystyle |\gamma| := \int_a^b |\gamma'(t)|_{{\rm g}(\gamma(t))}\ dt.$

There is some degeneracy in this definition, because one can reparameterise the curve ${\gamma}$ without affecting the length. In order to fix this degeneracy (and also because the square of the speed is a more tractable quantity analytically than the speed itself), it is better if one replaces the length with the energy

$\displaystyle E(\gamma) := \frac{1}{2} \int_a^b |\gamma'(t)|_{{\rm g}(\gamma(t))}^2\ dt.$

Minimising the energy of a parameterised curve ${\gamma}$ turns out to be the same as minimising the length, together with an additional requirement that the speed ${|\gamma'(t)|_{{\rm g}(\gamma(t))}}$ stay constant in time. Minimisers (and more generally, critical points) of the energy functional (holding the endpoints fixed) are known as geodesics. From a physical perspective, geodesic flow governs the motion of a particle that is subject to no external forces and thus moves freely, save for the constraint that it must always lie on the manifold ${M}$.

One can also view geodesic flows as a dynamical system on the tangent bundle (with the state at any time ${t}$ given by the position ${\gamma(t) \in M}$ and the velocity ${\gamma'(t) \in T_{\gamma(t)} M}$) or on the cotangent bundle (with the state then given by the position ${\gamma(t) \in M}$ and the momentum ${\iota_{TM \rightarrow T^* M}( \gamma'(t) ) \in T_{\gamma(t)}^* M}$). With the latter perspective (sometimes referred to as cogeodesic flow), geodesic flow becomes a Hamiltonian flow, with Hamiltonian ${H: T^* M \rightarrow {\bf R}}$ given as

$\displaystyle H( x, p ) := \frac{1}{2} \langle p, p \rangle_{{\rm g}(x)^{-1}} = \frac{1}{2} {\rm g}^{ij}(x) p_i p_j$

where ${\langle ,\rangle_{{\rm g}(x)^{-1}}: T^*_x M \times T^*_x M \rightarrow {\bf R}}$ is the inverse inner product to ${\langle, \rangle_{{\rm g}(x)}: T_x M \times T_x M \rightarrow {\bf R}}$, which can be defined for instance by the formula

$\displaystyle \langle p_1, p_2 \rangle_{{\rm g}(x)^{-1}} = \langle \iota_{TM \rightarrow T^* M}^{-1}(p_1), \iota_{TM \rightarrow T^* M}^{-1}(p_2)\rangle_{{\rm g}(x)}.$

In coordinates, geodesic flow is given by Hamilton’s equations of motion

$\displaystyle \frac{d}{dt} x^i = {\rm g}^{ij} p_j; \quad \frac{d}{dt} p_i = - \frac{1}{2} (\partial_i {\rm g}^{jk}(x)) p_j p_k.$

In terms of the velocity ${v^i := \frac{d}{dt} x^i = {\rm g}^{ij} p_j}$, we can rewrite these equations as the geodesic equation

$\displaystyle \frac{d}{dt} v^i = - \Gamma^i_{jk} v^j v^k$

where

$\displaystyle \Gamma^i_{jk} = \frac{1}{2} {\rm g}^{im} (\partial_k {\rm g}_{mj} + \partial_j {\rm g}_{mk} - \partial_m {\rm g}_{jk} )$

are the Christoffel symbols; using the Levi-Civita connection ${\nabla}$, this can be written more succinctly as

$\displaystyle (\gamma^* \nabla)_t v = 0.$
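
To make the geodesic equation concrete, here is a short numerical sketch (in Python) for the round metric on the unit ${2}$-sphere in spherical coordinates ${(\theta,\phi)}$, where ${{\rm g} = \hbox{diag}(1, \sin^2\theta)}$; the Christoffel symbols written in the comments are obtained from the formula above, and the initial data and step size are arbitrary. A simple Runge-Kutta integration of ${\frac{d}{dt} v^i = - \Gamma^i_{jk} v^j v^k}$ then conserves the speed ${|\gamma'|_{\rm g}}$, as it should.

```python
import numpy as np

# Geodesic flow on the round unit 2-sphere in coordinates (theta, phi),
# with metric g = diag(1, sin^2 theta).  The nonzero Christoffel symbols are
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta),
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta),
# so the geodesic equation dv^i/dt = -Gamma^i_{jk} v^j v^k reads as below.

def rhs(state):
    th, ph, vth, vph = state
    ath = np.sin(th) * np.cos(th) * vph**2
    aph = -2.0 * (np.cos(th) / np.sin(th)) * vth * vph
    return np.array([vth, vph, ath, aph])

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([1.0, 0.0, 0.2, 0.7])    # arbitrary initial point and velocity
speed0 = np.sqrt(state[2]**2 + np.sin(state[0])**2 * state[3]**2)
for _ in range(1000):
    state = rk4_step(state, 0.01)
th, ph, vth, vph = state
speed = np.sqrt(vth**2 + np.sin(th)**2 * vph**2)   # |v|_g, conserved along geodesics
print(speed0, speed)                                # the two agree
```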

If the manifold ${M}$ is an embedded submanifold of a larger Euclidean space ${{\bf R}^n}$, with the metric ${{\rm g}}$ on ${M}$ being induced from the standard metric on ${{\bf R}^n}$, then the geodesic flow equation can be rewritten in the equivalent form

$\displaystyle \gamma''(t) \perp T_{\gamma(t)} M,$

where ${\gamma}$ is now viewed as taking values in ${{\bf R}^n}$, and ${T_{\gamma(t)} M}$ is similarly viewed as a subspace of ${{\bf R}^n}$. This is intuitively obvious from the geometric interpretation of geodesics: if the curvature of a curve ${\gamma}$ contains components that are tangential to the manifold rather than normal to it, then it is geometrically clear that one should be able to shorten the curve by shifting it along the indicated tangential direction. It is an instructive exercise to rigorously formulate the above intuitive argument. This fact also conforms well with one’s physical intuition of geodesic flow as the motion of a free particle constrained to be in ${M}$; the normal quantity ${\gamma''(t)}$ then corresponds to the centripetal force necessary to keep the particle lying in ${M}$ (otherwise it would fly off along a tangent line to ${M}$, as per Newton’s first law). The precise value of the normal vector ${\gamma''(t)}$ can be computed via the second fundamental form as ${\gamma''(t) = \Pi_{\gamma(t)}( \gamma'(t), \gamma'(t) )}$, but we will not need this formula here.

In a beautiful paper from 1966, Vladimir Arnold (who, sadly, passed away last week), observed that many basic equations in physics, including the Euler equations of motion of a rigid body, and also (by what is, a priori, a remarkable coincidence) the Euler equations of fluid dynamics of an inviscid incompressible fluid, can be viewed (formally, at least) as geodesic flows on a (finite or infinite dimensional) Riemannian manifold. And not just any Riemannian manifold: the manifold is a Lie group (or, to be truly pedantic, a torsor of that group), equipped with a right-invariant (or left-invariant, depending on one’s conventions) metric. In the context of rigid bodies, the Lie group is the group ${SE(3) = {\bf R}^3 \rtimes SO(3)}$ of rigid motions; in the context of incompressible fluids, it is the group ${Sdiff({\bf R}^3)}$ of measure-preserving diffeomorphisms. The right-invariance makes the Hamiltonian mechanics of geodesic flow in this context (where it is sometimes known as the Euler-Arnold equation or the Euler-Poisson equation) quite special; it becomes (formally, at least) completely integrable, and also indicates (in principle, at least) a way to reformulate these equations in a Lax pair formulation. And indeed, many further completely integrable equations, such as the Korteweg-de Vries equation, have since been reinterpreted as Euler-Arnold flows.

From a physical perspective, this all fits well with the interpretation of geodesic flow as the free motion of a system subject only to a physical constraint, such as rigidity or incompressibility. (I do not know, though, of a similarly intuitive explanation as to why the Korteweg de Vries equation is a geodesic flow.)

One consequence of being a completely integrable system is that one has a large number of conserved quantities. In the case of the Euler equations of motion of a rigid body, the conserved quantities are the linear and angular momentum (as observed in an external reference frame, rather than the frame of the object). In the case of the two-dimensional Euler equations, the conserved quantities are the pointwise values of the vorticity (as viewed in Lagrangian coordinates, rather than Eulerian coordinates). In higher dimensions, the conserved quantity is now the (Hodge star of) the vorticity, again viewed in Lagrangian coordinates. The vorticity itself then evolves by the vorticity equation, and is subject to vortex stretching as the diffeomorphism between the initial and final state becomes increasingly sheared.

The elegant Euler-Arnold formalism is reasonably well-known in some circles (particularly in Lagrangian and symplectic dynamics, where it can be viewed as a special case of the Euler-Poincaré formalism or Lie-Poisson formalism respectively), but not in others; I for instance was only vaguely aware of it until recently, and I think that even in fluid mechanics this perspective to the subject is not always emphasised. Given the circumstances, I thought it would therefore be appropriate to present Arnold’s original 1966 paper here. (For a more modern treatment of these topics, see the books of Arnold-Khesin and Marsden-Ratiu.)

In order to avoid technical issues, I will work formally, ignoring questions of regularity or integrability, and pretending that infinite-dimensional manifolds behave in exactly the same way as their finite-dimensional counterparts. In the finite-dimensional setting, it is not difficult to make all of the formal discussion below rigorous; but the situation in infinite dimensions is substantially more delicate. (Indeed, it is a notorious open problem whether the Euler equations for incompressible fluids even forms a global continuous flow in a reasonable topology in the first place!) However, I do not want to discuss these analytic issues here; see this paper of Ebin and Marsden for a treatment of these topics.

Semilinear dispersive and wave equations, of which the defocusing nonlinear wave equation

$\displaystyle -\partial_{tt} u + \Delta u = |u|^{p-1} u \ \ \ \ \ (1)$

is a typical example (where ${p>1}$ is a fixed exponent, and ${u: {\bf R}^{1+n} \rightarrow {\bf R}}$ is a scalar field), can be viewed as a “tug of war” between a linear dispersive equation, in this case the linear wave equation

$\displaystyle -\partial_{tt} u + \Delta u = 0 \ \ \ \ \ (2)$

and a nonlinear ODE, in this case the equation

$\displaystyle -\partial_{tt} u = |u|^{p-1} u. \ \ \ \ \ (3)$

If the nonlinear term was not present, leaving only the dispersive equation (2), then as the term “dispersive” suggests, in the asymptotic limit ${t \rightarrow \infty}$, the solution ${u(t,x)}$ would spread out in space and decay in amplitude. For instance, in the model case when ${n=3}$ and the initial position ${u(0,x)}$ vanishes (leaving only the initial velocity ${u_t(0,x)}$ as non-trivial initial data), the solution ${u(t,x)}$ for ${t>0}$ is given by the formula

$\displaystyle u(t,x) = \frac{1}{4\pi t} \int_{|y-x|=t} u_t(0,y)\ d\sigma$

where ${d\sigma}$ is surface measure on the sphere ${\{ y \in {\bf R}^3: |y-x| = t \}}$. (To avoid technical issues, let us restrict attention to classical (smooth) solutions.) Thus, if the initial velocity was bounded and compactly supported, then the solution ${u(t,x)}$ would be bounded by ${O(1/t)}$ and would thus decay uniformly to zero as ${t \rightarrow \infty}$. Similar phenomena occur for all dimensions greater than ${1}$.

Conversely, if the dispersive term was not present, leaving only the ODE (3), then one no longer expects decay; indeed, given the conserved energy ${\frac{1}{2} u_t^2 + \frac{1}{p+1} |u|^{p+1}}$ for the ODE (3), we do not expect any decay at all (and indeed, solutions are instead periodic in time for each fixed ${x}$, as can easily be seen by viewing the ODE (and the energy curves) in phase space).

Depending on the relative “size” of the dispersive term ${\Delta u}$ and the nonlinear term ${|u|^{p-1} u}$, one can heuristically describe the behaviour of a solution ${u}$ at various positions and times as either being dispersion dominated (in which ${|\Delta u| \gg |u|^p}$), nonlinearity dominated (in which ${|u|^p \gg |\Delta u|}$), or contested (in which ${|\Delta u|}$, ${|u|^p}$ are comparable in size). Very roughly speaking, when one is in the dispersion dominated regime, then perturbation theory becomes effective, and one can often show that the solution to the nonlinear equation indeed behaves like the solution to the linear counterpart, in particular exhibiting decay as ${t \rightarrow \infty}$. In principle, perturbation theory is also available in the nonlinearity dominated regime (in which the dispersion is now viewed as the perturbation, and the nonlinearity as the main term), but in practice this is often difficult to apply (due to the nonlinearity of the approximating equation and the large number of derivatives present in the perturbative term), and so one has to fall back on non-perturbative tools, such as conservation laws and monotonicity formulae. The contested regime is the most interesting, and gives rise to intermediate types of behaviour that are not present in the purely dispersive or purely nonlinear equations, such as solitary wave solutions (solitons) or solutions that blow up in finite time.
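
For instance (anticipating the heuristics given below the fold), for a solution that locally resembles a wave of amplitude ${A}$ and frequency ${N}$, one heuristically has ${|\Delta u| \sim N^2 A}$ and ${|u|^p \sim A^p}$, so that, roughly speaking, dispersion dominates when ${N^2 \gg A^{p-1}}$, the nonlinearity dominates when ${A^{p-1} \gg N^2}$, and the contested regime corresponds to ${A^{p-1} \sim N^2}$.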

In order to analyse how solutions behave in each of these regimes rigorously, one usually works with a variety of function spaces (such as Lebesgue spaces ${L^p}$ and Sobolev spaces ${H^s}$). As such, one generally needs to first establish a number of function space estimates (e.g. Sobolev inequalities, Hölder-type inequalities, Strichartz estimates, etc.) in order to study these equations at the formal level.

Unfortunately, this emphasis on function spaces and their estimates can obscure the underlying physical intuition behind the dynamics of these equations, and the field of analysis of PDE sometimes acquires a reputation for being unduly technical as a consequence. However, as noted in a previous blog post, one can view function space norms as a way to formalise the intuitive notions of the “height” (amplitude) and “width” (wavelength) of a function (wave).

It turns out that one can similarly analyse the behaviour of nonlinear dispersive equations on a similar heuristic level, as that of understanding the dynamics as the amplitude ${A(t)}$ and wavelength ${1/N(t)}$ (or frequency ${N(t)}$) of a wave. Below the fold I give some examples of this heuristic; for sake of concreteness I restrict attention to the nonlinear wave equation (1), though one can of course extend this heuristic to many other models also. Rigorous analogues of the arguments here can be found in several places, such as the book of Shatah and Struwe, or my own book on the subject.

Our study of random matrices, to date, has focused on somewhat general ensembles, such as iid random matrices or Wigner random matrices, in which the distribution of the individual entries of the matrices was essentially arbitrary (as long as certain moments, such as the mean and variance, were normalised). In these notes, we now focus on two much more special, and much more symmetric, ensembles:

• The Gaussian Unitary Ensemble (GUE), which is an ensemble of random ${n \times n}$ Hermitian matrices ${M_n}$ in which the strictly upper-triangular entries are iid with distribution ${N(0,1)_{\bf C}}$, and the diagonal entries are iid with distribution ${N(0,1)_{\bf R}}$, and independent of the upper-triangular ones; and
• The Gaussian random matrix ensemble, which is an ensemble of random ${n \times n}$ (non-Hermitian) matrices ${M_n}$ whose entries are iid with distribution ${N(0,1)_{\bf C}}$.

The symmetric nature of these ensembles will allow us to compute the spectral distribution by exact algebraic means, revealing a surprising connection with orthogonal polynomials and with determinantal processes. This will, for instance, recover the semi-circular law for GUE, but will also reveal fine spacing information, such as the distribution of the gap between adjacent eigenvalues, which is largely out of reach of tools such as the Stieltjes transform method and the moment method (although the moment method, with some effort, is able to control the extreme edges of the spectrum).

Similarly, we will see for the first time the circular law for eigenvalues of non-Hermitian matrices.
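
As a quick empirical preview of both laws, here is a small numpy sketch (the matrix size is arbitrary) which samples one matrix from each of the two ensembles above and inspects the support of the spectrum after rescaling by ${\sqrt{n}}$; the GUE eigenvalues should essentially fill the interval ${[-2,2]}$, and the eigenvalues of the non-Hermitian gaussian matrix should essentially fill the unit disk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# GUE sample: N(0,1)_C strictly upper-triangular entries, N(0,1)_R diagonal entries
G = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
H = (G + G.conj().T) / np.sqrt(2)
lam = np.linalg.eigvalsh(H) / np.sqrt(n)
print(lam.min(), lam.max())      # roughly -2 and +2: the semicircle support

# Non-Hermitian gaussian sample: all entries iid N(0,1)_C
M = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
z = np.linalg.eigvals(M) / np.sqrt(n)
print(np.abs(z).max())           # roughly 1: the circular law support
```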

There are a number of other highly symmetric ensembles which can also be treated by the same methods, most notably the Gaussian Orthogonal Ensemble (GOE) and the Gaussian Symplectic Ensemble (GSE). However, for simplicity we shall focus just on the above two ensembles. For a systematic treatment of these ensembles, see the text by Deift.