You are currently browsing the category archive for the ‘math.AP’ category.

Let ${L: H \rightarrow H}$ be a self-adjoint operator on a finite-dimensional Hilbert space ${H}$. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence ${\lambda_1,\ldots,\lambda_n \in {\bf R}}$ of eigenvalues and an orthonormal basis ${e_1,\ldots,e_n}$ of eigenfunctions such that ${L e_i = \lambda_i e_i}$ for all ${i=1,\ldots,n}$. In particular, given any function ${m: \sigma(L) \rightarrow {\bf C}}$ on the spectrum ${\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}}$ of ${L}$, one can then define the linear operator ${m(L): H \rightarrow H}$ by the formula

$\displaystyle m(L) e_i := m(\lambda_i) e_i,$

which then gives a functional calculus, in the sense that the map ${m \mapsto m(L)}$ is a ${C^*}$-algebra isometric homomorphism from the algebra ${BC(\sigma(L) \rightarrow {\bf C})}$ of bounded continuous functions from ${\sigma(L)}$ to ${{\bf C}}$, to the algebra ${B(H \rightarrow H)}$ of bounded linear operators on ${H}$. Thus, for instance, one can define heat operators ${e^{-tL}}$ for ${t>0}$, Schrödinger operators ${e^{itL}}$ for ${t \in {\bf R}}$, resolvents ${\frac{1}{L-z}}$ for ${z \not \in \sigma(L)}$, and (if ${L}$ is positive) wave operators ${e^{it\sqrt{L}}}$ for ${t \in {\bf R}}$. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with ${L}$ positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

$\displaystyle u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)$

the Schrödinger equation

$\displaystyle u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)$

the wave equation

$\displaystyle u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)$

or the Helmholtz equation

$\displaystyle (L-z) u = f. \ \ \ \ \ (4)$

The functional calculus can also be associated to a spectral measure. Indeed, for any vectors ${f, g \in H}$, there is a complex measure ${\mu_{f,g}}$ on ${\sigma(L)}$ with the property that

$\displaystyle \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);$

indeed, one can set ${\mu_{f,g}}$ to be the discrete measure on ${\sigma(L)}$ defined by the formula

$\displaystyle \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.$

One can also view this complex measure as a coefficient

$\displaystyle \mu_{f,g} = \langle \mu f, g \rangle_H$

of a projection-valued measure ${\mu}$ on ${\sigma(L)}$, defined by setting

$\displaystyle \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.$

Finally, one can view ${L}$ as unitarily equivalent to a multiplication operator ${M: f \mapsto g f}$ on ${\ell^2(\{1,\ldots,n\})}$, where ${g}$ is the real-valued function ${g(i) := \lambda_i}$, and the intertwining map ${U: \ell^2(\{1,\ldots,n\}) \rightarrow H}$ is given by

$\displaystyle U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,$

so that ${L = U M U^{-1}}$.

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space ${H}$, so long as one one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators ${L: H \rightarrow H}$. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators ${L: D \rightarrow H}$, which (initially, at least), are only defined on a dense subspace ${D}$ of the Hilbert space ${H}$. A very typical situation arises when ${H = L^2(\Omega)}$ is the square-integrable functions on some domain or manifold ${\Omega}$ (which may have a boundary or be otherwise “incomplete”), and ${D = C^\infty_c(\Omega)}$ are the smooth compactly supported functions on ${\Omega}$, and ${L}$ is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one build operators such as ${e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}}$ or to solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator ${L: D \rightarrow H}$ must be imposed. The most obvious is that of symmetry, which asserts that

$\displaystyle \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)$

for all ${f, g \in D}$. In some applications, one also wants to impose positive definiteness, which asserts that

$\displaystyle \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)$

for all ${f \in D}$. These hypotheses are sufficient in the case when ${L}$ is bounded, and in particular when ${H}$ is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents ${(L-z)^{-1}}$ are well-defined for any strictly complex ${z}$, which by duality implies that the image of ${L-z}$ should be dense in ${H}$. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when ${H}$ is the Hilbert space ${H := L^2((0,1))}$, ${D := C^\infty_c((0,1))}$ is the space of test functions, and ${L}$ is the one-dimensional Laplacian ${L := -\frac{d^2}{dx^2}}$. Then ${L}$ is symmetric and positive, but the operator ${L-k^2}$ does not have dense image for any complex ${k}$, since

$\displaystyle \langle (L-\overline{k}^2) f, e^{\overline{k}x} \rangle_H = 0$

for all test functions ${f \in C^\infty_c((0,1))}$, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).

Another example occurs when ${H := L^2((0,+\infty))}$, ${D := C^\infty_c((0,+\infty))}$, ${L}$ is the momentum operator ${L := i \frac{d}{dx}}$. Then the resolvent ${(L-z)^{-1}}$ can be uniquely defined for ${z}$ in the upper half-plane, but not in the lower half-plane, due to the obstruction

$\displaystyle \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0$

for all test functions ${f}$ (note that the function ${e^{i\bar{z} x}}$ lies in ${L^2((0,+\infty))}$ when ${z}$ is in the lower half-plane). For related reasons, the translation operators ${e^{itL}}$ have a problem with either uniqueness or existence (depending on whether ${t}$ is positive or negative), due to the unspecified boundary behaviour at the origin.

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once ${L}$ is essentially self-adjoint, then spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positive condition (6), it is not a “local” condition that can be easily verified just by testing ${L}$ on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of a partial converses to the spectral theorem, which roughly speaking asserts that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator ${L}$, then ${L}$ is self-adjoint. Examples of “expected consequences” include:

• Existence of resolvents ${(L-z)^{-1}}$ (or equivalently, dense image for ${L-z}$);
• Existence of a contractive heat propagator semigroup ${e^{tL}}$ (in the positive case);
• Existence of a unitary Schrödinger propagator group ${e^{itL}}$;
• Existence of a unitary wave propagator group ${e^{it\sqrt{L}}}$ (in the positive case);
• Existence of a “reasonable” functional calculus.
• Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator ${L = -\Delta_g}$ on a smooth Riemannian manifold ${(M,g)}$ (using ${C^\infty_c(M)}$ as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains ${(0,1)}$ and ${(0,+\infty)}$ in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as ${SL_d({\bf Z})}$ or ${SL_d({\bf R})}$, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.

The definition of quasirandomness is easy enough to state:

Definition 1 (Quasirandom groups) Let ${G}$ be a finite group, and let ${D \geq 1}$. We say that ${G}$ is ${D}$-quasirandom if all non-trivial unitary representations ${\rho: G \rightarrow U(H)}$ of ${G}$ have dimension at least ${D}$. (Recall a representation is trivial if ${\rho(g)}$ is the identity for all ${g \in G}$.)

Exercise 1 Let ${G}$ be a finite group, and let ${D \geq 1}$. A unitary representation ${\rho: G \rightarrow U(H)}$ is said to be irreducible if ${H}$ has no ${G}$-invariant subspaces other than ${\{0\}}$ and ${H}$. Show that ${G}$ is ${D}$-quasirandom if and only if every non-trivial irreducible representation of ${G}$ has dimension at least ${D}$.

Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group ${SL_2({\bf R})}$ to obtain mixing estimates in that group.)

Quasirandomness behaves fairly well with respect to quotients and short exact sequences:

Exercise 2 Let ${0 \rightarrow H \rightarrow G \rightarrow K \rightarrow 0}$ be a short exact sequence of finite groups ${H,G,K}$.

• (i) If ${G}$ is ${D}$-quasirandom, show that ${K}$ is ${D}$-quasirandom also. (Equivalently: any quotient of a ${D}$-quasirandom finite group is again a ${D}$-quasirandom finite group.)
• (ii) Conversely, if ${H}$ and ${K}$ are both ${D}$-quasirandom, show that ${G}$ is ${D}$-quasirandom also. (In particular, the direct or semidirect product of two ${D}$-quasirandom finite groups is again a ${D}$-quasirandom finite group.)

Informally, we will call ${G}$ quasirandom if it is ${D}$-quasirandom for some “large” ${D}$, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “${D \geq |G|^c}$ for some constant ${c>0}$ independent of the size of ${G}$“, but other regimes of ${D}$ are certainly of interest.

The way we have set things up, the trivial group ${G = \{1\}}$ is infinitely quasirandom (i.e. it is ${D}$-quasirandom for every ${D}$). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:

Exercise 3 Let ${D \geq 1}$, and let ${G}$ be a finite ${D}$-quasirandom group.

• (i) Show that if ${G}$ is non-trivial, then ${|G| \geq D+1}$. (Hint: use the mean zero component ${\tau\downharpoonright_{\ell^2(G)_0}}$ of the regular representation ${\tau: G \rightarrow U(\ell^2(G))}$.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
• (ii) Show that any proper subgroup ${H}$ of ${G}$ has index ${[G:H] \geq D+1}$. (Hint: use the mean zero component of the quasiregular representation.)

The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:

Exercise 4 (Quasirandomness, abelianness, and perfection) Let ${G}$ be a finite group.

• (i) If ${G}$ is abelian and non-trivial, show that ${G}$ is not ${2}$-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
• (ii) Show that ${G}$ is ${2}$-quasirandom if and only if it is perfect, i.e. the commutator group ${[G,G]}$ is equal to ${G}$. (Equivalently, ${G}$ is ${2}$-quasirandom if and only if it has no non-trivial abelian quotients.)

Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.

Exercise 5 Let ${G}$ be a finite ${D}$-quasirandom group. Show that for any subgroup ${G'}$ of ${G}$, ${G'}$ is ${D/[G:G']}$-quasirandom, where ${[G:G'] := |G|/|G'|}$ is the index of ${G'}$ in ${G}$. (Hint: use induced representations.)

Now we give an example of a more quasirandom group.

Lemma 2 (Frobenius lemma) If ${F_p}$ is a field of some prime order ${p}$, then ${SL_2(F_p)}$ is ${\frac{p-1}{2}}$-quasirandom.

This should be compared with the cardinality ${|SL_2(F_p)|}$ of the special linear group, which is easily computed to be ${(p^2-1) \times p = p^3 - p}$.

Proof: We may of course take ${p}$ to be odd. Suppose for contradiction that we have a non-trivial representation ${\rho: SL_2(F_p) \rightarrow U_d({\bf C})}$ on a unitary group of some dimension ${d}$ with ${d < \frac{p-1}{2}}$. Set ${a}$ to be the group element

$\displaystyle a := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$

and suppose first that ${\rho(a)}$ is non-trivial. Since ${a^p=1}$, we have ${\rho(a)^p=1}$; thus all the eigenvalues of ${\rho(a)}$ are ${p^{th}}$ roots of unity. On the other hand, by conjugating ${a}$ by diagonal matrices in ${SL_2(F_p)}$, we see that ${a}$ is conjugate to ${a^m}$ (and hence ${\rho(a)}$ conjugate to ${\rho(a)^m}$) whenever ${m}$ is a quadratic residue mod ${p}$. As such, the eigenvalues of ${\rho(a)}$ must be permuted by the operation ${x \mapsto x^m}$ for any quadratic residue mod ${p}$. Since ${\rho(a)}$ has at least one non-trivial eigenvalue, and there are ${\frac{p-1}{2}}$ distinct quadratic residues, we conclude that ${\rho(a)}$ has at least ${\frac{p-1}{2}}$ distinct eigenvalues. But ${\rho(a)}$ is a ${d \times d}$ matrix with ${d < \frac{p-1}{2}}$, a contradiction. Thus ${a}$ lies in the kernel of ${\rho}$. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate ${SL_2(F_p)}$ (see exercise below), and so ${\rho}$ is trivial, a contradiction. $\Box$

Exercise 6 Show that for any prime ${p}$, the unipotent matrices

$\displaystyle \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}$

for ${t}$ ranging over ${F_p}$ generate ${SL_2(F_p)}$ as a group.

Exercise 7 Let ${G}$ be a finite group, and let ${D \geq 1}$. If ${G}$ is generated by a collection ${G_1,\ldots,G_k}$ of ${D}$-quasirandom subgroups, show that ${G}$ is itself ${D}$-quasirandom.

Exercise 8 Show that ${SL_d(F_p)}$ is ${\frac{p-1}{2}}$-quasirandom for any ${d \geq 2}$ and any prime ${p}$. (This is not sharp; the optimal bound here is ${\gg_d p^{d-1}}$, which follows from the results of Landazuri and Seitz.)

As a corollary of the above results and Exercise 2, we see that the projective special linear group ${PSL_d(F_p)}$ is also ${\frac{p-1}{2}}$-quasirandom.

Remark 2 One can ask whether the bound ${\frac{p-1}{2}}$ in Lemma 2 is sharp, assuming of course that ${p}$ is odd. Noting that ${SL_2(F_p)}$ acts linearly on the plane ${F_p^2}$, we see that it also acts projectively on the projective line ${PF_p^1 := (F_p^2 \backslash \{0\}) / F_p^\times}$, which has ${p+1}$ elements. Thus ${SL_2(F_p)}$ acts via the quasiregular representation on the ${p+1}$-dimensional space ${\ell^2(PF_p^1)}$, and also on the ${p}$-dimensional subspace ${\ell^2(PF_p^1)_0}$; this latter representation (known as the Steinberg representation) is irreducible. This shows that the ${\frac{p-1}{2}}$ bound cannot be improved beyond ${p}$. More generally, given any character ${\chi: F_p^\times \rightarrow S^1}$, ${SL_2(F_p)}$ acts on the ${p+1}$-dimensional space ${V_\chi}$ of functions ${f \in \ell^2( F_p^2 \backslash \{0\} )}$ that obey the twisted dilation invariance ${f(tx) = \chi(t) f(x)}$ for all ${t \in F_p^\times}$ and ${x \in F_p^2 \backslash \{0\}}$; these are known as the principal series representations. When ${\chi}$ is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when ${\chi}$ is the quadratic representation (thus taking values in ${\{-1,+1\}}$ while being non-trivial), the principal series representation splits into the direct sum of two ${\frac{p+1}{2}}$-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed ${F_p}$ in a quadratic extension ${F_{p^2}}$ and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension ${\frac{p-1}{2}}$, showing that the bound in Lemma 2 is in fact exactly sharp. These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Luzstig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).

Exercise 9 Let ${p}$ be an odd prime. Show that for any ${n \geq p+2}$, the alternating group ${A_n}$ is ${p-1}$-quasirandom. (Hint: show that all cycles of order ${p}$ in ${A_n}$ are conjugate to each other in ${A_n}$ (and not just in ${S_n}$); in particular, a cycle is conjugate to its ${j^{th}}$ power for all ${j=1,\ldots,p-1}$. Also, as ${n \geq 5}$, ${A_n}$ is simple, and so the cycles of order ${p}$ generate the entire group.)

Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that ${A_n}$ is ${n-1}$-quasirandom for ${n \geq 6}$ (but is only ${3}$-quasirandom for ${n=5}$ due to icosahedral symmetry, and ${1}$-quasirandom for ${n \leq 4}$ due to lack of perfectness). Using Exercise 3 with the index ${n}$ subgroup ${A_{n-1}}$, we see that the bound ${n-1}$ cannot be improved. Thus, ${A_n}$ (for large ${n}$) is not as quasirandom as the special linear groups ${SL_d(F_p)}$ (for ${p}$ large and ${d}$ bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.

If one replaces the alternating group ${A_n}$ with the slightly larger symmetric group ${S_n}$, then quasirandomness is destroyed (since ${S_n}$, having the abelian quotient ${S_n/A_n}$, is not perfect); indeed, ${S_n}$ is ${1}$-quasirandom and no better.

Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group ${{\bf Z}/p{\bf Z}}$, an alternating group ${A_n}$, or is a finite simple group of Lie type such as ${PSL_d(F_p)}$. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance ${PSL_d(F_p)}$ is constructed from ${SL_d}$ in characteristic ${p}$.) In the case of finite simple groups ${G}$ of Lie type with bounded rank ${r=O(1)}$, it is known from the work of Landazuri and Seitz that such groups are ${\gg |G|^c}$-quasirandom for some ${c>0}$ depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group ${G}$ is ${|G|^c}$-quasirandom for some ${c>0}$ and ${|G|}$ is sufficiently large depending on ${c}$, then ${G}$ is a finite simple group of Lie type with rank ${O_c(1)}$. It would be of interest to see if there was an alternate way to establish this fact that did not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).

A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in ${SL_2({\bf Z}/N{\bf Z})}$ (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in ${SL_d({\bf Z}/N{\bf Z})}$ for any fixed ${d \geq 3}$).

A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem“, discussed in this previous blog post.  As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity.  But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions.   As a consequence, some of the implications stated in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion.   I have completed (and proofread) an updated version of the ms, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link).  (In the meantime, it is probably best not to read the original ms too carefully, as this could lead to some confusion.)   I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth $H^1$ initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.

Let me now describe the issue in more detail (and also to explain why I missed it previously).  A standard principle in the theory of evolutionary partial differentiation equations is that regularity in space can be used to imply regularity in time.  To illustrate this, consider a solution $u$ to the supercritical nonlinear wave equation

$-\partial_{tt} u + \Delta u = u^7$  (1)

for some field $u: {\bf R} \times {\bf R}^3 \to {\bf R}$.   Suppose one already knew that $u$ had some regularity in space, and in particular the $C^0_t C^2_x \cap C^1_t C^1_x$ norm of $u$ was bounded (thus $u$ and up to two spatial derivatives of $u$ were bounded).  Then, by (1), we see that two time derivatives of $u$ were also bounded, and one then gets the additional regularity of $C^2_t C^0_x$.

In a similar vein, suppose one initially knew that $u$ had the regularity $C^0_t C^3_x \cap C^1_t C^2_x$.  Then (1) soon tells us that $u$ also has the regularity $C^2_t C^1_x$; then, if one differentiates (1) in time to obtain

$-\partial_{ttt} u + \Delta \partial_t u = 7 u^6 \partial_t u$

one can conclude that $u$ also has the regularity of $C^3_t C^0_x$.  One can continue this process indefinitely; in particular, if one knew that $u \in C^0_t C^\infty_x \cap C^1_t C^\infty_x$, then these sorts of manipulations show that $u$ is infinitely smooth in both space and time.

The issue that caught me by surprise is that for the Navier-Stokes equations

$\partial_t u + (u \cdot \nabla) u =\Delta u -\nabla p$  (2)

$\nabla \cdot u = 0$

(setting the forcing term $f$ equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space $H^1_x({\bf R}^3)$.  The problem lies with the pressure term $p$, which is recovered from the velocity via the elliptic equation

$\Delta p = -\nabla^2 \cdot (u \otimes u)$ (3)

that can be obtained by taking the divergence of (2).   This equation is solved by a non-local integral operator:

$\displaystyle p(t,x) = \int_{{\bf R}^3} \frac{\nabla^2 \cdot (u \otimes u)(t,y)}{4\pi |x-y|}\ dy.$

If, say, $u$ lies in $H^1_x({\bf R}^3)$, then there is no difficulty establishing a bound on $p$ in terms of $u$ (for instance, one can use singular integral theory and Sobolev embedding to place $p$ in $L^3_x({\bf R}^3)$.  However, one runs into difficulty when trying to compute time derivatives of $p$.  Differentiating (3) once, one gets

$\Delta \partial_t p = -2\nabla^2 \cdot (u \otimes \partial_t u)$.

At the regularity of $H^1$, one can still (barely) control this quantity by using (2) to expand out $\partial_t u$ and using some integration by parts.  But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form

$\Delta \partial_{tt} p = -4\nabla^2 \cdot (\Delta u \otimes \Delta u) + \ldots$

and now there is not enough regularity on $u$ available to get any control on $\partial_{tt} p$, even if one assumes that $u$ is smooth.   Indeed, following this observation, I was able to show that given generic smooth $H^1$ data, the pressure $p$ will instantaneously fail to be $C^2$ in time, and thence (by (2)) the velocity will instantaneously fail to be $C^3$ in time.  (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity $\omega$ will fail to be $C^4$ in time.  Switching to material coordinates seems to makes things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)

For later times t>0 (and assuming homogeneous data f=0 for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade $H^1_x$ regularity to $H^\infty_x$ regularity instantaneously.  It is only the initial time at which some time irregularity can occur.

This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing).  In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild $H^1$ solution) and the data is Schwartz, then the solution will be smooth in spacetime.   (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)

This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth $H^1$ data) is not correct as stated.  But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time.  More precisely, call a solution $u: [0,T] \times {\bf R}^3 \to {\bf R}^3$ and $p: [0,T] \times {\bf R}^3 \to {\bf R}$ almost smooth if

• $u$ and $p$ are smooth on the half-open slab $(0,T] \times {\bf R}^3$; and
• For every $k \geq 0$, $\nabla^k_x u, \nabla^k_x p, \nabla^x_u \partial_t u$ exist and are continuous on the full slab $[0,T] \times {\bf R}^3$.

Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only $C^1_t C^\infty_x$, and the pressure field is only $C^0_t C^\infty_x$.  This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.

(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth $H^1$ setting also.)

One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth $H^1$ or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth.  Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth $H^1$ or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).

The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:

I’ve just uploaded to the arXiv my paper “Localisation and compactness properties of the Navier-Stokes global regularity problem“, submitted to Analysis and PDE. This paper concerns the global regularity problem for the Navier-Stokes system of equations

$\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p + f \ \ \ \ \ (1)$

$\displaystyle \nabla \cdot u = 0 \ \ \ \ \ (2)$

$\displaystyle u(0,\cdot) = u_0 \ \ \ \ \ (3)$

in three dimensions. Thus, we specify initial data ${(u_0,f,T)}$, where ${0 < T < \infty}$ is a time, ${u_0: {\bf R}^3 \rightarrow {\bf R}^3}$ is the initial velocity field (which, in order to be compatible with (2), (3), is required to be divergence-free), ${f: [0,T] \times {\bf R}^3 \rightarrow {\bf R}^3}$ is the forcing term, and then seek to extend this initial data to a solution ${(u,p,u_0,f,T)}$ with this data, where the velocity field ${u: [0,T] \times {\bf R}^3 \rightarrow {\bf R}^3}$ and pressure term ${p: [0,T] \times {\bf R}^3 \rightarrow {\bf R}}$ are the unknown fields.

Roughly speaking, the global regularity problem asserts that given every smooth set of initial data ${(u_0,f,T)}$, there exists a smooth solution ${(u,p,u_0,f,T)}$ to the Navier-Stokes equation with this data. However, this is not a good formulation of the problem because it does not exclude the possibility that one or more of the fields ${u_0, f, u, p}$ grows too fast at spatial infinity. This problem is evident even for the much simpler heat equation

$\displaystyle \partial_t u = \Delta u$

$\displaystyle u(0,\cdot) = u_0.$

As long as one has some mild conditions at infinity on the smooth initial data ${u_0: {\bf R}^3 \rightarrow {\bf R}}$ (e.g. polynomial growth at spatial infinity), then one can solve this equation using the fundamental solution of the heat equation:

$\displaystyle u(t,x) = \frac{1}{(4\pi t)^{3/2}} \int_{{\bf R}^3} u_0(y) e^{-|x-y|^2/4t}\ dy.$

If furthermore ${u}$ is a tempered distribution, one can use Fourier-analytic methods to show that this is the unique solution to the heat equation with this data. But once one allows sufficiently rapid growth at spatial infinity, existence and uniqueness can break down. Consider for instance the backwards heat kernel

$\displaystyle u(t,x) = \frac{1}{(4\pi(T-t))^{3/2}} e^{|x|^2/(T-t)}$

for some ${T>0}$, which is smooth (albeit exponentially growing) at time zero, and is a smooth solution to the heat equation for ${0 \leq t < T}$, but develops a dramatic singularity at time ${t=T}$. A famous example of Tychonoff from 1935, based on a power series construction, also shows that uniqueness for the heat equation can also fail once growth conditions are removed. An explicit example of non-uniqueness for the heat equation is given by the contour integral

$\displaystyle u(t,x_1,x_2,x_3) = \int_\gamma \exp(e^{\pi i/4} x_1 z + e^{5\pi i/8} z^{3/2} - itz^2)\ dz$

where ${\gamma}$ is the ${L}$-shaped contour consisting of the positive real axis and the upper imaginary axis, with ${z^{3/2}}$ being interpreted with the standard branch (with cut on the negative axis). One can show by contour integration that this function solves the heat equation and is smooth (but rapidly growing at infinity), and vanishes for ${t<0}$, but is not identically zero for ${t>0}$.

Thus, in order to obtain a meaningful (and physically realistic) problem, one needs to impose some decay (or at least limited growth) hypotheses on the data ${u_0,f}$ and solution ${u,p}$ in addition to smoothness. For the data, one can impose a variety of such hypotheses, including the following:

• (Finite energy data) One has ${\|u_0\|_{L^2_x({\bf R}^3)} < \infty}$ and ${\| f \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^3)} < \infty}$.
• (${H^1}$ data) One has ${\|u_0\|_{H^1_x({\bf R}^3)} < \infty}$ and ${\| f \|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)} < \infty}$.
• (Schwartz data) One has ${\sup_{x \in {\bf R}^3} ||x|^m \nabla_x^k u_0(x)| < \infty}$ and ${\sup_{(t,x) \in [0,T] \times {\bf R}^3} ||x|^m \nabla_x^k \partial_t^l f(t,x)| < \infty}$ for all ${m,k,l \geq 0}$.
• (Periodic data) There is some ${0 < L < \infty}$ such that ${u_0(x+Lk) = u_0(x)}$ and ${f(t,x+Lk) = f(t,x)}$ for all ${(t,x) \in [0,T] \times {\bf R}^3}$ and ${k \in {\bf Z}^3}$.
• (Homogeneous data) ${f=0}$.

Note that smoothness alone does not necessarily imply finite energy, ${H^1}$, or the Schwartz property. For instance, the (scalar) function ${u(x) = \exp( i |x|^{10} ) (1+|x|)^{-2}}$ is smooth and finite energy, but not in ${H^1}$ or Schwartz. Periodicity is of course incompatible with finite energy, ${H^1}$, or the Schwartz property, except in the trivial case when the data is identically zero.

Similarly, one can impose conditions at spatial infinity on the solution, such as the following:

• (Finite energy solution) One has ${\| u \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^3)} < \infty}$.
• (${H^1}$ solution) One has ${\| u \|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)} < \infty}$ and ${\| u \|_{L^2_t H^2_x([0,T] \times {\bf R}^3)} < \infty}$.
• (Partially periodic solution) There is some ${0 < L < \infty}$ such that ${u(t,x+Lk) = u(t,x)}$ for all ${(t,x) \in [0,T] \times {\bf R}^3}$ and ${k \in {\bf Z}^3}$.
• (Fully periodic solution) There is some ${0 < L < \infty}$ such that ${u(t,x+Lk) = u(t,x)}$ and ${p(t,x+Lk) = p(t,x)}$ for all ${(t,x) \in [0,T] \times {\bf R}^3}$ and ${k \in {\bf Z}^3}$.

(The ${L^2_t H^2_x}$ component of the ${H^1}$ solution is for technical reasons, and should not be paid too much attention for this discussion.) Note that we do not consider the notion of a Schwartz solution; as we shall see shortly, this is too restrictive a concept of solution to the Navier-Stokes equation.

Finally, one can downgrade the regularity of the solution down from smoothness. There are many ways to do so; two such examples include

• (${H^1}$ mild solutions) The solution is not smooth, but is ${H^1}$ (in the preceding sense) and solves the equation (1) in the sense that the Duhamel formula

$\displaystyle u(t) = e^{t\Delta} u_0 + \int_0^t e^{(t-t')\Delta} (-(u\cdot\nabla) u-\nabla p+f)(t')\ dt'$

holds.

• (Leray-Hopf weak solution) The solution ${u}$ is not smooth, but lies in ${L^\infty_t L^2_x \cap L^2_t H^1_x}$, solves (1) in the sense of distributions (after rewriting the system in divergence form), and obeys an energy inequality.

Finally, one can ask for two types of global regularity results on the Navier-Stokes problem: a qualitative regularity result, in which one merely provides existence of a smooth solution without any explicit bounds on that solution, and a quantitative regularity result, which provides bounds on the solution in terms of the initial data, e.g. a bound of the form

$\displaystyle \| u \|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)} \leq F( \|u_0\|_{H^1_x({\bf R}^3)} + \|f\|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)}, T )$

for some function ${F: {\bf R}^+ \times {\bf R}^+ \rightarrow {\bf R}^+}$. One can make a further distinction between local quantitative results, in which ${F}$ is allowed to depend on ${T}$, and global quantitative results, in which there is no dependence on ${T}$ (the latter is only reasonable though in the homogeneous case, or if ${f}$ has some decay in time).

By combining these various hypotheses and conclusions, we see that one can write down quite a large number of slightly different variants of the global regularity problem. In the official formulation of the regularity problem for the Clay Millennium prize, a positive correct solution to either of the following two problems would be accepted for the prize:

• Conjecture 1.4 (Qualitative regularity for homogeneous periodic data) If ${(u_0,0,T)}$ is periodic, smooth, and homogeneous, then there exists a smooth partially periodic solution ${(u,p,u_0,0,T)}$ with this data.
• Conjecture 1.3 (Qualitative regularity for homogeneous Schwartz data) If ${(u_0,0,T)}$ is Schwartz and homogeneous, then there exists a smooth finite energy solution ${(u,p,u_0,0,T)}$ with this data.

(The numbering here corresponds to the numbering in the paper.)

Furthermore, a negative correct solution to either of the following two problems would also be accepted for the prize:

• Conjecture 1.6 (Qualitative regularity for periodic data) If ${(u_0,f,T)}$ is periodic and smooth, then there exists a smooth partially periodic solution ${(u,p,u_0,f,T)}$ with this data.
• Conjecture 1.5 (Qualitative regularity for Schwartz data) If ${(u_0,f,T)}$ is Schwartz, then there exists a smooth finite energy solution ${(u,p,u_0,f,T)}$ with this data.

I am not announcing any major progress on these conjectures here. What my paper does study, though, is the question of whether the answer to these conjectures is somehow sensitive to the choice of formulation. For instance:

1. Note in the periodic formulations of the Clay prize problem that the solution is only required to be partially periodic, rather than fully periodic; thus the pressure has no periodicity hypothesis. One can ask the extent to which the above problems change if one also requires pressure periodicity.
2. In another direction, one can ask the extent to which quantitative formulations of the Navier-Stokes problem are stronger than their qualitative counterparts; in particular, whether it is possible that each choice of initial data in a certain class leads to a smooth solution, but with no uniform bound on that solution in terms of various natural norms of the data.
3. Finally, one can ask the extent to which the conjecture depends on the category of data. For instance, could it be that global regularity is true for smooth periodic data but false for Schwartz data? True for Schwartz data but false for smooth ${H^1}$ data? And so forth.

One motivation for the final question (which was posed to me by my colleague, Andrea Bertozzi) is that the Schwartz property on the initial data ${u_0}$ tends to be instantly destroyed by the Navier-Stokes flow. This can be seen by introducing the vorticity ${\omega := \nabla \times u}$. If ${u(t)}$ is Schwartz, then from Stokes’ theorem we necessarily have vanishing of certain moments of the vorticity, for instance:

$\displaystyle \int_{{\bf R}^3} \omega_1 (x_2^2-x_3^2)\ dx = 0.$

On the other hand, some integration by parts using (1) reveals that such moments are usually not preserved by the flow; for instance, one has the law

$\displaystyle \partial_t \int_{{\bf R}^3} \omega_1(t,x) (x_2^2-x_3^2)\ dx = 4\int_{{\bf R}^3} u_2(t,x) u_3(t,x)\ dx,$

and one can easily concoct examples for which the right-hand side is non-zero at time zero. This suggests that the Schwartz class may be unnecessarily restrictive for Conjecture 1.3 or Conjecture 1.5.

My paper arose out of an attempt to address these three questions, and ended up obtaining partial results in all three directions. Roughly speaking, the results that address these three questions are as follows:

1. (Homogenisation) If one only assumes partial periodicity instead of full periodicity, then the forcing term ${f}$ becomes irrelevant. In particular, Conjecture 1.4 and Conjecture 1.6 are equivalent.
2. (Concentration compactness) In the ${H^1}$ category (both periodic and nonperiodic, homogeneous or nonhomogeneous), the qualitative and quantitative formulations of the Navier-Stokes global regularity problem are essentially equivalent.
3. (Localisation) The (inhomogeneous) Navier-Stokes problems in the Schwartz, smooth ${H^1}$, and finite energy categories are essentially equivalent to each other, and are also implied by the (fully) periodic version of these problems.

The first two of these families of results are relatively routine, drawing on existing methods in the literature; the localisation results though are somewhat more novel, and introduce some new local energy and local enstrophy estimates which may be of independent interest.

Broadly speaking, the moral to draw from these results is that the precise formulation of the Navier-Stokes equation global regularity problem is only of secondary importance; modulo a number of caveats and technicalities, the various formulations are close to being equivalent, and a breakthrough on any one of the formulations is likely to lead (either directly or indirectly) to a comparable breakthrough on any of the others.

This is only a caricature of the actual implications, though. Below is the diagram from the paper indicating the various formulations of the Navier-Stokes equations, and the known implications between them:

The above three streams of results are discussed in more detail below the fold.

Perhaps the most fundamental differential operator on Euclidean space ${{\bf R}^d}$ is the Laplacian

$\displaystyle \Delta := \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}.$

The Laplacian is a linear translation-invariant operator, and as such is necessarily diagonalised by the Fourier transform

$\displaystyle \hat f(\xi) := \int_{{\bf R}^d} f(x) e^{-2\pi i x \cdot \xi}\ dx.$

Indeed, we have

$\displaystyle \widehat{\Delta f}(\xi) = - 4 \pi^2 |\xi|^2 \hat f(\xi)$

for any suitably nice function ${f}$ (e.g. in the Schwartz class; alternatively, one can work in very rough classes, such as the space of tempered distributions, provided of course that one is willing to interpret all operators in a distributional or weak sense).

Because of this explicit diagonalisation, it is a straightforward manner to define spectral multipliers ${m(-\Delta)}$ of the Laplacian for any (measurable, polynomial growth) function ${m: [0,+\infty) \rightarrow {\bf C}}$, by the formula

$\displaystyle \widehat{m(-\Delta) f}(\xi) := m( 4\pi^2 |\xi|^2 ) \hat f(\xi).$

(The presence of the minus sign in front of the Laplacian has some minor technical advantages, as it makes ${-\Delta}$ positive semi-definite. One can also define spectral multipliers more abstractly from general functional calculus, after establishing that the Laplacian is essentially self-adjoint.) Many of these multipliers are of importance in PDE and analysis, such as the fractional derivative operators ${(-\Delta)^{s/2}}$, the heat propagators ${e^{t\Delta}}$, the (free) Schrödinger propagators ${e^{it\Delta}}$, the wave propagators ${e^{\pm i t \sqrt{-\Delta}}}$ (or ${\cos(t \sqrt{-\Delta})}$ and ${\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}}$, depending on one’s conventions), the spectral projections ${1_I(\sqrt{-\Delta})}$, the Bochner-Riesz summation operators ${(1 + \frac{\Delta}{4\pi^2 R^2})_+^\delta}$, or the resolvents ${R(z) := (-\Delta-z)^{-1}}$.

Each of these families of multipliers are related to the others, by means of various integral transforms (and also, in some cases, by analytic continuation). For instance:

1. Using the Laplace transform, one can express (sufficiently smooth) multipliers in terms of heat operators. For instance, using the identity

$\displaystyle \lambda^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{-t\lambda}\ dt$

(using analytic continuation if necessary to make the right-hand side well-defined), with ${\Gamma}$ being the Gamma function, we can write the fractional derivative operators in terms of heat kernels:

$\displaystyle (-\Delta)^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{t\Delta}\ dt. \ \ \ \ \ (1)$

2. Using analytic continuation, one can connect heat operators ${e^{t\Delta}}$ to Schrödinger operators ${e^{it\Delta}}$, a process also known as Wick rotation. Analytic continuation is a notoriously unstable process, and so it is difficult to use analytic continuation to obtain any quantitative estimates on (say) Schrödinger operators from their heat counterparts; however, this procedure can be useful for propagating identities from one family to another. For instance, one can derive the fundamental solution for the Schrödinger equation from the fundamental solution for the heat equation by this method.
3. Using the Fourier inversion formula, one can write general multipliers as integral combinations of Schrödinger or wave propagators; for instance, if ${z}$ lies in the upper half plane ${{\bf H} := \{ z \in {\bf C}: \hbox{Im} z > 0 \}}$, one has

$\displaystyle \frac{1}{x-z} = i\int_0^\infty e^{-itx} e^{itz}\ dt$

for any real number ${x}$, and thus we can write resolvents in terms of Schrödinger propagators:

$\displaystyle R(z) = i\int_0^\infty e^{it\Delta} e^{itz}\ dt. \ \ \ \ \ (2)$

In a similar vein, if ${k \in {\bf H}}$, then

$\displaystyle \frac{1}{x^2-k^2} = \frac{i}{k} \int_0^\infty \cos(tx) e^{ikt}\ dt$

for any ${x>0}$, so one can also write resolvents in terms of wave propagators:

$\displaystyle R(k^2) = \frac{i}{k} \int_0^\infty \cos(t\sqrt{-\Delta}) e^{ikt}\ dt. \ \ \ \ \ (3)$

4. Using the Cauchy integral formula, one can express (sufficiently holomorphic) multipliers in terms of resolvents (or limits of resolvents). For instance, if ${t > 0}$, then from the Cauchy integral formula (and Jordan’s lemma) one has

$\displaystyle e^{itx} = \frac{1}{2\pi i} \lim_{\epsilon \rightarrow 0^+} \int_{\bf R} \frac{e^{ity}}{y-x+i\epsilon}\ dy$

for any ${x \in {\bf R}}$, and so one can (formally, at least) write Schrödinger propagators in terms of resolvents:

$\displaystyle e^{-it\Delta} = - \frac{1}{2\pi i} \lim_{\epsilon \rightarrow 0^+} \int_{\bf R} e^{ity} R(y+i\epsilon)\ dy. \ \ \ \ \ (4)$

5. The imaginary part of ${\frac{1}{\pi} \frac{1}{x-(y+i\epsilon)}}$ is the Poisson kernel ${\frac{\epsilon}{\pi} \frac{1}{(y-x)^2+\epsilon^2}}$, which is an approximation to the identity. As a consequence, for any reasonable function ${m(x)}$, one has (formally, at least)

$\displaystyle m(x) = \lim_{\epsilon \rightarrow 0^+} \frac{1}{\pi} \int_{\bf R} (\hbox{Im} \frac{1}{x-(y+i\epsilon)}) m(y)\ dy$

which leads (again formally) to the ability to express arbitrary multipliers in terms of imaginary (or skew-adjoint) parts of resolvents:

$\displaystyle m(-\Delta) = \lim_{\epsilon \rightarrow 0^+} \frac{1}{\pi} \int_{\bf R} (\hbox{Im} R(y+i\epsilon)) m(y)\ dy. \ \ \ \ \ (5)$

Among other things, this type of formula (with ${-\Delta}$ replaced by a more general self-adjoint operator) is used in the resolvent-based approach to the spectral theorem (by using the limiting imaginary part of resolvents to build spectral measure). Note that one can also express ${\hbox{Im} R(y+i\epsilon)}$ as ${\frac{1}{2i} (R(y+i\epsilon) - R(y-i\epsilon))}$.

Remark 1 The ability of heat operators, Schrödinger propagators, wave propagators, or resolvents to generate other spectral multipliers can be viewed as a sort of manifestation of the Stone-Weierstrass theorem (though with the caveat that the spectrum of the Laplacian is non-compact and so the Stone-Weierstrass theorem does not directly apply). Indeed, observe the *-algebra type properties

$\displaystyle e^{s\Delta} e^{t\Delta} = e^{(s+t)\Delta}; \quad (e^{s\Delta})^* = e^{s\Delta}$

$\displaystyle e^{is\Delta} e^{it\Delta} = e^{i(s+t)\Delta}; \quad (e^{is\Delta})^* = e^{-is\Delta}$

$\displaystyle e^{is\sqrt{-\Delta}} e^{it\sqrt{-\Delta}} = e^{i(s+t)\sqrt{-\Delta}}; \quad (e^{is\sqrt{-\Delta}})^* = e^{-is\sqrt{-\Delta}}$

$\displaystyle R(z) R(w) = \frac{R(w)-R(z)}{z-w}; \quad R(z)^* = R(\overline{z}).$

Because of these relationships, it is possible (in principle, at least), to leverage one’s understanding one family of spectral multipliers to gain control on another family of multipliers. For instance, the fact that the heat operators ${e^{t\Delta}}$ have non-negative kernel (a fact which can be seen from the maximum principle, or from the Brownian motion interpretation of the heat kernels) implies (by (1)) that the fractional integral operators ${(-\Delta)^{-s/2}}$ for ${s>0}$ also have non-negative kernel. Or, the fact that the wave equation enjoys finite speed of propagation (and hence that the wave propagators ${\cos(t\sqrt{-\Delta})}$ have distributional convolution kernel localised to the ball of radius ${|t|}$ centred at the origin), can be used (by (3)) to show that the resolvents ${R(k^2)}$ have a convolution kernel that is essentially localised to the ball of radius ${O( 1 / |\hbox{Im}(k)| )}$ around the origin.

In this post, I would like to continue this theme by using the resolvents ${R(z) = (-\Delta-z)^{-1}}$ to control other spectral multipliers. These resolvents are well-defined whenever ${z}$ lies outside of the spectrum ${[0,+\infty)}$ of the operator ${-\Delta}$. In the model three-dimensional case ${d=3}$, they can be defined explicitly by the formula

$\displaystyle R(k^2) f(x) = \int_{{\bf R}^3} \frac{e^{ik|x-y|}}{4\pi |x-y|} f(y)\ dy$

whenever ${k}$ lives in the upper half-plane ${\{ k \in {\bf C}: \hbox{Im}(k) > 0 \}}$, ensuring the absolute convergence of the integral for test functions ${f}$. (In general dimension, explicit formulas are still available, but involve Bessel functions. But asymptotically at least, and ignoring higher order terms, one simply replaces ${\frac{e^{ik|x-y|}}{4\pi |x-y|}}$ by ${\frac{e^{ik|x-y|}}{c_d |x-y|^{d-2}}}$ for some explicit constant ${c_d}$.) It is an instructive exercise to verify that this resolvent indeed inverts the operator ${-\Delta-k^2}$, either by using Fourier analysis or by Green’s theorem.

Henceforth we restrict attention to three dimensions ${d=3}$ for simplicity. One consequence of the above explicit formula is that for positive real ${\lambda > 0}$, the resolvents ${R(\lambda+i\epsilon)}$ and ${R(\lambda-i\epsilon)}$ tend to different limits as ${\epsilon \rightarrow 0}$, reflecting the jump discontinuity in the resolvent function at the spectrum; as one can guess from formulae such as (4) or (5), such limits are of interest for understanding many other spectral multipliers. Indeed, for any test function ${f}$, we see that

$\displaystyle \lim_{\epsilon \rightarrow 0^+} R(\lambda+i\epsilon) f(x) = \int_{{\bf R}^3} \frac{e^{i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\ dy$

and

$\displaystyle \lim_{\epsilon \rightarrow 0^+} R(\lambda-i\epsilon) f(x) = \int_{{\bf R}^3} \frac{e^{-i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\ dy.$

Both of these functions

$\displaystyle u_\pm(x) := \int_{{\bf R}^3} \frac{e^{\pm i\sqrt{\lambda}|x-y|}}{4\pi |x-y|} f(y)\ dy$

solve the Helmholtz equation

$\displaystyle (-\Delta-\lambda) u_\pm = f, \ \ \ \ \ (6)$

but have different asymptotics at infinity. Indeed, if ${\int_{{\bf R}^3} f(y)\ dy = A}$, then we have the asymptotic

$\displaystyle u_\pm(x) = \frac{A e^{\pm i \sqrt{\lambda}|x|}}{4\pi|x|} + O( \frac{1}{|x|^2}) \ \ \ \ \ (7)$

as ${|x| \rightarrow \infty}$, leading also to the Sommerfeld radiation condition

$\displaystyle u_\pm(x) = O(\frac{1}{|x|}); \quad (\partial_r \mp i\sqrt{\lambda}) u_\pm(x) = O( \frac{1}{|x|^2}) \ \ \ \ \ (8)$

where ${\partial_r := \frac{x}{|x|} \cdot \nabla_x}$ is the outgoing radial derivative. Indeed, one can show using an integration by parts argument that ${u_\pm}$ is the unique solution of the Helmholtz equation (6) obeying (8) (see below). ${u_+}$ is known as the outward radiating solution of the Helmholtz equation (6), and ${u_-}$ is known as the inward radiating solution. Indeed, if one views the function ${u_\pm(t,x) := e^{-i\lambda t} u_\pm(x)}$ as a solution to the inhomogeneous Schrödinger equation

$\displaystyle (i\partial_t + \Delta) u_\pm = - e^{-i\lambda t} f$

and using the de Broglie law that a solution to such an equation with wave number ${k \in {\bf R}^3}$ (i.e. resembling ${A e^{i k \cdot x}}$ for some amplitide ${A}$) should propagate at (group) velocity ${2k}$, we see (heuristically, at least) that the outward radiating solution will indeed propagate radially away from the origin at speed ${2\sqrt{\lambda}}$, while inward radiating solution propagates inward at the same speed.

There is a useful quantitative version of the convergence

$\displaystyle R(\lambda \pm i\epsilon) f \rightarrow u_\pm, \ \ \ \ \ (9)$

known as the limiting absorption principle:

Theorem 1 (Limiting absorption principle) Let ${f}$ be a test function on ${{\bf R}^3}$, let ${\lambda > 0}$, and let ${\sigma > 0}$. Then one has

$\displaystyle \| R(\lambda \pm i\epsilon) f \|_{H^{0,-1/2-\sigma}({\bf R}^3)} \leq C_\sigma \lambda^{-1/2} \|f\|_{H^{0,1/2+\sigma}({\bf R}^3)}$

for all ${\epsilon > 0}$, where ${C_\sigma > 0}$ depends only on ${\sigma}$, and ${H^{0,s}({\bf R}^3)}$ is the weighted norm

$\displaystyle \|f\|_{H^{0,s}({\bf R}^3)} := \| \langle x \rangle^s f \|_{L^2_x({\bf R}^3)}$

and ${\langle x \rangle := (1+|x|^2)^{1/2}}$.

This principle allows one to extend the convergence (9) from test functions ${f}$ to all functions in the weighted space ${H^{0,1/2+\sigma}}$ by a density argument (though the radiation condition (8) has to be adapted suitably for this scale of spaces when doing so). The weighted space ${H^{0,-1/2-\sigma}}$ on the left-hand side is optimal, as can be seen from the asymptotic (7); a duality argument similarly shows that the weighted space ${H^{0,1/2+\sigma}}$ on the right-hand side is also optimal.

We prove this theorem below the fold. As observed long ago by Kato (and also reproduced below), this estimate is equivalent (via a Fourier transform in the spectral variable ${\lambda}$) to a useful estimate for the free Schrödinger equation known as the local smoothing estimate, which in particular implies the well-known RAGE theorem for that equation; it also has similar consequences for the free wave equation. As we shall see, it also encodes some spectral information about the Laplacian; for instance, it can be used to show that the Laplacian has no eigenvalues, resonances, or singular continuous spectrum. These spectral facts are already obvious from the Fourier transform representation of the Laplacian, but the point is that the limiting absorption principle also applies to more general operators for which the explicit diagonalisation afforded by the Fourier transform is not available. (Igor Rodnianski and I are working on a paper regarding this topic, of which I hope to say more about soon.)

In order to illustrate the main ideas and suppress technical details, I will be a little loose with some of the rigorous details of the arguments, and in particular will be manipulating limits and integrals at a somewhat formal level.

As we are all now very much aware, tsunamis are water waves that start in the deep ocean, usually because of an underwater earthquake (though tsunamis can also be caused by underwater landslides or volcanoes), and then propagate towards shore. Initially, tsunamis have relatively small amplitude (a metre or so is typical), which would seem to render them as harmless as wind waves. And indeed, tsunamis often pass by ships in deep ocean without anyone on board even noticing.

However, being generated by an event as large as an earthquake, the wavelength of the tsunami is huge – 200 kilometres is typical (in contrast with wind waves, whose wavelengths are typically closer to 100 metres). In particular, the wavelength of the tsunami is far greater than the depth of the ocean (which is typically 2-3 kilometres). As such, even in the deep ocean, the dynamics of tsunamis are essentially governed by the shallow water equations. One consequence of these equations is that the speed of propagation ${v}$ of a tsunami can be approximated by the formula

$\displaystyle v \approx \sqrt{g b} \ \ \ \ \ (1)$

where ${b}$ is the depth of the ocean, and ${g \approx 9.8 ms^{-2}}$ is the force of gravity. As such, tsunamis in deep water move very fast – speeds such as 500 kilometres per hour (300 miles per hour) are quite typical; enough to travel from Japan to the US, for instance, in less than a day. Ultimately, this is due to the incompressibility of water (and conservation of mass); the massive net pressure (or more precisely, spatial variations in this pressure) of a very broad and deep wave of water forces the profile of the wave to move horizontally at vast speeds. (Note though that this is the phase velocity of the tsunami wave, and not the velocity of the water molecues themselves, which are far slower.)

As the tsunami approaches shore, the depth ${b}$ of course decreases, causing the tsunami to slow down, at a rate proportional to the square root of the depth, as per (1). Unfortunately, wave shoaling then forces the amplitude ${A}$ to increase at an inverse rate governed by Green’s law,

$\displaystyle A \propto \frac{1}{b^{1/4}} \ \ \ \ \ (2)$

at least until the amplitude becomes comparable to the water depth (at which point the assumptions that underlie the above approximate results break down; also, in two (horizontal) spatial dimensions there will be some decay of amplitude as the tsunami spreads outwards). If one starts with a tsunami whose initial amplitude was ${A_0}$ at depth ${b_0}$ and computes the point at which the amplitude ${A}$ and depth ${b}$ become comparable using the proportionality relationship (2), some high school algebra then reveals that at this point, amplitude of a tsunami (and the depth of the water) is about ${A_0^{4/5} b_0^{1/5}}$. Thus, for instance, a tsunami with initial amplitude of one metre at a depth of 2 kilometres can end up with a final amplitude of about 5 metres near shore, while still traveling at about ten metres per second (35 kilometres per hour, or 22 miles per hour), and we have all now seen the impact that can have when it hits shore.

While tsunamis are far too massive of an event to be able to control (at least in the deep ocean), we can at least model them mathematically, allowing one to predict their impact at various places along the coast with high accuracy. (For instance, here is a video of the NOAA’s model of the March 11 tsunami, which has matched up very well with subsequent measurements.) The full equations and numerical methods used to perform such models are somewhat sophisticated, but by making a large number of simplifying assumptions, it is relatively easy to come up with a rough model that already predicts the basic features of tsunami propagation, such as the velocity formula (1) and the amplitude proportionality law (2). I give this (standard) derivation below the fold. The argument will largely be heuristic in nature; there are very interesting analytic issues in actually justifying many of the steps below rigorously, but I will not discuss these matters here.

This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when ${A, B}$ are invertible square matrices of the same size, that ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients this already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginebre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2\sqrt{n}}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i < (1-\epsilon) n}$ for some fixed ${\epsilon > 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and variance ${\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}$ is the mean eigenvalue spacing.

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but that even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.

As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.

Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and supremas, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.

Hans Lindblad and I have just uploaded to the arXiv our joint paper “Asymptotic decay for a one-dimensional nonlinear wave equation“, submitted to Analysis & PDE.  This paper, to our knowledge, is the first paper to analyse the asymptotic behaviour of the one-dimensional defocusing nonlinear wave equation

${}-u_{tt}+u_{xx} = |u|^{p-1} u$ (1)

where $u: {\bf R} \times {\bf R} \to {\bf R}$ is the solution and $p>1$ is a fixed exponent.  Nowadays, this type of equation is considered a very simple example of a non-linear wave equation (there is only one spatial dimension, the equation is semilinear, the conserved energy is positive definite and coercive, and there are no derivatives in the nonlinear term), and indeed it is not difficult to show that any solution whose conserved energy

$E[u] := \int_{{\bf R}} \frac{1}{2} |u_t|^2 + \frac{1}{2} |u_x|^2 + \frac{1}{p+1} |u|^{p+1}\ dx$

is finite, will exist globally for all time (and remain finite energy, of course).  In particular, from the one-dimensional Gagliardo-Nirenberg inequality (a variant of the Sobolev embedding theorem), such solutions will remain uniformly bounded in $L^\infty_x({\bf R})$ for all time.

However, this leaves open the question of the asymptotic behaviour of such solutions in the limit as $t \to \infty$.  In higher dimensions, there are a variety of scattering and asymptotic completeness results which show that solutions to nonlinear wave equations such as (1) decay asymptotically in various senses, at least if one is in the perturbative regime in which the solution is assumed small in some sense (e.g. small energy).  For instance, a typical result might be that spatial norms such as $\|u(t)\|_{L^q({\bf R})}$ might go to zero (in an average sense, at least).   In general, such results for nonlinear wave equations are ultimately based on the fact that the linear wave equation in higher dimensions also enjoys an analogous decay as $t \to +\infty$, as linear waves in higher dimensions spread out and disperse over time.  (This can be formalised by decay estimates on the fundamental solution of the linear wave equation, or by basic estimates such as the (long-time) Strichartz estimates and their relatives.)  The idea is then to view the nonlinear wave equation as a perturbation of the linear one.

On the other hand, the solution to the linear one-dimensional wave equation

$-u_{tt} + u_{xx} = 0$ (2)

does not exhibit any decay in time; as one learns in an undergraduate PDE class, the general (finite energy) solution to such an equation is given by the superposition of two travelling waves,

$u(t,x) = f(x+t) + g(x-t)$ (3)

where $f$ and $g$ also have finite energy, so in particular norms such as $\|u(t)\|_{L^\infty_x({\bf R})}$ cannot decay to zero as $t \to \infty$ unless the solution is completely trivial.

Nevertheless, we were able to establish a nonlinear decay effect for equation (1), caused more by the nonlinear right-hand side of (1) than by the linear left-hand side, to obtain $L^\infty_x({\bf R})$ decay on the average:

Theorem 1. (Average $L^\infty_x$ decay) If $u$ is a finite energy solution to (1), then $\frac{1}{2T} \int_{-T}^T \|u(t)\|_{L^\infty_x({\bf R})}$ tends to zero as $T \to \infty$.

Actually we prove a slightly stronger statement than Theorem 1, in that the decay is uniform among all solutions with a given energy bound, but I will stick to the above formulation of the main result for simplicity.

Informally, the reason for the nonlinear decay is as follows.  The linear evolution tries to force waves to move at constant velocity (indeed, from (3) we see that linear waves move at the speed of light $c=1$).  But the defocusing nature of the nonlinearity will spread out any wave that is propagating along a constant velocity worldline.  This intuition can be formalised by a Morawetz-type energy estimate that shows that the nonlinear potential energy must decay along any rectangular slab of spacetime (that represents the neighbourhood of a constant velocity worldline).

Now, just because the linear wave equation propagates along constant velocity worldlines, this does not mean that the nonlinear wave equation does too; one could imagine that a wave packet could propagate along a more complicated trajectory $t \mapsto x(t)$ in which the velocity $x'(t)$ is not constant.  However, energy methods still force the solution of the nonlinear wave equation to obey finite speed of propagation, which in the wave packet context means (roughly speaking) that the nonlinear trajectory $t \mapsto x(t)$ is a Lipschitz continuous function (with Lipschitz constant at most $1$).

And now we deploy a trick which appears to be new to the field of nonlinear wave equations: we invoke the Rademacher differentiation theorem (or Lebesgue differentiation theorem), which asserts that Lipschitz continuous functions are almost everywhere differentiable.  (By coincidence, I am teaching this theorem in my current course, both in one dimension (which is the case of interest here) and in higher dimensions.)  A compactness argument allows one to extract a quantitative estimate from this theorem (cf. this earlier blog post of mine) which, roughly speaking, tells us that there are large portions of the trajectory $t \mapsto x(t)$ which behave approximately linearly at an appropriate scale.  This turns out to be a good enough control on the trajectory that one can apply the Morawetz inequality and rule out the existence of persistent wave packets over long periods of time, which is what leads to Theorem 1.

There is still scope for further work to be done on the asymptotics.  In particular, we still do not have a good understanding of what the asymptotic profile of the solution should be, even in the perturbative regime; standard nonlinear geometric optics methods do not appear to work very well due to the extremely weak decay.

In my previous post, I briefly discussed the work of the four Fields medalists of 2010 (Lindenstrauss, Ngo, Smirnov, and Villani). In this post I will discuss the work of Dan Spielman (winner of the Nevanlinna prize), Yves Meyer (winner of the Gauss prize), and Louis Nirenberg (winner of the Chern medal). Again by chance, the work of all three of the recipients overlaps to some extent with my own areas of expertise, so I will be able to discuss a sample contribution for each of them. Again, my choice of contribution is somewhat idiosyncratic and is not intended to represent the “best” work of each of the awardees.