
Mathematicians study a variety of different mathematical structures, but perhaps the structures that are most commonly associated with mathematics are the number systems, such as the integers ${{\bf Z}}$ or the real numbers ${{\bf R}}$. Indeed, the use of number systems is so closely identified with the practice of mathematics that one sometimes forgets that it is possible to do mathematics without explicit reference to any concept of number. For instance, the ancient Greeks were able to prove many theorems in Euclidean geometry, well before the development of Cartesian coordinates and analytic geometry in the seventeenth century, or the formal constructions or axiomatisations of the real number system that emerged in the nineteenth century (not to mention precursor concepts such as zero or negative numbers, whose very existence was highly controversial, if entertained at all, to the ancient Greeks). To do this, the Greeks used geometric operations as substitutes for the arithmetic operations that would be more familiar to modern mathematicians. For instance, concatenation of line segments or planar regions serves as a substitute for addition; the operation of forming a rectangle out of two line segments would serve as a substitute for multiplication; the concept of similarity can be used as a substitute for ratios or division; and so forth.

A similar situation exists in modern physics. Physical quantities such as length, mass, momentum, charge, and so forth are routinely measured and manipulated using the real number system ${{\bf R}}$ (or related systems, such as ${{\bf R}^3}$ if one wishes to measure a vector-valued physical quantity such as velocity). Much as analytic geometry allows one to use the laws of algebra and trigonometry to calculate and prove theorems in geometry, the identification of physical quantities with numbers allows one to express physical laws and relationships (such as Einstein’s famous mass-energy equivalence ${E=mc^2}$) as algebraic (or differential) equations, which can then be solved and otherwise manipulated through the extensive mathematical toolbox that has been developed over the centuries to deal with such equations.

However, as any student of physics is aware, most physical quantities are not represented purely by one or more numbers, but instead by a combination of a number and some sort of unit. For instance, it would be a category error to assert that the length of some object was a number such as ${10}$; instead, one has to say something like “the length of this object is ${10}$ yards”, combining both a number ${10}$ and a unit (in this case, the yard). Changing the unit leads to a change in the numerical value assigned to this physical quantity, even though no physical change to the object being measured has occurred. For instance, if one decides to use feet as the unit of length instead of yards, then the length of the object is now ${30}$ feet; if one instead uses metres, the length is now ${9.144}$ metres; and so forth. But nothing physical has changed when performing this change of units, and these lengths are considered all equal to each other:

$\displaystyle 10 \hbox{ yards } = 30 \hbox{ feet } = 9.144 \hbox{ metres}.$

It is then common to declare that while physical quantities and units are not, strictly speaking, numbers, they should be manipulated using the laws of algebra as if they were numerical quantities. For instance, if an object travels ${10}$ metres in ${5}$ seconds, then its speed should be

$\displaystyle (10 m) / (5 s) = 2 ms^{-1}$

where we use the usual abbreviations of ${m}$ and ${s}$ for metres and seconds respectively. Similarly, if the speed of light ${c}$ is ${c=299 792 458 ms^{-1}}$ and an object has mass ${10 kg}$, then Einstein’s mass-energy equivalence ${E=mc^2}$ tells us that the energy-content of this object is

$\displaystyle (10 kg) (299 792 458 ms^{-1})^2 \approx 8.99 \times 10^{17} kg m^2 s^{-2}.$

Note that the symbols ${kg, m, s}$ are being manipulated algebraically as if they were mathematical variables such as ${x}$ and ${y}$. By collecting all these units together, we see that every physical quantity gets assigned a unit of a certain dimension: for instance, we see here that the energy ${E}$ of an object can be given the unit of ${kg m^2 s^{-2}}$ (more commonly known as a Joule), which has the dimension of ${M L^2 T^{-2}}$ where ${M, L, T}$ are the dimensions of mass, length, and time respectively.

There is however one important limitation to the ability to manipulate “dimensionful” quantities as if they were numbers: one is not supposed to add, subtract, or compare two physical quantities if they have different dimensions, although it is acceptable to multiply or divide two such quantities. For instance, if ${m}$ is a mass (having the dimension ${M}$) and ${v}$ is a speed (having the dimension ${LT^{-1}}$), then it is physically “legitimate” to form an expression such as ${\frac{1}{2} mv^2}$, but not an expression such as ${m+v}$ or ${m-v}$; in a similar spirit, statements such as ${m=v}$ or ${m\geq v}$ are physically meaningless. This combines well with the mathematical distinction between vector, scalar, and matrix quantities, which among other things prohibits one from adding together two such quantities if their vector or matrix types are different (e.g. one cannot add a scalar to a vector, or a vector to a matrix), and also places limitations on when two such quantities can be multiplied together. A related limitation, which is not always made explicit in physics texts, is that transcendental mathematical functions such as ${\sin}$ or ${\exp}$ should only be applied to arguments that are dimensionless; thus, for instance, if ${v}$ is a speed, then ${\hbox{arctanh}(v)}$ is not physically meaningful, but ${\hbox{arctanh}(v/c)}$ is (this particular quantity is known as the rapidity associated to this speed).
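These rules are easy to mechanise. The following is only a minimal sketch, not a full units library: the class name `Quantity`, the helpers `scalar` and `apply_dimensionless`, and the choice to store a dimension as a tuple of exponents of ${(M, L, T)}$ are all illustrative assumptions of mine. Multiplication and division add or subtract the exponents, while addition and transcendental functions are refused unless the dimensions are compatible.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number together with the exponents of the base dimensions (M, L, T)."""
    value: float
    dim: tuple  # (mass exponent, length exponent, time exponent)

    def _check_same_dimension(self, other):
        if self.dim != other.dim:
            raise TypeError(f"dimension mismatch: {self.dim} vs {other.dim}")

    def __add__(self, other):
        # Addition is only allowed between quantities of the same dimension.
        self._check_same_dimension(other)
        return Quantity(self.value + other.value, self.dim)

    def __sub__(self, other):
        self._check_same_dimension(other)
        return Quantity(self.value - other.value, self.dim)

    def __mul__(self, other):
        # Multiplication is always allowed; exponents add.
        new_dim = tuple(a + b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, new_dim)

    def __truediv__(self, other):
        # Division is always allowed; exponents subtract.
        new_dim = tuple(a - b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value / other.value, new_dim)

    def __pow__(self, n):
        # Integer powers scale every exponent by n.
        return Quantity(self.value ** n, tuple(a * n for a in self.dim))


def scalar(x):
    """A dimensionless quantity."""
    return Quantity(float(x), (0, 0, 0))


def apply_dimensionless(f, q):
    """Apply a transcendental function f, but only to a dimensionless argument."""
    if any(q.dim):
        raise TypeError(f"argument has dimension {q.dim}, expected dimensionless")
    return f(q.value)


# Base units (SI), each with value 1 in its own dimension.
kilogram = Quantity(1.0, (1, 0, 0))
metre = Quantity(1.0, (0, 1, 0))
second = Quantity(1.0, (0, 0, 1))

c = scalar(299_792_458) * metre / second   # speed of light, dimension L T^-1
mass = scalar(10) * kilogram               # dimension M
energy = mass * c ** 2                     # dimension M L^2 T^-2

v = scalar(0.5) * c                        # a speed equal to half that of light
rapidity = apply_dimensionless(math.atanh, v / c)  # fine: v/c is dimensionless
```

With these definitions, `energy.dim` comes out as `(1, 2, -2)`, matching ${M L^2 T^{-2}}$, whereas `mass + c` and `apply_dimensionless(math.atanh, v)` both raise a `TypeError`.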

These limitations may seem like a weakness in the mathematical modeling of physical quantities; one may think that one could get a more “powerful” mathematical framework if one were allowed to perform dimensionally inconsistent operations, such as add together a mass and a velocity, add together a vector and a scalar, exponentiate a length, etc. Certainly there is some precedent for this in mathematics; for instance, the formalism of Clifford algebras does in fact allow one to (among other things) add vectors with scalars, and in differential geometry it is quite common to formally apply transcendental functions (such as the exponential function) to a differential form (for instance, the Liouville measure ${\frac{1}{n!} \omega^n}$ of a symplectic manifold can be usefully thought of as a component of the exponential ${\exp(\omega)}$ of the symplectic form ${\omega}$).

However, there are several reasons why it is advantageous to retain the limitation to only perform dimensionally consistent operations. One is that of error correction: one can often catch (and correct for) errors in one’s calculations by discovering a dimensional inconsistency, and tracing it back to the first step where it occurs. Also, by performing dimensional analysis, one can often identify the form of a physical law before one has fully derived it. For instance, if one postulates the existence of a mass-energy relationship involving only the mass of an object ${m}$, the energy content ${E}$, and the speed of light ${c}$, dimensional analysis is already sufficient to deduce that the relationship must be of the form ${E = \alpha mc^2}$ for some dimensionless absolute constant ${\alpha}$; the only remaining task is then to work out the constant of proportionality ${\alpha}$, which requires physical arguments beyond that provided by dimensional analysis. (This is a simple instance of a more general application of dimensional analysis known as the Buckingham ${\pi}$ theorem.)
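To spell out the bookkeeping in this example, here is a sketch under the simplifying assumption that the relationship is a power law ${E = \alpha m^a c^b}$; the exponents ${a, b}$ are introduced here purely for illustration. Comparing the dimensions of both sides gives

$\displaystyle M L^2 T^{-2} = M^a (L T^{-1})^b = M^a L^b T^{-b},$

and matching the exponents of ${M}$, ${L}$, and ${T}$ forces ${a=1}$ and ${b=2}$. Equivalently, ${E/(mc^2)}$ is the only independent dimensionless combination one can form from ${E}$, ${m}$, and ${c}$, so any dimensionally consistent relationship between these three quantities alone must assert that this ratio is a constant ${\alpha}$.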

The use of units and dimensional analysis has certainly proven to be a very effective tool in physics. But one can pose the question of whether it has a properly grounded mathematical foundation, in order to settle any lingering unease about using such tools in physics, and also in order to rigorously develop such tools for purely mathematical purposes (such as analysing identities and inequalities in such fields of mathematics as harmonic analysis or partial differential equations).

The example of Euclidean geometry mentioned previously offers one possible approach to formalising the use of dimensions. For instance, one could model the length of a line segment not by a number, but rather by the equivalence class of all line segments congruent to the original line segment (cf. the Frege-Russell definition of a number). Similarly, the area of a planar region can be modeled not by a number, but by the equivalence class of all regions that are equidecomposable with the original region (one can, if one wishes, restrict attention here to measurable sets in order to avoid Banach-Tarski-type paradoxes, though that particular paradox actually only arises in three and higher dimensions). As mentioned before, it is then geometrically natural to multiply two lengths to form an area, by taking a rectangle whose line segments have the stated lengths, and using the area of that rectangle as a product. This geometric picture works well for units such as length and volume that have a spatial geometric interpretation, but it is less clear how to apply it for more general units. For instance, it does not seem geometrically natural (or, for that matter, conceptually helpful) to envision the equation ${E=mc^2}$ as the assertion that the energy ${E}$ is the volume of a rectangular box whose height is the mass ${m}$ and whose length and width are given by the speed of light ${c}$.

But there are at least two other ways to formalise dimensionful quantities in mathematics, which I will discuss below the fold. The first is a “parametric” model in which dimensionful objects are modeled as numbers (or vectors, matrices, etc.) depending on some base dimensional parameters (such as units of length, mass, and time, or perhaps a coordinate system for space or spacetime), and transforming according to some representation of a structure group that encodes the range of these parameters; this type of “coordinate-heavy” model is often used (either implicitly or explicitly) by physicists in order to efficiently perform calculations, particularly when manipulating vector or tensor-valued quantities. The second is an “abstract” model in which dimensionful objects now live in an abstract mathematical space (e.g. an abstract vector space), in which only a subset of the operations available to general-purpose number systems such as ${{\bf R}}$ or ${{\bf R}^3}$ are available, namely those operations which are “dimensionally consistent” or invariant (or more precisely, equivariant) with respect to the action of the underlying structure group. This sort of “coordinate-free” approach tends to be the one which is preferred by pure mathematicians, particularly in the various branches of modern geometry, in part because it can lead to greater conceptual clarity, as well as results of great generality; it is also close to the more informal practice of treating mathematical manipulations that do not preserve dimensional consistency as being physically meaningless.

Things are pretty quiet here during the holiday season, but one small thing I have been working on recently is a set of notes on special relativity that I will be working through in a few weeks with some bright high school students here at our local math circle. I have only two hours to spend with this group, and it is unlikely that we will reach the end of the notes (in which I derive the famous mass-energy equivalence relation ${E=mc^2}$, largely following Einstein’s original derivation as discussed in this previous blog post); instead we will probably spend a fair chunk of time on related topics which do not actually require special relativity per se, such as spacetime diagrams, the Doppler shift effect, and an analysis of my airport puzzle. This will be my first time doing something of this sort (in which I will be spending as much time interacting directly with the students as I would lecturing); I’m not sure exactly how it will play out, being a little outside of my usual comfort zone of undergraduate and graduate teaching, but am looking forward to finding out how it goes. (In particular, it may end up that the discussion deviates somewhat from my prepared notes.)

The material covered in my notes is certainly not new, but I ultimately decided that it was worth putting up here in case some readers here had any corrections or other feedback to contribute (which, as always, would be greatly appreciated).

[Dec 24 and then Jan 21: notes updated, in response to comments.]

Way back in 2007, I wrote a blog post giving Einstein’s derivation of his famous equation ${E=mc^2}$ for the rest energy of a body with mass ${m}$. (Throughout this post, mass is used to refer to the invariant mass (also known as rest mass) of an object.) This derivation used a number of physical assumptions, including the following:

1. The two postulates of special relativity: firstly, that the laws of physics are the same in every inertial reference frame, and secondly that the speed of light in vacuum is equal to ${c}$ in every such inertial frame.
2. Planck’s law and de Broglie’s law for photons, relating the frequency, energy, and momentum of such photons together.
3. The law of conservation of energy, and the law of conservation of momentum, as well as the additivity of these quantities (i.e. the energy of a system is the sum of the energy of its components, and similarly for momentum).
4. The Newtonian approximations ${E \approx E_0 + \frac{1}{2} m|v|^2}$, ${p \approx m v}$ to energy and momentum at low velocities.

The argument was one-dimensional in nature, in the sense that only one of the three spatial dimensions was actually used in the proof.

As was pointed out in comments in the previous post by Laurens Gunnarsen, this derivation has the curious feature of needing some laws from quantum mechanics (specifically, the Planck and de Broglie laws) in order to derive an equation in special relativity (which does not ostensibly require quantum mechanics). One can then ask whether one can give a derivation that does not require such laws. As pointed out in previous comments, one can use the representation theory of the Lorentz group ${SO(d,1)}$ to give a nice derivation that avoids any quantum mechanics, but it now needs at least two spatial dimensions instead of just one. I decided to work out this derivation in a way that does not explicitly use representation theory (although it is certainly lurking beneath the surface). The concept of momentum is only barely used in this derivation, and the main ingredients are now reduced to the following:

1. The two postulates of special relativity;
2. The law of conservation of energy (and the additivity of energy);
3. The Newtonian approximation ${E \approx E_0 + \frac{1}{2} m|v|^2}$ at low velocities.

The argument (which uses a little bit of calculus, but is otherwise elementary) is given below the fold. Whereas Einstein’s original argument considers a mass emitting two photons in several different reference frames, the argument here considers a large mass breaking up into two equal smaller masses. Viewing this situation in different reference frames gives a functional equation for the relationship between energy, mass, and velocity, which can then be solved using some calculus, using the Newtonian approximation as a boundary condition, to give the famous ${E=mc^2}$ formula.

Disclaimer: As with the previous post, the arguments here are physical arguments rather than purely mathematical ones, and thus do not really qualify as a rigorous mathematical argument, due to the implicit use of a number of physical and metaphysical hypotheses beyond the ones explicitly listed above. (But it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on ${100\%}$ rigorous mathematical reasoning.)

A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem“, discussed in this previous blog post.  As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity.  But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions.   As a consequence, some of the implications stated in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion.   I have completed (and proofread) an updated version of the ms, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link).  (In the meantime, it is probably best not to read the original ms too carefully, as this could lead to some confusion.)   I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth $H^1$ initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.

Let me now describe the issue in more detail (and also explain why I missed it previously).  A standard principle in the theory of evolutionary partial differential equations is that regularity in space can be used to imply regularity in time.  To illustrate this, consider a solution $u$ to the supercritical nonlinear wave equation

$-\partial_{tt} u + \Delta u = u^7$  (1)

for some field $u: {\bf R} \times {\bf R}^3 \to {\bf R}$.   Suppose one already knew that $u$ had some regularity in space, and in particular the $C^0_t C^2_x \cap C^1_t C^1_x$ norm of $u$ was bounded (thus $u$ and up to two spatial derivatives of $u$ were bounded).  Then, by (1), we see that two time derivatives of $u$ were also bounded, and one then gets the additional regularity of $C^2_t C^0_x$.

In a similar vein, suppose one initially knew that $u$ had the regularity $C^0_t C^3_x \cap C^1_t C^2_x$.  Then (1) soon tells us that $u$ also has the regularity $C^2_t C^1_x$; then, if one differentiates (1) in time to obtain

$-\partial_{ttt} u + \Delta \partial_t u = 7 u^6 \partial_t u$

one can conclude that $u$ also has the regularity of $C^3_t C^0_x$.  One can continue this process indefinitely; in particular, if one knew that $u \in C^0_t C^\infty_x \cap C^1_t C^\infty_x$, then these sorts of manipulations show that $u$ is infinitely smooth in both space and time.

The issue that caught me by surprise is that for the Navier-Stokes equations

$\partial_t u + (u \cdot \nabla) u =\Delta u -\nabla p$  (2)

$\nabla \cdot u = 0$

(setting the forcing term $f$ equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space $H^1_x({\bf R}^3)$.  The problem lies with the pressure term $p$, which is recovered from the velocity via the elliptic equation

$\Delta p = -\nabla^2 \cdot (u \otimes u)$ (3)

that can be obtained by taking the divergence of (2).   This equation is solved by a non-local integral operator:

$\displaystyle p(t,x) = \int_{{\bf R}^3} \frac{\nabla^2 \cdot (u \otimes u)(t,y)}{4\pi |x-y|}\ dy.$
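For completeness, here is a sketch of how (3) follows from (2), written in index notation with summation over repeated indices (the symbols $\partial_i$ and $u_i$ are introduced only for this computation). Taking the divergence of (2), the terms $\partial_t u$ and $\Delta u$ drop out because $\nabla \cdot u = 0$, while

$\displaystyle \nabla \cdot ((u \cdot \nabla) u) = \partial_i ( u_j \partial_j u_i ) = \partial_i \partial_j (u_i u_j) = \nabla^2 \cdot (u \otimes u),$

where the middle identity again uses $\partial_j u_j = 0$. What remains is $\nabla^2 \cdot (u \otimes u) = -\Delta p$, which is (3). The integral formula for $p$ then comes from the fact that $-\frac{1}{4\pi |x|}$ is the fundamental solution of the Laplacian on ${\bf R}^3$.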

If, say, $u$ lies in $H^1_x({\bf R}^3)$, then there is no difficulty establishing a bound on $p$ in terms of $u$ (for instance, one can use singular integral theory and Sobolev embedding to place $p$ in $L^3_x({\bf R}^3)$).  However, one runs into difficulty when trying to compute time derivatives of $p$.  Differentiating (3) once, one gets

$\Delta \partial_t p = -2\nabla^2 \cdot (u \otimes \partial_t u)$.

At the regularity of $H^1$, one can still (barely) control this quantity by using (2) to expand out $\partial_t u$ and using some integration by parts.  But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form

$\Delta \partial_{tt} p = -4\nabla^2 \cdot (\Delta u \otimes \Delta u) + \ldots$

and now there is not enough regularity on $u$ available to get any control on $\partial_{tt} p$, even if one assumes that $u$ is smooth.   Indeed, following this observation, I was able to show that given generic smooth $H^1$ data, the pressure $p$ will instantaneously fail to be $C^2$ in time, and thence (by (2)) the velocity will instantaneously fail to be $C^3$ in time.  (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity $\omega$ will fail to be $C^4$ in time.  Switching to material coordinates seems to make things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)

For later times $t>0$ (and assuming homogeneous data $f=0$ for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade $H^1_x$ regularity to $H^\infty_x$ regularity instantaneously.  It is only the initial time at which some time irregularity can occur.

This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing).  In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild $H^1$ solution) and the data is Schwartz, then the solution will be smooth in spacetime.   (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)

This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth $H^1$ data) is not correct as stated.  But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time.  More precisely, call a solution $u: [0,T] \times {\bf R}^3 \to {\bf R}^3$ and $p: [0,T] \times {\bf R}^3 \to {\bf R}$ almost smooth if

• $u$ and $p$ are smooth on the half-open slab $(0,T] \times {\bf R}^3$; and
• For every $k \geq 0$, $\nabla^k_x u, \nabla^k_x p, \nabla^k_x \partial_t u$ exist and are continuous on the full slab $[0,T] \times {\bf R}^3$.

Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only $C^1_t C^\infty_x$, and the pressure field is only $C^0_t C^\infty_x$.  This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.

(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth $H^1$ setting.)

One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth $H^1$ or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth.  Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth $H^1$ or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).

The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:

I’ve just uploaded to the arXiv my paper “Localisation and compactness properties of the Navier-Stokes global regularity problem“, submitted to Analysis and PDE. This paper concerns the global regularity problem for the Navier-Stokes system of equations

$\displaystyle \partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p + f \ \ \ \ \ (1)$

$\displaystyle \nabla \cdot u = 0 \ \ \ \ \ (2)$

$\displaystyle u(0,\cdot) = u_0 \ \ \ \ \ (3)$

in three dimensions. Thus, we specify initial data ${(u_0,f,T)}$, where ${0 < T < \infty}$ is a time, ${u_0: {\bf R}^3 \rightarrow {\bf R}^3}$ is the initial velocity field (which, in order to be compatible with (2), (3), is required to be divergence-free), ${f: [0,T] \times {\bf R}^3 \rightarrow {\bf R}^3}$ is the forcing term, and then seek to extend this initial data to a solution ${(u,p,u_0,f,T)}$ with this data, where the velocity field ${u: [0,T] \times {\bf R}^3 \rightarrow {\bf R}^3}$ and pressure term ${p: [0,T] \times {\bf R}^3 \rightarrow {\bf R}}$ are the unknown fields.

Roughly speaking, the global regularity problem asserts that given every smooth set of initial data ${(u_0,f,T)}$, there exists a smooth solution ${(u,p,u_0,f,T)}$ to the Navier-Stokes equation with this data. However, this is not a good formulation of the problem because it does not exclude the possibility that one or more of the fields ${u_0, f, u, p}$ grows too fast at spatial infinity. This problem is evident even for the much simpler heat equation

$\displaystyle \partial_t u = \Delta u$

$\displaystyle u(0,\cdot) = u_0.$

As long as one has some mild conditions at infinity on the smooth initial data ${u_0: {\bf R}^3 \rightarrow {\bf R}}$ (e.g. polynomial growth at spatial infinity), then one can solve this equation using the fundamental solution of the heat equation:

$\displaystyle u(t,x) = \frac{1}{(4\pi t)^{3/2}} \int_{{\bf R}^3} u_0(y) e^{-|x-y|^2/4t}\ dy.$
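As a quick numerical sanity check on this formula, one can verify that the kernel appearing in the integrand does solve the heat equation. The following is only a sketch: the function names, the sample point, and the step size `h` are arbitrary choices of mine, and central finite differences stand in for exact derivatives.

```python
import math


def heat_kernel(t, x):
    """The kernel (4 pi t)^(-3/2) exp(-|x|^2 / (4t)) on R^3, for t > 0."""
    r2 = sum(xi * xi for xi in x)
    return (4 * math.pi * t) ** (-1.5) * math.exp(-r2 / (4 * t))


def heat_residual(u, t, x, h=1e-3):
    """Approximate u_t - Laplacian(u) at (t, x) using central differences."""
    # First-order central difference in time.
    u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
    # Sum of second-order central differences in each spatial coordinate.
    laplacian = 0.0
    for i in range(3):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        laplacian += (u(t, x_plus) - 2 * u(t, x) + u(t, x_minus)) / h ** 2
    return u_t - laplacian


# Sample point chosen arbitrarily; any t > 0 and x in R^3 would do.
residual = heat_residual(heat_kernel, 1.0, [0.3, -0.2, 0.5])
```

The residual is not exactly zero, since finite differences carry a truncation error of order `h**2`, but it should be far smaller than the size of either side of the equation at this point.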

If furthermore ${u}$ is a tempered distribution, one can use Fourier-analytic methods to show that this is the unique solution to the heat equation with this data. But once one allows sufficiently rapid growth at spatial infinity, existence and uniqueness can break down. Consider for instance the backwards heat kernel

$\displaystyle u(t,x) = \frac{1}{(4\pi(T-t))^{3/2}} e^{|x|^2/(4(T-t))}$

for some ${T>0}$, which is smooth (albeit exponentially growing) at time zero, and is a smooth solution to the heat equation for ${0 \leq t < T}$, but develops a dramatic singularity at time ${t=T}$. A famous example of Tychonoff from 1935, based on a power series construction, shows that uniqueness for the heat equation can also fail once growth conditions are removed. An explicit example of non-uniqueness for the heat equation is given by the contour integral

$\displaystyle u(t,x_1,x_2,x_3) = \int_\gamma \exp(e^{\pi i/4} x_1 z + e^{5\pi i/8} z^{3/2} - itz^2)\ dz$

where ${\gamma}$ is the ${L}$-shaped contour consisting of the positive real axis and the upper imaginary axis, with ${z^{3/2}}$ being interpreted with the standard branch (with cut on the negative axis). One can show by contour integration that this function solves the heat equation and is smooth (but rapidly growing at infinity), and vanishes for ${t<0}$, but is not identically zero for ${t>0}$.

Thus, in order to obtain a meaningful (and physically realistic) problem, one needs to impose some decay (or at least limited growth) hypotheses on the data ${u_0,f}$ and solution ${u,p}$ in addition to smoothness. For the data, one can impose a variety of such hypotheses, including the following:

• (Finite energy data) One has ${\|u_0\|_{L^2_x({\bf R}^3)} < \infty}$ and ${\| f \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^3)} < \infty}$.
• (${H^1}$ data) One has ${\|u_0\|_{H^1_x({\bf R}^3)} < \infty}$ and ${\| f \|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)} < \infty}$.
• (Schwartz data) One has ${\sup_{x \in {\bf R}^3} ||x|^m \nabla_x^k u_0(x)| < \infty}$ and ${\sup_{(t,x) \in [0,T] \times {\bf R}^3} ||x|^m \nabla_x^k \partial_t^l f(t,x)| < \infty}$ for all ${m,k,l \geq 0}$.
• (Periodic data) There is some ${0 < L < \infty}$ such that ${u_0(x+Lk) = u_0(x)}$ and ${f(t,x+Lk) = f(t,x)}$ for all ${(t,x) \in [0,T] \times {\bf R}^3}$ and ${k \in {\bf Z}^3}$.
• (Homogeneous data) ${f=0}$.

Note that smoothness alone does not necessarily imply finite energy, ${H^1}$, or the Schwartz property. For instance, the (scalar) function ${u(x) = \exp( i |x|^{10} ) (1+|x|)^{-2}}$ is smooth and finite energy, but not in ${H^1}$ or Schwartz. Periodicity is of course incompatible with finite energy, ${H^1}$, or the Schwartz property, except in the trivial case when the data is identically zero.
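To check the claims about this example, one can pass to polar coordinates (writing ${r = |x|}$, a shorthand used only here). The energy is finite:

$\displaystyle \int_{{\bf R}^3} |u(x)|^2\ dx = 4\pi \int_0^\infty \frac{r^2}{(1+r)^4}\ dr = \frac{4\pi}{3} < \infty.$

On the other hand, differentiating the rapidly oscillating phase ${\exp(i r^{10})}$ produces a factor of ${10 r^9}$, and one has the lower bound

$\displaystyle |\nabla u(x)| \geq \frac{10 r^9}{(1+r)^2},$

which grows like ${r^7}$ as ${r \rightarrow \infty}$, so ${\nabla u}$ is not square-integrable and ${u}$ is not in ${H^1}$. Similarly, ${|u|}$ itself only decays like ${r^{-2}}$, which is far too slow for a Schwartz function.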

Similarly, one can impose conditions at spatial infinity on the solution, such as the following:

• (Finite energy solution) One has ${\| u \|_{L^\infty_t L^2_x([0,T] \times {\bf R}^3)} < \infty}$.
• (${H^1}$ solution) One has ${\| u \|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)} < \infty}$ and ${\| u \|_{L^2_t H^2_x([0,T] \times {\bf R}^3)} < \infty}$.
• (Partially periodic solution) There is some ${0 < L < \infty}$ such that ${u(t,x+Lk) = u(t,x)}$ for all ${(t,x) \in [0,T] \times {\bf R}^3}$ and ${k \in {\bf Z}^3}$.
• (Fully periodic solution) There is some ${0 < L < \infty}$ such that ${u(t,x+Lk) = u(t,x)}$ and ${p(t,x+Lk) = p(t,x)}$ for all ${(t,x) \in [0,T] \times {\bf R}^3}$ and ${k \in {\bf Z}^3}$.

(The ${L^2_t H^2_x}$ component of the ${H^1}$ solution is for technical reasons, and should not be paid too much attention for this discussion.) Note that we do not consider the notion of a Schwartz solution; as we shall see shortly, this is too restrictive a concept of solution to the Navier-Stokes equation.

One can also downgrade the regularity of the solution from smoothness. There are many ways to do so; two such examples include

• (${H^1}$ mild solutions) The solution is not smooth, but is ${H^1}$ (in the preceding sense) and solves the equation (1) in the sense that the Duhamel formula

$\displaystyle u(t) = e^{t\Delta} u_0 + \int_0^t e^{(t-t')\Delta} (-(u\cdot\nabla) u-\nabla p+f)(t')\ dt'$

holds.

• (Leray-Hopf weak solution) The solution ${u}$ is not smooth, but lies in ${L^\infty_t L^2_x \cap L^2_t H^1_x}$, solves (1) in the sense of distributions (after rewriting the system in divergence form), and obeys an energy inequality.

Finally, one can ask for two types of global regularity results on the Navier-Stokes problem: a qualitative regularity result, in which one merely provides existence of a smooth solution without any explicit bounds on that solution, and a quantitative regularity result, which provides bounds on the solution in terms of the initial data, e.g. a bound of the form

$\displaystyle \| u \|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)} \leq F( \|u_0\|_{H^1_x({\bf R}^3)} + \|f\|_{L^\infty_t H^1_x([0,T] \times {\bf R}^3)}, T )$

for some function ${F: {\bf R}^+ \times {\bf R}^+ \rightarrow {\bf R}^+}$. One can make a further distinction between local quantitative results, in which ${F}$ is allowed to depend on ${T}$, and global quantitative results, in which there is no dependence on ${T}$ (the latter is only reasonable though in the homogeneous case, or if ${f}$ has some decay in time).

By combining these various hypotheses and conclusions, we see that one can write down quite a large number of slightly different variants of the global regularity problem. In the official formulation of the regularity problem for the Clay Millennium prize, a positive correct solution to either of the following two problems would be accepted for the prize:

• Conjecture 1.4 (Qualitative regularity for homogeneous periodic data) If ${(u_0,0,T)}$ is periodic, smooth, and homogeneous, then there exists a smooth partially periodic solution ${(u,p,u_0,0,T)}$ with this data.
• Conjecture 1.3 (Qualitative regularity for homogeneous Schwartz data) If ${(u_0,0,T)}$ is Schwartz and homogeneous, then there exists a smooth finite energy solution ${(u,p,u_0,0,T)}$ with this data.

(The numbering here corresponds to the numbering in the paper.)

Furthermore, a negative correct solution to either of the following two problems would also be accepted for the prize:

• Conjecture 1.6 (Qualitative regularity for periodic data) If ${(u_0,f,T)}$ is periodic and smooth, then there exists a smooth partially periodic solution ${(u,p,u_0,f,T)}$ with this data.
• Conjecture 1.5 (Qualitative regularity for Schwartz data) If ${(u_0,f,T)}$ is Schwartz, then there exists a smooth finite energy solution ${(u,p,u_0,f,T)}$ with this data.

I am not announcing any major progress on these conjectures here. What my paper does study, though, is the question of whether the answer to these conjectures is somehow sensitive to the choice of formulation. For instance:

1. Note in the periodic formulations of the Clay prize problem that the solution is only required to be partially periodic, rather than fully periodic; thus the pressure has no periodicity hypothesis. One can ask the extent to which the above problems change if one also requires pressure periodicity.
2. In another direction, one can ask the extent to which quantitative formulations of the Navier-Stokes problem are stronger than their qualitative counterparts; in particular, whether it is possible that each choice of initial data in a certain class leads to a smooth solution, but with no uniform bound on that solution in terms of various natural norms of the data.
3. Finally, one can ask the extent to which the conjecture depends on the category of data. For instance, could it be that global regularity is true for smooth periodic data but false for Schwartz data? True for Schwartz data but false for smooth ${H^1}$ data? And so forth.

One motivation for the final question (which was posed to me by my colleague, Andrea Bertozzi) is that the Schwartz property on the initial data ${u_0}$ tends to be instantly destroyed by the Navier-Stokes flow. This can be seen by introducing the vorticity ${\omega := \nabla \times u}$. If ${u(t)}$ is Schwartz, then from Stokes’ theorem we necessarily have vanishing of certain moments of the vorticity, for instance:

$\displaystyle \int_{{\bf R}^3} \omega_1 (x_2^2-x_3^2)\ dx = 0.$
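One way to check this particular identity, sketched here under the assumption that ${u(t)}$ is Schwartz and divergence-free (so that all boundary terms at infinity vanish), is to integrate by parts using ${\omega_1 = \partial_2 u_3 - \partial_3 u_2}$:

```latex
\begin{align*}
\int_{{\bf R}^3} \omega_1 (x_2^2 - x_3^2)\ dx
&= \int_{{\bf R}^3} (\partial_2 u_3 - \partial_3 u_2)(x_2^2 - x_3^2)\ dx \\
&= -2 \int_{{\bf R}^3} (x_2 u_3 + x_3 u_2)\ dx \\
&= -2 \int_{{\bf R}^3} u \cdot \nabla (x_2 x_3)\ dx \\
&= 2 \int_{{\bf R}^3} (\nabla \cdot u)\, x_2 x_3\ dx = 0.
\end{align*}
```

The rapid decay of ${u}$ is what justifies discarding the boundary terms; for a velocity field that decays only polynomially, this step is not automatic.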

On the other hand, some integration by parts using (1) reveals that such moments are usually not preserved by the flow; for instance, one has the law

$\displaystyle \partial_t \int_{{\bf R}^3} \omega_1(t,x) (x_2^2-x_3^2)\ dx = 4\int_{{\bf R}^3} u_2(t,x) u_3(t,x)\ dx,$

and one can easily concoct examples for which the right-hand side is non-zero at time zero. This suggests that the Schwartz class may be unnecessarily restrictive for Conjecture 1.3 or Conjecture 1.5.

My paper arose out of an attempt to address these three questions, and ended up obtaining partial results in all three directions. Roughly speaking, the results that address these three questions are as follows:

1. (Homogenisation) If one only assumes partial periodicity instead of full periodicity, then the forcing term ${f}$ becomes irrelevant. In particular, Conjecture 1.4 and Conjecture 1.6 are equivalent.
2. (Concentration compactness) In the ${H^1}$ category (both periodic and nonperiodic, homogeneous or nonhomogeneous), the qualitative and quantitative formulations of the Navier-Stokes global regularity problem are essentially equivalent.
3. (Localisation) The (inhomogeneous) Navier-Stokes problems in the Schwartz, smooth ${H^1}$, and finite energy categories are essentially equivalent to each other, and are also implied by the (fully) periodic version of these problems.

The first two of these families of results are relatively routine, drawing on existing methods in the literature; the localisation results though are somewhat more novel, and introduce some new local energy and local enstrophy estimates which may be of independent interest.

Broadly speaking, the moral to draw from these results is that the precise formulation of the Navier-Stokes equation global regularity problem is only of secondary importance; modulo a number of caveats and technicalities, the various formulations are close to being equivalent, and a breakthrough on any one of the formulations is likely to lead (either directly or indirectly) to a comparable breakthrough on any of the others.

This is only a caricature of the actual implications, though. Below is the diagram from the paper indicating the various formulations of the Navier-Stokes equations, and the known implications between them:

The above three streams of results are discussed in more detail below the fold.

As we are all now very much aware, tsunamis are water waves that start in the deep ocean, usually because of an underwater earthquake (though tsunamis can also be caused by underwater landslides or volcanoes), and then propagate towards shore. Initially, tsunamis have relatively small amplitude (a metre or so is typical), which would seem to render them as harmless as wind waves. And indeed, tsunamis often pass by ships in deep ocean without anyone on board even noticing.

However, being generated by an event as large as an earthquake, the wavelength of the tsunami is huge – 200 kilometres is typical (in contrast with wind waves, whose wavelengths are typically closer to 100 metres). In particular, the wavelength of the tsunami is far greater than the depth of the ocean (which is typically 2-3 kilometres). As such, even in the deep ocean, the dynamics of tsunamis are essentially governed by the shallow water equations. One consequence of these equations is that the speed of propagation ${v}$ of a tsunami can be approximated by the formula

$\displaystyle v \approx \sqrt{g b} \ \ \ \ \ (1)$

where ${b}$ is the depth of the ocean, and ${g \approx 9.8 ms^{-2}}$ is the acceleration due to gravity. As such, tsunamis in deep water move very fast – speeds such as 500 kilometres per hour (300 miles per hour) are quite typical; enough to travel from Japan to the US, for instance, in less than a day. Ultimately, this is due to the incompressibility of water (and conservation of mass); the massive net pressure (or more precisely, spatial variations in this pressure) of a very broad and deep wave of water forces the profile of the wave to move horizontally at vast speeds. (Note though that this is the phase velocity of the tsunami wave, and not the velocity of the water molecules themselves, which move far more slowly.)
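As a quick numerical illustration of formula (1), here is a minimal Python sketch. The function names `shallow_water_speed` and `to_km_per_hour` and the sample depths are my own choices for illustration, and the only physics assumed is the approximation ${v \approx \sqrt{gb}}$ itself.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2, as quoted in the text


def shallow_water_speed(depth_m: float) -> float:
    """Approximate propagation speed v = sqrt(g * b) from formula (1), in m/s."""
    return math.sqrt(G * depth_m)


def to_km_per_hour(speed_m_per_s: float) -> float:
    """Convert metres per second to kilometres per hour."""
    return speed_m_per_s * 3.6


# Illustrative depths only: deep ocean down to near-shore water.
for depth in (4000.0, 2000.0, 100.0, 10.0):
    v = shallow_water_speed(depth)
    print(f"depth {depth:6.0f} m: {v:6.1f} m/s = {to_km_per_hour(v):5.0f} km/h")
```

At a depth of 2 kilometres this gives 140 metres per second, or about 500 kilometres per hour, in line with the figure quoted above.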

As the tsunami approaches shore, the depth ${b}$ of course decreases, causing the tsunami to slow down, at a rate proportional to the square root of the depth, as per (1). Unfortunately, wave shoaling then forces the amplitude ${A}$ to increase at an inverse rate governed by Green’s law,

$\displaystyle A \propto \frac{1}{b^{1/4}} \ \ \ \ \ (2)$

at least until the amplitude becomes comparable to the water depth (at which point the assumptions that underlie the above approximate results break down; also, in two (horizontal) spatial dimensions there will be some decay of amplitude as the tsunami spreads outwards). If one starts with a tsunami whose initial amplitude was ${A_0}$ at depth ${b_0}$ and computes the point at which the amplitude ${A}$ and depth ${b}$ become comparable using the proportionality relationship (2), some high school algebra then reveals that at this point, the amplitude of the tsunami (and the depth of the water) is about ${A_0^{4/5} b_0^{1/5}}$. Thus, for instance, a tsunami with initial amplitude of one metre at a depth of 2 kilometres can end up with a final amplitude of about 5 metres near shore, while still traveling at about ten metres per second (35 kilometres per hour, or 22 miles per hour), and we have all now seen the impact that can have when it hits shore.
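The high school algebra can be spelled out as follows: Green's law (2) says that ${A b^{1/4}}$ is conserved, so ${A b^{1/4} = A_0 b_0^{1/4}}$; setting ${A = b}$ gives ${b^{5/4} = A_0 b_0^{1/4}}$, and hence ${A = b = A_0^{4/5} b_0^{1/5}}$. A minimal Python sketch of this calculation (the helper name `breaking_amplitude` is hypothetical, introduced only for this illustration):

```python
def breaking_amplitude(a0: float, b0: float) -> float:
    """Amplitude (equal to the depth) at which Green's law A * b**(1/4) = const
    first gives A = b, starting from amplitude a0 at depth b0 (both in metres)."""
    return a0 ** 0.8 * b0 ** 0.2


a0, b0 = 1.0, 2000.0  # the example from the text: one metre at 2 kilometres depth
a = breaking_amplitude(a0, b0)
print(f"amplitude and depth become comparable at about {a:.1f} m")

# Consistency check: A * b**(1/4) is the same at both ends (with A = b at the shore end).
print(a * a ** 0.25, a0 * b0 ** 0.25)
```

This is of course only as good as the approximation (2), which is expected to break down at roughly the point being computed.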

While tsunamis are far too massive of an event to be able to control (at least in the deep ocean), we can at least model them mathematically, allowing one to predict their impact at various places along the coast with high accuracy. (For instance, here is a video of the NOAA’s model of the March 11 tsunami, which has matched up very well with subsequent measurements.) The full equations and numerical methods used to perform such models are somewhat sophisticated, but by making a large number of simplifying assumptions, it is relatively easy to come up with a rough model that already predicts the basic features of tsunami propagation, such as the velocity formula (1) and the amplitude proportionality law (2). I give this (standard) derivation below the fold. The argument will largely be heuristic in nature; there are very interesting analytic issues in actually justifying many of the steps below rigorously, but I will not discuss these matters here.

Last week I gave a talk at the Trinity Mathematical Society at Trinity College, Cambridge UK.  As the audience was primarily undergraduate, I gave a fairly non-technical talk on the universality phenomenon, based on this blog article of mine on the same topic.  It was a quite light and informal affair, and this is reflected in the talk slides (which, in particular, play up quite strongly the role of former students and Fellows of Trinity College in this story).   There was some interest in making these slides available publicly, so I have placed them on this site here.  (Note: copyright for the images in these slides has not been secured.)

This week I am at the American Institute of Mathematics, as an organiser on a workshop on the universality phenomenon in random matrices. There have been a number of interesting discussions so far in this workshop. Percy Deift, in a lecture on universality for invariant ensembles, gave some applications of what he only half-jokingly termed “the most important identity in mathematics”, namely the formula

$\displaystyle \hbox{det}( 1 + AB ) = \hbox{det}(1 + BA)$

whenever ${A, B}$ are ${n \times k}$ and ${k \times n}$ matrices respectively (or more generally, ${A}$ and ${B}$ could be linear operators with sufficiently good spectral properties that make both sides equal). Note that the left-hand side is an ${n \times n}$ determinant, while the right-hand side is a ${k \times k}$ determinant; this formula is particularly useful when computing determinants of large matrices (or of operators), as one can often use it to transform such determinants into much smaller determinants. In particular, the asymptotic behaviour of ${n \times n}$ determinants as ${n \rightarrow \infty}$ can be converted via this formula to determinants of a fixed size (independent of ${n}$), which is often a more favourable situation to analyse. Unsurprisingly, this trick is particularly useful for understanding the asymptotic behaviour of determinantal processes.

There are many ways to prove the identity. One is to observe first that when ${A, B}$ are invertible square matrices of the same size, the matrices ${1+BA}$ and ${1+AB}$ are conjugate to each other and thus clearly have the same determinant; a density argument then removes the invertibility hypothesis, and a padding-by-zeroes argument then extends the square case to the rectangular case. Another is to proceed via the spectral theorem, noting that ${AB}$ and ${BA}$ have the same non-zero eigenvalues.
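For concreteness, the conjugation step in the first proof can be written out in a couple of lines. In this sketch only ${A}$ needs to be invertible, and ${1}$ denotes the identity matrix:

```latex
\begin{align*}
A^{-1} (1 + AB) A &= A^{-1} A + A^{-1} A B A = 1 + BA, \\
\hbox{det}(1 + BA) &= \hbox{det}(A^{-1})\, \hbox{det}(1 + AB)\, \hbox{det}(A) = \hbox{det}(1 + AB).
\end{align*}
```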

By rescaling, one obtains the variant identity

$\displaystyle \hbox{det}( z + AB ) = z^{n-k} \hbox{det}(z + BA)$

which essentially relates the characteristic polynomial of ${AB}$ with that of ${BA}$. When ${n=k}$, a comparison of coefficients already gives important basic identities such as ${\hbox{tr}(AB) = \hbox{tr}(BA)}$ and ${\hbox{det}(AB) = \hbox{det}(BA)}$; when ${n}$ is not equal to ${k}$, an inspection of the ${z^{n-k}}$ coefficient similarly gives the Cauchy-Binet formula (which, incidentally, is also useful when performing computations on determinantal processes).
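These identities are easy to test on small examples. The following Python sketch checks the variant identity in exact rational arithmetic for a random ${5 \times 2}$ matrix ${A}$ and ${2 \times 5}$ matrix ${B}$. The helpers `matmul`, `det` and `shifted` are written from scratch for this illustration (they are not from any library), and the sizes and the values of ${z}$ are arbitrary choices.

```python
from fractions import Fraction
import random


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def det(M):
    """Determinant by Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    size = len(M)
    result = Fraction(1)
    for c in range(size):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((r for r in range(c, size) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, size):
            factor = M[r][c] / M[c][c]
            for j in range(c, size):
                M[r][j] -= factor * M[c][j]
    return result


def shifted(M, z):
    """Return z * I + M for a square matrix M."""
    return [[M[i][j] + (z if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]


random.seed(0)
n, k = 5, 2
A = [[random.randint(-3, 3) for _ in range(k)] for _ in range(n)]  # n x k
B = [[random.randint(-3, 3) for _ in range(n)] for _ in range(k)]  # k x n
AB, BA = matmul(A, B), matmul(B, A)  # n x n and k x k respectively

for z in (1, 2, -3, Fraction(1, 2)):
    lhs = det(shifted(AB, z))                           # n x n determinant
    rhs = Fraction(z) ** (n - k) * det(shifted(BA, z))  # k x k determinant
    assert lhs == rhs
print("det(z + AB) = z^(n-k) det(z + BA) holds for all sampled z")
```

The case ${z=1}$ is the original identity. Exact fractions are used instead of floating point so that the two sides can be compared with `==` rather than up to a tolerance.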

Thanks to this formula (and with a crucial insight of Alice Guionnet), I was able to solve a problem (on outliers for the circular law) that I had in the back of my mind for a few months, and initially posed to me by Larry Abbott; I hope to talk more about this in a future post.

Today, though, I wish to talk about another piece of mathematics that emerged from an afternoon of free-form discussion that we managed to schedule within the AIM workshop. Specifically, we hammered out a heuristic model of the mesoscopic structure of the eigenvalues ${\lambda_1 \leq \ldots \leq \lambda_n}$ of the ${n \times n}$ Gaussian Unitary Ensemble (GUE), where ${n}$ is a large integer. As is well known, the probability density of these eigenvalues is given by the Ginibre distribution

$\displaystyle \frac{1}{Z_n} e^{-H(\lambda)}\ d\lambda$

where ${d\lambda = d\lambda_1 \ldots d\lambda_n}$ is Lebesgue measure on the Weyl chamber ${\{ (\lambda_1,\ldots,\lambda_n) \in {\bf R}^n: \lambda_1 \leq \ldots \leq \lambda_n \}}$, ${Z_n}$ is a constant, and the Hamiltonian ${H}$ is given by the formula

$\displaystyle H(\lambda_1,\ldots,\lambda_n) := \sum_{j=1}^n \frac{\lambda_j^2}{2} - 2 \sum_{1 \leq i < j \leq n} \log |\lambda_i-\lambda_j|.$

At the macroscopic scale of ${\sqrt{n}}$, the eigenvalues ${\lambda_j}$ are distributed according to the Wigner semicircle law

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Indeed, if one defines the classical location ${\gamma_i^{cl}}$ of the ${i^{th}}$ eigenvalue to be the unique solution in ${[-2\sqrt{n}, 2\sqrt{n}]}$ to the equation

$\displaystyle \int_{-2\sqrt{n}}^{\gamma_i^{cl}/\sqrt{n}} \rho_{sc}(x)\ dx = \frac{i}{n}$

then it is known that the random variable ${\lambda_i}$ is quite close to ${\gamma_i^{cl}}$. Indeed, a result of Gustavsson shows that, in the bulk region when ${\epsilon n < i < (1-\epsilon) n}$ for some fixed ${\epsilon > 0}$, ${\lambda_i}$ is distributed asymptotically as a gaussian random variable with mean ${\gamma_i^{cl}}$ and standard deviation ${\sqrt{\frac{\log n}{\pi}} \times \frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}$. Note that from the semicircular law, the factor ${\frac{1}{\sqrt{n} \rho_{sc}(\gamma_i^{cl})}}$ is the mean eigenvalue spacing.
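The classical locations are easy to compute numerically. The Python sketch below solves the defining equation by bisection, using a closed form for the integral of ${\rho_{sc}}$, and compares the gap between two adjacent classical locations in the middle of the spectrum with the mean eigenvalue spacing. The function names are my own, and the code makes the assumption that the density in the spacing factor is to be evaluated at the rescaled point ${\gamma_i^{cl}/\sqrt{n}}$, where the semicircle density is supported.

```python
import math


def semicircle_density(x: float) -> float:
    """Wigner semicircle density rho_sc(x) = sqrt((4 - x^2)_+) / (2 pi)."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


def semicircle_cdf(x: float) -> float:
    """Integral of rho_sc from -2 to x in closed form, with x clamped to [-2, 2]."""
    x = min(max(x, -2.0), 2.0)
    return (0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi)
            + math.asin(x / 2.0) / math.pi)


def classical_location(i: int, n: int) -> float:
    """Solve semicircle_cdf(gamma / sqrt(n)) = i / n for gamma by bisection."""
    lo, hi = -2.0, 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < i / n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * math.sqrt(n)


n = 10_000
i = n // 2  # an index in the middle of the bulk
spacing = classical_location(i + 1, n) - classical_location(i, n)
predicted = 1.0 / (math.sqrt(n) * semicircle_density(classical_location(i, n) / math.sqrt(n)))
print(f"gap between classical locations: {spacing:.6f}")
print(f"1 / (sqrt(n) * rho_sc):          {predicted:.6f}")
```

In the middle of the spectrum ${\rho_{sc}(0) = 1/\pi}$, so both quantities should be close to ${\pi/\sqrt{n}}$.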

At the other extreme, at the microscopic scale of the mean eigenvalue spacing (which is comparable to ${1/\sqrt{n}}$ in the bulk, but can be as large as ${n^{-1/6}}$ at the edge), the eigenvalues are asymptotically distributed with respect to a special determinantal point process, namely the Dyson sine process in the bulk (and the Airy process on the edge), as discussed in this previous post.

Here, I wish to discuss the mesoscopic structure of the eigenvalues, in which one involves scales that are intermediate between the microscopic scale ${1/\sqrt{n}}$ and the macroscopic scale ${\sqrt{n}}$, for instance in correlating the eigenvalues ${\lambda_i}$ and ${\lambda_j}$ in the regime ${|i-j| \sim n^\theta}$ for some ${0 < \theta < 1}$. Here, there is a surprising phenomenon; there is quite a long-range correlation between such eigenvalues. The result of Gustavsson shows that both ${\lambda_i}$ and ${\lambda_j}$ behave asymptotically like gaussian random variables, but a further result from the same paper shows that the correlation between these two random variables is asymptotic to ${1-\theta}$ (in the bulk, at least); thus, for instance, adjacent eigenvalues ${\lambda_{i+1}}$ and ${\lambda_i}$ are almost perfectly correlated (which makes sense, as their spacing is much less than either of their standard deviations), but even very distant eigenvalues, such as ${\lambda_{n/4}}$ and ${\lambda_{3n/4}}$, have a correlation comparable to ${1/\log n}$. One way to get a sense of this is to look at the trace

$\displaystyle \lambda_1 + \ldots + \lambda_n.$

This is also the sum of the diagonal entries of a GUE matrix, and is thus normally distributed with a variance of ${n}$. In contrast, each of the ${\lambda_i}$ (in the bulk, at least) has a variance comparable to ${\log n/n}$. In order for these two facts to be consistent, the average correlation between pairs of eigenvalues then has to be of the order of ${1/\log n}$.
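The bookkeeping behind this consistency check can be sketched as follows. This is only a heuristic: it pretends that every ${\lambda_i}$ has variance of size ${\log n/n}$ (which is only known in the bulk), ignores constants, and writes ${\overline{c}}$ (my notation) for the average correlation between pairs of eigenvalues.

```latex
\begin{align*}
n = \hbox{Var}(\lambda_1 + \ldots + \lambda_n)
&= \sum_{i=1}^n \sum_{j=1}^n \hbox{Cov}(\lambda_i, \lambda_j) \\
&\approx n^2 \cdot \overline{c} \cdot \frac{\log n}{n}
= \overline{c}\, n \log n,
\end{align*}
```

so that ${\overline{c}}$ has to be comparable to ${1/\log n}$.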

Below the fold, I give a heuristic way to see this correlation, based on Taylor expansion of the convex Hamiltonian ${H(\lambda)}$ around the minimum ${\gamma}$, which gives a conceptual probabilistic model for the mesoscopic structure of the GUE eigenvalues. While this heuristic is in no way rigorous, it does seem to explain many of the features currently known or conjectured about GUE, and looks likely to extend also to other models.

The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM).  I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”).  While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult.  At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.

I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure.   Below the fold is a first draft of the article; I would definitely welcome feedback and corrections.  It does not yet have any pictures, but I plan to rectify that in the final draft.  It also does not have a title, but this will be easy to address later.   But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience.  Any suggestions in this regard would be particularly appreciated.

I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).

As is now widely reported, the Fields medals for 2010 have been awarded to Elon Lindenstrauss, Ngo Bao Chau, Stas Smirnov, and Cedric Villani. Concurrently, the Nevanlinna prize (for outstanding contributions to mathematical aspects of information science) was awarded to Dan Spielman, the Gauss prize (for outstanding mathematical contributions that have found significant applications outside of mathematics) to Yves Meyer, and the Chern medal (for lifelong achievement in mathematics) to Louis Nirenberg. All of the recipients are of course exceptionally qualified and deserving for these awards; congratulations to all of them. (I should mention that I myself was only very tangentially involved in the awards selection process, and like everyone else, had to wait until the ceremony to find out the winners. I imagine that the work of the prize committees must have been extremely difficult.)

Today, I thought I would mention one result of each of the Fields medalists; by chance, three of the four medalists work in areas reasonably close to my own. (Ngo is rather more distant from my areas of expertise, but I will give it a shot anyway.) This will of course only be a tiny sample of each of their work, and I do not claim to be necessarily describing their “best” achievement, as I only know a portion of the research of each of them, and my selection choice may be somewhat idiosyncratic. (I may discuss the work of Spielman, Meyer, and Nirenberg in a later post.)