The 2014 Fields medallists have just been announced as (in alphabetical order of surname) Artur Avila, Manjul Bhargava, Martin Hairer, and Maryam Mirzakhani (see also these nice video profiles for the winners, which are a new initiative of the IMU and the Simons Foundation). This time four years ago, I wrote a blog post discussing one result from each of the 2010 medallists; I thought I would try to repeat the exercise here, although the work of the medallists this time around is a little bit further away from my own direct area of expertise than last time, and so my discussion will unfortunately be a bit superficial (and possibly not completely accurate) in places. As before, I am picking these results based on my own idiosyncratic tastes, and they should not be viewed as necessarily being the “best” work of these medallists. (See also the press releases for Avila, Bhargava, Hairer, and Mirzakhani.)
Artur Avila works in dynamical systems and in the study of Schrödinger operators. The work of Avila that I am most familiar with is his solution with Svetlana Jitomirskaya of the ten martini problem of Kac, the solution to which (according to Barry Simon) he offered ten martinis for, hence the name. The problem involves perhaps the simplest example of a Schrödinger operator with non-trivial spectral properties, namely the almost Mathieu operator $H^{\lambda,\alpha}_\theta: \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$, defined for parameters $\alpha, \theta \in \mathbb{R}/\mathbb{Z}$ and $\lambda > 0$ by a discrete one-dimensional Schrödinger operator with cosine potential:
$$(H^{\lambda,\alpha}_\theta u)_n := u_{n+1} + u_{n-1} + 2\lambda \cos(2\pi(\theta + n\alpha))\, u_n.$$
This is a bounded self-adjoint operator and thus has a spectrum that is a compact subset of the real line; it arises in a number of physical contexts, most notably in the theory of the integer quantum Hall effect, though I will not discuss these applications here. Remarkably, the structure of this spectrum depends crucially on the Diophantine properties of the frequency $\alpha$. For instance, if $\alpha = p/q$ is a rational number, then the operator is periodic with period $q$, and then basic (discrete) Floquet theory tells us that the spectrum is simply the union of $q$ (possibly touching) intervals. But for irrational $\alpha$ (in which case the spectrum is independent of the phase $\theta$), the situation is much more fractal in nature, for instance in the critical case $\lambda = 1$ the spectrum (as a function of $\alpha$) gives rise to the Hofstadter butterfly. The “ten martini problem” asserts that for every irrational $\alpha$ and every choice of coupling constant $\lambda > 0$, the spectrum is homeomorphic to a Cantor set. Prior to the work of Avila and Jitomirskaya, there were a number of partial results on this problem, notably the result of Puig establishing Cantor spectrum for a full measure set of parameters $(\lambda, \alpha)$, as well as results requiring a perturbative hypothesis, such as $\lambda$ being very small or very large. The result was also already known for $\alpha$ being either very close to rational (i.e. a Liouville number) or very far from rational (a Diophantine number), although the analyses for these two cases failed to meet in the middle, leaving some cases untreated. The argument uses a wide variety of existing techniques, both perturbative and non-perturbative, to attack this problem, as well as an amusing argument by contradiction: they assume (in certain regimes) that the spectrum fails to be a Cantor set, and use this hypothesis to obtain additional Lipschitz control on the spectrum (as a function of the frequency $\alpha$), which they can then use (after much effort) to improve existing arguments and conclude that the spectrum was in fact Cantor after all!
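The rational case can be explored numerically. The following is a minimal sketch (not taken from the work discussed above) for the operator with potential $2\lambda\cos(2\pi(\theta+n\alpha))$ and $\alpha = p/q$. It assumes the standard Floquet criterion that an energy $E$ lies in the spectrum exactly when the trace of the product of the $q$ one-step transfer matrices has absolute value at most $2$. The function names, the normalisation of the potential, the choice of phase and the grid resolution are all illustrative assumptions.

```python
import math


def discriminant(energy, p, q, lam, theta):
    """Trace of the period-q transfer matrix at the given energy."""
    # Start from the 2x2 identity matrix [[a, b], [c, d]].
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for n in range(q):
        v = 2.0 * lam * math.cos(2.0 * math.pi * (theta + n * p / q))
        t = energy - v
        # Left-multiply by the one-step transfer matrix [[t, -1], [1, 0]].
        a, b, c, d = t * a - c, t * b - d, a, b
    return a + d


def count_bands(p, q, lam=1.0, theta=0.1, step=1e-3):
    """Count maximal runs of grid energies with |discriminant| <= 2."""
    bound = 2.0 + 2.0 * lam  # the operator norm is at most 2 + 2*lam
    steps = int(round(2.0 * bound / step))
    bands, inside = 0, False
    for i in range(steps + 1):
        energy = -bound + i * step
        in_spectrum = abs(discriminant(energy, p, q, lam, theta)) <= 2.0
        if in_spectrum and not inside:
            bands += 1
        inside = in_spectrum
    return bands


if __name__ == "__main__":
    for p, q in [(1, 2), (1, 3), (2, 5)]:
        print(f"alpha = {p}/{q}: {count_bands(p, q)} bands found on the grid")
```

Because the count is taken on a finite grid, two bands that touch (or whose gap is narrower than `step`) would be reported as one, so this only illustrates the "union of $q$ intervals" picture for small denominators.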
Manjul Bhargava produces amazingly beautiful mathematics, though most of it is outside of my own area of expertise. One part of his work that touches on an area of my own interest (namely, random matrix theory) is his ongoing work with many co-authors on modeling (both conjecturally and rigorously) the statistics of various key number-theoretic features of elliptic curves (such as their rank, their Selmer group, or their Tate-Shafarevich groups). For instance, with Kane, Lenstra, Poonen, and Rains, Manjul has proposed a very general random matrix model that predicts all of these statistics (for instance, predicting that the $p$-component of the Tate-Shafarevich group is distributed like the cokernel of a certain random $p$-adic matrix, very much in the spirit of the Cohen-Lenstra heuristics discussed in this previous post). But what is even more impressive is that Manjul and his coauthors have been able to verify several non-trivial fragments of this model (e.g. showing that certain moments have the predicted asymptotics), giving for the first time non-trivial upper and lower bounds for various statistics, for instance obtaining lower bounds on how often an elliptic curve has rank $0$ or rank $1$, leading most recently (in combination with existing work of Gross-Zagier and of Kolyvagin, among others) to his amazing result with Skinner and Zhang that at least $66\%$ of all elliptic curves over $\mathbb{Q}$ (ordered by height) obey the Birch and Swinnerton-Dyer conjecture. Previously it was not even known that a positive proportion of curves obeyed the conjecture.
This is still a fair ways from resolving the conjecture fully (in particular, the situation with the presumably small number of curves of rank $2$ and higher is still very poorly understood, and the theory of Gross-Zagier and Kolyvagin that this work relies on, which was initially only available for $\mathbb{Q}$, has only been extended to totally real number fields thus far, by the work of Zhang), but it certainly does provide hope that the conjecture could be within reach in a statistical sense at least.
Martin Hairer works at the interface between probability and partial differential equations, and in particular in the theory of stochastic differential equations (SDEs). The result of his that is closest to my own interests is his remarkable demonstration with Jonathan Mattingly of a unique invariant measure for the two-dimensional stochastically forced Navier-Stokes equation
$$\partial_t u + (u \cdot \nabla) u = \nu \Delta u - \nabla p + \xi, \qquad \nabla \cdot u = 0$$
on the two-torus $(\mathbb{R}/\mathbb{Z})^2$, where $\xi$ is a Gaussian field that forces a fixed set of frequencies. It is expected that for any reasonable choice of initial data, the solution to this equation should asymptotically be distributed according to Kolmogorov’s power law, as discussed in this previous post. This is still far from established rigorously (although there are some results in this direction for dyadic models, see e.g. this paper of Cheskidov, Shvydkoy, and Friedlander). However, Hairer and Mattingly were able to show that there was a unique probability distribution to which almost every initial data would converge asymptotically; by the ergodic theorem, this is equivalent to demonstrating the existence and uniqueness of an invariant measure for the flow. Existence can be established using standard methods, but uniqueness is much more difficult. One of the standard routes to uniqueness is to establish a “strong Feller property” that enforces some continuity on the transition operators; among other things, this would mean that two ergodic probability measures with intersecting supports would in fact have a non-trivial common component, contradicting the ergodic theorem (which forces different ergodic measures to be mutually singular). Since all ergodic measures for Navier-Stokes can be seen to contain the origin in their support, this would give uniqueness. Unfortunately, the strong Feller property is unlikely to hold in the infinite-dimensional phase space for Navier-Stokes; but Hairer and Mattingly develop a clean abstract substitute for this property, which they call the asymptotic strong Feller property, which is again a regularity property on the transition operator; this in turn is then demonstrated by a careful application of Malliavin calculus.
Maryam Mirzakhani has mostly focused on the geometry and dynamics of Teichmüller-type moduli spaces, such as the moduli space of Riemann surfaces with a fixed genus and a fixed number of cusps (or with a fixed number of boundaries that are geodesics of a prescribed length). These spaces have an incredibly rich structure, ranging from geometric structure (such as the Kähler geometry given by the Weil-Petersson metric), to dynamical structure (through the action of the mapping class group on this and related spaces), to algebraic structure (viewing these spaces as algebraic varieties), and are thus connected to many other objects of interest in geometry and dynamics. For instance, by developing a new recursive formula for the Weil-Petersson volume of this space, Mirzakhani was able to asymptotically count the number of simple prime geodesics of length up to some threshold $L$ in a hyperbolic surface (or more precisely, she obtained asymptotics for the number of such geodesics in a given orbit of the mapping class group); the answer turns out to be polynomial in $L$, in contrast to the much larger class of non-simple prime geodesics, whose asymptotics are exponential in $L$ (the “prime number theorem for geodesics”, developed in a classic series of works by Delsarte, Huber, Selberg, and Margulis); she also used this formula to establish a new proof of a conjecture of Witten on intersection numbers that was first proven by Kontsevich. More recently, in two lengthy papers with Eskin and with Eskin-Mohammadi, Mirzakhani established rigidity theorems for the action of $SL_2(\mathbb{R})$ on such moduli spaces that are close analogues of Ratner’s celebrated rigidity theorems for unipotently generated groups (discussed in this previous blog post).
Ratner’s theorems are already notoriously difficult to prove, and rely very much on the polynomial stability properties of unipotent flows; in this even more complicated setting, the unipotent flows are no longer tractable, and Mirzakhani instead uses a recent “exponential drift” method of Benoist and Quint as a substitute. Ratner’s theorems are incredibly useful for all sorts of problems connected to homogeneous dynamics, and the analogous theorems established by Mirzakhani, Eskin, and Mohammadi have a similarly broad range of applications, for instance in counting periodic billiard trajectories in rational polygons.
Many fluid equations are expected to exhibit turbulence in their solutions, in which a significant portion of their energy ends up in high frequency modes. A typical example arises from the three-dimensional periodic Navier-Stokes equations
where is the velocity field, is a forcing term, is a pressure field, and is the viscosity. To study the dynamics of energy for this system, we first pass to the Fourier transform
so that the system becomes
We may normalise (and ) to have mean zero, so that . Then we introduce the dyadic energies
where ranges over the powers of two, and is shorthand for . Taking the inner product of (1) with , we obtain the energy flow equation
where range over powers of two, is the energy flow rate
is the energy dissipation rate
and is the energy injection rate
The Navier-Stokes equations are notoriously difficult to solve in general. Despite this, Kolmogorov in 1941 was able to give a convincing heuristic argument for what the distribution of the dyadic energies should become over long times, assuming that some sort of distributional steady state is reached. It is common to present this argument in the form of dimensional analysis, but one can also give a more “first principles” form of Kolmogorov’s argument, which I will do here. Heuristically, one can divide the frequency scales into three regimes:
- The injection regime in which the energy injection rate dominates the right-hand side of (2);
- The energy flow regime in which the flow rates dominate the right-hand side of (2); and
- The dissipation regime in which the dissipation dominates the right-hand side of (2).
If we assume a fairly steady and smooth forcing term , then will be supported on the low frequency modes , and so we heuristically expect the injection regime to consist of the low scales . Conversely, if we take the viscosity to be small, we expect the dissipation regime to only occur for very large frequencies , with the energy flow regime occupying the intermediate frequencies.
We can heuristically predict the dividing line between the energy flow regime and the dissipation regime. Of all the flow rates , it turns out in practice that the terms in which (i.e., interactions between comparable scales, rather than widely separated scales) will dominate the other flow rates, so we will focus just on these terms. It is convenient to return to physical space, decomposing the velocity field into Littlewood-Paley components
of the velocity field at frequency . By Plancherel’s theorem, this field will have an norm of , and as a naive model of turbulence we expect this field to be spread out more or less uniformly on the torus, so we have the heuristic
and a similar heuristic applied to gives
(One can consider modifications of the Kolmogorov model in which is concentrated on a lower-dimensional subset of the three-dimensional torus, leading to some changes in the numerology below, but we will not consider such variants here.) Since
we thus arrive at the heuristic
Of course, there is the possibility that due to significant cancellation, the energy flow is significantly less than , but we will assume that cancellation effects are not that significant, so that we typically have
or (assuming that does not oscillate too much in , and are close to )
On the other hand, we clearly have
We thus expect to be in the dissipation regime when
and in the energy flow regime when
Now we study the energy flow regime further. We assume a “statistically scale-invariant” dynamics in this regime, in particular assuming a power law
for some . From (3), we then expect an average asymptotic of the form
for some structure constants that depend on the exact nature of the turbulence; here we have replaced the factor by the comparable term to make things more symmetric. In order to attain a steady state in the energy flow regime, we thus need a cancellation in the structure constants:
On the other hand, if one is assuming statistical scale invariance, we expect the structure constants to be scale-invariant (in the energy flow regime), in that
for dyadic . Also, since the Euler equations conserve energy, the energy flows symmetrise to zero,
which from (7) suggests a similar cancellation among the structure constants
Combining this with the scale-invariance (9), we see that for fixed , we may organise the structure constants for dyadic into sextuples which sum to zero (including some degenerate tuples of order less than six). This will automatically guarantee the cancellation (8) required for a steady state energy distribution, provided that
or in other words
for any other value of , there is no particular reason to expect this cancellation (8) to hold. Thus we are led to the heuristic conclusion that the most stable power law distribution for the energies is the law
or in terms of shell energies, we have the famous Kolmogorov 5/3 law
Given that frequency interactions tend to cascade from low frequencies to high (if only because there are so many more high frequencies than low ones), the above analysis predicts a stabilising effect around this power law: scales at which a law (6) holds for some are likely to lose energy in the near-term, while scales at which a law (6) holds for some are conversely expected to gain energy, thus nudging the exponent of the power law towards .
We can solve for in terms of energy dissipation as follows. If we let be the frequency scale demarcating the transition from the energy flow regime (5) to the dissipation regime (4), we have
and hence by (10)
On the other hand, if we let be the energy dissipation at this scale (which we expect to be the dominant scale of energy dissipation), we have
Some simple algebra then lets us solve for and as
and
Thus, we have the Kolmogorov prediction
for
with energy dissipation occurring at the high end of this scale, which is counterbalanced by the energy injection at the low end of the scale.
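The "simple algebra" in this last step can be sketched as follows. This is a reconstruction under stated assumptions rather than a copy of the displayed equations: it assumes that the comparable-scale flow rate and the dissipation rate scale like $N E_N^{3/2}$ and $\nu N^2 E_N$ respectively, and that the steady-state law is $E_N \sim A N^{-2/3}$. The symbols $A$ (amplitude), $N_*$ (transition scale) and $\epsilon$ (dissipation rate at that scale) are names chosen here for illustration.

```latex
\begin{align*}
% Transition scale: flow rate and dissipation rate balance
N_* E_{N_*}^{3/2} \sim \nu N_*^2 E_{N_*}
  &\implies N_* \sim \nu^{-1} E_{N_*}^{1/2} \sim \nu^{-1} A^{1/2} N_*^{-1/3} \\
  &\implies N_*^{4/3} \sim \nu^{-1} A^{1/2}, \qquad N_* \sim \nu^{-3/4} A^{3/8}. \\
% Dissipation rate at the transition scale
\epsilon \sim \nu N_*^2 E_{N_*} \sim \nu A N_*^{4/3}
  &\implies \epsilon \sim A^{3/2}. \\
% Solving for the amplitude and the transition scale
A \sim \epsilon^{2/3}, \qquad N_* &\sim \nu^{-3/4} \epsilon^{1/4}, \\
E_N \sim \epsilon^{2/3} N^{-2/3}
  &\quad \text{for } 1 \lesssim N \lesssim \nu^{-3/4} \epsilon^{1/4}.
\end{align*}
```

Under these assumptions the upper cutoff $\nu^{-3/4}\epsilon^{1/4}$ is the reciprocal of the usual Kolmogorov length scale $(\nu^3/\epsilon)^{1/4}$, which is a useful sanity check on the exponents.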
As in the previous post, all computations here are at the formal level only.
In the previous blog post, the Euler equations for inviscid incompressible fluid flow were interpreted in a Lagrangian fashion, and then Noether’s theorem invoked to derive the known conservation laws for these equations. In a bit more detail: starting with Lagrangian space and Eulerian space , we let be the space of volume-preserving, orientation-preserving maps from Lagrangian space to Eulerian space. Given a curve , we can define the Lagrangian velocity field as the time derivative of , and the Eulerian velocity field . The volume-preserving nature of ensures that is a divergence-free vector field:
If we formally define the functional
then one can show that the critical points of this functional (with appropriate boundary conditions) obey the Euler equations
for some pressure field . As discussed in the previous post, the time translation symmetry of this functional yields conservation of the Hamiltonian
the rigid motion symmetries of Eulerian space give conservation of the total momentum
and total angular momentum
and the diffeomorphism symmetries of Lagrangian space give conservation of circulation
for any closed loop in , or equivalently pointwise conservation of the Lagrangian vorticity , where is the -form associated with the vector field using the Euclidean metric on , with denoting pullback by .
It turns out that one can generalise the above calculations. Given any self-adjoint operator on divergence-free vector fields , we can define the functional
as we shall see below the fold, critical points of this functional (with appropriate boundary conditions) obey the generalised Euler equations
for some pressure field , where in coordinates is with the usual summation conventions. (When , , and this term can be absorbed into the pressure , and we recover the usual Euler equations.) Time translation symmetry then gives conservation of the Hamiltonian
If the operator commutes with rigid motions on , then we have conservation of total momentum
and total angular momentum
and the diffeomorphism symmetries of Lagrangian space give conservation of circulation
or pointwise conservation of the Lagrangian vorticity . These applications of Noether’s theorem proceed exactly as in the previous post; we leave the details to the interested reader.
One particular special case of interest arises in two dimensions , when is the inverse derivative . The vorticity is a -form, which in the two-dimensional setting may be identified with a scalar. In coordinates, if we write , then
Since is also divergence-free, we may therefore write
where the stream function is given by the formula
If we take the curl of the generalised Euler equation (2), we obtain (after some computation) the surface quasi-geostrophic equation
This equation has strong analogies with the three-dimensional incompressible Euler equations, and can be viewed as a simplified model for that system; see this paper of Constantin, Majda, and Tabak for details.
Now we can specialise the general conservation laws derived previously to this setting. The conserved Hamiltonian is
(a law previously observed for this equation in the abovementioned paper of Constantin, Majda, and Tabak). As commutes with rigid motions, we also have (formally, at least) conservation of momentum
(which up to trivial transformations is also expressible in impulse form as , after integration by parts), and conservation of angular momentum
(which up to trivial transformations is ). Finally, diffeomorphism invariance gives pointwise conservation of Lagrangian vorticity , thus is transported by the flow (which is also evident from (3)). In particular, all integrals of the form for a fixed function are conserved by the flow.
Mathematicians study a variety of different mathematical structures, but perhaps the structures that are most commonly associated with mathematics are the number systems, such as the integers $\mathbb{Z}$ or the real numbers $\mathbb{R}$. Indeed, the use of number systems is so closely identified with the practice of mathematics that one sometimes forgets that it is possible to do mathematics without explicit reference to any concept of number. For instance, the ancient Greeks were able to prove many theorems in Euclidean geometry, well before the development of Cartesian coordinates and analytic geometry in the seventeenth century, or the formal constructions or axiomatisations of the real number system that emerged in the nineteenth century (not to mention precursor concepts such as zero or negative numbers, whose very existence was highly controversial, if entertained at all, to the ancient Greeks). To do this, the Greeks used geometric operations as substitutes for the arithmetic operations that would be more familiar to modern mathematicians. For instance, concatenation of line segments or planar regions serves as a substitute for addition; the operation of forming a rectangle out of two line segments would serve as a substitute for multiplication; the concept of similarity can be used as a substitute for ratios or division; and so forth.
A similar situation exists in modern physics. Physical quantities such as length, mass, momentum, charge, and so forth are routinely measured and manipulated using the real number system $\mathbb{R}$ (or related systems, such as $\mathbb{R}^3$ if one wishes to measure a vector-valued physical quantity such as velocity). Much as analytic geometry allows one to use the laws of algebra and trigonometry to calculate and prove theorems in geometry, the identification of physical quantities with numbers allows one to express physical laws and relationships (such as Einstein’s famous mass-energy equivalence $E = mc^2$) as algebraic (or differential) equations, which can then be solved and otherwise manipulated through the extensive mathematical toolbox that has been developed over the centuries to deal with such equations.
However, as any student of physics is aware, most physical quantities are not represented purely by one or more numbers, but instead by a combination of a number and some sort of unit. For instance, it would be a category error to assert that the length of some object was a number such as ; instead, one has to say something like “the length of this object is yards”, combining both a number and a unit (in this case, the yard). Changing the unit leads to a change in the numerical value assigned to this physical quantity, even though no physical change to the object being measured has occurred. For instance, if one decides to use feet as the unit of length instead of yards, then the length of the object is now feet; if one instead uses metres, the length is now metres; and so forth. But nothing physical has changed when performing this change of units, and these lengths are considered all equal to each other:
It is then common to declare that while physical quantities and units are not, strictly speaking, numbers, they should be manipulated using the laws of algebra as if they were numerical quantities. For instance, if an object travels metres in seconds, then its speed should be
where we use the usual abbreviations of and for metres and seconds respectively. Similarly, if the speed of light is and an object has mass , then Einstein’s mass-energy equivalence then tells us that the energy-content of this object is
Note that the symbols are being manipulated algebraically as if they were mathematical variables such as and . By collecting all these units together, we see that every physical quantity gets assigned a unit of a certain dimension: for instance, we see here that the energy of an object can be given the unit of (more commonly known as a Joule), which has the dimension of where are the dimensions of mass, length, and time respectively.
There is however one important limitation to the ability to manipulate “dimensionful” quantities as if they were numbers: one is not supposed to add, subtract, or compare two physical quantities if they have different dimensions, although it is acceptable to multiply or divide two such quantities. For instance, if is a mass (having the units ) and is a speed (having the units ), then it is physically “legitimate” to form an expression such as , but not an expression such as or ; in a similar spirit, statements such as or are physically meaningless. This combines well with the mathematical distinction between vector, scalar, and matrix quantities, which among other things prohibits one from adding together two such quantities if their vector or matrix type are different (e.g. one cannot add a scalar to a vector, or a vector to a matrix), and also places limitations on when two such quantities can be multiplied together. A related limitation, which is not always made explicit in physics texts, is that transcendental mathematical functions such as or should only be applied to arguments that are dimensionless; thus, for instance, if is a speed, then is not physically meaningful, but is (this particular quantity is known as the rapidity associated to this speed).
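The rule that multiplication is always allowed but addition requires matching dimensions can be illustrated in code. The sketch below is a toy model only: the class name `Quantity` and the convention of storing a dimension as a tuple of exponents of (mass, length, time) are assumptions made for this example, not a standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number together with exponents of (mass, length, time)."""
    value: float
    dim: tuple

    def __add__(self, other):
        # Addition is only meaningful between quantities of the same dimension.
        if self.dim != other.dim:
            raise TypeError(f"cannot add dimensions {self.dim} and {other.dim}")
        return Quantity(self.value + other.value, self.dim)

    def __mul__(self, other):
        # Multiplication is always allowed; the exponents add.
        dim = tuple(a + b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value * other.value, dim)

    def __truediv__(self, other):
        # Division is always allowed; the exponents subtract.
        dim = tuple(a - b for a, b in zip(self.dim, other.dim))
        return Quantity(self.value / other.value, dim)

    def __pow__(self, exponent: int):
        # Integer powers scale every exponent.
        dim = tuple(a * exponent for a in self.dim)
        return Quantity(self.value ** exponent, dim)


if __name__ == "__main__":
    mass = Quantity(2.0, (1, 0, 0))    # dimension M
    speed = Quantity(3.0, (0, 1, -1))  # dimension L T^-1
    print(mass * speed ** 2)           # dimension M L^2 T^-2
    try:
        mass + speed
    except TypeError as err:
        print("rejected:", err)
```

A dimensionless quantity in this model is one whose exponent tuple is `(0, 0, 0)`; a fuller version could restrict transcendental functions to such quantities, in line with the remark about rapidity above.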
These limitations may seem like a weakness in the mathematical modeling of physical quantities; one may think that one could get a more “powerful” mathematical framework if one were allowed to perform dimensionally inconsistent operations, such as add together a mass and a velocity, add together a vector and a scalar, exponentiate a length, etc. Certainly there is some precedent for this in mathematics; for instance, the formalism of Clifford algebras does in fact allow one to (among other things) add vectors with scalars, and in differential geometry it is quite common to formally apply transcendental functions (such as the exponential function) to a differential form (for instance, the Liouville measure of a symplectic manifold can be usefully thought of as a component of the exponential of the symplectic form ).
However, there are several reasons why it is advantageous to retain the limitation to only perform dimensionally consistent operations. One is that of error correction: one can often catch (and correct for) errors in one’s calculations by discovering a dimensional inconsistency, and tracing it back to the first step where it occurs. Also, by performing dimensional analysis, one can often identify the form of a physical law before one has fully derived it. For instance, if one postulates the existence of a mass-energy relationship involving only the mass of an object $m$, the energy content $E$, and the speed of light $c$, dimensional analysis is already sufficient to deduce that the relationship must be of the form $E = \alpha mc^2$ for some dimensionless absolute constant $\alpha$; the only remaining task is then to work out the constant of proportionality $\alpha$, which requires physical arguments beyond those provided by dimensional analysis. (This is a simple instance of a more general application of dimensional analysis known as the Buckingham $\pi$ theorem.)
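The mass-energy example amounts to solving a small linear system in the exponents. As a minimal sketch, assuming dimensions are recorded as exponent tuples of (mass, length, time) and that only small integer exponents need to be searched (both assumptions made for this illustration, along with the function name), one can enumerate the monomials in mass and speed that have the dimension of energy:

```python
from itertools import product

# Exponents of (mass, length, time) for each quantity.
MASS = (1, 0, 0)
SPEED = (0, 1, -1)
ENERGY = (1, 2, -2)


def matching_exponents(target, factors, max_exp=3):
    """Return integer exponent tuples e with prod(factors[i]**e[i]) of dimension target."""
    solutions = []
    for exps in product(range(-max_exp, max_exp + 1), repeat=len(factors)):
        dim = tuple(
            sum(e * f[i] for e, f in zip(exps, factors))
            for i in range(len(target))
        )
        if dim == target:
            solutions.append(exps)
    return solutions


if __name__ == "__main__":
    # Which monomials m^a c^b have the dimension of energy?
    print(matching_exponents(ENERGY, [MASS, SPEED]))
```

The only exponent pair found is $(a, b) = (1, 2)$, which is the statement that the relationship must be a dimensionless constant times the mass times the square of the speed; the search says nothing about the value of that constant.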
Units and dimensional analysis have certainly proven to be very effective tools in physics. But one can pose the question of whether they have a properly grounded mathematical foundation, in order to settle any lingering unease about using such tools in physics, and also in order to rigorously develop such tools for purely mathematical purposes (such as analysing identities and inequalities in such fields of mathematics as harmonic analysis or partial differential equations).
The example of Euclidean geometry mentioned previously offers one possible approach to formalising the use of dimensions. For instance, one could model the length of a line segment not by a number, but rather by the equivalence class of all line segments congruent to the original line segment (cf. the Frege-Russell definition of a number). Similarly, the area of a planar region can be modeled not by a number, but by the equivalence class of all regions that are equidecomposable with the original region (one can, if one wishes, restrict attention here to measurable sets in order to avoid Banach-Tarski-type paradoxes, though that particular paradox actually only arises in three and higher dimensions). As mentioned before, it is then geometrically natural to multiply two lengths to form an area, by taking a rectangle whose line segments have the stated lengths, and using the area of that rectangle as a product. This geometric picture works well for units such as length and volume that have a spatial geometric interpretation, but it is less clear how to apply it for more general units. For instance, it does not seem geometrically natural (or, for that matter, conceptually helpful) to envision the equation $E = mc^2$ as the assertion that the energy $E$ is the volume of a rectangular box whose height is the mass $m$ and whose length and width are given by the speed of light $c$.
But there are at least two other ways to formalise dimensionful quantities in mathematics, which I will discuss below the fold. The first is a “parametric” model in which dimensionful objects are modeled as numbers (or vectors, matrices, etc.) depending on some base dimensional parameters (such as units of length, mass, and time, or perhaps a coordinate system for space or spacetime), and transforming according to some representation of a structure group that encodes the range of these parameters; this type of “coordinate-heavy” model is often used (either implicitly or explicitly) by physicists in order to efficiently perform calculations, particularly when manipulating vector or tensor-valued quantities. The second is an “abstract” model in which dimensionful objects now live in an abstract mathematical space (e.g. an abstract vector space), in which only a subset of the operations available to general-purpose number systems such as or are available, namely those operations which are “dimensionally consistent” or invariant (or more precisely, equivariant) with respect to the action of the underlying structure group. This sort of “coordinate-free” approach tends to be the one which is preferred by pure mathematicians, particularly in the various branches of modern geometry, in part because it can lead to greater conceptual clarity, as well as results of great generality; it is also close to the more informal practice of treating mathematical manipulations that do not preserve dimensional consistency as being physically meaningless.
Things are pretty quiet here during the holiday season, but one small thing I have been working on recently is a set of notes on special relativity that I will be working through in a few weeks with some bright high school students here at our local math circle. I have only two hours to spend with this group, and it is unlikely that we will reach the end of the notes (in which I derive the famous mass-energy equivalence relation E=mc^2, largely following Einstein’s original derivation as discussed in this previous blog post); instead we will probably spend a fair chunk of time on related topics which do not actually require special relativity per se, such as spacetime diagrams, the Doppler shift effect, and an analysis of my airport puzzle. This will be my first time doing something of this sort (in which I will be spending as much time interacting directly with the students as I would lecturing); I’m not sure exactly how it will play out, being a little outside of my usual comfort zone of undergraduate and graduate teaching, but am looking forward to finding out how it goes. (In particular, it may end up that the discussion deviates somewhat from my prepared notes.)
The material covered in my notes is certainly not new, but I ultimately decided that it was worth putting up here in case some readers here had any corrections or other feedback to contribute (which, as always, would be greatly appreciated).
[Dec 24 and then Jan 21: notes updated, in response to comments.]
Way back in 2007, I wrote a blog post giving Einstein’s derivation of his famous equation $E = mc^2$ for the rest energy of a body with mass $m$. (Throughout this post, mass is used to refer to the invariant mass (also known as rest mass) of an object.) This derivation used a number of physical assumptions, including the following:
- The two postulates of special relativity: firstly, that the laws of physics are the same in every inertial reference frame, and secondly that the speed of light in vacuum is equal in every such inertial frame.
- Planck’s law and de Broglie’s law for photons, relating the frequency, energy, and momentum of such photons together.
- The law of conservation of energy, and the law of conservation of momentum, as well as the additivity of these quantities (i.e. the energy of a system is the sum of the energy of its components, and similarly for momentum).
- The Newtonian approximations $E \approx \frac{1}{2} m v^2$, $p \approx m v$ to energy and momentum at low velocities.
The argument was one-dimensional in nature, in the sense that only one of the three spatial dimensions was actually used in the proof.
As was pointed out in comments in the previous post by Laurens Gunnarsen, this derivation has the curious feature of needing some laws from quantum mechanics (specifically, the Planck and de Broglie laws) in order to derive an equation in special relativity (which does not ostensibly require quantum mechanics). One can then ask whether one can give a derivation that does not require such laws. As pointed out in previous comments, one can use the representation theory of the Lorentz group to give a nice derivation that avoids any quantum mechanics, but it now needs at least two spatial dimensions instead of just one. I decided to work out this derivation in a way that does not explicitly use representation theory (although it is certainly lurking beneath the surface). The concept of momentum is only barely used in this derivation, and the main ingredients are now reduced to the following:
- The two postulates of special relativity;
- The law of conservation of energy (and the additivity of energy);
- The Newtonian approximation $E \approx \frac{1}{2} m v^2$ at low velocities.
The argument (which uses a little bit of calculus, but is otherwise elementary) is given below the fold. Whereas Einstein’s original argument considers a mass emitting two photons in several different reference frames, the argument here considers a large mass breaking up into two equal smaller masses. Viewing this situation in different reference frames gives a functional equation for the relationship between energy, mass, and velocity, which can then be solved using some calculus, using the Newtonian approximation as a boundary condition, to give the famous formula.
Disclaimer: As with the previous post, the arguments here are physical arguments rather than purely mathematical ones, and thus do not really qualify as a rigorous mathematical argument, due to the implicit use of a number of physical and metaphysical hypotheses beyond the ones explicitly listed above. (But it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on rigorous mathematical reasoning.)
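To see numerically why the Newtonian approximation is a sensible boundary condition at low velocities, here is a small Python sketch. It assumes the standard relativistic formula for the total energy of a body of mass m moving at speed v, namely mc^2 / sqrt(1 - v^2/c^2); the function names are hypothetical, and this is a consistency check rather than the derivation itself.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, in metres per second


def relativistic_kinetic_energy(m, v):
    """Total energy m c^2 / sqrt(1 - v^2/c^2) minus the rest energy m c^2."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma - 1.0) * m * C ** 2


def newtonian_kinetic_energy(m, v):
    """The low-velocity approximation (1/2) m v^2."""
    return 0.5 * m * v ** 2


for fraction in (0.01, 0.1, 0.5, 0.9):
    v = fraction * C
    ratio = relativistic_kinetic_energy(1.0, v) / newtonian_kinetic_energy(1.0, v)
    print(f"v = {fraction:.2f} c: relativistic / Newtonian = {ratio:.4f}")
```

The ratio is very close to 1 for small fractions of the speed of light and grows well beyond 1 as v approaches c, which is the regime where the Newtonian approximation is no longer adequate.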
A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem”, discussed in this previous blog post. As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity. But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions. As a consequence, some of the implications in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion. I have completed (and proofread) an updated version of the manuscript, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link). (In the meantime, it is probably best not to read the original manuscript too carefully, as this could lead to some confusion.) I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.
Let me now describe the issue in more detail (and also explain why I missed it previously). A standard principle in the theory of evolutionary partial differential equations is that regularity in space can be used to imply regularity in time. To illustrate this, consider a solution to the supercritical nonlinear wave equation
(1)
for some field . Suppose one already knew that had some regularity in space, and in particular the norm of was bounded (thus and up to two spatial derivatives of were bounded). Then, by (1), we see that two time derivatives of were also bounded, and one then gets the additional regularity of .
In a similar vein, suppose one initially knew that had the regularity . Then (1) soon tells us that also has the regularity ; then, if one differentiates (1) in time to obtain
one can conclude that also has the regularity of . One can continue this process indefinitely; in particular, if one knew that , then these sorts of manipulations show that is infinitely smooth in both space and time.
The issue that caught me by surprise is that for the Navier-Stokes equations
(2)
(setting the forcing term equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space . The problem lies with the pressure term , which is recovered from the velocity via the elliptic equation
(3)
that can be obtained by taking the divergence of (2). This equation is solved by a non-local integral operator:
If, say, lies in , then there is no difficulty establishing a bound on in terms of (for instance, one can use singular integral theory and Sobolev embedding to place in ). However, one runs into difficulty when trying to compute time derivatives of . Differentiating (3) once, one gets
.
At the regularity of , one can still (barely) control this quantity by using (2) to expand out and using some integration by parts. But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form
and now there is not enough regularity on available to get any control on , even if one assumes that is smooth. Indeed, following this observation, I was able to show that given generic smooth data, the pressure will instantaneously fail to be in time, and thence (by (2)) the velocity will instantaneously fail to be in time. (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity will fail to be in time. Switching to material coordinates seems to make things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)
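For readers who want to see where the elliptic equation for the pressure comes from, here is a sketch of the computation. It assumes the standard normalisation of the Navier-Stokes equations with unit viscosity and zero forcing, a velocity field written in components $u_i$ with summation over repeated indices, and pressure $p$; this notation is an assumption of the sketch and may differ from the displays in the paper.

```latex
\begin{align*}
\partial_t u_i + u_j \partial_j u_i &= \Delta u_i - \partial_i p, \qquad \partial_i u_i = 0, \\
\partial_i \left( u_j \partial_j u_i \right) &= -\Delta p
  \quad \text{(divergence of the first equation, using } \partial_i u_i = 0\text{)}, \\
\partial_i \left( u_j \partial_j u_i \right) &= \partial_i \partial_j (u_i u_j)
  \quad \text{(again since } \partial_j u_j = 0\text{)}, \\
\Delta p &= -\partial_i \partial_j (u_i u_j), \qquad
p = -\Delta^{-1} \partial_i \partial_j (u_i u_j).
\end{align*}
```

The inverse Laplacian in the last line is a non-local operator, so the pressure at one point depends on the velocity everywhere in space; this non-locality is what makes the time derivatives of the pressure hard to control for data that is merely smooth.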
For later times t>0 (and assuming homogeneous data f=0 for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade regularity to regularity instantaneously. It is only the initial time at which some time irregularity can occur.
This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing). In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild solution) and the data is Schwartz, then the solution will be smooth in spacetime. (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)
This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth data) is not correct as stated. But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time. More precisely, call a solution and almost smooth if
- and are smooth on the half-open slab ; and
- For every , exist and are continuous on the full slab .
Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only , and the pressure field is only . This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.
(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth setting.)
One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth. Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).
The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:
I’ve just uploaded to the arXiv my paper “Localisation and compactness properties of the Navier-Stokes global regularity problem”, submitted to Analysis and PDE. This paper concerns the global regularity problem for the Navier-Stokes system of equations
in three dimensions. Thus, we specify initial data , where is a time, is the initial velocity field (which, in order to be compatible with (2), (3), is required to be divergence-free), is the forcing term, and then seek to extend this initial data to a solution with this data, where the velocity field and pressure term are the unknown fields.
Roughly speaking, the global regularity problem asserts that given every smooth set of initial data , there exists a smooth solution to the Navier-Stokes equation with this data. However, this is not a good formulation of the problem because it does not exclude the possibility that one or more of the fields grows too fast at spatial infinity. This problem is evident even for the much simpler heat equation
As long as one has some mild conditions at infinity on the smooth initial data (e.g. polynomial growth at spatial infinity), then one can solve this equation using the fundamental solution of the heat equation:
If furthermore is a tempered distribution, one can use Fourier-analytic methods to show that this is the unique solution to the heat equation with this data. But once one allows sufficiently rapid growth at spatial infinity, existence and uniqueness can break down. Consider for instance the backwards heat kernel
for some , which is smooth (albeit exponentially growing) at time zero, and is a smooth solution to the heat equation for , but develops a dramatic singularity at time . A famous example of Tychonoff from 1935, based on a power series construction, shows that uniqueness for the heat equation can also fail once growth conditions are removed. An explicit example of non-uniqueness for the heat equation is given by the contour integral
where is the -shaped contour consisting of the positive real axis and the upper imaginary axis, with being interpreted with the standard branch (with cut on the negative axis). One can show by contour integration that this function solves the heat equation and is smooth (but rapidly growing at infinity), and vanishes for , but is not identically zero for .
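The behaviour of the backwards heat kernel can be checked numerically. The sketch below makes several assumptions that are not fixed by the text above: it works in one spatial dimension, normalises the heat equation as u_t = u_xx, drops the constant prefactor, and takes the kernel to be (T - t)^(-1/2) exp(x^2 / (4 (T - t))) with blow-up time T = 1. The function names are hypothetical.

```python
import math


def backwards_heat_kernel(t, x, blowup_time=1.0):
    """(T - t)^(-1/2) * exp(x^2 / (4 (T - t))), defined for t < T."""
    s = blowup_time - t
    return s ** -0.5 * math.exp(x * x / (4.0 * s))


def heat_residual(u, t, x, h=1e-3):
    """Central finite-difference estimate of u_t - u_xx at the point (t, x)."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2.0 * h)
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / (h * h)
    return u_t - u_xx


# The residual should be close to zero away from the blow-up time...
print(heat_residual(backwards_heat_kernel, 0.2, 0.5))
# ...while the values at a fixed point grow rapidly as t approaches T = 1.
for t in (0.0, 0.9, 0.99):
    print(t, backwards_heat_kernel(t, 1.0))
```

The exponential growth in x at time zero is exactly the kind of behaviour at spatial infinity that the decay hypotheses discussed next are designed to rule out.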
Thus, in order to obtain a meaningful (and physically realistic) problem, one needs to impose some decay (or at least limited growth) hypotheses on the data and solution in addition to smoothness. For the data, one can impose a variety of such hypotheses, including the following:
- (Finite energy data) One has and .
- ( data) One has and .
- (Schwartz data) One has and for all .
- (Periodic data) There is some such that and for all and .
- (Homogeneous data) .
Note that smoothness alone does not necessarily imply finite energy, , or the Schwartz property. For instance, the (scalar) function is smooth and finite energy, but not in or Schwartz. Periodicity is of course incompatible with finite energy, , or the Schwartz property, except in the trivial case when the data is identically zero.
Similarly, one can impose conditions at spatial infinity on the solution, such as the following:
- (Finite energy solution) One has .
- ( solution) One has and .
- (Partially periodic solution) There is some such that for all and .
- (Fully periodic solution) There is some such that and for all and .
(The component of the solution is for technical reasons, and should not be paid too much attention for this discussion.) Note that we do not consider the notion of a Schwartz solution; as we shall see shortly, this is too restrictive a concept of solution to the Navier-Stokes equation.
Finally, one can downgrade the regularity of the solution down from smoothness. There are many ways to do so; two such examples include
- ( mild solutions) The solution is not smooth, but is (in the preceding sense) and solves the equation (1) in the sense that the Duhamel formula
holds.
- (Leray-Hopf weak solution) The solution is not smooth, but lies in , solves (1) in the sense of distributions (after rewriting the system in divergence form), and obeys an energy inequality.
Finally, one can ask for two types of global regularity results on the Navier-Stokes problem: a qualitative regularity result, in which one merely provides existence of a smooth solution without any explicit bounds on that solution, and a quantitative regularity result, which provides bounds on the solution in terms of the initial data, e.g. a bound of the form
for some function . One can make a further distinction between local quantitative results, in which is allowed to depend on , and global quantitative results, in which there is no dependence on (the latter is only reasonable though in the homogeneous case, or if has some decay in time).
By combining these various hypotheses and conclusions, we see that one can write down quite a large number of slightly different variants of the global regularity problem. In the official formulation of the regularity problem for the Clay Millennium prize, a positive correct solution to either of the following two problems would be accepted for the prize:
- Conjecture 1.4 (Qualitative regularity for homogeneous periodic data) If is periodic, smooth, and homogeneous, then there exists a smooth partially periodic solution with this data.
- Conjecture 1.3 (Qualitative regularity for homogeneous Schwartz data) If is Schwartz and homogeneous, then there exists a smooth finite energy solution with this data.
(The numbering here corresponds to the numbering in the paper.)
Furthermore, a negative correct solution to either of the following two problems would also be accepted for the prize:
- Conjecture 1.6 (Qualitative regularity for periodic data) If is periodic and smooth, then there exists a smooth partially periodic solution with this data.
- Conjecture 1.5 (Qualitative regularity for Schwartz data) If is Schwartz, then there exists a smooth finite energy solution with this data.
I am not announcing any major progress on these conjectures here. What my paper does study, though, is the question of whether the answer to these conjectures is somehow sensitive to the choice of formulation. For instance:
- Note in the periodic formulations of the Clay prize problem that the solution is only required to be partially periodic, rather than fully periodic; thus the pressure has no periodicity hypothesis. One can ask the extent to which the above problems change if one also requires pressure periodicity.
- In another direction, one can ask the extent to which quantitative formulations of the Navier-Stokes problem are stronger than their qualitative counterparts; in particular, whether it is possible that each choice of initial data in a certain class leads to a smooth solution, but with no uniform bound on that solution in terms of various natural norms of the data.
- Finally, one can ask the extent to which the conjecture depends on the category of data. For instance, could it be that global regularity is true for smooth periodic data but false for Schwartz data? True for Schwartz data but false for smooth data? And so forth.
One motivation for the final question (which was posed to me by my colleague, Andrea Bertozzi) is that the Schwartz property on the initial data tends to be instantly destroyed by the Navier-Stokes flow. This can be seen by introducing the vorticity . If is Schwartz, then from Stokes’ theorem we necessarily have vanishing of certain moments of the vorticity, for instance:
On the other hand, some integration by parts using (1) reveals that such moments are usually not preserved by the flow; for instance, one has the law
and one can easily concoct examples for which the right-hand side is non-zero at time zero. This suggests that the Schwartz class may be unnecessarily restrictive for Conjecture 1.3 or Conjecture 1.5.
My paper arose out of an attempt to address these three questions, and ended up obtaining partial results in all three directions. Roughly speaking, the results that address these three questions are as follows:
- (Homogenisation) If one only assumes partial periodicity instead of full periodicity, then the forcing term becomes irrelevant. In particular, Conjecture 1.4 and Conjecture 1.6 are equivalent.
- (Concentration compactness) In the category (both periodic and nonperiodic, homogeneous or nonhomogeneous), the qualitative and quantitative formulations of the Navier-Stokes global regularity problem are essentially equivalent.
- (Localisation) The (inhomogeneous) Navier-Stokes problems in the Schwartz, smooth , and finite energy categories are essentially equivalent to each other, and are also implied by the (fully) periodic version of these problems.
The first two of these families of results are relatively routine, drawing on existing methods in the literature; the localisation results though are somewhat more novel, and introduce some new local energy and local enstrophy estimates which may be of independent interest.
Broadly speaking, the moral to draw from these results is that the precise formulation of the Navier-Stokes equation global regularity problem is only of secondary importance; modulo a number of caveats and technicalities, the various formulations are close to being equivalent, and a breakthrough on any one of the formulations is likely to lead (either directly or indirectly) to a comparable breakthrough on any of the others.
This is only a caricature of the actual implications, though. Below is the diagram from the paper indicating the various formulations of the Navier-Stokes equations, and the known implications between them:
The above three streams of results are discussed in more detail below the fold.
As we are all now very much aware, tsunamis are water waves that start in the deep ocean, usually because of an underwater earthquake (though tsunamis can also be caused by underwater landslides or volcanoes), and then propagate towards shore. Initially, tsunamis have relatively small amplitude (a metre or so is typical), which would seem to render them as harmless as wind waves. And indeed, tsunamis often pass by ships in deep ocean without anyone on board even noticing.
However, being generated by an event as large as an earthquake, the wavelength of the tsunami is huge – 200 kilometres is typical (in contrast with wind waves, whose wavelengths are typically closer to 100 metres). In particular, the wavelength of the tsunami is far greater than the depth of the ocean (which is typically 2-3 kilometres). As such, even in the deep ocean, the dynamics of tsunamis are essentially governed by the shallow water equations. One consequence of these equations is that the speed of propagation of a tsunami can be approximated by the formula
$$v \approx \sqrt{g b} \qquad (1)$$
where $b$ is the depth of the ocean, and $g$ is the acceleration due to gravity. As such, tsunamis in deep water move very fast – speeds such as 500 kilometres per hour (300 miles per hour) are quite typical; enough to travel from Japan to the US, for instance, in less than a day. Ultimately, this is due to the incompressibility of water (and conservation of mass); the massive net pressure (or more precisely, spatial variations in this pressure) of a very broad and deep wave of water forces the profile of the wave to move horizontally at vast speeds. (Note though that this is the phase velocity of the tsunami wave, and not the velocity of the water molecules themselves, which are far slower.)
As the tsunami approaches shore, the depth of course decreases, causing the tsunami to slow down, at a rate proportional to the square root of the depth, as per (1). Unfortunately, wave shoaling then forces the amplitude to increase at an inverse rate governed by Green’s law,
$$A \propto b^{-1/4} \qquad (2)$$
at least until the amplitude $A$ becomes comparable to the water depth $b$ (at which point the assumptions that underlie the above approximate results break down; also, in two (horizontal) spatial dimensions there will be some decay of amplitude as the tsunami spreads outwards). If one starts with a tsunami whose initial amplitude was $A_0$ at depth $b_0$ and computes the point at which the amplitude and depth become comparable using the proportionality relationship (2), some high school algebra then reveals that at this point, the amplitude of the tsunami (and the depth of the water) is about $A_0^{4/5} b_0^{1/5}$. Thus, for instance, a tsunami with initial amplitude of one metre at a depth of 2 kilometres can end up with a final amplitude of about 5 metres near shore, while still traveling at about ten metres per second (35 kilometres per hour, or 22 miles per hour), and we have all now seen the impact that can have when it hits shore.
While tsunamis are far too massive an event to be able to control (at least in the deep ocean), we can at least model them mathematically, allowing one to predict their impact at various places along the coast with high accuracy. (For instance, here is a video of the NOAA’s model of the March 11 tsunami, which has matched up very well with subsequent measurements.) The full equations and numerical methods used to perform such models are somewhat sophisticated, but by making a large number of simplifying assumptions, it is relatively easy to come up with a rough model that already predicts the basic features of tsunami propagation, such as the velocity formula (1) and the amplitude proportionality law (2). I give this (standard) derivation below the fold. The argument will largely be heuristic in nature; there are very interesting analytic issues in actually justifying many of the steps below rigorously, but I will not discuss these matters here.
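The numbers quoted above can be reproduced with a few lines of Python. This sketch assumes that the velocity formula has the standard shallow-water form sqrt(g × depth), that Green’s law means amplitude proportional to depth^(-1/4), and that g is about 9.81 metres per second squared; the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def shallow_water_speed(depth):
    """Approximate propagation speed sqrt(g * depth), in metres per second."""
    return math.sqrt(G * depth)


def green_amplitude(initial_amplitude, initial_depth, depth):
    """Amplitude at a new depth under Green's law, amplitude ~ depth**(-1/4)."""
    return initial_amplitude * (initial_depth / depth) ** 0.25


def breakdown_amplitude(initial_amplitude, initial_depth):
    """Amplitude (equal to depth) at which Green's law makes the two coincide.

    Solving A0 * (b0 / b)**(1/4) = b for b gives b = A0**(4/5) * b0**(1/5).
    """
    return initial_amplitude ** 0.8 * initial_depth ** 0.2


deep_speed_kmh = shallow_water_speed(2000.0) * 3.6  # m/s converted to km/h
near_shore = breakdown_amplitude(1.0, 2000.0)
print(f"speed at 2 km depth: {deep_speed_kmh:.0f} km/h")
print(f"amplitude when comparable to depth: {near_shore:.1f} m")
```

For a one metre wave at a depth of 2 kilometres this gives a deep-water speed of roughly 500 kilometres per hour and a near-shore amplitude between 4 and 5 metres, in line with the figures in the text.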
Last week I gave a talk at the Trinity Mathematical Society at Trinity College, Cambridge UK. As the audience was primarily undergraduate, I gave a fairly non-technical talk on the universality phenomenon, based on this blog article of mine on the same topic. It was quite a light and informal affair, and this is reflected in the talk slides (which, in particular, play up quite strongly the role of former students and Fellows of Trinity College in this story). There was some interest in making these slides available publicly, so I have placed them on this site here. (Note: copyright for the images in these slides has not been secured.)