
The Euler equations for incompressible inviscid fluids may be written as

\displaystyle \partial_t u + (u \cdot \nabla) u = -\nabla p

\displaystyle \nabla \cdot u = 0

where {u: [0,T] \times {\bf R}^n \rightarrow {\bf R}^n} is the velocity field, and {p: [0,T] \times {\bf R}^n \rightarrow {\bf R}} is the pressure field. To avoid technicalities we will assume that both fields are smooth, and that {u} is bounded. We will take the dimension {n} to be at least two, with the three-dimensional case {n=3} being of course especially interesting.

The Euler equations are the inviscid limit of the Navier-Stokes equations; as discussed in my previous post, one potential route to establishing finite time blowup for the latter equations when {n=3} is to be able to construct “computers” solving the Euler equations, which generate smaller replicas of themselves in a noise-tolerant manner (as the viscosity term in the Navier-Stokes equation is to be viewed as perturbative noise).

Perhaps the most prominent obstacles to this route are the conservation laws for the Euler equations, which limit the types of final states that a putative computer could reach from a given initial state. Most famously, we have the conservation of energy

\displaystyle \int_{{\bf R}^n} |u|^2\ dx \ \ \ \ \ (1)

 

(assuming sufficient decay of the velocity field at infinity); thus for instance it would not be possible for a computer to generate a replica of itself which had greater total energy than the initial computer. This by itself is not a fatal obstruction (in this paper of mine, I constructed such a “computer” for an averaged Euler equation that still obeyed energy conservation). However, there are other conservation laws also, for instance in three dimensions one also has conservation of helicity

\displaystyle \int_{{\bf R}^3} u \cdot (\nabla \times u)\ dx \ \ \ \ \ (2)

 

and (formally, at least) one has conservation of momentum

\displaystyle \int_{{\bf R}^3} u\ dx

and angular momentum

\displaystyle \int_{{\bf R}^3} x \times u\ dx

(although, as we shall discuss below, due to the slow decay of {u} at infinity, these integrals have to either be interpreted in a principal value sense, or else replaced with their vorticity-based formulations, namely impulse and moment of impulse). Total vorticity

\displaystyle \int_{{\bf R}^3} \nabla \times u\ dx

is also conserved, although it turns out in three dimensions that this quantity vanishes when one assumes sufficient decay at infinity. Then there are the pointwise conservation laws: the vorticity and the volume form are both transported by the fluid flow, while the velocity field (when viewed as a covector) is transported up to a gradient; among other things, this gives the transport of vortex lines as well as Kelvin’s circulation theorem, and can also be used to deduce the helicity conservation law mentioned above. In my opinion, none of these laws actually prohibits a self-replicating computer from existing within the laws of ideal fluid flow, but they do significantly complicate the task of actually designing such a computer, or of the basic “gates” that such a computer would consist of.

Below the fold I would like to record and derive all the conservation laws mentioned above, which to my knowledge essentially form the complete set of known conserved quantities for the Euler equations. The material here (although not the notation) is drawn from this text of Majda and Bertozzi.


I’ve just uploaded to the arXiv the paper “Finite time blowup for an averaged three-dimensional Navier-Stokes equation”, submitted to J. Amer. Math. Soc. The main purpose of this paper is to formalise the “supercriticality barrier” for the global regularity problem for the Navier-Stokes equation, which roughly speaking asserts that it is not possible to establish global regularity by any “abstract” approach which only uses upper bound function space estimates on the nonlinear part of the equation, combined with the energy identity. This is done by constructing a modification of the Navier-Stokes equations with a nonlinearity that obeys essentially all of the function space estimates that the true Navier-Stokes nonlinearity does, and which also obeys the energy identity, but for which one can construct solutions that blow up in finite time. Results of this type had been previously established by Montgomery-Smith, Gallagher-Paicu, and Li-Sinai for variants of the Navier-Stokes equation without the energy identity, and by Katz-Pavlovic and by Cheskidov for dyadic analogues of the Navier-Stokes equations in five and higher dimensions that obeyed the energy identity (see also the work of Plechac and Sverak and of Hou and Lei that also suggest blowup for other Navier-Stokes type models obeying the energy identity in five and higher dimensions), but to my knowledge this is the first blowup result for a Navier-Stokes type equation in three dimensions that also obeys the energy identity. Intriguingly, the method of proof in fact hints at a possible route to establishing blowup for the true Navier-Stokes equations, which I am now increasingly inclined to believe is the case (albeit for a very small set of initial data).

To state the results more precisely, recall that the Navier-Stokes equations can be written in the form

\displaystyle  \partial_t u + (u \cdot \nabla) u = \nu \Delta u + \nabla p

for a divergence-free velocity field {u} and a pressure field {p}, where {\nu>0} is the viscosity, which we will normalise to be one. We will work in the non-periodic setting, so the spatial domain is {{\bf R}^3}, and for sake of exposition I will not discuss matters of regularity or decay of the solution (but we will always be working with strong notions of solution here rather than weak ones). Applying the Leray projection {P} to divergence-free vector fields to this equation, we can eliminate the pressure, and obtain an evolution equation

\displaystyle  \partial_t u = \Delta u + B(u,u) \ \ \ \ \ (1)

purely for the velocity field, where {B} is a certain bilinear operator on divergence-free vector fields (specifically, {B(u,v) = -\frac{1}{2} P( (u \cdot \nabla) v + (v \cdot \nabla) u)}). The global regularity problem for Navier-Stokes is then equivalent to the global regularity problem for the evolution equation (1).

An important feature of the bilinear operator {B} appearing in (1) is the cancellation law

\displaystyle  \langle B(u,u), u \rangle = 0

(using the {L^2} inner product on divergence-free vector fields), which leads in particular to the fundamental energy identity

\displaystyle  \frac{1}{2} \int_{{\bf R}^3} |u(T,x)|^2\ dx + \int_0^T \int_{{\bf R}^3} |\nabla u(t,x)|^2\ dx dt = \frac{1}{2} \int_{{\bf R}^3} |u(0,x)|^2\ dx.

This identity (and its consequences) provide essentially the only known a priori bound on solutions to the Navier-Stokes equations from large data and arbitrary times. Unfortunately, as discussed in this previous post, the quantities controlled by the energy identity are supercritical with respect to scaling, which is the fundamental obstacle that has defeated all attempts to solve the global regularity problem for Navier-Stokes without any additional assumptions on the data or solution (e.g. perturbative hypotheses, or a priori control on a critical norm such as the {L^\infty_t L^3_x} norm).

Our main result is then (slightly informally stated) as follows:

Theorem 1 There exists an averaged version {\tilde B} of the bilinear operator {B}, of the form

\displaystyle  \tilde B(u,v) := \int_\Omega m_{3,\omega}(D) Rot_{3,\omega}

\displaystyle B( m_{1,\omega}(D) Rot_{1,\omega} u, m_{2,\omega}(D) Rot_{2,\omega} v )\ d\mu(\omega)

for some probability space {(\Omega, \mu)}, some spatial rotation operators {Rot_{i,\omega}} for {i=1,2,3}, and some Fourier multipliers {m_{i,\omega}} of order {0}, for which one still has the cancellation law

\displaystyle  \langle \tilde B(u,u), u \rangle = 0

and for which the averaged Navier-Stokes equation

\displaystyle  \partial_t u = \Delta u + \tilde B(u,u) \ \ \ \ \ (2)

admits solutions that blow up in finite time.

(There are some integrability conditions on the Fourier multipliers {m_{i,\omega}} required in the above theorem in order for the conclusion to be non-trivial, but I am omitting them here for sake of exposition.)

Because spatial rotations and Fourier multipliers of order {0} are bounded on most function spaces, {\tilde B} automatically obeys almost all of the upper bound estimates that {B} does. Thus, this theorem blocks any attempt to prove global regularity for the true Navier-Stokes equations which relies purely on the energy identity and on upper bound estimates for the nonlinearity; one must use some additional structure of the nonlinear operator {B} which is not shared by an averaged version {\tilde B}. Such additional structure certainly exists – for instance, the Navier-Stokes equation has a vorticity formulation involving only differential operators rather than pseudodifferential ones, whereas a general equation of the form (2) does not. However, “abstract” approaches to global regularity generally do not exploit such structure, and thus cannot be used to affirmatively answer the Navier-Stokes problem.

It turns out that the particular averaged bilinear operator {\tilde B} that we will use will be a finite linear combination of local cascade operators, which take the form

\displaystyle  C(u,v) := \sum_{n \in {\bf Z}} (1+\epsilon_0)^{5n/2} \langle u, \psi_{1,n} \rangle \langle v, \psi_{2,n} \rangle \psi_{3,n}

where {\epsilon_0>0} is a small parameter, {\psi_1,\psi_2,\psi_3} are Schwartz vector fields whose Fourier transform is supported on an annulus, and {\psi_{i,n}(x) := (1+\epsilon_0)^{3n/2} \psi_i( (1+\epsilon_0)^n x)} is an {L^2}-rescaled version of {\psi_i} (basically a “wavelet” of wavelength about {(1+\epsilon_0)^{-n}} centred at the origin). Such operators were essentially introduced by Katz and Pavlovic as dyadic models for {B}; they have essentially the same scaling property as {B} (except that one can only scale along powers of {1+\epsilon_0}, rather than over all positive reals), and in fact they can be expressed as an average of {B} in the sense of the above theorem, as can be shown after a somewhat tedious amount of Fourier-analytic symbol manipulations.

If we consider nonlinearities {\tilde B} which are a finite linear combination of local cascade operators, then the equation (2) more or less collapses to a system of ODE in certain “wavelet coefficients” of {u}. The precise ODE that shows up depends on what precise combination of local cascade operators one is using. Katz and Pavlovic essentially considered a single cascade operator together with its “adjoint” (needed to preserve the energy identity), and arrived (more or less) at the system of ODE

\displaystyle  \partial_t X_n = - (1+\epsilon_0)^{2n} X_n + (1+\epsilon_0)^{\frac{5}{2}(n-1)} X_{n-1}^2 - (1+\epsilon_0)^{\frac{5}{2} n} X_n X_{n+1} \ \ \ \ \ (3)

where {X_n: [0,T] \rightarrow {\bf R}} are scalar fields for each integer {n}. (Actually, Katz-Pavlovic worked with a technical variant of this particular equation, but the differences are not so important for this current discussion.) Note that the quadratic terms on the RHS carry a higher exponent of {1+\epsilon_0} than the dissipation term; this reflects the supercritical nature of this evolution (the energy {\frac{1}{2} \sum_n X_n^2} is monotone decreasing in this flow, so the natural size of {X_n} given the control on the energy is {O(1)}). There is a slight technical issue with the dissipation if one wishes to embed (3) into an equation of the form (2), but it is minor and I will not discuss it further here.
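The claim that the energy is monotone decreasing under (3) comes from a telescoping cancellation of the quadratic terms. Here is a minimal Python sketch of that bookkeeping. It assumes (this is not in the text) that the infinite system is truncated to finitely many modes by setting the neighbours outside the range to zero; the function names are made up for illustration.

```python
def dyadic_rhs(X, eps0=1.0):
    """Right-hand side of (3) for modes n = 0, ..., N-1.

    Assumption (not from the text): the system is truncated by
    setting X_{-1} = 0 and X_N = 0.
    """
    lam = 1.0 + eps0
    N = len(X)
    rhs = []
    for n in range(N):
        x_prev = X[n - 1] if n > 0 else 0.0
        x_next = X[n + 1] if n < N - 1 else 0.0
        dissipation = -lam ** (2 * n) * X[n]
        inflow = lam ** (2.5 * (n - 1)) * x_prev ** 2
        outflow = -lam ** (2.5 * n) * X[n] * x_next
        rhs.append(dissipation + inflow + outflow)
    return rhs


def energy_derivative(X, eps0=1.0):
    """d/dt of (1/2) sum_n X_n^2, computed as sum_n X_n * dX_n/dt."""
    return sum(x * dx for x, dx in zip(X, dyadic_rhs(X, eps0)))


def dissipation_only(X, eps0=1.0):
    """-sum_n (1+eps0)^(2n) X_n^2: what remains if the quadratic terms cancel."""
    lam = 1.0 + eps0
    return -sum(lam ** (2 * n) * x * x for n, x in enumerate(X))
```

For any state vector, `energy_derivative(X)` and `dissipation_only(X)` should agree up to rounding, since the inflow into mode {n+1} is exactly the outflow from mode {n}.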

In principle, if the {X_n} mode has size comparable to {1} at some time {t_n}, then energy should flow from {X_n} to {X_{n+1}} at a rate comparable to {(1+\epsilon_0)^{\frac{5}{2} n}}, so that by time {t_{n+1} \approx t_n + (1+\epsilon_0)^{-\frac{5}{2} n}} or so, most of the energy of {X_n} should have drained into the {X_{n+1}} mode (with hardly any energy dissipated). Since the series {\sum_{n \geq 1} (1+\epsilon_0)^{-\frac{5}{2} n}} is summable, this suggests finite time blowup for this ODE as the energy races ever more quickly to higher and higher modes. Such a scenario was indeed established by Katz and Pavlovic (and refined by Cheskidov) if the dissipation strength {(1+\epsilon_0)^{2n}} was weakened somewhat (the exponent {2} has to be lowered to be less than {\frac{5}{3}}). As mentioned above, this is enough to give a version of Theorem 1 in five and higher dimensions.
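The summability used in this heuristic is just a geometric series with ratio {(1+\epsilon_0)^{-5/2}}. A small Python sketch (the function names are illustrative, and the "time" here is only the heuristic sum of transfer times, not a rigorous blowup time):

```python
def cascade_time_partial_sum(eps0, terms):
    """Partial sum of sum_{n>=1} (1+eps0)^(-5n/2): heuristic time for the first `terms` cascades."""
    r = (1.0 + eps0) ** (-2.5)
    return sum(r ** n for n in range(1, terms + 1))


def cascade_time_total(eps0):
    """Closed form r / (1 - r) of the full geometric series, with r = (1+eps0)^(-5/2)."""
    r = (1.0 + eps0) ** (-2.5)
    return r / (1.0 - r)
```

The partial sums converge to the closed form for every {\epsilon_0 > 0}, though the limit grows as {\epsilon_0} shrinks, since the ratio then approaches {1}.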

On the other hand, it was shown a few years ago by Barbato, Morandin, and Romito that (3) in fact admits global smooth solutions (at least in the dyadic case {\epsilon_0=1}, and assuming non-negative initial data). Roughly speaking, the problem is that as energy is being transferred from {X_n} to {X_{n+1}}, energy is also simultaneously being transferred from {X_{n+1}} to {X_{n+2}}, and as such the solution races off to higher modes a bit too prematurely, without absorbing all of the energy from lower modes. This weakens the strength of the blowup to the point where the moderately strong dissipation in (3) is enough to kill the high frequency cascade before a true singularity occurs. Because of this, the original Katz-Pavlovic model cannot quite be used to establish Theorem 1 in three dimensions. (Actually, the original Katz-Pavlovic model had some additional dispersive features which allowed for another proof of global smooth solutions, which is an unpublished result of Nazarov.)

To get around this, I had to “engineer” an ODE system with similar features to (3) (namely, a quadratic nonlinearity, a monotone total energy, and the indicated exponents of {(1+\epsilon_0)} for both the dissipation term and the quadratic terms), but for which the cascade of energy from scale {n} to scale {n+1} was not interrupted by the cascade of energy from scale {n+1} to scale {n+2}. To do this, I needed to insert a delay in the cascade process (so that after energy was dumped into scale {n}, it would take some time before the energy would start to transfer to scale {n+1}), but the process also needed to be abrupt (once the process of energy transfer started, it needed to conclude very quickly, before the delayed transfer for the next scale kicked in). It turned out that one could build a “quadratic circuit” out of some basic “quadratic gates” (analogous to how an electrical circuit could be built out of basic gates such as amplifiers or resistors) that achieved this task, leading to an ODE system essentially of the form

\displaystyle \partial_t X_{1,n} = - (1+\epsilon_0)^{2n} X_{1,n}

\displaystyle  + (1+\epsilon_0)^{5n/2} (- \epsilon^{-2} X_{3,n} X_{4,n} - \epsilon X_{1,n} X_{2,n} - \epsilon^2 \exp(-K^{10}) X_{1,n} X_{3,n}

\displaystyle  + K X_{4,n-1}^2)

\displaystyle  \partial_t X_{2,n} = - (1+\epsilon_0)^{2n} X_{2,n} + (1+\epsilon_0)^{5n/2} (\epsilon X_{1,n}^2 - \epsilon^{-1} K^{10} X_{3,n}^2)

\displaystyle  \partial_t X_{3,n} = - (1+\epsilon_0)^{2n} X_{3,n} + (1+\epsilon_0)^{5n/2} (\epsilon^2 \exp(-K^{10}) X_{1,n}^2

\displaystyle + \epsilon^{-1} K^{10} X_{2,n} X_{3,n} )

\displaystyle  \partial_t X_{4,n} =- (1+\epsilon_0)^{2n} X_{4,n} + (1+\epsilon_0)^{5n/2} (\epsilon^{-2} X_{3,n} X_{1,n}

\displaystyle - (1+\epsilon_0)^{5/2} K X_{4,n} X_{1,n+1})

where {K \geq 1} is a suitable large parameter and {\epsilon > 0} is a suitable small parameter (much smaller than {1/K}). To visualise the dynamics of such a system, I found it useful to describe this system graphically by a “circuit diagram” that is analogous (but not identical) to the circuit diagrams arising in electrical engineering:

[Figure “circuit-1”: circuit diagram of the ODE system above]
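One feature of the four-mode system that is easy to check by machine is that its quadratic terms conserve the total energy {\sum_n E_n}, so that only the dissipation terms change it. The Python sketch below evaluates just the quadratic part. The truncation to finitely many scales (with {X_{4,-1}} and {X_{1,N}} set to zero) and the toy parameter values are assumptions made here to keep the floating point arithmetic tame; the actual construction needs {K} large and {\epsilon} tiny, but the cancellation is algebraic and does not depend on the values.

```python
import math


def circuit_nonlinearity(X, eps0=0.5, eps=0.1, K=1.2):
    """Quadratic part of the four-mode system; X[n] = (X1, X2, X3, X4) at scale n.

    Assumptions (not from the text): scales n = 0..N-1 only, with
    X_{4,-1} = 0 and X_{1,N} = 0, and toy values of eps0, eps, K.
    """
    lam = 1.0 + eps0
    tiny = eps ** 2 * math.exp(-K ** 10)
    N = len(X)
    out = []
    for n in range(N):
        x1, x2, x3, x4 = X[n]
        x4_prev = X[n - 1][3] if n > 0 else 0.0
        x1_next = X[n + 1][0] if n < N - 1 else 0.0
        s = lam ** (2.5 * n)
        d1 = s * (-eps ** -2 * x3 * x4 - eps * x1 * x2 - tiny * x1 * x3 + K * x4_prev ** 2)
        d2 = s * (eps * x1 ** 2 - eps ** -1 * K ** 10 * x3 ** 2)
        d3 = s * (tiny * x1 ** 2 + eps ** -1 * K ** 10 * x2 * x3)
        d4 = s * (eps ** -2 * x3 * x1 - lam ** 2.5 * K * x4 * x1_next)
        out.append((d1, d2, d3, d4))
    return out


def nonlinear_energy_rate(X, **params):
    """Return (sum of X.dX over all modes, sum of |X.dX|) for the quadratic part."""
    total = 0.0
    scale = 0.0
    for modes, rates in zip(X, circuit_nonlinearity(X, **params)):
        for x, dx in zip(modes, rates):
            total += x * dx
            scale += abs(x * dx)
    return total, scale
```

The first returned value should vanish up to rounding error relative to the second, for any choice of state.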

The coupling constants here range widely from being very large to very small; in practice, this makes the {X_{2,n}} and {X_{3,n}} modes absorb very little energy, but exert a sizeable influence on the remaining modes. If a lot of energy is suddenly dumped into {X_{1,n}}, what happens next is roughly as follows: for a moderate period of time, nothing much happens other than a trickle of energy into {X_{2,n}}, which in turn causes a rapid exponential growth of {X_{3,n}} (from a very low base). After this delay, {X_{3,n}} suddenly crosses a certain threshold, at which point it causes {X_{1,n}} and {X_{4,n}} to exchange energy back and forth with extreme speed. The energy from {X_{4,n}} then rapidly drains into {X_{1,n+1}}, and the process begins again (with a slight loss in energy due to the dissipation). If one plots the total energy {E_n := \frac{1}{2} ( X_{1,n}^2 + X_{2,n}^2 + X_{3,n}^2 + X_{4,n}^2 )} as a function of time, it looks schematically like this:

[Figure “energy-blowup”: schematic plot of the energies {E_n} against time]

As in the previous heuristic discussion, the time between cascades from one frequency scale to the next decays exponentially, leading to blowup at some finite time {T}. (One could describe the dynamics here as being similar to the famous “lighting the beacons” scene in the Lord of the Rings movies, except that (a) as each beacon gets ignited, the previous one is extinguished, as per the energy identity; (b) the time between beacon lightings decreases exponentially; and (c) there is no soundtrack.)

There is a real (but remote) possibility that this sort of construction can be adapted to the true Navier-Stokes equations. The basic blowup mechanism in the averaged equation is that of a von Neumann machine, or more precisely a construct (built within the laws of the inviscid evolution {\partial_t u = \tilde B(u,u)}) that, after some time delay, manages to suddenly create a replica of itself at a finer scale (and to largely erase its original instantiation in the process). In principle, such a von Neumann machine could also be built out of the laws of the inviscid form of the Navier-Stokes equations (i.e. the Euler equations). In physical terms, one would have to build the machine purely out of an ideal fluid (i.e. an inviscid incompressible fluid). If one could somehow create enough “logic gates” out of ideal fluid, one could presumably build a sort of “fluid computer”, at which point the task of building a von Neumann machine appears to reduce to a software engineering exercise rather than a PDE problem (providing that the gates are suitably stable with respect to perturbations, but (as with actual computers) this can presumably be done by converting the analog signals of fluid mechanics into a more error-resistant digital form). The key thing missing in this program (in both senses of the word) to establish blowup for Navier-Stokes is to construct the logic gates within the laws of ideal fluids. (Compare with the situation for cellular automata such as Conway’s “Game of Life“, in which Turing complete computers, universal constructors, and replicators have all been built within the laws of that game.)

The purpose of this post is to link to a short unpublished note of mine that I wrote back in 2010 but forgot to put on my web page at the time. Entitled “A physical space proof of the bilinear Strichartz and local smoothing estimates for the Schrodinger equation“, it gives a proof of two standard estimates for the free (linear) Schrödinger equation in flat Euclidean space, namely the bilinear Strichartz estimate and the local smoothing estimate, using primarily “physical space” methods such as integration by parts, instead of “frequency space” methods based on the Fourier transform, although a small amount of Fourier analysis (basically sectoral projection to make the Schrödinger waves move roughly in a given direction) is still needed. This is somewhat in the spirit of an older paper of mine with Klainerman and Rodnianski doing something similar for the wave equation, and is also very similar to a paper of Planchon and Vega from 2009. The hope was that by avoiding the finer properties of the Fourier transform, one could obtain a more robust argument which could also extend to nonlinear, non-free, or non-flat situations. These notes were cited once or twice by some people that I had privately circulated them to, so I decided to put them online here for reference.

UPDATE, July 24: Fabrice Planchon has kindly supplied another note in which he gives a particularly simple proof of local smoothing in one dimension, and discusses some other variants of the method (related to the paper of Planchon and Vega cited earlier).

Consider the free Schrödinger equation in {d} spatial dimensions, which I will normalise as

\displaystyle  i u_t + \frac{1}{2} \Delta_{{\bf R}^d} u = 0 \ \ \ \ \ (1)

where {u: {\bf R} \times {\bf R}^d \rightarrow {\bf C}} is the unknown field and {\Delta_{{\bf R}^d} = \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}} is the spatial Laplacian. To avoid irrelevant technical issues I will restrict attention to smooth (classical) solutions to this equation, and will work locally in spacetime avoiding issues of decay at infinity (or at other singularities); I will also avoid issues involving branch cuts of functions such as {t^{d/2}} (if one wishes, one can restrict {d} to be even in order to safely ignore all branch cut issues). The space of solutions to (1) enjoys a number of symmetries. A particularly non-obvious symmetry is the pseudoconformal symmetry: if {u} solves (1), then the pseudoconformal solution {pc(u): {\bf R} \times {\bf R}^d \rightarrow {\bf C}} defined by

\displaystyle  pc(u)(t,x) := \frac{1}{(it)^{d/2}} \overline{u(\frac{1}{t}, \frac{x}{t})} e^{i|x|^2/2t} \ \ \ \ \ (2)

for {t \neq 0} can be seen after some computation to also solve (1). (If {u} has suitable decay at spatial infinity and one chooses a suitable branch cut for {(it)^{d/2}}, one can extend {pc(u)} continuously to the {t=0} spatial slice, whereupon it becomes essentially the spatial Fourier transform of {u(0,\cdot)}, but we will not need this fact for the current discussion.)
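The “some computation” can be spot-checked numerically. The following Python sketch works in {d=2} (so that {(it)^{d/2} = it} and no branch cut arises), takes an explicit Gaussian solution of (1) with an arbitrarily chosen centre, applies (2), and measures the residual of (1) by central finite differences. The choice of test solution, the sample point and the step size are all assumptions made for illustration.

```python
import cmath


def shifted_gaussian(t, x, y, a=0.7, b=-0.3):
    """Explicit solution of (1) in d = 2: Gaussian data centred at (a, b).

    The centre (a, b) is an arbitrary illustrative choice.
    """
    s = 1 + 1j * t
    r2 = (x - a) ** 2 + (y - b) ** 2
    return cmath.exp(-r2 / (2 * s)) / s


def pc(u):
    """Pseudoconformal transform (2) in d = 2, where (it)^(d/2) = it."""
    def v(t, x, y):
        phase = cmath.exp(1j * (x * x + y * y) / (2 * t))
        return u(1 / t, x / t, y / t).conjugate() * phase / (1j * t)
    return v


def schrodinger_residual(v, t, x, y, h=1e-3):
    """Central-difference approximation of i v_t + (1/2) Laplacian(v) at one point."""
    v_t = (v(t + h, x, y) - v(t - h, x, y)) / (2 * h)
    lap = (v(t, x + h, y) + v(t, x - h, y)
           + v(t, x, y + h) + v(t, x, y - h) - 4 * v(t, x, y)) / h ** 2
    return 1j * v_t + 0.5 * lap
```

Evaluating `schrodinger_residual(pc(shifted_gaussian), 1.3, 0.4, -0.2)` should give a value at the level of the discretisation error. As a side remark, with the centre moved to the origin this particular Gaussian is left unchanged by {pc} in {d=2}, which gives a second, exact check of the formula.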

An analogous symmetry exists for the free wave equation in {d+1} spatial dimensions, which I will write as

\displaystyle  u_{tt} - \Delta_{{\bf R}^{d+1}} u = 0 \ \ \ \ \ (3)

where {u: {\bf R} \times {\bf R}^{d+1} \rightarrow {\bf C}} is the unknown field. In analogy to pseudoconformal symmetry, we have conformal symmetry: if {u: {\bf R} \times {\bf R}^{d+1} \rightarrow {\bf C}} solves (3), then the function {conf(u): {\bf R} \times {\bf R}^{d+1} \rightarrow {\bf C}}, defined in the interior {\{ (t,x): |x| < |t| \}} of the light cone by the formula

\displaystyle  conf(u)(t,x) := (t^2-|x|^2)^{-d/2} u( \frac{t}{t^2-|x|^2}, \frac{x}{t^2-|x|^2} ), \ \ \ \ \ (4)

also solves (3).

There are also some direct links between the Schrödinger equation in {d} dimensions and the wave equation in {d+1} dimensions. This can be easily seen on the spacetime Fourier side: solutions to (1) have spacetime Fourier transform (formally) supported on a {d}-dimensional paraboloid, while solutions to (3) have spacetime Fourier transform formally supported on a {d+1}-dimensional cone. To link the two, one then observes that the {d}-dimensional paraboloid can be viewed as a conic section (i.e. hyperplane slice) of the {d+1}-dimensional cone. In physical space, this link is manifested as follows: if {u: {\bf R} \times {\bf R}^d \rightarrow {\bf C}} solves (1), then the function {\iota_{1}(u): {\bf R} \times {\bf R}^{d+1} \rightarrow {\bf C}} defined by

\displaystyle  \iota_{1}(u)(t,x_1,\ldots,x_{d+1}) := e^{-i(t+x_{d+1})} u( \frac{t-x_{d+1}}{2}, x_1,\ldots,x_d)

solves (3). More generally, for any non-zero scaling parameter {\lambda}, the function {\iota_{\lambda}(u): {\bf R} \times {\bf R}^{d+1} \rightarrow {\bf C}} defined by

\displaystyle  \iota_{\lambda}(u)(t,x_1,\ldots,x_{d+1}) :=

\displaystyle  \lambda^{d/2} e^{-i\lambda(t+x_{d+1})} u( \lambda \frac{t-x_{d+1}}{2}, \lambda x_1,\ldots,\lambda x_d) \ \ \ \ \ (5)

solves (3).
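The embedding (5) can likewise be spot-checked numerically. The Python sketch below takes {d=2}, uses an explicit Gaussian solution of (1) as the input (an illustrative choice), and measures the residual of the wave equation (3) in three spatial dimensions by central finite differences. The sample point, the value of {\lambda} and the step size are assumptions.

```python
import cmath


def gaussian(s, x1, x2):
    """Explicit solution of (1) in d = 2 (illustrative test input)."""
    a = 1 + 1j * s
    return cmath.exp(-(x1 * x1 + x2 * x2) / (2 * a)) / a


def iota(u, lam=1.0):
    """Embedding (5) for d = 2: maps a solution of (1) to a function on R x R^3."""
    def w(t, x1, x2, x3):
        phase = cmath.exp(-1j * lam * (t + x3))
        # lam ** (d / 2) with d = 2 is just lam
        return lam * phase * u(lam * (t - x3) / 2, lam * x1, lam * x2)
    return w


def wave_residual(w, t, x1, x2, x3, h=1e-3):
    """Central-difference approximation of w_tt - Laplacian(w) at one point."""
    c = w(t, x1, x2, x3)
    w_tt = (w(t + h, x1, x2, x3) + w(t - h, x1, x2, x3) - 2 * c) / h ** 2
    lap = (w(t, x1 + h, x2, x3) + w(t, x1 - h, x2, x3)
           + w(t, x1, x2 + h, x3) + w(t, x1, x2 - h, x3)
           + w(t, x1, x2, x3 + h) + w(t, x1, x2, x3 - h) - 6 * c) / h ** 2
    return w_tt - lap
```

For instance `wave_residual(iota(gaussian, 1.5), 0.3, 0.2, -0.4, 0.1)` should be small compared with the size of the individual second derivatives, which are of order one.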

As an “extra challenge” posed in an exercise in one of my books (Exercise 2.28, to be precise), I asked the reader to use the embeddings {\iota_1} (or more generally {\iota_\lambda}) to explicitly connect together the pseudoconformal transformation {pc} and the conformal transformation {conf}. It turns out that this connection is a little bit unusual, with the “obvious” guess (namely, that the embeddings {\iota_\lambda} intertwine {pc} and {conf}) being incorrect, and as such this particular task was perhaps too difficult even for a challenge question. I’ve been asked a couple times to provide the connection more explicitly, so I will do so below the fold.


[These are notes intended mostly for myself, as these topics are useful in random matrix theory, but may be of interest to some readers also. -T.]

One of the most fundamental partial differential equations in mathematics is the heat equation

\displaystyle  \partial_t f = L f \ \ \ \ \ (1)

where {f: [0,+\infty) \times {\bf R}^n \rightarrow {\bf R}} is a scalar function {(t,x) \mapsto f(t,x)} of both time and space, and {L} is the Laplacian {L := \frac{1}{2} \Delta = \frac{1}{2} \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}}. For the purposes of this post, we will ignore all technical issues of regularity and decay, and always assume that the solutions to equations such as (1) have all the regularity and decay in order to justify all formal operations such as the chain rule, integration by parts, or differentiation under the integral sign. The factor of {\frac{1}{2}} in the definition of the heat propagator {L} is of course an arbitrary normalisation, chosen for some minor technical reasons; one can certainly continue the discussion below with other choices of normalisations if desired.

In probability theory, this equation takes on particular significance when {f} is restricted to be non-negative, and furthermore to be a probability measure at each time, in the sense that

\displaystyle  \int_{{\bf R}^n} f(t,x)\ dx = 1

for all {t}. (Actually, it suffices to verify this constraint at time {t=0}, as the heat equation (1) will then preserve this constraint.) Indeed, in this case, one can interpret {f(t,x)\ dx} as the probability distribution of a Brownian motion

\displaystyle  dx = dB(t) \ \ \ \ \ (2)

where {x = x(t) \in {\bf R}^n} is a stochastic process with initial probability distribution {f(0,x)\ dx}; see for instance this previous blog post for more discussion.

A model example of a solution to the heat equation to keep in mind is that of the fundamental solution

\displaystyle  G(t,x) = \frac{1}{(2\pi t)^{n/2}} e^{-|x|^2/2t} \ \ \ \ \ (3)

defined for any {t>0}, which represents the distribution of Brownian motion of a particle starting at the origin {x=0} at time {t=0}. At time {t}, {G(t,x)} represents an {{\bf R}^n}-valued random variable, each coefficient of which is an independent random variable of mean zero and variance {t}. (As {t \rightarrow 0^+}, {G(t)} converges in the sense of distributions to a Dirac mass at the origin.)

The heat equation can also be viewed as the gradient flow for the Dirichlet form

\displaystyle  D(f,g) := \frac{1}{2} \int_{{\bf R}^n} \nabla f \cdot \nabla g\ dx \ \ \ \ \ (4)

since one has the integration by parts identity

\displaystyle  \int_{{\bf R}^n} Lf(x) g(x)\ dx = \int_{{\bf R}^n} f(x) Lg(x)\ dx = - D(f,g) \ \ \ \ \ (5)

for all smooth, rapidly decreasing {f,g}, which formally implies that {L f} is (half of) the negative gradient of the Dirichlet energy {D(f,f) = \frac{1}{2} \int_{{\bf R}^n} |\nabla f|^2\ dx} with respect to the {L^2({\bf R}^n,dx)} inner product. Among other things, this implies that the Dirichlet energy decreases in time:

\displaystyle  \partial_t D(f,f) = - 2 \int_{{\bf R}^n} |Lf|^2\ dx. \ \ \ \ \ (6)

For instance, for the fundamental solution (3), one can verify for any time {t>0} that

\displaystyle  D(G,G) = \frac{n}{2^{n+2} \pi^{n/2}} \frac{1}{t^{(n+2)/2}} \ \ \ \ \ (7)

(assuming I have not made a mistake in the calculation). In a similar spirit we have

\displaystyle  \partial_t \int_{{\bf R}^n} |f|^2\ dx = - 2 D(f,f). \ \ \ \ \ (8)

Since {D(f,f)} is non-negative, the formula (6) implies that {\int_{{\bf R}^n} |Lf|^2\ dx} is integrable in time, and in particular we see that {Lf} converges to zero as {t \rightarrow \infty}, in some averaged {L^2} sense at least; similarly, (8) suggests that {D(f,f)} also converges to zero. This suggests that {f} converges to a constant function; but as {f} is also supposed to decay to zero at spatial infinity, we thus expect solutions to the heat equation in {{\bf R}^n} to decay to zero in some sense as {t \rightarrow \infty}. However, the decay is only expected to be polynomial in nature rather than exponential; for instance, the solution (3) decays in the {L^\infty} norm like {O(t^{-n/2})}.
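The closed form (7) is easy to sanity-check by quadrature. Here is a minimal Python sketch in dimension {n=1}, using the trapezoid rule on a window of twelve standard deviations; the window, the number of steps and the function names are choices made for illustration.

```python
import math


def heat_kernel_1d(t, x):
    """Fundamental solution (3) in dimension n = 1."""
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)


def dirichlet_energy_numeric(t, half_width=12.0, steps=20000):
    """Trapezoid-rule value of D(G,G) = (1/2) * integral of |dG/dx|^2 for n = 1."""
    L = half_width * math.sqrt(t)
    h = 2 * L / steps
    total = 0.0
    for k in range(steps + 1):
        x = -L + k * h
        grad = -x / t * heat_kernel_1d(t, x)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * grad * grad
    return 0.5 * total * h


def dirichlet_energy_formula(t, n=1):
    """Right-hand side of (7)."""
    return n / (2 ** (n + 2) * math.pi ** (n / 2) * t ** ((n + 2) / 2))
```

Comparing `dirichlet_energy_numeric(t)` with `dirichlet_energy_formula(t)` for a few values of {t} shows agreement to many digits in this one-dimensional case.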

Since {L1=0}, we also observe the basic cancellation property

\displaystyle  \int_{{\bf R}^n} Lf(x) \ dx = 0 \ \ \ \ \ (9)

for any function {f}.

There are other quantities relating to {f} that also decrease in time under heat flow, particularly in the important case when {f} is a probability measure. In this case, it is natural to introduce the entropy

\displaystyle  S(f) := \int_{{\bf R}^n} f(x) \log f(x)\ dx.

Thus, for instance, if {f(x)\ dx} is the uniform distribution on some measurable subset {E} of {{\bf R}^n} of finite measure {|E|}, the entropy would be {-\log |E|}. Intuitively, as the entropy decreases, the probability distribution gets wider and flatter. For instance, in the case of the fundamental solution (3), one has {S(G) = -\frac{n}{2} \log( 2 \pi e t )} for any {t>0}, reflecting the fact that {G(t)} is approximately uniformly distributed on a ball of radius {O(\sqrt{t})} (and thus of measure {O(t^{n/2})}).

A short formal computation shows (if one assumes for simplicity that {f} is strictly positive, which is not an unreasonable hypothesis, particularly in view of the strong maximum principle) using (9), (5) that

\displaystyle  \partial_t S(f) = \int_{{\bf R}^n} (Lf) \log f + f \frac{Lf}{f}\ dx

\displaystyle  = \int_{{\bf R}^n} (Lf) \log f\ dx

\displaystyle  = - D( f, \log f )

\displaystyle  = - \frac{1}{2} \int_{{\bf R}^n} \frac{|\nabla f|^2}{f}\ dx

\displaystyle  = - 4D( g, g )

where {g := \sqrt{f}} is the square root of {f}. For instance, if {f} is the fundamental solution (3), one can check that {D(g,g) = \frac{n}{8t}} (note that this is a significantly cleaner formula than (7)!).

In particular, the entropy is decreasing, which corresponds well to one’s intuition that the heat equation (or Brownian motion) should serve to spread out a probability distribution over time.
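For the fundamental solution, all three ingredients of this computation have closed forms, so the identity {\partial_t S(f) = -4 D(g,g)} can be checked numerically. The Python sketch below does this in dimension {n=1}, with the trapezoid rule for the integrals and a central difference in time; the integration window, step sizes and function names are illustrative assumptions.

```python
import math


def trapezoid(func, lo, hi, steps=20000):
    """Composite trapezoid rule for func on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def entropy_numeric(t):
    """S(G) = integral of G log G for the n = 1 fundamental solution at time t."""
    def integrand(x):
        log_G = -x * x / (2 * t) - 0.5 * math.log(2 * math.pi * t)
        return math.exp(log_G) * log_G
    L = 12 * math.sqrt(t)
    return trapezoid(integrand, -L, L)


def root_energy_numeric(t):
    """D(g,g) for g = sqrt(G), using |g'|^2 = x^2 G / (4 t^2)."""
    def integrand(x):
        G = math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)
        return x * x / (4 * t * t) * G
    L = 12 * math.sqrt(t)
    return 0.5 * trapezoid(integrand, -L, L)


def entropy_rate_numeric(t, dt=1e-4):
    """Central-difference approximation of the time derivative of S(G)."""
    return (entropy_numeric(t + dt) - entropy_numeric(t - dt)) / (2 * dt)
```

With {n=1} one expects `entropy_numeric(t)` to match {-\frac{1}{2} \log(2\pi e t)}, `root_energy_numeric(t)` to match {\frac{1}{8t}}, and `entropy_rate_numeric(t)` to match {-4} times the latter.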

Actually, one can say more: the rate of decrease {4D(g,g)} of the entropy is itself decreasing, or in other words the entropy is convex. I do not have a satisfactorily intuitive reason for this phenomenon, but it can be proved by straightforward application of basic several variable calculus tools (such as the chain rule, product rule, quotient rule, and integration by parts), and completing the square. Namely, by using the chain rule we have

\displaystyle  L \phi(f) = \phi'(f) Lf + \frac{1}{2} \phi''(f) |\nabla f|^2, \ \ \ \ \ (10)

valid for any smooth function {\phi: {\bf R} \rightarrow {\bf R}}. Applying this to {g} with {\phi(s) := s^2} (so that {\phi(g) = f}), we see from (1) that

\displaystyle  2 g \partial_t g = 2 g L g + |\nabla g|^2

and thus (again assuming that {f}, and hence {g}, is strictly positive to avoid technicalities)

\displaystyle  \partial_t g = Lg + \frac{|\nabla g|^2}{2g}.

We thus have

\displaystyle  \partial_t D(g,g) = 2 D(g,Lg) + D(g, \frac{|\nabla g|^2}{g} ).

It is now convenient to compute using the Einstein summation convention to hide the summation over indices {i,j = 1,\ldots,n}. We have

\displaystyle  2 D(g,Lg) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) (\partial_i \partial_j \partial_j g)\ dx

and

\displaystyle  D(g, \frac{|\nabla g|^2}{g} ) = \frac{1}{2} \int_{{\bf R}^n} (\partial_i g) \partial_i \frac{\partial_j g \partial_j g}{g}\ dx.

By integration by parts and interchanging partial derivatives, we may write the first integral as

\displaystyle  2 D(g,Lg) = - \frac{1}{2} \int_{{\bf R}^n} (\partial_i \partial_j g) (\partial_i \partial_j g)\ dx,

and from the quotient and product rules, we may write the second integral as

\displaystyle  D(g, \frac{|\nabla g|^2}{g} ) = \int_{{\bf R}^n} \frac{(\partial_i g) (\partial_j g) (\partial_i \partial_j g)}{g} - \frac{(\partial_i g) (\partial_j g) (\partial_i g) (\partial_j g)}{2g^2}\ dx.

Gathering terms, completing the square, and making the summations explicit again, we see that

\displaystyle  \partial_t D(g,g) =- \frac{1}{2} \int_{{\bf R}^n} \frac{\sum_{i=1}^n \sum_{j=1}^n |g \partial_i \partial_j g - (\partial_i g) (\partial_j g)|^2}{g^2}\ dx

and so in particular {D(g,g)} is always decreasing.

The above identity can also be written as

\displaystyle  \partial_t D(g,g) = - \frac{1}{2} \int_{{\bf R}^n} |\nabla^2 \log g|^2 g^2\ dx.
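One can test this identity numerically on a solution that is not a single gaussian. The sketch below is only an illustration under assumptions: it works in one dimension, takes {L = \frac{1}{2} \frac{d^2}{dx^2}} and {D(g,g) = \frac{1}{2} \int |g'|^2\ dx}, and uses a hypothetical initial datum (an equal mixture of two unit-variance gaussians centred at {\pm 1.5}), for which the heat flow is explicit. The left-hand side is approximated by a central difference in time, and the right-hand side by the trapezoid rule.

```python
import math

def normal_pdf_derivs(x, mean, var):
    """Return (N, N', N'') for the normal density with the given mean and variance."""
    z = x - mean
    n0 = math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    n1 = -z / var * n0
    n2 = (z * z / (var * var) - 1.0 / var) * n0
    return n0, n1, n2

def mixture_derivs(x, t, a=1.5):
    """f(t,x): average of two gaussians at +-a with variance 1 + t (solves f_t = f''/2)."""
    lo = normal_pdf_derivs(x, -a, 1.0 + t)
    hi = normal_pdf_derivs(x, a, 1.0 + t)
    return tuple(0.5 * (p + q) for p, q in zip(lo, hi))

def integrate(density, t, half_width=20.0, steps=4000):
    """Trapezoid rule on [-half_width, half_width]; the integrands decay rapidly."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * density(-half_width + k * h, t)
    return total * h

def energy_density(x, t):
    f, f1, _ = mixture_derivs(x, t)
    return f1 * f1 / (8.0 * f)                       # (1/2)|g'|^2 with g = sqrt(f)

def hessian_density(x, t):
    f, f1, f2 = mixture_derivs(x, t)
    return (f * f2 - f1 * f1) ** 2 / (8.0 * f ** 3)  # (1/2)|(log g)''|^2 g^2

def check_identity(t=0.7, eps=1e-3):
    """Return (d/dt D(g,g), -(1/2) int |(log g)''|^2 g^2 dx) at time t."""
    lhs = (integrate(energy_density, t + eps) - integrate(energy_density, t - eps)) / (2.0 * eps)
    rhs = -integrate(hessian_density, t)
    return lhs, rhs

print(check_identity())
```

The two returned numbers should agree to several digits, and both are negative, in line with the monotonicity of {D(g,g)}.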

Exercise 1 Give an alternate proof of the above identity by writing {f = e^{2u}}, {g = e^u} and deriving the equation {\partial_t u = Lu + |\nabla u|^2} for {u}.

It was observed in a well known paper of Bakry and Emery that the above monotonicity properties hold for a much larger class of heat flow-type equations, and lead to a number of important relations between energy and entropy, such as the log-Sobolev inequality of Gross and of Federbush, and the hypercontractivity inequality of Nelson; we will discuss one such family of generalisations (or more precisely, variants) below the fold.


Lars Hörmander, who made fundamental contributions to all areas of partial differential equations, but particularly in developing the analysis of variable-coefficient linear PDE, died last Sunday, aged 81.

I unfortunately never met Hörmander personally, but of course I encountered his work all the time while working in PDE. One of his major contributions to the subject was to systematically develop the calculus of Fourier integral operators (FIOs), which are a substantial generalisation of pseudodifferential operators and which can be used to (approximately) solve linear partial differential equations, or to transform such equations into a more convenient form. Roughly speaking, Fourier integral operators are to linear PDE as canonical transformations are to Hamiltonian mechanics (and one can in fact view FIOs as a quantisation of a canonical transformation). They are a large class of transformations, for instance the Fourier transform, pseudodifferential operators, and smooth changes of the spatial variable are all examples of FIOs, and (as long as certain singular situations are avoided) the composition of two FIOs is again an FIO.

The full theory of FIOs is quite extensive, occupying the entire final volume of Hörmander’s famous four-volume series “The Analysis of Linear Partial Differential Operators”. I am certainly not going to attempt to summarise it here, but I thought I would try to motivate how these operators arise when trying to transform functions. For simplicity we will work with functions {f \in L^2({\bf R}^n)} on a Euclidean domain {{\bf R}^n} (although FIOs can certainly be defined on more general smooth manifolds, and there is an extension of the theory that also works on manifolds with boundary). As this will be a heuristic discussion, we will ignore all the (technical, but important) issues of smoothness or convergence with regards to the functions, integrals and limits that appear below, and be rather vague with terms such as “decaying” or “concentrated”.

A function {f \in L^2({\bf R}^n)} can be viewed from many different perspectives (reflecting the variety of bases, or approximate bases, that the Hilbert space {L^2({\bf R}^n)} offers). Most directly, we have the physical space perspective, viewing {f} as a function {x \mapsto f(x)} of the physical variable {x \in {\bf R}^n}. In many cases, this function will be concentrated in some subregion {\Omega} of physical space. For instance, a gaussian wave packet

\displaystyle  f(x) = A e^{-(x-x_0)^2/\hbar} e^{i \xi_0 \cdot x/\hbar}, \ \ \ \ \ (1)

where {\hbar > 0}, {A \in {\bf C}} and {x_0, \xi_0 \in {\bf R}^n} are parameters, would be physically concentrated in the ball {B(x_0,\sqrt{\hbar})}. Then we have the frequency space (or momentum space) perspective, viewing {f} now as a function {\xi \mapsto \hat f(\xi)} of the frequency variable {\xi \in {\bf R}^n}. For this discussion, it will be convenient to normalise the Fourier transform using a small constant {\hbar > 0} (which has the physical interpretation of Planck’s constant if one is doing quantum mechanics), thus

\displaystyle  \hat f(\xi) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{-i\xi \cdot x/\hbar} f(x)\ dx.

For instance, for the gaussian wave packet (1), one has

\displaystyle  \hat f(\xi) = 2^{-n/2} A e^{i\xi_0 \cdot x_0/\hbar} e^{-(\xi-\xi_0)^2/4\hbar} e^{-i \xi \cdot x_0/\hbar},

and so we see that {f} is concentrated in frequency space in the ball {B(\xi_0,\sqrt{\hbar})}.
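The frequency concentration can be checked numerically in one dimension. The following is a rough sketch, with hypothetical parameter values ({\hbar = 0.1}, {x_0 = 0.3}, {\xi_0 = 0.7}, {A = 1 + 0.5i}) chosen only for illustration. It compares a trapezoid-rule evaluation of the Fourier transform with the closed form obtained by carrying out the gaussian integral exactly; with the normalisations above, that integral contributes a constant {2^{-n/2}} and a factor of {4\hbar} in the exponent, neither of which affects the heuristic that {\hat f} lives at scale {O(\sqrt{\hbar})} around {\xi_0}.

```python
import cmath
import math

HBAR = 0.1  # hypothetical value of the small parameter

def wave_packet(x, amp, x0, xi0, hbar=HBAR):
    """Gaussian wave packet A exp(-(x-x0)^2/hbar) exp(i xi0 x/hbar) in one dimension."""
    return amp * math.exp(-(x - x0) ** 2 / hbar) * cmath.exp(1j * xi0 * x / hbar)

def fourier_numeric(xi, amp, x0, xi0, hbar=HBAR, steps=4000):
    """Trapezoid approximation of (2 pi hbar)^{-1/2} int exp(-i xi x/hbar) f(x) dx."""
    half_width = 10.0 * math.sqrt(hbar)      # the packet is negligible beyond this
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = x0 - half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * xi * x / hbar) * wave_packet(x, amp, x0, xi0, hbar)
    return total * h / math.sqrt(2.0 * math.pi * hbar)

def fourier_closed_form(xi, amp, x0, xi0, hbar=HBAR):
    """Exact value of the same gaussian integral (n = 1)."""
    return (amp / math.sqrt(2.0)
            * cmath.exp(1j * xi0 * x0 / hbar)
            * math.exp(-(xi - xi0) ** 2 / (4.0 * hbar))
            * cmath.exp(-1j * xi * x0 / hbar))

for xi in (0.2, 0.7, 1.3):
    print(xi, fourier_numeric(xi, 1 + 0.5j, 0.3, 0.7), fourier_closed_form(xi, 1 + 0.5j, 0.3, 0.7))
```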

However, there is a third (but less rigorous) way to view a function {f} in {L^2({\bf R}^n)}, which is the phase space perspective in which one tries to view {f} as distributed simultaneously in physical space and in frequency space, thus being something like a measure on the phase space {T^* {\bf R}^n := \{ (x,\xi): x, \xi \in {\bf R}^n\}}. Thus, for instance, the function (1) should heuristically be concentrated on the region {B(x_0,\sqrt{\hbar}) \times B(\xi_0,\sqrt{\hbar})} in phase space. Unfortunately, due to the uncertainty principle, there is no completely satisfactory way to canonically and rigorously define what the “phase space portrait” of a function {f} should be. (For instance, the Wigner transform of {f} can be viewed as an attempt to describe the distribution of the {L^2} energy of {f} in phase space, except that this transform can take negative or even complex values; see Folland’s book for further discussion.) Still, it is a very useful heuristic to think of functions as having a phase space portrait, which is something like a non-negative measure on phase space that captures the distribution of functions in both space and frequency, albeit with some “quantum fuzziness” that shows up whenever one tries to inspect this measure at scales of physical space and frequency space that together violate the uncertainty principle. (The score of a piece of music is a good everyday example of a phase space portrait of a function, in this case a sound wave; here, the physical space is the time axis (the horizontal dimension of the score) and the frequency space is the vertical dimension. Here, the time and frequency scales involved are well above the uncertainty principle limit (a typical note lasts many hundreds of cycles, whereas the uncertainty principle kicks in at {O(1)} cycles) and so there is no obstruction here to musical notation being unambiguous.) Furthermore, if one takes certain asymptotic limits, one can recover a precise notion of a phase space portrait; for instance if one takes the semiclassical limit {\hbar \rightarrow 0} then, under certain circumstances, the phase space portrait converges to a well-defined classical probability measure on phase space; closely related to this is the high frequency limit of a fixed function, which among other things defines the wave front set of that function, which can be viewed as another asymptotic realisation of the phase space portrait concept.

If functions in {L^2({\bf R}^n)} can be viewed as a sort of distribution in phase space, then linear operators {T: L^2({\bf R}^n) \rightarrow L^2({\bf R}^n)} should be viewed as various transformations on such distributions on phase space. For instance, a pseudodifferential operator {a(X,D)} should correspond (as a zeroth approximation) to multiplying a phase space distribution by the symbol {a(x,\xi)} of that operator, as discussed in this previous blog post. Note that such operators only change the amplitude of the phase space distribution, but not the support of that distribution.

Now we turn to operators that alter the support of a phase space distribution, rather than the amplitude; we will focus on unitary operators to emphasise the amplitude preservation aspect. These will eventually be key examples of Fourier integral operators. A physical translation {Tf(x) := f(x-x_0)} should correspond to pushing forward the distribution by the transformation {(x,\xi) \mapsto (x+x_0,\xi)}, as can be seen by comparing the physical and frequency space supports of {Tf} with that of {f}. Similarly, a frequency modulation {Tf(x) := e^{i \xi_0 \cdot x/\hbar} f(x)} should correspond to the transformation {(x,\xi) \mapsto (x,\xi+\xi_0)}; a linear change of variables {Tf(x) := |\hbox{det} L|^{-1/2} f(L^{-1} x)}, where {L: {\bf R}^n \rightarrow {\bf R}^n} is an invertible linear transformation, should correspond to {(x,\xi) \mapsto (Lx, (L^*)^{-1} \xi)}; and finally, the Fourier transform {Tf(x) := \hat f(x)} should correspond to the transformation {(x,\xi) \mapsto (\xi,-x)}.

Based on these examples, one may hope that given any diffeomorphism {\Phi: T^* {\bf R}^n \rightarrow T^* {\bf R}^n} of phase space, one could associate some sort of unitary (or approximately unitary) operator {T_\Phi: L^2({\bf R}^n) \rightarrow L^2({\bf R}^n)}, which (heuristically, at least) pushes the phase space portrait of a function forward by {\Phi}. However, there is an obstruction to doing so, which can be explained as follows. If {T_\Phi} pushes phase space portraits by {\Phi}, and pseudodifferential operators {a(X,D)} multiply phase space portraits by {a}, then this suggests the intertwining relationship

\displaystyle  a(X,D) T_\Phi \approx T_\Phi (a \circ \Phi)(X,D),

and thus {(a \circ \Phi)(X,D)} is approximately conjugate to {a(X,D)}:

\displaystyle  (a \circ \Phi)(X,D) \approx T_\Phi^{-1} a(X,D) T_\Phi. \ \ \ \ \ (2)

The formalisation of this fact in the theory of Fourier integral operators is known as Egorov’s theorem, due to Yu Egorov (and not to be confused with the more widely known theorem of Dmitri Egorov in measure theory).

Applying commutators, we conclude the approximate conjugacy relationship

\displaystyle  \frac{1}{i\hbar} [(a \circ \Phi)(X,D), (b \circ \Phi)(X,D)] \approx T_\Phi^{-1} \frac{1}{i\hbar} [a(X,D), b(X,D)] T_\Phi.

Now, the pseudodifferential calculus (as discussed in this previous post) tells us (heuristically, at least) that

\displaystyle  \frac{1}{i\hbar} [a(X,D), b(X,D)] \approx \{ a, b \}(X,D)

and

\displaystyle  \frac{1}{i\hbar} [(a \circ \Phi)(X,D), (b \circ \Phi)(X,D)] \approx \{ a \circ \Phi, b \circ \Phi \}(X,D)

where {\{,\}} is the Poisson bracket. Comparing this with (2), we are then led to the compatibility condition

\displaystyle  \{ a \circ \Phi, b \circ \Phi \} \approx \{ a, b \} \circ \Phi,

thus {\Phi} needs to preserve (approximately, at least) the Poisson bracket, or equivalently {\Phi} needs to be a symplectomorphism (again, approximately at least).
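For a linear map {\Phi(z) = Mz} on phase space, preserving the Poisson bracket is equivalent to the matrix identity {M^T J M = J}, where {J} is the standard symplectic matrix. The small sketch below (with {n=1}, and with illustrative matrices that are my own choices rather than anything canonical) tests this identity for the linear parts of the examples considered earlier, together with a map that fails the condition.

```python
def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Standard symplectic matrix for n = 1 in the coordinates (x, xi).
J = [[0.0, 1.0], [-1.0, 0.0]]

def is_symplectic(m, tol=1e-12):
    """Check M^T J M = J, i.e. that z -> M z preserves the Poisson bracket."""
    lhs = matmul(transpose(m), matmul(J, m))
    return all(abs(lhs[i][j] - J[i][j]) <= tol for i in range(2) for j in range(2))

identity = [[1.0, 0.0], [0.0, 1.0]]        # linear part of a translation or modulation
dilation = [[3.0, 0.0], [0.0, 1.0 / 3.0]]  # (x, xi) -> (L x, xi / L) with L = 3
fourier = [[0.0, 1.0], [-1.0, 0.0]]        # (x, xi) -> (xi, -x)
shear = [[1.0, 2.0], [0.0, 1.0]]           # (x, xi) -> (x + 2 xi, xi)
stretch_x_only = [[2.0, 0.0], [0.0, 1.0]]  # (x, xi) -> (2x, xi): not symplectic

for name, m in [("identity", identity), ("dilation", dilation), ("fourier", fourier),
                ("shear", shear), ("stretch_x_only", stretch_x_only)]:
    print(name, is_symplectic(m))
```

In one degree of freedom the condition reduces to {\det M = 1}, which is why stretching {x} without compressing {\xi} is ruled out.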

Now suppose that {\Phi: T^* {\bf R}^n \rightarrow T^* {\bf R}^n} is a symplectomorphism. This is morally equivalent to the graph {\Sigma := \{ (z, \Phi(z)): z \in T^* {\bf R}^n \}} being a Lagrangian submanifold of {T^* {\bf R}^n \times T^* {\bf R}^n} (where we give the second copy of phase space the negative {-\omega} of the usual symplectic form {\omega}, thus yielding {\omega \oplus -\omega} as the full symplectic form on {T^* {\bf R}^n \times T^* {\bf R}^n}); this is another instantiation of the closed graph theorem, as mentioned in this previous post. This graph is known as the canonical relation for the (putative) FIO that is associated to {\Phi}. To understand what it means for this graph to be Lagrangian, we coordinatise {T^* {\bf R}^n \times T^* {\bf R}^n} as {(x,\xi,y,\eta)}, and suppose temporarily that this graph was (locally, at least) a smooth graph in the {x} and {y} variables, thus

\displaystyle  \Sigma = \{ (x, F(x,y), y, G(x,y)): x, y \in {\bf R}^n \}

for some smooth functions {F, G: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}^n}. A brief computation shows that the Lagrangian property of {\Sigma} is then equivalent to the compatibility conditions

\displaystyle  \frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}

\displaystyle  \frac{\partial G_i}{\partial y_j} = \frac{\partial G_j}{\partial y_i}

\displaystyle  \frac{\partial F_i}{\partial y_j} = - \frac{\partial G_j}{\partial x_i}

for {i,j=1,\ldots,n}, where {F_1,\ldots,F_n, G_1,\ldots,G_n} denote the components of {F,G}. Some Fourier analysis (or Hodge theory) lets us solve these equations as

\displaystyle  F_i = -\frac{\partial \phi}{\partial x_i}; \quad G_j = \frac{\partial \phi}{\partial y_j}

for some smooth potential function {\phi: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}}. Thus, we have parameterised our graph {\Sigma} as

\displaystyle  \Sigma = \{ (x, -\nabla_x \phi(x,y), y, \nabla_y \phi(x,y)): x,y \in {\bf R}^n \} \ \ \ \ \ (3)

so that {\Phi} maps {(x, -\nabla_x \phi(x,y))} to {(y, \nabla_y \phi(x,y))}.
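To make the parameterisation (3) concrete, here is a small one-dimensional sketch that recovers the map {\Phi} from a given potential {\phi}: it solves {\xi = -\partial_x \phi(x,y)} for {y} by bisection and then reads off {\eta = \partial_y \phi(x,y)}. The helper names, the bracket {[-50,50]} and the two sample phases are assumptions made for illustration. The phase {-xy} (whose sign matches the {e^{-i\xi \cdot x/\hbar}} in the Fourier transform above) yields the rotation {(x,\xi) \mapsto (\xi,-x)}, while {(x-y)^2/2t} yields the free flow {(x,\xi) \mapsto (x+t\xi,\xi)}.

```python
def d_dx(phi, x, y, h=1e-5):
    """Central difference approximation to the x-derivative of phi."""
    return (phi(x + h, y) - phi(x - h, y)) / (2.0 * h)

def d_dy(phi, x, y, h=1e-5):
    """Central difference approximation to the y-derivative of phi."""
    return (phi(x, y + h) - phi(x, y - h)) / (2.0 * h)

def canonical_map(phi, x, xi, lo=-50.0, hi=50.0, iters=200):
    """Solve xi = -d_x phi(x, y) for y by bisection; return (y, d_y phi(x, y))."""
    def residual(y):
        return -d_dx(phi, x, y) - xi
    r_lo, r_hi = residual(lo), residual(hi)
    if r_lo * r_hi > 0:
        raise ValueError("no sign change on the bracket; phi may be degenerate here")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        r_mid = residual(mid)
        if r_lo * r_mid <= 0:
            hi, r_hi = mid, r_mid     # root lies in [lo, mid]
        else:
            lo, r_lo = mid, r_mid     # root lies in [mid, hi]
    y = 0.5 * (lo + hi)
    return y, d_dy(phi, x, y)

def fourier_phase(x, y):
    return -x * y

def free_flow_phase(x, y, t=2.0):
    return (x - y) ** 2 / (2.0 * t)

print(canonical_map(fourier_phase, 0.3, 0.7))    # approximately (0.7, -0.3)
print(canonical_map(free_flow_phase, 0.3, 0.7))  # approximately (0.3 + 2 * 0.7, 0.7)
```

Bisection only works here because {-\partial_x \phi} happens to be monotone in {y} for these phases; this monotonicity is a one-dimensional stand-in for the non-degeneracy of {\nabla_x \nabla_y \phi}.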

A reasonable candidate for an operator associated to {\Phi} and {\Sigma} in this fashion is the oscillatory integral operator

\displaystyle  Tf(y) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{i \phi(x,y)/\hbar} a(x,y) f(x)\ dx \ \ \ \ \ (4)

for some smooth amplitude function {a} (note that the Fourier transform is the special case when {a=1} and {\phi(x,y)=-x \cdot y}, which helps explain the genesis of the term “Fourier integral operator”). Indeed, if one computes an inner product {\int_{{\bf R}^n} Tf(y) \overline{g(y)}\ dy} for gaussian wave packets {f, g} of the form (1) and localised in phase space near {(x_0,\xi_0), (y_0,\eta_0)} respectively, then a Taylor expansion of {\phi} around {(x_0,y_0)}, followed by a stationary phase computation, shows (again heuristically, and assuming {\phi} is suitably non-degenerate) that {T} has (3) as its canonical relation. (Furthermore, a refinement of this stationary phase calculation suggests that if {a} is normalised to be the half-density {|\det \nabla_x \nabla_y \phi|^{1/2}}, then {T} should be approximately unitary.) As such, we view (4) as an example of a Fourier integral operator (assuming various smoothness and non-degeneracy hypotheses on the phase {\phi} and amplitude {a} which we do not detail here).

Of course, it may be the case that {\Sigma} is not a graph in the {x,y} coordinates (for instance, the key examples of translation, modulation, and dilation are not of this form), but then it is often a graph in some other pair of coordinates, such as {\xi,y}. In that case one can compose the oscillatory integral construction given above with a Fourier transform, giving another class of FIOs of the form

\displaystyle  Tf(y) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{i \phi(\xi,y)/\hbar} a(\xi,y) \hat f(\xi)\ d\xi. \ \ \ \ \ (5)

This class of FIOs covers many important cases; for instance, the translation, modulation, and dilation operators considered earlier can be written in this form after some Fourier analysis. Another typical example is the half-wave propagator {T := e^{it \sqrt{-\Delta}}} for some time {t \in {\bf R}}, which can be written in the form

\displaystyle  Tf(y) = \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{i (\xi \cdot y + t |\xi|)/\hbar} a(\xi,y) \hat f(\xi)\ d\xi.

This corresponds to the phase space transformation {(x,\xi) \mapsto (x+t\xi/|\xi|, \xi)}, which can be viewed as the classical propagator associated to the “quantum” propagator {e^{it\sqrt{-\Delta}}}. More generally, propagators for linear Hamiltonian partial differential equations can often be expressed (at least approximately) by Fourier integral operators corresponding to the propagator of the classical Hamiltonian flow associated to the symbol of the Hamiltonian operator {H}; this leads to an important mathematical formalisation of the correspondence principle between quantum mechanics and classical mechanics, which is one of the foundations of microlocal analysis and which was extensively developed in Hörmander’s work. (More recently, numerically stable versions of this theory have been developed to allow for rapid and accurate numerical solutions to various linear PDE, for instance through Emmanuel Candès’ theory of curvelets, so the theory that Hörmander built now has some quite significant practical applications in areas such as geology.)

In some cases, the canonical relation {\Sigma} may have some singularities (such as fold singularities) which prevent it from being written as graphs in the previous senses, but the theory for defining FIOs even in these cases, and in developing their calculus, is now well established, in large part due to the foundational work of Hörmander.

Let {n} be a large natural number, and let {M_n} be a matrix drawn from the Gaussian Unitary Ensemble (GUE), by which we mean that {M_n} is a Hermitian matrix whose upper triangular entries are iid complex gaussians with mean zero and variance one, and whose diagonal entries are iid real gaussians with mean zero and variance one (and independent of the upper triangular entries). The eigenvalues {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} are then real and almost surely distinct, and can be viewed as a random point process {\Sigma^{(n)} := \{\lambda_1(M_n),\ldots,\lambda_n(M_n)\}} on the real line. One can then form the {k}-point correlation functions {\rho_k^{(n)}: {\bf R}^k \rightarrow {\bf R}^+} for every {k \geq 0}, which can be defined by duality by requiring

\displaystyle  \mathop{\bf E} \sum_{i_1,\ldots,i_k \hbox{ distinct}} F( \lambda_{i_1}(M_n),\ldots,\lambda_{i_k}(M_n))

\displaystyle  = \int_{{\bf R}^k} \rho_k^{(n)}(x_1,\ldots,x_k) F(x_1,\ldots,x_k)\ dx_1 \ldots dx_k

for any test function {F: {\bf R}^k \rightarrow {\bf R}^+}. For GUE, which is a continuous matrix ensemble, one can also define {\rho_k^{(n)}(x_1,\ldots,x_k)} for distinct {x_1<\ldots<x_k} as the unique quantity such that the probability that there is an eigenvalue in each of the intervals {[x_1,x_1+\epsilon],\ldots,[x_k,x_k+\epsilon]} is {(\rho_k^{(n)}(x_1,\ldots,x_k)+o(1))\epsilon^k} in the limit {\epsilon\rightarrow 0}.

As is well known, the GUE process is a determinantal point process, which means that {k}-point correlation functions can be explicitly computed as

\displaystyle  \rho^{(n)}_k(x_1,\ldots,x_k) = \det( K^{(n)}(x_i,x_j) )_{1 \leq i,j \leq k}

for some kernel {K^{(n)}: {\bf R} \times {\bf R} \rightarrow {\bf C}}; explicitly, one has

\displaystyle  K^{(n)}(x,y) := \sum_{k=0}^{n-1} P_k(x) e^{-x^2/4}P_k(y) e^{-y^2/4}

where {P_0, P_1,\ldots} are the (normalised) Hermite polynomials; see this previous blog post for details.

Using the asymptotics of Hermite polynomials (which then give asymptotics for the kernel {K^{(n)}}), one can take a limit of a (suitably rescaled) sequence of GUE processes to obtain the Dyson sine process, which is a determinantal point process {\Sigma} on the real line with correlation functions

\displaystyle  \rho_k(x_1,\ldots,x_k) = \det( K(x_i,x_j) )_{1 \leq i,j \leq k} \ \ \ \ \ (1)

where {K} is the Dyson sine kernel

\displaystyle  K(x,y) := \frac{\sin(\pi(x-y))}{\pi(x-y)}. \ \ \ \ \ (2)

A bit more precisely, for any fixed bulk energy {-2 < u < 2}, the renormalised point processes {\rho_{sc}(u) \sqrt{n} ( \Sigma^{(n)} - \sqrt{n} u )} converge in distribution in the vague topology to {\Sigma} as {n \rightarrow \infty}, where {\rho_{sc}(u) := \frac{1}{2\pi} (4-u^2)^{1/2}_+} is the semi-circular law density.
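The determinantal formula (1) is easy to evaluate directly. The sketch below (function names are illustrative) computes {\rho_k} at a given list of points from the sine kernel (2) by forming the matrix {(K(x_i,x_j))_{1 \leq i,j \leq k}} and taking its determinant by Gaussian elimination. In particular it exhibits the unit intensity {\rho_1 = 1} and the level repulsion {\rho_2(x,y) = 1 - \left(\frac{\sin \pi(x-y)}{\pi(x-y)}\right)^2}, which vanishes quadratically as {y \rightarrow x}.

```python
import math

def sine_kernel(x, y):
    """Dyson sine kernel sin(pi(x-y)) / (pi(x-y)), with value 1 on the diagonal."""
    d = x - y
    if d == 0.0:
        return 1.0
    return math.sin(math.pi * d) / (math.pi * d)

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def rho(points):
    """k-point correlation function of the sine process at the given points."""
    return determinant([[sine_kernel(x, y) for y in points] for x in points])

print(rho([0.3]))              # intensity of the process
for r in (0.01, 0.1, 0.5, 1.0):
    print(r, rho([0.0, r]))    # two-point function at separation r
```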

On the other hand, an important feature of the GUE process {\Sigma^{(n)} = \{\lambda_1,\ldots,\lambda_n\}} is its stationarity (modulo rescaling) under Dyson Brownian motion

\displaystyle  d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}

which describes the stochastic evolution of eigenvalues of a Hermitian matrix under independent Brownian motion of its entries, and is discussed in this previous blog post. To cut a long story short, this stationarity tells us that the self-similar {n}-point correlation function

\displaystyle  \rho^{(n)}_n(t,x) := t^{-n/2} \rho^{(n)}_n(x/\sqrt{t})

obeys the Dyson heat equation

\displaystyle  \partial_t \rho^{(n)}_n = \frac{1}{2} \sum_{i=1}^n \partial_{x_i}^2 \rho^{(n)}_n - \sum_{1 \leq i,j \leq n;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_n}{x_i-x_j}

(see Exercise 11 of the previously mentioned blog post). Note that {\rho^{(n)}_n} vanishes to second order whenever two of the {x_i} coincide, so there is no singularity on the right-hand side. Setting {t=1} and using self-similarity, we can rewrite this equation in time-independent form as

\displaystyle  -\frac{1}{2} \sum_{i=1}^n \partial_i (x_i \rho^{(n)}_n) = \frac{1}{2} \sum_{i=1}^n \partial_{x_i}^2 \rho^{(n)}_n - \sum_{1 \leq i,j \leq n;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_n}{x_i-x_j}.
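The left-hand side here is just the time derivative of {t^{-n/2} \rho(x/\sqrt{t})} at {t=1}, which holds for any smooth profile {\rho}. A minimal finite-difference sketch of this step, in the case {n=1} and with a hypothetical stand-in profile (not an actual correlation function), is as follows.

```python
import math

def profile(x):
    """Hypothetical smooth stand-in for rho (it vanishes to second order at 0)."""
    return x * x * math.exp(-x * x / 2.0)

def self_similar(t, x):
    """t^{-n/2} rho(x / sqrt(t)) with n = 1."""
    return profile(x / math.sqrt(t)) / math.sqrt(t)

def time_derivative_at_one(x, eps=1e-5):
    """Central difference approximation to the t-derivative at t = 1."""
    return (self_similar(1.0 + eps, x) - self_similar(1.0 - eps, x)) / (2.0 * eps)

def scaling_term(x, eps=1e-5):
    """Central difference approximation to -(1/2) d/dx (x rho(x))."""
    def flux(s):
        return s * profile(s)
    return -0.5 * (flux(x + eps) - flux(x - eps)) / (2.0 * eps)

for x in (-1.3, 0.4, 2.0):
    print(x, time_derivative_at_one(x), scaling_term(x))
```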

One can then integrate out all but {k} of these variables (after carefully justifying convergence) to obtain a system of equations for the {k}-point correlation functions {\rho^{(n)}_k}:

\displaystyle  -\frac{1}{2} \sum_{i=1}^k \partial_i (x_i \rho^{(n)}_k) = \frac{1}{2} \sum_{i=1}^k \partial_{x_i}^2 \rho^{(n)}_k - \sum_{1 \leq i,j \leq k;i\neq j} \partial_{x_i} \frac{\rho^{(n)}_k}{x_i-x_j}

\displaystyle  - \sum_{i=1}^k \partial_{x_i} \int_{\bf R} \frac{\rho^{(n)}_{k+1}(x_1,\ldots,x_{k+1})}{x_i-x_{k+1}}\ dx_{k+1},

where the integral is interpreted in the principal value sense. This system is an example of a BBGKY hierarchy.

If one carefully rescales and takes limits (say at the energy level {u=0}, for simplicity), the left-hand side turns out to rescale to be a lower order term, and one ends up with a hierarchy for the Dyson sine process:

\displaystyle  0 = \frac{1}{2} \sum_{i=1}^k \partial_{x_i}^2 \rho_k - \sum_{1 \leq i,j \leq k;i\neq j} \partial_{x_i} \frac{\rho_k}{x_i-x_j} \ \ \ \ \ (3)

\displaystyle  - \sum_{i=1}^k \partial_{x_i} \int_{\bf R} \frac{\rho_{k+1}(x_1,\ldots,x_{k+1})}{x_i-x_{k+1}}\ dx_{k+1}.

Informally, these equations show that the Dyson sine process {\Sigma = \{ \lambda_i: i \in {\bf Z} \}} is stationary with respect to the infinite Dyson Brownian motion

\displaystyle  d\lambda_i = dB_i + \sum_{j \neq i} \frac{dt}{\lambda_i-\lambda_j}

where {dB_i} are independent Brownian increments, and the sum is interpreted in a suitable principal value sense.

I recently set myself the exercise of deriving the identity (3) directly from the definition (1) of the Dyson sine process, without reference to GUE. This turns out to not be too difficult when done the right way (namely, by modifying the proof of Gaudin’s lemma), although it did take me an entire day of work before I realised this, and I could not find it in the literature (though I suspect that many people in the field have privately performed this exercise in the past). In any case, I am recording the computation here, largely because I really don’t want to have to do it again, but perhaps it will also be of interest to some readers.


Let {L: H \rightarrow H} be a self-adjoint operator on a finite-dimensional Hilbert space {H}. The behaviour of this operator can be completely described by the spectral theorem for finite-dimensional self-adjoint operators (i.e. Hermitian matrices, when viewed in coordinates), which provides a sequence {\lambda_1,\ldots,\lambda_n \in {\bf R}} of eigenvalues and an orthonormal basis {e_1,\ldots,e_n} of eigenfunctions such that {L e_i = \lambda_i e_i} for all {i=1,\ldots,n}. In particular, given any function {m: \sigma(L) \rightarrow {\bf C}} on the spectrum {\sigma(L) := \{ \lambda_1,\ldots,\lambda_n\}} of {L}, one can then define the linear operator {m(L): H \rightarrow H} by the formula

\displaystyle  m(L) e_i := m(\lambda_i) e_i,

which then gives a functional calculus, in the sense that the map {m \mapsto m(L)} is a {C^*}-algebra isometric homomorphism from the algebra {BC(\sigma(L) \rightarrow {\bf C})} of bounded continuous functions from {\sigma(L)} to {{\bf C}}, to the algebra {B(H \rightarrow H)} of bounded linear operators on {H}. Thus, for instance, one can define heat operators {e^{-tL}} for {t>0}, Schrödinger operators {e^{itL}} for {t \in {\bf R}}, resolvents {\frac{1}{L-z}} for {z \not \in \sigma(L)}, and (if {L} is positive) wave operators {e^{it\sqrt{L}}} for {t \in {\bf R}}. These will be bounded operators (and, in the case of the Schrödinger and wave operators, unitary operators, and in the case of the heat operators with {L} positive, they will be contractions). Among other things, this functional calculus can then be used to solve differential equations such as the heat equation

\displaystyle  u_t + Lu = 0; \quad u(0) = f \ \ \ \ \ (1)

the Schrödinger equation

\displaystyle  u_t + iLu = 0; \quad u(0) = f \ \ \ \ \ (2)

the wave equation

\displaystyle  u_{tt} + Lu = 0; \quad u(0) = f; \quad u_t(0) = g \ \ \ \ \ (3)

or the Helmholtz equation

\displaystyle  (L-z) u = f. \ \ \ \ \ (4)
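In finite dimensions all of this can be carried out by hand. The sketch below is a toy illustration only: it takes {H = {\bf C}^2}, a real symmetric {2 \times 2} matrix for {L}, and helper names of my own choosing. It implements {m(L) e_i := m(\lambda_i) e_i} and uses it to solve the heat equation (1) via {u(t) = e^{-tL} f}.

```python
import cmath
import math

def eigen_symmetric_2x2(a, b, d):
    """Eigenvalues and orthonormal eigenvectors of the real symmetric matrix [[a, b], [b, d]]."""
    mean, radius = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    eigenvalues = (mean - radius, mean + radius)
    if b == 0.0:
        vectors = ((1.0, 0.0), (0.0, 1.0)) if a <= d else ((0.0, 1.0), (1.0, 0.0))
    else:
        vectors = []
        for lam in eigenvalues:
            vx, vy = b, lam - a              # solves the first row of (L - lam) v = 0
            norm = math.hypot(vx, vy)
            vectors.append((vx / norm, vy / norm))
    return eigenvalues, tuple(vectors)

def apply_function(m, a, b, d, f):
    """Return m(L) f, using m(L) e_i = m(lambda_i) e_i."""
    eigenvalues, vectors = eigen_symmetric_2x2(a, b, d)
    out = [0j, 0j]
    for lam, e in zip(eigenvalues, vectors):
        coeff = f[0] * e[0] + f[1] * e[1]    # <f, e_i>; the e_i are real here
        out[0] += m(lam) * coeff * e[0]
        out[1] += m(lam) * coeff * e[1]
    return out

def heat_solution(t, a, b, d, f):
    """u(t) = exp(-tL) f, the solution to u_t + L u = 0 with u(0) = f."""
    return apply_function(lambda lam: cmath.exp(-t * lam), a, b, d, f)

def schrodinger_solution(t, a, b, d, f):
    """u(t) = exp(-itL) f, the solution to u_t + i L u = 0 with u(0) = f."""
    return apply_function(lambda lam: cmath.exp(-1j * t * lam), a, b, d, f)

# Example: L = [[2, 1], [1, 2]] has eigenvalues 1 and 3.
print(heat_solution(0.5, 2.0, 1.0, 2.0, [1.0, 0.0]))
print(schrodinger_solution(0.5, 2.0, 1.0, 2.0, [1.0, 0.0]))
```

The heat solution decays in norm while the Schrödinger solution preserves it, reflecting the contraction and unitarity properties mentioned above.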

The functional calculus can also be associated to a spectral measure. Indeed, for any vectors {f, g \in H}, there is a complex measure {\mu_{f,g}} on {\sigma(L)} with the property that

\displaystyle  \langle m(L) f, g \rangle_H = \int_{\sigma(L)} m(x) d\mu_{f,g}(x);

indeed, one can set {\mu_{f,g}} to be the discrete measure on {\sigma(L)} defined by the formula

\displaystyle  \mu_{f,g}(E) := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H \langle e_i, g \rangle_H.

One can also view this complex measure as a coefficient

\displaystyle  \mu_{f,g} = \langle \mu f, g \rangle_H

of a projection-valued measure {\mu} on {\sigma(L)}, defined by setting

\displaystyle  \mu(E) f := \sum_{i: \lambda_i \in E} \langle f, e_i \rangle_H e_i.

Finally, one can view {L} as unitarily equivalent to a multiplication operator {M: f \mapsto g f} on {\ell^2(\{1,\ldots,n\})}, where {g} is the real-valued function {g(i) := \lambda_i}, and the intertwining map {U: \ell^2(\{1,\ldots,n\}) \rightarrow H} is given by

\displaystyle  U ( (c_i)_{i=1}^n ) := \sum_{i=1}^n c_i e_i,

so that {L = U M U^{-1}}.

It is an important fact in analysis that many of these above assertions extend to operators on an infinite-dimensional Hilbert space {H}, so long as one is careful about what “self-adjoint operator” means; these facts are collectively referred to as the spectral theorem. For instance, it turns out that most of the above claims have analogues for bounded self-adjoint operators {L: H \rightarrow H}. However, in the theory of partial differential equations, one often needs to apply the spectral theorem to unbounded, densely defined linear operators {L: D \rightarrow H}, which (initially, at least), are only defined on a dense subspace {D} of the Hilbert space {H}. A very typical situation arises when {H = L^2(\Omega)} is the square-integrable functions on some domain or manifold {\Omega} (which may have a boundary or be otherwise “incomplete”), and {D = C^\infty_c(\Omega)} are the smooth compactly supported functions on {\Omega}, and {L} is some linear differential operator. It is then of interest to obtain the spectral theorem for such operators, so that one can build operators such as {e^{-tL}, e^{itL}, \frac{1}{L-z}, e^{it\sqrt{L}}} or solve equations such as (1), (2), (3), (4).

In order to do this, some necessary conditions on the densely defined operator {L: D \rightarrow H} must be imposed. The most obvious is that of symmetry, which asserts that

\displaystyle  \langle Lf, g \rangle_H = \langle f, Lg \rangle_H \ \ \ \ \ (5)

for all {f, g \in D}. In some applications, one also wants to impose positive definiteness, which asserts that

\displaystyle  \langle Lf, f \rangle_H \geq 0 \ \ \ \ \ (6)

for all {f \in D}. These hypotheses are sufficient in the case when {L} is bounded, and in particular when {H} is finite dimensional. However, as it turns out, for unbounded operators these conditions are not, by themselves, enough to obtain a good spectral theory. For instance, one consequence of the spectral theorem should be that the resolvents {(L-z)^{-1}} are well-defined for any strictly complex {z}, which by duality implies that the image of {L-z} should be dense in {H}. However, this can fail if one just assumes symmetry, or symmetry and positive definiteness. A well-known example occurs when {H} is the Hilbert space {H := L^2((0,1))}, {D := C^\infty_c((0,1))} is the space of test functions, and {L} is the one-dimensional Laplacian {L := -\frac{d^2}{dx^2}}. Then {L} is symmetric and positive, but the operator {L-k^2} does not have dense image for any complex {k}, since

\displaystyle  \langle (L-k^2) f, e^{i\overline{k}x} \rangle_H = 0

for all test functions {f \in C^\infty_c((0,1))}, as can be seen from a routine integration by parts. As such, the resolvent map is not everywhere uniquely defined. There is also a lack of uniqueness for the wave, heat, and Schrödinger equations for this operator (note that there are no spatial boundary conditions specified in these equations).

Another example occurs when {H := L^2((0,+\infty))}, {D := C^\infty_c((0,+\infty))}, and {L} is the momentum operator {L := -i \frac{d}{dx}}. Then the resolvent {(L-z)^{-1}} can be uniquely defined for {z} in the upper half-plane, but not in the lower half-plane, due to the obstruction

\displaystyle  \langle (L-z) f, e^{i \bar{z} x} \rangle_H = 0

for all test functions {f} (note that the function {e^{i\bar{z} x}} lies in {L^2((0,+\infty))} when {z} is in the lower half-plane). For related reasons, the translation operators {e^{itL}} have a problem with either uniqueness or existence (depending on whether {t} is positive or negative), due to the unspecified boundary behaviour at the origin.
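Both obstructions can be tested numerically on a concrete test function. The sketch below rests on explicit assumptions: the inner product is taken to be conjugate-linear in the second argument, the momentum operator is taken with the sign convention {L = -i \frac{d}{dx}}, and {f} is the standard bump {\exp(-1/u)} with {u = (x-a)(b-x)} on an interval {(a,b)} (a hypothetical choice of test function). With these conventions, {(L-k^2) f} for {L = -\frac{d^2}{dx^2}} is orthogonal to {e^{i \overline{k} x}}, and {(L-z) f} for the momentum operator is orthogonal to {e^{i \overline{z} x}}; pairing against a mismatched exponential serves as a control.

```python
import cmath
import math

def bump_with_derivatives(x, left, right):
    """Bump f = exp(-1/u), u = (x-left)(right-x), together with f' and f''."""
    u = (x - left) * (right - x)
    if u <= 1e-3:
        return 0.0, 0.0, 0.0          # exp(-1/u) underflows to zero in this range
    du = left + right - 2.0 * x       # derivative of u
    w1 = du / u ** 2                  # first derivative of w = -1/u
    w2 = (-2.0 * u - 2.0 * du * du) / u ** 3   # second derivative of w
    f = math.exp(-1.0 / u)
    return f, w1 * f, (w2 + w1 * w1) * f

def pair_laplacian(k, kappa=None, steps=4000):
    """<(L - k^2) f, e^{i conj(kappa) x}> in L^2((0,1)), L = -d^2/dx^2; kappa defaults to k."""
    kappa = k if kappa is None else kappa
    h = 1.0 / steps
    total = 0j
    for j in range(1, steps):
        x = j * h
        f, _, f2 = bump_with_derivatives(x, 0.0, 1.0)
        # exp(-i kappa x) is the complex conjugate of exp(i conj(kappa) x)
        total += (-f2 - k * k * f) * cmath.exp(-1j * kappa * x)
    return total * h

def pair_momentum(z, steps=4000):
    """<(L - z) f, e^{i conj(z) x}> in L^2((0,inf)), L = -i d/dx, f supported in (1,2)."""
    h = 1.0 / steps
    total = 0j
    for j in range(1, steps):
        x = 1.0 + j * h
        f, f1, _ = bump_with_derivatives(x, 1.0, 2.0)
        total += (-1j * f1 - z * f) * cmath.exp(-1j * z * x)
    return total * h

print(abs(pair_laplacian(1.3 + 0.4j)))             # essentially zero
print(abs(pair_laplacian(1.3 + 0.4j, kappa=2.0)))  # control: visibly non-zero
print(abs(pair_momentum(0.3 - 0.8j)))              # essentially zero
```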

The key property that lets one avoid this bad behaviour is that of essential self-adjointness. Once {L} is essentially self-adjoint, then the spectral theorem becomes applicable again, leading to all the expected behaviour (e.g. existence and uniqueness for the various PDE given above).

Unfortunately, the concept of essential self-adjointness is defined rather abstractly, and is difficult to verify directly; unlike the symmetry condition (5) or the positive condition (6), it is not a “local” condition that can be easily verified just by testing {L} on various inputs, but is instead a more “global” condition. In practice, to verify this property, one needs to invoke one of a number of partial converses to the spectral theorem, which roughly speaking assert that if at least one of the expected consequences of the spectral theorem is true for some symmetric densely defined operator {L}, then {L} is essentially self-adjoint. Examples of “expected consequences” include:

  • Existence of resolvents {(L-z)^{-1}} (or equivalently, dense image for {L-z});
  • Existence of a contractive heat propagator semigroup {e^{-tL}} (in the positive case);
  • Existence of a unitary Schrödinger propagator group {e^{itL}};
  • Existence of a unitary wave propagator group {e^{it\sqrt{L}}} (in the positive case);
  • Existence of a “reasonable” functional calculus;
  • Unitary equivalence with a multiplication operator.

Thus, to actually verify essential self-adjointness of a differential operator, one typically has to first solve a PDE (such as the wave, Schrödinger, heat, or Helmholtz equation) by some non-spectral method (e.g. by a contraction mapping argument, or a perturbation argument based on an operator already known to be essentially self-adjoint). Once one can solve one of the PDEs, then one can apply one of the known converse spectral theorems to obtain essential self-adjointness, and then by the forward spectral theorem one can then solve all the other PDEs as well. But there is no getting out of that first step, which requires some input (typically of an ODE, PDE, or geometric nature) that is external to what abstract spectral theory can provide. For instance, if one wants to establish essential self-adjointness of the Laplace-Beltrami operator {L = -\Delta_g} on a smooth Riemannian manifold {(M,g)} (using {C^\infty_c(M)} as the domain space), it turns out (under reasonable regularity hypotheses) that essential self-adjointness is equivalent to geodesic completeness of the manifold, which is a global ODE condition rather than a local one: one needs geodesics to continue indefinitely in order to be able to (unitarily) solve PDEs such as the wave equation, which in turn leads to essential self-adjointness. (Note that the domains {(0,1)} and {(0,+\infty)} in the previous examples were not geodesically complete.) For this reason, essential self-adjointness of a differential operator is sometimes referred to as quantum completeness (with the completeness of the associated Hamilton-Jacobi flow then being the analogous classical completeness).

In these notes, I wanted to record (mostly for my own benefit) the forward and converse spectral theorems, and to verify essential self-adjointness of the Laplace-Beltrami operator on geodesically complete manifolds. This is extremely standard analysis (covered, for instance, in the texts of Reed and Simon), but I wanted to write it down myself to make sure that I really understood this foundational material properly.

Read the rest of this entry »

In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as {SL_d({\bf Z})} or {SL_d({\bf R})}, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.

The definition of quasirandomness is easy enough to state:

Definition 1 (Quasirandom groups) Let {G} be a finite group, and let {D \geq 1}. We say that {G} is {D}-quasirandom if all non-trivial unitary representations {\rho: G \rightarrow U(H)} of {G} have dimension at least {D}. (Recall a representation is trivial if {\rho(g)} is the identity for all {g \in G}.)

Exercise 1 Let {G} be a finite group, and let {D \geq 1}. A unitary representation {\rho: G \rightarrow U(H)} is said to be irreducible if {H} has no {G}-invariant subspaces other than {\{0\}} and {H}. Show that {G} is {D}-quasirandom if and only if every non-trivial irreducible representation of {G} has dimension at least {D}.

Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group {SL_2({\bf R})} to obtain mixing estimates in that group.)

Quasirandomness behaves fairly well with respect to quotients and short exact sequences:

Exercise 2 Let {0 \rightarrow H \rightarrow G \rightarrow K \rightarrow 0} be a short exact sequence of finite groups {H,G,K}.

  • (i) If {G} is {D}-quasirandom, show that {K} is {D}-quasirandom also. (Equivalently: any quotient of a {D}-quasirandom finite group is again a {D}-quasirandom finite group.)
  • (ii) Conversely, if {H} and {K} are both {D}-quasirandom, show that {G} is {D}-quasirandom also. (In particular, the direct or semidirect product of two {D}-quasirandom finite groups is again a {D}-quasirandom finite group.)

Informally, we will call {G} quasirandom if it is {D}-quasirandom for some “large” {D}, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “{D \geq |G|^c} for some constant {c>0} independent of the size of {G}”, but other regimes of {D} are certainly of interest.

The way we have set things up, the trivial group {G = \{1\}} is infinitely quasirandom (i.e. it is {D}-quasirandom for every {D}). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:

Exercise 3 Let {D \geq 1}, and let {G} be a finite {D}-quasirandom group.

  • (i) Show that if {G} is non-trivial, then {|G| \geq D+1}. (Hint: use the mean zero component {\tau\downharpoonright_{\ell^2(G)_0}} of the regular representation {\tau: G \rightarrow U(\ell^2(G))}.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
  • (ii) Show that any proper subgroup {H} of {G} has index {[G:H] \geq D+1}. (Hint: use the mean zero component of the quasiregular representation.)

The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:

Exercise 4 (Quasirandomness, abelianness, and perfection) Let {G} be a finite group.

  • (i) If {G} is abelian and non-trivial, show that {G} is not {2}-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
  • (ii) Show that {G} is {2}-quasirandom if and only if it is perfect, i.e. the commutator group {[G,G]} is equal to {G}. (Equivalently, {G} is {2}-quasirandom if and only if it has no non-trivial abelian quotients.)
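As a small sanity check on the perfection criterion in part (ii), one can compute commutator subgroups by brute force. The sketch below is not part of the exercise; the helper names (`compose`, `alternating_group`, `commutator_subgroup`) are hypothetical, and permutations are represented as tuples. It compares {A_4}, which is not perfect, with {A_5}, which is.

```python
from itertools import permutations

def compose(s, t):
    """Composition of permutations given as tuples: (s o t)(k) = s[t[k]]."""
    return tuple(s[k] for k in t)

def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)

def is_even(s):
    """A permutation is even iff it has an even number of inversions."""
    n = len(s)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
    return inversions % 2 == 0

def alternating_group(n):
    """All even permutations of {0, ..., n-1}."""
    return [s for s in permutations(range(n)) if is_even(s)]

def commutator_subgroup(G):
    """Subgroup generated by all commutators g h g^{-1} h^{-1} with g, h in G."""
    gens = {
        compose(compose(g, h), compose(inverse(g), inverse(h)))
        for g in G
        for h in G
    }
    # In a finite group, closing under products of generators already
    # yields the generated subgroup (inverses are positive powers).
    seen = set(gens)
    frontier = list(gens)
    while frontier:
        nxt = []
        for x in frontier:
            for s in gens:
                y = compose(x, s)
                if y not in seen:
                    seen.add(y)
                    nxt.append(y)
        frontier = nxt
    return seen

for n in (4, 5):
    G = alternating_group(n)
    print(n, len(G), len(commutator_subgroup(G)))
```

The group {A_n} is perfect exactly when the last two printed numbers agree; by part (ii) this is the same as being {2}-quasirandom.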

Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.

Exercise 5 Let {G} be a finite {D}-quasirandom group. Show that for any subgroup {G'} of {G}, {G'} is {D/[G:G']}-quasirandom, where {[G:G'] := |G|/|G'|} is the index of {G'} in {G}. (Hint: use induced representations.)

Now we give an example of a more quasirandom group.

Lemma 2 (Frobenius lemma) If {F_p} is a field of some prime order {p}, then {SL_2(F_p)} is {\frac{p-1}{2}}-quasirandom.

This should be compared with the cardinality {|SL_2(F_p)|} of the special linear group, which is easily computed to be {(p^2-1) \times p = p^3 - p}.
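For small primes the cardinality formula can be confirmed by brute force, which also makes the comparison with the quasirandomness bound concrete. This is only an illustrative sketch; the function name `sl2_order` is hypothetical.

```python
from itertools import product

def sl2_order(p):
    """Count 2x2 matrices over F_p with determinant 1 by brute force."""
    return sum(
        1
        for a, b, c, d in product(range(p), repeat=4)
        if (a * d - b * c) % p == 1
    )

for p in (2, 3, 5, 7):
    # Columns: p, brute-force count, p^3 - p, and the bound (p-1)/2 from Lemma 2.
    print(p, sl2_order(p), p**3 - p, (p - 1) / 2)
```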

Proof: We may of course take {p} to be odd. Suppose for contradiction that we have a non-trivial representation {\rho: SL_2(F_p) \rightarrow U_d({\bf C})} on a unitary group of some dimension {d} with {d < \frac{p-1}{2}}. Set {a} to be the group element

\displaystyle a := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},

and suppose first that {\rho(a)} is non-trivial. Since {a^p=1}, we have {\rho(a)^p=1}; thus all the eigenvalues of {\rho(a)} are {p^{th}} roots of unity. On the other hand, by conjugating {a} by diagonal matrices in {SL_2(F_p)}, we see that {a} is conjugate to {a^m} (and hence {\rho(a)} conjugate to {\rho(a)^m}) whenever {m} is a quadratic residue mod {p}. As such, the eigenvalues of {\rho(a)} must be permuted by the operation {x \mapsto x^m} for any quadratic residue mod {p}. Since {\rho(a)} has at least one non-trivial eigenvalue, and there are {\frac{p-1}{2}} distinct quadratic residues, we conclude that {\rho(a)} has at least {\frac{p-1}{2}} distinct eigenvalues. But {\rho(a)} is a {d \times d} matrix with {d < \frac{p-1}{2}}, a contradiction. Thus {a} lies in the kernel of {\rho}. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate {SL_2(F_p)} (see exercise below), and so {\rho} is trivial, a contradiction. \Box
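The conjugation step in the above proof can be checked numerically: conjugating {a} by the diagonal matrix with entries {t, t^{-1}} gives the upper unipotent matrix with off-diagonal entry {t^2}, that is to say {a^{t^2}}. The following sketch (with hypothetical helper names, and assuming {p} is an odd prime) confirms that the exponents {m} arising in this way are exactly the quadratic residues.

```python
def mat_mul(A, B, p):
    """Product of two 2x2 matrices over F_p, stored as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return (
        ((a * e + b * g) % p, (a * f + b * h) % p),
        ((c * e + d * g) % p, (c * f + d * h) % p),
    )

def quadratic_residues(p):
    """Non-zero squares modulo p."""
    return {(t * t) % p for t in range(1, p)}

def diagonal_conjugation_exponents(p):
    """Set of m such that diag(t, 1/t) a diag(1/t, t) = a^m for some t."""
    a = ((1, 1), (0, 1))
    exponents = set()
    for t in range(1, p):
        t_inv = pow(t, p - 2, p)  # inverse of t via Fermat's little theorem
        D = ((t, 0), (0, t_inv))
        D_inv = ((t_inv, 0), (0, t))
        M = mat_mul(mat_mul(D, a, p), D_inv, p)
        # M is upper unipotent, so M = a^m with m its top-right entry.
        assert M[0][0] == 1 and M[1][0] == 0 and M[1][1] == 1
        exponents.add(M[0][1])
    return exponents

for p in (3, 5, 7, 11, 13):
    print(p, sorted(diagonal_conjugation_exponents(p)), sorted(quadratic_residues(p)))
```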

Exercise 6 Show that for any prime {p}, the unipotent matrices

\displaystyle \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}

for {t} ranging over {F_p} generate {SL_2(F_p)} as a group.
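This exercise can be tested experimentally for small primes by closing the unipotent generators under multiplication and comparing the size of the result with {p^3 - p}. The sketch below is of course not a proof, and the function names are hypothetical; it only uses the generators with {t=1}, since the others are powers of these.

```python
def mat_mul(A, B, p):
    """Product of two 2x2 matrices over F_p, stored as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return (
        ((a * e + b * g) % p, (a * f + b * h) % p),
        ((c * e + d * g) % p, (c * f + d * h) % p),
    )

def unipotent_generators():
    """The upper and lower unipotent matrices with t = 1."""
    return [((1, 1), (0, 1)), ((1, 0), (1, 1))]

def generated_subgroup(gens, p):
    """Breadth-first closure of gens under right multiplication mod p.

    In a finite group this closure is the subgroup generated by gens,
    since inverses are positive powers.
    """
    identity = ((1, 0), (0, 1))
    seen = {identity}
    frontier = [identity]
    while frontier:
        nxt = []
        for g in frontier:
            for s in gens:
                h = mat_mul(g, s, p)
                if h not in seen:
                    seen.add(h)
                    nxt.append(h)
        frontier = nxt
    return seen

for p in (2, 3, 5, 7):
    print(p, len(generated_subgroup(unipotent_generators(), p)), p**3 - p)
```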

Exercise 7 Let {G} be a finite group, and let {D \geq 1}. If {G} is generated by a collection {G_1,\ldots,G_k} of {D}-quasirandom subgroups, show that {G} is itself {D}-quasirandom.

Exercise 8 Show that {SL_d(F_p)} is {\frac{p-1}{2}}-quasirandom for any {d \geq 2} and any prime {p}. (This is not sharp; the optimal bound here is {\gg_d p^{d-1}}, which follows from the results of Landazuri and Seitz.)

As a corollary of the above results and Exercise 2, we see that the projective special linear group {PSL_d(F_p)} is also {\frac{p-1}{2}}-quasirandom.

Remark 2 One can ask whether the bound {\frac{p-1}{2}} in Lemma 2 is sharp, assuming of course that {p} is odd. Noting that {SL_2(F_p)} acts linearly on the plane {F_p^2}, we see that it also acts projectively on the projective line {PF_p^1 := (F_p^2 \backslash \{0\}) / F_p^\times}, which has {p+1} elements. Thus {SL_2(F_p)} acts via the quasiregular representation on the {p+1}-dimensional space {\ell^2(PF_p^1)}, and also on the {p}-dimensional subspace {\ell^2(PF_p^1)_0}; this latter representation (known as the Steinberg representation) is irreducible. This shows that the {\frac{p-1}{2}} bound cannot be improved beyond {p}. More generally, given any character {\chi: F_p^\times \rightarrow S^1}, {SL_2(F_p)} acts on the {p+1}-dimensional space {V_\chi} of functions {f \in \ell^2( F_p^2 \backslash \{0\} )} that obey the twisted dilation invariance {f(tx) = \chi(t) f(x)} for all {t \in F_p^\times} and {x \in F_p^2 \backslash \{0\}}; these are known as the principal series representations. When {\chi} is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when {\chi} is the quadratic representation (thus taking values in {\{-1,+1\}} while being non-trivial), the principal series representation splits into the direct sum of two {\frac{p+1}{2}}-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed {F_p} in a quadratic extension {F_{p^2}} and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension {\frac{p-1}{2}}, showing that the bound in Lemma 2 is in fact exactly sharp. 
These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Lusztig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).

Exercise 9 Let {p} be an odd prime. Show that for any {n \geq p+2}, the alternating group {A_n} is {p-1}-quasirandom. (Hint: show that all cycles of order {p} in {A_n} are conjugate to each other in {A_n} (and not just in {S_n}); in particular, a cycle is conjugate to its {j^{th}} power for all {j=1,\ldots,p-1}. Also, as {n \geq 5}, {A_n} is simple, and so the cycles of order {p} generate the entire group.)

Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that {A_n} is {n-1}-quasirandom for {n \geq 6} (but is only {3}-quasirandom for {n=5} due to icosahedral symmetry, and {1}-quasirandom for {n \leq 4} due to lack of perfectness). Using Exercise 3 with the index {n} subgroup {A_{n-1}}, we see that the bound {n-1} cannot be improved. Thus, {A_n} (for large {n}) is not as quasirandom as the special linear groups {SL_d(F_p)} (for {p} large and {d} bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.
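To make the contrast between “power of the size” and “logarithmic in size” concrete, one can tabulate the exponent {c = \log D / \log |G|} in the two families, taking {D = n-1} for {A_n} (with {n \geq 6}) and {D = \frac{p-1}{2}} for {SL_2(F_p)}, as stated above. This is only a numerical illustration with hypothetical function names.

```python
import math

def exponent_alternating(n):
    """log(n-1) / log|A_n|, where |A_n| = n!/2 (intended for n >= 6)."""
    log_order = math.lgamma(n + 1) - math.log(2)  # lgamma(n+1) = log(n!)
    return math.log(n - 1) / log_order

def exponent_sl2(p):
    """log((p-1)/2) / log|SL_2(F_p)|, where |SL_2(F_p)| = p^3 - p (p an odd prime >= 5)."""
    return math.log((p - 1) / 2) / math.log(p**3 - p)

for n in (6, 10, 100, 1000):
    print("A_n, n =", n, "exponent", round(exponent_alternating(n), 4))

for p in (5, 101, 10007):
    print("SL_2(F_p), p =", p, "exponent", round(exponent_sl2(p), 4))
```

The exponent for {A_n} decays to zero as {n} grows, while the exponent for {SL_2(F_p)} stays bounded below and approaches {1/3}.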

If one replaces the alternating group {A_n} with the slightly larger symmetric group {S_n}, then quasirandomness is destroyed (since {S_n}, having the abelian quotient {S_n/A_n}, is not perfect); indeed, {S_n} is {1}-quasirandom and no better.

Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group {{\bf Z}/p{\bf Z}}, an alternating group {A_n}, or a finite simple group of Lie type such as {PSL_d(F_p)}. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance {PSL_d(F_p)} is constructed from {SL_d} in characteristic {p}.) In the case of finite simple groups {G} of Lie type with bounded rank {r=O(1)}, it is known from the work of Landazuri and Seitz that such groups are {\gg |G|^c}-quasirandom for some {c>0} depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group {G} is {|G|^c}-quasirandom for some {c>0} and {|G|} is sufficiently large depending on {c}, then {G} is a finite simple group of Lie type with rank {O_c(1)}. It would be of interest to see if there was an alternate way to establish this fact that did not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).

A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in {SL_2({\bf Z}/N{\bf Z})} (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in {SL_d({\bf Z}/N{\bf Z})} for any fixed {d \geq 3}).

Read the rest of this entry »

A few days ago, I released a preprint entitled “Localisation and compactness properties of the Navier-Stokes global regularity problem”, discussed in this previous blog post. As it turns out, I was somewhat impatient to finalise the paper and move on to other things, and the original preprint was still somewhat rough in places (contradicting my own advice on this matter), with a number of typos of minor to moderate severity. But a bit more seriously, I discovered on a further proofreading that there was a subtle error in a component of the argument that I had believed to be routine – namely the persistence of higher regularity for mild solutions. As a consequence, some of the implications stated in the first version were not exactly correct as stated; but they can be repaired by replacing a “bad” notion of global regularity for a certain class of data with a “good” notion. I have completed (and proofread) an updated version of the ms, which should appear at the arXiv link of the paper in a day or two (and which I have also placed at this link). (In the meantime, it is probably best not to read the original ms too carefully, as this could lead to some confusion.) I’ve also added a new section that shows that, due to this technicality, one can exhibit smooth H^1 initial data to the Navier-Stokes equation for which there are no smooth solutions, which superficially sounds very close to a negative solution to the global regularity problem, but is actually nothing of the sort.

Let me now describe the issue in more detail (and also explain why I missed it previously).  A standard principle in the theory of evolutionary partial differential equations is that regularity in space can be used to imply regularity in time.  To illustrate this, consider a solution u to the supercritical nonlinear wave equation

-\partial_{tt} u + \Delta u = u^7  (1)

for some field u: {\bf R} \times {\bf R}^3 \to {\bf R}.   Suppose one already knew that u had some regularity in space, and in particular the C^0_t C^2_x \cap C^1_t C^1_x norm of u was bounded (thus u and up to two spatial derivatives of u were bounded).  Then, by (1), we see that two time derivatives of u were also bounded, and one then gets the additional regularity of C^2_t C^0_x.

In a similar vein, suppose one initially knew that u had the regularity C^0_t C^3_x \cap C^1_t C^2_x.  Then (1) soon tells us that u also has the regularity C^2_t C^1_x; then, if one differentiates (1) in time to obtain

-\partial_{ttt} u + \Delta \partial_t u = 7 u^6 \partial_t u

one can conclude that u also has the regularity of C^3_t C^0_x.  One can continue this process indefinitely; in particular, if one knew that u \in C^0_t C^\infty_x \cap C^1_t C^\infty_x, then these sorts of manipulations show that u is infinitely smooth in both space and time.

The issue that caught me by surprise is that for the Navier-Stokes equations

\partial_t u + (u \cdot \nabla) u =\Delta u -\nabla p  (2)

\nabla \cdot u = 0

(setting the forcing term f equal to zero for simplicity), infinite regularity in space does not automatically imply infinite regularity in time, even if one assumes the initial data lies in a standard function space such as the Sobolev space H^1_x({\bf R}^3).  The problem lies with the pressure term p, which is recovered from the velocity via the elliptic equation

\Delta p = -\nabla^2 \cdot (u \otimes u) (3)

that can be obtained by taking the divergence of (2).   This equation is solved by a non-local integral operator:

\displaystyle p(t,x) = \int_{{\bf R}^3} \frac{\nabla^2 \cdot (u \otimes u)(t,y)}{4\pi |x-y|}\ dy.
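For the reader's convenience, here is a sketch of how (3) arises from (2). The index notation (with summation over repeated indices {i,j = 1,2,3}) is my own choice here and is not used elsewhere in this post.

```latex
% Take the divergence of (2), i.e. apply \partial_i to the i-th component:
\partial_i \bigl( \partial_t u_i + u_j \partial_j u_i \bigr)
  = \partial_i \bigl( \Delta u_i - \partial_i p \bigr)

% Since \partial_i u_i = 0, the terms \partial_t (\partial_i u_i) and
% \Delta (\partial_i u_i) vanish, leaving
\partial_i \bigl( u_j \partial_j u_i \bigr) = - \Delta p

% Using \partial_j u_j = 0 once more, u_j \partial_j u_i = \partial_j (u_i u_j), so
\Delta p = - \partial_i \partial_j ( u_i u_j ) = - \nabla^2 \cdot (u \otimes u)
```

The integral formula above then follows from the fact that {-\frac{1}{4\pi |x|}} is the fundamental solution of the Laplacian on {{\bf R}^3}.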

If, say, u lies in H^1_x({\bf R}^3), then there is no difficulty establishing a bound on p in terms of u (for instance, one can use singular integral theory and Sobolev embedding to place p in L^3_x({\bf R}^3)).  However, one runs into difficulty when trying to compute time derivatives of p.  Differentiating (3) once, one gets

\Delta \partial_t p = -2\nabla^2 \cdot (u \otimes \partial_t u).

At the regularity of H^1, one can still (barely) control this quantity by using (2) to expand out \partial_t u and using some integration by parts.  But when one wishes to compute a second time derivative of the pressure, one obtains (after integration by parts) an expansion of the form

\Delta \partial_{tt} p = -4\nabla^2 \cdot (\Delta u \otimes \Delta u) + \ldots

and now there is not enough regularity on u available to get any control on \partial_{tt} p, even if one assumes that u is smooth.   Indeed, following this observation, I was able to show that given generic smooth H^1 data, the pressure p will instantaneously fail to be C^2 in time, and thence (by (2)) the velocity will instantaneously fail to be C^3 in time.  (Switching to the vorticity formulation buys one further degree of time differentiability, but does not fully eliminate the problem; the vorticity \omega will fail to be C^4 in time.  Switching to material coordinates seems to make things very slightly better, but I believe there is still a breakdown of time regularity in these coordinates also.)

For later times t>0 (and assuming homogeneous data f=0 for simplicity), this issue no longer arises, because of the instantaneous smoothing effect of the Navier-Stokes flow, which for instance will upgrade H^1_x regularity to H^\infty_x regularity instantaneously.  It is only the initial time at which some time irregularity can occur.

This breakdown of regularity does not actually impact the original formulation of the Clay Millennium Prize problem, though, because in that problem the initial velocity is required to be Schwartz class (so all derivatives are rapidly decreasing).  In this class, the regularity theory works as expected; if one has a solution which already has some reasonable regularity (e.g. a mild H^1 solution) and the data is Schwartz, then the solution will be smooth in spacetime.   (Another class where things work as expected is when the vorticity is Schwartz; in such cases, the solution remains smooth in both space and time (for short times, at least), and the Schwartz nature of the vorticity is preserved (because the vorticity is subject to fewer non-local effects than the velocity, as it is not directly affected by the pressure).)

This issue means that one of the implications in the original paper (roughly speaking, that global regularity for Schwartz data implies global regularity for smooth H^1 data) is not correct as stated.  But this can be fixed by weakening the notion of global regularity in the latter setting, by limiting the amount of time differentiability available at the initial time.  More precisely, call a solution u: [0,T] \times {\bf R}^3 \to {\bf R}^3 and p: [0,T] \times {\bf R}^3 \to {\bf R} almost smooth if

  • u and p are smooth on the half-open slab (0,T] \times {\bf R}^3; and
  • For every k \geq 0, \nabla^k_x u, \nabla^k_x p, \nabla^k_x \partial_t u exist and are continuous on the full slab [0,T] \times {\bf R}^3.

Thus, an almost smooth solution is the same concept as a smooth solution, except that at time zero, the velocity field is only C^1_t C^\infty_x, and the pressure field is only C^0_t C^\infty_x.  This is still enough regularity to interpret the Navier-Stokes equation (2) in a classical manner, but falls slightly short of full smoothness.

(I had already introduced this notion of almost smoothness in the more general setting of smooth finite energy solutions in the first draft of this paper, but had failed to realise that it was also necessary in the smooth H^1 setting.)

One can now “fix” the global regularity conjectures for Navier-Stokes in the smooth H^1 or smooth finite energy setting by requiring the solutions to merely be almost smooth instead of smooth.  Once one does so, the results in my paper then work as before: roughly speaking, if one knows that Schwartz data produces smooth solutions, one can conclude that smooth H^1 or smooth finite energy data produces almost smooth solutions (and the paper now contains counterexamples to show that one does not always have smooth solutions in this category).

The diagram of implications between conjectures has been adjusted to reflect this issue, and now reads as follows:
