You are currently browsing the monthly archive for March 2014.

Let ${f: {\bf R}^3 \rightarrow {\bf R}}$ be an irreducible polynomial in three variables. As ${{\bf R}}$ is not algebraically closed, the zero set ${Z_{\bf R}(f) = \{ x \in{\bf R}^3: f(x)=0\}}$ can split into various components of dimension between ${0}$ and ${2}$. For instance, if ${f(x_1,x_2,x_3) = x_1^2+x_2^2}$, the zero set ${Z_{\bf R}(f)}$ is a line (the ${x_3}$-axis); more interestingly, if ${f(x_1,x_2,x_3) = x_3^2 + x_2^2 - x_2^3}$, then ${Z_{\bf R}(f)}$ is the union of a line (the ${x_1}$-axis) and a surface (it is the product of an acnodal cubic curve with a line). We will assume that the ${2}$-dimensional component ${Z_{{\bf R},2}(f)}$ is non-empty, thus defining a real surface in ${{\bf R}^3}$. In particular, this hypothesis implies that ${f}$ is not just irreducible over ${{\bf R}}$, but is in fact absolutely irreducible (i.e. irreducible over ${{\bf C}}$). Indeed, otherwise one could use the complex factorisation of ${f}$ to contain ${Z_{\bf R}(f)}$ inside the intersection ${Z_{\bf C}(g) \cap Z_{\bf C}(\bar{g})}$ of the complex zero loci of a complex polynomial ${g}$ and its complex conjugate ${\bar{g}}$, with ${g,\bar{g}}$ having no common factor, forcing ${Z_{\bf R}(f)}$ to be at most one-dimensional. (For instance, in the case ${f(x_1,x_2,x_3)=x_1^2+x_2^2}$, one can take ${g(z_1,z_2,z_3) = z_1 + i z_2}$.) Among other things, this makes ${Z_{{\bf R},2}(f)}$ a Zariski-dense subset of ${Z_{\bf C}(f)}$, so any polynomial identity which holds at every point of ${Z_{{\bf R},2}(f)}$ also holds on all of ${Z_{\bf C}(f)}$. This allows us to use tools from algebraic geometry in this real setting, even though the reals are not algebraically closed.

The surface ${Z_{{\bf R},2}(f)}$ is said to be ruled if, for a Zariski open dense set of points ${x \in Z_{{\bf R},2}(f)}$, there exists a line ${l_x = \{ x+tv_x: t \in {\bf R} \}}$ through ${x}$ for some non-zero ${v_x \in {\bf R}^3}$ which is completely contained in ${Z_{{\bf R},2}(f)}$, thus

$\displaystyle f(x+tv_x)=0$

for all ${t \in {\bf R}}$. Also, a point ${x \in {\bf Z}_{{\bf R},2}(f)}$ is said to be a flecnode if there exists a line ${l_x = \{ x+tv_x: t \in {\bf R}\}}$ through ${x}$ for some non-zero ${v_x \in {\bf R}^3}$ which is tangent to ${Z_{{\bf R},2}(f)}$ to third order, in the sense that

$\displaystyle f(x+tv_x)=O(t^4)$

as ${t \rightarrow 0}$, or equivalently that

$\displaystyle \frac{d^j}{dt^j} f(x+tv_x)|_{t=0} = 0 \ \ \ \ \ (1)$

for ${j=0,1,2,3}$. Clearly, if ${Z_{{\bf R},2}(f)}$ is a ruled surface, then a Zariski open dense set of points on ${Z_{{\bf R},2}(f)}$ are flecnodes. We then have the remarkable theorem (discovered first by Monge, and then later by Cayley and Salmon) asserting the converse:

Theorem 1 (Monge-Cayley-Salmon theorem) Let ${f: {\bf R}^3 \rightarrow {\bf R}}$ be an irreducible polynomial with ${Z_{{\bf R},2}(f)}$ non-empty. Suppose that a Zariski dense set of points in ${Z_{{\bf R},2}(f)}$ are flecnodes. Then ${Z_{{\bf R},2}(f)}$ is a ruled surface.

Among other things, this theorem was used in the celebrated result of Guth and Katz that almost solved the Erdos distance problem in two dimensions, as discussed in this previous blog post. Vanishing to third order is necessary: on a surface of negative curvature, such as the saddle ${\{ (x_1,x_2,x_3): x_3 = x_1^2 - x_2^2 \}}$, the surface is tangent to second order to a line at every point (namely a line in a direction for which the second fundamental form vanishes). This surface happens to be ruled, but a generic perturbation of it (e.g. ${x_3 = x_1^2 - x_2^2 + x_2^4}$) will no longer be ruled, although it still has negative curvature near the origin.
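To make condition (1) concrete: the ${j}$-th Taylor coefficient of ${t \mapsto f(x+tv)}$ is ${\frac{1}{j!} \frac{d^j}{dt^j} f(x+tv)|_{t=0}}$, so (1) just says that the coefficients of ${t^0,\dots,t^3}$ vanish. The minimal Python sketch below (the class and function names, and the particular points and directions, are illustrative choices of ours) expands ${f(x+tv)}$ exactly with rational arithmetic. For the saddle ${f = x_1^2 - x_2^2 - x_3}$, the line through ${(2,-1,3)}$ in the direction ${(1,1,6)}$ lies entirely in the surface; for the perturbation ${f = x_1^2 - x_2^2 + x_2^4 - x_3}$, the line through ${(0,\frac{2}{5},-\frac{84}{625})}$ in the direction ${(1,5,-\frac{68}{25})}$ is tangent to second order but not to third order, since there ${f(x+tv) = 200t^3 + 625t^4}$.

```python
from fractions import Fraction


class Poly:
    """Polynomial in one variable t with exact rational coefficients (lowest degree first)."""

    def __init__(self, coeffs):
        self.c = [Fraction(x) for x in coeffs] or [Fraction(0)]

    @staticmethod
    def lift(x):
        return x if isinstance(x, Poly) else Poly([x])

    def __add__(self, other):
        o = Poly.lift(other)
        n = max(len(self.c), len(o.c))
        a = self.c + [Fraction(0)] * (n - len(self.c))
        b = o.c + [Fraction(0)] * (n - len(o.c))
        return Poly([x + y for x, y in zip(a, b)])

    __radd__ = __add__

    def __neg__(self):
        return Poly([-x for x in self.c])

    def __sub__(self, other):
        return self + (-Poly.lift(other))

    def __rsub__(self, other):
        return Poly.lift(other) + (-self)

    def __mul__(self, other):
        o = Poly.lift(other)
        out = [Fraction(0)] * (len(self.c) + len(o.c) - 1)
        for i, x in enumerate(self.c):
            for j, y in enumerate(o.c):
                out[i + j] += x * y
        return Poly(out)

    __rmul__ = __mul__

    def __pow__(self, k):
        result = Poly([1])
        for _ in range(k):
            result = result * self
        return result

    def coeff(self, j):
        return self.c[j] if j < len(self.c) else Fraction(0)


def taylor_coeffs(f, x, v, count=5):
    """First `count` coefficients of the polynomial t -> f(x + t v)."""
    t = Poly([0, 1])
    g = f(*[xi + vi * t for xi, vi in zip(x, v)])
    return [g.coeff(j) for j in range(count)]


def satisfies_condition_1(f, x, v):
    """Condition (1): f(x + t v) = O(t^4), i.e. the coefficients of t^0..t^3 vanish."""
    return all(c == 0 for c in taylor_coeffs(f, x, v, 4))


def saddle(x1, x2, x3):
    return x1 ** 2 - x2 ** 2 - x3


def perturbed(x1, x2, x3):
    return x1 ** 2 - x2 ** 2 + x2 ** 4 - x3


print(taylor_coeffs(saddle, (2, -1, 3), (1, 1, 6)))
print(taylor_coeffs(perturbed, (0, Fraction(2, 5), Fraction(-84, 625)), (1, 5, Fraction(-68, 25))))
```

The sketch only tests a given direction ${v}$; deciding whether ${x}$ is a flecnode would additionally require searching over directions.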

The original proof of the Monge-Cayley-Salmon theorem is not easily accessible and not written in modern language. A modern proof of this theorem (together with substantial generalisations, for instance to higher dimensions) is given by Landsberg; the proof uses the machinery of modern algebraic geometry. The purpose of this post is to record an alternate proof of the Monge-Cayley-Salmon theorem based on classical differential geometry (in particular, the notion of torsion of a curve) and basic ODE methods (in particular, Gronwall’s inequality and the Picard existence theorem). The idea is to “integrate” the lines ${l_x}$ indicated by the flecnode to produce smooth curves ${\gamma}$ on the surface ${{\bf Z}_{{\bf R},2}}$; one then uses the vanishing (1) and some basic calculus to conclude that these curves have zero torsion and are thus planar curves. Some further manipulation using (1) (now just to second order instead of third) then shows that these curves are in fact straight lines, giving the ruling on the surface.

Update: Janos Kollar has informed me that the above theorem was essentially known to Monge in 1809; see his recent arXiv note for more details.

I thank Larry Guth and Micha Sharir for conversations leading to this post.

A core foundation of the subject now known as arithmetic combinatorics (and particularly the subfield of additive combinatorics) is the family of elementary sum set estimates (sometimes known as “Ruzsa calculus”) that relate the cardinalities of various sum sets

$\displaystyle A+B := \{ a+b: a \in A, b \in B \}$

and difference sets

$\displaystyle A-B := \{ a-b: a \in A, b \in B \},$

as well as iterated sumsets such as ${3A=A+A+A}$, ${2A-2A=A+A-A-A}$, and so forth. Here, ${A, B}$ are finite non-empty subsets of some additive group ${G = (G,+)}$ (classically one took ${G={\bf Z}}$ or ${G={\bf R}}$, but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma) Let ${A, B}$ be finite non-empty subsets of ${G}$. Then ${A}$ may be covered by at most ${\frac{|A+B|}{|B|}}$ translates of ${B-B}$.

Proof: Consider a maximal set of disjoint translates ${a+B}$ of ${B}$ by elements ${a \in A}$. These translates have cardinality ${|B|}$, are disjoint, and lie in ${A+B}$, so there are at most ${\frac{|A+B|}{|B|}}$ of them. By maximality, for any ${a' \in A}$, ${a'+B}$ must intersect at least one of the selected ${a+B}$, thus ${a' \in a+B-B}$, and the claim follows. $\Box$
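The proof is effectively a greedy algorithm. Here is a minimal Python sketch for finite sets of integers (so ${G = {\bf Z}}$; the function names are our own) that selects a maximal family of disjoint translates ${a+B}$ and then checks both conclusions of the lemma on random examples.

```python
import random


def sumset(A, B):
    return {a + b for a in A for b in B}


def diffset(A, B):
    return {a - b for a in A for b in B}


def ruzsa_cover(A, B):
    """Greedily pick a maximal family of a in A whose translates a + B are pairwise disjoint."""
    chosen, used = [], set()
    for a in sorted(A):
        translate = {a + b for b in B}
        if not (translate & used):
            chosen.append(a)
            used |= translate
    return chosen


def check_covering_lemma(A, B):
    chosen = ruzsa_cover(A, B)
    # At most |A+B| / |B| translates were selected ...
    count_ok = len(chosen) * len(B) <= len(sumset(A, B))
    # ... and the translates a_i + (B - B) cover A.
    BB = diffset(B, B)
    cover_ok = all(any(a - ai in BB for ai in chosen) for a in A)
    return count_ok and cover_ok


random.seed(0)
for _ in range(500):
    A = set(random.sample(range(50), random.randint(1, 10)))
    B = set(random.sample(range(50), random.randint(1, 10)))
    assert check_covering_lemma(A, B)

print(ruzsa_cover(range(10), {0, 1}))
```

Scanning ${A}$ in increasing order is just one convenient way to produce a maximal family; any order would do for the lemma.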

Lemma 2 (Ruzsa triangle inequality) Let ${A,B,C}$ be finite non-empty subsets of ${G}$. Then ${|A-C| \leq \frac{|A-B| |B-C|}{|B|}}$.

Proof: Consider the addition map ${+: (x,y) \mapsto x+y}$ from ${(A-B) \times (B-C)}$ to ${G}$. Every element ${a-c}$ of ${A - C}$ has a preimage ${\{ (x,y) \in (A-B) \times (B-C): x+y = a-c\}}$ under this map of cardinality at least ${|B|}$, thanks to the obvious identity ${a-c = (a-b) + (b-c)}$ for each ${b \in B}$ (the pairs ${(a-b,b-c)}$ being distinct for distinct ${b}$). These preimages are disjoint for distinct elements of ${A-C}$, and ${(A-B) \times (B-C)}$ has cardinality ${|A-B| |B-C|}$, so ${|A-C| |B| \leq |A-B| |B-C|}$, and the claim follows. $\Box$
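The counting argument can likewise be checked by brute force on small sets of integers. In this sketch (again with ${G={\bf Z}}$ and illustrative names), the fibre sizes of the addition map are tallied with `collections.Counter`, and both the lower bound ${|B|}$ on each fibre over ${A-C}$ and the resulting inequality are verified.

```python
import random
from collections import Counter


def diffset(A, B):
    return {a - b for a in A for b in B}


def check_triangle_inequality(A, B, C):
    """Check |A-C| |B| <= |A-B| |B-C| via the fibre counts used in the proof."""
    AB, BC, AC = diffset(A, B), diffset(B, C), diffset(A, C)
    # Number of pairs (x, y) in (A-B) x (B-C) with x + y = e, for each e.
    fibres = Counter(x + y for x in AB for y in BC)
    # Each a - c has the |B| distinct preimages (a - b, b - c), b in B.
    fibres_ok = all(fibres[e] >= len(B) for e in AC)
    return fibres_ok and len(AC) * len(B) <= len(AB) * len(BC)


random.seed(0)
for _ in range(500):
    A, B, C = (set(random.sample(range(40), random.randint(1, 8))) for _trial in range(3))
    assert check_triangle_inequality(A, B, C)
```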

Such estimates (which are covered, incidentally, in Chapter 2 of my book with Van Vu) are particularly useful for controlling finite sets ${A}$ of small doubling, in the sense that ${|A+A| \leq K|A|}$ for some bounded ${K}$. (There are deeper theorems, most notably Freiman’s theorem, which give more control than elementary Ruzsa calculus does; however, the known bounds in Freiman’s theorem are worse than polynomial in ${K}$ (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in ${K}$.)

However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. ${{\bf R}^n}$ or a torus ${({\bf R}/{\bf Z})^n}$) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets ${A}$ remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group ${O_n({\bf R})}$ of orthogonal ${n \times n}$ matrices, and let ${A}$ be the set of matrices obtained by starting with all of the orthogonal matrices in ${O_n({\bf R})}$ and rounding each coefficient of each matrix in this set to the nearest multiple of ${\epsilon}$, for some small ${\epsilon>0}$. This forms a finite set (whose cardinality grows as ${\epsilon\rightarrow 0}$ like a certain negative power of ${\epsilon}$). In the limit ${\epsilon \rightarrow 0}$, the set ${A}$ is not a set of small doubling in the discrete sense. However, ${A \cdot A}$ is still close to ${A}$ in a metric sense, being contained in the ${O_n(\epsilon)}$-neighbourhood of ${A}$. Another key example comes from graphs ${\Gamma := \{ (x, f(x)): x \in A \}}$ of maps ${f: A \rightarrow H}$ from a subset ${A}$ of one additive group ${G = (G,+)}$ to another ${H = (H,+)}$. If ${f}$ is “approximately additive” in the sense that for all ${x,y \in A}$ with ${x+y \in A}$, ${f(x+y)}$ is close to ${f(x)+f(y)}$ in some metric, then ${\Gamma}$ might not have small doubling in the discrete sense (because ${f(x+y)-f(x)-f(y)}$ could take a large number of values), but could be considered a set of small doubling in a discretised sense.

One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric ${d}$ on the underlying group ${G = (G,+)}$ (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:

Definition 3 Let ${(X,d)}$ be a metric space, let ${E}$ be a subset of ${X}$, and let ${r>0}$ be a radius.

• The packing number ${N^{pack}_r(E)}$ is the largest number of points ${x_1,\dots,x_n}$ one can pack inside ${E}$ such that the balls ${B(x_1,r),\dots,B(x_n,r)}$ are disjoint.
• The internal covering number ${N^{int}_r(E)}$ is the smallest number of points ${x_1,\dots,x_n \in E}$ such that the balls ${B(x_1,r),\dots,B(x_n,r)}$ cover ${E}$.
• The external covering number ${N^{ext}_r(E)}$ is the smallest number of points ${x_1,\dots,x_n \in X}$ such that the balls ${B(x_1,r),\dots,B(x_n,r)}$ cover ${E}$.
• The metric entropy ${N^{ent}_r(E)}$ is the largest number of points ${x_1,\dots,x_n}$ one can find in ${E}$ that are ${r}$-separated, thus ${d(x_i,x_j) \geq r}$ for all ${i \neq j}$.

It is an easy exercise to verify the inequalities

$\displaystyle N^{ent}_{2r}(E) \leq N^{pack}_r(E) \leq N^{ext}_r(E) \leq N^{int}_r(E) \leq N^{ent}_r(E)$

for any ${r>0}$, and that ${N^*_r(E)}$ is non-increasing in ${r}$ and non-decreasing in ${E}$ for the three choices ${* = pack,ext,ent}$ (but monotonicity in ${E}$ can fail for ${*=int}$!). It turns out that the external covering number ${N^{ext}_r(E)}$ is slightly more convenient than the other notions of metric entropy, so we will abbreviate ${N_r(E) = N^{ext}_r(E)}$. The cardinality ${|E|}$ of a finite set ${E}$ can be viewed as the limit of the entropies ${N^*_r(E)}$ as ${r \rightarrow 0}$.
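As a sanity check on these inequalities, one can compute all four quantities by brute force in a toy metric space. The sketch below assumes ${X = {\bf Z}}$ with ${d(x,y) = |x-y|}$ and open balls ${B(c,r) = \{ x: |x-c| < r\}}$ (our choice, made only so that everything is finite); for ${N^{ext}_r}$ it suffices to try centres within distance less than ${r}$ of ${E}$, since all other balls miss ${E}$. The helper names are hypothetical.

```python
from itertools import combinations
import random


def ball(c, r):
    """Open ball B(c, r) = {x in Z : |x - c| < r} in the toy metric space X = Z."""
    return set(range(c - r + 1, c + r))


def n_ent(E, r):
    """Largest r-separated subset of E: d(x_i, x_j) >= r for i != j (brute force)."""
    pts = sorted(E)
    for k in range(len(pts), 0, -1):
        if any(all(abs(x - y) >= r for x, y in combinations(S, 2))
               for S in combinations(pts, k)):
            return k
    return 0


def n_pack(E, r):
    """Largest number of points of E whose open r-balls are pairwise disjoint."""
    pts = sorted(E)
    for k in range(len(pts), 0, -1):
        if any(all(not (ball(x, r) & ball(y, r)) for x, y in combinations(S, 2))
               for S in combinations(pts, k)):
            return k
    return 0


def n_cover(E, r, centres):
    """Fewest balls B(c, r) with centres c drawn from `centres` that cover E."""
    target, cands = set(E), sorted(set(centres))
    for k in range(1, len(target) + 1):
        for S in combinations(cands, k):
            if target <= set().union(*(ball(c, r) for c in S)):
                return k
    raise ValueError("centres cannot cover E")


def entropy_chain(E, r):
    """[N^ent_{2r}, N^pack_r, N^ext_r, N^int_r, N^ent_r] for a finite E in Z."""
    ext = n_cover(E, r, range(min(E) - r + 1, max(E) + r))
    internal = n_cover(E, r, E)
    return [n_ent(E, 2 * r), n_pack(E, r), ext, internal, n_ent(E, r)]


random.seed(0)
for _ in range(30):
    E = random.sample(range(13), random.randint(1, 6))
    for r in (1, 2, 3):
        chain = entropy_chain(E, r)
        assert chain == sorted(chain), (E, r, chain)

print(entropy_chain([0, 1, 2, 3], 1))
```

For ${E = \{0,1,2,3\}}$ and ${r=1}$ this chain evaluates to ${2, 4, 4, 4, 4}$.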

If we have the bounded doubling property that ${B(0,2r)}$ is covered by ${O(1)}$ translates of ${B(0,r)}$ for each ${r>0}$, and one has a Haar measure ${m}$ on ${G}$ which assigns a positive finite mass to each ball, then any of the above entropies ${N^*_r(E)}$ is comparable to ${m( E + B(0,r) ) / m(B(0,r))}$, as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter ${r}$ by some absolute constant. Such losses can be acceptable in applications in which the underlying sets ${A}$ are largely “transverse” to the balls ${B(0,r)}$, so that the ${N_r}$-entropy of ${A}$ is largely independent of ${r}$; this is a situation which arises in particular in the case of graphs ${\Gamma = \{ (x,f(x)): x \in A \}}$ discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)

Henceforth we work in an additive group ${G}$ equipped with a translation-invariant metric ${d}$. (One can also generalise things slightly by allowing the metric to attain the values ${0}$ or ${+\infty}$, without changing much of the analysis below.) By compactness (cover the closure of ${E}$ by balls of radius ${r}$ and extract a finite subcover), any precompact set ${E}$ will have finite entropy ${N_r(E)}$ for any ${r>0}$. We now have analogues of the two basic Ruzsa lemmas above:

Lemma 4 (Ruzsa covering lemma) Let ${A, B}$ be precompact non-empty subsets of ${G}$, and let ${r>0}$. Then ${A}$ may be covered by at most ${\frac{N_r(A+B)}{N_r(B)}}$ translates of ${B-B+B(0,2r)}$.

Proof: Let ${a_1,\dots,a_n \in A}$ be a maximal set of points such that the sets ${a_i + B + B(0,r)}$ are all disjoint. Then the sets ${a_i+B}$ are disjoint subsets of ${A+B}$ with entropy ${N_r(a_i+B)=N_r(B)}$ (by translation invariance), and furthermore any ball of radius ${r}$ can intersect at most one of the ${a_i+B}$ (as otherwise its centre would lie in two of the disjoint sets ${a_i+B+B(0,r)}$). We conclude that ${N_r(A+B) \geq n N_r(B)}$, so ${n \leq \frac{N_r(A+B)}{N_r(B)}}$. If ${a \in A}$, then by maximality ${a+B+B(0,r)}$ must intersect one of the ${a_i + B + B(0,r)}$, so ${a \in a_i + B-B + B(0,2r)}$, and the claim follows. $\Box$
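In the toy setting ${G = {\bf Z}}$ with ${d(x,y) = |x-y|}$ (an assumption made purely for illustration), ${N_r = N^{ext}_r}$ can be computed exactly: each ball of radius ${r}$ is a run of ${2r-1}$ consecutive integers, and a left-to-right greedy sweep is optimal for covering points of a line by such runs. The following Python sketch (hypothetical helper names) runs the greedy selection from the proof and checks both the bound on ${n}$ and the covering by translates of ${B-B+B(0,2r)}$.

```python
import random


def sumset(A, B):
    return {a + b for a in A for b in B}


def diffset(A, B):
    return {a - b for a in A for b in B}


def ball0(r):
    """B(0, r) = {x in Z : |x| < r} for the toy metric d(x, y) = |x - y| on Z."""
    return set(range(-(r - 1), r))


def n_ext(E, r):
    """External covering number N_r(E) of a finite E in Z, computed exactly.

    Each ball B(c, r) is a run of 2r - 1 consecutive integers, so the greedy sweep that
    starts a new ball (centred at p + r - 1) at the first uncovered point p is optimal."""
    count, reach = 0, None
    for p in sorted(E):
        if reach is None or p > reach:
            count += 1
            reach = p + 2 * r - 2
    return count


def metric_ruzsa_cover(A, B, r):
    """Greedy maximal a_1, ..., a_n in A with the sets a_i + B + B(0, r) pairwise disjoint."""
    fattened = sumset(B, ball0(r))
    chosen, used = [], set()
    for a in sorted(A):
        translate = {a + x for x in fattened}
        if not (translate & used):
            chosen.append(a)
            used |= translate
    return chosen


def check_lemma_4(A, B, r):
    chosen = metric_ruzsa_cover(A, B, r)
    count_ok = len(chosen) * n_ext(B, r) <= n_ext(sumset(A, B), r)
    shape = sumset(diffset(B, B), ball0(2 * r))  # B - B + B(0, 2r)
    cover_ok = all(any(a - ai in shape for ai in chosen) for a in A)
    return count_ok and cover_ok


random.seed(1)
for _ in range(300):
    A = set(random.sample(range(60), random.randint(1, 12)))
    B = set(random.sample(range(60), random.randint(1, 12)))
    assert check_lemma_4(A, B, random.randint(1, 4))
```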

Lemma 5 (Ruzsa triangle inequality) Let ${A,B,C}$ be precompact non-empty subsets of ${G}$, and let ${r>0}$. Then ${N_{4r}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(B)}}$.

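Proof: Consider the addition map ${+: (x,y) \mapsto x+y}$ from ${(A-B) \times (B-C)}$ to ${G}$. The domain ${(A-B) \times (B-C)}$ may be covered by ${N_r(A-B) N_r(B-C)}$ product balls ${B(x,r) \times B(y,r)}$. Every element ${a-c}$ of ${A - C}$ has a preimage ${\{ (x,y) \in (A-B) \times (B-C): x+y = a-c\}}$ under this map which contains the pairs ${(a-b,b-c)}$ for ${b \in B}$; its projection to the second factor thus contains the translate ${B-c}$ of ${B}$, and so the preimage must meet at least ${N_r(B)}$ of these product balls. However, if two elements of ${A-C}$ are separated by a distance of at least ${4r}$, then no product ball can intersect both preimages (by the triangle inequality and translation invariance, the sums of two points in the same product ball differ by less than ${4r}$). Thus any ${4r}$-separated subset of ${A-C}$ has at most ${\frac{N_r(A-B) N_r(B-C)}{N_r(B)}}$ elements, that is to say ${N_{4r}^{ent}(A-C) \leq \frac{N_r(A-B) N_r(B-C)}{N_r(B)}}$, and the claim follows since ${N_{4r}(A-C) \leq N_{4r}^{ent}(A-C)}$. $\Box$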
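The same toy setting (${G={\bf Z}}$ with ${d(x,y)=|x-y|}$, and ${N_r = N^{ext}_r}$ computed exactly by a greedy sweep; all names illustrative) gives a quick randomized check of Lemma 5, with the radius ${4r}$ on the left-hand side.

```python
import random


def diffset(A, B):
    return {a - b for a in A for b in B}


def n_ext(E, r):
    """External covering number of a finite E in Z by open balls B(c, r).

    Each ball covers 2r - 1 consecutive integers, so a left-to-right greedy sweep is optimal."""
    count, reach = 0, None
    for p in sorted(E):
        if reach is None or p > reach:
            count += 1
            reach = p + 2 * r - 2
    return count


def check_lemma_5(A, B, C, r):
    """Test N_{4r}(A - C) * N_r(B) <= N_r(A - B) * N_r(B - C)."""
    lhs = n_ext(diffset(A, C), 4 * r) * n_ext(B, r)
    rhs = n_ext(diffset(A, B), r) * n_ext(diffset(B, C), r)
    return lhs <= rhs


random.seed(2)
for _ in range(300):
    A, B, C = (set(random.sample(range(80), random.randint(1, 10))) for _trial in range(3))
    assert check_lemma_5(A, B, C, random.randint(1, 5))
```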

Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)

It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.

As in the previous post, all computations here are at the formal level only.

In the previous blog post, the Euler equations for inviscid incompressible fluid flow were interpreted in a Lagrangian fashion, and then Noether’s theorem invoked to derive the known conservation laws for these equations. In a bit more detail: starting with Lagrangian space ${{\cal L} = ({\bf R}^n, \hbox{vol})}$ and Eulerian space ${{\cal E} = ({\bf R}^n, \eta, \hbox{vol})}$, we let ${M}$ be the space of volume-preserving, orientation-preserving maps ${\Phi: {\cal L} \rightarrow {\cal E}}$ from Lagrangian space to Eulerian space. Given a curve ${\Phi: {\bf R} \rightarrow M}$, we can define the Lagrangian velocity field ${\dot \Phi: {\bf R} \times {\cal L} \rightarrow T{\cal E}}$ as the time derivative of ${\Phi}$, and the Eulerian velocity field ${u := \dot \Phi \circ \Phi^{-1}: {\bf R} \times {\cal E} \rightarrow T{\cal E}}$. The volume-preserving nature of ${\Phi}$ ensures that ${u}$ is a divergence-free vector field:

$\displaystyle \nabla \cdot u = 0. \ \ \ \ \ (1)$

If we formally define the functional

$\displaystyle J[\Phi] := \frac{1}{2} \int_{\bf R} \int_{{\cal E}} |u(t,x)|^2\ dx dt = \frac{1}{2} \int_{\bf R} \int_{{\cal L}} |\dot \Phi(t,x)|^2\ dx dt$

then one can show that the critical points of this functional (with appropriate boundary conditions) obey the Euler equations

$\displaystyle [\partial_t + u \cdot \nabla] u = - \nabla p$

$\displaystyle \nabla \cdot u = 0$

for some pressure field ${p: {\bf R} \times {\cal E} \rightarrow {\bf R}}$. As discussed in the previous post, the time translation symmetry of this functional yields conservation of the Hamiltonian

$\displaystyle \frac{1}{2} \int_{{\cal E}} |u(t,x)|^2\ dx = \frac{1}{2} \int_{{\cal L}} |\dot \Phi(t,x)|^2\ dx;$

the rigid motion symmetries of Eulerian space give conservation of the total momentum

$\displaystyle \int_{{\cal E}} u(t,x)\ dx$

and total angular momentum

$\displaystyle \int_{{\cal E}} x \wedge u(t,x)\ dx;$

and the diffeomorphism symmetries of Lagrangian space give conservation of circulation

$\displaystyle \int_{\Phi(\gamma)} u^*$

for any closed loop ${\gamma}$ in ${{\cal L}}$, or equivalently pointwise conservation of the Lagrangian vorticity ${\Phi^* \omega = \Phi^* du^*}$, where ${u^*}$ is the ${1}$-form associated with the vector field ${u}$ using the Euclidean metric ${\eta}$ on ${{\cal E}}$, with ${\Phi^*}$ denoting pullback by ${\Phi}$.

It turns out that one can generalise the above calculations. Given any self-adjoint operator ${A}$ on divergence-free vector fields ${u: {\cal E} \rightarrow {\bf R}^n}$, we can define the functional

$\displaystyle J_A[\Phi] := \frac{1}{2} \int_{\bf R} \int_{{\cal E}} u(t,x) \cdot A u(t,x)\ dx dt;$

as we shall see below the fold, critical points of this functional (with appropriate boundary conditions) obey the generalised Euler equations

$\displaystyle [\partial_t + u \cdot \nabla] Au + (\nabla u) \cdot Au= - \nabla \tilde p \ \ \ \ \ (2)$

$\displaystyle \nabla \cdot u = 0$

for some pressure field ${\tilde p: {\bf R} \times {\cal E} \rightarrow {\bf R}}$, where ${(\nabla u) \cdot Au}$ in coordinates is ${\partial_i u_j Au_j}$ with the usual summation conventions. (When ${A=1}$, ${(\nabla u) \cdot Au = \nabla(\frac{1}{2} |u|^2)}$, and this term can be absorbed into the pressure ${\tilde p}$, and we recover the usual Euler equations.) Time translation symmetry then gives conservation of the Hamiltonian

$\displaystyle \frac{1}{2} \int_{{\cal E}} u(t,x) \cdot A u(t,x)\ dx.$

If the operator ${A}$ commutes with rigid motions on ${{\cal E}}$, then we have conservation of total momentum

$\displaystyle \int_{{\cal E}} Au(t,x)\ dx$

and total angular momentum

$\displaystyle \int_{{\cal E}} x \wedge Au(t,x)\ dx,$

and the diffeomorphism symmetries of Lagrangian space give conservation of circulation

$\displaystyle \int_{\Phi(\gamma)} (Au)^*$

or pointwise conservation of the Lagrangian vorticity ${\Phi^* \theta := \Phi^* d(Au)^*}$. These applications of Noether’s theorem proceed exactly as in the previous post; we leave the details to the interested reader.

One particular special case of interest arises in two dimensions ${n=2}$, when ${A}$ is the inverse derivative ${A = |\nabla|^{-1} = (-\Delta)^{-1/2}}$. The vorticity ${\theta = d(Au)^*}$ is a ${2}$-form, which in the two-dimensional setting may be identified with a scalar. In coordinates, if we write ${u = (u_1,u_2)}$, then

$\displaystyle \theta = \partial_{x_1} |\nabla|^{-1} u_2 - \partial_{x_2} |\nabla|^{-1} u_1.$

Since ${u}$ is also divergence-free, we may write

$\displaystyle u = ( \partial_{x_2} \psi, -\partial_{x_1} \psi )$

for a stream function ${\psi}$; inserting this into the above formula for ${\theta}$ and using ${-\Delta = |\nabla|^2}$ gives ${\theta = |\nabla| \psi}$, so that

$\displaystyle \psi = |\nabla|^{-1} \theta.$

If we take the curl of the generalised Euler equation (2), we obtain (after some computation) the surface quasi-geostrophic equation

$\displaystyle [\partial_t + u \cdot \nabla] \theta = 0 \ \ \ \ \ (3)$

$\displaystyle u = (\partial_{x_2} |\nabla|^{-1} \theta, -\partial_{x_1} |\nabla|^{-1} \theta).$

This equation has strong analogies with the three-dimensional incompressible Euler equations, and can be viewed as a simplified model for that system; see this paper of Constantin, Majda, and Tabak for details.
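The kinematic relation between ${u}$ and ${\theta}$ can be checked one Fourier mode at a time. On ${e^{ik\cdot x}}$, ${\partial_{x_j}}$ acts as multiplication by ${ik_j}$ and ${|\nabla|}$ (written ${|D|}$ in the code comments) as multiplication by ${|k|}$. The Python sketch below (names hypothetical) solves for the divergence-free velocity whose quantity ${\partial_{x_1} |\nabla|^{-1} u_2 - \partial_{x_2} |\nabla|^{-1} u_1}$ equals a prescribed mode of ${\theta}$, and compares it with ${(\partial_{x_2} |\nabla|^{-1} \theta, -\partial_{x_1} |\nabla|^{-1} \theta)}$, which is the velocity these two requirements force.

```python
import math


def velocity_from_theta(theta_hat, k1, k2):
    """Divergence-free u with d_1|D|^{-1}u_2 - d_2|D|^{-1}u_1 = theta, for one mode
    exp(i(k1 x1 + k2 x2)) with (k1, k2) != (0, 0); returns the Fourier coefficients of u."""
    kk = math.hypot(k1, k2)
    # Divergence-free (i k . u_hat = 0) forces u_hat = alpha * (-k2, k1) ...
    # ... and then the curl of |D|^{-1} u has coefficient i * alpha * |k|.
    alpha = theta_hat / (1j * kk)
    return (-alpha * k2, alpha * k1)


def formula_velocity(theta_hat, k1, k2):
    """Fourier coefficients of (d_2 |D|^{-1} theta, -d_1 |D|^{-1} theta)."""
    psi_hat = theta_hat / math.hypot(k1, k2)  # psi = |D|^{-1} theta
    return (1j * k2 * psi_hat, -1j * k1 * psi_hat)


def divergence(u_hat, k1, k2):
    return 1j * k1 * u_hat[0] + 1j * k2 * u_hat[1]


def theta_of(u_hat, k1, k2):
    """Coefficient of d_1 |D|^{-1} u_2 - d_2 |D|^{-1} u_1."""
    return (1j * k1 * u_hat[1] - 1j * k2 * u_hat[0]) / math.hypot(k1, k2)


for k1, k2 in [(1, 0), (0, 2), (3, -4), (-2, 5)]:
    u = velocity_from_theta(complex(0.7, -1.3), k1, k2)
    print((k1, k2), u, formula_velocity(complex(0.7, -1.3), k1, k2))
```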

Now we can specialise the general conservation laws derived previously to this setting. The conserved Hamiltonian is

$\displaystyle \frac{1}{2} \int_{{\bf R}^2} u\cdot |\nabla|^{-1} u\ dx = \frac{1}{2} \int_{{\bf R}^2} \theta \psi\ dx = \frac{1}{2} \int_{{\bf R}^2} \theta |\nabla|^{-1} \theta\ dx$

(a law previously observed for this equation in the abovementioned paper of Constantin, Majda, and Tabak). As ${A}$ commutes with rigid motions, we also have (formally, at least) conservation of momentum

$\displaystyle \int_{{\bf R}^2} Au\ dx$

(which up to trivial transformations is also expressible in impulse form as ${\int_{{\bf R}^2} \theta x\ dx}$, after integration by parts), and conservation of angular momentum

$\displaystyle \int_{{\bf R}^2} x \wedge Au\ dx$

(which up to trivial transformations is ${\int_{{\bf R}^2} \theta |x|^2\ dx}$). Finally, diffeomorphism invariance gives pointwise conservation of Lagrangian vorticity ${\Phi^* \theta}$, thus ${\theta}$ is transported by the flow (which is also evident from (3)). In particular, all integrals of the form ${\int F(\theta)\ dx}$ for a fixed function ${F}$ are conserved by the flow.

Throughout this post, we will work only at the formal level of analysis, ignoring issues of convergence of integrals, justifying differentiation under the integral sign, and so forth. (Rigorous justification of the conservation laws and other identities arising from the formal manipulations below can usually be established in an a posteriori fashion once the identities are in hand, without the need to rigorously justify the manipulations used to come up with these identities).

It is a remarkable fact in the theory of differential equations that many of the ordinary and partial differential equations that are of interest (particularly in geometric PDE, or PDE arising from mathematical physics) admit a variational formulation; thus, a collection ${\Phi: \Omega \rightarrow M}$ of one or more fields on a domain ${\Omega}$ taking values in a space ${M}$ will solve the differential equation of interest if and only if ${\Phi}$ is a critical point of the functional

$\displaystyle J[\Phi] := \int_\Omega L( x, \Phi(x), D\Phi(x) )\ dx \ \ \ \ \ (1)$

involving the fields ${\Phi}$ and their first derivatives ${D\Phi}$, where the Lagrangian ${L: \Sigma \rightarrow {\bf R}}$ is a function on the vector bundle ${\Sigma}$ over ${\Omega \times M}$ consisting of triples ${(x, q, \dot q)}$ with ${x \in \Omega}$, ${q \in M}$, and ${\dot q: T_x \Omega \rightarrow T_q M}$ a linear transformation; we also usually keep the boundary data of ${\Phi}$ fixed in case ${\Omega}$ has a non-trivial boundary, although we will ignore these issues here. (We also ignore the possibility of having additional constraints imposed on ${\Phi}$ and ${D\Phi}$, which require the machinery of Lagrange multipliers to deal with, but which will only serve as a distraction for the current discussion.) It is common to use local coordinates to parameterise ${\Omega}$ as ${{\bf R}^d}$ and ${M}$ as ${{\bf R}^n}$, in which case ${L}$ can be viewed locally as a function on ${{\bf R}^d \times {\bf R}^n \times {\bf R}^{dn}}$.

Example 1 (Geodesic flow) Take ${\Omega = [0,1]}$ and ${M = (M,g)}$ to be a Riemannian manifold, which we will write locally in coordinates as ${{\bf R}^n}$ with metric ${g_{ij}(q)}$ for ${i,j=1,\dots,n}$. A geodesic ${\gamma: [0,1] \rightarrow M}$ is then a critical point (keeping ${\gamma(0),\gamma(1)}$ fixed) of the energy functional

$\displaystyle J[\gamma] := \frac{1}{2} \int_0^1 g_{\gamma(t)}( D\gamma(t), D\gamma(t) )\ dt$

or in coordinates (ignoring coordinate patch issues, and using the usual summation conventions)

$\displaystyle J[\gamma] = \frac{1}{2} \int_0^1 g_{ij}(\gamma(t)) \dot \gamma^i(t) \dot \gamma^j(t)\ dt.$

As discussed in this previous post, both the Euler equations for rigid body motion, and the Euler equations for incompressible inviscid flow, can be interpreted as geodesic flow (though in the latter case, one has to work really formally, as the manifold ${M}$ is now infinite dimensional).

More generally, if ${\Omega = (\Omega,h)}$ is itself a Riemannian manifold, which we write locally in coordinates as ${{\bf R}^d}$ with metric ${h_{ab}(x)}$ for ${a,b=1,\dots,d}$, then a harmonic map ${\Phi: \Omega \rightarrow M}$ is a critical point of the energy functional

$\displaystyle J[\Phi] := \frac{1}{2} \int_\Omega (h^{-1}(x) \otimes g_{\Phi(x)})( D\Phi(x), D\Phi(x) )\ dh(x)$

or in coordinates (again ignoring coordinate patch issues, and writing ${h^{ab}}$ for the inverse of the metric ${h_{ab}}$)

$\displaystyle J[\Phi] = \frac{1}{2} \int_{{\bf R}^d} h^{ab}(x) g_{ij}(\Phi(x)) (\partial_a \Phi^i(x)) (\partial_b \Phi^j(x))\ \sqrt{\det(h(x))}\ dx.$

If we replace the Riemannian manifold ${\Omega}$ by a Lorentzian manifold, such as Minkowski space ${{\bf R}^{1+3}}$, then the notion of a harmonic map is replaced by that of a wave map, which generalises the scalar wave equation (which corresponds to the case ${M={\bf R}}$).

Example 2 (${N}$-particle interactions) Take ${\Omega = {\bf R}}$ and ${M = {\bf R}^3 \otimes {\bf R}^N}$; then a function ${\Phi: \Omega \rightarrow M}$ can be interpreted as a collection of ${N}$ trajectories ${q_1,\dots,q_N: {\bf R} \rightarrow {\bf R}^3}$ in space, which we give a physical interpretation as the trajectories of ${N}$ particles. If we assign each particle a positive mass ${m_1,\dots,m_N > 0}$, and also introduce a potential energy function ${V: M \rightarrow {\bf R}}$, then it turns out that Newton’s laws of motion ${F=ma}$ in this context (with the force ${F_i}$ on the ${i^{th}}$ particle being given by the conservative force ${-\nabla_{q_i} V}$) are equivalent to the trajectories ${q_1,\dots,q_N}$ being a critical point of the action functional

$\displaystyle J[\Phi] := \int_{\bf R} \left( \sum_{i=1}^N \frac{1}{2} m_i |\dot q_i(t)|^2 - V( q_1(t),\dots,q_N(t) ) \right)\ dt.$

Formally, if ${\Phi = \Phi_0}$ is a critical point of a functional ${J[\Phi]}$, this means that

$\displaystyle \frac{d}{ds} J[ \Phi[s] ]|_{s=0} = 0$

whenever ${s \mapsto \Phi[s]}$ is a (smooth) deformation with ${\Phi[0]=\Phi_0}$ (and with ${\Phi[s]}$ respecting whatever boundary conditions are appropriate). Interchanging the derivative and integral, we (formally, at least) arrive at

$\displaystyle \int_\Omega \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0}\ dx = 0. \ \ \ \ \ (2)$

Write ${\delta \Phi := \frac{d}{ds} \Phi[s]|_{s=0}}$ for the infinitesimal deformation of ${\Phi_0}$. By the chain rule, ${\frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0}}$ can be expressed in terms of ${x, \Phi_0(x), \delta \Phi(x), D\Phi_0(x), D \delta \Phi(x)}$. In coordinates, we have

$\displaystyle \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0} = \delta \Phi^i(x) L_{q^i}(x,\Phi_0(x), D\Phi_0(x)) \ \ \ \ \ (3)$

$\displaystyle + \partial_{x^a} \delta \Phi^i(x) L_{\partial_{x^a} q^i} (x,\Phi_0(x), D\Phi_0(x)),$

where we parameterise ${\Sigma}$ by ${x, (q^i)_{i=1,\dots,n}, (\partial_{x^a} q^i)_{a=1,\dots,d; i=1,\dots,n}}$, and we use subscripts on ${L}$ to denote partial derivatives in the various coefficients. (One can of course work in a coordinate-free manner here if one really wants to, but the notation becomes a little cumbersome due to the need to carefully split up the tangent space of ${\Sigma}$, and we will not do so here.) Thus we can view (2) as an integral identity that asserts the vanishing of a certain integral, whose integrand involves ${x, \Phi_0(x), \delta \Phi(x), D\Phi_0(x), D \delta \Phi(x)}$, where ${\delta \Phi}$ vanishes at the boundary but is otherwise unconstrained.
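The criticality condition (2) can also be probed numerically. The sketch below takes the one-particle case of Example 2 with unit mass, one spatial dimension and ${V(q) = q^2/2}$ (choices made purely for illustration), approximates ${J}$ by Simpson's rule on ${[0,T]}$, and computes ${\frac{d}{ds} J[\Phi_0 + s\,\delta\Phi]|_{s=0}}$ for a perturbation ${\delta\Phi}$ vanishing at both endpoints. For ${\Phi_0(t) = \sin t}$, which solves ${\ddot q = -q}$, the derivative vanishes up to quadrature error; for ${\Phi_0(t) = t}$ it equals ${-T^2/\pi}$ instead.

```python
import math


def action(q, dq, t0, t1, n=2000):
    """Composite Simpson approximation (n even) of J[q] = int (q'(t)^2/2 - q(t)^2/2) dt."""
    h = (t1 - t0) / n
    total = 0.0
    for k in range(n + 1):
        t = t0 + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * (0.5 * dq(t) ** 2 - 0.5 * q(t) ** 2)
    return total * h / 3


def first_variation(q, dq, eta, deta, t0, t1, s=1e-3):
    """Central difference of s -> J[q + s*eta] at s = 0."""
    plus = action(lambda t: q(t) + s * eta(t), lambda t: dq(t) + s * deta(t), t0, t1)
    minus = action(lambda t: q(t) - s * eta(t), lambda t: dq(t) - s * deta(t), t0, t1)
    return (plus - minus) / (2 * s)


T = 3.0


def eta(t):
    """Perturbation delta Phi, vanishing at t = 0 and t = T."""
    return math.sin(math.pi * t / T)


def deta(t):
    return (math.pi / T) * math.cos(math.pi * t / T)


# Phi_0(t) = sin t solves q'' = -q, so the first variation should be ~0.
print(first_variation(math.sin, math.cos, eta, deta, 0.0, T))
# Phi_0(t) = t is not a solution; the exact first variation is -T^2 / pi.
print(first_variation(lambda t: t, lambda t: 1.0, eta, deta, 0.0, T), -T ** 2 / math.pi)
```

Since ${J[\Phi_0 + s\,\delta\Phi]}$ is a quadratic polynomial in ${s}$ for this Lagrangian, the central difference recovers the first variation exactly, up to rounding.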

A general rule of thumb in PDE and calculus of variations is that whenever one has an integral identity of the form ${\int_\Omega F(x)\ dx = 0}$ for some class of functions ${F}$ that vanishes on the boundary, then there must be an associated differential identity ${F = \hbox{div} X}$ that justifies this integral identity through Stokes’ theorem. This rule of thumb helps explain why integration by parts is used so frequently in PDE to justify integral identities. The rule of thumb can fail when one is dealing with “global” or “cohomologically non-trivial” integral identities of a topological nature, such as the Gauss-Bonnet or Kazhdan-Warner identities, but is quite reliable for “local” or “cohomologically trivial” identities, such as those arising from calculus of variations.

In any case, if we apply this rule to (2), we expect that the integrand ${\frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0}}$ should be expressible as a spatial divergence. This is indeed the case:

Proposition 1 (Formal) Let ${\Phi = \Phi_0}$ be a critical point of the functional ${J[\Phi]}$ defined in (1). Then for any deformation ${s \mapsto \Phi[s]}$ with ${\Phi[0] = \Phi_0}$, we have

$\displaystyle \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0} = \hbox{div} X \ \ \ \ \ (4)$

where ${X}$ is the vector field that is expressible in coordinates as

$\displaystyle X^a := \delta \Phi^i(x) L_{\partial_{x^a} q^i}(x,\Phi_0(x), D\Phi_0(x)). \ \ \ \ \ (5)$

Proof: Comparing (4) with (3), we see that the claim is equivalent to the Euler-Lagrange equation

$\displaystyle L_{q^i}(x,\Phi_0(x), D\Phi_0(x)) - \partial_{x^a} L_{\partial_{x^a} q^i}(x,\Phi_0(x), D\Phi_0(x)) = 0. \ \ \ \ \ (6)$

The same computation, together with an integration by parts, shows that (2) may be rewritten as

$\displaystyle \int_\Omega ( L_{q^i}(x,\Phi_0(x), D\Phi_0(x)) - \partial_{x^a} L_{\partial_{x^a} q^i}(x,\Phi_0(x), D\Phi_0(x)) ) \delta \Phi^i(x)\ dx = 0.$

Since ${\delta \Phi^i(x)}$ is unconstrained on the interior of ${\Omega}$, the claim (6) follows (at a formal level, at least). $\Box$
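As a worked instance of (6), take the action functional of Example 2, so that ${d=1}$ and the ${q^i}$ run over the ${3N}$ coordinates of ${q_1,\dots,q_N}$. Then ${L_{q_i} = -\nabla_{q_i} V}$ and ${L_{\dot q_i} = m_i \dot q_i}$, and the Euler-Lagrange equation (6) becomes

$\displaystyle -\nabla_{q_i} V( q_1(t),\dots,q_N(t) ) - \frac{d}{dt} ( m_i \dot q_i(t) ) = 0$

for ${i=1,\dots,N}$, which is Newton’s law ${m_i \ddot q_i = -\nabla_{q_i} V}$ asserted in Example 2.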

Many variational problems also enjoy one-parameter continuous symmetries: given any field ${\Phi_0}$ (not necessarily a critical point), one can place that field in a one-parameter family ${s \mapsto \Phi[s]}$ with ${\Phi[0] = \Phi_0}$, such that

$\displaystyle J[ \Phi[s] ] = J[ \Phi[0] ]$

for all ${s}$; in particular,

$\displaystyle \frac{d}{ds} J[ \Phi[s] ]|_{s=0} = 0,$

which can be written as (2) as before. Applying the previous rule of thumb, we thus expect another divergence identity

$\displaystyle \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0} = \hbox{div} Y \ \ \ \ \ (7)$

whenever ${s \mapsto \Phi[s]}$ arises from a continuous one-parameter symmetry. This expectation is borne out in many examples. For instance, if the spatial domain ${\Omega}$ is the Euclidean space ${{\bf R}^d}$, and the Lagrangian (when expressed in coordinates) has no direct dependence on the spatial variable ${x}$, thus

$\displaystyle L( x, \Phi(x), D\Phi(x) ) = L( \Phi(x), D\Phi(x) ), \ \ \ \ \ (8)$

then we obtain ${d}$ translation symmetries

$\displaystyle \Phi[s](x) := \Phi(x - s e^a )$

for ${a=1,\dots,d}$, where ${e^1,\dots,e^d}$ is the standard basis for ${{\bf R}^d}$. For a fixed ${a}$, the left-hand side of (7) then becomes

$\displaystyle \frac{d}{ds} L( \Phi(x-se^a), D\Phi(x-se^a) )|_{s=0} = -\partial_{x^a} [ L( \Phi(x), D\Phi(x) ) ]$

$\displaystyle = \hbox{div} Y$

where ${Y(x) = - L(\Phi(x), D\Phi(x)) e^a}$. Another common type of symmetry is a pointwise symmetry, in which

$\displaystyle L( x, \Phi[s](x), D\Phi[s](x) ) = L( x, \Phi[0](x), D\Phi[0](x) ) \ \ \ \ \ (9)$

for all ${x}$, in which case (7) clearly holds with ${Y=0}$.

If we subtract (4) from (7), we obtain the celebrated theorem of Noether linking symmetries with conservation laws:

Theorem 2 (Noether’s theorem) Suppose that ${\Phi_0}$ is a critical point of the functional (1), and let ${\Phi[s]}$ be a one-parameter continuous symmetry with ${\Phi[0] = \Phi_0}$. Let ${X}$ be the vector field in (5), and let ${Y}$ be the vector field in (7). Then we have the pointwise conservation law

$\displaystyle \hbox{div}(X-Y) = 0.$

In particular, for one-dimensional variational problems, in which ${\Omega \subset {\bf R}}$, we have the conservation law ${(X-Y)(t) = (X-Y)(0)}$ for all ${t \in \Omega}$ (assuming of course that ${\Omega}$ is connected and contains ${0}$).

Noether’s theorem gives a systematic way to locate conservation laws for solutions to variational problems. For instance, if ${\Omega \subset {\bf R}}$ and the Lagrangian has no explicit time dependence, thus

$\displaystyle L(t, \Phi(t), \dot \Phi(t)) = L(\Phi(t), \dot \Phi(t)),$

then by using the time translation symmetry ${\Phi[s](t) := \Phi(t-s)}$, we have

$\displaystyle Y(t) = - L( \Phi(t), \dot\Phi(t) )$

as discussed previously, whereas we have ${\delta \Phi(t) = - \dot \Phi(t)}$, and hence by (5)

$\displaystyle X(t) = - \dot \Phi^i(t) L_{\dot q^i}(\Phi(t), \dot \Phi(t)),$

and so Noether’s theorem gives conservation of the Hamiltonian

$\displaystyle H(t) := \dot \Phi^i(t) L_{\dot q^i}(\Phi(t), \dot \Phi(t))- L(\Phi(t), \dot \Phi(t)). \ \ \ \ \ (10)$

For instance, for geodesic flow, the Hamiltonian works out to be

$\displaystyle H(t) = \frac{1}{2} g_{ij}(\gamma(t)) \dot \gamma^i(t) \dot \gamma^j(t),$

so we see that the speed of the geodesic is conserved over time.
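Conservation of (10) is easy to observe numerically. The following sketch assumes a pendulum Lagrangian ${L(q,\dot q) = \frac{1}{2}\dot q^2 + \cos q}$ (an illustrative choice, i.e. ${V(q) = -\cos q}$), evaluates ${H}$ directly from (10) with ${L_{\dot q}}$ approximated by a central difference, and integrates the Euler-Lagrange equation ${\ddot q = -\sin q}$ with the classical fourth-order Runge-Kutta method; ${H}$ then stays constant up to the small integration error.

```python
import math


def lagrangian(q, qdot):
    """Illustrative pendulum Lagrangian L = qdot^2/2 - V(q) with V(q) = -cos q."""
    return 0.5 * qdot ** 2 + math.cos(q)


def hamiltonian(q, qdot, h=1e-5):
    """H = qdot * L_qdot - L as in (10), with L_qdot from a central difference."""
    l_qdot = (lagrangian(q, qdot + h) - lagrangian(q, qdot - h)) / (2 * h)
    return qdot * l_qdot - lagrangian(q, qdot)


def integrate(q, v, dt, steps):
    """RK4 for the Euler-Lagrange equation of this L, namely q'' = -sin q."""
    def rhs(q, v):
        return v, -math.sin(q)

    for _ in range(steps):
        k1q, k1v = rhs(q, v)
        k2q, k2v = rhs(q + 0.5 * dt * k1q, v + 0.5 * dt * k1v)
        k3q, k3v = rhs(q + 0.5 * dt * k2q, v + 0.5 * dt * k2v)
        k4q, k4v = rhs(q + dt * k3q, v + dt * k3v)
        q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return q, v


H0 = hamiltonian(1.0, 0.5)
q1, v1 = integrate(1.0, 0.5, dt=0.01, steps=1000)
print(H0, hamiltonian(q1, v1))  # agree up to the RK4 error
```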

For pointwise symmetries (9), ${Y}$ vanishes, and so Noether’s theorem simplifies to ${\hbox{div} X = 0}$; in the one-dimensional case ${\Omega \subset {\bf R}}$, we thus see from (5) that the quantity

$\displaystyle \delta \Phi^i(t) L_{\dot q^i}(t,\Phi_0(t), \dot \Phi_0(t)) \ \ \ \ \ (11)$

is conserved in time. For instance, for the ${N}$-particle system in Example 2, if we have the translation invariance

$\displaystyle V( q_1 + h, \dots, q_N + h ) = V( q_1, \dots, q_N )$

for all ${q_1,\dots,q_N,h \in {\bf R}^3}$, then we have the pointwise translation symmetry

$\displaystyle q_i[s](t) := q_i(t) + s e^j$

for all ${i=1,\dots,N}$, ${s \in{\bf R}}$ and some ${j=1,\dots,3}$, in which case ${\delta q_i(t) = e^j}$, and the conserved quantity (11) becomes

$\displaystyle \sum_{i=1}^N m_i \dot q_i^j(t);$

as ${j=1,\dots,3}$ was arbitrary, this establishes conservation of the total momentum

$\displaystyle \sum_{i=1}^N m_i \dot q_i(t).$

Similarly, if we have the rotation invariance

$\displaystyle V( R q_1, \dots, Rq_N ) = V( q_1, \dots, q_N )$

for any ${q_1,\dots,q_N \in {\bf R}^3}$ and ${R \in SO(3)}$, then we have the pointwise rotation symmetry

$\displaystyle q_i[s](t) := \exp( s A ) q_i(t)$

for any skew-symmetric real ${3 \times 3}$ matrix ${A}$, in which case ${\delta q_i(t) = A q_i(t)}$, and the conserved quantity (11) becomes

$\displaystyle \sum_{i=1}^N m_i \langle A q_i(t), \dot q_i(t) \rangle;$

since ${A}$ is an arbitrary skew-symmetric matrix, this establishes conservation of the total angular momentum

$\displaystyle \sum_{i=1}^N m_i q_i(t) \wedge \dot q_i(t).$
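To see both conservation laws at work, the sketch below integrates Example 2 for three particles in ${{\bf R}^3}$ with the pair potential ${V = \frac{1}{2}\sum_{i<j} |q_i - q_j|^2}$ (an illustrative choice that is invariant under translations and rotations), using the classical fourth-order Runge-Kutta method. The wedge product ${q_i \wedge \dot q_i}$ is represented by the cross product ${q_i \times \dot q_i}$, the usual identification in ${{\bf R}^3}$. Total momentum is a linear invariant and is preserved by the integrator up to rounding; angular momentum is preserved up to a small integration error.

```python
def accelerations(q, m):
    """a_i = -grad_{q_i} V / m_i for the pair potential V = sum_{i<j} |q_i - q_j|^2 / 2."""
    n = len(m)
    acc = []
    for i in range(n):
        force = [0.0, 0.0, 0.0]
        for j in range(n):
            if j != i:
                for c in range(3):
                    force[c] -= q[i][c] - q[j][c]
        acc.append([fc / m[i] for fc in force])
    return acc


def deriv(state, m):
    """state = all positions followed by all velocities, flattened into one list."""
    n = len(m)
    q = [state[3 * i:3 * i + 3] for i in range(n)]
    a = accelerations(q, m)
    return state[3 * n:] + [x for ai in a for x in ai]


def rk4_step(state, m, dt):
    k1 = deriv(state, m)
    k2 = deriv([s + 0.5 * dt * k for s, k in zip(state, k1)], m)
    k3 = deriv([s + 0.5 * dt * k for s, k in zip(state, k2)], m)
    k4 = deriv([s + dt * k for s, k in zip(state, k3)], m)
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def momenta(state, m):
    """Total momentum sum m_i v_i and angular momentum sum m_i q_i x v_i."""
    n = len(m)
    P, L = [0.0] * 3, [0.0] * 3
    for i in range(n):
        q = state[3 * i:3 * i + 3]
        v = state[3 * n + 3 * i:3 * n + 3 * i + 3]
        cross = [q[1] * v[2] - q[2] * v[1],
                 q[2] * v[0] - q[0] * v[2],
                 q[0] * v[1] - q[1] * v[0]]
        for c in range(3):
            P[c] += m[i] * v[c]
            L[c] += m[i] * cross[c]
    return P, L


masses = [1.0, 2.0, 3.0]
state = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.5, 0.5,     # positions q_1, q_2, q_3
         0.1, 0.0, 0.2, 0.0, 0.3, 0.0, -0.2, 0.0, 0.1]    # velocities
P0, L0 = momenta(state, masses)
for _ in range(2000):
    state = rk4_step(state, masses, 1e-3)
P1, L1 = momenta(state, masses)
print(P0, P1)
print(L0, L1)
```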

Below the fold, I will describe how Noether’s theorem can be used to locate all of the conserved quantities for the Euler equations of inviscid fluid flow, discussed in this previous post, by interpreting that flow as geodesic flow in an infinite dimensional manifold.