You are currently browsing the tag archive for the ‘calculus of variations’ tag.
Throughout this post, we will work only at the formal level of analysis, ignoring issues of convergence of integrals, justifying differentiation under the integral sign, and so forth. (Rigorous justification of the conservation laws and other identities arising from the formal manipulations below can usually be established in an a posteriori fashion once the identities are in hand, without the need to rigorously justify the manipulations used to come up with these identities).
It is a remarkable fact in the theory of differential equations that many of the ordinary and partial differential equations that are of interest (particularly in geometric PDE, or PDE arising from mathematical physics) admit a variational formulation; thus, a collection of one or more fields on a domain taking values in a space will solve the differential equation of interest if and only if is a critical point to the functional
involving the fields and their first derivatives , where the Lagrangian is a function on the vector bundle over consisting of triples with , , and a linear transformation; we also usually keep the boundary data of fixed in case has a non-trivial boundary, although we will ignore these issues here. (We also ignore the possibility of having additional constraints imposed on and , which require the machinery of Lagrange multipliers to deal with, but which will only serve as a distraction for the current discussion.) It is common to use local coordinates to parameterise as and as , in which case can be viewed locally as a function on .
Example 1 (Geodesic flow) Take and to be a Riemannian manifold, which we will write locally in coordinates as with metric for . A geodesic is then a critical point (keeping fixed) of the energy functional
or in coordinates (ignoring coordinate patch issues, and using the usual summation conventions)
As discussed in this previous post, both the Euler equations for rigid body motion, and the Euler equations for incompressible inviscid flow, can be interpreted as geodesic flow (though in the latter case, one has to work really formally, as the manifold is now infinite dimensional).
More generally, if is itself a Riemannian manifold, which we write locally in coordinates as with metric for , then a harmonic map is a critical point of the energy functional
or in coordinates (again ignoring coordinate patch issues)
If we replace the Riemannian manifold by a Lorentzian manifold, such as Minkowski space , then the notion of a harmonic map is replaced by that of a wave map, which generalises the scalar wave equation (which corresponds to the case ).
Example 2 (-particle interactions) Take and ; then a function can be interpreted as a collection of trajectories in space, which we give a physical interpretation as the trajectories of particles. If we assign each particle a positive mass , and also introduce a potential energy function , then it turns out that Newton’s laws of motion in this context (with the force on the particle being given by the conservative force ) are equivalent to the trajectories being a critical point of the action functional
Formally, if is a critical point of a functional , this means that
where we parameterise by , and we use subscripts on to denote partial derivatives in the various coefficients. (One can of course work in a coordinate-free manner here if one really wants to, but the notation becomes a little cumbersome due to the need to carefully split up the tangent space of , and we will not do so here.) Thus we can view (2) as an integral identity that asserts the vanishing of a certain integral, whose integrand involves , where vanishes at the boundary but is otherwise unconstrained.
A general rule of thumb in PDE and calculus of variations is that whenever one has an integral identity of the form for some class of functions that vanishes on the boundary, then there must be an associated differential identity that justifies this integral identity through Stokes’ theorem. This rule of thumb helps explain why integration by parts is used so frequently in PDE to justify integral identities. The rule of thumb can fail when one is dealing with “global” or “cohomologically non-trivial” integral identities of a topological nature, such as the Gauss-Bonnet or Kazhdan-Warner identities, but is quite reliable for “local” or “cohomologically trivial” identities, such as those arising from calculus of variations.
In any case, if we apply this rule to (2), we expect that the integrand should be expressible as a spatial divergence. This is indeed the case:
Proposition 1 (Formal) Let be a critical point of the functional defined in (1). Then for any deformation with , we have
The same computation, together with an integration by parts, shows that (2) may be rewritten as
Since is unconstrained on the interior of , the claim (6) follows (at a formal level, at least).
Many variational problems also enjoy one-parameter continuous symmetries: given any field (not necessarily a critical point), one can place that field in a one-parameter family with , such that
for all ; in particular,
which can be written as (2) as before. Applying the previous rule of thumb, we thus expect another divergence identity
whenever arises from a continuous one-parameter symmetry. This expectation is indeed the case in many examples. For instance, if the spatial domain is the Euclidean space , and the Lagrangian (when expressed in coordinates) has no direct dependence on the spatial variable , thus
then we obtain translation symmetries
for , where is the standard basis for . For a fixed , the left-hand side of (7) then becomes
for all , in which case (7) clearly holds with .
Theorem 2 (Noether’s theorem) Suppose that is a critical point of the functional (1), and let be a one-parameter continuous symmetry with . Let be the vector field in (5), and let be the vector field in (7). Then we have the pointwise conservation law
In particular, for one-dimensional variational problems, in which , we have the conservation law for all (assuming of course that is connected and contains ).
Noether’s theorem gives a systematic way to locate conservation laws for solutions to variational problems. For instance, if and the Lagrangian has no explicit time dependence, thus
then by using the time translation symmetry , we have
as discussed previously, whereas we have , and hence by (5)
For instance, for geodesic flow, the Hamiltonian works out to be
so we see that the speed of the geodesic is conserved over time.
is conserved in time. For instance, for the -particle system in Example 2, if we have the translation invariance
for all , then we have the pointwise translation symmetry
for all , and some , in which case , and the conserved quantity (11) becomes
as was arbitrary, this establishes conservation of the total momentum
Similarly, if we have the rotation invariance
for any and , then we have the pointwise rotation symmetry
for any skew-symmetric real matrix , in which case , and the conserved quantity (11) becomes
since is an arbitrary skew-symmetric matrix, this establishes conservation of the total angular momentum
Below the fold, I will describe how Noether’s theorem can be used to locate all of the conserved quantities for the Euler equations of inviscid fluid flow, discussed in this previous post, by interpreting that flow as geodesic flow in an infinite dimensional manifold.
One of the most important topological concepts in analysis is that of compactness (as discussed for instance in my Companion article on this topic). There are various flavours of this concept, but let us focus on sequential compactness: a subset E of a topological space X is sequentially compact if every sequence in E has a convergent subsequence whose limit is also in E. This property allows one to do many things with the set E. For instance, it allows one to maximise a functional on E:
Proposition 1. (Existence of extremisers) Let E be a non-empty sequentially compact subset of a topological space X, and let be a continuous function. Then the supremum is attained at at least one point , thus for all . (In particular, this supremum is finite.) Similarly for the infimum.
Proof. Let be the supremum . By the definition of supremum (and the axiom of (countable) choice), one can find a sequence in E such that . By compactness, we can refine this sequence to a subsequence (which, by abuse of notation, we shall continue to call ) such that converges to a limit x in E. Since we still have , and f is continuous at x, we conclude that f(x)=L, and the claim for the supremum follows. The claim for the infimum is similar.
Remark 1. An inspection of the argument shows that one can relax the continuity hypothesis on F somewhat: to attain the supremum, it suffices that F be upper semicontinuous, and to attain the infimum, it suffices that F be lower semicontinuous.
We thus see that sequential compactness is useful, among other things, for ensuring the existence of extremisers. In finite-dimensional spaces (such as vector spaces), compact sets are plentiful; indeed, the Heine-Borel theorem asserts that every closed and bounded set is compact. However, once one moves to infinite-dimensional spaces, such as function spaces, then the Heine-Borel theorem fails quite dramatically; most of the closed and bounded sets one encounters in a topological vector space are non-compact, if one insists on using a reasonably “strong” topology. This causes a difficulty in (among other things) calculus of variations, which is often concerned to finding extremisers to a functional on a subset E of an infinite-dimensional function space X.
In recent decades, mathematicians have found a number of ways to get around this difficulty. One of them is to weaken the topology to recover compactness, taking advantage of such results as the Banach-Alaoglu theorem (or its sequential counterpart). Of course, there is a tradeoff: weakening the topology makes compactness easier to attain, but makes the continuity of F harder to establish. Nevertheless, if F enjoys enough “smoothing” or “cancellation” properties, one can hope to obtain continuity in the weak topology, allowing one to do things such as locate extremisers. (The phenomenon that cancellation can lead to continuity in the weak topology is sometimes referred to as compensated compactness.)
Another option is to abandon trying to make all sequences have convergent subsequences, and settle just for extremising sequences to have convergent subsequences, as this would still be enough to retain Theorem 1. Pursuing this line of thought leads to the Palais-Smale condition, which is a substitute for compactness in some calculus of variations situations.
But in many situations, one cannot weaken the topology to the point where the domain E becomes compact, without destroying the continuity (or semi-continuity) of F, though one can often at least find an intermediate topology (or metric) in which F is continuous, but for which E is still not quite compact. Thus one can find sequences in E which do not have any subsequences that converge to a constant element , even in this intermediate metric. (As we shall see shortly, one major cause of this failure of compactness is the existence of a non-trivial action of a non-compact group G on E; such a group action can cause compensated compactness or the Palais-Smale condition to fail also.) Because of this, it is a priori conceivable that a continuous function F need not attain its supremum or infimum.
Nevertheless, even though a sequence does not have any subsequences that converge to a constant x, it may have a subsequence (which we also call ) which converges to some non-constant sequence (in the sense that the distance between the subsequence and the new sequence in a this intermediate metric), where the approximating sequence is of a very structured form (e.g. “concentrating” to a point, or “travelling” off to infinity, or a superposition of several concentrating or travelling profiles of this form). This weaker form of compactness, in which superpositions of a certain type of profile completely describe all the failures (or defects) of compactness, is known as concentration compactness, and the decomposition of the subsequence is known as the profile decomposition. In many applications, it is a sufficiently good substitute for compactness that one can still do things like locate extremisers for functionals F – though one often has to make some additional assumptions of F to compensate for the more complicated nature of the compactness. This phenomenon was systematically studied by P.L. Lions in the 80s, and found great application in calculus of variations and nonlinear elliptic PDE. More recently, concentration compactness has been a crucial and powerful tool in the non-perturbative analysis of nonlinear dispersive PDE, in particular being used to locate “minimal energy blowup solutions” or “minimal mass blowup solutions” for such a PDE (analogously to how one can use calculus of variations to find minimal energy solutions to a nonlinear elliptic equation); see for instance this recent survey by Killip and Visan.
In typical applications, the concentration compactness phenomenon is exploited in moderately sophisticated function spaces (such as Sobolev spaces or Strichartz spaces), with the failure of traditional compactness being connected to a moderately complicated group G of symmetries (e.g. the group generated by translations and dilations). Because of this, concentration compactness can appear to be a rather complicated and technical concept when it is first encountered. In this note, I would like to illustrate concentration compactness in a simple toy setting, namely in the space of absolutely summable sequences, with the uniform () metric playing the role of the intermediate metric, and the translation group playing the role of the symmetry group G. This toy setting is significantly simpler than any model that one would actually use in practice [for instance, in most applications X is a Hilbert space], but hopefully it serves to illuminate this useful concept in a less technical fashion.