Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the minor process applied to a large random Hermitian ${N \times N}$ matrix ${A_N}$, in which one takes the successive upper left ${n \times n}$ minors ${A_n}$ of ${A_N}$ and computes their eigenvalues ${\lambda_1(A_n) \leq \dots \leq \lambda_n(A_n)}$ in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities

$\displaystyle \lambda_i(A_{n+1}) \leq \lambda_i(A_n) \leq \lambda_{i+1}(A_{n+1})$

for ${1 \leq i \leq n < N}$, and are often arranged in a triangular array known as a Gelfand-Tsetlin pattern, as discussed in these previous blog posts.

When ${N}$ is large and the matrix ${A_N}$ is a random matrix with empirical spectral distribution converging to some compactly supported probability measure ${\mu}$ on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble ${A_N}$), a “concentration of measure” effect occurs, with the spectral distribution of the minors ${A_n}$ for ${n = \lfloor N/k\rfloor}$ for any fixed ${k \geq 1}$ converging to a specific measure ${k^{-1}_* \mu^{\boxplus k}}$ that depends only on ${\mu}$ and ${k}$. The reason for this notation is that there is a surprising description of this measure ${k^{-1}_* \mu^{\boxplus k}}$ when ${k}$ is a natural number, namely it is the free convolution ${\mu^{\boxplus k}}$ of ${k}$ copies of ${\mu}$, pushed forward by the dilation map ${x \mapsto k^{-1} x}$. For instance, if ${\mu}$ is the Wigner semicircular measure ${d\mu_{sc} = \frac{1}{2\pi} (4-x^2)^{1/2}_+\ dx}$, then ${k^{-1}_* \mu_{sc}^{\boxplus k} = k^{-1/2}_* \mu_{sc}}$. At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).
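This GUE self-similarity is easy to check numerically. The following sketch (the matrix sizes, random seed, and tolerances are arbitrary choices, not from the paper) samples a GUE matrix, extracts its upper left minor with ${k=2}$, and compares low moments of the two rescaled empirical spectral distributions against the semicircular moments.

```python
import numpy as np

# Numerical sanity check: the top-left minor of a GUE matrix is again GUE, so
# after the natural rescaling both spectra should follow the same semicircular
# law on [-2, 2].  We compare the second and fourth moments of the empirical
# spectral distributions (1 and the Catalan number C_2 = 2 for the semicircle).
rng = np.random.default_rng(0)
N, n = 1000, 500                       # full size and minor size (k = 2)

M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (M + M.conj().T) / 2               # GUE normalization: E|A_ij|^2 = 1

eigs_full = np.linalg.eigvalsh(A) / np.sqrt(N)
eigs_minor = np.linalg.eigvalsh(A[:n, :n]) / np.sqrt(n)

for eigs in (eigs_full, eigs_minor):
    assert abs(np.mean(eigs**2) - 1.0) < 0.05   # 2nd semicircular moment
    assert abs(np.mean(eigs**4) - 2.0) < 0.1    # 4th semicircular moment
```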

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power ${\mu^{\boxplus k}}$ of ${\mu}$ can be extended to non-integer ${k \geq 1}$, thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

$\displaystyle G_\mu(z) := \int_{\bf R} \frac{d\mu(x)}{z-x}$

of the measure ${\mu}$, and ${\mu^{\boxplus k}}$ can be defined by solving the Burgers-type equation

$\displaystyle (k \partial_k + z \partial_z) G_{\mu^{\boxplus k}}(z) = \frac{\partial_z G_{\mu^{\boxplus k}}(z)}{G_{\mu^{\boxplus k}}(z)} \ \ \ \ \ (1)$

with initial condition ${G_{\mu^{\boxplus 1}} = G_\mu}$ (see this previous blog post for a derivation). This equation can be solved explicitly using the ${R}$-transform ${R_\mu}$ of ${\mu}$, defined by solving the equation

$\displaystyle \frac{1}{G_\mu(z)} + R_\mu(G_\mu(z)) = z$

for sufficiently large ${z}$, in which case one can show that

$\displaystyle R_{\mu^{\boxplus k}}(z) = k R_\mu(z).$

(In the case of the semicircular measure ${\mu_{sc}}$, the ${R}$-transform is simply the identity: ${R_{\mu_{sc}}(z)=z}$.)
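For the semicircular law all of these transforms are explicit, so the defining relation can be tested numerically. In the sketch below, the closed form ${G_k(z) = (z - \sqrt{z^2-4k})/(2k)}$ for the Cauchy transform of ${\mu_{sc}^{\boxplus k}}$ (a semicircular law of variance ${k}$) is a standard fact rather than something taken from the paper, and the sample points and tolerances are arbitrary.

```python
import numpy as np

# For the semicircular law, R(g) = k*g for mu^{boxplus k}, so the defining
# relation 1/G + R(G) = z becomes 1/G_k(z) + k*G_k(z) = z.  We verify this at
# a few points outside the support, for fractional k as well as integer k.
def G_k(z, k):
    # Cauchy transform of the semicircular law of variance k
    return (z - np.sqrt(z * z - 4 * k)) / (2 * k)

for k in (1.0, 2.5, 7.0):                  # fractional powers are allowed
    for z0 in (3.0 + 0j, 5.0 + 2j, 10.0 + 0j):
        z = z0 + 4 * np.sqrt(k)            # stay safely outside the support
        g = G_k(z, k)
        assert abs(1 / g + k * g - z) < 1e-10
        assert abs(g - 1 / z) < 1 / abs(z)  # G(z) ~ 1/z at infinity
```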

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if ${A}$ is a noncommutative random variable in a noncommutative probability space ${({\mathcal A},\tau)}$ with distribution ${\mu}$, and ${p}$ is a real projection operator free of ${A}$ with trace ${1/k}$, then the “minor” ${[pAp]}$ of ${A}$ (viewed as an element of a new noncommutative probability space ${({\mathcal A}_p, \tau_p)}$ whose elements are minors ${[pXp]}$, ${X \in {\mathcal A}}$ with trace ${\tau_p([pXp]) := k \tau(pXp)}$) has the law of ${k^{-1}_* \mu^{\boxplus k}}$ (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers ${\mu^{\boxplus k}}$ is monotonicity of the free entropy

$\displaystyle \chi(\mu) = \int_{\bf R} \int_{\bf R} \log|s-t|\ d\mu(s) d\mu(t) + \frac{3}{4} + \frac{1}{2} \log 2\pi$

and free Fisher information

$\displaystyle \Phi(\mu) = \frac{4\pi^2}{3} \int_{\bf R} \left(\frac{d\mu}{dx}\right)^3\ dx$

which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of the free Fisher information as presented in Voiculescu’s paper.) Namely, it was shown by Shlyakhtenko that the quantity ${\chi(k^{-1/2}_* \mu^{\boxplus k})}$ is monotone non-decreasing for integer ${k}$, and that the Fisher information ${\Phi(k^{-1/2}_* \mu^{\boxplus k})}$ is monotone non-increasing for integer ${k}$. This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that were established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shlyakhtenko to fractional ${k \geq 1}$. We give two proofs of this fact, one using free probability machinery, and a more self-contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the free score ${J(X)}$ of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

$\displaystyle \frac{d}{d\varepsilon} \tau( Z P( X + \varepsilon Z) )|_{\varepsilon=0} = \tau( J(X) P(X) )$

whenever ${P}$ is a polynomial and ${Z}$ is a semicircular element free of ${X}$. If ${X}$ has an absolutely continuous law ${\mu = f\ dx}$ for a sufficiently regular ${f}$, one can calculate ${J(X)}$ explicitly as ${J(X) = 2\pi Hf(X)}$, where ${Hf}$ is the Hilbert transform of ${f}$, and the Fisher information is given by the formula

$\displaystyle \Phi(X) = \tau( J(X)^2 ).$
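These normalisations can be sanity-checked on the standard semicircular law (a standard computation, included here for concreteness): there ${Hf(x) = x/2\pi}$ on the support, so ${J(X) = X}$ and ${\Phi(\mu_{sc}) = \tau(X^2) = 1}$, which matches the density formula for ${\Phi}$ with normalization constant ${\frac{4\pi^2}{3}}$ since ${\int f^3 = \frac{3}{4\pi^2}}$. A short numerical sketch (the grid size and tolerances are arbitrary):

```python
import numpy as np

# Check the normalisations on the standard semicircular density
# f(x) = sqrt(4 - x^2)/(2 pi): there J(X) = X, so Phi = tau(X^2) = 1, which
# should agree with the density formula Phi = (4 pi^2 / 3) * integral of f^3.
x = np.linspace(-2.0, 2.0, 400001)
f = np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)

def trapz(y):
    # simple trapezoidal rule on the grid x
    return float(np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0]))

tau_X2 = trapz(x**2 * f)                        # tau(X^2) = variance = 1
phi_density = (4 * np.pi**2 / 3) * trapz(f**3)  # density formula for Phi

assert abs(tau_X2 - 1.0) < 1e-4
assert abs(phi_density - 1.0) < 1e-4
```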

One can also define a notion of relative free score ${J(X:B)}$ relative to some subalgebra ${B}$ of noncommutative random variables.

The free score interacts very well with the free minor process ${X \mapsto [pXp]}$, in particular by standard calculations one can establish the identity

$\displaystyle J( [pXp] : [pBp] ) = k {\bf E}( [p J(X:B) p] | [pXp], [pBp] )$

whenever ${X}$ is a noncommutative random variable, ${B}$ is an algebra of noncommutative random variables, and ${p}$ is a real projection of trace ${1/k}$ that is free of both ${X}$ and ${B}$. The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on ${L^2}$). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when ${\mu}$ is a semicircular measure up to affine rescaling.

After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in ${k}$ and using the differential equation (1). It turns out that if ${d\mu = f\ dx}$ for sufficiently regular ${f}$, then there is an identity

$\displaystyle \partial_k \Phi( k^{-1/2}_* \mu^{\boxplus k} ) = -\frac{1}{2\pi^2} \lim_{\varepsilon \rightarrow 0} \sum_{\alpha,\beta = \pm} \int_{\bf R} \int_{\bf R} f(x) f(y) K(x+i\alpha \varepsilon, y+i\beta \varepsilon)\ dx dy \ \ \ \ \ (2)$

where ${K}$ is the kernel

$\displaystyle K(z,w) := \frac{1}{G(z) G(w)} \left(\frac{G(z)-G(w)}{z-w} + G(z) G(w)\right)^2$

and ${G(z) := G_\mu(z)}$. It is not difficult to show that ${K(z,\overline{w})}$ is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel ${K}$ and the identity (2).
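One can probe this positive semi-definiteness numerically. The sketch below (the sample points and tolerances are arbitrary choices) takes ${G}$ to be the closed-form Cauchy transform of the standard semicircular law, builds the Gram matrix ${K(z_i, \overline{z_j})}$ for a few points ${z_i}$ in the upper half-plane, and checks that it is Hermitian with (numerically) nonnegative spectrum.

```python
import numpy as np

# Numerical probe of the positive semi-definiteness of K(z, conj(w)): sample
# points z_i in the upper half-plane and check that the Gram matrix
# M_ij = K(z_i, conj(z_j)) is Hermitian with nonnegative eigenvalues.
def G(z):
    # Cauchy transform of the standard semicircular law; this product of
    # principal square roots selects the branch with G(z) ~ 1/z at infinity.
    return (z - np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

def K(z, w):
    return ((G(z) - G(w)) / (z - w) + G(z) * G(w)) ** 2 / (G(z) * G(w))

zs = np.linspace(-2.5, 2.5, 8) + 0.3j          # arbitrary points, Im z > 0
M = np.array([[K(zi, np.conj(zj)) for zj in zs] for zi in zs])

assert np.allclose(M, M.conj().T, atol=1e-8)   # Hermitian Gram matrix
assert np.linalg.eigvalsh(M).min() > -1e-8     # numerically PSD
```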

These monotonicity properties hint at the minor process ${A \mapsto [pAp]}$ being associated to some sort of “gradient flow” in the ${k}$ parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space ${({\mathcal A}_p, \tau_p)}$ even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity ${\lambda(s,y)}$ on the “Gelfand-Tsetlin pyramid”

$\displaystyle \Delta = \{ (s,y): 0 < s < 1; 0 < y < s \}$

by the formula

$\displaystyle \mu^{\boxplus 1/s}((-\infty,\lambda(s,y)/s])=y/s,$

which is well defined if the density of ${\mu}$ is sufficiently well behaved. The random matrix interpretation of ${\lambda(s,y)}$ is that it is the asymptotic location of the ${\lfloor yN\rfloor^{th}}$ eigenvalue of the ${\lfloor sN \rfloor \times \lfloor sN \rfloor}$ upper left minor of a random ${N \times N}$ matrix ${A_N}$ with asymptotic empirical spectral distribution ${\mu}$ and with unitarily invariant distribution, thus ${\lambda}$ is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become

$\displaystyle 0 \leq \partial_s \lambda \leq \partial_y \lambda.$

After a lengthy calculation (involving extensive use of the chain rule and product rule), the equation (1) is equivalent to the Euler-Lagrange equation

$\displaystyle \partial_s L_{\lambda_s}(\partial_s \lambda, \partial_y \lambda) + \partial_y L_{\lambda_y}(\partial_s \lambda, \partial_y \lambda) = 0$

where ${L}$ is the Lagrangian density

$\displaystyle L(\lambda_s, \lambda_y) := \log \lambda_y + \log \sin( \pi \frac{\lambda_s}{\lambda_y} ).$

Thus the minor process is formally a critical point of the integral ${\int_\Delta L(\partial_s \lambda, \partial_y \lambda)\ ds dy}$. The quantity ${\partial_y \lambda}$ measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio ${\frac{\partial_s \lambda}{\partial_y \lambda}}$ measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density ${L}$ somehow measures the entropy density of this process.

These lecture notes are a continuation of the 254A lecture notes from the previous quarter.
We consider the Euler equations for incompressible fluid flow on a Euclidean space ${{\bf R}^d}$; we will label ${{\bf R}^d}$ as the “Eulerian space” ${{\bf R}^d_E}$ (or “Euclidean space”, or “physical space”) to distinguish it from the “Lagrangian space” ${{\bf R}^d_L}$ (or “labels space”) that we will introduce shortly (but the reader is free to also ignore the ${E}$ or ${L}$ subscripts if he or she wishes). Elements of Eulerian space ${{\bf R}^d_E}$ will be referred to by symbols such as ${x}$; we use ${dx}$ to denote Lebesgue measure on ${{\bf R}^d_E}$, we will use ${x^1,\dots,x^d}$ for the ${d}$ coordinates of ${x}$, and we use indices such as ${i,j,k}$ to index these coordinates (with the usual summation conventions); for instance ${\partial_i}$ denotes partial differentiation along the ${x^i}$ coordinate. (We use superscripts for coordinates ${x^i}$ instead of subscripts ${x_i}$ to be compatible with some differential geometry notation that we will use shortly; in particular, when using the summation notation, we will now be matching subscripts with superscripts for the pair of indices being summed.)
In Eulerian coordinates, the Euler equations read

$\displaystyle \partial_t u + u \cdot \nabla u = - \nabla p \ \ \ \ \ (1)$

$\displaystyle \nabla \cdot u = 0$

where ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$ is the velocity field and ${p: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$ is the pressure field. These are functions of the time variable ${t \in [0,T)}$ and of the spatial location variable ${x \in {\bf R}^d_E}$. We will refer to the coordinates ${(t,x) = (t,x^1,\dots,x^d)}$ as Eulerian coordinates. However, if one reviews the physical derivation of the Euler equations from 254A Notes 0, before one takes the continuum limit, the fundamental unknowns were not the velocity field ${u}$ or the pressure field ${p}$, but rather the trajectories ${(x^{(a)}(t))_{a \in A}}$, which can be thought of as a single function ${x: [0,T) \times A \rightarrow {\bf R}^d_E}$ from the coordinates ${(t,a)}$ (where ${t}$ is a time and ${a}$ is an element of the label set ${A}$) to ${{\bf R}^d}$. The relationship between the trajectories ${x^{(a)}(t) = x(t,a)}$ and the velocity field was given by the informal relationship

$\displaystyle \partial_t x(t,a) \approx u( t, x(t,a) ). \ \ \ \ \ (2)$

We will refer to the coordinates ${(t,a)}$ as (discrete) Lagrangian coordinates for describing the fluid.
In view of this, it is natural to ask whether there is an alternate way to formulate the continuum limit of incompressible inviscid fluids, by using a continuous version ${(t,a)}$ of the Lagrangian coordinates, rather than Eulerian coordinates. This is indeed the case. Suppose for instance one has a smooth solution ${u, p}$ to the Euler equations on a spacetime slab ${[0,T) \times {\bf R}^d_E}$ in Eulerian coordinates; assume furthermore that the velocity field ${u}$ is uniformly bounded. We introduce another copy ${{\bf R}^d_L}$ of ${{\bf R}^d}$, which we call Lagrangian space or labels space; we use symbols such as ${a}$ to refer to elements of this space, ${da}$ to denote Lebesgue measure on ${{\bf R}^d_L}$, and ${a^1,\dots,a^d}$ to refer to the ${d}$ coordinates of ${a}$. We use indices such as ${\alpha,\beta,\gamma}$ to index these coordinates, thus for instance ${\partial_\alpha}$ denotes partial differentiation along the ${a^\alpha}$ coordinate. We will use summation conventions for both the Eulerian coordinates ${i,j,k}$ and the Lagrangian coordinates ${\alpha,\beta,\gamma}$, with an index being summed if it appears as both a subscript and a superscript in the same term. While ${{\bf R}^d_L}$ and ${{\bf R}^d_E}$ are of course isomorphic, we will try to refrain from identifying them, except perhaps at the initial time ${t=0}$ in order to fix the initialisation of Lagrangian coordinates.
Given a smooth and bounded velocity field ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$, define a trajectory map for this velocity to be any smooth map ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ that obeys the ODE

$\displaystyle \partial_t X(t,a) = u( t, X(t,a) ); \ \ \ \ \ (3)$

in view of (2), this describes the trajectory (in ${{\bf R}^d_E}$) of a particle labeled by an element ${a}$ of ${{\bf R}^d_L}$. From the Picard existence theorem and the hypothesis that ${u}$ is smooth and bounded, such a map exists and is unique as long as one specifies the initial location ${X(0,a)}$ assigned to each label ${a}$. Traditionally, one chooses the initial condition

$\displaystyle X(0,a) = a \ \ \ \ \ (4)$

for ${a \in {\bf R}^d_L}$, so that we label each particle by its initial location at time ${t=0}$; we are also free to specify other initial conditions for the trajectory map if we please. Indeed, we have the freedom to “permute” the labels ${a \in {\bf R}^d_L}$ by an arbitrary diffeomorphism: if ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is a trajectory map, and ${\pi: {\bf R}^d_L \rightarrow{\bf R}^d_L}$ is any diffeomorphism (a smooth map whose inverse exists and is also smooth), then the map ${X \circ \pi: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is also a trajectory map, albeit one with different initial conditions ${X(0,a)}$.
Despite the popularity of the initial condition (4), we will try to keep conceptually separate the Eulerian space ${{\bf R}^d_E}$ from the Lagrangian space ${{\bf R}^d_L}$, as they play different physical roles in the interpretation of the fluid; for instance, while the Euclidean metric ${d\eta^2 = dx^1 dx^1 + \dots + dx^d dx^d}$ is an important feature of Eulerian space ${{\bf R}^d_E}$, it is not a geometrically natural structure to use in Lagrangian space ${{\bf R}^d_L}$. We have the following more general version of Exercise 8 from 254A Notes 2:

Exercise 1 Let ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$ be smooth and bounded.

• If ${X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is a smooth map, show that there exists a unique smooth trajectory map ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ with initial condition ${X(0,a) = X_0(a)}$ for all ${a \in {\bf R}^d_L}$.
• Show that if ${X_0}$ is a diffeomorphism and ${t \in [0,T)}$, then the map ${X(t): a \mapsto X(t,a)}$ is also a diffeomorphism.

Remark 2 The first of the Euler equations (1) can now be written in the form

$\displaystyle \frac{d^2}{dt^2} X(t,a) = - (\nabla p)( t, X(t,a) ) \ \ \ \ \ (5)$

which can be viewed as a continuum limit of Newton’s second law ${m^{(a)} \frac{d^2}{dt^2} x^{(a)}(t) = F^{(a)}(t)}$.

Call a diffeomorphism ${Y: {\bf R}^d_L \rightarrow {\bf R}^d_E}$ (oriented) volume preserving if one has the equation

$\displaystyle \mathrm{det}( \nabla Y )(a) = 1 \ \ \ \ \ (6)$

for all ${a \in {\bf R}^d_L}$, where the total differential ${\nabla Y}$ is the ${d \times d}$ matrix with entries ${\partial_\alpha Y^i}$ for ${\alpha = 1,\dots,d}$ and ${i=1,\dots,d}$, where ${Y^1,\dots,Y^d:{\bf R}^d_L \rightarrow {\bf R}}$ are the components of ${Y}$. (If one wishes, one can also view ${\nabla Y}$ as a linear transformation from the tangent space ${T_a {\bf R}^d_L}$ of Lagrangian space at ${a}$ to the tangent space ${T_{Y(a)} {\bf R}^d_E}$ of Eulerian space at ${Y(a)}$.) Equivalently, ${Y}$ is orientation preserving and one has a Jacobian-free change of variables formula

$\displaystyle \int_{{\bf R}^d_L} f( Y(a) )\ da = \int_{{\bf R}^d_E} f(x)\ dx$

for all ${f \in C_c({\bf R}^d_E \rightarrow {\bf R})}$, which is in turn equivalent to ${Y(E) \subset {\bf R}^d_E}$ having the same Lebesgue measure as ${E}$ for any measurable set ${E \subset {\bf R}^d_L}$.
The divergence-free condition ${\nabla \cdot u = 0}$ then can be nicely expressed in terms of volume-preserving properties of the trajectory maps ${X}$, in a manner which confirms the interpretation of this condition as an incompressibility condition on the fluid:

Lemma 3 Let ${u: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_E}$ be smooth and bounded, let ${X_0: {\bf R}^d_L \rightarrow {\bf R}^d_E}$ be a volume-preserving diffeomorphism, and let ${X: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}^d_E}$ be the trajectory map. Then the following are equivalent:

• ${\nabla \cdot u = 0}$ on ${[0,T) \times {\bf R}^d_E}$.
• ${X(t): {\bf R}^d_L \rightarrow {\bf R}^d_E}$ is volume-preserving for all ${t \in [0,T)}$.

Proof: Since ${X_0}$ is orientation-preserving, we see from continuity that ${X(t)}$ is also orientation-preserving for all ${t \in [0,T)}$. Suppose first that ${X(t)}$ is volume-preserving for all ${t}$; then for any ${f \in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})}$ we have the conservation law

$\displaystyle \int_{{\bf R}^d_L} f( X(t,a) )\ da = \int_{{\bf R}^d_E} f(x)\ dx$

for all ${t \in [0,T)}$. Differentiating in time using the chain rule and (3) we conclude that

$\displaystyle \int_{{\bf R}^d_L} (u(t) \cdot \nabla f)( X(t,a)) \ da = 0$

for all ${t \in [0,T)}$, and hence by change of variables

$\displaystyle \int_{{\bf R}^d_E} (u(t) \cdot \nabla f)(x) \ dx = 0$

which by integration by parts gives

$\displaystyle \int_{{\bf R}^d_E} (\nabla \cdot u(t,x)) f(x)\ dx = 0$

for all ${f \in C^\infty_c({\bf R}^d_E \rightarrow {\bf R})}$ and ${t \in [0,T)}$, so ${u}$ is divergence-free.
To prove the converse implication, it is convenient to introduce the labels map ${A:[0,T) \times {\bf R}^d_E \rightarrow {\bf R}^d_L}$, defined by setting ${A(t): {\bf R}^d_E \rightarrow {\bf R}^d_L}$ to be the inverse of the diffeomorphism ${X(t): {\bf R}^d_L \rightarrow {\bf R}^d_E}$, thus

$\displaystyle A(t, X(t,a)) = a$

for all ${(t,a) \in [0,T) \times {\bf R}^d_L}$. By the implicit function theorem, ${A}$ is smooth, and by differentiating the above equation in time using (3) we see that

$\displaystyle D_t A(t,x) = 0$

where ${D_t}$ is the usual material derivative

$\displaystyle D_t := \partial_t + u \cdot \nabla \ \ \ \ \ (7)$

acting on functions on ${[0,T) \times {\bf R}^d_E}$. If ${u}$ is divergence-free, we have from integration by parts that

$\displaystyle \partial_t \int_{{\bf R}^d_E} \phi(t,x)\ dx = \int_{{\bf R}^d_E} D_t \phi(t,x)\ dx$

for any test function ${\phi: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$. In particular, for any ${g \in C^\infty_c({\bf R}^d_L \rightarrow {\bf R})}$, we can calculate

$\displaystyle \partial_t \int_{{\bf R}^d_E} g( A(t,x) )\ dx = \int_{{\bf R}^d_E} D_t (g(A(t,x)))\ dx$

$\displaystyle = \int_{{\bf R}^d_E} 0\ dx$

and hence

$\displaystyle \int_{{\bf R}^d_E} g(A(t,x))\ dx = \int_{{\bf R}^d_E} g(A(0,x))\ dx$

for any ${t \in [0,T)}$. Since ${X_0}$ is volume-preserving, so is ${A(0)}$, thus

$\displaystyle \int_{{\bf R}^d_E} g \circ A(t)\ dx = \int_{{\bf R}^d_L} g\ da.$

Thus ${A(t)}$ is volume-preserving, and hence ${X(t)}$ is also. $\Box$
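Lemma 3 can also be illustrated numerically. In the sketch below (the vector field, step sizes, and tolerances are arbitrary choices) we take the divergence-free rotation field ${u(x,y) = (-y,x)}$, integrate the trajectory ODE (3) with the initial condition (4), and check that the Jacobian of ${a \mapsto X(t,a)}$, estimated by finite differences, has determinant ${1}$.

```python
import numpy as np

# Volume preservation for a divergence-free field: integrate dX/dt = u(X)
# from X(0, a) = a with RK4, then estimate the Jacobian of a -> X(t, a) by
# central finite differences and check that its determinant stays equal to 1.
def u(x):
    return np.array([-x[1], x[0]])        # divergence-free rotation field

def flow(a, T, dt=1e-3):
    x = np.array(a, dtype=float)
    for _ in range(int(round(T / dt))):   # classical RK4 time stepping
        k1 = u(x); k2 = u(x + dt/2*k1); k3 = u(x + dt/2*k2); k4 = u(x + dt*k3)
        x = x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)
    return x

a0, h, T = np.array([1.0, 0.5]), 1e-5, 2.0
dXda = np.column_stack([
    (flow(a0 + h*e, T) - flow(a0 - h*e, T)) / (2*h)
    for e in (np.array([1.0, 0.0]), np.array([0.0, 1.0]))
])
assert abs(np.linalg.det(dXda) - 1.0) < 1e-6   # volume preserved
```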

Exercise 4 Let ${M: [0,T) \rightarrow \mathrm{GL}_d({\bf R})}$ be a continuously differentiable map from the time interval ${[0,T)}$ to the general linear group ${\mathrm{GL}_d({\bf R})}$ of invertible ${d \times d}$ matrices. Establish Jacobi’s formula

$\displaystyle \partial_t \det(M(t)) = \det(M(t)) \mathrm{tr}( M(t)^{-1} \partial_t M(t) )$

and use this and (6) to give an alternate proof of Lemma 3 that does not involve any integration in space.
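Jacobi's formula itself is easy to test by finite differences; in the sketch below the matrix family ${M(t) = A + tB}$ is an arbitrary smooth choice that is invertible near the test point.

```python
import numpy as np

# Finite-difference check of Jacobi's formula:
#     d/dt det M(t) = det M(t) * tr( M(t)^{-1} M'(t) ),
# for the smooth family M(t) = A + t B, for which M'(t) = B.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
B = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 0.3],
              [0.0,  2.0, -0.7]])

def M(t):
    return A + t * B

t0, h = 0.1, 1e-6
lhs = (np.linalg.det(M(t0 + h)) - np.linalg.det(M(t0 - h))) / (2 * h)
rhs = np.linalg.det(M(t0)) * np.trace(np.linalg.inv(M(t0)) @ B)
assert abs(lhs - rhs) < 1e-6
```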

Remark 5 One can view the use of Lagrangian coordinates as an extension of the method of characteristics. Indeed, from the chain rule we see that for any smooth function ${f: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$ of Eulerian spacetime, one has

$\displaystyle \frac{d}{dt} f(t,X(t,a)) = (D_t f)(t,X(t,a))$

and hence any transport equation that in Eulerian coordinates takes the form

$\displaystyle D_t f = g$

for smooth functions ${f,g: [0,T) \times {\bf R}^d_E \rightarrow {\bf R}}$ of Eulerian spacetime is equivalent to the ODE

$\displaystyle \frac{d}{dt} F = G$

where ${F,G: [0,T) \times {\bf R}^d_L \rightarrow {\bf R}}$ are the smooth functions of Lagrangian spacetime defined by

$\displaystyle F(t,a) := f(t,X(t,a)); \quad G(t,a) := g(t,X(t,a)).$

In this set of notes we recall some basic differential geometry notation, particularly with regards to pullbacks and Lie derivatives of differential forms and other tensor fields on manifolds such as ${{\bf R}^d_E}$ and ${{\bf R}^d_L}$, and explore how the Euler equations look in this notation. Our discussion will be entirely formal in nature; we will assume that all functions have enough smoothness and decay at infinity to justify the relevant calculations. (It is possible to work rigorously in Lagrangian coordinates – see for instance the work of Ebin and Marsden – but we will not do so here.) As a general rule, Lagrangian coordinates tend to be somewhat less convenient to use than Eulerian coordinates for establishing the basic analytic properties of the Euler equations, such as local existence, uniqueness, and continuous dependence on the data; however, they are quite good at clarifying the more algebraic properties of these equations, such as conservation laws and the variational nature of the equations. It may well be that in the future we will be able to use the Lagrangian formalism more effectively on the analytic side of the subject also.

Remark 6 One can also write the Navier-Stokes equations in Lagrangian coordinates, but the equations are not expressed in a favourable form in these coordinates, as the Laplacian ${\Delta}$ appearing in the viscosity term becomes replaced with a time-varying Laplace-Beltrami operator. As such, we will not discuss the Lagrangian coordinate formulation of Navier-Stokes here.

Throughout this post, we will work only at the formal level of analysis, ignoring issues of convergence of integrals, justifying differentiation under the integral sign, and so forth. (Rigorous justification of the conservation laws and other identities arising from the formal manipulations below can usually be established in an a posteriori fashion once the identities are in hand, without the need to rigorously justify the manipulations used to come up with these identities).

It is a remarkable fact in the theory of differential equations that many of the ordinary and partial differential equations that are of interest (particularly in geometric PDE, or PDE arising from mathematical physics) admit a variational formulation; thus, a collection ${\Phi: \Omega \rightarrow M}$ of one or more fields on a domain ${\Omega}$ taking values in a space ${M}$ will solve the differential equation of interest if and only if ${\Phi}$ is a critical point to the functional

$\displaystyle J[\Phi] := \int_\Omega L( x, \Phi(x), D\Phi(x) )\ dx \ \ \ \ \ (1)$

involving the fields ${\Phi}$ and their first derivatives ${D\Phi}$, where the Lagrangian ${L: \Sigma \rightarrow {\bf R}}$ is a function on the vector bundle ${\Sigma}$ over ${\Omega \times M}$ consisting of triples ${(x, q, \dot q)}$ with ${x \in \Omega}$, ${q \in M}$, and ${\dot q: T_x \Omega \rightarrow T_q M}$ a linear transformation; we also usually keep the boundary data of ${\Phi}$ fixed in case ${\Omega}$ has a non-trivial boundary, although we will ignore these issues here. (We also ignore the possibility of having additional constraints imposed on ${\Phi}$ and ${D\Phi}$, which require the machinery of Lagrange multipliers to deal with, but which will only serve as a distraction for the current discussion.) It is common to use local coordinates to parameterise ${\Omega}$ as ${{\bf R}^d}$ and ${M}$ as ${{\bf R}^n}$, in which case the Lagrangian ${L}$ can be viewed locally as a function on ${{\bf R}^d \times {\bf R}^n \times {\bf R}^{dn}}$.

Example 1 (Geodesic flow) Take ${\Omega = [0,1]}$ and ${M = (M,g)}$ to be a Riemannian manifold, which we will write locally in coordinates as ${{\bf R}^n}$ with metric ${g_{ij}(q)}$ for ${i,j=1,\dots,n}$. A geodesic ${\gamma: [0,1] \rightarrow M}$ is then a critical point (keeping ${\gamma(0),\gamma(1)}$ fixed) of the energy functional

$\displaystyle J[\gamma] := \frac{1}{2} \int_0^1 g_{\gamma(t)}( D\gamma(t), D\gamma(t) )\ dt$

or in coordinates (ignoring coordinate patch issues, and using the usual summation conventions)

$\displaystyle J[\gamma] = \frac{1}{2} \int_0^1 g_{ij}(\gamma(t)) \dot \gamma^i(t) \dot \gamma^j(t)\ dt.$
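For concreteness, applying the Euler-Lagrange formalism to this Lagrangian recovers the geodesic equation; this is a standard formal computation, sketched as follows.

```latex
% Apply the Euler-Lagrange equation to
%   L(q, \dot q) = \tfrac12 g_{ij}(q) \dot q^i \dot q^j
% along q = \gamma(t):
\begin{aligned}
L_{q^k} &= \tfrac12 (\partial_k g_{ij})\, \dot q^i \dot q^j,
\qquad
L_{\dot q^k} = g_{kj}\, \dot q^j, \\
0 &= \frac{d}{dt} L_{\dot q^k} - L_{q^k}
   = g_{kj}\, \ddot q^j
     + \left( \partial_i g_{kj} - \tfrac12 \partial_k g_{ij} \right)
       \dot q^i \dot q^j.
\end{aligned}
```

Contracting with the inverse metric ${g^{mk}}$ and symmetrising the coefficient of ${\dot q^i \dot q^j}$ in ${i,j}$ gives the geodesic equation ${\ddot \gamma^m + \Gamma^m_{ij} \dot \gamma^i \dot \gamma^j = 0}$, where ${\Gamma^m_{ij} = \frac{1}{2} g^{mk}( \partial_i g_{kj} + \partial_j g_{ki} - \partial_k g_{ij} )}$ are the usual Christoffel symbols.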

As discussed in this previous post, both the Euler equations for rigid body motion, and the Euler equations for incompressible inviscid flow, can be interpreted as geodesic flow (though in the latter case, one has to work really formally, as the manifold ${M}$ is now infinite dimensional).

More generally, if ${\Omega = (\Omega,h)}$ is itself a Riemannian manifold, which we write locally in coordinates as ${{\bf R}^d}$ with metric ${h_{ab}(x)}$ for ${a,b=1,\dots,d}$, then a harmonic map ${\Phi: \Omega \rightarrow M}$ is a critical point of the energy functional

$\displaystyle J[\Phi] := \frac{1}{2} \int_\Omega (h^*(x) \otimes g_{\Phi(x)})( D\Phi(x), D\Phi(x) )\ d\mathrm{vol}_h(x)$

or in coordinates (again ignoring coordinate patch issues)

$\displaystyle J[\Phi] = \frac{1}{2} \int_{{\bf R}^d} h^{ab}(x) g_{ij}(\Phi(x)) (\partial_a \Phi^i(x)) (\partial_b \Phi^j(x))\ \sqrt{\det(h(x))}\ dx.$

If we replace the Riemannian manifold ${\Omega}$ by a Lorentzian manifold, such as Minkowski space ${{\bf R}^{1+3}}$, then the notion of a harmonic map is replaced by that of a wave map, which generalises the scalar wave equation (which corresponds to the case ${M={\bf R}}$).

Example 2 (${N}$-particle interactions) Take ${\Omega = {\bf R}}$ and ${M = ({\bf R}^3)^N}$; then a function ${\Phi: \Omega \rightarrow M}$ can be interpreted as a collection of ${N}$ trajectories ${q_1,\dots,q_N: {\bf R} \rightarrow {\bf R}^3}$ in space, which we give a physical interpretation as the trajectories of ${N}$ particles. If we assign each particle a positive mass ${m_1,\dots,m_N > 0}$, and also introduce a potential energy function ${V: M \rightarrow {\bf R}}$, then it turns out that Newton’s laws of motion ${F=ma}$ in this context (with the force ${F_i}$ on the ${i^{th}}$ particle being given by the conservative force ${-\nabla_{q_i} V}$) are equivalent to the trajectories ${q_1,\dots,q_N}$ being a critical point of the action functional

$\displaystyle J[\Phi] := \int_{\bf R} \sum_{i=1}^N \frac{1}{2} m_i |\dot q_i(t)|^2 - V( q_1(t),\dots,q_N(t) )\ dt.$
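This equivalence can be verified numerically in a toy case. The sketch below (with ${N=1}$ particle in one dimension, unit mass, and the arbitrary choice of potential ${V(q) = q^2/2}$, i.e. a harmonic oscillator) checks that a discretisation of the action is stationary at the exact Newtonian trajectory, but not at a non-solution path with the same endpoints.

```python
import numpy as np

# One particle, one dimension, m = 1, V(q) = q^2/2.  The exact solution
# q(t) = cos(t) of Newton's law q'' = -q should make the discretised action
# stationary against perturbations vanishing at the endpoints; a straight-line
# path with the same endpoints should not.
t = np.linspace(0.0, np.pi, 2001)
h = t[1] - t[0]

def action(q):
    dq = np.diff(q) / h                  # velocity on midpoints
    qm = (q[1:] + q[:-1]) / 2            # position on midpoints
    return np.sum(h * (0.5 * dq**2 - 0.5 * qm**2))

eta = np.sin(2 * t)                      # perturbation, zero at both endpoints
eps = 1e-6

def dS(q):                               # directional derivative of the action
    return (action(q + eps * eta) - action(q - eps * eta)) / (2 * eps)

q_exact = np.cos(t)                      # solves Newton's law
q_line = 1.0 - 2.0 * t / np.pi           # same endpoints, not a solution

assert abs(dS(q_exact)) < 1e-3           # stationary at the true trajectory
assert abs(dS(q_line)) > 0.5             # not stationary elsewhere
```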

Formally, if ${\Phi = \Phi_0}$ is a critical point of a functional ${J[\Phi]}$, this means that

$\displaystyle \frac{d}{ds} J[ \Phi[s] ]|_{s=0} = 0$

whenever ${s \mapsto \Phi[s]}$ is a (smooth) deformation with ${\Phi[0]=\Phi_0}$ (and with ${\Phi[s]}$ respecting whatever boundary conditions are appropriate). Interchanging the derivative and integral, we (formally, at least) arrive at

$\displaystyle \int_\Omega \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0}\ dx = 0. \ \ \ \ \ (2)$

Write ${\delta \Phi := \frac{d}{ds} \Phi[s]|_{s=0}}$ for the infinitesimal deformation of ${\Phi_0}$. By the chain rule, ${\frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0}}$ can be expressed in terms of ${x, \Phi_0(x), \delta \Phi(x), D\Phi_0(x), D \delta \Phi(x)}$. In coordinates, we have

$\displaystyle \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0} = \delta \Phi^i(x) L_{q^i}(x,\Phi_0(x), D\Phi_0(x)) \ \ \ \ \ (3)$

$\displaystyle + \partial_{x^a} \delta \Phi^i(x) L_{\partial_{x^a} q^i} (x,\Phi_0(x), D\Phi_0(x)),$

where we parameterise ${\Sigma}$ by ${x, (q^i)_{i=1,\dots,n}, (\partial_{x^a} q^i)_{a=1,\dots,d; i=1,\dots,n}}$, and we use subscripts on ${L}$ to denote partial derivatives in the various coefficients. (One can of course work in a coordinate-free manner here if one really wants to, but the notation becomes a little cumbersome due to the need to carefully split up the tangent space of ${\Sigma}$, and we will not do so here.) Thus we can view (2) as an integral identity that asserts the vanishing of a certain integral, whose integrand involves ${x, \Phi_0(x), \delta \Phi(x), D\Phi_0(x), D \delta \Phi(x)}$, where ${\delta \Phi}$ vanishes at the boundary but is otherwise unconstrained.

A general rule of thumb in PDE and calculus of variations is that whenever one has an integral identity of the form ${\int_\Omega F(x)\ dx = 0}$ for some class of functions ${F}$ that vanishes on the boundary, then there must be an associated differential identity ${F = \hbox{div} X}$ that justifies this integral identity through Stokes’ theorem. This rule of thumb helps explain why integration by parts is used so frequently in PDE to justify integral identities. The rule of thumb can fail when one is dealing with “global” or “cohomologically non-trivial” integral identities of a topological nature, such as the Gauss-Bonnet or Kazhdan-Warner identities, but is quite reliable for “local” or “cohomologically trivial” identities, such as those arising from calculus of variations.

In any case, if we apply this rule to (2), we expect that the integrand ${\frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0}}$ should be expressible as a spatial divergence. This is indeed the case:

Proposition 1 (Formal) Let ${\Phi = \Phi_0}$ be a critical point of the functional ${J[\Phi]}$ defined in (1). Then for any deformation ${s \mapsto \Phi[s]}$ with ${\Phi[0] = \Phi_0}$, we have

$\displaystyle \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0} = \hbox{div} X \ \ \ \ \ (4)$

where ${X}$ is the vector field that is expressible in coordinates as

$\displaystyle X^a := \delta \Phi^i(x) L_{\partial_{x^a} q^i}(x,\Phi_0(x), D\Phi_0(x)). \ \ \ \ \ (5)$

Proof: Comparing (4) with (3), we see that the claim is equivalent to the Euler-Lagrange equation

$\displaystyle L_{q^i}(x,\Phi_0(x), D\Phi_0(x)) - \partial_{x^a} L_{\partial_{x^a} q^i}(x,\Phi_0(x), D\Phi_0(x)) = 0. \ \ \ \ \ (6)$

The same computation, together with an integration by parts, shows that (2) may be rewritten as

$\displaystyle \int_\Omega ( L_{q^i}(x,\Phi_0(x), D\Phi_0(x)) - \partial_{x^a} L_{\partial_{x^a} q^i}(x,\Phi_0(x), D\Phi_0(x)) ) \delta \Phi^i(x)\ dx = 0.$

Since ${\delta \Phi^i(x)}$ is unconstrained on the interior of ${\Omega}$, the claim (6) follows (at a formal level, at least). $\Box$
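At a concrete level, the Euler-Lagrange equation (6) can be verified symbolically for simple Lagrangians. The following sketch (my own illustration, using the sympy library and the harmonic oscillator Lagrangian ${L = \frac{1}{2} m \dot q^2 - \frac{1}{2} k q^2}$, neither of which appears in the text) recovers the expected equation of motion ${m \ddot q + k q = 0}$:

```python
import sympy as sp

# Symbolic check (my own illustration, not from the text) of the
# Euler-Lagrange equation (6) for the one-dimensional harmonic
# oscillator Lagrangian L(q, qdot) = (1/2) m qdot^2 - (1/2) k q^2.
t, m, k = sp.symbols('t m k', positive=True)
q = sp.Function('q')(t)
qdot = sp.diff(q, t)

L = sp.Rational(1, 2) * m * qdot**2 - sp.Rational(1, 2) * k * q**2

# The left-hand side of (6): L_q - d/dt L_{qdot}, which vanishes
# precisely along solutions of m q'' + k q = 0.
EL = sp.diff(L, q) - sp.diff(sp.diff(L, qdot), t)
print(sp.simplify(EL))  # equivalent to -(m q'' + k q)
```

Setting this expression to zero gives ${m \ddot q + k q = 0}$, in agreement with (6).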

Many variational problems also enjoy one-parameter continuous symmetries: given any field ${\Phi_0}$ (not necessarily a critical point), one can place that field in a one-parameter family ${s \mapsto \Phi[s]}$ with ${\Phi[0] = \Phi_0}$, such that

$\displaystyle J[ \Phi[s] ] = J[ \Phi[0] ]$

for all ${s}$; in particular,

$\displaystyle \frac{d}{ds} J[ \Phi[s] ]|_{s=0} = 0,$

which can be written as (2) as before. Applying the previous rule of thumb, we thus expect another divergence identity

$\displaystyle \frac{d}{ds} L( x, \Phi[s](x), D\Phi[s](x) )|_{s=0} = \hbox{div} Y \ \ \ \ \ (7)$

whenever ${s \mapsto \Phi[s]}$ arises from a continuous one-parameter symmetry. This expectation is indeed borne out in many examples. For instance, if the spatial domain ${\Omega}$ is the Euclidean space ${{\bf R}^d}$, and the Lagrangian (when expressed in coordinates) has no direct dependence on the spatial variable ${x}$, thus

$\displaystyle L( x, \Phi(x), D\Phi(x) ) = L( \Phi(x), D\Phi(x) ), \ \ \ \ \ (8)$

then we obtain ${d}$ translation symmetries

$\displaystyle \Phi[s](x) := \Phi(x - s e^a )$

for ${a=1,\dots,d}$, where ${e^1,\dots,e^d}$ is the standard basis for ${{\bf R}^d}$. For a fixed ${a}$, the left-hand side of (7) then becomes

$\displaystyle \frac{d}{ds} L( \Phi(x-se^a), D\Phi(x-se^a) )|_{s=0} = -\partial_{x^a} [ L( \Phi(x), D\Phi(x) ) ]$

$\displaystyle = \hbox{div} Y$

where ${Y(x) = - L(\Phi(x), D\Phi(x)) e^a}$. Another common type of symmetry is a pointwise symmetry, in which

$\displaystyle L( x, \Phi[s](x), D\Phi[s](x) ) = L( x, \Phi[0](x), D\Phi[0](x) ) \ \ \ \ \ (9)$

for all ${x}$, in which case (7) clearly holds with ${Y=0}$.

If we subtract (4) from (7), we obtain the celebrated theorem of Noether linking symmetries with conservation laws:

Theorem 2 (Noether’s theorem) Suppose that ${\Phi_0}$ is a critical point of the functional (1), and let ${\Phi[s]}$ be a one-parameter continuous symmetry with ${\Phi[0] = \Phi_0}$. Let ${X}$ be the vector field in (5), and let ${Y}$ be the vector field in (7). Then we have the pointwise conservation law

$\displaystyle \hbox{div}(X-Y) = 0.$

In particular, for one-dimensional variational problems, in which ${\Omega \subset {\bf R}}$, we have the conservation law ${(X-Y)(t) = (X-Y)(0)}$ for all ${t \in \Omega}$ (assuming of course that ${\Omega}$ is connected and contains ${0}$).

Noether’s theorem gives a systematic way to locate conservation laws for solutions to variational problems. For instance, if ${\Omega \subset {\bf R}}$ and the Lagrangian has no explicit time dependence, thus

$\displaystyle L(t, \Phi(t), \dot \Phi(t)) = L(\Phi(t), \dot \Phi(t)),$

then by using the time translation symmetry ${\Phi[s](t) := \Phi(t-s)}$, we have

$\displaystyle Y(t) = - L( \Phi(t), \dot\Phi(t) )$

as discussed previously, whereas we have ${\delta \Phi(t) = - \dot \Phi(t)}$, and hence by (5)

$\displaystyle X(t) := - \dot \Phi^i(t) L_{\dot q^i}(\Phi(t), \dot \Phi(t)),$

and so Noether’s theorem gives conservation of the Hamiltonian

$\displaystyle H(t) := \dot \Phi^i(t) L_{\dot q^i}(\Phi(t), \dot \Phi(t)) - L(\Phi(t), \dot \Phi(t)). \ \ \ \ \ (10)$

For instance, for geodesic flow, the Hamiltonian works out to be

$\displaystyle H(t) = \frac{1}{2} g_{ij}(\gamma(t)) \dot \gamma^i(t) \dot \gamma^j(t),$

so we see that the speed of the geodesic is conserved over time.
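This conservation of speed can be checked numerically. The sketch below (my own illustration; the round metric on the unit sphere and the classical RK4 integrator are my choices, not taken from the text) integrates the geodesic equations for the metric ${g = \hbox{diag}(1, \sin^2 \theta)}$ and verifies that the Hamiltonian ${H = \frac{1}{2}(\dot \theta^2 + \sin^2 \theta\, \dot \phi^2)}$ stays constant up to discretization error:

```python
import math

# Numerical sketch (my own, not from the text) of conservation of the
# Hamiltonian (10) along geodesic flow.  On the unit sphere with metric
# g = diag(1, sin^2(theta)), the geodesic equations read
#   theta'' = sin(theta) cos(theta) phi'^2
#   phi''   = -2 (cos(theta) / sin(theta)) theta' phi'
# and H = (1/2)(theta'^2 + sin^2(theta) phi'^2) should be constant.

def deriv(state):
    th, ph, vth, vph = state
    return (vth,
            vph,
            math.sin(th) * math.cos(th) * vph**2,
            -2.0 * (math.cos(th) / math.sin(th)) * vth * vph)

def rk4_step(state, h):
    # one classical fourth-order Runge-Kutta step
    k1 = deriv(state)
    k2 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = deriv(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = deriv(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def hamiltonian(state):
    th, _, vth, vph = state
    return 0.5 * (vth**2 + math.sin(th)**2 * vph**2)

state = (1.0, 0.0, 0.2, 0.7)   # arbitrary initial position and velocity
H0 = hamiltonian(state)
for _ in range(10000):
    state = rk4_step(state, 1e-3)
drift = abs(hamiltonian(state) - H0)
print(drift)  # far below H0: discretization error only
```

(The chosen initial data keeps ${\theta}$ safely away from the coordinate singularities at the poles.)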

For pointwise symmetries (9), ${Y}$ vanishes, and so Noether’s theorem simplifies to ${\hbox{div} X = 0}$; in the one-dimensional case ${\Omega \subset {\bf R}}$, we thus see from (5) that the quantity

$\displaystyle \delta \Phi^i(t) L_{\dot q^i}(t,\Phi_0(t), \dot \Phi_0(t)) \ \ \ \ \ (11)$

is conserved in time. For instance, for the ${N}$-particle system in Example 2, if we have the translation invariance

$\displaystyle V( q_1 + h, \dots, q_N + h ) = V( q_1, \dots, q_N )$

for all ${q_1,\dots,q_N,h \in {\bf R}^3}$, then we have the pointwise translation symmetry

$\displaystyle q_i[s](t) := q_i(t) + s e^j$

for all ${i=1,\dots,N}$, ${s \in{\bf R}}$ and some ${j=1,\dots,3}$, in which case ${\delta q_i(t) = e^j}$, and the conserved quantity (11) becomes

$\displaystyle \sum_{i=1}^N m_i \dot q_i^j(t);$

as ${j=1,\dots,3}$ was arbitrary, this establishes conservation of the total momentum

$\displaystyle \sum_{i=1}^N m_i \dot q_i(t).$

Similarly, if we have the rotation invariance

$\displaystyle V( R q_1, \dots, Rq_N ) = V( q_1, \dots, q_N )$

for any ${q_1,\dots,q_N \in {\bf R}^3}$ and ${R \in SO(3)}$, then we have the pointwise rotation symmetry

$\displaystyle q_i[s](t) := \exp( s A ) q_i(t)$

for any skew-symmetric real ${3 \times 3}$ matrix ${A}$, in which case ${\delta q_i(t) = A q_i(t)}$, and the conserved quantity (11) becomes

$\displaystyle \sum_{i=1}^N m_i \langle A q_i(t), \dot q_i(t) \rangle;$

since ${A}$ is an arbitrary skew-symmetric matrix, this establishes conservation of the total angular momentum

$\displaystyle \sum_{i=1}^N m_i q_i(t) \wedge \dot q_i(t).$
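Both of these conservation laws can be illustrated numerically. The sketch below (my own; the two-particle spring potential ${V(q_1,q_2) = \frac{1}{2}|q_1-q_2|^2}$ and the RK4 integrator are illustrative choices, not from the text) integrates Newton's equations ${m_i \ddot q_i = -\nabla_{q_i} V}$ and checks that the total momentum and total angular momentum drift only at the level of discretization and rounding error:

```python
import numpy as np

# Numerical sketch (my own, not from the text): for a 2-particle system
# with the translation- and rotation-invariant potential
# V(q_1, q_2) = (1/2)|q_1 - q_2|^2, the total momentum sum_i m_i q_i'
# and total angular momentum sum_i m_i q_i x q_i' are conserved.

m = np.array([1.0, 2.0])                      # masses (arbitrary)
q = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])               # initial positions
v = np.array([[0.0, 0.3, 0.1],
              [0.0, -0.2, 0.05]])             # initial velocities

def accel(q):
    d = q[0] - q[1]                           # force depends only on q_1 - q_2
    return np.array([-d / m[0], d / m[1]])

def rk4_step(q, v, h):
    # one classical fourth-order Runge-Kutta step for (q, v)
    k1q, k1v = v, accel(q)
    k2q, k2v = v + 0.5 * h * k1v, accel(q + 0.5 * h * k1q)
    k3q, k3v = v + 0.5 * h * k2v, accel(q + 0.5 * h * k2q)
    k4q, k4v = v + h * k3v, accel(q + h * k3q)
    return (q + h / 6 * (k1q + 2 * k2q + 2 * k3q + k4q),
            v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v))

def momentum(v):
    return (m[:, None] * v).sum(axis=0)

def angular_momentum(q, v):
    return (m[:, None] * np.cross(q, v)).sum(axis=0)

P0, L0 = momentum(v), angular_momentum(q, v)
for _ in range(5000):
    q, v = rk4_step(q, v, 1e-3)
P_drift = np.abs(momentum(v) - P0).max()
L_drift = np.abs(angular_momentum(q, v) - L0).max()
print(P_drift, L_drift)  # both negligibly small
```

(Runge-Kutta methods preserve linear first integrals such as the total momentum essentially exactly; the angular momentum, a quadratic invariant, is conserved only up to a tiny discretization error here.)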

Below the fold, I will describe how Noether’s theorem can be used to locate all of the conserved quantities for the Euler equations of inviscid fluid flow, discussed in this previous post, by interpreting that flow as geodesic flow in an infinite dimensional manifold.

One of the most important topological concepts in analysis is that of compactness (as discussed for instance in my Companion article on this topic).  There are various flavours of this concept, but let us focus on sequential compactness: a subset E of a topological space X is sequentially compact if every sequence in E has a convergent subsequence whose limit is also in E.  This property allows one to do many things with the set E.  For instance, it allows one to maximise a functional on E:

Proposition 1. (Existence of extremisers)  Let E be a non-empty sequentially compact subset of a topological space X, and let $F: E \to {\Bbb R}$ be a continuous function.  Then the supremum $\sup_{x \in E} F(x)$ is attained at at least one point $x_* \in E$, thus $F(x) \leq F(x_*)$ for all $x \in E$.  (In particular, this supremum is finite.)  Similarly for the infimum.

Proof. Let $-\infty < L \leq +\infty$ be the supremum $L := \sup_{x \in E} F(x)$.  By the definition of supremum (and the axiom of (countable) choice), one can find a sequence $x^{(n)}$ in E such that $F(x^{(n)}) \to L$.  By compactness, we can refine this sequence to a subsequence (which, by abuse of notation, we shall continue to call $x^{(n)}$) such that $x^{(n)}$ converges to a limit x in E.  Since we still have $F(x^{(n)}) \to L$, and F is continuous at x, we conclude that F(x)=L, and the claim for the supremum follows.  The claim for the infimum is similar.  $\Box$

Remark 1. An inspection of the argument shows that one can relax the continuity hypothesis on F somewhat: to attain the supremum, it suffices that F be upper semicontinuous, and to attain the infimum, it suffices that F be lower semicontinuous. $\diamond$

We thus see that sequential compactness is useful, among other things, for ensuring the existence of extremisers.  In finite-dimensional spaces (such as Euclidean spaces ${\Bbb R}^n$), compact sets are plentiful; indeed, the Heine-Borel theorem asserts that every closed and bounded set is compact.  However, once one moves to infinite-dimensional spaces, such as function spaces, then the Heine-Borel theorem fails quite dramatically; most of the closed and bounded sets one encounters in a topological vector space are non-compact, if one insists on using a reasonably “strong” topology.  This causes a difficulty in (among other things) calculus of variations, which is often concerned with finding extremisers to a functional $F: E \to {\Bbb R}$ on a subset E of an infinite-dimensional function space X.

In recent decades, mathematicians have found a number of ways to get around this difficulty.  One of them is to weaken the topology to recover compactness, taking advantage of such results as the Banach-Alaoglu theorem (or its sequential counterpart).  Of course, there is a tradeoff: weakening the topology makes compactness easier to attain, but makes the continuity of F harder to establish.  Nevertheless, if F enjoys enough “smoothing” or “cancellation” properties, one can hope to obtain continuity in the weak topology, allowing one to do things such as locate extremisers.  (The phenomenon that cancellation can lead to continuity in the weak topology is sometimes referred to as compensated compactness.)

Another option is to abandon trying to make all sequences have convergent subsequences, and settle just for extremising sequences to have convergent subsequences, as this would still be enough to retain Proposition 1.  Pursuing this line of thought leads to the Palais-Smale condition, which is a substitute for compactness in some calculus of variations situations.

But in many situations, one cannot weaken the topology to the point where the domain E becomes compact, without destroying the continuity (or semi-continuity) of F, though one can often at least find an intermediate topology (or metric) in which F is continuous, but for which E is still not quite compact.  Thus one can find sequences $x^{(n)}$ in E which do not have any subsequences that converge to a constant element $x \in E$, even in this intermediate metric.  (As we shall see shortly, one major cause of this failure of compactness is the existence of a non-trivial action of a non-compact group G on E; such a group action can cause compensated compactness or the Palais-Smale condition to fail also.)  Because of this, it is a priori conceivable that a continuous function F need not attain its supremum or infimum.

Nevertheless, even though a sequence $x^{(n)}$ does not have any subsequences that converge to a constant x, it may have a subsequence (which we also call $x^{(n)}$) which converges to some non-constant sequence $y^{(n)}$ (in the sense that the distance $d(x^{(n)},y^{(n)})$ between the two sequences converges to zero in this intermediate metric), where the approximating sequence $y^{(n)}$ is of a very structured form (e.g. “concentrating” to a point, or “travelling” off to infinity, or a superposition $y^{(n)} = \sum_j y^{(n)}_j$ of several concentrating or travelling profiles of this form).  This weaker form of compactness, in which superpositions of a certain type of profile completely describe all the failures (or defects) of compactness, is known as concentration compactness, and the decomposition $x^{(n)} \approx \sum_j y^{(n)}_j$ of the subsequence is known as the profile decomposition.  In many applications, it is a sufficiently good substitute for compactness that one can still do things like locate extremisers for functionals F –  though one often has to make some additional assumptions on F to compensate for the more complicated nature of the compactness.  This phenomenon was systematically studied by P.L. Lions in the 80s, and found great application in calculus of variations and nonlinear elliptic PDE.  More recently, concentration compactness has been a crucial and powerful tool in the non-perturbative analysis of nonlinear dispersive PDE, in particular being used to locate “minimal energy blowup solutions” or “minimal mass blowup solutions” for such a PDE (analogously to how one can use calculus of variations to find minimal energy solutions to a nonlinear elliptic equation); see for instance this recent survey by Killip and Visan.

In typical applications, the concentration compactness phenomenon is exploited in moderately sophisticated function spaces (such as Sobolev spaces or Strichartz spaces), with the failure of traditional compactness being connected to a moderately complicated group G of symmetries (e.g. the group generated by translations and dilations).  Because of this, concentration compactness can appear to be a rather complicated and technical concept when it is first encountered.  In this note, I would like to illustrate concentration compactness in a simple toy setting, namely in the space $X = l^1({\Bbb Z})$ of absolutely summable sequences, with the uniform ($l^\infty$) metric playing the role of the intermediate metric, and the translation group ${\Bbb Z}$ playing the role of the symmetry group G.  This toy setting is significantly simpler than any model that one would actually use in practice [for instance, in most applications X is a Hilbert space], but hopefully it serves to illuminate this useful concept in a less technical fashion.
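To preview how this toy model plays out, here is a small sketch (my own; the sequence $x^{(n)} = e_0 + e_n$ is a standard example of this type, not one discussed above) of a sequence in $l^1({\Bbb Z})$ with no uniformly convergent subsequence, which is nevertheless exactly a superposition of a fixed profile and a profile travelling off to infinity:

```python
# Toy illustration (my own) of the setting described above: sequences in
# l^1(Z), represented as dicts of their nonzero entries, with the
# l^infinity metric as the intermediate metric and translations as the
# symmetry group.  The sequence x^(n) = e_0 + e_n (one fixed unit bump,
# one travelling to infinity) converges to e_0 pointwise but not
# uniformly, yet equals exactly the superposition of the profile e_0
# and a translate of that profile.

def e(k):
    """The k-th standard basis element of l^1(Z)."""
    return {k: 1.0}

def add(x, y):
    z = dict(x)
    for k, v in y.items():
        z[k] = z.get(k, 0.0) + v
    return z

def translate(x, h):
    return {k + h: v for k, v in x.items()}

def linf_dist(x, y):
    keys = set(x) | set(y)
    return max(abs(x.get(k, 0.0) - y.get(k, 0.0)) for k in keys)

x = [add(e(0), e(n)) for n in range(1, 50)]

# No uniform convergence to the pointwise limit e_0: the travelling
# bump keeps the l^infinity distance pinned at 1.
d_to_limit = [linf_dist(xn, e(0)) for xn in x]

# Profile decomposition: x^(n) = e_0 + (e_0 translated by n), exactly.
d_to_profiles = [linf_dist(x[n - 1], add(e(0), translate(e(0), n)))
                 for n in range(1, 50)]
print(d_to_limit[-1], d_to_profiles[-1])  # 1.0 0.0
```

(The same computation shows that no subsequence can converge uniformly to any fixed element of $l^1({\Bbb Z})$, since an absolutely summable limit must have entries tending to zero, while the travelling bump has height 1.)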