One of the most important topological concepts in analysis is that of compactness (as discussed for instance in my Companion article on this topic).  There are various flavours of this concept, but let us focus on sequential compactness: a subset E of a topological space X is sequentially compact if every sequence in E has a convergent subsequence whose limit is also in E.  This property allows one to do many things with the set E.  For instance, it allows one to maximise a functional on E:

Proposition 1. (Existence of extremisers)  Let E be a non-empty sequentially compact subset of a topological space X, and let F: E \to {\Bbb R} be a continuous function.  Then the supremum \sup_{x \in E} F(x) is attained at at least one point x_* \in E, thus F(x) \leq F(x_*) for all x \in E.  (In particular, this supremum is finite.)  Similarly for the infimum.

Proof. Let -\infty < L \leq +\infty be the supremum L := \sup_{x \in E} F(x).  By the definition of supremum (and the axiom of (countable) choice), one can find a sequence x^{(n)} in E such that F(x^{(n)}) \to L.  By sequential compactness, we can refine this sequence to a subsequence (which, by abuse of notation, we shall continue to call x^{(n)}) such that x^{(n)} converges to a limit x in E.  Since we still have F(x^{(n)}) \to L, and F is continuous at x, we conclude that F(x)=L, and the claim for the supremum follows.  The claim for the infimum is similar.  \Box

Remark 1. An inspection of the argument shows that one can relax the continuity hypothesis on F somewhat: to attain the supremum, it suffices that F be upper semicontinuous, and to attain the infimum, it suffices that F be lower semicontinuous. \diamond

We thus see that sequential compactness is useful, among other things, for ensuring the existence of extremisers.  In finite-dimensional vector spaces, compact sets are plentiful; indeed, the Heine-Borel theorem asserts that every closed and bounded set is compact.  However, once one moves to infinite-dimensional spaces, such as function spaces, then the Heine-Borel theorem fails quite dramatically; most of the closed and bounded sets one encounters in a topological vector space are non-compact, if one insists on using a reasonably “strong” topology.  This causes a difficulty in (among other things) calculus of variations, which is often concerned with finding extremisers to a functional F: E \to {\Bbb R} on a subset E of an infinite-dimensional function space X.

In recent decades, mathematicians have found a number of ways to get around this difficulty.  One of them is to weaken the topology to recover compactness, taking advantage of such results as the Banach-Alaoglu theorem (or its sequential counterpart).  Of course, there is a tradeoff: weakening the topology makes compactness easier to attain, but makes the continuity of F harder to establish.  Nevertheless, if F enjoys enough “smoothing” or “cancellation” properties, one can hope to obtain continuity in the weak topology, allowing one to do things such as locate extremisers.  (The phenomenon that cancellation can lead to continuity in the weak topology is sometimes referred to as compensated compactness.)

Another option is to abandon trying to make all sequences have convergent subsequences, and settle just for extremising sequences to have convergent subsequences, as this would still be enough to retain Proposition 1.  Pursuing this line of thought leads to the Palais-Smale condition, which is a substitute for compactness in some calculus of variations situations.

But in many situations, one cannot weaken the topology to the point where the domain E becomes compact, without destroying the continuity (or semi-continuity) of F, though one can often at least find an intermediate topology (or metric) in which F is continuous, but for which E is still not quite compact.  Thus one can find sequences x^{(n)} in E which do not have any subsequences that converge to a constant element x \in E, even in this intermediate metric.  (As we shall see shortly, one major cause of this failure of compactness is the existence of a non-trivial action of a non-compact group G on E; such a group action can cause compensated compactness or the Palais-Smale condition to fail also.)  Because of this, it is a priori conceivable that a continuous function F need not attain its supremum or infimum.

Nevertheless, even though a sequence x^{(n)} does not have any subsequences that converge to a constant x, it may have a subsequence (which we also call x^{(n)}) which converges to some non-constant sequence y^{(n)} (in the sense that the distance d(x^{(n)},y^{(n)}) between the subsequence and the new sequence in this intermediate metric goes to zero), where the approximating sequence y^{(n)} is of a very structured form (e.g. “concentrating” to a point, or “travelling” off to infinity, or a superposition y^{(n)} = \sum_j y^{(n)}_j of several concentrating or travelling profiles of this form).  This weaker form of compactness, in which superpositions of a certain type of profile completely describe all the failures (or defects) of compactness, is known as concentration compactness, and the decomposition x^{(n)} \approx \sum_j y^{(n)}_j of the subsequence is known as the profile decomposition.  In many applications, it is a sufficiently good substitute for compactness that one can still do things like locate extremisers for functionals F, though one often has to make some additional assumptions on F to compensate for the more complicated nature of the compactness.  This phenomenon was systematically studied by P.L. Lions in the 1980s, and found great application in calculus of variations and nonlinear elliptic PDE.  More recently, concentration compactness has been a crucial and powerful tool in the non-perturbative analysis of nonlinear dispersive PDE, in particular being used to locate “minimal energy blowup solutions” or “minimal mass blowup solutions” for such a PDE (analogously to how one can use calculus of variations to find minimal energy solutions to a nonlinear elliptic equation); see for instance this recent survey by Killip and Visan.

In typical applications, the concentration compactness phenomenon is exploited in moderately sophisticated function spaces (such as Sobolev spaces or Strichartz spaces), with the failure of traditional compactness being connected to a moderately complicated group G of symmetries (e.g. the group generated by translations and dilations).  Because of this, concentration compactness can appear to be a rather complicated and technical concept when it is first encountered.  In this note, I would like to illustrate concentration compactness in a simple toy setting, namely in the space X = l^1({\Bbb Z}) of absolutely summable sequences, with the uniform (l^\infty) metric playing the role of the intermediate metric, and the translation group {\Bbb Z} playing the role of the symmetry group G.  This toy setting is significantly simpler than any model that one would actually use in practice [for instance, in most applications X is a Hilbert space], but hopefully it serves to illuminate this useful concept in a less technical fashion.

— Defects of compactness in l^1({\Bbb Z}) —

Consider the space

X := l^1({\Bbb Z}) := \{ (x_m)_{m \in {\Bbb Z}}: \sum_{m \in {\Bbb Z}} |x_m| < \infty \}

of absolutely summable doubly infinite sequences x = (x_m)_{m \in {\Bbb Z}}; this is a normed vector space generated by the basis vectors e_n := (\delta_{n, m})_{m \in {\Bbb Z}} for n \in {\Bbb Z} (here \delta is the Kronecker delta).  We can place several topologies on this space X:

Definition 1. Let x^{(n)} = (x^{(n)}_m)_{m \in {\Bbb Z}} be a sequence in X (i.e. a sequence of sequences!), and let x = (x_m)_{m \in {\Bbb Z}} be another element in X.

  1. (Strong topology) We say that x^{(n)} converges to x in the strong topology (or l^1 topology) if the l^1 distance \|x^{(n)}-x\|_{l^1({\Bbb Z})} := \sum_{m \in {\Bbb Z}} |x^{(n)}_m - x_m| converges to zero.
  2. (Intermediate topology)  We say that x^{(n)} converges to x in the intermediate topology (or uniform topology) if the l^\infty distance \|x^{(n)}-x\|_{l^\infty({\Bbb Z})} := \sup_{m \in {\Bbb Z}} |x^{(n)}_m - x_m| converges to zero.
  3. (Weak topology) We say that x^{(n)} converges to x in the weak topology (or pointwise topology) if x^{(n)}_m \to x_m as n \to \infty for each m. [Strictly speaking, this only describes the weak topology for bounded sequences, but these are the only sequences we will be considering here.]

Example 1. The sequence e_n for n=1,2,\ldots converges weakly to zero, but is not convergent in the strong or intermediate topologies.  The sequence \frac{1}{n} \sum_{n'=1}^n e_{n'} converges in the intermediate and weak topologies to zero, but is not convergent in the strong topology. \diamond
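The two sequences in Example 1 can be checked numerically.  The following is a minimal sketch, assuming one stores a finitely supported element of X as a Python dictionary from indices to exact rational coefficients; the helper names (basis, average_of_basis, and so on) are made up for this illustration.  Since the candidate limit is 0 in both cases, the distance to the limit is just the norm.

```python
from fractions import Fraction

def basis(n):
    """The basis vector e_n, stored as {index: coefficient}."""
    return {n: Fraction(1)}

def average_of_basis(n):
    """The average (1/n) * (e_1 + ... + e_n)."""
    return {m: Fraction(1, n) for m in range(1, n + 1)}

def l1_norm(x):
    """Strong (l^1) norm of a finitely supported sequence."""
    return sum(abs(v) for v in x.values())

def linf_norm(x):
    """Intermediate (l^infty) norm of a finitely supported sequence."""
    return max((abs(v) for v in x.values()), default=Fraction(0))

def coefficient(x, m):
    """The m-th coefficient x_m (zero off the support)."""
    return x.get(m, Fraction(0))

for n in (1, 10, 100, 1000):
    e, a = basis(n), average_of_basis(n)
    print(
        f"n={n}: e_n has l1={l1_norm(e)}, linf={linf_norm(e)}, "
        f"coefficient at m=5 is {coefficient(e, 5)}; "
        f"average has l1={l1_norm(a)}, linf={linf_norm(a)}"
    )
```

For e_n, both norms stay equal to 1 while each fixed coefficient is eventually zero (weak convergence only); for the averages, the l^\infty norm is 1/n while the l^1 norm stays equal to 1.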

It is easy to see that strong convergence implies intermediate convergence, which in turn implies weak convergence, thus justifying the names “strong”, “intermediate”, and “weak”.  For bounded sequences, the intermediate topology can also be described by a number of other norms, e.g. the l^p({\Bbb Z}) norm for any p>1 (this is an easy application of Hölder’s inequality).

The space X also has the translation action of the group of integers G := {\Bbb Z}, given by the shift operators T^h: X \to X for h \in G, defined by the formula

T^h (x_m)_{m \in {\Bbb Z}} := (x_{m-h})_{m \in {\Bbb Z}}

(in particular, T^h is linear with T^h e_n = e_{n+h}).  This action is continuous with respect to all three of the above topologies.   (We give G the discrete topology.)
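As a sanity check on the sign convention, here is a small sketch of the shift operator on finitely supported sequences, again assuming a dictionary representation (the function names are illustrative only).  Because the coefficient of T^h x at m is x_{m-h}, the mass sitting at index k moves to index k+h.

```python
from fractions import Fraction

def shift(x, h):
    """T^h x: the coefficient at m is x_{m-h}, so index k is moved to k + h."""
    return {k + h: v for k, v in x.items()}

def basis(n):
    """The basis vector e_n."""
    return {n: Fraction(1)}

def l1_norm(x):
    return sum(abs(v) for v in x.values())

def linf_norm(x):
    return max((abs(v) for v in x.values()), default=Fraction(0))

# An arbitrary finitely supported test element (chosen for illustration).
x = {0: Fraction(1, 2), 1: Fraction(-1, 4), 5: Fraction(1, 4)}

print("T^4 e_3 == e_7:", shift(basis(3), 4) == basis(7))
print("T^2 T^3 x == T^5 x:", shift(shift(x, 3), 2) == shift(x, 5))
print("T^-5 T^5 x == x:", shift(shift(x, 5), -5) == x)
print("l1 norm preserved:", l1_norm(shift(x, 9)) == l1_norm(x))
print("linf norm preserved:", linf_norm(shift(x, 9)) == linf_norm(x))
```

The last two checks reflect the fact that each T^h is an isometry for both the l^1 and l^\infty norms.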

Inside the infinite-dimensional space X, we let E be the “unit sphere” (though it looks more like an octahedron, actually)

E := \{ (x_m)_{m \in {\Bbb Z}} \in X: \sum_{m \in {\Bbb Z}} |x_m| = 1 \}.

E is clearly invariant under the translation action of G.  It is easy to see that E is closed and bounded in the strong topology (or metric).  However, it is not closed in the weak topology: the sequence e_n \in E of basis vectors for n=1,2,\ldots converges weakly to the origin 0, which lies outside of E.  It is also not closed in the intermediate topology; the sequence \frac{1}{n} \sum_{n'=1}^n e_{n'} lies in E but converges in the intermediate topology to 0, which lies outside of E.

The failure of closure in the weak topology causes failure of compactness in the strong or intermediate topologies.  Indeed, the sequence e_n \in E cannot have any convergent subsequence in those topologies, since the limit of such a subsequence would have to equal its weak limit, which is zero; but e_n clearly does not converge in either the strong or intermediate topologies to 0.   (To put it another way, the embedding of l^1({\Bbb Z}) into l^\infty({\Bbb Z}) is not compact.)

More generally, for any fixed profile x \in E, the “travelling wave” (or “travelling profile”) T^n x \in E for n = 1, 2, \ldots converges weakly to zero, and so by the above argument has no convergent subsequence in the strong or intermediate topologies.  A little more generally still, given any sequence h^{(n)} of integers going off to infinity, T^{h^{(n)}} x \in E is a sequence in E which has no convergent subsequence in the strong or intermediate topologies.  Thus we see that the action of the (non-compact) group G is causing a failure of compactness of E in the strong and intermediate topologies.

Because of the linear nature of the vector space X, one can also create examples of sequences in E with no convergent subsequences by taking superpositions of travelling profiles.  For instance, if x_1, x_2 \in X are two non-negative sequences with \|x_1\|_{l^1({\Bbb Z})} + \|x_2\|_{l^1({\Bbb Z})} = 1, and h^{(n)}_1, h^{(n)}_2 are two sequences of integers which both go off to infinity,

|h^{(n)}_1|, |h^{(n)}_2| \to \infty

then the superposition

x^{(n)} := T^{h^{(n)}_1} x_1 + T^{h^{(n)}_2} x_2

of the two travelling profiles T^{h^{(n)}_1} x_1 and T^{h^{(n)}_2} x_2 will be a sequence in E that continues to converge weakly to zero, and so again has no convergent subsequence in the strong or intermediate topologies.

If x_1 and x_2 are not non-negative, then there can be cancellations between T^{h^{(n)}_1} x_1 and T^{h^{(n)}_2} x_2, which could cause x^{(n)} to have norm significantly less than 1 (thus straying away from E).  However, if one also imposes the asymptotic orthogonality condition

|h^{(n)}_2 - h^{(n)}_1| \to \infty

we see that these cancellations vanish in the limit n \to \infty, and so in this case we can build a modified superposition

x^{(n)} := T^{h^{(n)}_1} x_1 + T^{h^{(n)}_2} x_2 + w^{(n)}

that lies in E, with w^{(n)} converging to zero in the strong and uniform topology, and will once again be a sequence with no convergent subsequence.  [If the asymptotic orthogonality condition fails, then one can collapse the superposition of two travelling profiles into a single travelling profile, after passing to a subsequence if necessary.  Indeed, if |h^{(n)}_2 - h^{(n)}_1| does not go to infinity, then we can find a subsequence for which h^{(n)}_2 - h^{(n)}_1 is equal to a constant c, in which case T^{h^{(n)}_1} x_1 + T^{h^{(n)}_2} x_2 is equal to a single travelling profile T^{h^{(n)}_1}( x_1 + T^c x_2 ).]  More generally, given any collection x_j of non-zero elements of X with

\sum_j \|x_j\|_{l^1({\Bbb Z})} \leq 1 (1)

and any sequences h^{(n)}_j of integers obeying the asymptotic orthogonality condition

|h^{(n)}_{j'} - h^{(n)}_j| \to \infty \hbox{ as } n \to \infty (2)

for all j' > j, we can find a sequence x^{(n)} in E that takes the form

x^{(n)} = \sum_j T^{h^{(n)}_j} x_j + w^{(n)}  (3)

where w^{(n)} converges to zero in the intermediate topology.  [If one has equality in (1), one can make w^{(n)} converge in the strong topology also.]  If h^{(n)}_j goes off to infinity for at least one j with x_j non-zero, then this sequence will have no convergent subsequence.
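The role of the asymptotic orthogonality condition in the two-profile case can be seen in a small computation.  The sketch below uses two hypothetical signed profiles x_1, x_2 of total mass 1 (chosen only for illustration) and compares shifts that stay a bounded distance apart with shifts that separate.  For finitely supported profiles the cancellation disappears exactly once the supports are disjoint, so no correction w^{(n)} is needed in this toy case.

```python
from fractions import Fraction

def shift(x, h):
    """T^h x for a finitely supported sequence stored as {index: coefficient}."""
    return {k + h: v for k, v in x.items()}

def add(x, y):
    """Coefficientwise sum of two finitely supported sequences."""
    out = dict(x)
    for m, v in y.items():
        out[m] = out.get(m, Fraction(0)) + v
    return out

def l1_norm(x):
    return sum(abs(v) for v in x.values())

# Hypothetical profiles with ||x1|| + ||x2|| = 3/4 + 1/4 = 1; x1 has a negative entry.
x1 = {0: Fraction(1, 2), 1: Fraction(-1, 4)}
x2 = {0: Fraction(1, 4)}

def superposition(h1, h2):
    """T^{h1} x1 + T^{h2} x2."""
    return add(shift(x1, h1), shift(x2, h2))

for n in (1, 5, 50):
    stuck = superposition(n, n + 1)   # |h2 - h1| = 1 for all n: cancellation persists
    apart = superposition(n, 2 * n)   # |h2 - h1| = n goes to infinity
    print(f"n={n}: bounded gap gives norm {l1_norm(stuck)}, growing gap gives norm {l1_norm(apart)}")
```

With the bounded gap, the entry -1/4 of the first profile cancels the second profile for every n, so the superposition stays well inside the unit ball; with the growing gap, the norm equals 1 as soon as the supports separate.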

We have thus demonstrated a large number of ways that compactness of E fails in the strong and intermediate topologies.  The concentration compactness phenomenon, in this setting, tells us that these are essentially the only ways in which compactness fails in the intermediate topology.  More precisely, one has

Theorem 2. (Profile decomposition)  Let x^{(n)} be a sequence in E.  Then, after passing to a subsequence (which we still call x^{(n)}), there exist x_j \in X obeying (1), and sequences h^{(n)}_j of integers obeying (2), such that we have the decomposition (3) where the error w^{(n)} converges to zero in the intermediate topology.  Furthermore, we can improve (1) to

\sum_j \|x_j\|_{l^1({\Bbb Z})} + \lim_{n \to \infty} \|w^{(n)}\|_{l^1({\Bbb Z})} \leq 1 (1′)

Remark 2. The situation is vastly different in the strong topology; in this case, virtually every sequence in E fails to have a convergent subsequence (consider for instance the sequence \frac{1}{n} \sum_{n'=1}^n e_{n'} from Example 1), and there are so many different ways a sequence can behave that there is no meaningful profile decomposition.  A more quantitative way to see this is via a computation of metric entropy constants (i.e. covering numbers).  Pick a small number \varepsilon > 0 (e.g. \varepsilon = 0.1) and a large number N, and consider how many balls of radius \varepsilon in the l^1(\{1,\ldots,N\}) norm are needed to cover the unit sphere E_N in l^1(\{1,\ldots,N\}).  A simple volume packing argument shows that this number must grow exponentially in N.  On the other hand, if one wants to cover E_N with the (much larger) balls of radius \varepsilon in the l^\infty(\{1,\ldots,N\}) topology instead, the number of balls needed grows only polynomially with N.  Indeed, after rounding down each coefficient of an element of l^1(\{1,\ldots,N\}) to a multiple of \varepsilon, there are only at most 1/\varepsilon non-zero coefficients, and so the total number of possibilities for this rounded down approximant is about (N/\varepsilon)^{1/\varepsilon}.  Thus, the metric entropy constants for both the strong and intermediate topologies go to infinity in the infinite dimensional limit N \to \infty (thus demonstrating the lack of compactness for both), but much more rapidly for the former than for the latter. \diamond
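The rounding step in Remark 2 is easy to try out.  The following sketch rounds each coefficient towards zero to a multiple of \varepsilon and checks the two facts used in the counting argument: the l^\infty error is less than \varepsilon, and at most 1/\varepsilon coefficients survive.  The particular element x of E_N below (with N = 50) is a hypothetical example, and the function name round_down is likewise just for illustration.

```python
import math
from fractions import Fraction

def round_down(x, eps):
    """Round each coefficient towards zero to an integer multiple of eps."""
    out = []
    for v in x:
        k = math.floor(abs(v) / eps)      # number of whole multiples of eps in |v|
        sign = 1 if v >= 0 else -1
        out.append(sign * k * eps)
    return out

eps = Fraction(1, 10)

# Hypothetical element of the unit sphere E_N with N = 50:
# three large coefficients plus 47 small ones of size 1/470 each.
x = [Fraction(1, 2), Fraction(-3, 10), Fraction(1, 10)] + [Fraction(1, 470)] * 47

rounded = round_down(x, eps)
l1_mass = sum(abs(v) for v in x)
nonzero = sum(1 for v in rounded if v != 0)
linf_error = max(abs(a - b) for a, b in zip(x, rounded))

print("l1 norm of x:", l1_mass)
print("non-zero coefficients after rounding:", nonzero, "(bound:", 1 / eps, ")")
print("l-infinity rounding error:", linf_error, "(less than eps:", linf_error < eps, ")")
```

Each surviving coefficient has magnitude at least \varepsilon, and the total mass is 1, which is why no more than 1/\varepsilon of them can be non-zero.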

— Proof sketch of Theorem 2 —

We now sketch how one would prove Theorem 2.  The idea is to hunt down and “domesticate” the large values of x^{(n)}, as these are the only obstructions to convergence in the intermediate topology. (I believe the use of the term “domesticate” here is due to Kyril Tintarev.)  Each large piece of the x^{(n)} that we capture in this manner will decrease the total “mass” in play, which guarantees that eventually one runs out of such large pieces, at which point one obtains the decomposition (3).  [Curiously, the strategy here is very similar to that underlying the structural theorems that arise in additive combinatorics and ergodic theory; I touched upon these analogies before in my Simons lectures.] In this process we rely heavily on the freedom to pass to a subsequence at will, which is useful to eliminate any fluctuations so long as they range over a compact space of possibilities.

Let’s see how this procedure works.  We begin with our bounded sequence x^{(n)}, whose l^1 norms are all equal to 1.  If this sequence is already converging to zero in the intermediate topology, we are done (we let j range over the empty set, and set w^{(n)} equal to all of x^{(n)}).  So suppose that the x^{(n)} are not converging to zero in this topology.  Passing to a subsequence if necessary, this implies the existence of an \varepsilon_1 > 0 such that \|x^{(n)}\|_{l^\infty({\Bbb Z})} > \varepsilon_1 for all n.  Thus we can find integers h^{(n)}_1 such that |x^{(n)}_{h^{(n)}_1}| > \varepsilon_1 for all n, or equivalently that the shifts T^{-h^{(n)}_1} x^{(n)} have their zero coefficient uniformly bounded below in magnitude by \varepsilon_1.

We have used the symmetry group G to move a large component of each of the x^{(n)} to the origin.  Now we take advantage of sequential compactness of the unit ball in the weak topology.  This allows one (after passing to another subsequence) to assume that the shifted elements T^{-h^{(n)}_1} x^{(n)} converge weakly to some limit x_1; since the T^{-h^{(n)}_1} x^{(n)} are uniformly non-trivial at the origin, the weak limit x_1 is also; in particular, we have \|x_1\|_{l^1({\Bbb Z})} \geq \varepsilon_1 > 0.  Undoing the shift, we have obtained a decomposition

x^{(n)} = T^{h^{(n)}_1} x_1 + w^{(n)}_1

where the residual w^{(n)}_1 is such that T^{-h^{(n)}_1} w^{(n)}_1 converges weakly to zero (thus, in some sense w^{(n)}_1 vanishes asymptotically near h^{(n)}_1).  It is then not difficult to show the “asymptotic orthogonality” relationship

\|x^{(n)}\|_{l^1({\Bbb Z})} = \|T^{h^{(n)}_1} x_1\|_{l^1({\Bbb Z})} + \|w^{(n)}_1\|_{l^1({\Bbb Z})} + o(1)

where o(1) is a quantity that goes to zero as n \to \infty; this implies, in particular, that the residual w^{(n)}_1 eventually has mass strictly less than that of the original sequence x^{(n)}:

\|w^{(n)}_1\|_{l^1({\Bbb Z})} \leq 1 - \varepsilon_1 + o(1);

in fact we have the more precise relationship

\|x_1\|_{l^1({\Bbb Z})} + \lim_{n \to \infty} \|w^{(n)}_1\|_{l^1({\Bbb Z})} = 1.
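For the reader who wants to see why the asymptotic orthogonality relationship holds, here is a sketch of one standard way to verify it, by splitting the sum at a large cutoff.  The cutoff M and the abbreviations y^{(n)}, v^{(n)} are notation introduced only for this sketch; all norms are l^1({\Bbb Z}) norms.

```latex
% Notation for this sketch only:
%   y^{(n)} := T^{-h^{(n)}_1} x^{(n)},   v^{(n)} := T^{-h^{(n)}_1} w^{(n)}_1 = y^{(n)} - x_1,
% so that v^{(n)} converges weakly (i.e. coefficientwise) to zero.
% The shifts preserve the l^1 norm, so it suffices to compare \|y^{(n)}\| with \|x_1\| + \|v^{(n)}\|.
\begin{align*}
\sum_{|m| \leq M} |y^{(n)}_m|
  &= \sum_{|m| \leq M} |x_{1,m}| + \sum_{|m| \leq M} |v^{(n)}_m| + o_M(1)
  && \text{(finitely many $m$, and $v^{(n)}_m \to 0$ for each)} \\
\Big| \sum_{|m| > M} |y^{(n)}_m| - \sum_{|m| > M} |v^{(n)}_m| \Big|
  &\leq \sum_{|m| > M} |x_{1,m}|
  && \text{(triangle inequality, since $y^{(n)} = x_1 + v^{(n)}$)} \\
\Big| \|y^{(n)}\| - \|x_1\| - \|v^{(n)}\| \Big|
  &\leq 2 \sum_{|m| > M} |x_{1,m}| + o_M(1)
  && \text{(combine the two lines above)}
\end{align*}
% Here o_M(1) goes to zero as n \to \infty for each fixed M.
% Since x_1 \in l^1, the tail \sum_{|m| > M} |x_{1,m}| goes to zero as M \to \infty,
% so sending n \to \infty and then M \to \infty shows the left-hand side is o(1).
```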

Now we take this residual w^{(n)}_1 and repeat the whole process.  Namely, if w^{(n)}_1 converges in the intermediate topology to zero, then we are done; otherwise, as before, we can find (after passing to a subsequence) \varepsilon_2 > 0, h^{(n)}_2 for which T^{-h^{(n)}_2} w^{(n)}_1 is bounded from below by \varepsilon_2 at the origin. Because T^{-h^{(n)}_1} w^{(n)}_1 already converged weakly to zero, one can conclude that h^{(n)}_2 and h^{(n)}_1 must be asymptotically orthogonal in the sense of (2).

Passing to a subsequence again, we can assume that T^{-h^{(n)}_2} w^{(n)}_1 converges weakly to a limit x_2 with mass at least \varepsilon_2, leading to a decomposition

x^{(n)} = T^{h^{(n)}_1} x_1 + T^{h^{(n)}_2} x_2 + w^{(n)}_2

where the residual w^{(n)}_2 is such that T^{-h^{(n)}_1} w^{(n)}_2 and T^{-h^{(n)}_2} w^{(n)}_2 both converge weakly to zero, and has norm

\|w^{(n)}_2\|_{l^1({\Bbb Z})} \leq 1 - \varepsilon_1 - \varepsilon_2 + o(1);

in fact we have the more precise relationship

\|x_1\|_{l^1({\Bbb Z})} +  \|x_2\|_{l^1({\Bbb Z})} + \lim_{n \to \infty} \|w^{(n)}_2\|_{l^1({\Bbb Z})} = 1.

One can continue in this vein, extracting more and more travelling profiles T^{h^{(n)}_j} x_j on finer and finer subsequences, with residuals w^{(n)}_j that are getting smaller and smaller.  The subsequences involved depend on j, but by the usual Cantor (or Arzelà-Ascoli) diagonalisation argument, one can work with a single sequence throughout.  Note that the amounts of mass \varepsilon_j that are extracted in this process cannot exceed 1 in total: \sum_j \varepsilon_j \leq 1 (in fact we have the slightly stronger statement (1)).  In particular, the \varepsilon_j must go to zero as j \to \infty (the infinite convergence principle!).  If the \varepsilon_j were selected in a “greedy” manner, this shows that the asymptotic l^\infty({\Bbb Z}) norm of the residuals w^{(n)}_j as n \to \infty must decay to zero as j \to \infty.  Carefully rearranging the epsilons, this gives the decomposition (3) with residual w^{(n)} converging to zero in the intermediate topology, and the verification of the rest of the theorem is routine.
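To get a feel for the greedy extraction, one can run a finite caricature of it on a single element x^{(n)} for one fixed large n.  This is only a toy sketch and not the argument above: it replaces weak limits along subsequences by a fixed window of radius R around each large coefficient, and the names extract_profiles, eps and radius are assumptions made for this illustration.

```python
from fractions import Fraction

def extract_profiles(x, eps, radius):
    """Greedy toy extraction on one finitely supported sequence x = {index: coefficient}.

    Repeatedly locate the largest remaining coefficient; if it exceeds eps,
    cut out the window of the given radius around it, shift it to the origin,
    and record it as a profile.  Returns (list of (h, profile), residual).
    """
    residual = dict(x)
    profiles = []
    while residual:
        h = max(residual, key=lambda m: abs(residual[m]))
        if abs(residual[h]) <= eps:
            break  # the residual is now small in the l^infty norm
        profile = {}
        for m in range(h - radius, h + radius + 1):
            if m in residual:
                profile[m - h] = residual.pop(m)  # move this piece to the origin
        profiles.append((h, profile))
    return profiles, residual

def l1_norm(x):
    return sum(abs(v) for v in x.values())

# Hypothetical test element of E: two bumps far apart plus a thinly spread remainder.
x = {100: Fraction(1, 2), 101: Fraction(-1, 4), -200: Fraction(1, 8)}
x.update({m: Fraction(1, 1000) for m in range(1000, 1125)})

profiles, residual = extract_profiles(x, eps=Fraction(1, 100), radius=3)
for h, profile in profiles:
    print("profile at shift", h, ":", profile, "mass", l1_norm(profile))
print("residual mass:", l1_norm(residual))
print("largest residual coefficient:", max(abs(v) for v in residual.values()))
```

In this toy version the masses of the extracted profiles and of the residual add up exactly to the mass of x, which is the finite analogue of the mass accounting in (1′); the residual is small in l^\infty but need not be small in l^1.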

Remark 3. It is tempting to view Theorem 2 as asserting that the space E with the l^\infty({\Bbb Z}) metric can be “compactified” by throwing in some idealised superposition of profiles that are “infinitely far apart” from each other. \diamond

— An application of concentration compactness —

As mentioned in the introduction, one can use the profile decomposition of Theorem 2 as a substitute for compactness in establishing results analogous to Proposition 1.  The catch is that one needs more hypotheses on the functional F in order to be able to handle the complicated profiles that come up.  It is difficult to formalise the “best” set of hypotheses that would cover all conceivable situations; it seems better to just adapt the general arguments to each individual situation separately.  Here is a typical (but certainly not optimal) result of this type:

Theorem 3.  Let X, E be as above.  Let F: X \to {\Bbb R}^+ be a non-negative function with the following properties:

  1. (Continuity) F is continuous in the intermediate topology on E.
  2. (Homogeneity) F is homogeneous of some degree 1 < p < \infty, thus F(\lambda x) = \lambda^p F(x) for all \lambda > 0 and x \in X.  (In particular, F(0)=0.)
  3. (Invariance) F is G-invariant: F(T^h x) = F(x) for all h \in {\Bbb Z} and x \in X.
  4. (Asymptotic additivity) If h^{(n)}_j are a collection of sequences obeying the asymptotic orthogonality condition (2), and x_j \in X are such that \sum_j \|x_j\|_{l^1({\Bbb Z})} < \infty, then \sum_j F(x_j) < \infty and F( \sum_j T^{h^{(n)}_j} x_j ) = \sum_j F(x_j) + o(1).  More generally, if w^{(n)} is bounded in l^1 and converges to zero in the intermediate topology, then F( \sum_j T^{h^{(n)}_j} x_j + w^{(n)} ) = \sum_j F(x_j) + o(1).  (Note that this generalises both 1. and 3.)

Then F is bounded on E, and attains its supremum.

A typical example of a functional F obeying the above properties is

F( (x_m)_{m \in {\Bbb Z}} ) := \sum_{m \in {\Bbb Z}} |x_m - x_{m+1}|^p

for some 1 < p < \infty.
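For finitely supported sequences, the homogeneity, invariance and asymptotic additivity of this example can be checked directly.  The sketch below takes p = 2 (so that exact rational arithmetic applies) and uses two hypothetical profiles; the helper names are again illustrative.  Additivity holds exactly here once the supports of the shifted profiles are at least two sites apart, since each term of F only couples neighbouring coefficients.

```python
from fractions import Fraction

def shift(x, h):
    """T^h x for a finitely supported sequence stored as {index: coefficient}."""
    return {k + h: v for k, v in x.items()}

def add(x, y):
    out = dict(x)
    for m, v in y.items():
        out[m] = out.get(m, Fraction(0)) + v
    return out

def scale(lam, x):
    return {m: lam * v for m, v in x.items()}

def F(x, p=2):
    """F(x) = sum over m of |x_m - x_{m+1}|^p (only finitely many terms are non-zero)."""
    get = lambda m: x.get(m, Fraction(0))
    indices = set(x) | {m - 1 for m in x}   # the only m with a possibly non-zero term
    return sum(abs(get(m) - get(m + 1)) ** p for m in indices)

# Hypothetical profiles, chosen only for illustration.
x1 = {0: Fraction(1, 2), 1: Fraction(-1, 4)}
x2 = {0: Fraction(1, 4)}

print("F(x1), F(x2):", F(x1), F(x2))
print("invariance:", F(shift(x1, 17)) == F(x1))
print("homogeneity:", F(scale(3, x1)) == 3 ** 2 * F(x1))
for gap in (1, 2, 10, 100):
    total = F(add(x1, shift(x2, gap)))
    print(f"gap {gap}: F(x1 + T^gap x2) = {total}, F(x1) + F(x2) = {F(x1) + F(x2)}")
```

For small gaps the two profiles interact and the values differ; for well-separated profiles the functional splits as the sum of its values on the individual profiles.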

Proof. We repeat the proof of Proposition 1.  Let L := \sup_{x \in E} F(x).  Clearly L \geq 0; we can assume that L>0, since the claim is trivial when L=0.  As before, we have an extremising sequence x^{(n)} \in E with F(x^{(n)}) \to L.  Applying Theorem 2, and passing to a subsequence, we obtain a decomposition (3) with the stated properties.  Applying the asymptotic additivity hypothesis 4., we have

F(x^{(n)}) = \sum_j F( x_j ) + o(1)

and in particular

L = \sum_j F(x_j). (4)

This implies in particular that L is finite.

Now, we use the homogeneity assumption.  Since F(x) \leq L when \|x\|_{l^1({\Bbb Z})} = 1, we obtain the bound F(x) \leq L \|x\|_{l^1({\Bbb Z})}^p.  We conclude that

L \leq L \sum_j \|x_j\|_{l^1({\Bbb Z})}^p

Combining this with (1) we obtain

L \leq \sum_j L \|x_j\|_{l^1({\Bbb Z})}^p \leq L \sum_j \|x_j\|_{l^1({\Bbb Z})} \leq L.

Thus all these inequalities must be equality.  Analysing this, we see that all but one of the x_j must vanish, with the remaining x_j (say x_0) having norm 1.  From (4) we thus have F(x_0)=L, and we have obtained the desired extremiser.  \Box
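The analysis of the equality case can be spelled out as follows; this is a sketch in which a_j is shorthand (introduced here) for the norm \|x_j\|_{l^1({\Bbb Z})}.

```latex
% Shorthand for this sketch: a_j := \|x_j\|_{l^1({\Bbb Z})}, so 0 \leq a_j \leq 1 by (1).
% Dividing the chain of (in)equalities by L > 0, equality throughout gives
\begin{align*}
\sum_j a_j^p = \sum_j a_j = 1 .
\end{align*}
% Since p > 1 and 0 \leq a_j \leq 1, each term obeys
\begin{align*}
a_j^p \leq a_j, \qquad \text{with equality if and only if } a_j \in \{0, 1\} .
\end{align*}
% Equality of the two sums therefore forces a_j \in \{0,1\} for every j,
% and \sum_j a_j = 1 then forces exactly one a_j (say a_0) to equal 1, with all others zero.
% Thus x_0 lies in E, and (4) reduces to L = F(x_0).
```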

[Update, Nov 6: some corrections, in particular with regard to the closure of E in the intermediate topology.]