Concentration compactness via nonstandard analysis

29 November, 2010 in expository, math.AP, math.CA, math.LO | Tags: concentration compactness, nonstandard analysis, tightness | by Terence Tao

One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.
As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.
Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and suprema, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.

— 1. Weak sequential compactness in a Hilbert space —

Before turning to concentration compactness, we will warm up with the simpler situation of weak sequential compactness in a Hilbert space. For the sake of notation we shall only consider complex Hilbert spaces, although all the discussion here works equally well for real Hilbert spaces.
Recall that a bounded sequence ${x_n}$ of vectors in a Hilbert space ${H}$ is said to converge weakly to a limit ${x}$ if one has ${\langle x_n,y\rangle \rightarrow \langle x,y \rangle}$ for all ${y \in H}$ . We have the following basic theorem:

Theorem 1 (Sequential Banach-Alaoglu theorem) Every bounded sequence ${x_n}$ of vectors in a Hilbert space ${H}$ has a weakly convergent subsequence.
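
To see concretely why one must settle for weak convergence, consider the standard orthonormal basis ${e_n}$ of ${\ell^2({\bf N})}$ . The following Python sketch is an illustration only: the truncation dimension `N` and the test vector `y` with entries ${1/(k+1)}$ are choices made for this example, not part of the theorem. It checks numerically that distinct ${e_n}$ stay at distance ${\sqrt{2}}$ from each other (so no subsequence can converge strongly), while the pairings ${\langle e_n, y \rangle}$ decay to zero.

```python
import math

N = 2000  # truncation dimension (an assumption of this sketch; l^2 is infinite-dimensional)

def basis_vector(n, dim=N):
    """Return the n-th standard basis vector e_n of R^dim (0-indexed)."""
    v = [0.0] * dim
    v[n] = 1.0
    return v

def inner(u, v):
    """Real inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm induced by the inner product."""
    return math.sqrt(inner(u, u))

# A fixed square-summable test vector with entries y_k = 1/(k+1).
y = [1.0 / (k + 1) for k in range(N)]

# Distances between distinct basis vectors: no subsequence is Cauchy.
distances = [
    norm([a - b for a, b in zip(basis_vector(m), basis_vector(n))])
    for m, n in [(0, 1), (10, 500), (999, 1999)]
]

# Pairings <e_n, y> = 1/(n+1): these tend to zero as n grows.
pairings = [inner(basis_vector(n), y) for n in (0, 9, 99, 999)]

print("distances:", distances)
print("pairings: ", pairings)
```

In this example the weak limit is zero even though every ${e_n}$ has norm one, which is the loss of energy that the decompositions in this post keep track of.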

The usual (standard analysis) proof of this theorem runs as follows:
Proof: (Sketch) By restricting to the closed span of the ${x_n}$ , we may assume without loss of generality that ${H}$ is separable. Letting ${y_1, y_2, \ldots}$ be a dense subset of ${H}$ , we may apply the Bolzano-Weierstrass theorem iteratively, followed by the Arzelá-Ascoli diagonalisation argument, to find a subsequence ${x_{n_j}}$ for which ${\langle x_{n_j}, y_m \rangle}$ converges to a limit for each ${m}$ . Using the boundedness of the ${x_{n_j}}$ and a density argument, we conclude that ${\langle x_{n_j}, y \rangle}$ converges to a limit for each ${y}$ ; applying the Riesz representation theorem for Hilbert spaces, the limit takes the form ${\langle x, y \rangle}$ for some ${x}$ , and the claim follows. $\Box$
However, this proof does not extend easily to the concentration compactness setting, when there is also a group action. For this, we need a more “algorithmic” proof based on the “energy increment method”. We give one such (standard analysis) proof as follows:
Proof: As ${x_n}$ is bounded, we have some bound of the form

$\displaystyle \limsup_{n \rightarrow \infty} \|x_n\|^2 \leq E$

for some finite ${E}$ . Of course, this bound would persist if we passed from ${x_n}$ to a subsequence.
Suppose for contradiction that no subsequence of ${x_n}$ was weakly convergent. In particular, ${x_n}$ itself was not weakly convergent, which means that there exists ${\phi_1 \in H}$ for which ${\langle x_n,\phi_1\rangle}$ did not converge. We can take ${\phi_1}$ to be a unit vector. Applying the Bolzano-Weierstrass theorem, we can pass to a subsequence (which, by abuse of notation, we continue to call ${x_n}$ ) in which ${\langle x_n,\phi_1\rangle}$ converged to some non-zero limit ${c_1}$ . We can choose ${c_1}$ to be nearly maximal in magnitude among all possible choices of subsequence and of ${\phi_1}$ ; in particular, we have

$\displaystyle \limsup_{n \rightarrow \infty} |\langle x_n, y \rangle| \leq 2 |c_1|$

(say) for all other choices of unit vector ${y}$ .
We may now decompose

$\displaystyle x_{n} = c_1 \phi_1 + x'_{1,n} + w_{1,n}$

where ${x'_{1,n}}$ is orthogonal to ${\phi_1}$ and ${w_{1,n}}$ converges strongly to zero. From Pythagoras' theorem we see that ${x'_{1,n}}$ asymptotically has strictly less energy than ${E}$ :

$\displaystyle \limsup_{n \rightarrow \infty} \|x'_{1,n}\|^2 \leq E - |c_1|^2.$

If ${x'_{1,n}}$ was weakly convergent, then ${x_n}$ would be too, so we may assume that it is not weakly convergent. Arguing as before, we may find a unit vector ${\phi_2}$ (which we can take to be orthogonal to ${\phi_1}$ ) and a constant ${c_2}$ such that (after passing to a subsequence, and abusing notation once more) one had a decomposition

$\displaystyle x'_{1,n} = c_2 \phi_2 + x'_{2,n} + w_{2,n}$

in which ${x'_{2,n}}$ is orthogonal to both ${\phi_1,\phi_2}$ and ${w_{2,n}}$ converges strongly to zero, and such that

$\displaystyle \limsup_{n \rightarrow \infty} |\langle x'_{1,n}, y \rangle| \leq 2 |c_2|$

for all unit vectors ${y}$ . From Pythagoras, we have

$\displaystyle \limsup_{n \rightarrow \infty} \|x'_{2,n}\|^2 \leq E - |c_1|^2 - |c_2|^2.$

We iterate this process to obtain an orthonormal sequence ${\phi_1,\phi_2,\ldots}$ and constants ${c_1,c_2,\ldots}$ obeying the Bessel inequality

$\displaystyle \sum_{k=1}^\infty |c_k|^2 \leq E$

(which, in particular, implies that the ${c_k}$ go to zero as ${k \rightarrow \infty}$ ) such that, for each ${k}$ , one has a subsequence of the ${x_n}$ for which one has a decomposition of the form

$\displaystyle x_n = \sum_{i=1}^{k} c_i \phi_i + x'_{k,n} + w_{k,n}$

where ${w_{k,n}}$ converges strongly to zero, and for which

$\displaystyle \limsup_{n \rightarrow \infty} |\langle x'_{k,n}, y \rangle| \leq 2 |c_{k+1}|$

for all unit vectors ${y}$ . The series ${\sum_{i=1}^\infty c_i \phi_i}$ then converges (conditionally in the strong topology) to a limit ${x}$ , and by diagonalising all the subsequences we obtain a final subsequence ${x_{n_j}}$ which converges weakly to ${x}$ . $\Box$
Now we give a third proof, which is a nonstandard analysis proof that is analogous to the second standard analysis proof given above.
The basics of nonstandard analysis are reviewed in this previous blog post (and see also this later post on ultralimit analysis, as well as the most recent post on this topic). Very briefly, we will need to fix a non-principal ultrafilter ${p \in \beta {\bf N} \backslash {\bf N}}$ on the natural numbers. Once one fixes this ultrafilter, one can define the ultralimit ${\lim_{n \rightarrow p} x_n}$ of any sequence of standard objects ${x_n}$ as the equivalence class of all sequences ${(y_n)_{n \in {\bf N}}}$ such that ${\{ n \in {\bf N}: x_n = y_n \} \in p}$ . We then define the ultrapower ${{}^* X}$ of a standard set ${X}$ to be the collection of all ultralimits ${\lim_{n \rightarrow p} x_n}$ of sequences ${x_n}$ in ${X}$ . We can interpret ${{}^* X}$ as the space of all nonstandard elements of ${X}$ , with the standard space ${X}$ being embedded in the nonstandard one ${{}^* X}$ by identifying ${x}$ with its nonstandard counterpart ${{}^* x := \lim_{n \rightarrow p} x}$ . One can extend all (first-order) structures on ${X}$ to ${{}^* X}$ in the obvious manner, and a famous theorem of Łoś asserts that all first-order sentences that are true about a standard space ${X}$ will also be true about the nonstandard space ${{}^* X}$ . Thus, for instance, the ultrapower ${{}^* H}$ of a standard Hilbert space ${H}$ over the standard complex numbers ${{\bf C}}$ will be a nonstandard Hilbert space over the nonstandard complex numbers ${{}^* {\bf C}}$ . It has a nonstandard inner product ${\langle, \rangle: {}^* H \times {}^* H \rightarrow {}^* {\bf C}}$ instead of a standard one, which obeys the nonstandard analogue of the Hilbert space axioms.
In particular, it is complete in the nonstandard sense: any nonstandard Cauchy sequence ${(x_n)_{n \in {}^* {\bf N}}}$ of nonstandard vectors ${x_n \in {}^* H}$ indexed by the nonstandard natural numbers ${{}^* {\bf N}}$ will converge (again, in the nonstandard sense) to a limit ${x \in {}^* H}$ .
The ultrapower ${{}^* H}$ – the space of ultralimits ${\lim_{n \rightarrow p} x_n}$ of arbitrary sequences ${x_n}$ in ${H}$ – turns out to be too large and unwieldy to be helpful for us. We will work instead with a more tractable subquotient, defined as follows. Let ${O(H)}$ be the space of ultralimits ${\lim_{n \rightarrow p} x_n}$ of bounded sequences ${x_n \in H}$ , and let ${o(H)}$ be the space of ultralimits ${\lim_{n \rightarrow p} x_n}$ of sequences ${x_n \in H}$ that converge to zero. It is clear that ${o(H)}$ , ${O(H)}$ are vector spaces over the standard complex numbers ${{\bf C}}$ , with ${o(H)}$ being a subspace of ${O(H)}$ . (The space ${o(H)}$ is also known as the monad of the origin of ${H}$ .) We define the quotient space ${\tilde H := O(H) / o(H)}$ , which is then also a vector space over ${{\bf C}}$ . One easily verifies that ${H}$ is a subspace of ${O(H)}$ that intersects ${o(H)}$ only at the origin, so we can embed ${H}$ as a subspace of ${\tilde H}$ .

Remark 2 When ${H}$ is finite dimensional, the Bolzano-Weierstrass theorem (or more precisely, the proof of this theorem) shows that ${H = \tilde H}$ . For infinite-dimensional spaces, though, ${\tilde H}$ is larger than ${H}$ , basically because there exist bounded sequences in ${H}$ with no convergent subsequences. Thus we can view the quotient ${\tilde H/H}$ as measuring the failure of the Bolzano-Weierstrass theorem (a sort of “Bolzano-Weierstrass cohomology”, if you will).

Now we place a Hilbert space structure on ${\tilde H}$ . Observe that if ${x = \lim_{n \rightarrow p} x_n}$ and ${y = \lim_{n \rightarrow p} y_n}$ are elements of ${O(H)}$ (so that ${x_n, y_n}$ are bounded), then the nonstandard inner product ${\langle x,y\rangle = \lim_{n \rightarrow p} \langle x_n,y_n\rangle}$ is a nonstandard complex number which is bounded (i.e. it lies in ${O({\bf C})}$ ). Since ${{\bf C} = O({\bf C})/o({\bf C})}$ , we can thus extract a standard part ${\hbox{st}\langle x,y\rangle}$ , defined as the unique standard complex number such that

$\displaystyle \langle x,y\rangle = \hbox{st}\langle x,y\rangle + o(1)$

where ${o(1)}$ denotes an infinitesimal, i.e. a non-standard quantity whose magnitude is less than any standard positive real ${\varepsilon > 0}$ . From the Cauchy-Schwarz inequality we see that if we modify either ${x}$ or ${y}$ by an element of ${o(H)}$ , then the standard part ${\hbox{st}\langle x,y \rangle}$ does not change. Thus, we see that the map ${x,y \mapsto \hbox{st}\langle x,y\rangle}$ on ${O(H)}$ descends to a map ${x,y \mapsto \langle x,y \rangle}$ on ${\tilde H}$ . One easily checks that this map is a standard Hermitian inner product on ${\tilde H}$ that extends the one on the subspace ${H}$ . (If one prefers to think in terms of commutative diagrams, one can think of the inner product as a bilinear map from the short exact sequence ${0 \rightarrow o(H) \rightarrow O(H) \rightarrow \tilde H \rightarrow 0}$ to the short exact sequence ${0 \rightarrow o({\bf C}) \rightarrow O({\bf C}) \rightarrow {\bf C} \rightarrow 0}$ .) Furthermore, by using the countable saturation (or Bolzano-Weierstrass) property of nonstandard analysis (see previous post), we can also show that ${\tilde H}$ is complete with respect to this inner product; thus ${\tilde H}$ is a standard Hilbert space that contains ${H}$ as a subspace. (One can view ${\tilde H}$ as a sort of nonstandard completion of ${H}$ , in a manner somewhat analogous to how the Stone-Cech compactification ${\beta X}$ of a space can be viewed as a topological completion of ${X}$ . This is of course consistent with the philosophy of the previous post.)
After all this setup, we can now give the third proof of Theorem 1:
Proof: Let ${z := \lim_{n \rightarrow p} x_n}$ be the ultralimit of the ${x_n}$ ; then ${z}$ is an element of ${O(H)}$ . Let ${\tilde z}$ be the image of ${z}$ in ${\tilde H}$ , and let ${x}$ be the orthogonal projection of ${\tilde z}$ to ${H}$ . We claim that a subsequence of ${x_n}$ converges weakly to ${x}$ .
For any ${y \in H}$ , ${\tilde z - x}$ is orthogonal to ${y}$ , and thus ${\langle z-x, y \rangle = o(1)}$ . In other words,

$\displaystyle \lim_{n \rightarrow p} \langle x_n, y \rangle = \langle x, y \rangle + o(1) \ \ \ \ \ (1)$

for all ${y \in H}$ . This is already the nonstandard analogue of weak convergence along a subsequence, but we can get to weak convergence itself with only a little more argument. Indeed, from (1) we can easily construct a subsequence ${x_{n_j}}$ such that

$\displaystyle |\langle x_{n_j}, x_i \rangle - \langle x, x_i \rangle| \leq \frac{1}{j}$

and

$\displaystyle |\langle x_{n_j}, x \rangle - \langle x, x \rangle| \leq \frac{1}{j}$

for all ${1 \leq i \leq j}$ , which implies that

$\displaystyle \lim_{j \rightarrow \infty} \langle x_{n_j}, y \rangle = \langle x,y \rangle$

whenever ${y}$ is a finite linear combination of the ${x_i}$ and ${x}$ . Applying a density argument using the boundedness of the ${x_n}$ , this is then true for all ${y}$ in the closed span of the ${x_i}$ and ${x}$ ; it is also clearly true for ${y}$ in the orthogonal complement, and the claim follows. $\Box$
Observe that in contrast with the first two proofs, the third proof gave a “canonical” choice for the subsequence limit ${x}$ . This is ultimately because the ultrafilter ${p}$ already “made all the choices beforehand”, in some sense.
Observe also that we used the existence of orthogonal projections in Hilbert spaces in the above proof. If one unpacks the usual proof that these projections exist, one will find an energy increment argument that is not dissimilar to that used in the second proof of Theorem 1. Thus we see that the somewhat intricate energy increment argument from that second proof has in some sense been encapsulated into a general-purpose package in the nonstandard setting, namely the existence of orthogonal projections.

— 2. Concentration compactness for unitary group actions —

Now we generalise the sequential Banach-Alaoglu theorem to allow for a group of symmetries. The setup is now that of a (standard) complex Hilbert space ${H}$ , together with a locally compact group ${G}$ acting unitarily on ${H}$ in a jointly continuous manner, thus the map ${(g,x) \mapsto gx}$ is jointly continuous from ${G \times H}$ to ${H}$ (or equivalently, the representation map from ${G}$ to ${U(H)}$ is continuous if we give ${U(H)}$ the strong operator topology). We also assume that ${G}$ is a group of dislocations, which means that ${g_n x}$ converges weakly to zero in ${H}$ whenever ${x \in H}$ and ${g_n}$ goes to infinity in ${G}$ (which means that ${g_n}$ eventually escapes any given compact subset of ${G}$ ). A typical example of such a group is the translation action ${h: f(\cdot) \mapsto f(\cdot-h)}$ of ${{\bf R}^d}$ on ${L^2({\bf R}^d)}$ ; another example is the scaling action ${\lambda: f(\cdot) \mapsto \frac{1}{\lambda^{d/2}} f(\frac{\cdot}{\lambda})}$ of ${{\bf R}^+}$ on ${L^2({\bf R}^d)}$ . (One can also combine these two actions to give an action of the semidirect product ${{\bf R}^+ \ltimes {\bf R}^d}$ on ${L^2({\bf R}^d)}$ .)
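The two example actions can be checked numerically in dimension ${d=1}$ . The sketch below is only an illustration under stated assumptions: it uses a Gaussian test function, a Riemann sum on a finite grid (`GRID`, `DX` are choices of this example), and helper names (`translate`, `rescale`, `l2_inner`) that are hypothetical. It verifies that both actions preserve the ${L^2}$ norm, and that the overlap with a fixed function decays as the group element goes to infinity, which is the dislocation property tested against one particular vector.

```python
import math

DX = 0.01
GRID = [i * DX for i in range(-4000, 4001)]  # sample points in [-40, 40]

def gaussian(x):
    """Test function f(x) = exp(-x^2/2); its squared L^2 norm is sqrt(pi)."""
    return math.exp(-x * x / 2.0)

def l2_inner(f, g):
    """Riemann-sum approximation of the L^2(R) inner product of real functions."""
    return sum(f(x) * g(x) for x in GRID) * DX

def translate(f, h):
    """Translation action: (h . f)(x) = f(x - h)."""
    return lambda x: f(x - h)

def rescale(f, lam, d=1):
    """Scaling action: (lam . f)(x) = lam^(-d/2) f(x / lam)."""
    return lambda x: f(x / lam) / lam ** (d / 2.0)

f = gaussian
sqrt_pi = math.sqrt(math.pi)

# Unitarity: the squared norm is unchanged by translation and by scaling.
norms_sq = {h: l2_inner(translate(f, h), translate(f, h)) for h in (0.0, 5.0, 10.0)}
scaled_norm_sq = l2_inner(rescale(f, 4.0), rescale(f, 4.0))

# Dislocation: <g f, f> decays as g leaves compact sets.
overlaps = {h: l2_inner(translate(f, h), f) for h in (0.0, 5.0, 10.0)}
scaled_overlaps = {lam: l2_inner(rescale(f, lam), f) for lam in (1.0, 100.0, 10000.0)}

print(norms_sq, scaled_norm_sq)
print(overlaps)
print(scaled_overlaps)
```

For the Gaussian one can compare with the closed forms ${\langle f(\cdot - h), f \rangle = \sqrt{\pi} e^{-h^2/4}}$ and ${\langle \lambda f, f \rangle = \sqrt{2\pi \lambda/(\lambda^2+1)}}$ ; the translation overlap decays much faster than the scaling one.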
The basic theorem here is

Theorem 3 (Profile decomposition) Let ${G, H}$ be as above. Let ${x_n}$ be a bounded sequence in ${H}$ obeying the energy bound

$\displaystyle \limsup_{n \rightarrow \infty} \|x_n\|^2 \leq E.$

Then, after passing to a subsequence, one can find a sequence ${\phi_1, \phi_2, \ldots \in H}$ with the Bessel inequality

$\displaystyle \sum_{k=1}^\infty \|\phi_k\|^2 \leq E$

and group elements ${g_{k,n} \in G}$ for ${k,n \in {\bf N}}$ such that

$\displaystyle g_{k',n}^{-1} g_{k,n} \rightarrow \infty \hbox{ as } n \rightarrow \infty$

whenever ${k \neq k'}$ and ${\phi_k, \phi_{k'}}$ are non-zero, such that for each ${K \in {\bf N}}$ one has the decomposition

$\displaystyle x_n = \sum_{k=1}^K g_{k,n} \phi_k + w_{K,n}$

such that

$\displaystyle \limsup_{n \rightarrow \infty} \|w_{K,n}\|^2 \leq E - \sum_{k=1}^K \|\phi_k\|^2$

and

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{g \in G} |\langle g^{-1} w_{K,n}, y\rangle|^2 \leq \sum_{k=K+1}^\infty \|\phi_k\|^2$

for all unit vectors ${y}$ , and such that ${g_{k,n}^{-1} w_{K,n}}$ converges weakly to zero for every ${1 \leq k \leq K}$ .

Note that Theorem 1 is the case when ${G}$ is trivial.
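A toy instance of this decomposition can be written down in ${\ell^2({\bf Z})}$ with ${G = {\bf Z}}$ acting by shifts. The Python sketch below is a hand-built example, not an algorithm for finding profiles: the profiles `phi1`, `phi2`, the shifts ${g_{1,n} = n}$ , ${g_{2,n} = -n}$ and the dispersing remainder `spread(n)` are all assumptions chosen so that the conclusions of Theorem 3 can be checked by direct computation.

```python
import math

def shift(x, h):
    """Shift action of Z on finitely supported sequences: (h . x)(j) = x(j - h)."""
    return {j + h: v for j, v in x.items()}

def scale(x, c):
    """Multiply a finitely supported sequence by the scalar c."""
    return {j: c * v for j, v in x.items()}

def add(*xs):
    """Sum of finitely supported sequences, stored as {index: value} dicts."""
    out = {}
    for x in xs:
        for j, v in x.items():
            out[j] = out.get(j, 0.0) + v
    return out

def inner(x, y):
    """Real inner product; iterates over the support of x."""
    return sum(v * y.get(j, 0.0) for j, v in x.items())

phi1 = {0: 3.0, 1: 4.0}    # ||phi1||^2 = 25
phi2 = {0: 1.0, 2: -2.0}   # ||phi2||^2 = 5

def spread(n):
    """Unit vector spread evenly over n sites; it disperses as n grows."""
    return {j: 1.0 / math.sqrt(n) for j in range(n)}

def x_seq(n):
    """Bounded sequence: two profiles moving apart plus a dispersing remainder."""
    return add(shift(phi1, n), shift(phi2, -n), spread(n))

n = 400
xn = x_seq(n)
energy = inner(xn, xn)

# g_{1,n}^{-1} x_n: recentring on the first profile recovers phi1 on a fixed window.
recentred = shift(xn, -n)

# Remainder w_{2,n} after removing both profiles.
w = add(xn, scale(shift(phi1, n), -1.0), scale(shift(phi2, -n), -1.0))

# sup over shifts h of |<h^{-1} w, y>| = |<w, h y>| for a fixed unit vector y.
y = {0: 0.6, 1: 0.8}
sup_pairing = max(abs(inner(shift(y, h), w)) for h in range(-n - 5, n + 6))

print(energy, inner(w, w), sup_pairing)
```

Here the supports are disjoint once ${n \geq 3}$ , so the energies add exactly; in general one only expects this asymptotically. The remainder keeps norm one but its pairing with any fixed unit vector, uniformly over shifts, is of size ${O(n^{-1/2})}$ .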
There is a version of the conclusion available in which ${K}$ can be taken to be infinite, and also one can generalise ${G}$ to be a more general object than a group by modifying the hypotheses somewhat; see this paper of Schindler and Tintarev. The version with finite ${K}$ is slightly more convenient though for applications to nonlinear dispersive and wave equations; see these lecture notes of Killip and Visan for some applications of this type of decomposition. In order for this theorem to be useful for applications, one needs to exploit some sort of inverse theorem that controls other norms of a vector ${w}$ in terms of expressions such as ${\sup_{g \in G} |\langle gw, y \rangle|}$ ; these theorems tend to require “hard” harmonic analysis and cannot be established purely by such “soft” analysis tools as nonstandard analysis.
One can adapt the second proof of Theorem 1 to give a standard analysis proof of Theorem 3:
Proof: (Sketch) Applying Theorem 1 we can (after passing to a subsequence) find group elements ${g_{1,n}}$ such that ${g_{1,n}^{-1} x_n}$ converges weakly to a limit ${\phi_1 \in H}$ , which we can choose to be nearly maximal in the sense that

$\displaystyle \| \phi'_1 \| \leq 2 \| \phi_1 \|$

(say) whenever ${\phi'_1}$ is the weak limit of ${g_{n_j}^{-1} x_{n_j}}$ for some subsequence ${x_{n_j}}$ and some collection of group elements ${g_{n_j}}$ . In particular, this implies (from further application of Theorem 1, and an argument by contradiction) that

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{g \in G} |\langle g^{-1} x_n, y \rangle| \leq 2 \| \phi_1 \|$

for any unit vector ${y}$ .
We may now decompose

$\displaystyle x_{n} = g_{1,n} \phi_1 + w_{1,n}$

where ${g_{1,n}^{-1} w_{1,n}}$ converges weakly to zero. From Pythagoras' theorem we see that ${w_{1,n}}$ asymptotically has strictly less energy than ${E}$ :

$\displaystyle \limsup_{n \rightarrow \infty} \|w_{1,n}\|^2 \leq E - \|\phi_1\|^2.$

We then repeat the argument, passing to a further subsequence and finding group elements ${g_{2,n}}$ such that ${g_{2,n}^{-1} w_{1,n}}$ converges weakly to ${\phi_2 \in H}$ , with

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{g \in G} |\langle g^{-1} w_{1,n}, y \rangle| \leq 2 \| \phi_2 \|$

for any unit vector ${y}$ .
Note that ${g_{1,n}^{-1} w_{1,n}}$ converges weakly to zero, while ${g_{2,n}^{-1} w_{1,n}}$ converges weakly to ${\phi_2}$ . If ${\phi_2}$ is non-zero, this implies that ${g_{1,n}^{-1} g_{2,n}}$ must go to infinity (otherwise it has a convergent subsequence, and this soon leads to a contradiction).
If one iterates the above construction and passes to a diagonal subsequence one obtains the claim. $\Box$
Now we give the nonstandard analysis proof. As before, we introduce the short exact sequence of Hilbert spaces:

$\displaystyle 0 \rightarrow o(H) \rightarrow O(H) \rightarrow \tilde H \rightarrow 0.$

We will also need an analogous short exact sequence of groups

$\displaystyle 0 \rightarrow o(G) \rightarrow O(G) \rightarrow G \rightarrow 0$

where ${O(G) \leq {}^* G}$ is the space of ultralimits ${\lim_{n \rightarrow p} g_n}$ of sequences ${g_n}$ in ${G}$ that lie in a compact subset of ${G}$ , and ${o(G) \leq O(G)}$ is the space of ultralimits ${\lim_{n \rightarrow p} g_n}$ of sequences ${g_n}$ that converge to the identity element (i.e. ${o(G)}$ is the monad of the group identity). One easily verifies that ${o(G)}$ is a normal subgroup of ${O(G)}$ , and that the quotient is isomorphic to ${G}$ . (Indeed, ${O(G)}$ can be expressed as a semi-direct product ${G \ltimes o(G)}$ , though we will not need this fact here.)
The group ${{}^* G}$ acts unitarily on ${{}^* H}$ , and so preserves both ${o(H)}$ and ${O(H)}$ . As such, it also acts unitarily on ${\tilde H}$ . The induced action of the subgroup ${o(G)}$ is trivial; and the induced action of the subgroup ${O(G)}$ preserves ${H}$ .
Let ${\langle ({}^* G) H \rangle}$ be the closed span of the set ${\{ gx: g \in {}^* G; x \in H \}}$ in ${\tilde H}$ ; this is a Hilbert space. Inside this space we have the subspaces ${gH}$ for ${g \in {}^* G}$ . As ${O(G)}$ preserves ${H}$ , we see that ${gH = g'H}$ whenever ${g, g'}$ lie in the same coset of ${O(G)}$ , so we can define ${\gamma H}$ for any ${\gamma \in {}^* G / O(G)}$ in a well-defined manner. On the other hand, if ${g, g'}$ do not lie in the same coset of ${O(G)}$ , then we have ${g' = g \lim_{n \rightarrow p} h_n}$ for some sequence ${h_n}$ in ${G}$ that goes to infinity. As ${G}$ is a group of dislocations, we conclude that ${g' H}$ and ${gH}$ are now orthogonal. In other words, ${\gamma' H}$ and ${\gamma H}$ are orthogonal whenever ${\gamma, \gamma' \in {}^* G / O(G)}$ are distinct. We conclude that we have the decomposition

$\displaystyle \langle ({}^* G) H \rangle = \bigoplus_{\gamma \in {}^* G/O(G)} \gamma H \ \ \ \ \ (2)$

where ${\bigoplus}$ is the Hilbert space direct sum.
Now we can prove Theorem 3. As in the previous section, starting with a bounded sequence ${x_n}$ in ${H}$ , we form the ultralimit ${z := \lim_{n \rightarrow p} x_n \in O(H)}$ and the image ${\tilde z \in \tilde H}$ . We let ${x}$ be the orthogonal projection of ${\tilde z}$ to ${\langle ({}^* G) H \rangle}$ . By (2), we can write

$\displaystyle x = \sum_k g_k \phi_k$

for some at most countable sequence of vectors ${\phi_k \in H}$ and ${g_k \in {}^* G}$ , with the ${g_k}$ lying in distinct cosets of ${O(G)}$ . In particular, for any ${k \neq k'}$ , ${g_{k'}^{-1} g_k}$ is the ultralimit of a sequence of group elements going to infinity. By adding dummy values of ${g_k,\phi_k}$ if necessary we may assume that ${k}$ ranges from ${1}$ to infinity. Also, one has the Bessel inequality

$\displaystyle \sum_k \| \phi_k\|^2 = \|x\|^2 \leq \|z\|^2 \leq E$

and from Cauchy-Schwarz and Bessel one has

$\displaystyle |\langle z-\sum_{k=1}^K g_k \phi_k, g y \rangle|^2 \leq \sum_{k=K+1}^\infty \|\phi_k\|^2$

for any unit vector ${y \in H}$ and ${g \in G}$ . From this we can obtain the required conclusions by arguing as in the previous section.

— 3. Concentration compactness for measures —

We now give a variant of the profile decomposition, for Borel probability measures ${\mu_n}$ on ${{\bf R}^d}$ . Recall that such a sequence is said to be tight if, for every ${\varepsilon > 0}$ , there is a ball ${B(0,R)}$ such that ${\limsup_{n \rightarrow \infty} \mu_n({\bf R}^d \backslash B(0,R)) \leq \varepsilon}$ . Given any Borel probability measure ${\mu}$ on ${{\bf R}^d}$ and any ${x \in {\bf R}^d}$ , define the translate ${\tau_x \mu}$ to be the Borel probability measure given by the formula ${\tau_x \mu(E) := \mu(E-x)}$ .

Theorem 4 (Profile decomposition for probability measures on ${{\bf R}^d}$ ) Let ${\mu_n}$ be a sequence of Borel probability measures on ${{\bf R}^d}$ . Then, after passing to a subsequence, one can find a sequence ${c_k}$ of non-negative real numbers with ${\sum_k c_k \leq 1}$ , a tight sequence ${\nu_{k,n}}$ of positive measures whose mass converges to ${1}$ as ${n \rightarrow \infty}$ for fixed ${k}$ , and shifts ${x_{k,n} \in {\bf R}^d}$ such that

$\displaystyle x_{k,n} - x_{k',n} \rightarrow \infty \hbox{ as } n \rightarrow \infty$

for all ${k \neq k'}$ , and such that for each ${K}$ , one has the decomposition

$\displaystyle \mu_n = \sum_{k=1}^K c_k \tau_{x_{k,n}} \nu_{k,n} + \rho_{K,n}$

where the error ${\rho_{K,n}}$ obeys the bounds

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \rho_{K,n}( B(x,R) ) \leq \sup_{k > K} c_k$

and

$\displaystyle \lim_{n \rightarrow \infty} \rho_{K,n}( B(x_{k,n},R) ) = 0$

for all radii ${R}$ and ${1 \leq k \leq K}$ .
Furthermore, one can ensure that for each ${k}$ , ${\nu_{k,n}}$ converges in the vague topology to a probability measure ${\nu_k}$ .
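
The conclusions of Theorem 4 can be checked by hand on a discrete toy example. The Python sketch below is an illustration under assumptions, not a general procedure: the measures live on ${{\bf Z}}$ rather than ${{\bf R}^d}$ , balls are taken to be closed windows with integer centres, and the profiles `NU1`, `NU2`, the weights `C1`, `C2` and the shifts ${x_{1,n} = n}$ , ${x_{2,n} = -n}$ are chosen for the example. It computes the concentration function ${\sup_x \rho_{K,n}(B(x,R))}$ of the residual after removing ${K = 0, 1, 2}$ profiles.

```python
def translate(mu, x):
    """Translate a discrete measure {site: mass}: (tau_x mu)(E) = mu(E - x)."""
    return {j + x: m for j, m in mu.items()}

def combine(*weighted):
    """Linear combination of discrete measures, given as (coefficient, measure) pairs."""
    out = {}
    for c, mu in weighted:
        for j, m in mu.items():
            out[j] = out.get(j, 0.0) + c * m
    return out

def uniform(n):
    """Uniform probability measure on {0, ..., n-1}; it spreads out as n grows."""
    return {j: 1.0 / n for j in range(n)}

def ball_mass(mu, centre, R):
    """Mass of the closed window {j : |j - centre| <= R}."""
    return sum(m for j, m in mu.items() if abs(j - centre) <= R)

def concentration(mu, R):
    """sup over integer centres of the mass of a radius-R window."""
    return max(ball_mass(mu, c, R) for c in range(min(mu), max(mu) + 1))

NU1 = {0: 0.5, 1: 0.5}   # first profile (a probability measure)
NU2 = {0: 1.0}           # second profile
C1, C2 = 0.5, 0.3        # masses of the two profiles; the remaining 0.2 disperses

def profiles(n):
    """The weighted, translated profiles c_k tau_{x_{k,n}} nu_k for k = 1, 2."""
    return [(C1, translate(NU1, n)), (C2, translate(NU2, -n))]

def mu_n(n):
    """Probability measure: two concentrated bubbles moving apart plus a spreading part."""
    return combine(*profiles(n), (0.2, uniform(n)))

def residual(n, K):
    """rho_{K,n} = mu_n minus the first K profiles."""
    return combine((1.0, mu_n(n)), *[(-c, nu) for c, nu in profiles(n)[:K]])

R = 2
table = {n: [concentration(residual(n, K), R) for K in (0, 1, 2)] for n in (10, 100, 400)}
for n, row in table.items():
    print(n, row)
```

As ${n}$ grows the three columns approach ${0.5}$ , ${0.3}$ and ${0}$ respectively, in line with the bound ${\sup_{k > K} c_k}$ on the residual (with ${c_1 = 0.5}$ , ${c_2 = 0.3}$ and no further profiles).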

We first give the standard proof of this theorem:
Proof: (Sketch) Suppose first that

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \mu_n(B(x,R)) = 0$

for all ${R}$ . Then we are done by setting all the ${c_k}$ equal to zero, and ${\rho_{K,n} = \mu_n}$ . So we may assume that we can find ${R}$ such that

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \mu_n(B(x,R)) = \alpha$

for some ${\alpha > 0}$ ; we may also assume that ${\alpha}$ is approximately maximal in the sense that

$\displaystyle \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \mu_n(B(x,R')) \leq 2\alpha$

(say) for all other radii ${R'}$ . By passing to a subsequence, we may thus find ${x_{1,n} \in {\bf R}^d}$ such that

$\displaystyle \lim_{n \rightarrow \infty} \mu_n(B(x_{1,n},R)) = \alpha.$

By passing to a further subsequence using the Helly selection principle (or the sequential Banach-Alaoglu theorem), we may assume that the translates ${\tau_{-x_{1,n}} \mu_n}$ converge in the vague topology to a limit of total mass at most ${1}$ and at least ${\alpha}$ , and which can be expressed as ${c_1 \nu_1}$ for some ${c_1 \geq \alpha}$ and a probability measure ${\nu_1}$ .
As ${\tau_{-x_{1,n}} \mu_n}$ converges vaguely to ${c_1 \nu_1}$ , we have

$\displaystyle \limsup_{n \rightarrow \infty} \tau_{-x_{1,n}} \mu_n( B( 0,R') \backslash B(0,R) ) \leq c_1 \nu_1( {\bf R}^d \backslash B(0,R/2) )$

for any ${0 < R < R'}$ . By making ${R'_n}$ grow sufficiently slowly to infinity with respect to ${n}$ , we may thus ensure that

$\displaystyle \limsup_{n \rightarrow \infty} \tau_{-x_{1,n}} \mu_n( B( 0,R'_n) \backslash B(0,R) ) \leq c_1 \nu_1( {\bf R}^d \backslash B(0,R/2) )$

for all integers ${R>0}$ . If we then set ${c_1 \tilde \nu_{1,n}}$ to be the restriction of ${\tau_{-x_{1,n}} \mu_n}$ to ${B(0,R'_n)}$ , we see that ${\tilde \nu_{1,n}}$ is tight, converges vaguely to ${\nu_1}$ , and has total mass converging to ${1}$ . We can thus split

$\displaystyle \mu_n = c_1 \tau_{x_{1,n}} \tilde \nu_{1,n} + \rho_{1,n}$

for some residual positive measure ${\rho_{1,n}}$ of total mass converging to ${1-c_1}$ , and such that ${\rho_{1,n}(B(x_{1,n},R)) \rightarrow 0}$ as ${n \rightarrow \infty}$ for any fixed ${R}$ . We can then iterate this procedure to obtain the claims of the theorem (after one last diagonalisation to combine together all the subsequences). $\Box$
Now we give the nonstandard proof. We take the ultralimit ${\mu := \lim_{n \rightarrow p} \mu_n}$ of the standard Borel probability measures ${\mu_n}$ on ${{\bf R}^d}$ , resulting in a nonstandard Borel probability measure. What, exactly, is a nonstandard Borel probability measure? A standard Borel probability measure, such as ${\mu_n}$ , is a map ${\mu_n: {\mathcal B} \rightarrow [0,1]}$ from the standard Borel ${\sigma}$ -algebra ${{\mathcal B}}$ to the unit interval ${[0,1]}$ which is countably additive and maps ${{\bf R}^d}$ to ${1}$ . Thus, the nonstandard Borel probability measure is a nonstandard map ${\mu: {}^* {\mathcal B} \rightarrow {}^* [0,1]}$ from the nonstandard Borel ${\sigma}$ -algebra (the collection of all ultralimits of standard Borel sets) to the nonstandard interval ${{}^* [0,1]}$ which is nonstandardly countably additive and maps ${{}^* {\bf R}^d}$ to ${1}$ . In particular, it is finitely additive.
There is an important subtlety here. The nonstandard Borel ${\sigma}$ -algebra is closed under nonstandard countable unions: if ${(E_n)_{n \in {}^* {\bf N}}}$ is a nonstandard countable sequence of nonstandard Borel sets (i.e. an ultralimit of standard countable sequences ${(E_{n,m})_{n \in {\bf N}}}$ of standard Borel sets), then ${\bigcup_n E_n}$ is also nonstandard Borel. This is not necessarily the case for external countable unions: if ${(E_n)_{n \in {\bf N}}}$ is an external countable sequence of nonstandard Borel sets, then ${\bigcup_n E_n}$ need not be nonstandard Borel. On the other hand, ${{}^* {\mathcal B}}$ is certainly still closed under finite unions and other finite Boolean operations, so it can be viewed (externally) as a Boolean algebra, at least.
Now we perform the Loeb measure construction (which was also introduced in the previous post). Consider the standard part ${\hbox{st}(\mu)}$ of ${\mu}$ ; this is a finitely additive map from ${{}^* {\mathcal B}}$ to ${[0,1]}$ . From the countable saturation property, one can verify that this map is a premeasure, and so (by the Hahn-Kolmogorov theorem) extends to a countably additive probability measure ${\tilde \mu}$ on the measure-theoretic completion ${\tilde {\mathcal B} := \overline{\langle {}^* {\mathcal B} \rangle}}$ of ${{}^* {\mathcal B}}$ .
The measure ${\tilde \mu}$ is a measure on ${{}^* {\bf R}^d}$ . We push it forward to the quotient space ${{}^* {\bf R}^d/O({\bf R}^d)}$ by the obvious quotient map ${\pi: {}^* {\bf R}^d \rightarrow {}^* {\bf R}^d/O({\bf R}^d)}$ to obtain a pushforward measure ${\pi_* \tilde \mu}$ on the pushforward ${\sigma}$ -algebra ${\pi_* \tilde {\mathcal B}}$ , which consists of all (external) subsets ${E}$ of ${{}^* {\bf R}^d/O({\bf R}^d)}$ whose preimage ${\pi^{-1}(E)}$ is measurable in ${\tilde {\mathcal B}}$ .
We claim that every point in ${{}^* {\bf R}^d/O({\bf R}^d)}$ is measurable in ${\pi_* \tilde {\mathcal B}}$ , or equivalently that every coset ${x+O({\bf R}^d)}$ in ${{}^* {\bf R}^d}$ is measurable in ${\tilde {\mathcal B}}$ . Indeed, this coset is the union of the countable family of (nonstandard) balls ${\{ y \in {}^* {\bf R}^d: |x-y| < n \}}$ for ${n \in {\bf N}}$ , each one of which is a nonstandard Borel set and thus measurable in ${\tilde {\mathcal B}}$ .
Because of this, we can decompose the measure ${\pi_* \tilde \mu}$ into pure point and continuous components, thus

$\displaystyle \pi_* \tilde \mu = \sum_{k} c_k \delta_{x_k+O({\bf R}^d)} + \rho$

where ${c_k}$ are standard positive reals, ${k}$ ranges over an at most countable set, ${x_k + O({\bf R}^d)}$ are disjoint cosets in ${{}^* {\bf R}^d/O({\bf R}^d)}$ , and ${\rho}$ is a finite measure on ${\pi_* \tilde {\mathcal B}}$ such that

$\displaystyle \sum_k c_k + \|\rho \| = 1$

and

$\displaystyle \rho(\{ x + O({\bf R}^d) \}) = 0$

for every coset ${x+O({\bf R}^d)}$ .
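As an illustration of this decomposition, here are two hypothetical examples in dimension ${d=1}$ (the sequences ${\mu_n}$ below are chosen purely for illustration, with ${\mu}$ their ultralimit and ${N}$ the ultralimit of ${n}$ ). If ${\mu_n = \frac{1}{2} \delta_0 + \frac{1}{2} \delta_n}$ , then ${\mu = \frac{1}{2} \delta_0 + \frac{1}{2} \delta_N}$ ; as ${N}$ is unbounded, the cosets ${O({\bf R})}$ and ${N + O({\bf R})}$ are disjoint, and

$\displaystyle \pi_* \tilde \mu = \frac{1}{2} \delta_{0+O({\bf R})} + \frac{1}{2} \delta_{N+O({\bf R})}, \quad \rho = 0.$

If instead ${\mu_n}$ is the uniform probability measure on ${[0,n]}$ , then ${\mu(B(x,R)) \leq 2R/N = o(1)}$ for every ${x \in {}^* {\bf R}}$ and every standard ${R}$ , so every coset has mass zero. In this case there are no atoms ${c_k}$ at all, and ${\rho = \pi_* \tilde \mu}$ carries the full mass ${\|\rho\| = 1}$ .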
Now we analyse the restriction of ${\tilde \mu}$ to a single coset ${x_k+O({\bf R}^d)}$ , which has total mass ${c_k}$ . For any standard continuous, compactly supported function ${f: {\bf R}^d \rightarrow {\bf R}}$ , one can form the integral

$\displaystyle \int_{x_k+O({\bf R}^d)} {}^* f(x -x_k)\ d\tilde \mu(x).$

This is a non-negative continuous linear functional, so by the Riesz representation theorem there exists a non-negative Radon measure ${\nu_k}$ on ${{\bf R}^d}$ such that

$\displaystyle \int_{x_k+O({\bf R}^d)} {}^* f(x -x_k)\ d\tilde \mu(x) = c_k \int_{{\bf R}^d} f(y)\ d\nu_k(y)$

for all such ${f}$ . As ${x_k+O({\bf R}^d)}$ has total mass ${c_k}$ , ${\nu_k}$ is a probability measure. From the definition of ${\tilde \mu}$ , we thus have

$\displaystyle \int_{{}^* {\bf R}^d} {}^* f(x -x_k)\ d\mu(x) = c_k \int_{{\bf R}^d} f(y)\ d\nu_k(y) + o(1)$

for all ${f}$ .
We have

$\displaystyle \mu(B(x_k,R)) \leq c_k+o(1)$

for every standard ${R}$ , and thus by the overspill principle there exists an unbounded ${R_k}$ for which

$\displaystyle \mu(B(x_k,R_k)) \leq c_k+o(1);$

since ${\tilde \mu(x_k + O({\bf R}^d)) = c_k}$ , we thus have

$\displaystyle \mu(B(x_k,R_k)) = c_k+o(1).$

If we set ${c_k \tilde \nu_k}$ to be the restriction of ${\tau_{-x_k} \mu}$ to ${B(0,R_k)}$ , we thus see that

$\displaystyle \int_{{}^* {\bf R}^d} {}^* f(y)\ d\tilde \nu_k(y) = \int_{{\bf R}^d} f(y)\ d\nu_k(y) + o(1)$

for all test functions ${f}$ . Writing ${\tilde \nu_k}$ as the ultralimit of probability measures ${\tilde \nu_{k,n}}$ , we thus see (upon passing to a subsequence) that ${\tilde \nu_{k,n}}$ converges vaguely to the probability measure ${\nu_k}$ , and is in particular tight.
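The overspill step used to select ${R_k}$ , together with the matching lower bound, can be spelled out as follows (a sketch; restricting ${R}$ to positive natural numbers and using the error term ${1/R}$ is one convenient choice, not necessarily the one intended above). By countable additivity of ${\tilde \mu}$ ,

$\displaystyle c_k = \tilde \mu(x_k + O({\bf R}^d)) = \lim_{n \rightarrow \infty} \hbox{st}( \mu( B(x_k,n) ) ),$

where the limit is over standard ${n}$ and is monotone increasing, so that ${\mu(B(x_k,R)) \leq c_k + o(1)}$ for every standard ${R}$ . The set

$\displaystyle \{ R \in {}^* {\bf N}: R \geq 1; \ \mu(B(x_k,R)) \leq c_k + 1/R \}$

is a nonstandard set that contains every standard positive natural number, hence by overspill it contains some unbounded ${R_k}$ , for which ${1/R_k = o(1)}$ . Conversely, ${B(x_k,R_k)}$ contains ${B(x_k,n)}$ for every standard ${n}$ , so ${\hbox{st}(\mu(B(x_k,R_k))) \geq \hbox{st}(\mu(B(x_k,n)))}$ for all such ${n}$ ; sending ${n \rightarrow \infty}$ gives ${\mu(B(x_k,R_k)) \geq c_k - o(1)}$ .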
For any standard ${K \geq 1}$ , we can write

$\displaystyle \mu = \sum_{k=1}^K c_k \tau_{x_k} \tilde \nu_k + \rho_K$

where ${\rho_K}$ is a finite measure. Letting ${\tilde \rho_K}$ be the Loeb extension of the standard part of ${\rho_K}$ , we see that ${\tilde \rho_K}$ assigns zero mass to ${x_k+O({\bf R}^d)}$ for ${k \leq K}$ and assigns a mass of at most ${\sup_{k > K} c_k}$ to any other coset of ${O({\bf R}^d)}$ . This implies that

$\displaystyle \tilde \rho_K( B(x,R) ) \leq \sup_{k > K} c_k + o(1)$

for any standard ${R}$ . Expressing ${\rho_K}$ as an ultralimit of ${\rho_{K,n}}$ , we then obtain the claim.
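Two small steps are implicit here (a sketch, under the assumption, not visible in this part of the argument, that the claim asks for the residual mass in balls of fixed radius to become negligible as ${K \rightarrow \infty}$ ). First, for standard ${R}$ the ball ${B(x,R)}$ lies inside the single coset ${x + O({\bf R}^d)}$ , which is why the coset bound controls the mass of balls. Second, since ${\sum_k c_k \leq 1}$ , the bound itself tends to zero:

$\displaystyle \sup_{k>K} c_k \leq \sum_{k > K} c_k \rightarrow 0 \hbox{ as } K \rightarrow \infty.$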

21 comments

29 November, 2010 at 1:43 pm

Anonymous

Prof. Tao, do you intend to teach a course in nonstandard analysis?

29 November, 2010 at 4:24 pm

Terence Tao

The short answer is no (any more than I would teach a course on, say, the construction of number systems; it is too foundational a subject for the type of course topics I have in mind). But there may be occasion to develop some nonstandard arguments within a broader course topic. For instance, in my 254B course on higher order Fourier analysis last year, I mentioned the nonstandard approach to equidistribution, for instance in Notes 1 of that course. Somewhat relatedly, I also mentioned the ultrafilter approach to Ramsey theory in my 254A course on ergodic theory from the previous year (see e.g. Notes 3 from that course).

29 November, 2010 at 6:10 pm

Anonymous

The course on ergodic theory should be “254A” :) [Corrected, thanks – T.]

30 November, 2010 at 4:48 am

Anonymous

Prof. Tao,

Which courses are you planning to teach in near future? I hope you teach a grad level complex analysis and probability theory…

Thanks

29 November, 2010 at 5:12 pm

notedscholar

I wonder if you have read my work on Infinity? I do not want to link which would be inappropriate, but just asking.

Cheers,
NS

30 November, 2010 at 11:56 am

Ulrich Kohlenbach

Dear Terry,

many thanks for your interesting posting.

In connection with your treatment of the sequential weak compactness in Hilbert space I would like to mention three recent papers of mine:

1) In my paper “Goedel functional interpretation and weak compactness”
(available at http://www.mathematik.tu-darmstadt.de/~kohlenbach/weakcompactness-els.pdf )
I carry out (a variant version) of the Goedel “Dialectica” interpretation of the standard proof for the weak compactness as sketched in your posting. This results in a certain effective functional Omega* that comprises the computational content of that principle in the sense that it is precisely Omega* that is needed to extract bounds from proofs of combinatorial statements that use weak compactness. Omega* is primitive recursive in Spector’s bar recursion of lowest type. This is optimal in the sense that already the usual Bolzano-Weierstrass principle [0,1] requires this. In particular, Omega* locally stays within Goedel’s primitive recursive functionals of finite type. The construction also uses energy increment ideas. It would be interesting to see whether your 2nd proof results in a simpler bound that e.g. may need only one use of bar recursion (corresponding to the use of the Bolzano-Weierstrass principle) whereas I have a 2nd use corresponding to the Riesz representation theorem.

2) In “A uniform quantitative version of sequential weak compactness
and Baillon’s nonlinear ergodic theorem” (also available from the above
site) I use Omega* to extract an explicit uniform bound on a metastable
(in your sense) version of Baillon’s nonlinear ergodic theorem.

3) Interestingly, when weak compactness is used to prove a strong convergence result, the quantitative analysis in the spirit of Goedel’s
interpretation often seems to be able to eliminate weak compactness
altogether: see my recent paper:
“On quantitative versions of theorems due to F.E. Browder and R. Wittmann”, Advances in Mathematics 226, pp. 2764-2795 (2011).

With best regards,
Ulrich

30 November, 2010 at 9:18 pm

Bright, YU

Hi, Terry. I wanna ask a question not directly related to this post, but somehow I find it interesting and maybe you have an answer to it.

Suppose u is a harmonic function on the Euclidean plane. Given (x,y), by the Mean Value Property, u(x,y) should equal the mean value of u along a circle of radius r centred at (x,y).
By the mean value theorem for integrals, u should attain this value at some point(s), say (m,n), on the circle.
So my question is: what is the set of such points if we pick one (m,n) on each circle and let r range from 0 to infinity?

Looking forward to your reply.

Bright, YU

2 December, 2010 at 9:24 am

J.P. McCarthy

Bright,

Your question is a bit on the general side.

For example, for a constant function, the set of points ( $(m,n)_r$ ?) is the entire circle for all $r$ .

For a linear function, such as $f(x,y)=x+y$ , the set of points $\{(m,n)_r:r\geq 0\}$ , if we choose for example $(x,y)=(1,1)$ , is going to correspond to half of the circle (a picture will show you which half).

If however you pick $(x,y)=(0,0)$ the points are always going to be those corresponding to $\theta =3\pi/4,-\pi/4$ .

J.P.

4 December, 2010 at 4:51 pm

iori1986iori

Big brother, you are so formidable! big brother, u really smart

6 December, 2010 at 10:14 pm

katz

I have a naive question. Is this related to concentration of measure (around a median) a la Paul Levy and Vitali Milman?

7 December, 2010 at 10:44 am

Terence Tao

As far as I know there is no connection. Concentration of measure is ultimately coming from the law of large numbers, whereas the type of concentration on subsequences seen here is coming from things like the Bolzano-Weierstrass theorem. In both cases there is convergence to a limit, but other than that there appears to be no further relationship.

13 December, 2010 at 1:08 am

Igor Carron

Terry,

I have checked with both my blogging software (in my Blog list) and Google reader and it looks like your RSS feed has stopped as of December 6th. The last entry I get has the title: “Strongly dense fre”. I just re-subscribed to your feed and get the same result. If I am the only one to get this then sorry, if not…

Cheers,

Igor.

16 December, 2010 at 12:19 am

Qiaochu Yuan

This happened to me as well. I can’t imagine what would cause this. Perhaps the feed has a maximum size which has been exceeded?

17 December, 2010 at 2:06 am

Igor Carron

It’s fixed now. Thanks.

15 December, 2010 at 6:01 am

Avery Carr

Prof. Tao,

Can Concentration Compactness be applied to the Invariant Subspace Problem for Hilbert Spaces? Thank you.

Avery Carr

9 April, 2011 at 7:04 pm

peter p.

Apparently, P.L. Lions makes the comment in his 1984 paper that “This crucial lemma is proved with the help of the notion of the concentration function of a measure -introduced by P. Levy [14]”. He was referring to the concentration compactness lemma, I think, which occupies a big portion of his paper. Maybe worth a look.

4 August, 2011 at 6:53 pm

Localisation and compactness properties of the Navier-Stokes global regularity problem « What’s new

[…] symmetry that is available for the space . (Concentration compactness is discussed in these previous blog posts.) One then has to deal with sequences of data that are not strongly convergent, but are […]

15 October, 2011 at 10:58 am

254A, Notes 6: Ultraproducts as a bridge between hard analysis and soft analysis « What’s new

[…] two constructions are at least partially interchangeable in this setting. (See also these previous posts for the use of ultralimits as a substitute for topological limits.) In the theory of approximate […]

25 October, 2012 at 10:10 am

Walsh’s ergodic theorem, metastability, and external Cauchy convergence « What’s new

[…] us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space , which for us will be a […]

7 December, 2013 at 4:06 pm

Ultraproducts as a Bridge Between Discrete and Continuous Analysis | What's new

[…] analysis can be used as a framework to describe the theory of concentration compactness; see this previous blog post for further discussion. Finally, if one starts with a finitely generated group with a word metric […]

4 January, 2021 at 6:46 am

Go 2 It

Some typos:

in the middle of section 2, “do not lie in the same coset of ${}^* H$ “: should be $O(G)$

in the nonstandard proof of theorem 3, “nonstandard interval $[0,1]$ “: should be ${}^* [0,1]$

following “obvious quotient map”, missing “ $O(\mathbb{R}^d) \rightarrow$ ”

[Corrected, thanks – T.]
