One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.
As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.
Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and suprema, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post). The gap between the two narrows, however, as the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.

— 1. Weak sequential compactness in a Hilbert space —

Before turning to concentration compactness, we will warm up with the simpler situation of weak sequential compactness in a Hilbert space. For the sake of notational simplicity we shall only consider complex Hilbert spaces, although all the discussion here works equally well for real Hilbert spaces.
Recall that a bounded sequence {x_n} of vectors in a Hilbert space {H} is said to converge weakly to a limit {x} if one has {\langle x_n,y\rangle \rightarrow \langle x,y \rangle} for all {y \in H}. We have the following basic theorem:

Theorem 1 (Sequential Banach-Alaoglu theorem) Every bounded sequence {x_n} of vectors in a Hilbert space {H} has a weakly convergent subsequence.
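As a concrete sanity check (not part of the argument), one can look at the standard basis vectors {e_n} of {\ell^2({\bf N})}: they are bounded and any two of them are at distance {\sqrt{2}}, so no subsequence converges strongly, yet {\langle e_n, y \rangle \rightarrow 0} for each fixed {y}. The following Python sketch illustrates this on a finite truncation. The truncation length `DIM`, the test vector `y`, and the helper names are assumptions made purely for illustration, and only a single test vector is examined.

```python
import math

DIM = 2000  # illustrative truncation of l^2(N) to finitely many coordinates

def basis(n):
    """Standard basis vector e_n, stored sparsely as {index: value}."""
    return {n: 1.0}

def inner(x, y):
    """Real inner product <x, y> of two sparse vectors."""
    return sum(v * y.get(i, 0.0) for i, v in x.items())

def dist(x, y):
    """Norm distance ||x - y||."""
    keys = set(x) | set(y)
    return math.sqrt(sum((x.get(i, 0.0) - y.get(i, 0.0)) ** 2 for i in keys))

# A fixed square-summable test vector y with coordinates 1/(i+1).
y = {i: 1.0 / (i + 1) for i in range(DIM)}

# <e_n, y> = 1/(n+1) tends to zero: consistent with weak convergence to 0.
pairings = [inner(basis(n), y) for n in (1, 10, 100, 1000)]

# ||e_n - e_m|| = sqrt(2) for n != m: no subsequence is Cauchy in norm.
gaps = [dist(basis(n), basis(m)) for n, m in ((1, 2), (10, 500), (100, 1000))]

print(pairings)
print(gaps)
```

In this toy case the weak limit promised by Theorem 1 is simply the zero vector.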

The usual (standard analysis) proof of this theorem runs as follows:
Proof: (Sketch) By restricting to the closed span of the {x_n}, we may assume without loss of generality that {H} is separable. Letting {y_1, y_2, \ldots} be a dense subset of {H}, we may apply the Bolzano-Weierstrass theorem iteratively, followed by the Arzelà-Ascoli diagonalisation argument, to find a subsequence {x_{n_j}} for which {\langle x_{n_j}, y_m \rangle} converges to a limit for each {m}. Using the boundedness of the {x_{n_j}} and a density argument, we conclude that {\langle x_{n_j}, y \rangle} converges to a limit for each {y}; applying the Riesz representation theorem for Hilbert spaces, the limit takes the form {\langle x, y \rangle} for some {x}, and the claim follows. \Box
However, this proof does not extend easily to the concentration compactness setting, when there is also a group action. For this, we need a more “algorithmic” proof based on the “energy increment method”. We give one such (standard analysis) proof as follows:
Proof: As {x_n} is bounded, we have some bound of the form

\displaystyle  \limsup_{n \rightarrow \infty} \|x_n\|^2 \leq E

for some finite {E}. Of course, this bound would persist if we passed from {x_n} to a subsequence.
Suppose for contradiction that no subsequence of {x_n} was weakly convergent. In particular, {x_n} itself was not weakly convergent, which means that there exists {\phi_1 \in H} for which {\langle x_n,\phi_1\rangle} did not converge. We can take {\phi_1} to be a unit vector. Applying the Bolzano-Weierstrass theorem, we can pass to a subsequence (which, by abuse of notation, we continue to call {x_n}) in which {\langle x_n,\phi_1\rangle} converged to some non-zero limit {c_1}. We can choose {c_1} to be nearly maximal in magnitude among all possible choices of subsequence and of {\phi_1}; in particular, we have

\displaystyle \limsup_{n \rightarrow \infty} |\langle x_n, y \rangle| \leq 2 |c_1|

(say) for all other choices of unit vector {y}.
We may now decompose

\displaystyle  x_{n} = c_1 \phi_1 + x'_{1,n} + w_{1,n}

where {x'_{1,n}} is orthogonal to {\phi_1} and {w_{1,n}} converges strongly to zero. From Pythagoras' theorem we see that {x'_{1,n}} asymptotically has strictly less energy than {E}:

\displaystyle  \limsup_{n \rightarrow \infty} \|x'_{1,n}\|^2 \leq E - |c_1|^2.

If {x'_{1,n}} was weakly convergent, then {x_n} would be too, so we may assume that it is not weakly convergent. Arguing as before, we may find a unit vector {\phi_2} (which we can take to be orthogonal to {\phi_1}) and a constant {c_2} such that (after passing to a subsequence, and abusing notation once more) one had a decomposition

\displaystyle  x'_{1,n} = c_2 \phi_2 + x'_{2,n} + w_{2,n}

in which {x'_{2,n}} is orthogonal to both {\phi_1,\phi_2} and {w_{2,n}} converges strongly to zero, and such that

\displaystyle \limsup_{n \rightarrow \infty} |\langle x'_{1,n}, y \rangle| \leq 2 |c_2|

for all unit vectors {y}. From Pythagoras, we have

\displaystyle  \limsup_{n \rightarrow \infty} \|x'_{2,n}\|^2 \leq E - |c_1|^2 - |c_2|^2.

We iterate this process to obtain an orthonormal sequence {\phi_1,\phi_2,\ldots} and constants {c_1,c_2,\ldots} obeying the Bessel inequality

\displaystyle  \sum_{k=1}^\infty |c_k|^2 \leq E

(which, in particular, implies that the {c_k} go to zero as {k \rightarrow \infty}) such that, for each {k}, one has a subsequence of the {x_n} for which one has a decomposition of the form

\displaystyle  x_n = \sum_{i=1}^{k} c_i \phi_i + x'_{k,n} + w_{k,n}

where {w_{k,n}} converges strongly to zero, and for which

\displaystyle \limsup_{n \rightarrow \infty} |\langle x'_{k,n}, y \rangle| \leq 2 |c_{k+1}|

for all unit vectors {y}. The series {\sum_{i=1}^\infty c_i \phi_i} then converges (conditionally in the strong topology) to a limit {x}, and by diagonalising all the subsequences we obtain a final subsequence {x_{n_j}} which converges weakly to {x}. \Box
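To see the first step of this energy increment argument in a toy case, here is a small Python sketch. It assumes the hypothetical sequence {x_n = c_1 e_0 + \frac{1}{n} e_1 + e_n} in {\ell^2({\bf N})}, with profile {\phi_1 = e_0}; the value `C1 = 0.6` and all function names are illustrative choices and not part of the proof. The sketch checks that removing the component {c_1 \phi_1} lowers the energy by {|c_1|^2}, as Pythagoras' theorem predicts.

```python
C1 = 0.6  # illustrative coefficient of the profile phi_1 = e_0

def x(n):
    """x_n = C1*e_0 + (1/n)*e_1 + e_n for n >= 2, as a sparse vector."""
    return {0: C1, 1: 1.0 / n, n: 1.0}

def energy(v):
    """Squared norm ||v||^2 of a sparse vector."""
    return sum(t * t for t in v.values())

def split(n):
    """Return (c, x_prime, w) with x_n = c*phi_1 + x_prime + w."""
    v = x(n)
    c = v[0]              # the pairing <x_n, phi_1>
    w = {1: v[1]}         # strongly null part: ||w|| = 1/n
    x_prime = {n: v[n]}   # orthogonal to phi_1, carries the remaining energy
    return c, x_prime, w

for n in (2, 10, 1000):
    c, x_prime, w = split(n)
    # The three pieces are mutually orthogonal, so the energies add up.
    print(n, energy(x(n)), c * c + energy(x_prime) + energy(w))
```

Here {E = |c_1|^2 + 1}, and the remainder {x'_{1,n} = e_n} has energy exactly {E - |c_1|^2}.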
Now we give a third proof, which is a nonstandard analysis proof that is analogous to the second standard analysis proof given above.
The basics of nonstandard analysis are reviewed in this previous blog post (and see also this later post on ultralimit analysis, as well as the most recent post on this topic). Very briefly, we will need to fix a non-principal ultrafilter {p \in \beta {\bf N} \backslash {\bf N}} on the natural numbers. Once one fixes this ultrafilter, one can define the ultralimit {\lim_{n \rightarrow p} x_n} of any sequence of standard objects {x_n}, defined as the equivalence class of all sequences {(y_n)_{n \in {\bf N}}} such that {\{ n \in {\bf N}: x_n = y_n \} \in p}. We then define the ultrapower {{}^* X} of a standard set {X} to be the collection of all ultralimits {\lim_{n \rightarrow p} x_n} of sequences {x_n} in {X}. We can interpret {{}^* X} as the space of all nonstandard elements of {X}, with the standard space {X} being embedded in the nonstandard one {{}^* X} by identifying {x} with its nonstandard counterpart {{}^* x := \lim_{n \rightarrow p} x}. One can extend all (first-order) structures on {X} to {{}^* X} in the obvious manner, and a famous theorem of Los asserts that all first-order sentences that are true about a standard space {X}, will also be true about the nonstandard space {{}^* X}. Thus, for instance, the ultrapower {{}^* H} of a standard Hilbert space {H} over the standard complex numbers {{\bf C}} will be a nonstandard Hilbert space {{}^* H} over the nonstandard reals {{}^* {\bf R}} or the nonstandard complex numbers {{}^* {\bf C}}. It has a nonstandard inner product {\langle, \rangle: {}^* H \times {}^* H \rightarrow {}^* {\bf C}} instead of a standard one, which obeys the nonstandard analogue of the Hilbert space axioms. In particular, it is complete in the nonstandard sense: any nonstandard Cauchy sequence {(x_n)_{n \in {}^* {\bf N}}} of nonstandard vectors {x_n \in {}^* H} indexed by the nonstandard natural numbers {{}^* {\bf N}} will converge (again, in the nonstandard sense) to a limit {x \in {}^* H}.
The ultrapower {{}^* H} – the space of ultralimits {\lim_{n \rightarrow p} x_n} of arbitrary sequences {x_n} in {H} – turns out to be too large and unwieldy to be helpful for us. We will work instead with a more tractable subquotient, defined as follows. Let {O(H)} be the space of ultralimits {\lim_{n \rightarrow p} x_n} of bounded sequences {x_n \in H}, and let {o(H)} be the space of ultralimits {\lim_{n \rightarrow p} x_n} of sequences {x_n \in H} that converge to zero. It is clear that {o(H)}, {O(H)} are vector spaces over the standard complex numbers {{\bf C}}, with {o(H)} being a subspace of {O(H)}. (The space {o(H)} is also known as the monad of the origin of {H}.) We define the quotient space {\tilde H := O(H) / o(H)}, which is then also a vector space over {{\bf C}}. One easily verifies that {H} is a subspace of {O(H)} that is disjoint from {o(H)}, so we can embed {H} as a subspace of {\tilde H}.

Remark 2 When {H} is finite dimensional, the Bolzano-Weierstrass theorem (or more precisely, the proof of this theorem) shows that {H = \tilde H}. For infinite-dimensional spaces, though, {\tilde H} is larger than {H}, basically because there exist bounded sequences in {H} with no convergent subsequences. Thus we can view the quotient {\tilde H/H} as measuring the failure of the Bolzano-Weierstrass theorem (a sort of “Bolzano-Weierstrass cohomology”, if you will).

Now we place a Hilbert space structure on {\tilde H}. Observe that if {x = \lim_{n \rightarrow p} x_n} and {y = \lim_{n \rightarrow p} y_n} are elements of {O(H)} (so that {x_n, y_n} are bounded), then the nonstandard inner product {\langle x,y\rangle = \lim_{n \rightarrow p} \langle x_n,y_n\rangle} is a nonstandard complex number which is bounded (i.e. it lies in {O({\bf C})}). Since {{\bf C} = O({\bf C})/o({\bf C})}, we can thus extract a standard part {\hbox{st}\langle x,y\rangle}, defined as the unique standard complex number such that

\displaystyle \langle x,y\rangle = \hbox{st}\langle x,y\rangle + o(1)

where {o(1)} denotes an infinitesimal, i.e. a non-standard quantity whose magnitude is less than any standard positive real {\varepsilon > 0}. From the Cauchy-Schwarz inequality we see that if we modify either {x} or {y} by an element of {o(H)}, then the standard part {\hbox{st}\langle x,y \rangle} does not change. Thus, we see that the map {x,y \mapsto \hbox{st}\langle x,y\rangle} on {O(H)} descends to a map {x,y \mapsto \langle x,y \rangle} on {\tilde H}. One easily checks that this map is a standard Hermitian inner product on {\tilde H} that extends the one on the subspace {H}. (If one prefers to think in terms of commutative diagrams, one can think of the inner product as a bilinear map from the short exact sequence {0 \rightarrow o(H) \rightarrow O(H) \rightarrow \tilde H \rightarrow 0} to the short exact sequence {0 \rightarrow o({\bf C}) \rightarrow O({\bf C}) \rightarrow {\bf C} \rightarrow 0}.) Furthermore, by using the countable saturation (or Bolzano-Weierstrass) property of nonstandard analysis (see previous post), we can also show that {\tilde H} is complete with respect to this inner product; thus {\tilde H} is a standard Hilbert space that contains {H} as a subspace. (One can view {\tilde H} as a sort of nonstandard completion of {H}, in a manner somewhat analogous to how the Stone-Cech compactification {\beta X} of a space can be viewed as a topological completion of {X}. This is of course consistent with the philosophy of the previous post.)
After all this setup, we can now give the third proof of Theorem 1:
Proof: Let {z := \lim_{n \rightarrow p} x_n} be the ultralimit of the {x_n}; then {z} is an element of {O(H)}. Let {\tilde z} be the image of {z} in {\tilde H}, and let {x} be the orthogonal projection of {\tilde z} to {H}. We claim that a subsequence of {x_n} converges weakly to {x}.
For any {y \in H}, {\tilde z - x} is orthogonal to {y}, and thus {\langle z-x, y \rangle = o(1)}. In other words,

\displaystyle  \lim_{n \rightarrow p} \langle x_n, y \rangle = \langle x, y \rangle + o(1) \ \ \ \ \ (1)

for all {y \in H}. This is already the nonstandard analogue of weak convergence along a subsequence, but we can get to weak convergence itself with only a little more argument. Indeed, from (1) we can easily construct a subsequence {x_{n_j}} such that

\displaystyle  |\langle x_{n_j}, x_i \rangle - \langle x, x_i \rangle| \leq \frac{1}{j}

and

\displaystyle  |\langle x_{n_j}, x \rangle - \langle x, x \rangle| \leq \frac{1}{j}

for all {1 \leq i \leq j}, which implies that

\displaystyle  \lim_{j \rightarrow \infty} \langle x_{n_j}, y \rangle = \langle x,y \rangle

whenever {y} is a finite linear combination of the {x_i} and {x}. Applying a density argument using the boundedness of the {x_n}, this is then true for all {y} in the closed span of the {x_i} and {x}; it is also clearly true for {y} in the orthogonal complement, and the claim follows. \Box
Observe that in contrast with the first two proofs, the third proof gave a “canonical” choice for the subsequence limit {x}. This is ultimately because the ultrafilter {p} already “made all the choices beforehand”, in some sense.
Observe also that we used the existence of orthogonal projections in Hilbert spaces in the above proof. If one unpacks the usual proof that these projections exist, one will find an energy increment argument that is not dissimilar to that used in the second proof of Theorem 1. Thus we see that the somewhat intricate energy increment argument from that second proof has in some sense been encapsulated into a general-purpose package in the nonstandard setting, namely the existence of orthogonal projections.

— 2. Concentration compactness for unitary group actions —

Now we generalise the sequential Banach-Alaoglu theorem to allow for a group of symmetries. The setup is now that of a (standard) complex Hilbert space {H}, together with a locally compact group {G} acting unitarily on {H} in a jointly continuous manner, thus the map {(g,x) \mapsto gx} is jointly continuous from {G \times H} to {H} (or equivalently, the representation map from {G} to {U(H)} is continuous if we give {U(H)} the strong operator topology). We also assume that {G} is a group of dislocations, which means that {g_n x} converges weakly to zero in {H} whenever {x \in H} and {g_n} goes to infinity in {G} (which means that {g_n} eventually escapes any given compact subset of {G}). A typical example of such a group is the translation action {h: f(\cdot) \mapsto f(\cdot-h)} of {{\bf R}^d} on {L^2({\bf R}^d)}; another example is the scaling action {\lambda: f(\cdot) \mapsto \frac{1}{\lambda^{d/2}} f(\frac{\cdot}{\lambda})} of {{\bf R}^+} on {L^2({\bf R}^d)}. (One can also combine these two actions to give an action of the semidirect product {{\bf R}^+ \ltimes {\bf R}^d} on {L^2({\bf R}^d)}.)
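The dislocation property can be observed numerically in a discrete analogue of the translation action. The Python sketch below assumes the shift action of {{\bf Z}} on {\ell^2({\bf Z})} as a stand-in for translations on {L^2({\bf R}^d)}; the profile `f`, the window half-width `W` and the helper names are illustrative assumptions. It checks that shifts preserve the norm while the pairing of the shifted profile with one fixed test vector (here `f` itself) decays as the shift escapes to infinity.

```python
import math

W = 30  # illustrative half-width of the window on which f is supported

# f(i) = 2^{-|i|} on [-W, W]: a discrete stand-in for a bump function.
f = {i: 2.0 ** (-abs(i)) for i in range(-W, W + 1)}

def shift(v, h):
    """Translation action: (shift(v, h))(i) = v(i - h)."""
    return {i + h: t for i, t in v.items()}

def inner(u, v):
    """Real inner product of two finitely supported sequences."""
    return sum(t * v.get(i, 0.0) for i, t in u.items())

def norm(v):
    return math.sqrt(inner(v, v))

shifts = (0, 5, 20, 40)
norms = [norm(shift(f, h)) for h in shifts]         # constant: the action is unitary
pairings = [inner(shift(f, h), f) for h in shifts]  # decays as h goes to infinity

print(norms)
print(pairings)
```

This only tests a single pairing, of course; weak convergence to zero requires the decay for every fixed test vector, which follows here by a density argument.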
The basic theorem here is

Theorem 3 (Profile decomposition) Let {G, H} be as above. Let {x_n} be a bounded sequence in {H} obeying the energy bound

\displaystyle  \limsup_{n \rightarrow \infty} \|x_n\|^2 \leq E.

Then, after passing to a subsequence, one can find a sequence {\phi_1, \phi_2, \ldots \in H} with the Bessel inequality

\displaystyle  \sum_{k=1}^\infty \|\phi_k\|^2 \leq E

and group elements {g_{k,n} \in G} for {k,n \in {\bf N}} such that

\displaystyle  g_{k',n}^{-1} g_{k,n} \rightarrow \infty \hbox{ as } n \rightarrow \infty

whenever {k \neq k'} and {\phi_k, \phi_{k'}} are non-zero, such that for each {K \in {\bf N}} one has the decomposition

\displaystyle  x_n = \sum_{k=1}^K g_{k,n} \phi_k + w_{K,n}

such that

\displaystyle  \limsup_{n \rightarrow \infty} \|w_{K,n}\|^2 \leq E - \sum_{k=1}^K \|\phi_k\|^2

and

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{g \in G} |\langle g^{-1} w_{K,n}, y\rangle|^2 \leq \sum_{k=K+1}^\infty \|\phi_k\|^2

for all unit vectors {y}, and such that {g_{k,n}^{-1} w_{K,n}} converges weakly to zero for every {1 \leq k \leq K}.

Note that Theorem 1 is the case when {G} is trivial.
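A minimal example of a profile decomposition with two profiles may help fix the notation. The Python sketch below again assumes the shift action of {{\bf Z}} on {\ell^2({\bf Z})}, with hypothetical finitely supported profiles `phi1`, `phi2` and the sequence {x_n = \phi_1 + g_{2,n} \phi_2}, where {g_{1,n}} is the identity and {g_{2,n}} is the shift by {n}. All names and numerical values are illustrative.

```python
def shift(v, h):
    """Translation action on sequences indexed by Z: (shift(v, h))(i) = v(i - h)."""
    return {i + h: t for i, t in v.items()}

def add(u, v):
    """Sum of two finitely supported sequences."""
    out = dict(u)
    for i, t in v.items():
        out[i] = out.get(i, 0.0) + t
    return out

def inner(u, v):
    """Real inner product of two finitely supported sequences."""
    return sum(t * v.get(i, 0.0) for i, t in u.items())

# Two illustrative finitely supported profiles.
phi1 = {-1: 1.0, 0: 2.0, 1: 1.0}
phi2 = {0: 1.0, 1: -1.0}

def x(n):
    """x_n = g_{1,n} phi1 + g_{2,n} phi2 with g_{1,n} = identity, g_{2,n} = shift by n."""
    return add(phi1, shift(phi2, n))

y = {0: 1.0, 1: 0.5}  # an arbitrary fixed test vector

n = 50
energy = inner(x(n), x(n))
pair1 = inner(x(n), y)             # pairing after undoing g_{1,n}: sees phi1 only
pair2 = inner(shift(x(n), -n), y)  # pairing after undoing g_{2,n}: sees phi2 only

print(energy, inner(phi1, phi1) + inner(phi2, phi2))
print(pair1, inner(phi1, y), pair2, inner(phi2, y))
```

Once {n} exceeds the supports involved, the two profiles no longer interact: the energy splits as {\|\phi_1\|^2 + \|\phi_2\|^2}, and the remainder {w_{2,n}} vanishes identically in this toy case.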
There is a version of the conclusion available in which {K} can be taken to be infinite, and also one can generalise {G} to be a more general object than a group by modifying the hypotheses somewhat; see this paper of Schindler and Tintarev. The version with finite {K} is slightly more convenient though for applications to nonlinear dispersive and wave equations; see these lecture notes of Killip and Visan for some applications of this type of decomposition. In order for this theorem to be useful for applications, one needs to exploit some sort of inverse theorem that controls other norms of a vector {w} in terms of expressions such as {\sup_{g \in G} |\langle gw, y \rangle|}; these theorems tend to require “hard” harmonic analysis and cannot be established purely by such “soft” analysis tools as nonstandard analysis.
One can adapt the second proof of Theorem 1 to give a standard analysis proof of Theorem 3:
Proof: (Sketch) Applying Theorem 1 we can (after passing to a subsequence) find group elements {g_{1,n}} such that {g_{1,n}^{-1} x_n} converges weakly to a limit {\phi_1 \in H}, which we can choose to be nearly maximal in the sense that

\displaystyle  \| \phi'_1 \| \leq 2 \| \phi_1 \|

(say) whenever {\phi'_1} is the weak limit of {g_{n_j}^{-1} x_{n_j}} for some subsequence {x_{n_j}} and some collection of group elements {g_{n_j}}. In particular, this implies (from further application of Theorem 1, and an argument by contradiction) that

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{g \in G} |\langle g^{-1} x_n, y \rangle| \leq 2 \| \phi_1 \|

for any unit vector {y}.
We may now decompose

\displaystyle  x_{n} = g_{1,n} \phi_1 + w_{1,n}

where {g_{1,n}^{-1} w_{1,n}} converges weakly to zero. From Pythagoras' theorem we see that {w_{1,n}} asymptotically has strictly less energy than {E}:

\displaystyle  \limsup_{n \rightarrow \infty} \|w_{1,n}\|^2 \leq E - \|\phi_1\|^2.

We then repeat the argument, passing to a further subsequence and finding group elements {g_{2,n}} such that {g_{2,n}^{-1} w_{1,n}} converges weakly to {\phi_2 \in H}, with

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{g \in G} |\langle g^{-1} w_{1,n}, y \rangle| \leq 2 \| \phi_2 \|

for any unit vector {y}.
Note that {g_{1,n}^{-1} w_{1,n}} converges weakly to zero, while {g_{2,n}^{-1} w_{1,n}} converges weakly to {\phi_2}. If {\phi_2} is non-zero, this implies that {g_{1,n}^{-1} g_{2,n}} must go to infinity (otherwise it has a convergent subsequence, and this soon leads to a contradiction).
If one iterates the above construction and passes to a diagonal subsequence one obtains the claim. \Box
Now we give the nonstandard analysis proof. As before, we introduce the short exact sequence of Hilbert spaces:

\displaystyle  0 \rightarrow o(H) \rightarrow O(H) \rightarrow \tilde H \rightarrow 0.

We will also need an analogous short exact sequence of groups

\displaystyle  0 \rightarrow o(G) \rightarrow O(G) \rightarrow G \rightarrow 0

where {O(G) \leq {}^* G} is the space of ultralimits {\lim_{n \rightarrow p} g_n} of sequences {g_n} in {G} that lie in a compact subset of {G}, and {o(G) \leq O(G)} is the space of ultralimits {\lim_{n \rightarrow p} g_n} of sequences {g_n} that converge to the identity element (i.e. {o(G)} is the monad of the group identity). One easily verifies that {o(G)} is a normal subgroup of {O(G)}, and that the quotient is isomorphic to {G}. (Indeed, {O(G)} can be expressed as a semi-direct product {G \ltimes o(G)}, though we will not need this fact here.)
The group {{}^* G} acts unitarily on {{}^* H}, and so preserves both {o(H)} and {O(H)}. As such, it also acts unitarily on {\tilde H}. The induced action of the subgroup {o(G)} is trivial; and the induced action of the subgroup {O(G)} preserves {H}.
Let {\langle ({}^* G) H \rangle} be the closed span of the set {\{ gx: g \in {}^* G; x \in H \}} in {\tilde H}; this is a Hilbert space. Inside this space we have the subspaces {gH} for {g \in {}^* G}. As {O(G)} preserves {H}, we see that {gH = g'H} whenever {g, g'} lie in the same coset of {O(G)}, so we can define {\gamma H} for any {\gamma \in {}^* G / O(G)} in a well-defined manner. On the other hand, if {g, g'} do not lie in the same coset of {O(G)}, then we have {g' = g \lim_{n \rightarrow p} h_n} for some sequence {h_n} in {G} that goes to infinity. As {G} is a group of dislocations, we conclude that {g' H} and {gH} are now orthogonal. In other words, {\gamma' H} and {\gamma H} are orthogonal whenever {\gamma, \gamma' \in {}^* G / O(G)} are distinct. We conclude that we have the decomposition

\displaystyle  \langle ({}^* G) H \rangle = \bigoplus_{\gamma \in {}^* G/O(G)} \gamma H \ \ \ \ \ (2)

where {\bigoplus} is the Hilbert space direct sum.
Now we can prove Theorem 3. As in the previous section, starting with a bounded sequence {x_n} in {H}, we form the ultralimit {z := \lim_{n \rightarrow p} x_n \in O(H)} and the image {\tilde z \in \tilde H}. We let {x} be the orthogonal projection of {\tilde z} to {\langle ({}^* G) H \rangle}. By (2), we can write

\displaystyle  x = \sum_k g_k \phi_k

for some at most countable sequence of vectors {\phi_k \in H} and {g_k \in {}^* G}, with the {g_k} lying in distinct cosets of {O(G)}. In particular, for any {k \neq k'}, {g_{k'}^{-1} g_k} is the ultralimit of a sequence of group elements going to infinity. By adding dummy values of {g_k,\phi_k} if necessary we may assume that {k} ranges from {1} to infinity. Also, one has the Bessel inequality

\displaystyle  \sum_k \| \phi_k\|^2 = \|x\|^2 \leq \|\tilde z\|^2 \leq E

and from Cauchy-Schwarz and Bessel one has

\displaystyle  |\langle z-\sum_{k=1}^K g_k \phi_k, g y \rangle|^2 \leq \sum_{k=K+1}^\infty \|\phi_k\|^2 + o(1)

for any unit vector {y \in H} and {g \in G}. From this we can obtain the required conclusions by arguing as in the previous section.

— 3. Concentration compactness for measures —

We now give a variant of the profile decomposition, for Borel probability measures {\mu_n} on {{\bf R}^d}. Recall that such a sequence is said to be tight if, for every {\varepsilon > 0}, there is a ball {B(0,R)} such that {\limsup_{n \rightarrow \infty} \mu_n({\bf R}^d \backslash B(0,R)) \leq \varepsilon}. Given any Borel probability measure {\mu} on {{\bf R}^d} and any {x \in {\bf R}^d}, define the translate {\tau_x \mu} to be the Borel probability measure given by the formula {\tau_x \mu(E) := \mu(E-x)}.

Theorem 4 (Profile decomposition for probability measures on {{\bf R}^d}) Let {\mu_n} be a sequence of Borel probability measures on {{\bf R}^d}. Then, after passing to a subsequence, one can find a sequence {c_k} of non-negative real numbers with {\sum_k c_k \leq 1}, a tight sequence {\nu_{k,n}} of positive measures whose mass converges to {1} as {n \rightarrow \infty} for fixed {k}, and shifts {x_{k,n} \in {\bf R}^d} such that

\displaystyle  x_{k,n} - x_{k',n} \rightarrow \infty \hbox{ as } n \rightarrow \infty

for all {k \neq k'}, and such that for each {K}, one has the decomposition

\displaystyle  \mu_n = \sum_{k=1}^K c_k \tau_{x_{k,n}} \nu_{k,n} + \rho_{K,n}

where the error {\rho_{K,n}} obeys the bounds

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \rho_{K,n}( B(x,R) ) \leq \sup_{k > K} c_k

and

\displaystyle  \lim_{n \rightarrow \infty} \rho_{K,n}( B(x_{k,n},R) ) = 0

for all radii {R} and {1 \leq k \leq K}.
Furthermore, one can ensure that for each {k}, {\nu_{k,n}} converges in the vague topology to a probability measure {\nu_k}.
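To illustrate the shape of this decomposition, consider the following hypothetical one-dimensional example, sketched in Python: {\mu_n} places mass {0.5} at the origin, mass {0.3} at the point {n}, and spreads the remaining mass {0.2} evenly over {n} atoms spaced {n} apart. The measure, the weights and the helper names are all assumptions chosen for illustration. One expects {c_1 = 0.5} with {x_{1,n} = 0}, {c_2 = 0.3} with {x_{2,n} = n}, and a residual {\rho_{2,n}} whose mass on any ball of fixed radius tends to zero.

```python
def mu(n):
    """mu_n = 0.5*delta_0 + 0.3*delta_n + (0.2/n) * sum_{j=1..n} delta_{-j*n}."""
    atoms = {0.0: 0.5, float(n): 0.3}
    for j in range(1, n + 1):
        atoms[-float(j * n)] = 0.2 / n
    return atoms

def ball_mass(measure, center, radius):
    """Mass of the open ball B(center, radius) for an atomic measure on R."""
    return sum(m for p, m in measure.items() if abs(p - center) < radius)

def sup_ball_mass(measure, radius):
    """Supremum over centers x of measure(B(x, radius)).

    For an atomic measure on the line this equals the largest mass of a
    half-open window [a, a + 2*radius) starting at an atom a.
    """
    return max(
        sum(m for p, m in measure.items() if a <= p < a + 2 * radius)
        for a in measure
    )

n, R = 100, 5.0
m = mu(n)

# Remove the two concentrated profiles to obtain the residual rho_{2,n}.
rho = {p: w for p, w in m.items() if p not in (0.0, float(n))}

print(ball_mass(m, 0.0, R), ball_mass(m, float(n), R), sup_ball_mass(rho, R))
```

In this example each {\nu_{k,n}} is just a Dirac mass at the origin, which is trivially tight, and the residual has ball masses of size {0.2/n}.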

We first give the standard proof of this theorem:
Proof: (Sketch) Suppose first that

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \mu_n(B(x,R)) = 0

for all {R}. Then we are done by setting all the {c_k} equal to zero, and {\rho_{K,n} = \mu_n}. So we may assume that we can find {R} such that

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \mu_n(B(x,R)) = \alpha

for some {\alpha > 0}; we may also assume that {\alpha} is approximately maximal in the sense that

\displaystyle  \limsup_{n \rightarrow \infty} \sup_{x \in {\bf R}^d} \mu_n(B(x,R')) \leq 2\alpha

(say) for all other radii {R'}. By passing to a subsequence, we may thus find {x_{1,n} \in {\bf R}^d} such that

\displaystyle  \lim_{n \rightarrow \infty} \mu_n(B(x_{1,n},R)) = \alpha.

By passing to a further subsequence using the Helly selection principle (or the sequential Banach-Alaoglu theorem), we may assume that the translates {\tau_{-x_{1,n}} \mu_n} converge in the vague topology to a limit of total mass at most {1} and at least {\alpha}, and which can be expressed as {c_1 \nu_1} for some {c_1 \geq \alpha} and a probability measure {\nu_1}.
As {\tau_{-x_{1,n}} \mu_n} converges vaguely to {c_1 \nu_1}, we have

\displaystyle  \limsup_{n \rightarrow \infty} \tau_{-x_{1,n}} \mu_n( B( 0,R') \backslash B(0,R) ) \leq c_1 \nu_1( {\bf R}^d \backslash B(0,R/2) )

for any {0 < R < R'}. By making {R'_n} grow sufficiently slowly to infinity with respect to {n}, we may thus ensure that

\displaystyle  \limsup_{n \rightarrow \infty} \tau_{-x_{1,n}} \mu_n( B( 0,R'_n) \backslash B(0,R) ) \leq c_1 \nu_1( {\bf R}^d \backslash B(0,R/2) )

for all integers {R>0}. If we then set {c_1 \tilde \nu_{1,n}} to be the restriction of {\tau_{-x_{1,n}} \mu_n} to {B(0,R'_n)}, we see that {\tilde \nu_{1,n}} is tight, converges vaguely to {\nu_1}, and has total mass converging to {1}. We can thus split

\displaystyle  \mu_n = c_1 \tau_{x_{1,n}} \tilde \nu_{1,n} + \rho_{1,n}

for some residual positive measure {\rho_{1,n}} of total mass converging to {1-c_1}, and such that {\rho_{1,n}(B(x_{1,n},R)) \rightarrow 0} as {n \rightarrow \infty} for any fixed {R}. We can then iterate this procedure to obtain the claims of the theorem (after one last diagonalisation to combine together all the subsequences). \Box
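The vanishing case at the start of this proof, in which all the {c_k} are zero, is already visible for the uniform probability measure on {[0,n]} in one dimension. The short Python sketch below assumes this hypothetical example and uses the closed form for the largest mass of a ball of radius {R}; the function name and the sample values are illustrative.

```python
def sup_ball_mass_uniform(n, radius):
    """sup_x mu_n(B(x, radius)) for mu_n uniform on [0, n] in one dimension.

    A ball of radius `radius` is an interval of length 2*radius, so the best
    placement captures a fraction min(1, 2*radius/n) of the total mass.
    """
    return min(1.0, 2.0 * radius / n)

R = 3.0
values = [sup_ball_mass_uniform(n, R) for n in (1, 10, 100, 10_000)]
print(values)
```

For each fixed {R} these values tend to zero, so no non-trivial profile can be extracted and one takes {\rho_{K,n} = \mu_n}.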
Now we give the nonstandard proof. We take the ultralimit {\mu := \lim_{n \rightarrow p} \mu_n} of the standard Borel probability measures {\mu_n} on {{\bf R}^d}, resulting in a nonstandard Borel probability measure. What, exactly, is a nonstandard Borel probability measure? A standard Borel probability measure, such as {\mu_n}, is a map {\mu_n: {\mathcal B} \rightarrow [0,1]} from the standard Borel {\sigma}-algebra {{\mathcal B}} to the unit interval {[0,1]} which is countably additive and maps {{\bf R}^d} to {1}. Thus, the nonstandard Borel probability measure is a nonstandard map {\mu: {}^* {\mathcal B} \rightarrow {}^* [0,1]} from the nonstandard Borel {\sigma}-algebra (the collection of all ultralimits of standard Borel sets) to the nonstandard interval {{}^* [0,1]} which is nonstandardly countably additive and maps {{}^* {\bf R}^d} to {1}. In particular, it is finitely additive.
There is an important subtlety here. The nonstandard Borel {\sigma}-algebra is closed under nonstandard countable unions: if {(E_n)_{n \in {}^* {\bf N}}} is a nonstandard countable sequence of nonstandard Borel sets (i.e. an ultralimit of standard countable sequences {(E_{n,m})_{n \in {\bf N}}} of standard Borel sets), then {\bigcup_n E_n} is also nonstandard Borel. This is not necessarily the case for external countable unions, however: if {(E_n)_{n \in {\bf N}}} is an external countable sequence of nonstandard Borel sets, then {\bigcup_n E_n} need not be nonstandard Borel. On the other hand, {{}^* {\mathcal B}} is certainly still closed under finite unions and other finite Boolean operations, so it can be viewed (externally) as a Boolean algebra, at least.
Now we perform the Loeb measure construction (which was also introduced in the previous post). Consider the standard part {\hbox{st}(\mu)} of {\mu}; this is a finitely additive map from {{}^* {\mathcal B}} to {[0,1]}. From the countable saturation property, one can verify that this map is a premeasure, and so (by the Hahn-Kolmogorov theorem) extends to a countably additive probability measure {\tilde \mu} on the measure-theoretic completion {\tilde {\mathcal B} := \overline{\langle {}^* {\mathcal B} \rangle}} of {{}^* {\mathcal B}}.
The measure {\tilde \mu} is a measure on {{}^* {\bf R}^d}. We push it forward to the quotient space {{}^* {\bf R}^d/O({\bf R}^d)} by the obvious quotient map {\pi: {}^* {\bf R}^d \rightarrow {}^* {\bf R}^d/O({\bf R}^d)} to obtain a pushforward measure {\pi_* \tilde \mu} on the pushforward {\sigma}-algebra {\pi_* \tilde {\mathcal B}}, which consists of all (external) subsets {E} of {{}^* {\bf R}^d/O({\bf R}^d)} whose preimage {\pi^{-1}(E)} is measurable in {\tilde {\mathcal B}}.
We claim that every point in {{}^* {\bf R}^d/O({\bf R}^d)} is measurable in {\pi_* \tilde {\mathcal B}}, or equivalently that every coset {x+O({\bf R}^d)} in {{}^* {\bf R}^d} is measurable in {\tilde {\mathcal B}}. Indeed, this coset is the union of the countable family of (nonstandard) balls {\{ y \in {}^* {\bf R}^d: |x-y| < n \}} for {n \in {\bf N}}, each one of which is a nonstandard Borel set and thus measurable in {\tilde {\mathcal B}}.
Because of this, we can decompose the measure {\pi_* \tilde \mu} into pure point and continuous components, thus

\displaystyle  \pi_* \tilde \mu = \sum_{k} c_k \delta_{x_k+O({\bf R}^d)} + \rho

where {c_k} are standard positive reals, {k} ranges over an at most countable set, {x_k + O({\bf R}^d)} are disjoint cosets in {{}^* {\bf R}^d/O({\bf R}^d)}, and {\rho} is a finite measure on {\pi_* \tilde {\mathcal B}} such that

\displaystyle  \sum_k c_k + \|\rho \| = 1

and

\displaystyle  \rho(\{ x + O({\bf R}^d) \}) = 0

for every coset {x+O({\bf R}^d)}.
Now we analyse the restriction of {\tilde \mu} to a single coset {x_k+O({\bf R}^d)}, which has total mass {c_k}. For any standard continuous, compactly supported function {f: {\bf R}^d \rightarrow {\bf R}}, one can form the integral

\displaystyle  \int_{x_k+O({\bf R}^d)} {}^* f(x -x_k)\ d\tilde \mu(x).

This is a non-negative continuous linear functional, so by the Riesz representation theorem there exists a non-negative Radon measure {\nu_k} on {{\bf R}^d} such that

\displaystyle  \int_{x_k+O({\bf R}^d)} {}^* f(x -x_k)\ d\tilde \mu(x) = c_k \int_{{\bf R}^d} f(y)\ d\nu_k(y)

for all such {f}. As {x_k+O({\bf R}^d)} has total mass {c_k}, {\nu_k} is a probability measure. From the definition of {\tilde \mu}, we thus have

\displaystyle  \int_{{}^* {\bf R}^d} {}^* f(x -x_k)\ d\mu(x) = c_k \int_{{\bf R}^d} f(y)\ d\nu_k(y) + o(1)

for all {f}.
We have

\displaystyle  \mu(B(x_k,R)) \leq c_k+o(1)

for every standard {R}, and thus by the overspill principle there exists an unbounded {R_k} for which

\displaystyle  \mu(B(x_k,R_k)) \leq c_k+o(1);

since {\mu(x_k + O({\bf R}^d)) = c_k}, we thus have

\displaystyle  \mu(B(x_k,R_k)) = c_k+o(1).

If we set {c_k \tilde \nu_k} to be the restriction of {\tau_{-x_k} \mu} to {B(0,R_k)}, we thus see that

\displaystyle  \int_{{}^* {\bf R}^d} {}^* f(y)\ d\tilde \nu_k(y) = \int_{{\bf R}^d} f(y)\ d\nu_k(y) + o(1)

for all test functions {f}. Writing {\tilde \nu_k} as the ultralimit of probability measures {\tilde \nu_{k,n}}, we thus see (upon passing to a subsequence) that {\tilde \nu_{k,n}} converges vaguely to the probability measure {\nu_k}, and is in particular tight.
For any standard {K \geq 1}, we can write

\displaystyle  \mu = \sum_{k=1}^K c_k \tau_{x_k} \tilde \nu_k + \rho_K

where {\rho_K} is a finite measure. Letting {\tilde \rho_K} be the Loeb extension of the standard part of {\rho_K}, we see that {\tilde \rho_K} assigns zero mass to {x_k+O({\bf R}^d)} for {k \leq K} and assigns a mass of at most {\sup_{k > K} c_k} to any other coset of {O({\bf R}^d)}. This implies that

\displaystyle  \tilde \rho_K( B(x,R) ) \leq \sup_{k > K} c_k + o(1)

for any standard {R}. Expressing {\rho_K} as an ultralimit of {\rho_{K,n}}, we then obtain the claim.