You are currently browsing the tag archive for the ‘correspondence principle’ tag.

One of the basic objects of study in combinatorics are finite strings ${(a_n)_{n=0}^N}$ or infinite strings ${(a_n)_{n=0}^\infty}$ of symbols ${a_n}$ from some given alphabet ${{\mathcal A}}$, which could be either finite or infinite (but which we shall usually take to be compact). For instance, a set ${A}$ of natural numbers can be identified with the infinite string ${(1_A(n))_{n=0}^\infty}$ of ${0}$s and ${1}$s formed by the indicator of ${A}$, e.g. the even numbers can be identified with the string ${1010101\ldots}$ from the alphabet ${\{0,1\}}$, the multiples of three can be identified with the string ${100100100\ldots}$, and so forth. One can also consider doubly infinite strings ${(a_n)_{n \in {\bf Z}}}$, which among other things can be used to describe arbitrary subsets of integers.

On the other hand, the basic object of study in dynamics (and in related fields, such as ergodic theory) is that of a dynamical system ${(X,T)}$, that is to say a space ${X}$ together with a shift map ${T: X \rightarrow X}$ (which is often assumed to be invertible, although one can certainly study non-invertible dynamical systems as well). One often adds additional structure to this dynamical system, such as topological structure (giving rise topological dynamics), measure-theoretic structure (giving rise to ergodic theory), complex structure (giving rise to complex dynamics), and so forth. A dynamical system gives rise to an action of the natural numbers ${{\bf N}}$ on the space ${X}$ by using the iterates ${T^n: X \rightarrow X}$ of ${T}$ for ${n=0,1,2,\ldots}$; if ${T}$ is invertible, we can extend this action to an action of the integers ${{\bf Z}}$ on the same space. One can certainly also consider dynamical systems whose underlying group (or semi-group) is something other than ${{\bf N}}$ or ${{\bf Z}}$ (e.g. one can consider continuous dynamical systems in which the evolution group is ${{\bf R}}$), but we will restrict attention to the classical situation of ${{\bf N}}$ or ${{\bf Z}}$ actions here.

There is a fundamental correspondence principle connecting the study of strings (or subsets of natural numbers or integers) with the study of dynamical systems. In one direction, given a dynamical system ${(X,T)}$, an observable ${c: X \rightarrow {\mathcal A}}$ taking values in some alphabet ${{\mathcal A}}$, and some initial datum ${x_0 \in X}$, we can first form the forward orbit ${(T^n x_0)_{n=0}^\infty}$ of ${x_0}$, and then observe this orbit using ${c}$ to obtain an infinite string ${(c(T^n x_0))_{n=0}^\infty}$. If the shift ${T}$ in this system is invertible, one can extend this infinite string into a doubly infinite string ${(c(T^n x_0))_{n \in {\bf Z}}}$. Thus we see that every quadruplet ${(X,T,c,x_0)}$ consisting of a dynamical system ${(X,T)}$, an observable ${c}$, and an initial datum ${x_0}$ creates an infinite string.

Example 1 If ${X}$ is the three-element set ${X = {\bf Z}/3{\bf Z}}$ with the shift map ${Tx := x+1}$, ${c: {\bf Z}/3{\bf Z} \rightarrow \{0,1\}}$ is the observable that takes the value ${1}$ at the residue class ${0 \hbox{ mod } 3}$ and zero at the other two classes, and one starts with the initial datum ${x_0 = 0 \hbox{ mod } 3}$, then the observed string ${(c(T^n x_0))_{n=0}^\infty}$ becomes the indicator ${100100100\ldots}$ of the multiples of three.

In the converse direction, every infinite string ${(a_n){n=0}^\infty}$ in some alphabet ${{\mathcal A}}$ arises (in a decidedly non-unique fashion) from a quadruple ${(X,T,c,x_0)}$ in the above fashion. This can be easily seen by the following “universal” construction: take ${X}$ to be the set ${X:= {\mathcal A}^{\bf N}}$ of infinite strings ${(b_i)_{n=0}^\infty}$ in the alphabet ${{\mathcal A}}$, let ${T: X \rightarrow X}$ be the shift map

$\displaystyle T(b_i)_{n=0}^\infty := (b_{i+1})_{n=0}^\infty,$

let ${c: X \rightarrow {\mathcal A}}$ be the observable

$\displaystyle c((b_i)_{n=0}^\infty) := b_0,$

and let ${x_0 \in X}$ be the initial point

$\displaystyle x_0 := (a_i)_{n=0}^\infty.$

Then one easily sees that the observed string ${(c(T^n x_0))_{n=0}^\infty}$ is nothing more than the original string ${(b_i)_{n=0}^\infty}$. Note also that this construction can easily be adapted to doubly infinite strings by using ${{\mathcal A}^{\bf Z}}$ instead of ${{\mathcal A}^{\bf N}}$, at which point the shift map ${T}$ now becomes invertible. An important variant of this construction also attaches an invariant probability measure to ${X}$ that is associated to the limiting density of various sets associated to the string ${(a_i)_{n=0}^\infty}$, and leads to the Furstenberg correspondence principle, discussed for instance in these previous blog posts. Such principles allow one to rigorously pass back and forth between the combinatorics of strings and the dynamics of systems; for instance, Furstenberg famously used his correspondence principle to demonstrate the equivalence of Szemerédi’s theorem on arithmetic progressions with what is now known as the Furstenberg multiple recurrence theorem in ergodic theory.

In the case when the alphabet ${{\mathcal A}}$ is the binary alphabet ${\{0,1\}}$, and (for technical reasons related to the infamous non-injectivity ${0.999\ldots = 1.00\ldots}$ of the decimal representation system) the string ${(a_n)_{n=0}^\infty}$ does not end with an infinite string of ${1}$s, then one can reformulate the above universal construction by taking ${X}$ to be the interval ${[0,1)}$, ${T}$ to be the doubling map ${Tx := 2x \hbox{ mod } 1}$, ${c: X \rightarrow \{0,1\}}$ to be the observable that takes the value ${1}$ on ${[1/2,1)}$ and ${0}$ on ${[0,1/2)}$ (that is, ${c(x)}$ is the first binary digit of ${x}$), and ${x_0}$ is the real number ${x_0 := \sum_{n=0}^\infty a_n 2^{-n-1}}$ (that is, ${x_0 = 0.a_0a_1\ldots}$ in binary).

The above universal construction is very easy to describe, and is well suited for “generic” strings ${(a_n)_{n=0}^\infty}$ that have no further obvious structure to them, but it often leads to dynamical systems that are much larger and more complicated than is actually needed to produce the desired string ${(a_n)_{n=0}^\infty}$, and also often obscures some of the key dynamical features associated to that sequence. For instance, to generate the indicator ${100100100\ldots}$ of the multiples of three that were mentioned previously, the above universal construction requires an uncountable space ${X}$ and a dynamics which does not obviously reflect the key features of the sequence such as its periodicity. (Using the unit interval model, the dynamics arise from the orbit of ${2/7}$ under the doubling map, which is a rather artificial way to describe the indicator function of the multiples of three.)

A related aesthetic objection to the universal construction is that of the four components ${X,T,c,x_0}$ of the quadruplet ${(X,T,c,x_0)}$ used to generate the sequence ${(a_n)_{n=0}^\infty}$, three of the components ${X,T,c}$ are completely universal (in that they do not depend at all on the sequence ${(a_n)_{n=0}^\infty}$), leaving only the initial datum ${x_0}$ to carry all the distinctive features of the original sequence. While there is nothing wrong with this mathematically, from a conceptual point of view it would make sense to make all four components of the quadruplet to be adapted to the sequence, in order to take advantage of the accumulated intuition about various special dynamical systems (and special observables), not just special initial data.

One step in this direction can be made by restricting ${X}$ to the orbit ${\{ T^n x_0: n \in {\bf N} \}}$ of the initial datum ${x_0}$ (actually for technical reasons it is better to restrict to the topological closure ${\overline{\{ T^n x_0: n \in {\bf N} \}}}$ of this orbit, in order to keep ${X}$ compact). For instance, starting with the sequence ${100100100\ldots}$, the orbit now consists of just three points ${100100100\ldots}$, ${010010010\ldots}$, ${001001001\ldots}$, bringing the system more in line with the example in Example 1. Technically, this is the “optimal” representation of the sequence by a quadruplet ${(X,T,c,x_0)}$, because any other such representation ${(X',T',c',x'_0)}$ is a factor of this representation (in the sense that there is a unique map ${\pi: X \rightarrow X'}$ with ${T' \circ \pi = \pi \circ T}$, ${c' \circ \pi = c}$, and ${x'_0 = \pi(x_0)}$). However, from a conceptual point of view this representation is still somewhat unsatisfactory, given that the elements of the system ${X}$ are interpreted as infinite strings rather than elements of a more geometrically or algebraically rich object (e.g. points in a circle, torus, or other homogeneous space).

For general sequences ${(a_n)_{n=0}^\infty}$, locating relevant geometric or algebraic structure in a dynamical system generating that sequence is an important but very difficult task (see e.g. this paper of Host and Kra, which is more or less devoted to precisely this task in the context of working out what component of a dynamical system controls the multiple recurrence behaviour of that system). However, for specific examples of sequences ${(a_n)_{n=0}^\infty}$, one can use an informal procedure of educated guesswork in order to produce a more natural-looking quadruple ${(X,T,c,x_0)}$ that generates that sequence. This is not a particularly difficult or deep operation, but I found it very helpful in internalising the intuition behind the correspondence principle. Being non-rigorous, this procedure does not seem to be emphasised in most presentations of the correspondence principle, so I thought I would describe it here.

Lars Hörmander, who made fundamental contributions to all areas of partial differential equations, but particularly in developing the analysis of variable-coefficient linear PDE, died last Sunday, aged 81.

I unfortunately never met Hörmander personally, but of course I encountered his work all the time while working in PDE. One of his major contributions to the subject was to systematically develop the calculus of Fourier integral operators (FIOs), which are a substantial generalisation of pseudodifferential operators and which can be used to (approximately) solve linear partial differential equations, or to transform such equations into a more convenient form. Roughly speaking, Fourier integral operators are to linear PDE as canonical transformations are to Hamiltonian mechanics (and one can in fact view FIOs as a quantisation of a canonical transformation). They are a large class of transformations, for instance the Fourier transform, pseudodifferential operators, and smooth changes of the spatial variable are all examples of FIOs, and (as long as certain singular situations are avoided) the composition of two FIOs is again an FIO.

The full theory of FIOs is quite extensive, occupying the entire final volume of Hormander’s famous four-volume series “The Analysis of Linear Partial Differential Operators”. I am certainly not going to try to attempt to summarise it here, but I thought I would try to motivate how these operators arise when trying to transform functions. For simplicity we will work with functions ${f \in L^2({\bf R}^n)}$ on a Euclidean domain ${{\bf R}^n}$ (although FIOs can certainly be defined on more general smooth manifolds, and there is an extension of the theory that also works on manifolds with boundary). As this will be a heuristic discussion, we will ignore all the (technical, but important) issues of smoothness or convergence with regards to the functions, integrals and limits that appear below, and be rather vague with terms such as “decaying” or “concentrated”.

A function ${f \in L^2({\bf R}^n)}$ can be viewed from many different perspectives (reflecting the variety of bases, or approximate bases, that the Hilbert space ${L^2({\bf R}^n)}$ offers). Most directly, we have the physical space perspective, viewing ${f}$ as a function ${x \mapsto f(x)}$ of the physical variable ${x \in {\bf R}^n}$. In many cases, this function will be concentrated in some subregion ${\Omega}$ of physical space. For instance, a gaussian wave packet

$\displaystyle f(x) = A e^{-(x-x_0)^2/\hbar} e^{i \xi_0 \cdot x/\hbar}, \ \ \ \ \ (1)$

where ${\hbar > 0}$, ${A \in {\bf C}}$ and ${x_0, \xi_0 \in {\bf R}^n}$ are parameters, would be physically concentrated in the ball ${B(x_0,\sqrt{\hbar})}$. Then we have the frequency space (or momentum space) perspective, viewing ${f}$ now as a function ${\xi \mapsto \hat f(\xi)}$ of the frequency variable ${\xi \in {\bf R}^n}$. For this discussion, it will be convenient to normalise the Fourier transform using a small constant ${\hbar > 0}$ (which has the physical interpretation of Planck’s constant if one is doing quantum mechanics), thus

$\displaystyle \hat f(\xi) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{\bf R} e^{-i\xi \cdot x/\hbar} f(x)\ dx.$

For instance, for the gaussian wave packet (1), one has

$\displaystyle \hat f(\xi) = A e^{i\xi_0 \cdot x_0/\hbar} e^{-(\xi-\xi_0)^2/\hbar} e^{-i \xi \cdot x_0/\hbar},$

and so we see that ${f}$ is concentrated in frequency space in the ball ${B(\xi_0,\sqrt{\hbar})}$.

However, there is a third (but less rigorous) way to view a function ${f}$ in ${L^2({\bf R}^n)}$, which is the phase space perspective in which one tries to view ${f}$ as distributed simultaneously in physical space and in frequency space, thus being something like a measure on the phase space ${T^* {\bf R}^n := \{ (x,\xi): x, \xi \in {\bf R}^n\}}$. Thus, for instance, the function (1) should heuristically be concentrated on the region ${B(x_0,\sqrt{\hbar}) \times B(\xi_0,\sqrt{\hbar})}$ in phase space. Unfortunately, due to the uncertainty principle, there is no completely satisfactory way to canonically and rigorously define what the “phase space portrait” of a function ${f}$ should be. (For instance, the Wigner transform of ${f}$ can be viewed as an attempt to describe the distribution of the ${L^2}$ energy of ${f}$ in phase space, except that this transform can take negative or even complex values; see Folland’s book for further discussion.) Still, it is a very useful heuristic to think of functions has having a phase space portrait, which is something like a non-negative measure on phase space that captures the distribution of functions in both space and frequency, albeit with some “quantum fuzziness” that shows up whenever one tries to inspect this measure at scales of physical space and frequency space that together violate the uncertainty principle. (The score of a piece of music is a good everyday example of a phase space portrait of a function, in this case a sound wave; here, the physical space is the time axis (the horizontal dimension of the score) and the frequency space is the vertical dimension. Here, the time and frequency scales involved are well above the uncertainty principle limit (a typical note lasts many hundreds of cycles, whereas the uncertainty principle kicks in at ${O(1)}$ cycles) and so there is no obstruction here to musical notation being unambiguous.) Furthermore, if one takes certain asymptotic limits, one can recover a precise notion of a phase space portrait; for instance if one takes the semiclassical limit ${\hbar \rightarrow 0}$ then, under certain circumstances, the phase space portrait converges to a well-defined classical probability measure on phase space; closely related to this is the high frequency limit of a fixed function, which among other things defines the wave front set of that function, which can be viewed as another asymptotic realisation of the phase space portrait concept.

If functions in ${L^2({\bf R}^n)}$ can be viewed as a sort of distribution in phase space, then linear operators ${T: L^2({\bf R}^n) \rightarrow L^2({\bf R}^n)}$ should be viewed as various transformations on such distributions on phase space. For instance, a pseudodifferential operator ${a(X,D)}$ should correspond (as a zeroth approximation) to multiplying a phase space distribution by the symbol ${a(x,\xi)}$ of that operator, as discussed in this previous blog post. Note that such operators only change the amplitude of the phase space distribution, but not the support of that distribution.

Now we turn to operators that alter the support of a phase space distribution, rather than the amplitude; we will focus on unitary operators to emphasise the amplitude preservation aspect. These will eventually be key examples of Fourier integral operators. A physical translation ${Tf(x) := f(x-x_0)}$ should correspond to pushing forward the distribution by the transformation ${(x,\xi) \mapsto (x+x_0,\xi)}$, as can be seen by comparing the physical and frequency space supports of ${Tf}$ with that of ${f}$. Similarly, a frequency modulation ${Tf(x) := e^{i \xi_0 \cdot x/\hbar} f(x)}$ should correspond to the transformation ${(x,\xi) \mapsto (x,\xi+\xi_0)}$; a linear change of variables ${Tf(x) := |\hbox{det} L|^{-1/2} f(L^{-1} x)}$, where ${L: {\bf R}^n \rightarrow {\bf R}^n}$ is an invertible linear transformation, should correspond to ${(x,\xi) \mapsto (Lx, (L^*)^{-1} \xi)}$; and finally, the Fourier transform ${Tf(x) := \hat f(x)}$ should correspond to the transformation ${(x,\xi) \mapsto (\xi,-x)}$.

Based on these examples, one may hope that given any diffeomorphism ${\Phi: T^* {\bf R}^n \rightarrow T^* {\bf R}^n}$ of phase space, one could associate some sort of unitary (or approximately unitary) operator ${T_\Phi: L^2({\bf R}^n) \rightarrow L^2({\bf R}^n)}$, which (heuristically, at least) pushes the phase space portrait of a function forward by ${\Phi}$. However, there is an obstruction to doing so, which can be explained as follows. If ${T_\Phi}$ pushes phase space portraits by ${\Phi}$, and pseudodifferential operators ${a(X,D)}$ multiply phase space portraits by ${a}$, then this suggests the intertwining relationship

$\displaystyle a(X,D) T_\Phi \approx T_\Phi (a \circ \Phi)(X,D),$

and thus ${(a \circ \Phi)(X,D)}$ is approximately conjugate to ${a(X,D)}$:

$\displaystyle (a \circ \Phi)(X,D) \approx T_\Phi^{-1} a(X,D) T_\Phi. \ \ \ \ \ (2)$

The formalisation of this fact in the theory of Fourier integral operators is known as Egorov’s theorem, due to Yu Egorov (and not to be confused with the more widely known theorem of Dmitri Egorov in measure theory).

Applying commutators, we conclude the approximate conjugacy relationship

$\displaystyle \frac{1}{i\hbar} [(a \circ \Phi)(X,D), (b \circ \Phi)(X,D)] \approx T_\Phi^{-1} \frac{1}{i\hbar} [a(X,D), b(X,D)] T_\Phi.$

Now, the pseudodifferential calculus (as discussed in this previous post) tells us (heuristically, at least) that

$\displaystyle \frac{1}{i\hbar} [a(X,D), b(X,D)] \approx \{ a, b \}(X,D)$

and

$\displaystyle \frac{1}{i\hbar} [(a \circ \Phi)(X,D), (b \circ \Phi)(X,D)] \approx \{ a \circ \Phi, b \circ \Phi \}(X,D)$

where ${\{,\}}$ is the Poisson bracket. Comparing this with (2), we are then led to the compatibility condition

$\displaystyle \{ a \circ \Phi, b \circ \Phi \} \approx \{ a, b \} \circ \Phi,$

thus ${\Phi}$ needs to preserve (approximately, at least) the Poisson bracket, or equivalently ${\Phi}$ needs to be a symplectomorphism (again, approximately at least).

Now suppose that ${\Phi: T^* {\bf R}^n \rightarrow T^* {\bf R}^n}$ is a symplectomorphism. This is morally equivalent to the graph ${\Sigma := \{ (z, \Phi(z)): z \in T^* {\bf R}^n \}}$ being a Lagrangian submanifold of ${T^* {\bf R}^n \times T^* {\bf R}^n}$ (where we give the second copy of phase space the negative ${-\omega}$ of the usual symplectic form ${\omega}$, thus yielding ${\omega \oplus -\omega}$ as the full symplectic form on ${T^* {\bf R}^n \times T^* {\bf R}^n}$; this is another instantiation of the closed graph theorem, as mentioned in this previous post. This graph is known as the canonical relation for the (putative) FIO that is associated to ${\Phi}$. To understand what it means for this graph to be Lagrangian, we coordinatise ${T^* {\bf R}^n \times T^* {\bf R}^n}$ as ${(x,\xi,y,\eta)}$ suppose temporarily that this graph was (locally, at least) a smooth graph in the ${x}$ and ${y}$ variables, thus

$\displaystyle \Sigma = \{ (x, F(x,y), y, G(x,y)): x, y \in {\bf R}^n \}$

for some smooth functions ${F, G: {\bf R}^n \rightarrow {\bf R}^n}$. A brief computation shows that the Lagrangian property of ${\Sigma}$ is then equivalent to the compatibility conditions

$\displaystyle \frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}$

$\displaystyle \frac{\partial G_i}{\partial y_j} = \frac{\partial G_j}{\partial y_i}$

$\displaystyle \frac{\partial F_i}{\partial y_j} = - \frac{\partial G_j}{\partial x_i}$

for ${i,j=1,\ldots,n}$, where ${F_1,\ldots,F_n, G_1,\ldots,G_n}$ denote the components of ${F,G}$. Some Fourier analysis (or Hodge theory) lets us solve these equations as

$\displaystyle F_i = -\frac{\partial \phi}{\partial x_i}; \quad G_j = \frac{\partial \phi}{\partial y_j}$

for some smooth potential function ${\phi: {\bf R}^n \times {\bf R}^n \rightarrow {\bf R}}$. Thus, we have parameterised our graph ${\Sigma}$ as

$\displaystyle \Sigma = \{ (x, -\nabla_x \phi(x,y), y, \nabla_y \phi(x,y)): x,y \in {\bf R}^n \} \ \ \ \ \ (3)$

so that ${\Phi}$ maps ${(x, -\nabla_x \phi(x,y))}$ to ${(y, \nabla_y \phi(x,y))}$.

A reasonable candidate for an operator associated to ${\Phi}$ and ${\Sigma}$ in this fashion is the oscillatory integral operator

$\displaystyle Tf(y) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{i \phi(x,y)/\hbar} a(x,y) f(x)\ dx \ \ \ \ \ (4)$

for some smooth amplitude function ${a}$ (note that the Fourier transform is the special case when ${a=1}$ and ${\phi(x,y)=xy}$, which helps explain the genesis of the term “Fourier integral operator”). Indeed, if one computes an inner product ${\int_{{\bf R}^n} Tf(y) \overline{g(y)}\ dy}$ for gaussian wave packets ${f, g}$ of the form (1) and localised in phase space near ${(x_0,\xi_0), (y_0,\eta_0)}$ respectively, then a Taylor expansion of ${\phi}$ around ${(x_0,y_0)}$, followed by a stationary phase computation, shows (again heuristically, and assuming ${\phi}$ is suitably non-degenerate) that ${T}$ has (3) as its canonical relation. (Furthermore, a refinement of this stationary phase calculation suggests that if ${a}$ is normalised to be the half-density ${|\det \nabla_x \nabla_y \phi|^{1/2}}$, then ${T}$ should be approximately unitary.) As such, we view (4) as an example of a Fourier integral operator (assuming various smoothness and non-degeneracy hypotheses on the phase ${\phi}$ and amplitude ${a}$ which we do not detail here).

Of course, it may be the case that ${\Sigma}$ is not a graph in the ${x,y}$ coordinates (for instance, the key examples of translation, modulation, and dilation are not of this form), but then it is often a graph in some other pair of coordinates, such as ${\xi,y}$. In that case one can compose the oscillatory integral construction given above with a Fourier transform, giving another class of FIOs of the form

$\displaystyle Tf(y) := \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{i \phi(\xi,y)/\hbar} a(\xi,y) \hat f(\xi)\ d\xi. \ \ \ \ \ (5)$

This class of FIOs covers many important cases; for instance, the translation, modulation, and dilation operators considered earlier can be written in this form after some Fourier analysis. Another typical example is the half-wave propagator ${T := e^{it \sqrt{-\Delta}}}$ for some time ${t \in {\bf R}}$, which can be written in the form

$\displaystyle Tf(y) = \frac{1}{(2\pi \hbar)^{n/2}} \int_{{\bf R}^n} e^{i (\xi \cdot y + t |\xi|)/\hbar} a(\xi,y) \hat f(\xi)\ d\xi.$

This corresponds to the phase space transformation ${(x,\xi) \mapsto (x+t|\xi|, \xi)}$, which can be viewed as the classical propagator associated to the “quantum” propagator ${e^{it\sqrt{-\Delta}}}$. More generally, propagators for linear Hamiltonian partial differential equations can often be expressed (at least approximately) by Fourier integral operators corresponding to the propagator of the associated classical Hamiltonian flow associated to the symbol of the Hamiltonian operator ${H}$; this leads to an important mathematical formalisation of the correspondence principle between quantum mechanics and classical mechanics, that is one of the foundations of microlocal analysis and which was extensively developed in Hörmander’s work. (More recently, numerically stable versions of this theory have been developed to allow for rapid and accurate numerical solutions to various linear PDE, for instance through Emmanuel Candés’ theory of curvelets, so the theory that Hörmander built now has some quite significant practical applications in areas such as geology.)

In some cases, the canonical relation ${\Sigma}$ may have some singularities (such as fold singularities) which prevent it from being written as graphs in the previous senses, but the theory for defining FIOs even in these cases, and in developing their calculus, is now well established, in large part due to the foundational work of Hörmander.

Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals ${{\bf Q}}$ or the reals ${{\bf R}}$ are algebraically incomplete, because there are some non-trivial algebraic equations (such as ${x^2=2}$ in the case of the rationals, or ${x^2=-1}$ in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as ${1=0}$, just using the laws of algebra), but do not actually have solutions in the specified field.

Similarly, the rationals ${{\bf Q}}$, when viewed now as a metric space rather than as a field, are also metrically incomplete, beause there exist sequences in the rationals (e.g. the decimal approximations ${3, 3.1, 3.14, 3.141, \ldots}$ of the irrational number ${\pi}$) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.

A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Frankel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because it does not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but is not actually provable in this theory.

A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line ${{\bf R}}$ is elementarily incomplete, because there exists a sequence of statements (such as the statements ${0 < x < 1/n}$ for natural numbers ${n=1,2,\ldots}$) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number ${x}$) but are not actually simultaneously satisfiable in this theory.

In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field ${k}$, one can take its algebraic completion (or algebraic closure) ${\overline{k}}$; for instance, ${{\bf C} = \overline{{\bf R}}}$ can be viewed as the algebraic completion of ${{\bf R}}$. This field is usually significantly larger than the original field ${k}$, but contains ${k}$ as a subfield, and every element of ${\overline{k}}$ can be described as the solution to some polynomial equation with coefficients in ${k}$. Furthermore, ${\overline{k}}$ is now algebraically complete (or algebraically closed): every polynomial equation in ${\overline{k}}$ which is potentially satisfiable (in the sense that it does not lead to a contradiction such as ${1=0}$ from the laws of algebra), is actually satisfiable in ${\overline{k}}$.

Similarly, starting with an arbitrary metric space ${X}$, one can take its metric completion ${\overline{X}}$; for instance, ${{\bf R} = \overline{{\bf Q}}}$ can be viewed as the metric completion of ${{\bf Q}}$. Again, the completion ${\overline{X}}$ is usually much larger than the original metric space ${X}$, but contains ${X}$ as a subspace, and every element of ${\overline{X}}$ can be described as the limit of some Cauchy sequence in ${X}$. Furthermore, ${\overline{X}}$ is now a complete metric space: every sequence in ${\overline{X}}$ which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convegent in ${\overline{X}}$.

In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory ${T}$ for a first-order language ${L}$, there exists at least one completion ${\overline{T}}$ of that theory ${T}$, which is a consistent theory in which every sentence in ${L}$ which is potentially true in ${\overline{T}}$ (because it does not lead to a contradiction in ${\overline{T}}$) is actually true in ${\overline{T}}$. Indeed, the completeness theorem provides at least one model (or structure) ${{\mathfrak U}}$ of the consistent theory ${T}$, and then the completion ${\overline{T} = \hbox{Th}({\mathfrak U})}$ can be formed by interpreting every sentence in ${L}$ using ${{\mathfrak U}}$ to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory ${T}$ can have multiple inequivalent models ${{\mathfrak U}}$, giving rise to distinct completions of the same theory.

Finally, if one starts with an arbitrary structure ${{\mathfrak U}}$, one can form an elementary completion ${{}^* {\mathfrak U}}$ of it, which is a significantly larger structure which contains ${{\mathfrak U}}$ as a substructure, and such that every element of ${{}^* {\mathfrak U}}$ is an elementary limit of a sequence of elements in ${{\mathfrak U}}$ (I will define this term shortly). Furthermore, ${{}^* {\mathfrak U}}$ is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in ${{}^* {\mathfrak U}}$ (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure ${{\mathfrak U}}$. If ${{\mathfrak U}}$ is the standard universe of all the standard objects one considers in mathematics, then its elementary completion ${{}^* {\mathfrak U}}$ is known as the nonstandard universe, and is the setting for nonstandard analysis.

As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as ${{\bf Q}}$, then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers ${{\bf Z}}$, then the resulting completion ${{}^* {\bf Z}}$ will necessarily be uncountable.

However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.

In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.

In a previous post, we discussed the Szemerédi regularity lemma, and how a given graph could be regularised by partitioning the vertex set into random neighbourhoods. More precisely, we gave a proof of

Lemma 1 (Regularity lemma via random neighbourhoods) Let ${\varepsilon > 0}$. Then there exists integers ${M_1,\ldots,M_m}$ with the following property: whenever ${G = (V,E)}$ be a graph on finitely many vertices, if one selects one of the integers ${M_r}$ at random from ${M_1,\ldots,M_m}$, then selects ${M_r}$ vertices ${v_1,\ldots,v_{M_r} \in V}$ uniformly from ${V}$ at random, then the ${2^{M_r}}$ vertex cells ${V^{M_r}_1,\ldots,V^{M_r}_{2^{M_r}}}$ (some of which can be empty) generated by the vertex neighbourhoods ${A_t := \{ v \in V: (v,v_t) \in E \}}$ for ${1 \leq t \leq M_r}$, will obey the regularity property

$\displaystyle \sum_{(V_i,V_j) \hbox{ not } \varepsilon-\hbox{regular}} |V_i| |V_j| \leq \varepsilon |V|^2 \ \ \ \ \ (1)$

with probability at least ${1-O(\varepsilon)}$, where the sum is over all pairs ${1 \leq i \leq j \leq k}$ for which ${G}$ is not ${\varepsilon}$-regular between ${V_i}$ and ${V_j}$. [Recall that a pair ${(V_i,V_j)}$ is ${\varepsilon}$-regular for ${G}$ if one has

$\displaystyle |d( A, B ) - d( V_i, V_j )| \leq \varepsilon$

for any ${A \subset V_i}$ and ${B \subset V_j}$ with ${|A| \geq \varepsilon |V_i|, |B| \geq \varepsilon |V_j|}$, where ${d(A,B) := |E \cap (A \times B)|/|A| |B|}$ is the density of edges between ${A}$ and ${B}$.]

The proof was a combinatorial one, based on the standard energy increment argument.

In this post I would like to discuss an alternate approach to the regularity lemma, which is an infinitary approach passing through a graph-theoretic version of the Furstenberg correspondence principle (mentioned briefly in this earlier post of mine). While this approach superficially looks quite different from the combinatorial approach, it in fact uses many of the same ingredients, most notably a reliance on random neighbourhoods to regularise the graph. This approach was introduced by myself back in 2006, and used by Austin and by Austin and myself to establish some property testing results for hypergraphs; more recently, a closely related infinitary hypergraph removal lemma developed in the 2006 paper was also used by Austin to give new proofs of the multidimensional Szemeredi theorem and of the density Hales-Jewett theorem (the latter being a spinoff of the polymath1 project).

For various technical reasons we will not be able to use the correspondence principle to recover Lemma 1 in its full strength; instead, we will establish the following slightly weaker variant.

Lemma 2 (Regularity lemma via random neighbourhoods, weak version) Let ${\varepsilon > 0}$. Then there exist an integer ${M_*}$ with the following property: whenever ${G = (V,E)}$ be a graph on finitely many vertices, there exists ${1 \leq M \leq M_*}$ such that if one selects ${M}$ vertices ${v_1,\ldots,v_{M} \in V}$ uniformly from ${V}$ at random, then the ${2^{M}}$ vertex cells ${V^{M}_1,\ldots,V^{M}_{2^{M}}}$ generated by the vertex neighbourhoods ${A_t := \{ v \in V: (v,v_t) \in E \}}$ for ${1 \leq t \leq M}$, will obey the regularity property (1) with probability at least ${1-\varepsilon}$.

Roughly speaking, Lemma 1 asserts that one can regularise a large graph ${G}$ with high probability by using ${M_r}$ random neighbourhoods, where ${M_r}$ is chosen at random from one of a number of choices ${M_1,\ldots,M_m}$; in contrast, the weaker Lemma 2 asserts that one can regularise a large graph ${G}$ with high probability by using some integer ${M}$ from ${1,\ldots,M_*}$, but the exact choice of ${M}$ depends on ${G}$, and it is not guaranteed that a randomly chosen ${M}$ will be likely to work. While Lemma 2 is strictly weaker than Lemma 1, it still implies the (weighted) Szemerédi regularity lemma (Lemma 2 from the previous post).

This month I am at MSRI, for the programs of Ergodic Theory and Additive Combinatorics, and Analysis on Singular Spaces, that are currently ongoing here.  This week I am giving three lectures on the correspondence principle, and on finitary versions of ergodic theory, for the introductory workshop in the former program.  The article here is broadly describing the content of these talks (which are slightly different in theme from that announced in the abstract, due to some recent developments).  [These lectures were also recorded on video and should be available from the MSRI web site within a few months.]

In this lecture, we describe the simple but fundamental Furstenberg correspondence principle which connects the “soft analysis” subject of ergodic theory (in particular, recurrence theorems) with the “hard analysis” subject of combinatorial number theory (or more generally with results of “density Ramsey theory” type). Rather than try to set up the most general and abstract version of this principle, we shall instead study the canonical example of this principle in action, namely the equating of the Furstenberg multiple recurrence theorem with Szemerédi’s theorem on arithmetic progressions.
Read the rest of this entry »

I’ve just uploaded to the arXiv my joint paper with Tim Austin, “On the testability and repair of hereditary hypergraph properties“, which has been submitted to Random Structures and Algorithms. In this paper we prove some positive and negative results for the testability (and the local repairability) of various properties of directed or undirected graphs and hypergraphs, which can be either monochromatic or multicoloured.

The negative results have already been discussed in a previous posting of mine, so today I will focus on the positive results. The property testing results here are finitary results, but it turns out to be rather convenient to use a certain correspondence principle (the hypergraph version of the Furstenberg correspondence principle) to convert the question into one about exchangeable probability measures on spaces of hypergraphs (i.e. on random hypergraphs whose probability distribution is invariant under exchange of vertices). Such objects are also closely related to the”graphons” and “hypergraphons” that emerge as graph limits, as studied by Lovasz-Szegedy, Elek-Szegedy, and others. Somewhat amusingly, once one does so, it then becomes convenient to keep track of objects indexed by vertex sets and how they are exchanged via the language of category theory, and in particular using the concept of a natural transformation to describe such objects as exchangeable measures, graph colourings, and local modification rules. I will try to sketch out some of these connections, after describing the main positive results.

I’ve just uploaded to the ArXiV my paper “Norm convergence of multiple ergodic averages for commuting transformations“, submitted to
Ergodic Theory and Dynamical Systems. This paper settles in full generality the norm convergence problem for several commuting transformations. Specifically, if $(X, {\mathcal X},\mu)$ is a probability space and $T_1, \ldots, T_l: X \to X$ are commuting measure-preserving transformations, then for any bounded measurable functions $f_1, \ldots, f_l: X \to {\Bbb R}$, the multiple average

$\frac{1}{N} \sum_{n=0}^{N-1} \int_X T_1^n f_1 \ldots T_l^n f_l$ (1)

is convergent in the $L^2(X)$ norm topology (and thus also converges in probability). The corresponding question of pointwise almost everywhere convergence remains open (and quite difficult, in my opinion). My argument also does not readily establish a formula as to what this limit actually is (it really establishes that the sequence is Cauchy in $L^2$ rather than convergent).

The l=1 case of this theorem is the classical mean ergodic theorem (also known as the von Neumann ergodic theorem). The l=2 case was established by Conze and Lesigne. The higher l case was partially resolved by Frantzikinakis and Kra, under the additional hypotheses that all of the transformations $T_i$, as well as the quotients $T_i T_j^{-1}$, are ergodic. The special case $T_j=T^j$ was established by Host-Kra (with another proof given subsequently by Ziegler). Another relevant result is the Furstenberg-Katznelson theorem, which asserts among other things when $f_1=\ldots=f_l$ is non-negative and not identically zero, then the inner product of the expression (1) with f has a strictly positive limit inferior as $N \to \infty$. This latter result also implies Szemerédi’s theorem.

It is also known that the Furstenberg-Katznelson theorem can be proven by hypergraph methods, and in fact my paper also proceeds by a hypergraph-inspired approach, although the language of hypergraphs is not explicitly used in the body of the argument. (In contrast to the work of Host-Kra and Ziegler, no nilsystems appear in the proof.)

In this second lecture, I wish to talk about the dichotomy between structure and randomness as it manifests itself in four closely related areas of mathematics:

• Combinatorial number theory, which seeks to find patterns in unstructured dense sets (or colourings) of integers;
• Ergodic theory (or more specifically, multiple recurrence theory), which seeks to find patterns in positive-measure sets under the action of a discrete dynamical system on probability spaces (or more specifically, measure-preserving actions of the integers ${\Bbb Z}$);
• Graph theory, or more specifically the portion of this theory concerned with finding patterns in large unstructured dense graphs; and
• Ergodic graph theory, which is a very new and undeveloped subject, which roughly speaking seems to be concerned with the patterns within a measure-preserving action of the infinite permutation group $S_\infty$, which is one of several models we have available to study infinite “limits” of graphs.

The two “discrete” (or “finitary”, or “quantitative”) fields of combinatorial number theory and graph theory happen to be related to each other, basically by using the Cayley graph construction; I will give an example of this shortly. The two “continuous” (or “infinitary”, or “qualitative”) fields of ergodic theory and ergodic graph theory are at present only related on the level of analogy and informal intuition, but hopefully some more systematic connections between them will appear soon.

On the other hand, we have some very rigorous connections between combinatorial number theory and ergodic theory, and also (more recently) between graph theory and ergodic graph theory, basically by the procedure of viewing the infinitary continuous setting as a limit of the finitary discrete setting. These two connections go by the names of the Furstenberg correspondence principle and the graph correspondence principle respectively. These principles allow one to tap the power of the infinitary world (for instance, the ability to take limits and perform completions or closures of objects) in order to establish results in the finitary world, or at least to take the intuition gained in the infinitary world and transfer it to a finitary setting. Conversely, the finitary world provides an excellent model setting to refine one’s understanding of infinitary objects, for instance by establishing quantitative analogues of “soft” results obtained in an infinitary manner. I will remark here that this best-of-both-worlds approach, borrowing from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and I in order to establish our result on long arithmetic progressions in the primes. In particular, the infinitary setting is excellent for being able to rigorously define and study concepts (such as structure or randomness) which are much “fuzzier” and harder to pin down exactly in the finitary world.