A recurring theme in mathematics is that of duality: a mathematical object {X} can either be described internally (or in physical space, or locally), by describing what {X} physically consists of (or what kind of maps exist into {X}), or externally (or in frequency space, or globally), by describing what {X} globally interacts or resonates with (or what kind of maps exist out of {X}). These two fundamentally opposed perspectives on the object {X} are often dual to each other in various ways: performing an operation on {X} may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being an “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:

  1. Vector space duality A vector space {V} over a field {F} can be described either by the set of vectors inside {V}, or dually by the set of linear functionals {\lambda: V \rightarrow F} from {V} to the field {F} (or equivalently, the set of vectors inside the dual space {V^*}). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
  2. Vector subspace duality In a similar spirit, a subspace {W} of {V} can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement {W^\perp := \{ \lambda \in V^*: \lambda(w)=0 \hbox{ for all } w \in W \})}. Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
  3. Convex duality More generally, a (closed, bounded) convex body {K} in a vector space {V} can be described either by listing a set of (extreme) points whose convex hull is {K}, or else by listing a set of (irreducible) linear inequalities that cut out {K}. The fundamental connection between the two is given by the Farkas lemma.
  4. Ideal-variety duality In a slightly different direction, an algebraic variety {V} in an affine space {A^n} can be viewed either “in physical space” or “internally” as a collection of points in {V}, or else “in frequency space” or “externally” as a collection of polynomials on {A^n} whose simultaneous zero locus cuts out {V}. The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
  5. Hilbert space duality An element {v} in a Hilbert space {H} can either be thought of in physical space as a vector in that space, or in momentum space as a covector {w \mapsto \langle v, w \rangle} on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
  6. Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
  7. Intrinsic-extrinsic duality A (Riemannian) manifold {M} can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives include the Nash embedding theorem and the theorema egregium.
  8. Group duality A group {G} can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
  9. Pontryagin group duality A (locally compact Hausdorff) abelian group {G} can be described either by listing its elements {g \in G}, or by listing the characters {\chi: G \rightarrow {\bf R}/{\bf Z}} (i.e. continuous homomorphisms from {G} to the unit circle, or equivalently elements of {\hat G}). The connection between the two is the focus of abstract harmonic analysis.
  10. Pontryagin subgroup duality A subgroup {H} of a locally compact abelian group {G} can be described either by generators in {H}, or generators in the orthogonal complement {H^\perp := \{ \xi \in \hat G: \xi \cdot h = 0 \hbox{ for all } h \in H \}}. One of the fundamental connections between the two is the Poisson summation formula.
  11. Fourier duality A (sufficiently nice) function {f: G \rightarrow {\bf C}} on a locally compact abelian group {G} (equipped with a Haar measure {\mu}) can either be described in physical space (by its values {f(x)} at each element {x} of {G}) or in frequency space (by the values {\hat f(\xi) = \int_G f(x) e( - \xi \cdot x )\ d\mu(x)} at elements {\xi} of the Pontryagin dual {\hat G}). The fundamental connection between the two is the Fourier inversion formula.
  12. The uncertainty principle The behaviour of a function {f} at physical scales above (resp. below) a certain scale {R} is almost completely controlled by the behaviour of its Fourier transform {\hat f} at frequency scales below (resp. above) the dual scale {1/R} and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
  13. Stone/Gelfand duality A (locally compact Hausdorff) topological space {X} can be viewed in physical space (as a collection of points), or dually, via the {C^*} algebra {C(X)} of continuous complex-valued functions on that space, or (in the case when {X} is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of {C(X)}). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.

I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, which describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle – but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality {\Delta x \cdot \Delta \xi \gtrsim 1} is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:

  1. A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
  2. A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
  3. Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
  4. Projecting a function to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
  5. The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simultaneously localised to both very fine scales and very low frequencies.
  6. To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
  7. To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
  8. Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
  9. Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
  10. The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
  11. If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
  12. If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
  13. Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
  14. In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
  15. Etc., etc.
  16. Almost all of the above statements generalise to locally compact abelian groups other than {{\bf R}} or {{\bf R}^n}, in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)

I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous inequality {\Delta x \cdot \Delta \xi \gtrsim 1} is at the centre of this cloud, but is by no means the only aspect of it.

The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, in the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; it is a key piece of intuition that one has to internalise before one can really get to grips with the subject (and with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present it in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of the principle.

— 1. An informal foundation for the uncertainty principle —

Many of the manifestations of the uncertainty principle can be heuristically derived from the following informal heuristic:

Heuristic 1 (Phase heuristic) If the phase {\phi(x)} of a complex exponential {e^{2\pi i \phi(x)}} fluctuates by less than {1} for {x} in some nice domain {\Omega} (e.g. a convex set, or more generally an approximate subgroup), then the phase {e^{2\pi i \phi(x)}} behaves as if it were constant on {\Omega}. If instead the phase fluctuates by much more than {1}, then {e^{2\pi i \phi(x)}} should oscillate and exhibit significant cancellation. The more the phase fluctuates, the more oscillation and cancellation becomes present.

For instance, according to this heuristic, on an interval {[-R,R]} in the real line, the linear phase {x \mapsto e^{2\pi i \xi x}} at a given frequency {\xi \in {\bf R}} behaves like a constant when {|\xi| \ll 1/R}, but oscillates significantly when {|\xi| \gg 1/R}. This is visually plausible if one graphs the real and imaginary parts {\cos(2\pi \xi x)}, {\sin(2 \pi \xi x)}. For now, we will take this principle as axiomatic, without further justification, and without further elaboration as to what vague terms such as “behaves as if” or {\ll} mean.
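This is easy to test numerically. The following sketch (an illustration only; the grid resolution and the sample frequencies {0.05} and {20} are arbitrary choices) averages the linear phase over {[-R,R]} at a frequency well below {1/R} and at one well above it:

```python
import numpy as np

# Average the linear phase e^{2 pi i xi x} over the interval [-R, R].
R = 1.0
x = np.linspace(-R, R, 2001)
dx = x[1] - x[0]

def average_phase(xi):
    # Riemann-sum approximation to (1/2R) int_{-R}^{R} e^{2 pi i xi x} dx
    return np.sum(np.exp(2j * np.pi * xi * x)) * dx / (2 * R)

low = abs(average_phase(0.05))   # phase fluctuates by << 1: behaves like a constant
high = abs(average_phase(20.0))  # phase fluctuates by >> 1: near-total cancellation
```

For the low frequency the average has magnitude close to {1}, while for the high frequency it is essentially zero, in line with the heuristic.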

We remark in passing that the above heuristic can also be viewed as the informal foundation for the principle of stationary phase. This is not coincidental, but will not be the focus of the discussion here.

Let’s give a few examples to illustrate how this heuristic informally implies some versions of the uncertainty principle. Suppose for instance that a function {f: {\bf R} \rightarrow {\bf C}} is supported in an interval {[-R,R]}. Now consider the Fourier transform

\displaystyle  \hat f(\xi) := \int_{\bf R} e^{-2\pi i x\xi} f(x)\ dx = \int_{-R}^R e^{-2\pi i x \xi} f(x)\ dx.

(Other normalisations of the Fourier transform are possible, but this does not significantly affect the discussion here.) We assume that the function is nice enough (e.g. absolutely integrable will certainly suffice) that one can define the Fourier transform without difficulty.

If {|\xi| \ll 1/R}, then the phase {x\xi} fluctuates by less than {1} on the domain {x \in [-R,R]}, and so the phase here is essentially constant by the above heuristic; in particular, we expect the Fourier transform {\hat f(\xi)} to not vary much in this interval. More generally, if we consider frequencies {\xi} in an interval {|\xi-\xi_0| \ll 1/R} for a fixed {\xi_0}, then on separating {e^{-2\pi i x \xi}} as {e^{-2\pi i x \xi_0} \times e^{-2\pi i x (\xi-\xi_0)}}, the latter phase {x (\xi-\xi_0)} is essentially constant by the above heuristic, and so we expect {\hat f(\xi)} to not vary much in this interval either. Thus {\hat f(\xi)} is close to constant at scales much finer than {1/R}, just as the uncertainty principle predicts.
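Numerically, the near-constancy of {\hat f} at fine frequency scales is easy to observe. Here is a sketch with a hypothetical choice of bump function {f(x) = (1-x^2)^2} on {[-1,1]}, comparing the variation of {\hat f} over a frequency increment much smaller than {1/R} with its variation over an increment comparable to {1/R}:

```python
import numpy as np

R = 1.0
x = np.linspace(-R, R, 4001)
dx = x[1] - x[0]
f = (1 - x**2) ** 2                     # a smooth bump supported on [-R, R]

def fhat(xi):
    # Riemann-sum approximation to the Fourier transform at frequency xi
    return np.sum(f * np.exp(-2j * np.pi * xi * x)) * dx

drift_small = abs(fhat(0.3) - fhat(0.3 + 0.01))  # increment 0.01 << 1/R
drift_large = abs(fhat(0.3) - fhat(0.3 + 1.0))   # increment 1.0 ~ 1/R
```

The first difference is tiny, while the second is of the same order of magnitude as {\hat f} itself.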

A similar heuristic calculation using the Fourier inversion formula

\displaystyle  f(x) = \int_{\bf R} e^{2\pi i x\xi} \hat f(\xi)\ d\xi

shows that if the Fourier transform {\hat f(\xi)} is restricted to an interval {[-N, N]}, then the function {f} should behave roughly like a constant at scales {\ll 1/N}. A bit more generally, if the Fourier transform is restricted to an interval {[\xi_0-N, \xi_0+N]}, then by separating {e^{2\pi i x \xi}} as {e^{2\pi i x_0 \xi_0} e^{2\pi i (x-x_0) \xi_0} e^{2\pi i x_0 (\xi-\xi_0)} e^{2\pi i (x-x_0)(\xi-\xi_0)}} and discarding the last phase when {|x-x_0| \ll 1/N}, we see that the function {f} behaves like a constant multiple of the plane wave {x \mapsto e^{2\pi i x \xi_0}} on each interval {\{ x: |x-x_0| \ll 1/N \}} (but it could be a different constant multiple on each such interval).

The same type of heuristic computation can be carried through in higher dimensions. For instance, if a function {f: {\bf R}^n \rightarrow {\bf C}} has Fourier transform supported in some symmetric convex body {\Omega}, then one expects {f} itself to behave like a constant on any translate {x_0+c\Omega^*} of a small multiple {0 < c \ll 1} of the polar body

\displaystyle \Omega^* := \{ x \in {\bf R}^n: |x \cdot \xi| \leq 1 \hbox{ for all } \xi \in \Omega \}

of {\Omega}.

An important special case where the above heuristics are in fact exactly rigorous is when one does not work with approximate subgroups such as intervals {[-R,R]} or convex bodies {\Omega}, but rather with subgroups {H} of the ambient (locally compact abelian) group {G} that is serving as physical space. Here, of course, we need the general Fourier transform

\displaystyle  \hat f(\xi) := \int_G e^{-2\pi i \xi \cdot x} f(x)\ d\mu_G(x),

where {\mu_G} is a Haar measure on the locally compact abelian group {G}, and {\xi: x \mapsto \xi \cdot x} is a continuous homomorphism from {G} to {{\bf R}/{\bf Z}} (and is thus an element of the Pontryagin dual group {\hat G}); the transform is inverted via the inversion formula

\displaystyle  f(x) = \int_{\hat G} e^{2\pi i \xi \cdot x} \hat f(\xi) d\mu_{\hat G}(\xi)

where {\mu_{\hat G}} is the dual Haar measure on {\hat G} (see e.g. my lecture notes for further discussion of this general theory). If {f} is supported on a subgroup {H} of {G} (this may require {f} to be a measure rather than a function, if {H} is a measure zero subgroup of {G}), we conclude (rigorously!) that {\hat f} is constant along cosets of the orthogonal complement

\displaystyle  H^\perp := \{ \xi \in \hat G: \xi \cdot x = 0 \hbox{ for all } x \in H\}.

For instance, a measure {f} on {{\bf R}} that is supported on {{\bf Z}} will have a Fourier transform {\hat f} that is constant along the {{\bf Z}} direction, as {{\bf Z}} is its own orthogonal complement. This is a basic component of the Poisson summation formula. (The situation becomes more complicated if {f} is merely a distribution rather than a measure, but we will not discuss this technical issue here.)

Remark 1 Of course, in Euclidean domains such as {{\bf R}} or {{\bf R}^n}, basic sets such as the intervals {[-R,R]} are not actual subgroups, but are only approximate subgroups (roughly speaking, this means that they are closed under addition a “reasonable fraction of the time”; for a precise definition, see my book with Van Vu). However, there are dyadic models of Euclidean domains, such as the field {F((\frac{1}{t}))} of formal Laurent series in a variable {\frac{1}{t}} over a finite field {F}, in which the analogues of such intervals are in fact actual subgroups, which allows for a very precise and rigorous formalisation of many of the heuristics given here in that setting. See for instance these lecture notes of mine for more discussion.

One can view an interval such as {[-1/R,1/R]} as being an approximate orthogonal complement to the interval {[-R,R]}, and more generally the polar body {\Omega^*} as an approximate orthogonal complement to {\Omega}. Conversely, the uncertainty principle {\Delta x \cdot \Delta \xi \gg 1} when specialised to subgroups {H} of a finite abelian group {G} becomes the equality

\displaystyle  |H| \cdot |H^\perp| = |G|

and when specialised to subspaces {V} of a Euclidean space {{\bf R}^n} becomes

\displaystyle  \hbox{dim}(V) + \hbox{dim}(V^\perp) = \hbox{dim}({\bf R}^n).
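The finite-group identity can be verified directly in a toy example, say {G = {\bf Z}/12{\bf Z}} with {H} the subgroup generated by {4} (here the pairing is {\xi \cdot x = \xi x / 12 \hbox{ mod } 1}, so {\xi \cdot h = 0} means {\xi h = 0 \hbox{ mod } 12}):

```python
# G = Z/12Z with H = <4>; the orthogonal complement consists of those xi
# with xi * h = 0 mod 12 for all h in H.
N = 12
H = {0, 4, 8}
H_perp = {xi for xi in range(N)
          if all((xi * h) % N == 0 for h in H)}

print(sorted(H_perp), len(H) * len(H_perp))  # -> [0, 3, 6, 9] 12
```

Here {|H| = 3}, {|H^\perp| = 4}, and the product is indeed {|G| = 12}.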

We saw above that a function {f} that was restricted to a region {\Omega} would necessarily have a Fourier transform {\hat f} that was essentially constant on translates of (small multiples of) the dual region {\Omega^*}. This implication can be partially reversed. For instance, suppose that {\hat f} behaved like a constant at all scales {\ll N}. Then if one inspects the Fourier inversion formula

\displaystyle  f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi}\ d\xi

we note that if {|x| \gg 1/N}, then {e^{2\pi i x \xi}} oscillates in {\xi} at scales {\ll N} by the above heuristic, and so the integral exhibits significant cancellation against the locally constant {\hat f}; thus {f(x)} should be negligible when {|x| \gg 1/N}.

The above heuristic computations can be made rigorous in a number of ways. One basic method is to exploit the fundamental fact that the Fourier transform intertwines multiplication and convolution, thus

\displaystyle  \widehat{f*g} = \hat f \hat g

and
\displaystyle  \widehat{fg} = \hat f * \hat g

and similarly for the inverse Fourier transform. (Here, the convolution {*} is with respect to either the Haar measure {\mu_G} on the physical space {G}, or the Haar measure {\mu_{\hat G}} on the frequency space {\hat G}, as indicated by context.) For instance, if a function {f} has Fourier transform supported on {[-N,N]}, then we have

\displaystyle  \hat f = \hat f \psi_N

where {\psi_N(x) := \psi(x/N)} and {\psi} is a smooth and compactly supported (or rapidly decreasing) cutoff function that equals {1} on the interval {[-1,1]}. (There is a lot of freedom here in what cutoff function to pick, but in practice, “all bump functions are usually equivalent”; unless one is optimising constants, needs a very specific and delicate cancellation, or really needs an explicit formula, one usually does not have to think too hard about which specific cutoff to use, though smooth and well localised cutoffs often tend to be superior to rough or slowly decaying ones.)

Inverting the Fourier transform, we obtain the reproducing formula

\displaystyle  f = f * \check \psi_N

where {\check \psi_N} is the inverse Fourier transform of {\psi_N}. One can compute that

\displaystyle  \check \psi_N(x) = N \check \psi(Nx)

and thus

\displaystyle  f(x) = \int_{\bf R} f(x+\frac{y}{N}) \check \psi(y)\ dy. \ \ \ \ \ (1)

If one chooses {\psi} to be smooth and compactly supported (or at the very least, a Schwartz function), then {\check \psi} will also be a Schwartz function. As such, (1) can be viewed as an assertion that the value of the band-limited function {f} at any given point {x} is essentially an average of its values at nearby points {x+\frac{y}{N}} for {y=O(1)}. This formula can already be used to give many rigorous instantiations of the uncertainty principle; see for instance these lecture notes of mine for further discussion.
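In the discrete setting the reproducing formula can be checked exactly with the FFT: if {\hat f} is supported in a band and the multiplier equals {1} on that band, then multiplying in frequency (equivalently, convolving in space with the kernel {\check \psi_N}) returns {f} unchanged. A toy sketch (the signal length and bandwidth are arbitrary choices, and a sharp cutoff is used for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, bandwidth = 256, 10

# A random band-limited signal: Fourier coefficients supported on |k| <= bandwidth.
fhat = np.zeros(n, dtype=complex)
for k in list(range(bandwidth + 1)) + list(range(n - bandwidth, n)):
    fhat[k] = rng.normal() + 1j * rng.normal()
f = np.fft.ifft(fhat)

# A frequency cutoff equal to 1 on the (larger) band |k| <= 2*bandwidth; since
# fhat * psi = fhat, the corresponding convolution reproduces f exactly.
psi = np.zeros(n)
psi[: 2 * bandwidth + 1] = 1.0
psi[-2 * bandwidth:] = 1.0
reproduced = np.fft.ifft(np.fft.fft(f) * psi)
```

One finds that `reproduced` agrees with `f` to machine precision.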

Another basic method to formalise the above heuristics, particularly with regard to “oscillation causes cancellation”, is to use integration by parts; this is discussed at this Tricki article.

— 2. Projections —

The restriction {1_{[-N,N]}(X) f := f 1_{[-N,N]}} of a function {f: {\bf R} \rightarrow {\bf C}} to an interval {[-N,N]} is just the orthogonal projection (in the Hilbert space {L^2({\bf R})}) of {f} to the space of functions that are spatially supported in {[-N,N]}. Taking Fourier transforms (which, by Plancherel’s theorem, preserves the Hilbert space {L^2({\bf R})}), we see that the Fourier restriction {1_{[-N,N]}(D) f} of {f}, defined as

\displaystyle  \widehat{1_{[-N,N]}(D) f} := \hat f 1_{[-N,N]}

is the orthogonal projection of {f} to those functions with Fourier support in {[-N,N]}. As discussed above, such functions are (heuristically) those functions which are essentially constant at scales {\ll 1/N}. As such, these projection operators should behave like averaging operators at this scale. This turns out not to be a very accurate heuristic if one uses the sharp cutoffs {1_{[-N,N]}} (though this does work perfectly in the dyadic model setting), but if one replaces the sharp cutoffs by smoother ones, then this heuristic can be justified by using convolutions as in the previous section; this leads to Littlewood-Paley theory, a cornerstone of the harmonic analysis of function spaces such as Sobolev spaces, which is particularly important in partial differential equations; see for instance the first appendix of my PDE book for further discussion.

One can view the restriction operator {1_{[-N,N]}(X)} as the spectral projection of the position operator {X f(x) := x f(x)} to the interval {[-N,N]}; in a similar vein, one can view {1_{[-N,N]}(D)} as a spectral projection of the differentiation operator {D f(x) := \frac{1}{2\pi i} \frac{d}{dx} f(x)}.

As before, one can work with other sets than intervals here. For instance, restricting a function {f: G \rightarrow {\bf C}} to a subgroup {H} causes the Fourier transform {\hat f} to be summed (or averaged, depending on the normalisation) along cosets of the orthogonal complement {H^\perp}. In particular, restricting a function {f: {\bf R} \rightarrow {\bf C}} to the integers (and renormalising it to become the measure {\sum_{n \in {\bf Z}} f(n) \delta_n}) causes the Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} to be summed over the orthogonal complement {{\bf Z}^\perp = {\bf Z}}, becoming the periodised function {\sum_{m \in {\bf Z}} \hat f(\cdot+m)}. In particular, the zero Fourier coefficient of {\sum_{n \in {\bf Z}} f(n) \delta_n} is {\sum_{m \in {\bf Z}} \hat f(m)}, leading to the Poisson summation formula

\displaystyle  \sum_{n \in {\bf Z}} f(n) = \sum_{m \in {\bf Z}} \hat f(m).
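One can test the Poisson summation formula numerically against a function whose Fourier transform is explicit: with the conventions above, the Gaussian {f(x) = e^{-\pi (x/a)^2}} has {\hat f(\xi) = a e^{-\pi (a \xi)^2}}. A quick sketch (the width {a = 0.7} is an arbitrary choice; the sums converge so rapidly that truncating at {|n| \leq 50} is more than enough):

```python
import numpy as np

a = 0.7
n = np.arange(-50, 51)
physical_side = np.sum(np.exp(-np.pi * (n / a) ** 2))       # sum_n f(n)
frequency_side = np.sum(a * np.exp(-np.pi * (a * n) ** 2))  # sum_m fhat(m)
```

Both sides agree to machine precision (each is approximately {1.00329} for this choice of {a}).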

More generally, one has

\displaystyle  \sum_{n \in R {\bf Z}} f(n) = \frac{1}{R} \sum_{m \in \frac{1}{R} {\bf Z}} \hat f(m)

for any {R > 0}, which can be viewed as a one-parameter family of identities interpolating between the inversion formula

\displaystyle  f(0) = \int_{\bf R} \hat f(\xi) \ d\xi

on one hand, and the forward Fourier transform formula

\displaystyle  \int_{\bf R} f(x)\ dx = \hat f(0)

on the other.

The duality {\Delta x \cdot \Delta \xi \gg 1} between the position variable {x} and the frequency variable {\xi} (or equivalently, between the position operator {X} and the differentiation operator {D}) can be generalised to contexts in which the two dual variables have a different “physical” interpretation than position and frequency. One basic example of this is the duality {\Delta t \cdot \Delta E \gg 1} between a time variable {t} and an energy variable {E} in quantum mechanics. Consider a time-dependent Schrödinger equation

\displaystyle  i \partial_t \psi = H \psi; \quad \psi(0) = \psi_0 \ \ \ \ \ (2)

for some Hermitian (and time-independent) spatial operator {H} on some arbitrary domain (which does not need to be a Euclidean space {{\bf R}^n}, or even a group), where we have normalised away for now the role of Planck’s constant {\hbar}. If the underlying Hilbert space (e.g. {L^2} of the spatial domain) has an orthonormal basis of eigenvector solutions to the time-independent Schrödinger equation

\displaystyle  H u_k = E_k u_k

then the solution to (2) is formally given by the formula

\displaystyle  \psi = e^{-itH} \psi_0 = \sum_k e^{-i E_k t} \langle \psi_0, u_k \rangle u_k.

We thus see that the coefficients {\langle \psi_0, u_k \rangle} (or more precisely, the components {\langle \psi_0, u_k \rangle u_k}) can be viewed as the Fourier coefficients of {\psi} in time, with the energies {E_k} playing the role of the frequency variable. Taking traces, one (formally) sees a similar Fourier relationship between the trace function {\hbox{tr}(e^{-itH})} and the spectrum {E_1 < E_2 < E_3 < \ldots}:

\displaystyle  \hbox{tr}(e^{-itH}) = \sum_k e^{-i E_k t}. \ \ \ \ \ (3)
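Formula (3) can be sanity-checked in a finite-dimensional toy model, taking {H} to be a random Hermitian matrix. The sketch below computes {e^{-itH}} by a truncated Taylor series (so as not to use the spectrum twice) and compares its trace with {\sum_k e^{-i E_k t}}:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
H = (A + A.T) / 2                 # a random real symmetric (hence Hermitian) matrix
t = 0.1

# e^{-itH} via a truncated Taylor series; converges rapidly since t*||H|| is small.
M = -1j * t * H
U = np.eye(4, dtype=complex)
term = np.eye(4, dtype=complex)
for j in range(1, 40):
    term = term @ M / j
    U = U + term

E = np.linalg.eigvalsh(H)         # the energy levels E_k
lhs = np.trace(U)                 # tr(e^{-itH})
rhs = np.sum(np.exp(-1j * E * t)) # sum_k e^{-i E_k t}
```

The two quantities agree to machine precision.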

As a consequence, the heuristics of the uncertainty principle carry through here. Just as the behaviour of a function {f} at scales {\ll T} largely controls the spectral behaviour of {\hat f} at scales {\gg 1/T}, one can use the evolution operator {e^{-itH}} of the Schrödinger equation up to times {|t| \leq T} to understand the spectrum {E_1 < E_2 < E_3 < \ldots} of {H} at scales {\gg 1/T}. For instance, from (3) we (formally) see that

\displaystyle  \hbox{tr}( \int_{\bf R} \eta(t/T) e^{it E_0} e^{-itH}\ dt ) = T \sum_k \hat \eta( \frac{E_k - E_0}{2\pi/T} )

for any test function {\eta} and any energy level {E_0}. Roughly speaking, this formula tells us that the number of eigenvalues in an interval of size {O(1/T)} can be more or less controlled by the Schrödinger operators up to time {T}.

A similar analysis also holds for the solution operator

\displaystyle  u(t) = \cos(t \sqrt{-\Delta}) u_0 + \frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}} u_1

for the wave equation

\displaystyle  \partial_t^2 u - \Delta u = 0

on an arbitrary spatial Riemannian manifold {M} (which we will take to be compact in order to have discrete spectrum). If we write {\lambda_k} for the eigenvalues of {\sqrt{-\Delta}} (so the Laplace-Beltrami operator {\Delta} has eigenvalues {-\lambda_k^2}), then a similar analysis to the above shows that knowledge of the solution to the wave equation up to time {T} gives (at least in principle) knowledge of the spectrum averaged at scale {1/T} or above.

From the finite speed of propagation property of the wave equation (which has been normalised so that the speed of light {c} is equal to {1}), one only needs to know the geometry of the manifold {M} up to distance scales {T} in order to understand the wave operator up to times {T}. In particular, if {T} is less than the injectivity radius of {M}, then the topology and global geometry of {M} is largely irrelevant, and the manifold more or less behaves like (a suitably normalised version of) Euclidean space. As a consequence, one can borrow Euclidean space techniques (such as the spatial Fourier transform) to control the spectrum at coarse scales {\gg 1}, leading in particular to the Weyl law for the distribution of eigenvalues on this manifold; see for instance this book of Sogge for a rigorous discussion. It is a significant challenge to go significantly below this scale and understand the finer structure of the spectrum; by the uncertainty principle, this task is largely equivalent to that of understanding the wave equation on long time scales {T \gg 1}, and the global geometry of the manifold {M} (and in particular, the dynamical properties of the geodesic flow) must then inevitably play a more dominant role.
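On the simplest compact example, an interval, the Weyl law can be checked directly: the Dirichlet eigenvalues of {-\frac{d^2}{dx^2}} on {(0,\pi)} are {k^2}, so the counting function {N(\lambda)} of the frequencies {\lambda_k = k} should be asymptotic to {(\hbox{length}/\pi) \lambda = \lambda}. A finite-difference sketch (the grid size and threshold are arbitrary choices):

```python
import numpy as np

# Second-difference discretisation of -d^2/dx^2 on (0, pi), Dirichlet conditions.
m = 1200
h = np.pi / (m + 1)
lap = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
freqs = np.sqrt(np.linalg.eigvalsh(lap))  # approximates lambda_k = 1, 2, 3, ...

lam = 50.5
count = int(np.sum(freqs <= lam))         # counting function N(lam)
weyl = lam                                # Weyl prediction (length/pi) * lam
```

For frequencies well below the grid cutoff {1/h}, the count matches the Weyl prediction to within {O(1)}.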

Another important uncertainty principle duality relationship is that between the (imaginary parts of the) zeroes {\rho} of the Riemann zeta function {\zeta(s)} and the logarithms {\log p} of the primes. Starting from the fundamental Euler product formula

\displaystyle  \zeta(s) = \prod_p (1-p^{-s})^{-1}

and using rigorous versions of the heuristic factorisation

\displaystyle  \zeta(s) \approx \prod_\rho (s-\rho)

one can soon derive explicit formulae connecting zeroes and primes, such as

\displaystyle  \sum_\rho \frac{1}{s-\rho} \approx - \sum_p \log p e^{-s \log p}

(see e.g. this blog post of mine for more discussion). Using such formulae, one can relate the zeroes of the zeta function in the strip {\{ |\hbox{Im}(\rho)| \leq T \}} with the distribution of the log-primes at scales {\gg 1/T}. For instance, knowing that there are no zeroes on the line segment {\{ 1+it: |t| \leq T \}} is basically equivalent to a partial prime number theorem {\pi(x) = (1+O(\frac{1}{T})) \frac{x}{\log x}}; letting {T \rightarrow \infty}, we see that the full prime number theorem is equivalent to the absence of zeroes on the entire line {\{1+it: t \in {\bf R} \}}. More generally, there is a fairly well-understood dictionary between the distribution of zeroes and the distribution of primes, which is explored in just about any advanced text in analytic number theory.

— 3. Phase space and the semi-classical limit —

The above heuristic description of Fourier projections such as {1_{[-N,N]}(D)} suggests that a Fourier projection {1_J(D)} will approximately commute with a spatial projection {1_I(X)} whenever {I}, {J} are intervals that obey the Heisenberg inequality {|I| |J| \gg 1}. Again, this heuristic is not quite accurate if one uses sharp cutoffs (except in the dyadic model), but becomes quite valid if one uses smooth cutoffs. As such, one can morally talk about phase space projections {1_{I \times J}(X,D) \approx 1_I(X) 1_J(D) \approx 1_J(D) 1_I(X)} to rectangles {I \times J} in phase space, so long as these rectangles are large enough not to be in violation of the uncertainty principle.

Heuristically, {1_{I \times J}(X,D)} is an orthogonal projection to the space of functions that are localised to {I} in physical space and to {J} in frequency space. (This is morally a vector space, but unfortunately this is not rigorous due to the inability to perfectly localise in both physical space and frequency space simultaneously, thanks to the Hardy uncertainty principle.) One can approximately compute the dimension of this not-quite-vector-space by computing the trace of the projection. Recalling that the trace of an integral operator {Tf(x) := \int_{\bf R} K(x,y) f(y)\ dy} is given by {\hbox{tr} T = \int_{\bf R} K(x,x)\ dx}, a short computation reveals that the trace of {1_I(X) 1_J(D)} is

\displaystyle  \int_I \check 1_J(0)\ dx = |I| |J|.

Thus we conclude that the phase space region {I \times J} contains approximately {|I| |J|} degrees of freedom in it, which can be viewed as a “macroscopic” version of the uncertainty principle.
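This trace heuristic can be tested on a discretised line, where the frequency projection {1_J(D)} can be implemented exactly with the discrete Fourier transform. In the following Python sketch (the grid size and the intervals {I}, {J} are arbitrary choices of ours), the trace of {1_I(X) 1_J(D)} indeed comes out close to {|I||J|}:

```python
import numpy as np

# Discretise [0, L) into N points; DFT frequencies then lie on a grid of spacing 1/L
N, L = 1024, 32.0
dx = L / N
x = np.arange(N) * dx
freqs = np.fft.fftfreq(N, d=dx)

# Spatial interval I = [10, 14) and frequency interval J = [-1.5, 1.5],
# so |I| = 4 and |J| = 3
mask_I = (x >= 10) & (x < 14)
mask_J = np.abs(freqs) <= 1.5

# 1_J(D) acts as: FFT, multiply by the indicator of J, inverse FFT;
# build it as an explicit N x N matrix by applying it to the identity
proj_J = np.fft.ifft(mask_J[:, None] * np.fft.fft(np.eye(N), axis=0), axis=0)
proj_I = np.diag(mask_I.astype(float))

trace = np.trace(proj_I @ proj_J).real
print(trace)  # close to |I| * |J| = 12
```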

More generally, the number of degrees of freedom contained in a large region {\Omega \subset {\bf R} \times {\bf R}} of phase space is proportional to its area. Among other things, this can be used to justify the Weyl law for the distribution of eigenvalues of various operators. For instance, if {H} is the Schrödinger operator

\displaystyle  H = - \hbar^2 \frac{d^2}{dx^2} + V(x) = \hbar^2 D^2 + V(X),

where {\hbar > 0} is a small constant (which physically can be interpreted as Planck's constant), and {V} is a confining potential (to ensure discreteness of the spectrum), then the spectral projection {1_{(-\infty,E]}(H)} to energy levels below a given threshold {E} is morally like a phase space projection to the region {\Omega := \{ (\xi,x): \hbar^2 \xi^2 + V(x) \leq E \}}. As such, the number of eigenvalues of {H} less than {E} should roughly equal the area of {\Omega}, particularly when {\hbar} is small (so that {\Omega} becomes large, and the uncertainty principle no longer dominates); note that for a confining potential such as the harmonic potential {V(x) = |x|^2}, the region {\Omega} will have finite area. Such heuristics can be justified by the machinery of semi-classical analysis and the pseudo-differential calculus, which we will not detail here.
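For the harmonic potential this heuristic can be tested directly. The eigenvalues of {-\hbar^2 \partial_{xx} + x^2} are {\hbar(2n+1)}, so the number of them below {E} is about {E/2\hbar}; this matches the area of {\Omega} once the frequency variable is normalised as in the trace computation above (one degree of freedom per unit of phase space area). A crude finite-difference check in Python follows; all grid parameters are arbitrary choices of ours:

```python
import numpy as np

hbar, E = 0.05, 1.0

# Finite-difference discretisation of H = -hbar^2 d^2/dx^2 + x^2 on [-4, 4],
# with Dirichlet boundary conditions (harmless here, since low-energy
# eigenfunctions are concentrated near the well |x| <= sqrt(E) = 1)
N = 1500
x = np.linspace(-4, 4, N)
dx = x[1] - x[0]
lap = (np.diag(np.full(N, -2.0)) + np.diag(np.ones(N - 1), 1)
       + np.diag(np.ones(N - 1), -1)) / dx ** 2
H = -hbar ** 2 * lap + np.diag(x ** 2)

eigs = np.linalg.eigvalsh(H)
count = int(np.sum(eigs <= E))
print(count, E / (2 * hbar))  # eigenvalue count vs. phase space prediction
```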

The correspondence principle in quantum mechanics asserts that in the limit {\hbar \rightarrow 0}, quantum mechanics asymptotically converges (in some suitable sense) to classical mechanics. There are several ways to make this precise. One can work in a dual formulation, using algebras of observables rather than dealing with physical states directly, in which case the point is that the non-commutative operator algebras of quantum observables converge in various operator topologies to the commutative operator algebras of classical observables in the limit {\hbar \rightarrow 0}. This is the most common way that the correspondence principle is formulated; but one can also work directly using states. We illustrate this with the time-dependent Schrödinger equation

\displaystyle  i \hbar \partial_t \psi = - \frac{\hbar^2}{2m} \partial_{xx} \psi + V(x) \psi \ \ \ \ \ (4)

with a potential {V}, where {m > 0} is a fixed constant (representing mass) and {\hbar > 0} is a small constant, or equivalently

\displaystyle  i \hbar \partial_t \psi = (\frac{P^2}{2m} + V(X)) \psi

where {X} is the position operator {X f(x) := x f(x)} and {P} is the momentum operator {P f(x) := -i \hbar \frac{d}{dx} f(x)} (thus {P = 2\pi \hbar D}). The classical counterpart to this equation is Newton’s second law

\displaystyle 	F = ma;

where {a = \frac{d^2 x}{dt^2}} and {F = - \partial_x V(x)}; introducing the momentum {p := mv = m \frac{dx}{dt}}, one can rewrite Newton’s second law as Hamilton’s equations of motion

\displaystyle  \partial_t p = -\partial_x V(x); \quad \partial_t x = \frac{1}{m} p. \ \ \ \ \ (5)
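Hamilton's equations (5) are also easy to integrate numerically. The sketch below (in Python; the harmonic potential {V(x) = x^2/2}, unit mass, and step sizes are arbitrary choices of ours) uses the symplectic Euler scheme, which alternates the two updates in (5), and recovers the expected oscillation {x(t) = \cos t} starting from rest at {x = 1}:

```python
import math

def hamilton_flow(x, p, m, dV, dt, steps):
    """Symplectic Euler integration of dp/dt = -V'(x), dx/dt = p/m."""
    for _ in range(steps):
        p -= dV(x) * dt    # kick: update momentum from the force -V'(x)
        x += (p / m) * dt  # drift: update position from the new momentum
    return x, p

# Harmonic potential V(x) = x^2/2, so V'(x) = x; start at rest at x = 1
x, p = hamilton_flow(x=1.0, p=0.0, m=1.0, dV=lambda x: x, dt=1e-4, steps=10000)
print(x, math.cos(1.0))  # position after time t = 1 vs. the exact solution cos(t)
```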

We now indicate (heuristically, at least) how (4) converges to (5) as {\hbar \rightarrow 0}. According to de Broglie’s law {p = 2\pi \hbar \xi}, the momentum {p} should be proportional to the frequency {\xi}. Accordingly, consider a wave function {\psi} that at time {t} is concentrated near position {x_0(t)} and momentum {p_0(t)}, and thus near frequency {p_0(t)/(2\pi\hbar)}; heuristically one can view {\psi} as having the shape

\displaystyle  \psi(t,x) = A(t,\frac{x-x_0(t)}{r}) e^{i p_0(t) x / \hbar} e^{i \theta(t) / \hbar}

where {\theta(t)} is some phase, {r} is some spatial scale (between {1} and {\hbar}) and {A} is some amplitude function. Informally, we have {X \approx x_0(t)} and {P \approx p_0(t)} for {\psi}.

Before we analyse the equation (4), we first look at some simpler equations. First, we look at

\displaystyle  i \hbar \partial_t \psi = E \psi

where {E} is a real scalar constant. Then the evolution of this equation is given by a simple phase rotation:

\displaystyle  \psi(t,x) = e^{-i Et/\hbar} \psi(0,x).

This phase rotation does not change the location {x_0(t)} or momentum {p_0(t)} of the wave:

\displaystyle  \partial_t x_0(t) = 0; \quad \partial_t p_0(t) = 0.

Next, we look at the transport equation

\displaystyle  i \hbar \partial_t \psi = - i \hbar v \partial_x \psi

where {v \in {\bf R}} is another constant. This evolution is given by translation:

\displaystyle  \psi(t,x) = \psi(0,x-vt);

the position {x_0(t)} of this evolution moves at the constant speed of {v}, but the momentum is unchanged:

\displaystyle  \partial_t x_0(t) = v; \quad \partial_t p_0(t) = 0.

Combining the two, we see that an equation of the form

\displaystyle  i \hbar \partial_t \psi = E \psi - i \hbar v (\partial_x - i p_0(t) / \hbar) \psi

would also transport the position {x_0} at a constant speed of {v}, without changing the momentum. Next, we consider the modulation equation

\displaystyle  i \hbar \partial_t \psi = F x \psi

where {F \in {\bf R}} is yet another constant. This equation is solved by the formula

\displaystyle  \psi(t,x) = e^{-i t F x / \hbar} \psi(0,x);

this phase modulation does not change the position {x_0(t)}, but steadily decreases the momentum {p_0(t)} at a rate of {F}:

\displaystyle  \partial_t x_0(t) = 0; \quad \partial_t p_0(t) = -F.
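The effect of the modulation on the momentum can be seen concretely with a discrete Fourier transform: multiplying a wave packet by {e^{-itFx/\hbar}} translates its Fourier transform, lowering the mean momentum by {tF}. A quick numerical sketch in Python (the values of {\hbar}, {p_0}, {F}, {t} and the grid are arbitrary choices of ours):

```python
import numpy as np

hbar, p0, F, t = 1.0, 2.0, 0.5, 1.0

# Gaussian wave packet centred at x = 0 with mean momentum p0
N, L = 4096, 80.0
x = (np.arange(N) - N // 2) * (L / N)
psi0 = np.exp(-x ** 2 / 2) * np.exp(1j * p0 * x / hbar)

def mean_momentum(psi):
    """<P> computed on the Fourier side, using p = hbar * k
    with k the angular wavenumber."""
    k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
    weights = np.abs(np.fft.fft(psi)) ** 2
    return hbar * np.sum(k * weights) / np.sum(weights)

psi_t = np.exp(-1j * t * F * x / hbar) * psi0  # solution of i hbar psi_t = F x psi
print(mean_momentum(psi0), mean_momentum(psi_t))  # momentum drops from p0 to p0 - tF
```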

Finally, we combine all these equations together, looking at the combined equation

\displaystyle  i \hbar \partial_t \psi = E \psi - i \hbar v (\partial_x - i p_0(t) / \hbar) \psi + F (x-x_0(t)) \psi.

Heuristically at least, the position {x_0(t)} and momentum {p_0(t)} of solutions to this equation should evolve according to the law

\displaystyle  \partial_t x_0(t) = v; \quad \partial_t p_0(t) = -F. \ \ \ \ \ (6)

We remark that one can make the above quite rigorous by using the metaplectic representation.

This analysis was for {v, F} constant, but as all statements here are instantaneous and first-order in time, it also applies for time-dependent {v, F}.

Now we return to the Schrödinger equation (4). If {\psi} is localised in space close to {x_0(t)}, then by Taylor expansion we may linearise the {V(x)} component as

\displaystyle  V(x) \approx V(x_0(t)) + (x-x_0(t)) \partial_x V(x_0(t)).

Similarly, if {\psi} is localised in momentum close to {p_0(t)}, then in frequency it is localised close to {p_0(t)/(2\pi \hbar)}, so that {\partial_x \approx i p_0(t)/\hbar}, and so we have a Taylor expansion

\displaystyle  \partial_{xx} \approx (i p_0(t)/\hbar)^2 + 2 (i p_0(t)/\hbar) (\partial_x - (i p_0(t)/\hbar)).

These Taylor expansions become increasingly accurate in the limit {\hbar \rightarrow 0}, assuming suitable localisation in both space and momentum. Inserting these approximations and simplifying, one arrives at

\displaystyle  \partial_t \psi = \frac{E(t)}{i\hbar} \psi - \frac{p_0(t)}{m} (\partial_x - (i p_0(t)/\hbar)) \psi - \frac{i}{\hbar} (x-x_0(t)) \partial_x V(x_0(t)) \psi

where {E(t) := \frac{p_0(t)^2}{2m} + V(x_0(t))} is the classical energy of the state. Using the heuristics (6) we are led to (5) as desired.
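This convergence can also be watched numerically: one evolves a narrow wave packet under (4) with a standard split-step Fourier scheme and compares the centre of mass of {|\psi|^2} with the classical trajectory predicted by (5). A sketch in Python with NumPy follows; the harmonic potential {V(x) = x^2/2}, mass {m = 1}, and all grid parameters are our own arbitrary choices (for the harmonic potential the packet centre happens to follow the classical path exactly, which makes the comparison clean):

```python
import numpy as np

hbar, m = 0.05, 1.0
N, L = 2048, 10.0
x = (np.arange(N) - N // 2) * (L / N)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)  # angular wavenumbers, P = hbar * k

V = x ** 2 / 2  # harmonic potential
# Coherent-state-width packet at x0 = 1 with p0 = 0
psi = np.exp(-(x - 1.0) ** 2 / (2 * hbar)).astype(complex)

dt, steps = 1e-3, 1000  # evolve up to time t = 1
half_V = np.exp(-1j * V * dt / (2 * hbar))
kinetic = np.exp(-1j * hbar * k ** 2 * dt / (2 * m))
for _ in range(steps):
    # Strang splitting: half potential kick, full kinetic step, half potential kick
    psi = half_V * np.fft.ifft(kinetic * np.fft.fft(half_V * psi))

center = np.sum(x * np.abs(psi) ** 2) / np.sum(np.abs(psi) ** 2)
print(center, np.cos(1.0))  # quantum packet centre vs. classical x(t) = cos(t)
```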

More generally, a Schrödinger equation

\displaystyle  i \hbar \partial_t \psi = H( X, P ) \psi

where {P := -i \hbar \frac{d}{dx}} is the momentum operator (and where we are deliberately vague about what a function {H(X,P)} of the two non-commuting operators {X, P} means exactly), can be heuristically Taylor expanded as

\displaystyle  i \hbar \partial_t \psi = H(x_0(t),p_0(t)) \psi + \frac{\partial H}{\partial p}( x_0(t), p_0(t) ) ( P - p_0(t) ) \psi + \frac{\partial H}{\partial x}( x_0(t), p_0(t) ) ( X - x_0(t) ) \psi

and (6) leads us to the Hamilton equations of motion

\displaystyle  \partial_t x(t) = \frac{\partial H}{\partial p}; \quad \partial_t p(t) = - \frac{\partial H}{\partial x}.

It turns out that these heuristic computations can be made completely rigorous in the semi-classical limit {\hbar \rightarrow 0}, by using the machinery of pseudodifferential calculus, but we will not detail this here.