You are currently browsing the tag archive for the ‘uncertainty principle’ tag.

One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function ${f}$ is restricted to a narrow region of physical space, then its Fourier transform ${\hat f}$ must be necessarily “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.

In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, which is useful in analytic number theory contexts. Specifically, suppose we are given a complex-valued function ${f: {\bf Z} \rightarrow {\bf C}}$ on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support ${\hbox{supp}(f) := \{ n \in {\bf Z}: f(n) \neq 0 \}}$ of this function is finite (in practice, we will only work with functions that are supported in an interval ${[M+1,M+N]}$ for some natural numbers ${M,N}$). Then we can define the Fourier transform ${\hat f: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ by the formula

$\displaystyle \hat f(\xi) := \sum_{n \in {\bf Z}} f(n) e(-n\xi)$

where ${e(x) := e^{2\pi i x}}$. (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)

The classical uncertainty principle, in this context, asserts that if ${f}$ is localised in an interval of length ${N}$, then ${\hat f}$ must be “smeared out” at a scale of at least ${1/N}$ (and essentially constant at scales less than ${1/N}$). For instance, if ${f}$ is supported in ${[M+1,M+N]}$, then we have the Plancherel identity

$\displaystyle \int_{{\bf R}/{\bf Z}} |\hat f(\xi)|^2\ d\xi = \| f \|_{\ell^2({\bf Z})}^2 \ \ \ \ \ (1)$

while from the Cauchy-Schwarz inequality we have

$\displaystyle |\hat f(\xi)| \leq N^{1/2} \| f \|_{\ell^2({\bf Z})}$

for each frequency ${\xi}$, and in particular that

$\displaystyle \int_I |\hat f(\xi)|^2\ d\xi \leq N |I| \| f \|_{\ell^2({\bf Z})}^2$

for any arc ${I}$ in the unit circle (with ${|I|}$ denoting the length of ${I}$). In particular, an interval of length significantly less than ${1/N}$ can only capture a fraction of the ${L^2}$ energy of the Fourier transform of ${\hat f}$, which is consistent with the above informal statement of the uncertainty principle.

Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan and Selberg: if ${f}$ is supported in ${[M+1,M+N]}$, and ${\xi_1,\ldots,\xi_J}$ are frequencies in ${{\bf R}/{\bf Z}}$ that are ${\delta}$-separated for some ${\delta>0}$, thus ${\| \xi_i-\xi_j\|_{{\bf R}/{\bf Z}} \geq \delta}$ for all ${1 \leq i (where ${\|\xi\|_{{\bf R}/{\bf Z}}}$ denotes the distance of ${\xi}$ to the origin in ${{\bf R}/{\bf Z}}$), then

$\displaystyle \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2. \ \ \ \ \ (2)$

The reader is encouraged to see how this inequality is consistent with the Plancherel identity (1) and the intuition that ${\hat f}$ is essentially constant at scales less than ${1/N}$. The factor ${N + \frac{1}{\delta}}$ can in fact be amplified a little bit to ${N + \frac{1}{\delta} - 1}$, which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates ${[M+1,M+N]}$ to ${[MK+K, MK+NK]}$ (and replaces each frequency ${\xi_j}$ by their ${K^{th}}$ roots), and then sending ${K \rightarrow \infty}$ (cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.

In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric ${d_\infty(n,m) := |n-m|}$ on the integers ${{\bf Z}}$ (in particular, the parameter ${N}$ is essentially the Archimedean diameter of the support of ${f}$). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the ${p}$-adic metrics play an equally important role; indeed, it is common to unify the Archimedean and ${p}$-adic perspectives together into a unified adelic perspective. In the ${p}$-adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of ${p}$. Intersecting these balls from different ${p}$-adic metrics together, we obtain residue classes with respect to various moduli ${q}$ (which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo ${q}$“. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).

In this context, the uncertainty principle is this: the more residue classes modulo ${q}$ that ${f}$ avoids, the more the Fourier transform ${\hat f}$ must spread out along multiples of ${1/q}$. To illustrate a very simple example of this principle, let us take ${q=2}$, and suppose that ${f}$ is supported only on odd numbers (thus it completely avoids the residue class ${0 \mod 2}$). We write out the formulae for ${\hat f(\xi)}$ and ${\hat f(\xi+1/2)}$:

$\displaystyle \hat f(\xi) = \sum_n f(n) e(-n\xi)$

$\displaystyle \hat f(\xi+1/2) = \sum_n f(n) e(-n\xi) e(-n/2).$

If ${f}$ is supported on the odd numbers, then ${e(-n/2)}$ is always equal to ${-1}$ on the support of ${f}$, and so we have ${\hat f(\xi+1/2)=-\hat f(\xi)}$. Thus, whenever ${\hat f}$ has a significant presence at a frequency ${\xi}$, it also must have an equally significant presence at the frequency ${\xi+1/2}$; there is a spreading out across multiples of ${1/2}$. Note that one has a similar effect if ${f}$ was supported instead on the even integers instead of the odd integers.

A little more generally, suppose now that ${f}$ avoids a single residue class modulo a prime ${p}$; for sake of argument let us say that it avoids the zero residue class ${0 \mod p}$, although the situation for the other residue classes is similar. For any frequency ${\xi}$ and any ${j=0,\ldots,p-1}$, one has

$\displaystyle \hat f(\xi+j/p) = \sum_n f(n) e(-n\xi) e(-jn/p).$

From basic Fourier analysis, we know that the phases ${e(-jn/p)}$ sum to zero as ${j}$ ranges from ${0}$ to ${p-1}$ whenever ${n}$ is not a multiple of ${p}$. We thus have

$\displaystyle \sum_{j=0}^{p-1} \hat f(\xi+j/p) = 0.$

In particular, if ${\hat f(\xi)}$ is large, then one of the other ${\hat f(\xi+j/p)}$ has to be somewhat large as well; using the Cauchy-Schwarz inequality, we can quantify this assertion in an ${\ell^2}$ sense via the inequality

$\displaystyle \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{1}{p-1} |\hat f(\xi)|^2. \ \ \ \ \ (3)$

Let us continue this analysis a bit further. Now suppose that ${f}$ avoids ${\omega(p)}$ residue classes modulo a prime ${p}$, for some ${0 \leq \omega(p) < p}$. (We exclude the case ${\omega(p)=p}$ as it is clearly degenerates by forcing ${f}$ to be identically zero.) Let ${\nu(n)}$ be the function that equals ${1}$ on these residue classes and zero away from these residue classes, then

$\displaystyle \sum_n f(n) e(-n\xi) \nu(n) = 0.$

Using the periodic Fourier transform, we can write

$\displaystyle \nu(n) = \sum_{j=0}^{p-1} c(j) e(-jn/p)$

for some coefficients ${c(j)}$, thus

$\displaystyle \sum_{j=0}^{p-1} \hat f(\xi+j/p) c(j) = 0.$

Some Fourier-analytic computations reveal that

$\displaystyle c(0) = \frac{\omega(p)}{p}$

and

$\displaystyle \sum_{j=0}^{p-1} |c(j)|^2 = \frac{\omega(p)}{p}$

and so after some routine algebra and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):

$\displaystyle \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{\omega(p)}{p-\omega(p)} |\hat f(\xi)|^2.$

Thus we see that the more residue classes mod ${p}$ we exclude, the more Fourier energy has to disperse along multiples of ${1/p}$. It is also instructive to consider the extreme case ${\omega(p)=p-1}$, in which ${f}$ is supported on just a single residue class ${a \mod p}$; in this case, one clearly has ${\hat f(\xi+j/p) = e(-aj/p) \hat f(\xi)}$, and so ${\hat f}$ spreads its energy completely evenly along multiples of ${1/p}$.

In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:

Proposition 1 (Montgomery’s uncertainty principle) Let ${f: {\bf Z} \rightarrow{\bf C}}$ be a finitely supported function which, for each prime ${p}$, avoids ${\omega(p)}$ residue classes modulo ${p}$ for some ${0 \leq \omega(p) < p}$. Then for each natural number ${q}$, one has

$\displaystyle \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi+\frac{a}{q})|^2 \geq h(q) |\hat f(\xi)|^2$

where ${h(q)}$ is the quantity

$\displaystyle h(q) := \mu(q)^2 \prod_{p|q} \frac{\omega(p)}{p-\omega(p)} \ \ \ \ \ (4)$

where ${\mu}$ is the Möbius function.

We give a proof of this proposition below the fold.

Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the ${p}$-adic senses. This leads to the following corollary:

Corollary 2 (Arithmetic large sieve inequality) Let ${f: {\bf Z} \rightarrow {\bf C}}$ be a function supported on an interval ${[M+1,M+N]}$ which, for each prime ${p}$, avoids ${\omega(p)}$ residue classes modulo ${p}$ for some ${0 \leq \omega(p) < p}$. Let ${\xi_1,\ldots,\xi_J \in {\bf R}/{\bf Z}}$, and let ${{\mathcal Q}}$ be a finite set of natural numbers. Suppose that the frequencies ${\xi_j + \frac{a}{q}}$ with ${1 \leq j \leq J}$, ${q \in {\mathcal Q}}$, and ${(a,q)=1}$ are ${\delta}$-separated. Then one has

$\displaystyle \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq \frac{N + \frac{1}{\delta}}{\sum_{q \in {\mathcal Q}} h(q)} \| f \|_{\ell^2({\bf Z})}^2$

where ${h(q)}$ was defined in (4).

Indeed, from the large sieve inequality one has

$\displaystyle \sum_{j=1}^J \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2$

while from Proposition 1 one has

$\displaystyle \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \geq (\sum_{q \in {\mathcal Q}} h(q)) |\hat f(\xi_j)|^2$

whence the claim.

There is a great deal of flexibility in the above inequality, due to the ability to select the set ${{\mathcal Q}}$, the frequencies ${\xi_1,\ldots,\xi_J}$, the omitted classes ${\omega(p)}$, and the separation parameter ${\delta}$. Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:

Corollary 3 (Large sieve) Let ${A}$ be a set of integers contained in ${[M+1,M+N]}$ which avoids ${\omega(p)}$ residue classes modulo ${p}$ for each prime ${p}$, and let ${R>0}$. Then

$\displaystyle |A| \leq \frac{N+R^2}{G(R)}$

where

$\displaystyle G(R) := \sum_{q \leq R} h(q).$

Proof: We apply Corollary 2 with ${f = 1_A}$, ${J=1}$, ${\delta=1/R^2}$, ${\xi_1=0}$, and ${{\mathcal Q} := \{ q: q \leq R\}}$. The key point is that the Farey sequence of fractions ${a/q}$ with ${q \leq R}$ and ${(a,q)=1}$ is ${1/R^2}$-separated, since

$\displaystyle \| \frac{a}{q}-\frac{a'}{q'} \|_{{\bf R}/{\bf Z}} \geq \frac{1}{qq'} \geq \frac{1}{R^2}$

whenever ${a/q, a'/q'}$ are distinct fractions in this sequence. $\Box$

If, for instance, ${A}$ is the set of all primes in ${[M+1,M+N]}$ larger than ${R}$, then one can set ${\omega(p)=1}$ for all ${p \leq R}$, which makes ${h(q) = \frac{\mu^2(q)}{\phi(q)}}$, where ${\phi}$ is the Euler totient function. It is a classical estimate that

$\displaystyle G(R) = \log R + O(1).$

Using this fact and optimising in ${R}$, we obtain (a special case of) the Brun-Titchmarsh inequality

$\displaystyle \pi(M+N)-\pi(M) \leq (2+o_{N \rightarrow \infty}(1)) \frac{N}{\log N}$

where ${\pi(x)}$ is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality

$\displaystyle \pi(M+N;a,q)-\pi(M;a,q) \leq (2+o_{N \rightarrow \infty;q}(1)) \frac{q}{\phi(q)} \frac{N}{\log N}$

for any primitive residue class ${a \mod q}$, where ${\pi(M;a,q)}$ is the number of primes less than or equal to ${M}$ that are congruent to ${a \mod q}$. By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality

$\displaystyle \pi(M+N;a,q)-\pi(M;a,q) \leq 2 \frac{q}{\phi(q)} \frac{N}{\log N}$

for any natural numbers ${M,N,a,q}$ with ${N>1}$. This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.

I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.

A recurring theme in mathematics is that of duality: a mathematical object ${X}$ can either be described internally (or in physical space, or locally), by describing what ${X}$ physically consists of (or what kind of maps exist into ${X}$), or externally (or in frequency space, or globally), by describing what ${X}$ globally interacts or resonates with (or what kind of maps exist out of ${X}$). These two fundamentally opposed perspectives on the object ${X}$ are often dual to each other in various ways: performing an operation on ${X}$ may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being a “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:

1. Vector space duality A vector space ${V}$ over a field ${F}$ can be described either by the set of vectors inside ${V}$, or dually by the set of linear functionals ${\lambda: V \rightarrow F}$ from ${V}$ to the field ${F}$ (or equivalently, the set of vectors inside the dual space ${V^*}$). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
2. Vector subspace duality In a similar spirit, a subspace ${W}$ of ${V}$ can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement ${W^\perp := \{ \lambda \in V^*: \lambda(w)=0 \hbox{ for all } w \in W \})}$. Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
3. Convex duality More generally, a (closed, bounded) convex body ${K}$ in a vector space ${V}$ can be described either by listing a set of (extreme) points whose convex hull is ${K}$, or else by listing a set of (irreducible) linear inequalities that cut out ${K}$. The fundamental connection between the two is given by the Farkas lemma.
4. Ideal-variety duality In a slightly different direction, an algebraic variety ${V}$ in an affine space ${A^n}$ can be viewed either “in physical space” or “internally” as a collection of points in ${V}$, or else “in frequency space” or “externally” as a collection of polynomials on ${A^n}$ whose simultaneous zero locus cuts out ${V}$. The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
5. Hilbert space duality An element ${v}$ in a Hilbert space ${H}$ can either be thought of in physical space as a vector in that space, or in momentum space as a covector ${w \mapsto \langle v, w \rangle}$ on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
6. Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
7. Intrinsic-extrinsic duality A (Riemannian) manifold ${M}$ can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives includes the Nash embedding theorem and the theorema egregium.
8. Group duality A group ${G}$ can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
9. Pontryagin group duality A (locally compact Hausdorff) abelian group ${G}$ can be described either by listing its elements ${g \in G}$, or by listing the characters ${\chi: G \rightarrow {\bf R}/{\bf Z}}$ (i.e. continuous homomorphisms from ${G}$ to the unit circle, or equivalently elements of ${\hat G}$). The connection between the two is the focus of abstract harmonic analysis.
10. Pontryagin subgroup duality A subgroup ${H}$ of a locally compact abelian group ${G}$ can be described either by generators in ${H}$, or generators in the orthogonal complement ${H^\perp := \{ \xi \in \hat G: \xi \cdot h = 0 \hbox{ for all } h \in H \}}$. One of the fundamental connections between the two is the Poisson summation formula.
11. Fourier duality A (sufficiently nice) function ${f: G \rightarrow {\bf C}}$ on a locally compact abelian group ${G}$ (equipped with a Haar measure ${\mu}$) can either be described in physical space (by its values ${f(x)}$ at each element ${x}$ of ${G}$) or in frequency space (by the values ${\hat f(\xi) = \int_G f(x) e( - \xi \cdot x )\ d\mu(x)}$ at elements ${\xi}$ of the Pontryagin dual ${\hat G}$). The fundamental connection between the two is the Fourier inversion formula.
12. The uncertainty principle The behaviour of a function ${f}$ at physical scales above (resp. below) a certain scale ${R}$ is almost completely controlled by the behaviour of its Fourier transform ${\hat f}$ at frequency scales below (resp. above) the dual scale ${1/R}$ and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
13. Stone/Gelfand duality A (locally compact Hausdorff) topological space ${X}$ can be viewed in physical space (as a collection of points), or dually, via the ${C^*}$ algebra ${C(X)}$ of continuous complex-valued functions on that space, or (in the case when ${X}$ is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of ${C(X)}$). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.

I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, that describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle – but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:

1. A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
2. A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
3. Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
4. Projecting a frequency to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
5. The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simulatenously localised to both very fine scales and very low frequencies.
6. To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
7. To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
8. Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
9. Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
10. The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
11. If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
12. If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
13. Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
14. In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
15. Etc., etc.
16. Almost all of the above statements generalise to other locally compact abelian groups than ${{\bf R}}$ or ${{\bf R}^n}$, in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)

I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is at the centre of this cloud, but is by no means the only aspect of it.

The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, to the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; understanding this principle is a key piece of intuition in the subject that one has to internalise before one can really get to grips with this subject (and also with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present this principle in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of this principle.

[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]

Many properties of a (sufficiently nice) function ${f: {\mathbb R} \rightarrow {\mathbb C}}$ are reflected in its Fourier transform ${\hat f: {\mathbb R} \rightarrow {\mathbb C}}$, defined by the formula

$\displaystyle \hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (1)$

For instance, decay properties of ${f}$ are reflected in smoothness properties of ${\hat f}$, as the following table shows:

 If ${f}$ is… then ${\hat f}$ is… and this relates to… Square-integrable square-integrable Plancherel’s theorem Absolutely integrable continuous Riemann-Lebesgue lemma Rapidly decreasing smooth theory of Schwartz functions Exponentially decreasing analytic in a strip Compactly supported entire and at most exponential growth Paley-Wiener theorem

Another important relationship between a function ${f}$ and its Fourier transform ${\hat f}$ is the uncertainty principle, which roughly asserts that if a function ${f}$ is highly localised in space, then its Fourier transform ${\hat f}$ must be widely dispersed in space, or to put it another way, ${f}$ and ${\hat f}$ cannot both decay too strongly at infinity (except of course in the degenerate case ${f=0}$). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise

$\displaystyle \int_{{\mathbb R}} |f(x)|^2\ dx = \int_{\mathbb R} |\hat f(\xi)|^2\ d\xi = 1$

then we must have

$\displaystyle (\int_{\mathbb R} |x|^2 |f(x)|^2\ dx) \cdot (\int_{\mathbb R} |\xi|^2 |\hat f(\xi)|^2\ dx)\geq \frac{1}{(4\pi)^2}$

thus forcing at least one of ${f}$ or ${\hat f}$ to not be too concentrated near the origin. This principle can be proven (for sufficiently nice ${f}$, initially) by observing the integration by parts identity

$\displaystyle \langle xf, f' \rangle = \int_{\mathbb R} x f(x) \overline{f'(x)}\ dx = - \frac{1}{2} \int_{\mathbb R} |f(x)|^2\ dx$

and then using Cauchy-Schwarz and the Plancherel identity.

Another well known manifestation of the uncertainty principle is the fact that it is not possible for ${f}$ and ${\hat f}$ to both be compactly supported (unless of course they vanish entirely). This can be in fact be seen from the above table: if ${f}$ is compactly supported, then ${\hat f}$ is an entire function; but the zeroes of a non-zero entire function are isolated, yielding a contradiction unless ${f}$ vanishes. (Indeed, the table also shows that if one of ${f}$ and ${\hat f}$ is compactly supported, then the other cannot have exponential decay.)

On the other hand, we have the example of the Gaussian functions ${f(x) = e^{-\pi a x^2}}$, ${\hat f(\xi) = \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a }}$, which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that ${f}$ and ${\hat f}$ can simultaneously decay:

Theorem 1 (Hardy uncertainty principle) Suppose that ${f}$ is a (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}}$ for all ${x, \xi}$ and some ${C, C', a > 0}$. Then ${f(x)}$ is a scalar multiple of the gaussian ${e^{-\pi ax^2}}$.

This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:

Theorem 2 (Weak Hardy uncertainty principle) Suppose that ${f}$ is a non-zero (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi b \xi^2}}$ for all ${x, \xi}$ and some ${C, C', a, b > 0}$. Then ${ab \leq C_0}$ for some absolute constant ${C_0}$.

Note that the correct value of ${C_0}$ should be ${1}$, as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still might be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.

This week I am in Seville, Spain, for a conference in harmonic analysis and related topics.  My talk is titled “the uniform uncertainty principle and compressed sensing“.  The content of this talk overlaps substantially with my Ostrowski lecture on the same topic; the slides I prepared for the Seville lecture can be found here.

[Update, Dec 6: Some people have asked about my other lecture given in Seville, on structure and randomness in the prime numbers.  This lecture is largely equivalent to the one posted here.]