You are currently browsing the tag archive for the ‘uncertainty principle’ tag.
One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function $f$ is restricted to a narrow region of physical space, then its Fourier transform $\hat f$ must necessarily be “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.
In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, that arises in analytic number theory contexts. Specifically, suppose we are given a complex-valued function $f: \mathbb{Z} \to \mathbb{C}$ on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support $\operatorname{supp}(f) := \{ n \in \mathbb{Z}: f(n) \neq 0 \}$ of this function is finite (in practice, we will only work with functions that are supported in an interval $[M+1, M+N]$ for some natural numbers $M, N$). Then we can define the Fourier transform $\hat f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ by the formula
$$\hat f(\xi) := \sum_{n \in \mathbb{Z}} f(n) e(n\xi)$$
where $e(x) := e^{2\pi i x}$. (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)
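The classical uncertainty principle, in this context, asserts that if $f$ is localised in an interval of length $N$, then $\hat f$ must be “smeared out” at a scale of at least $1/N$ (and essentially constant at scales less than $1/N$). For instance, if $f$ is supported in $[M+1, M+N]$, then we have the Plancherel identity
$$\int_{\mathbb{R}/\mathbb{Z}} |\hat f(\xi)|^2\, d\xi = \|f\|_{\ell^2(\mathbb{Z})}^2,$$
while the Cauchy-Schwarz inequality gives
$$|\hat f(\xi)| \leq N^{1/2} \|f\|_{\ell^2(\mathbb{Z})}$$
for each frequency $\xi$, and hence
$$\int_I |\hat f(\xi)|^2\, d\xi \leq N |I| \|f\|_{\ell^2(\mathbb{Z})}^2$$
for any arc $I$ in the unit circle (with $|I|$ denoting the length of $I$). In particular, an arc of length significantly less than $1/N$ can only capture a fraction of the $L^2$ energy of the Fourier transform of $f$, which is consistent with the above informal statement of the uncertainty principle.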
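Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan and to Selberg: if $f$ is supported in $[M+1, M+N]$, and $\xi_1, \dots, \xi_J$ are frequencies in $\mathbb{R}/\mathbb{Z}$ that are $\delta$-separated for some $\delta > 0$, thus $\|\xi_i - \xi_j\|_{\mathbb{R}/\mathbb{Z}} \geq \delta$ for all $1 \leq i < j \leq J$ (where $\|\xi\|_{\mathbb{R}/\mathbb{Z}}$ denotes the distance of $\xi$ to the origin in $\mathbb{R}/\mathbb{Z}$), then
$$\sum_{j=1}^J |\hat f(\xi_j)|^2 \leq \left(N + \frac{1}{\delta}\right) \|f\|_{\ell^2(\mathbb{Z})}^2. \tag{2}$$
This is consistent with the Plancherel identity and with the intuition that $\hat f$ is essentially constant at scales less than $1/N$. The factor $N + \frac{1}{\delta}$ can in fact be amplified a little bit to $N + \frac{1}{\delta} - 1$, which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates $[M+1, M+N]$ to $[K(M+1), K(M+N)]$ (and replaces each frequency $\xi_j$ by their $K^{\text{th}}$ roots), and then sends $K \to \infty$ (cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.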
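The inequality (2), and the sharper constant $N + \frac{1}{\delta} - 1$, can be probed numerically. The sketch below uses hypothetical helper names and takes the Farey fractions $a/q$ with $q \leq R$ as one convenient family of well-separated frequencies; it computes the separation $\delta$ exactly with `fractions.Fraction` and compares both sides for a random $f$. A finite check of this kind is, of course, no substitute for the proof.

```python
import cmath
import math
import random
from fractions import Fraction


def e(x):
    return cmath.exp(2j * math.pi * x)


def fourier(f, xi):
    return sum(c * e(n * xi) for n, c in f.items())


def farey(R):
    """Reduced fractions a/q in [0, 1) with 1 <= q <= R."""
    return sorted({Fraction(a, q) for q in range(1, R + 1) for a in range(q) if math.gcd(a, q) == 1})


def circle_dist(x, y):
    """Distance between x and y in R/Z."""
    d = (x - y) % 1
    return min(d, 1 - d)


random.seed(1)
M, N, R = 0, 30, 6
f = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(M + 1, M + N + 1)}
xis = farey(R)
delta = min(circle_dist(x, y) for i, x in enumerate(xis) for y in xis[i + 1:])

lhs = sum(abs(fourier(f, float(x))) ** 2 for x in xis)
norm_sq = sum(abs(c) ** 2 for c in f.values())
rhs = (N + 1 / delta) * norm_sq            # the form (2)
rhs_sharp = (N - 1 + 1 / delta) * norm_sq  # the sharper constant N + 1/delta - 1
print(delta, lhs, rhs_sharp, rhs)
```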
In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric $(n, m) \mapsto |n - m|$ on the integers $\mathbb{Z}$ (in particular, the parameter $N$ is essentially the Archimedean diameter of the support of $f$). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the $p$-adic metrics play an equally important role; indeed, it is common to unify the Archimedean and $p$-adic perspectives together into a unified adelic perspective. In the $p$-adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of $p$. Intersecting these balls from different $p$-adic metrics together, we obtain residue classes with respect to various moduli $q$ (which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo $q$”. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).
In this context, the uncertainty principle is this: the more residue classes modulo $q$ that $f$ avoids, the more the Fourier transform $\hat f$ must spread out along multiples of $1/q$. To illustrate a very simple example of this principle, let us take $q = 2$, and suppose that $f$ is supported only on odd numbers (thus it completely avoids the residue class $0 \bmod 2$). We write out the formulae for $\hat f(\xi)$ and $\hat f(\xi + 1/2)$:
$$\hat f(\xi) = \sum_n f(n) e(n\xi),$$
$$\hat f(\xi + 1/2) = \sum_n f(n) e(n\xi) e(n/2).$$
If $f$ is supported on the odd numbers, then $e(n/2)$ is always equal to $-1$ on the support of $f$, and so we have $\hat f(\xi + 1/2) = -\hat f(\xi)$. Thus, whenever $\hat f$ has a significant presence at a frequency $\xi$, it also must have an equally significant presence at the frequency $\xi + 1/2$; there is a spreading out across multiples of $1/2$. Note that one has a similar effect if $f$ is supported on the even integers instead of the odd integers (in which case $\hat f(\xi + 1/2) = \hat f(\xi)$).
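The identity $\hat f(\xi + 1/2) = -\hat f(\xi)$ for odd-supported $f$ (and $\hat f(\xi + 1/2) = \hat f(\xi)$ for even-supported $f$) is easy to confirm numerically. In this sketch the random test functions and the sample frequencies are arbitrary illustrative choices.

```python
import cmath
import math
import random


def e(x):
    return cmath.exp(2j * math.pi * x)


def fourier(f, xi):
    return sum(c * e(n * xi) for n, c in f.items())


random.seed(2)
odd_f = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(-21, 40, 2)}   # odd n only
even_f = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(-20, 40, 2)}  # even n only

# Arbitrary test frequencies; each pair is (hat f(xi), hat f(xi + 1/2)).
test_freqs = [0.0, 0.1234, 0.37, 0.9]
odd_pairs = [(fourier(odd_f, xi), fourier(odd_f, xi + 0.5)) for xi in test_freqs]
even_pairs = [(fourier(even_f, xi), fourier(even_f, xi + 0.5)) for xi in test_freqs]
print(odd_pairs[1], even_pairs[1])
```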
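A little more generally, suppose now that $f$ avoids a single residue class modulo a prime $p$; for sake of argument let us say that it avoids the zero residue class $0 \bmod p$, although the situation for the other residue classes is similar. For any frequency $\xi$ and any $j = 0, \dots, p-1$, one has
$$\hat f(\xi + j/p) = \sum_n f(n) e(n\xi) e(jn/p).$$
From basic Fourier analysis, we know that the phases $e(jn/p)$ sum to zero as $j$ ranges from $0$ to $p-1$ whenever $n$ is not a multiple of $p$. We thus have
$$\sum_{j=0}^{p-1} \hat f(\xi + j/p) = 0.$$
In particular, if $\hat f(\xi)$ is large, then one of the other $\hat f(\xi + j/p)$ has to be somewhat large as well; using the Cauchy-Schwarz inequality, we can quantify this assertion in an $\ell^2$ sense via the inequality
$$\sum_{j=1}^{p-1} |\hat f(\xi + j/p)|^2 \geq \frac{1}{p-1} |\hat f(\xi)|^2. \tag{3}$$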
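The vanishing of $\sum_{j=0}^{p-1} \hat f(\xi + j/p)$ and the inequality (3) can be spot-checked in the same way. The prime $p = 7$, the frequency $\xi$ and the random $f$ below are illustrative assumptions.

```python
import cmath
import math
import random


def e(x):
    return cmath.exp(2j * math.pi * x)


def fourier(f, xi):
    return sum(c * e(n * xi) for n, c in f.items())


# Illustrative data: p = 7 and a random f vanishing on the class 0 mod p.
p = 7
random.seed(3)
f = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(1, 100) if n % p != 0}

xi = 0.2137
values = [fourier(f, xi + j / p) for j in range(p)]
total = sum(values)                                      # vanishes up to rounding
energy_elsewhere = sum(abs(v) ** 2 for v in values[1:])  # left-hand side of (3)
lower_bound = abs(values[0]) ** 2 / (p - 1)              # right-hand side of (3)
print(abs(total), energy_elsewhere, lower_bound)
```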
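Let us continue this analysis a bit further. Now suppose that $f$ avoids $\omega(p)$ residue classes modulo a prime $p$, for some $0 \leq \omega(p) < p$. (We exclude the case $\omega(p) = p$ as it clearly degenerates by forcing $f$ to be identically zero.) Let $\nu(n)$ be the function that equals $1$ on these residue classes and zero away from these residue classes; then
$$\sum_n f(n) e(n\xi) \nu(n) = 0.$$
Using the periodic Fourier transform, we can write
$$\nu(n) = \sum_{j=0}^{p-1} c(j) e(-jn/p)$$
for some coefficients $c(j)$, thus
$$\sum_{j=0}^{p-1} \hat f(\xi - j/p) c(j) = 0.$$
Some Fourier-analytic computations reveal that
$$c(0) = \frac{\omega(p)}{p}$$
and
$$\sum_{j=0}^{p-1} |c(j)|^2 = \frac{\omega(p)}{p},$$
and so after some routine algebra (isolating the $j = 0$ term and using $\sum_{j=1}^{p-1} |c(j)|^2 = \frac{\omega(p)}{p} - \frac{\omega(p)^2}{p^2}$) and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):
$$\sum_{j=1}^{p-1} |\hat f(\xi - j/p)|^2 \geq \frac{\omega(p)}{p - \omega(p)} |\hat f(\xi)|^2.$$
Thus we see that the more residue classes mod $p$ we exclude, the more Fourier energy has to disperse along multiples of $1/p$. It is also instructive to consider the extreme case $\omega(p) = p - 1$, in which $f$ is supported on just a single residue class $b \bmod p$; in this case, one clearly has $\hat f(\xi - j/p) = e(-jb/p) \hat f(\xi)$, and so $\hat f$ spreads its energy completely evenly along multiples of $1/p$.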
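For the general case, the coefficients $c(j)$ can be computed directly from finite Fourier inversion, $c(j) = \frac{1}{p} \sum_{r \bmod p} \nu(r) e(jr/p)$. The sketch below assumes a hypothetical choice of $\omega(p) = 3$ avoided classes modulo $p = 7$; it checks the two stated identities for $c$, the relation $\sum_j \hat f(\xi - j/p) c(j) = 0$, the resulting lower bound, and the even spreading of energy in the single-class case.

```python
import cmath
import math
import random


def e(x):
    return cmath.exp(2j * math.pi * x)


def fourier(f, xi):
    return sum(c * e(n * xi) for n, c in f.items())


p = 7
avoided = {0, 3, 5}  # hypothetical set of omega(p) = 3 avoided residue classes
omega = len(avoided)
random.seed(4)
f = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(1, 120) if n % p not in avoided}

# Coefficients with nu(n) = sum_j c(j) e(-j n / p), nu = indicator of the avoided classes.
c = [sum(e(j * r / p) for r in avoided) / p for j in range(p)]

xi = 0.318
shifted = [fourier(f, xi - j / p) for j in range(p)]
relation = sum(cj * s for cj, s in zip(c, shifted))  # should vanish
energy_elsewhere = sum(abs(s) ** 2 for s in shifted[1:])
lower_bound = omega / (p - omega) * abs(shifted[0]) ** 2

# Extreme case omega(p) = p - 1: g supported on the single class b mod p.
b = 2
g = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(b, 120, p)}
g_moduli = [abs(fourier(g, xi - j / p)) for j in range(p)]
print(c[0], energy_elsewhere, lower_bound, g_moduli)
```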
In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:
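Proposition 1 (Montgomery’s uncertainty principle) Let $f: \mathbb{Z} \to \mathbb{C}$ be a finitely supported function which, for each prime $p$, avoids $\omega(p)$ residue classes modulo $p$ for some $0 \leq \omega(p) < p$. Then for each natural number $q$, one has
$$\sum_{1 \leq a \leq q: (a,q) = 1} \left|\hat f\left(\xi + \frac{a}{q}\right)\right|^2 \geq h(q) |\hat f(\xi)|^2$$
for all $\xi \in \mathbb{R}/\mathbb{Z}$, where $h(q)$ is the quantity
$$h(q) := \mu(q)^2 \prod_{p | q} \frac{\omega(p)}{p - \omega(p)} \tag{4}$$
and $\mu$ is the Möbius function.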
We give a proof of this proposition below the fold.
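To see Proposition 1 in action, one can compute $h(q)$ from (4) and compare both sides for a concrete $f$. The following sketch assumes an $f$ that avoids $0 \bmod 2$, $0 \bmod 3$ and the two classes $\{0, 1\} \bmod 5$ (so $\omega(2) = \omega(3) = 1$, $\omega(5) = 2$ and $\omega(p) = 0$ otherwise); it is a numerical illustration only, not a proof.

```python
import cmath
import math
import random
from fractions import Fraction


def e(x):
    return cmath.exp(2j * math.pi * x)


def fourier(f, xi):
    return sum(c * e(n * xi) for n, c in f.items())


def h(q, omega):
    """h(q) = mu(q)^2 prod_{p | q} omega(p) / (p - omega(p)), as in (4).

    `omega` maps primes to the number of avoided classes (missing primes mean 0).
    """
    result, m, p = Fraction(1), q, 2
    while m > 1:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return Fraction(0)  # p^2 divides q, so mu(q) = 0
            w = omega.get(p, 0)
            result *= Fraction(w, p - w)
        p += 1
    return result


# Hypothetical example: f avoids 0 mod 2, 0 mod 3 and {0, 1} mod 5.
omega = {2: 1, 3: 1, 5: 2}
random.seed(5)
support = [n for n in range(1, 300) if n % 2 != 0 and n % 3 != 0 and n % 5 not in (0, 1)]
f = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in support}

xi = 0.123
base = abs(fourier(f, xi)) ** 2
checks = []  # (q, left-hand side, right-hand side) of Proposition 1
for q in range(1, 31):
    lhs = sum(abs(fourier(f, xi + a / q)) ** 2 for a in range(1, q + 1) if math.gcd(a, q) == 1)
    checks.append((q, lhs, float(h(q, omega)) * base))
print(checks[5], checks[29])
```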
Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the $p$-adic senses. This leads to the following corollary:
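Corollary 2 (Arithmetic large sieve inequality) Let $f: \mathbb{Z} \to \mathbb{C}$ be a function supported on an interval $[M+1, M+N]$ which, for each prime $p$, avoids $\omega(p)$ residue classes modulo $p$ for some $0 \leq \omega(p) < p$. Let $\xi_1, \dots, \xi_J \in \mathbb{R}/\mathbb{Z}$, and let $\mathcal{Q}$ be a finite set of natural numbers. Suppose that the frequencies $\xi_j + \frac{a}{q}$ with $1 \leq j \leq J$, $q \in \mathcal{Q}$, and $(a, q) = 1$ are $\delta$-separated. Then one has
$$\sum_{j=1}^J |\hat f(\xi_j)|^2 \leq \frac{N + \frac{1}{\delta}}{\sum_{q \in \mathcal{Q}} h(q)} \|f\|_{\ell^2(\mathbb{Z})}^2$$
where $h(q)$ was defined in (4).
Indeed, from the large sieve inequality one has
$$\sum_{j=1}^J \sum_{q \in \mathcal{Q}} \sum_{1 \leq a \leq q: (a,q) = 1} \left|\hat f\left(\xi_j + \frac{a}{q}\right)\right|^2 \leq \left(N + \frac{1}{\delta}\right) \|f\|_{\ell^2(\mathbb{Z})}^2,$$
while from Proposition 1 one has
$$\sum_{j=1}^J \sum_{q \in \mathcal{Q}} \sum_{1 \leq a \leq q: (a,q) = 1} \left|\hat f\left(\xi_j + \frac{a}{q}\right)\right|^2 \geq \left(\sum_{q \in \mathcal{Q}} h(q)\right) \sum_{j=1}^J |\hat f(\xi_j)|^2,$$
whence the claim.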
There is a great deal of flexibility in the above inequality, due to the ability to select the set $\mathcal{Q}$, the frequencies $\xi_j$, the omitted classes $\omega(p)$, and the separation parameter $\delta$. Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:
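Corollary 3 (Large sieve) Let $A$ be a set of integers contained in $[M+1, M+N]$ which avoids $\omega(p)$ residue classes modulo $p$ for each prime $p$, and let $R > 0$. Then
$$|A| \leq \frac{N + R^2}{G(R)}$$
where
$$G(R) := \sum_{q \leq R} h(q).$$
Proof: We apply Corollary 2 with $f = 1_A$, $J = 1$, $\delta = 1/R^2$, $\xi_1 = 0$, and $\mathcal{Q} := \{ q: q \leq R \}$. The key point is that the Farey sequence of fractions $a/q$ with $q \leq R$ and $(a, q) = 1$ is $1/R^2$-separated, since
$$\left\| \frac{a}{q} - \frac{a'}{q'} \right\|_{\mathbb{R}/\mathbb{Z}} \geq \frac{1}{qq'} \geq \frac{1}{R^2}$$
whenever $a/q, a'/q'$ are distinct fractions in this sequence. Since $\hat 1_A(0) = |A| = \|1_A\|_{\ell^2(\mathbb{Z})}^2$, the claim follows.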
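Here is a small numerical illustration of Corollary 3 for primes, with hypothetical parameters $M = 1000$, $N = 10000$, $R = 30$. The primes above $R$ avoid the class $0 \bmod p$ for every prime $p \leq R$, so one may take $\omega(p) = 1$ for $p \leq R$, giving $h(q) = \mu^2(q)/\phi(q)$ for $q \leq R$. The sketch also confirms the $1/R^2$-separation of the Farey fractions; the helper functions are written for this illustration only.

```python
import math
from fractions import Fraction


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


def mu_sq_over_phi(q):
    """mu(q)^2 / phi(q), computed by trial division (0.0 unless q is squarefree)."""
    phi, m, p = 1, q, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0.0
            phi *= p - 1
        p += 1
    if m > 1:
        phi *= m - 1
    return 1.0 / phi


# Hypothetical parameters: A = primes in [M+1, M+N] exceeding R.
M, N, R = 1000, 10000, 30
A = [p for p in primes_up_to(M + N) if p > max(M, R)]
G = sum(mu_sq_over_phi(q) for q in range(1, R + 1))  # h(q) = mu^2(q)/phi(q) in this case
bound = (N + R * R) / G

# Farey fractions a/q (q <= R, gcd(a, q) = 1) as points of R/Z, including the wrap-around gap.
farey = sorted({Fraction(a, q) for q in range(1, R + 1) for a in range(q) if math.gcd(a, q) == 1})
gaps = [y - x for x, y in zip(farey, farey[1:])] + [1 + farey[0] - farey[-1]]
print(len(A), bound, min(gaps))
```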
If, for instance, $A$ is the set of all primes in $[M+1, M+N]$ larger than $R$, then one can set $\omega(p) = 1$ for all $p \leq R$, which makes $h(q) = \frac{\mu^2(q)}{\phi(q)}$ for $q \leq R$, where $\phi$ is the Euler totient function. It is a classical estimate that
$$G(R) = \log R + O(1).$$
Using this fact and optimising in $R$, we obtain (a special case of) the Brun-Titchmarsh inequality
$$\pi(x+y) - \pi(x) \leq (2 + o(1)) \frac{y}{\log y}$$
where $\pi(x)$ is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality
$$\pi(x+y; a, q) - \pi(x; a, q) \leq (2 + o(1)) \frac{y}{\phi(q) \log(y/q)}$$
for any primitive residue class $a \bmod q$, where $\pi(x; a, q)$ is the number of primes less than or equal to $x$ that are congruent to $a \bmod q$. By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality
$$\pi(x+y; a, q) - \pi(x; a, q) \leq \frac{2y}{\phi(q) \log(y/q)}$$
for any natural numbers $x, y, a, q$ with $y > q$. This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.
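Both the classical estimate $G(R) = \log R + O(1)$ (here with $G(R) = \sum_{q \leq R} \mu^2(q)/\phi(q)$) and the Montgomery-Vaughan form of the Brun-Titchmarsh inequality can be spot-checked for small parameters. The sketch below uses trial-division helpers written for this illustration; the ranges of $x$, $y$, $a$, $q$ are arbitrary (with $y > q$), and a finite check says nothing about the general statement.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


def mu_sq_over_phi(q):
    """mu(q)^2 / phi(q), computed by trial division (0.0 unless q is squarefree)."""
    phi, m, p = 1, q, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0.0
            phi *= p - 1
        p += 1
    if m > 1:
        phi *= m - 1
    return 1.0 / phi


def euler_phi(q):
    """Euler's totient function by trial division."""
    result, m, p = q, q, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


# G(R) - log R for a few values of R.
G_minus_log, running = {}, 0.0
for q in range(1, 5001):
    running += mu_sq_over_phi(q)
    if q in (10, 100, 1000, 5000):
        G_minus_log[q] = running - math.log(q)

# Montgomery-Vaughan: pi(x+y; a, q) - pi(x; a, q) <= 2y / (phi(q) log(y/q)) when y > q.
primes = primes_up_to(20000)
mv_checks = []  # (prime count, Montgomery-Vaughan bound)
for q in (1, 3, 4, 7, 10):
    for a in range(q):
        if math.gcd(a, q) != 1:
            continue
        for x in (100, 2500, 9000):
            for y in (100, 1000, 5000):
                count = sum(1 for p in primes if x < p <= x + y and p % q == a)
                mv_checks.append((count, 2 * y / (euler_phi(q) * math.log(y / q))))
print(G_minus_log, max(c / b for c, b in mv_checks))
```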
I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.
A recurring theme in mathematics is that of duality: a mathematical object $X$ can either be described internally (or in physical space, or locally), by describing what $X$ physically consists of (or what kind of maps exist into $X$), or externally (or in frequency space, or globally), by describing what $X$ globally interacts or resonates with (or what kind of maps exist out of $X$). These two fundamentally opposed perspectives on the object $X$ are often dual to each other in various ways: performing an operation on $X$ may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being an “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:
- Vector space duality A vector space $V$ over a field $F$ can be described either by the set of vectors inside $V$, or dually by the set of linear functionals $\lambda: V \to F$ from $V$ to the field $F$ (or equivalently, the set of vectors inside the dual space $V^*$). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
- Vector subspace duality In a similar spirit, a subspace $W$ of $V$ can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement $W^\perp$). Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
- Convex duality More generally, a (closed, bounded) convex body $K$ in a vector space $V$ can be described either by listing a set of (extreme) points whose convex hull is $K$, or else by listing a set of (irreducible) linear inequalities that cut out $K$. The fundamental connection between the two is given by the Farkas lemma.
- Ideal-variety duality In a slightly different direction, an algebraic variety $V$ in an affine space $A^n$ can be viewed either “in physical space” or “internally” as a collection of points in $V$, or else “in frequency space” or “externally” as a collection of polynomials on $A^n$ whose simultaneous zero locus cuts out $V$. The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
- Hilbert space duality An element $v$ in a Hilbert space $H$ can either be thought of in physical space as a vector in that space, or in momentum space as a covector $w \mapsto \langle w, v \rangle$ on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
- Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
- Intrinsic-extrinsic duality A (Riemannian) manifold $M$ can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives include the Nash embedding theorem and the theorema egregium.
- Group duality A group $G$ can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
- Pontryagin group duality A (locally compact Hausdorff) abelian group $G$ can be described either by listing its elements $g \in G$, or by listing the characters $\chi: G \to \mathbb{R}/\mathbb{Z}$ (i.e. continuous homomorphisms from $G$ to the unit circle, or equivalently elements of $\hat G$). The connection between the two is the focus of abstract harmonic analysis.
- Pontryagin subgroup duality A subgroup $H$ of a locally compact abelian group $G$ can be described either by generators in $H$, or generators in the orthogonal complement $H^\perp \subset \hat G$. One of the fundamental connections between the two is the Poisson summation formula.
- Fourier duality A (sufficiently nice) function $f: G \to \mathbb{C}$ on a locally compact abelian group $G$ (equipped with a Haar measure $\mu$) can either be described in physical space (by its values $f(x)$ at each element $x$ of $G$) or in frequency space (by the values $\hat f(\xi)$ at elements $\xi$ of the Pontryagin dual $\hat G$). The fundamental connection between the two is the Fourier inversion formula.
- The uncertainty principle The behaviour of a function $f$ at physical scales above (resp. below) a certain scale $R$ is almost completely controlled by the behaviour of its Fourier transform $\hat f$ at frequency scales below (resp. above) the dual scale $1/R$ and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
- Stone/Gelfand duality A (locally compact Hausdorff) topological space $X$ can be viewed in physical space (as a collection of points), or dually, via the $C^*$ algebra $C(X)$ of continuous complex-valued functions on that space, or (in the case when $X$ is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of $C(X)$). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.
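In the simplest instance of Pontryagin subgroup duality, $G = \mathbb{R}$ and $H = \mathbb{Z}$ (so that $H^\perp = \mathbb{Z}$ as well), and the Poisson summation formula $\sum_n f(n) = \sum_k \hat f(k)$, with $\hat f(\xi) = \int f(x) e^{-2\pi i x\xi}\, dx$, applied to the Gaussian $e^{-\pi a x^2}$ gives the theta function identity $\sum_n e^{-\pi a n^2} = a^{-1/2} \sum_k e^{-\pi k^2/a}$. A few lines of Python confirm it numerically; the truncation at $|n| \leq 60$ and the sample values of $a$ are ad hoc choices for this illustration.

```python
import math


def theta(a, terms=60):
    """Sum over n in Z of exp(-pi a n^2), truncated to |n| <= terms."""
    return sum(math.exp(-math.pi * a * n * n) for n in range(-terms, terms + 1))


# Poisson summation for H = Z inside G = R:
# sum_n exp(-pi a n^2) = a^{-1/2} sum_k exp(-pi k^2 / a).
poisson_checks = [(theta(a), theta(1 / a) / math.sqrt(a)) for a in (0.3, 1.0, 2.5)]
for lhs, rhs in poisson_checks:
    print(lhs, rhs)
```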
I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, which describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle, but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:
- A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
- A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
- Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
- Projecting a function to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
- The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simultaneously localised to both very fine scales and very low frequencies.
- To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
- To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
- Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
- Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
- The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
- If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
- If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
- Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
- In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
- Etc., etc.
- Almost all of the above statements generalise to other locally compact abelian groups than $\mathbb{R}$ or $\mathbb{R}^n$, in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)
I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous inequality $(\Delta x)(\Delta \xi) \gtrsim 1$ is at the centre of this cloud, but is by no means the only aspect of it.
The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, in the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; understanding this principle is a key piece of intuition in the subject that one has to internalise before one can really get to grips with this subject (and also with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present this principle in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of this principle.
[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]
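Many properties of a (sufficiently nice) function $f: \mathbb{R} \to \mathbb{C}$ are reflected in its Fourier transform $\hat f: \mathbb{R} \to \mathbb{C}$, defined by the formula
$$\hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\, dx.$$
For instance, decay properties of $f$ are reflected in smoothness properties of $\hat f$, as the following table shows: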
| If $f$ is… | then $\hat f$ is… | and this relates to… |
| --- | --- | --- |
| Square-integrable | square-integrable | Plancherel’s theorem |
| Absolutely integrable | continuous | Riemann-Lebesgue lemma |
| Rapidly decreasing | smooth | theory of Schwartz functions |
| Exponentially decreasing | analytic in a strip | |
| Compactly supported | entire, with at most exponential growth | Paley-Wiener theorem |
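Another important relationship between a function $f$ and its Fourier transform $\hat f$ is the uncertainty principle, which roughly asserts that if a function $f$ is highly localised in space, then its Fourier transform $\hat f$ must be widely dispersed in frequency space, or to put it another way, $f$ and $\hat f$ cannot both decay too strongly at infinity (except of course in the degenerate case $f = 0$). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise
$$\int_{\mathbb{R}} |f(x)|^2\, dx = \int_{\mathbb{R}} |\hat f(\xi)|^2\, d\xi = 1$$
then we must have
$$\left( \int_{\mathbb{R}} |x|^2 |f(x)|^2\, dx \right) \cdot \left( \int_{\mathbb{R}} |\xi|^2 |\hat f(\xi)|^2\, d\xi \right) \geq \frac{1}{(4\pi)^2},$$
thus forcing at least one of $f$ or $\hat f$ to not be too concentrated near the origin. This principle can be proven (for sufficiently nice $f$, initially) by observing the integration by parts identity
$$\operatorname{Re} \int_{\mathbb{R}} x f(x) \overline{f'(x)}\, dx = -\frac{1}{2} \int_{\mathbb{R}} |f(x)|^2\, dx$$
and then using Cauchy-Schwarz and the Plancherel identity (which, with this normalisation of the Fourier transform, gives $\int_{\mathbb{R}} |\xi|^2 |\hat f(\xi)|^2\, d\xi = \frac{1}{4\pi^2} \int_{\mathbb{R}} |f'(x)|^2\, dx$).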
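The Heisenberg inequality is easy to test numerically if one uses Plancherel to rewrite the frequency-side integral as $\int |\xi|^2 |\hat f(\xi)|^2\, d\xi = \frac{1}{4\pi^2} \int |f'(x)|^2\, dx$ (valid for the normalisation $\hat f(\xi) = \int f(x) e^{-2\pi i x \xi}\, dx$). The sketch below evaluates the scale-invariant ratio $\|xf\|_2^2 \|\xi \hat f\|_2^2 / \|f\|_2^4$ with simple Riemann sums on a truncated interval (an assumption that is harmless for these rapidly decaying test functions), for the Gaussian $e^{-\pi x^2}$, where equality holds, and for the arbitrarily chosen non-Gaussian function $(1+x) e^{-\pi x^2}$.

```python
import math


def integrate(g, lo=-12.0, hi=12.0, steps=24000):
    """Riemann sum over [lo, hi]; adequate for smooth, Gaussian-decaying integrands."""
    h = (hi - lo) / steps
    return h * sum(g(lo + k * h) for k in range(steps))


def heisenberg_ratio(f, f_prime):
    """(int x^2 |f|^2)(int xi^2 |hat f|^2) / (int |f|^2)^2, via Plancherel for the xi-integral."""
    mass = integrate(lambda x: abs(f(x)) ** 2)
    spread_x = integrate(lambda x: x * x * abs(f(x)) ** 2)
    spread_xi = integrate(lambda x: abs(f_prime(x)) ** 2) / (4 * math.pi ** 2)
    return spread_x * spread_xi / mass ** 2


def gauss(x):
    return math.exp(-math.pi * x * x)


def gauss_prime(x):
    return -2 * math.pi * x * math.exp(-math.pi * x * x)


def bump(x):
    return (1 + x) * math.exp(-math.pi * x * x)


def bump_prime(x):
    return (1 - 2 * math.pi * x * (1 + x)) * math.exp(-math.pi * x * x)


bound = 1 / (4 * math.pi) ** 2
gauss_ratio = heisenberg_ratio(gauss, gauss_prime)  # equality case
bump_ratio = heisenberg_ratio(bump, bump_prime)     # strictly larger
print(bound, gauss_ratio, bump_ratio)
```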
Another well known manifestation of the uncertainty principle is the fact that it is not possible for $f$ and $\hat f$ to both be compactly supported (unless of course they vanish entirely). This can in fact be seen from the above table: if $f$ is compactly supported, then $\hat f$ is an entire function; but the zeroes of a non-zero entire function are isolated, yielding a contradiction unless $f$ vanishes. (Indeed, the table also shows that if one of $f$ and $\hat f$ is compactly supported, then the other cannot have exponential decay.)
On the other hand, we have the example of the Gaussian functions $f(x) = e^{-\pi a x^2}$, $\hat f(\xi) = \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a}$ (for any $a > 0$), which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that $f$ and $\hat f$ can simultaneously decay:
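Theorem 1 (Hardy uncertainty principle) Suppose that $f: \mathbb{R} \to \mathbb{C}$ is a (measurable) function such that $|f(x)| \leq C e^{-\pi a x^2}$ and $|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}$ for all $x, \xi \in \mathbb{R}$ and some $C, C', a > 0$. Then $f(x)$ is a scalar multiple of the gaussian $e^{-\pi a x^2}$.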
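The Gaussian Fourier pair $e^{-\pi a x^2} \mapsto \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a}$ quoted above can likewise be confirmed by direct quadrature of $\int f(x) e^{-2\pi i x \xi}\, dx$. The value $a = 2$, the integration window and the step size in this sketch are illustrative choices.

```python
import cmath
import math


def fourier_transform(f, xi, lo=-15.0, hi=15.0, steps=30000):
    """Riemann-sum approximation of int f(x) exp(-2 pi i x xi) dx over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(
        f(lo + k * h) * cmath.exp(-2j * math.pi * (lo + k * h) * xi) for k in range(steps)
    )


a = 2.0


def f(x):
    return math.exp(-math.pi * a * x * x)


def expected(xi):
    """Closed form a^{-1/2} exp(-pi xi^2 / a)."""
    return math.exp(-math.pi * xi * xi / a) / math.sqrt(a)


comparisons = [(fourier_transform(f, xi), expected(xi)) for xi in (0.0, 0.5, 1.3)]
for got, want in comparisons:
    print(got, want)
```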
This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:
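Theorem 2 (Weak Hardy uncertainty principle) Suppose that $f: \mathbb{R} \to \mathbb{C}$ is a non-zero (measurable) function such that $|f(x)| \leq C e^{-\pi a x^2}$ and $|\hat f(\xi)| \leq C' e^{-\pi b \xi^2}$ for all $x, \xi \in \mathbb{R}$ and some $C, C', a, b > 0$. Then $ab \leq C_0$ for some absolute constant $C_0$.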
Note that the correct value of $C_0$ should be $1$, as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.
This week I am in Seville, Spain, for a conference in harmonic analysis and related topics. My talk is titled “the uniform uncertainty principle and compressed sensing”. The content of this talk overlaps substantially with my Ostrowski lecture on the same topic; the slides I prepared for the Seville lecture can be found here.
[Update, Dec 6: Some people have asked about my other lecture given in Seville, on structure and randomness in the prime numbers. This lecture is largely equivalent to the one posted here.]