We will shortly turn to the complex-analytic approach to multiplicative number theory, which relies on the basic properties of complex analytic functions. In this supplement to the main notes, we quickly review the portions of complex analysis that we will be using in this course. We will not attempt a comprehensive review of this subject; for instance, we will completely neglect the conformal geometry or Riemann surface aspect of complex analysis, and we will also avoid using the various boundary convergence theorems for Taylor series or Dirichlet series (the latter type of result is traditionally utilised in multiplicative number theory, but I personally find them a little unintuitive to use, and will instead rely on a slightly different set of complex-analytic tools). We will also focus on the “local” structure of complex analytic functions, in particular adopting the philosophy that such functions behave locally like complex polynomials; the classical “global” theory of entire functions, while traditionally used in the theory of the Riemann zeta function, will be downplayed in these notes. On the other hand, we will play up the relationship between complex analysis and Fourier analysis, as we will incline to using the latter tool over the former in some of the subsequent material. (In the traditional approach to the subject, the Mellin transform is used in place of the Fourier transform, but we will not emphasise the role of the Mellin transform here.)
We begin by recalling the notion of a holomorphic function, which will later be shown to be essentially synonymous with that of a complex analytic function.
Definition 1 (Holomorphic function) Let $U$ be an open subset of $\mathbb{C}$, and let $f: U \to \mathbb{C}$ be a function. If $z \in U$, we say that $f$ is complex differentiable at $z$ if the limit
$$f'(z) := \lim_{h \to 0;\ h \in \mathbb{C} \setminus \{0\}} \frac{f(z+h) - f(z)}{h}$$
exists, in which case we refer to $f'(z)$ as the (complex) derivative of $f$ at $z$. If $f$ is differentiable at every point $z$ of $U$, and the derivative $f': U \to \mathbb{C}$ is continuous, we say that $f$ is holomorphic on $U$.
Exercise 2 Show that a function $f: U \to \mathbb{C}$ is holomorphic if and only if the two-variable function $(x,y) \mapsto f(x+iy)$ is continuously differentiable on $\{(x,y) \in \mathbb{R}^2 : x+iy \in U\}$ and obeys the Cauchy-Riemann equation
$$\frac{\partial}{\partial x} f(x+iy) = \frac{1}{i} \frac{\partial}{\partial y} f(x+iy).$$
Basic examples of holomorphic functions include complex polynomials
$$P(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_0$$
(with complex coefficients $a_0, \dots, a_n$), as well as the complex exponential function
$$\exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!},$$
which are holomorphic on the entire complex plane $\mathbb{C}$ (i.e., they are entire functions). The sum or product of two holomorphic functions is again holomorphic; the quotient of two holomorphic functions is holomorphic so long as the denominator is non-zero. Finally, the composition of two holomorphic functions is holomorphic wherever the composition is defined.
Exercise 3
- (i) Establish Euler’s formula
$$\exp(x+iy) = e^x (\cos y + i \sin y)$$
for all $x, y \in \mathbb{R}$. (Hint: it is a bit tricky to do this starting from the trigonometric definitions of sine and cosine; I recommend either using the Taylor series formulations of these functions instead, or alternatively relying on the ordinary differential equations obeyed by sine and cosine.)
- (ii) Show that every non-zero complex number $z$ has a complex logarithm $w$ such that $\exp(w) = z$, and that this logarithm is unique up to integer multiples of $2\pi i$.
- (iii) Show that there exists a unique principal branch $\mathrm{Log}(z)$ of the complex logarithm in the region $\mathbb{C} \setminus (-\infty, 0]$, defined by requiring $\mathrm{Log}(z)$ to be a logarithm of $z$ with imaginary part between $-\pi$ and $\pi$. Show that this principal branch is holomorphic with derivative $1/z$.
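The properties of the principal branch in part (iii) can be checked numerically: Python's `cmath.log` happens to implement exactly this branch, with imaginary part in $(-\pi, \pi]$. The following minimal sketch (not part of the original notes) verifies the three claims of the exercise on a sample point.

```python
import cmath, math

# Python's cmath.log returns the principal branch Log(z): the unique
# logarithm of z with imaginary part in (-pi, pi].
z = -1 + 1j
w = cmath.log(z)

# w is a genuine logarithm: exp(w) recovers z.
assert abs(cmath.exp(w) - z) < 1e-12

# Its imaginary part lies in the principal range.
assert -math.pi < w.imag <= math.pi

# Any two logarithms of z differ by an integer multiple of 2*pi*i.
w2 = w + 2j * math.pi * 3
assert abs(cmath.exp(w2) - z) < 1e-12

# The derivative of Log at z is 1/z (checked by a difference quotient).
h = 1e-6
deriv = (cmath.log(z + h) - cmath.log(z)) / h
assert abs(deriv - 1 / z) < 1e-5
```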
In real analysis, we have the fundamental theorem of calculus, which asserts that
$$\int_a^b F'(t)\, dt = F(b) - F(a) \qquad (1)$$
whenever $[a,b]$ is a real interval and $F: [a,b] \to \mathbb{R}$ is a continuously differentiable function. The complex analogue of this fact is that
$$\int_\gamma F'(z)\, dz = F(\gamma(b)) - F(\gamma(a)) \qquad (2)$$
whenever $F: U \to \mathbb{C}$ is a holomorphic function, and $\gamma: [a,b] \to U$ is a contour in $U$, by which we mean a piecewise continuously differentiable function, and the contour integral $\int_\gamma f(z)\, dz$ for a continuous function $f$ is defined via change of variables as
$$\int_\gamma f(z)\, dz := \int_a^b f(\gamma(t)) \gamma'(t)\, dt.$$
The complex fundamental theorem of calculus (2) follows easily from the real fundamental theorem and the chain rule.
In real analysis, we have the rather trivial fact that the integral of a continuous function on a closed contour is always zero:
$$\int_a^a f(t)\, dt = 0.$$
In complex analysis, the analogous fact is significantly more powerful, and is known as Cauchy’s theorem:
Theorem 4 (Cauchy’s theorem) Let $f: U \to \mathbb{C}$ be a holomorphic function in a simply connected open set $U$, and let $\gamma: [a,b] \to U$ be a closed contour in $U$ (thus $\gamma(a) = \gamma(b)$). Then $\int_\gamma f(z)\, dz = 0$.
Exercise 5 Use Stokes’ theorem to give a proof of Cauchy’s theorem.
A useful reformulation of Cauchy’s theorem is that of contour shifting: if $f: U \to \mathbb{C}$ is a holomorphic function on an open set $U$, and $\gamma_1, \gamma_2$ are two contours in $U$ with $\gamma_1(a) = \gamma_2(a)$ and $\gamma_1(b) = \gamma_2(b)$, such that $\gamma_1$ can be continuously deformed into $\gamma_2$ (within $U$, holding the endpoints fixed), then $\int_{\gamma_1} f(z)\, dz = \int_{\gamma_2} f(z)\, dz$. A basic application of contour shifting is the Cauchy integral formula:
Theorem 6 (Cauchy integral formula) Let $f: U \to \mathbb{C}$ be a holomorphic function in a simply connected open set $U$, and let $\gamma$ be a closed contour in $U$ which is simple (thus $\gamma$ does not traverse any point more than once, with the exception of the endpoint $\gamma(a) = \gamma(b)$ that is traversed twice), and which encloses a bounded region $U_\gamma$ in the anticlockwise direction. Then for any $z_0 \in U_\gamma$, one has
$$f(z_0) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z - z_0}\, dz.$$
Proof: Let $\varepsilon > 0$ be a sufficiently small quantity. By contour shifting, one can replace the contour $\gamma$ by the sum (concatenation) of three contours: a contour $\gamma_1$ from $\gamma(a)$ to $z_0 + \varepsilon$, a contour $\gamma_2$ traversing the circle $\{z : |z - z_0| = \varepsilon\}$ once anticlockwise, and the reversal $-\gamma_1$ of the contour $\gamma_1$ that goes from $\gamma(a)$ to $z_0 + \varepsilon$. The contributions of the contours $\gamma_1, -\gamma_1$ cancel each other, thus
$$\int_\gamma \frac{f(z)}{z - z_0}\, dz = \int_{\gamma_2} \frac{f(z)}{z - z_0}\, dz.$$
By the change of variables $z = z_0 + \varepsilon e^{i\theta}$, the right-hand side can be expanded as
$$i \int_0^{2\pi} f(z_0 + \varepsilon e^{i\theta})\, d\theta.$$
Sending $\varepsilon \to 0$, we obtain the claim.
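The Cauchy integral formula lends itself to a quick numerical sanity check: the trapezoid rule on a smooth periodic integrand converges extremely fast, so a modest discretisation of the unit circle already recovers $f(z_0)$ to near machine precision. Here is a minimal sketch (the function names are mine, not from the notes), applied to $f = \exp$.

```python
import cmath, math

# Numerically verify f(z0) = (1/(2*pi*i)) * ∮_gamma f(z)/(z - z0) dz
# with gamma a circle traversed once anticlockwise.
def cauchy_formula(f, z0, center=0.0, radius=1.0, n=4096):
    total = 0.0 + 0.0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = center + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)

# f = exp, z0 = 0.3 + 0.2i inside the unit circle.
approx = cauchy_formula(cmath.exp, 0.3 + 0.2j)
exact = cmath.exp(0.3 + 0.2j)
assert abs(approx - exact) < 1e-10
```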
The Cauchy integral formula has many consequences. Specialising to the case when $\gamma$ traverses a circle $\{z : |z - z_0| = r\}$ around $z_0$, we conclude the mean value property
$$f(z_0) = \frac{1}{2\pi} \int_0^{2\pi} f(z_0 + r e^{i\theta})\, d\theta$$
whenever $f$ is holomorphic in a neighbourhood of the closed disk $\{z : |z - z_0| \leq r\}$. In a similar spirit, we have the maximum principle for holomorphic functions:
Lemma 7 (Maximum principle) Let $U$ be a simply connected open set, and let $\gamma$ be a simple closed contour in $U$ enclosing a bounded region $U_\gamma$ anti-clockwise. Let $f: U \to \mathbb{C}$ be a holomorphic function. If we have the bound $|f(z)| \leq M$ for all $z$ on the contour $\gamma$, then we also have the bound $|f(z_0)| \leq M$ for all $z_0 \in U_\gamma$.
Proof: We use an argument of Landau. Fix $z_0 \in U_\gamma$. From the Cauchy integral formula and the triangle inequality we have the bound
$$|f(z_0)| \leq C M$$
for some constant $C > 0$ depending on $z_0$ and $\gamma$. This ostensibly looks like a weaker bound than what we want, but we can miraculously make the constant $C$ disappear by the “tensor power trick“. Namely, observe that if $f$ is a holomorphic function bounded in magnitude by $M$ on $\gamma$, and $n$ is a natural number, then $f^n$ is a holomorphic function bounded in magnitude by $M^n$ on $\gamma$. Applying the preceding argument with $f, M$ replaced by $f^n, M^n$ we conclude that
$$|f(z_0)|^n \leq C M^n$$
and hence
$$|f(z_0)| \leq C^{1/n} M.$$
Sending $n \to \infty$, we obtain the claim.
Another basic application of the integral formula is
Corollary 8 Every holomorphic function $f: U \to \mathbb{C}$ is complex analytic, thus it has a convergent Taylor series around every point $z_0$ in the domain. In particular, holomorphic functions are smooth, and the derivative of a holomorphic function is again holomorphic.
Conversely, it is easy to see that complex analytic functions are holomorphic. Thus, the terms “complex analytic” and “holomorphic” are synonymous, at least when working on open domains. (On a non-open set $\Omega$, saying that $f$ is analytic on $\Omega$ is equivalent to asserting that $f$ extends to a holomorphic function on an open neighbourhood of $\Omega$.) This is in marked contrast to real analysis, in which a function can be continuously differentiable, or even smooth, without being real analytic.
Proof: By translation, we may suppose that $z_0 = 0$. Let $\gamma$ be a contour traversing the circle $\{z : |z| = r\}$ that is contained in the domain $U$; then by the Cauchy integral formula one has
$$f(w) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z - w}\, dz$$
for all $w$ in the disk $\{w : |w| < r\}$. As $f$ is continuously differentiable (and hence continuous) on $\gamma$, it is bounded. From the geometric series formula
$$\frac{1}{z - w} = \sum_{n=0}^\infty \frac{w^n}{z^{n+1}}$$
and dominated convergence, we conclude that
$$f(w) = \sum_{n=0}^\infty \left( \frac{1}{2\pi i} \int_\gamma \frac{f(z)}{z^{n+1}}\, dz \right) w^n,$$
with the right-hand side an absolutely convergent series for $|w| < r$, and the claim follows.
Exercise 9 Establish the generalised Cauchy integral formulae
$$f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\gamma \frac{f(z)}{(z - z_0)^{k+1}}\, dz$$
for any non-negative integer $k$, where $f^{(k)}$ is the $k$-fold complex derivative of $f$ (with the notation and hypotheses of Theorem 6).
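As with the $k = 0$ case, the generalised formulae are easy to test numerically. The sketch below (my own illustration, not from the notes) recovers the derivatives $f^{(k)}(0)$ of $f = \exp$, all of which equal $1$, from contour integrals over the unit circle.

```python
import cmath, math

# Check f^{(k)}(0) = (k!/(2*pi*i)) * ∮ f(z)/z^{k+1} dz on the unit circle.
def kth_derivative_at_zero(f, k, n=4096):
    total = 0.0 + 0.0j
    for j in range(n):
        z = cmath.exp(2j * math.pi * j / n)
        dz = 1j * z * (2 * math.pi / n)
        total += f(z) / z ** (k + 1) * dz
    return math.factorial(k) * total / (2j * math.pi)

# Every derivative of exp at 0 equals 1.
for k in range(5):
    assert abs(kth_derivative_at_zero(cmath.exp, k) - 1) < 1e-10
```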
This in turn leads to a converse to Cauchy’s theorem, known as Morera’s theorem:
Corollary 10 (Morera’s theorem) Let $f: U \to \mathbb{C}$ be a continuous function on an open set $U$ with the property that $\int_\gamma f(z)\, dz = 0$ for all closed contours $\gamma$ in $U$. Then $f$ is holomorphic.
Proof: We can of course assume $U$ to be non-empty and connected (hence path-connected). Fix a point $z_0 \in U$, and define a “primitive” $F: U \to \mathbb{C}$ of $f$ by defining $F(z) := \int_\gamma f(w)\, dw$, with $\gamma$ being any contour from $z_0$ to $z$ (this is well defined by hypothesis). By mimicking the proof of the real fundamental theorem of calculus, we see that $F$ is holomorphic with $F' = f$, and the claim now follows from Corollary 8.
An important consequence of Morera’s theorem for us is
Corollary 11 (Locally uniform limit of holomorphic functions is holomorphic) Let $f_n: U \to \mathbb{C}$ be holomorphic functions on an open set $U$ which converge locally uniformly to a function $f: U \to \mathbb{C}$. Then $f$ is also holomorphic on $U$.
Proof: By working locally we may assume that $U$ is a ball, and in particular simply connected. By Cauchy’s theorem, $\int_\gamma f_n(z)\, dz = 0$ for all closed contours $\gamma$ in $U$. By local uniform convergence, this implies that $\int_\gamma f(z)\, dz = 0$ for all such contours, and the claim then follows from Morera’s theorem.
Now we study the zeroes of complex analytic functions. If a complex analytic function $f$ vanishes at a point $z_0$, but is not identically zero in a neighbourhood of that point, then by Taylor expansion we see that $f$ factors in a sufficiently small neighbourhood of $z_0$ as
$$f(z) = (z - z_0)^m g(z)$$
for some natural number $m$ (which we call the order or multiplicity of the zero at $z_0$) and some function $g$ that is complex analytic and non-zero near $z_0$; this generalises the factor theorem for polynomials. In particular, the zero $z_0$ is isolated if $f$ does not vanish identically near $z_0$. We conclude that if the domain $U$ of $f$ is connected and $f$ vanishes on a neighbourhood of some point $z_0$ in $U$, then it must vanish on all of $U$ (since the maximal connected neighbourhood of $z_0$ in $U$ on which $f$ vanishes cannot have any boundary point in $U$). This implies unique continuation of analytic functions: if two complex analytic functions on $U$ agree on a non-empty open set, then they agree everywhere. In particular, if a complex analytic function does not vanish everywhere, then all of its zeroes are isolated, so in particular it has only finitely many zeroes on any given compact set.
Recall that a rational function is a function $f$ which is a quotient $P/Q$ of two polynomials (at least outside of the set where $Q$ vanishes). Analogously, let us define a meromorphic function on an open set $U$ to be a function $f: U \setminus S \to \mathbb{C}$ defined outside of a discrete subset $S$ of $U$ (the singularities of $f$), which is locally the quotient of holomorphic functions, in the sense that for every $z_0 \in U$, one has $f = g/h$ in a neighbourhood of $z_0$ excluding $z_0$, with $g, h$ holomorphic near $z_0$ and with $h$ non-vanishing outside of $z_0$. If $g$ has a zero of equal or higher order than $h$ at $z_0$, then the singularity is removable and one can extend the meromorphic function holomorphically across $z_0$ (by the holomorphic factor theorem above); otherwise, the singularity is non-removable and is known as a pole, whose order is equal to the difference between the order of $h$ and the order of $g$ at $z_0$. (If one wished, one could extend meromorphic functions to the poles by embedding $\mathbb{C}$ in the Riemann sphere $\mathbb{C} \cup \{\infty\}$ and mapping each pole to $\infty$, but we will not do so here. One could also consider non-meromorphic functions with essential singularities at various points, but we will have no need to analyse such singularities in this course.) If the order of a pole or zero is one, we say that it is simple; if it is two, we say it is double; and so forth.
Exercise 12 Show that the space of meromorphic functions on a non-empty connected open set $U$, quotiented by almost everywhere equivalence, forms a field.
By quotienting two Taylor series, we see that if a meromorphic function $f$ has a pole of order $m$ at some point $z_0$, then it has a Laurent expansion
$$f(z) = \sum_{n = -m}^\infty a_n (z - z_0)^n,$$
absolutely convergent in a neighbourhood of $z_0$ excluding $z_0$ itself, and with $a_{-m}$ non-zero. The Laurent coefficient $a_{-1}$ has a special significance, and is called the residue of the meromorphic function $f$ at $z_0$, which we will denote as $\mathrm{Res}(f; z_0)$. The importance of this coefficient comes from the following significant generalisation of the Cauchy integral formula, known as the residue theorem:
Exercise 13 (Residue theorem) Let $f$ be a meromorphic function on a simply connected domain $U$, and let $\gamma$ be a closed contour in $U$ enclosing a bounded region $U_\gamma$ anticlockwise, and avoiding all the singularities of $f$. Show that
$$\int_\gamma f(z)\, dz = 2\pi i \sum_{z_0} \mathrm{Res}(f; z_0),$$
where $z_0$ is summed over all the poles of $f$ that lie in $U_\gamma$.
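A concrete numerical instance of the residue theorem: $f(z) = 1/(z^2+1)$ has simple poles at $\pm i$, with residues $1/(2i)$ and $-1/(2i)$. A circle of radius $1$ centred at $i$ encloses only the pole at $i$, so the contour integral should equal $2\pi i \cdot \frac{1}{2i} = \pi$. The sketch below (illustration only, names my own) checks this.

```python
import cmath, math

# Discretised contour integral over the circle |z - center| = radius,
# traversed once anticlockwise.
def contour_integral(f, center, radius, n=8192):
    total = 0.0 + 0.0j
    for k in range(n):
        z = center + radius * cmath.exp(2j * math.pi * k / n)
        dz = 1j * (z - center) * (2 * math.pi / n)
        total += f(z) * dz
    return total

# f(z) = 1/(z^2+1): the circle of radius 1 about i encloses only the
# pole at i, with residue 1/(2i), so the integral is 2*pi*i/(2i) = pi.
val = contour_integral(lambda z: 1 / (z * z + 1), center=1j, radius=1.0)
assert abs(val - math.pi) < 1e-6
```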
The residue theorem is particularly useful when applied to logarithmic derivatives $f'/f$ of meromorphic functions $f$, because the residue is of a specific form:
Exercise 14 Let $f$ be a meromorphic function on an open set $U$ that does not vanish identically. Show that the only poles of $f'/f$ are simple poles (poles of order $1$), occurring at the poles and zeroes of $f$ (after all removable singularities have been removed). Furthermore, the residue of $f'/f$ at a pole $z_0$ is an integer, equal to the order of zero of $f$ if $f$ has a zero at $z_0$, or equal to negative the order of pole of $f$ at $z_0$ if $f$ has a pole at $z_0$.
Remark 15 The fact that residues of logarithmic derivatives of meromorphic functions are automatically integers is a remarkable feature of the complex analytic approach to multiplicative number theory, which is difficult (though not entirely impossible) to duplicate in other approaches to the subject. Here is a sample application of this integrality, which is challenging to reproduce by non-complex-analytic means: if $f$ is meromorphic near $z_0$, and one has the bound $|\frac{f'}{f}(z)| \leq \frac{0.9}{|z - z_0|}$ as $z \to z_0$, then $f$ must in fact stay bounded near $z_0$, because the only integer of magnitude less than $0.9$ is zero.
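The integrality of these residues is what drives the classical argument principle: integrating $p'/p$ around a contour counts the enclosed zeroes (with multiplicity) as an exact integer. Here is a small numerical sketch (my own, not from the notes) for a cubic with two roots inside the unit circle.

```python
import cmath, math

# Argument principle: (1/(2*pi*i)) * ∮ p'(z)/p(z) dz over the unit circle
# counts the zeroes of p inside the circle, with multiplicity.
def count_zeros(p, dp, n=8192):
    total = 0.0 + 0.0j
    for k in range(n):
        z = cmath.exp(2j * math.pi * k / n)
        dz = 1j * z * (2 * math.pi / n)
        total += dp(z) / p(z) * dz
    return total / (2j * math.pi)

# p(z) = (z - 0.5)(z + 0.3)(z - 2): two zeroes (0.5 and -0.3) lie inside
# the unit circle, one (2) lies outside, so the count should be 2.
p = lambda z: (z - 0.5) * (z + 0.3) * (z - 2)
dp = lambda z: ((z + 0.3) * (z - 2) + (z - 0.5) * (z - 2)
               + (z - 0.5) * (z + 0.3))
count = count_zeros(p, dp)
assert abs(count - 2) < 1e-6
```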
Roth’s theorem on arithmetic progressions asserts that every subset of the integers of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:
Theorem 1 (Roth’s theorem) Let $G = (G,+)$ be a compact abelian group, with Haar probability measure $\mu$, which is $2$-divisible (i.e. the map $x \mapsto 2x$ is surjective), and let $A$ be a measurable subset of $G$ with $\mu(A) \geq \delta$ for some $\delta > 0$. Then we have
$$\int_G \int_G 1_A(x)\, 1_A(x+r)\, 1_A(x+2r)\, d\mu(x)\, d\mu(r) \gg_\delta 1,$$
where $X \gg_\delta Y$ denotes the bound $X \geq c(\delta) Y$ for some constant $c(\delta) > 0$ depending only on $\delta$.
This theorem is usually formulated in the case that $G$ is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group $G = \mathbb{Z}/N\mathbb{Z}$ of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of $2$-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant $c(\delta)$ on $\delta$, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the $2$-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the doubling map $x \mapsto 2x$ in that case.
We can deduce Theorem 1 from the following more general Khintchine-type statement. Let $\hat{G}$ denote the Pontryagin dual of a compact abelian group $G$, that is to say the set of all continuous homomorphisms $\xi: x \mapsto \xi \cdot x$ from $G$ to the (additive) unit circle $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. Thus $\hat{G}$ is a discrete abelian group, and functions $f \in L^2(G)$ have a Fourier transform $\hat{f}: \hat{G} \to \mathbb{C}$ defined by
$$\hat{f}(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\, d\mu(x).$$
If $G$ is $2$-divisible, then $\hat{G}$ is $2$-torsion-free in the sense that the map $\xi \mapsto 2\xi$ is injective. For any finite set $S \subset \hat{G}$ and any radius $\rho > 0$, define the Bohr set
$$B(S, \rho) := \{x \in G : \|\xi \cdot x\|_{\mathbb{T}} \leq \rho \text{ for all } \xi \in S\},$$
where $\|\theta\|_{\mathbb{T}}$ denotes the distance of $\theta$ to the nearest integer. We refer to the cardinality $|S|$ of $S$ as the rank of the Bohr set. We record a simple volume bound on Bohr sets:
Lemma 2 (Volume packing bound) Let $G$ be a compact abelian group with Haar probability measure $\mu$. For any Bohr set $B(S, \rho)$ with $0 < \rho \leq 1$, we have
$$\mu(B(S, 2\rho)) \geq (\rho/2)^{|S|}.$$
Proof: We can cover the torus $\mathbb{T}^S$ by at most $\lceil 1/\rho \rceil^{|S|} \leq (2/\rho)^{|S|}$ translates of the cube $[-\rho/2, \rho/2]^S$. Then the preimages of these cubes under the map $x \mapsto (\xi \cdot x)_{\xi \in S}$ form a cover of $G$. But all of these preimages lie in a translate of $B(S, 2\rho)$, and the claim then follows from the translation invariance of $\mu$.
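The packing argument is easy to see concretely in a finite cyclic group, where the characters are $x \mapsto \xi x / N \bmod 1$. The sketch below (an illustration with parameters of my choosing) computes a rank-two Bohr set in $\mathbb{Z}/101\mathbb{Z}$ and checks the hedged lower bound $\mu(B(S, 2\rho)) \geq (\rho/2)^{|S|}$ that the covering argument yields.

```python
# A Bohr set in the cyclic group Z/NZ: characters are x -> xi*x/N mod 1,
# so B(S, rho) consists of those x whose frequencies xi*x/N all lie
# within rho of an integer.
def bohr_set(N, S, rho):
    def dist_to_int(t):
        # distance of t to the nearest integer
        t = t % 1.0
        return min(t, 1.0 - t)
    return [x for x in range(N)
            if all(dist_to_int(xi * x / N) <= rho for xi in S)]

N, S, rho = 101, [1, 7], 0.1
B = bohr_set(N, S, 2 * rho)

assert 0 in B                              # Bohr sets always contain 0
assert len(B) / N >= (rho / 2) ** len(S)   # volume packing lower bound
```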
Given any Bohr set $B(S, \rho)$, we define a normalised “Lipschitz” cutoff function $\nu = \nu_{B(S,\rho)}: G \to \mathbb{R}$ by the formula
$$\nu(x) := c \prod_{\xi \in S} \left(1 - \frac{\|\xi \cdot x\|_{\mathbb{T}}}{\rho}\right)_+,$$
where $c$ is the constant such that
$$\int_G \nu\, d\mu = 1,$$
thus $\nu$ is non-negative and supported on $B(S, \rho)$. The function $\nu$ should be viewed as an $L^1$-normalised “tent function” cutoff to $B(S, \rho)$. Note from Lemma 2 that
$$0 \leq \nu(x) \ll_{|S|, \rho} 1. \qquad (2)$$
We then have the following sharper version of Theorem 1:
Theorem 3 (Roth-Khintchine theorem) Let $G$ be a $2$-divisible compact abelian group, with Haar probability measure $\mu$, and let $\varepsilon > 0$. Then for any measurable function $f: G \to [0,1]$, there exists a Bohr set $B(S, \rho)$ with $|S| \ll_\varepsilon 1$ and $\rho \gg_\varepsilon 1$ such that
where $f * g$ denotes the convolution operation
$$f * g(x) := \int_G f(y) g(x - y)\, d\mu(y).$$
A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with $f = 1_A$ and $\varepsilon$ equal to a small multiple of $\delta^3$ to conclude that there is a Bohr set $B(S, \rho)$ with $|S| \ll_\delta 1$ and $\rho \gg_\delta 1$ such that
But from (2) we have the pointwise bound $1_A * \nu_{B(S,\rho)} \ll_{|S|,\rho} 1$, and Theorem 1 follows.
Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set $B(S, \rho)$ to capture all the “large Fourier coefficients” of $f$, but such that a certain “dilate” of $B(S, \rho)$ does not capture much more Fourier energy of $f$ than $B(S, \rho)$ itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of $f$ into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the cutoff functions $\nu_{B(S,\rho)}$ considered above to achieve a similar effect.
The classification of finite simple groups (CFSG), first announced in 1983 but only fully completed in 2004, is one of the monumental achievements of twentieth century mathematics. Spanning hundreds of papers and tens of thousands of pages, it has been called the “enormous theorem”. A “second generation” proof of the theorem is nearly completed which is a little shorter (estimated at about five thousand pages in length), but currently there is no reasonably sized proof of the classification.
An important precursor of the CFSG is the Feit-Thompson theorem from 1962-1963, which asserts that every finite group of odd order is solvable, or equivalently that every non-abelian finite simple group has even order. This is an immediate consequence of CFSG, and conversely the Feit-Thompson theorem is an essential starting point in the proof of the classification, since it allows one to reduce matters to groups of even order for which key additional tools (such as the Brauer-Fowler theorem) become available. The original proof of the Feit-Thompson theorem is 255 pages long, which is significantly shorter than the proof of the CFSG, but still far from short. While parts of the proof of the Feit-Thompson theorem have been simplified (and it has recently been converted, after six years of effort, into an argument that has been verified by the proof assistant Coq), the available proofs of this theorem are still extremely lengthy by any reasonable standard.
However, there is a significantly simpler special case of the Feit-Thompson theorem that was established previously by Suzuki in 1957, which was influential in the proof of the more general Feit-Thompson theorem (and thus indirectly to the proof of CFSG). Define a CA-group to be a group $G$ with the property that the centraliser of any non-identity element is abelian; equivalently, the commuting relation $\sim$ (defined as the relation that holds when $g$ commutes with $h$, thus $gh = hg$) is an equivalence relation on the non-identity elements of $G$. Trivially, every abelian group is CA. A non-abelian example of a CA-group is the $ax+b$ group of invertible affine transformations $x \mapsto ax + b$ (with $a \neq 0$) on a field $k$. A little less obviously, the special linear group $SL_2(k)$ over a finite field $k$ is a CA-group when the order of $k$ is a power of two. The finite simple groups of Lie type are not, in general, CA-groups, but when the rank is bounded they tend to behave as if they were “almost CA”; the centraliser of a generic element of such a group $G(\mathbb{F}_q)$ (when the rank is bounded and $q$ is large) is typically a maximal torus (because most elements are regular semisimple), which is certainly abelian. In view of the CFSG, we thus see that CA or nearly CA groups form an important subclass of the simple groups, and it is thus of interest to study them separately. To this end, we have
Theorem 1 (Suzuki’s theorem on CA-groups) Every finite CA-group of odd order is solvable.
Of course, this theorem is superseded by the more general Feit-Thompson theorem, but Suzuki’s proof is substantially shorter (the original proof is nine pages) and will be given in this post. (See this survey of Solomon for some discussion of the link between Suzuki’s argument and the Feit-Thompson argument.) Suzuki’s analysis can be pushed further to give an essentially complete classification of all the finite CA-groups (of either odd or even order), but we will not pursue these matters here.
Moving even further down the ladder of simple precursors of CFSG is the following theorem of Frobenius from 1901. Define a Frobenius group to be a finite group $G$ which has a subgroup $H$ (called the Frobenius complement) with the property that all the non-trivial conjugates $gHg^{-1}$ of $H$, for $g \in G \setminus H$, intersect $H$ only at the identity. For instance, the $ax+b$ group of a field $k$ is also a Frobenius group (take $H$ to be the affine transformations that fix a specified point $x_0$, e.g. the origin). This example suggests that there is some overlap between the notions of a Frobenius group and a CA-group. Indeed, note that if $G$ is a CA-group and $H$ is a maximal abelian subgroup of $G$, then any conjugate $gHg^{-1}$ of $H$ that is not identical to $H$ will intersect $H$ only at the identity (because $H$ and each of its conjugates consist of equivalence classes under the commuting relation $\sim$, together with the identity). So if a maximal abelian subgroup $H$ of a CA-group is its own normaliser (thus $N(H) := \{g \in G : gHg^{-1} = H\}$ is equal to $H$), then the group is a Frobenius group.
Frobenius’ theorem places an unexpectedly strong amount of structure on a Frobenius group:
Theorem 2 (Frobenius’ theorem) Let $G$ be a Frobenius group with Frobenius complement $H$. Then there exists a normal subgroup $K$ of $G$ (called the Frobenius kernel of $G$) such that $G$ is the semi-direct product $K \rtimes H$ of $K$ and $H$.
Roughly speaking, this theorem indicates that all Frobenius groups “behave” like the $ax+b$ example (which is a quintessential example of a semi-direct product).
Note that if every CA-group of odd order were either Frobenius or abelian, then Theorem 2 would imply Theorem 1 by an induction on the order of $G$, since any subgroup of a CA-group is clearly again a CA-group. Indeed, the proof of Suzuki’s theorem does basically proceed by this route (Suzuki’s arguments do indeed imply that CA-groups of odd order are Frobenius or abelian, although we will not quite establish that fact here).
Frobenius’ theorem can be reformulated in the following concrete combinatorial form:
Theorem 3 (Frobenius’ theorem, equivalent version) Let $G$ be a group of permutations acting transitively on a finite set $X$, with the property that any non-identity permutation in $G$ fixes at most one point in $X$. Then the set of permutations in $G$ that fix no points in $X$, together with the identity, is closed under composition.
Again, a good example to keep in mind for this theorem is when $G$ is the group of affine permutations on a field $k$ (i.e. the $ax+b$ group for that field), and $X$ is the set of points of that field. In that case, the permutations in $G$ that do not fix any points are precisely the non-trivial translations.
To deduce Theorem 3 from Theorem 2, one applies Theorem 2 with $H$ equal to the stabiliser of a single point in $X$. Conversely, to deduce Theorem 2 from Theorem 3, set $X := G/H$ to be the space of left-cosets of $H$, with the obvious left $G$-action; one easily verifies that this action is faithful, transitive, and that each non-identity element of $G$ fixes at most one left-coset of $H$ (basically because it lies in at most one conjugate of $H$). If we let $K$ be the elements of $G$ that do not fix any point in $X$, plus the identity, then by Theorem 3 $K$ is closed under composition; it is also clearly closed under inverse and conjugation, and is hence a normal subgroup of $G$. From construction $K$ is the identity plus the complement of all the conjugates of $H$, which are all disjoint except at the identity, so by counting elements we see that
$$|K| = |G| - \frac{|G|}{|H|}(|H| - 1) = \frac{|G|}{|H|}.$$
As $H$ normalises $K$ and intersects it only at the identity, we thus see that $KH$ is all of $G$, giving Theorem 2.
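The combinatorial form of Frobenius' theorem can be verified by brute force in the guiding example: the affine group over $\mathbb{F}_5$, acting on the five field elements. The sketch below (an illustration I wrote for this example, not from the post) checks that the fixed-point-free permutations plus the identity are exactly the translations, and that this set is closed under composition.

```python
from itertools import product

# The affine group over F_5: permutations x -> a*x + b mod 5 with a != 0,
# acting transitively on {0,...,4}; a non-identity affine map fixes at
# most one point, so the hypotheses of Theorem 3 are satisfied.
p = 5
perms = {tuple((a * x + b) % p for x in range(p))
         for a, b in product(range(1, p), range(p))}
identity = tuple(range(p))

def fixes_no_point(perm):
    return all(perm[x] != x for x in range(p))

# K = fixed-point-free permutations, plus the identity.
K = {perm for perm in perms if fixes_no_point(perm)} | {identity}

# K should be exactly the p translations x -> x + b...
assert K == {tuple((x + b) % p for x in range(p)) for b in range(p)}

# ...and closed under composition, as Frobenius' theorem predicts.
for f in K:
    for g in K:
        assert tuple(f[g[x]] for x in range(p)) in K
```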
Despite the appealingly concrete and elementary form of Theorem 3, the only known proofs of that theorem (or equivalently, Theorem 2) in its full generality proceed via the machinery of group characters (which one can think of as a version of Fourier analysis for nonabelian groups). On the other hand, once one establishes the basic theory of these characters (reviewed below the fold), the proof of Frobenius’ theorem is very short, which gives quite a striking example of the power of character theory. The proof of Suzuki’s theorem also proceeds via character theory, and is basically a more involved version of the Frobenius argument; again, no character-free proof of Suzuki’s theorem is currently known. (The proofs of Feit-Thompson and CFSG also involve characters, but those proofs also contain many other arguments of much greater complexity than the character-based portions of the proof.)
It seems to me that the above four theorems (Frobenius, Suzuki, Feit-Thompson, and CFSG) provide a ladder of sorts (with exponentially increasing complexity at each step) to the full classification, and that any new approach to the classification might first begin by revisiting the earlier theorems on this ladder and finding new proofs of these results first (in particular, if one had a “robust” proof of Suzuki’s theorem that also gave non-trivial control on “almost CA-groups” – whatever that means – then this might lead to a new route to classifying the finite simple groups of Lie type and bounded rank). But even for the simplest two results on this ladder – Frobenius and Suzuki – it seems remarkably difficult to find any proof that is not essentially the character-based proof. (Even trying to replace character theory by its close cousin, representation theory, doesn’t seem to work unless one gives in to the temptation to take traces everywhere and put the characters back in; it seems that rather than abandon characters altogether, one needs to find some sort of “robust” generalisation of existing character-based methods.) In any case, I am recording here the standard character-based proofs of the theorems of Frobenius and Suzuki below the fold. There is nothing particularly novel here, but I wanted to collect all the relevant material in one place, largely for my own benefit.
Let $A, B$ be two $n \times n$ Hermitian matrices, with eigenvalues $\lambda_1 \leq \dots \leq \lambda_n$ and $\mu_1 \leq \dots \leq \mu_n$ respectively. The Harish-Chandra–Itzykson-Zuber integral formula exactly computes the integral
$$\int_{U(n)} e^{t \operatorname{tr}(AUBU^*)}\, dU,$$
where $U$ is integrated over the Haar probability measure of the unitary group $U(n)$ and $t$ is a non-zero complex parameter, as the expression
$$c_n \frac{\det\left(e^{t \lambda_i \mu_j}\right)_{1 \leq i,j \leq n}}{t^{(n^2-n)/2} \Delta(\lambda) \Delta(\mu)}$$
when the eigenvalues of $A, B$ are simple, where $\Delta(\lambda)$ denotes the Vandermonde determinant
$$\Delta(\lambda) := \prod_{1 \leq i < j \leq n} (\lambda_j - \lambda_i)$$
and $c_n$ is the constant
$$c_n := \prod_{p=1}^{n-1} p!.$$
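As a hedged sanity check (my own, not in the original post), the $n = 2$, $t = 1$ case of the formula reduces, for $A = \mathrm{diag}(a_1, a_2)$ and $B = \mathrm{diag}(b_1, b_2)$, to
$$\int_{U(2)} e^{\operatorname{tr}(AUBU^*)}\, dU = \frac{e^{a_1 b_1 + a_2 b_2} - e^{a_1 b_2 + a_2 b_1}}{(a_1 - a_2)(b_1 - b_2)},$$
which can be verified by Monte Carlo, using the classical fact that for a Haar-random $U \in U(2)$ the quantity $|U_{11}|^2$ is uniformly distributed on $[0,1]$ (it has the law of $|g_1|^2/(|g_1|^2 + |g_2|^2)$ for independent complex Gaussians).

```python
import math, random

random.seed(12345)

# Monte Carlo check of HCIZ for n = 2, t = 1, with diagonal A and B.
a1, a2 = 1.0, 0.0
b1, b2 = 1.0, 0.0
exact = (math.exp(a1 * b1 + a2 * b2) - math.exp(a1 * b2 + a2 * b1)) \
        / ((a1 - a2) * (b1 - b2))

def haar_p():
    # |U_11|^2 for Haar-random U in U(2): |g1|^2 / (|g1|^2 + |g2|^2)
    # for independent standard complex Gaussians g1, g2.
    x1, y1, x2, y2 = (random.gauss(0, 1) for _ in range(4))
    n1, n2 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
    return n1 / (n1 + n2)

def trace_AUBU(p):
    # tr(A U B U*) = sum_{i,j} a_i b_j |U_ij|^2; for a 2x2 unitary,
    # |U_11|^2 = |U_22|^2 = p and |U_12|^2 = |U_21|^2 = 1 - p.
    return a1 * (b1 * p + b2 * (1 - p)) + a2 * (b1 * (1 - p) + b2 * p)

samples = 200000
mc = sum(math.exp(trace_AUBU(haar_p())) for _ in range(samples)) / samples
assert abs(mc - exact) < 0.02  # exact value here is e - 1
```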
There are at least two standard ways to prove this formula in the literature. One way is by applying the Duistermaat-Heckman theorem to the pushforward of Liouville measure on the coadjoint orbit (or more precisely, a rotation of such an orbit by ) under the moment map , and then using a stationary phase expansion. Another way, which I only learned about recently, is to use the formulae for evolution of eigenvalues under Dyson Brownian motion (as well as the closely related formulae for the GUE ensemble), which were derived in this previous blog post. Both of these approaches can be found in several places in the literature (the former being observed in the original paper of Duistermaat and Heckman, and the latter observed in the paper of Itzykson and Zuber as well as in this later paper of Johansson), but I thought I would record both of these here for my own benefit.
The Harish-Chandra-Itzykson-Zuber formula can be extended to other compact Lie groups than $U(n)$. At first glance, this might suggest that these formulae could be of use in the study of the GOE ensemble, but unfortunately the Lie algebra associated to the orthogonal group $O(n)$ corresponds to real anti-symmetric matrices rather than real symmetric matrices. The same issue occurs in the unitary case, but there one can simply multiply by $i$ to rotate a complex skew-Hermitian matrix into a complex Hermitian matrix. This is consistent, though, with the fact that the (somewhat rarely studied) anti-symmetric GOE ensemble has cleaner formulae (in particular, having a determinantal structure similar to GUE) than the (much more commonly studied) symmetric GOE ensemble.
Let $G$ be a compact group. (Throughout this post, all topological groups are assumed to be Hausdorff.) Then $G$ has a number of unitary representations, i.e. continuous homomorphisms $\rho: G \to U(H)$ to the group $U(H)$ of unitary operators on a Hilbert space $H$, equipped with the strong operator topology. In particular, one has the left-regular representation $\tau: G \to U(L^2(G))$, where we equip $G$ with its normalised Haar measure $\mu$ (and the Borel $\sigma$-algebra) to form the Hilbert space $L^2(G)$, and $\tau$ is the translation operation
$$\tau(g) f(x) := f(g^{-1} x).$$
We call two unitary representations $\rho: G \to U(H)$ and $\rho': G \to U(H')$ isomorphic if one has $\rho'(g) = U \rho(g) U^{-1}$ for all $g \in G$ and some unitary transformation $U: H \to H'$, in which case we write $\rho \equiv \rho'$.
Given two unitary representations $\rho: G \to U(H)$ and $\rho': G \to U(H')$, one can form their direct sum $\rho \oplus \rho': G \to U(H \oplus H')$ in the obvious manner: $(\rho \oplus \rho')(g)(v, v') := (\rho(g) v, \rho'(g) v')$. Conversely, if a unitary representation $\rho: G \to U(H)$ has a closed invariant subspace $V$ of $H$ (thus $\rho(g) V \subseteq V$ for all $g \in G$), then the orthogonal complement $V^\perp$ is also invariant, leading to a decomposition $\rho \equiv \rho|_V \oplus \rho|_{V^\perp}$ of $\rho$ into the subrepresentations $\rho|_V: G \to U(V)$, $\rho|_{V^\perp}: G \to U(V^\perp)$. Accordingly, we will call a unitary representation $\rho: G \to U(H)$ irreducible if $H$ is nontrivial (i.e. $H \neq \{0\}$) and there are no nontrivial invariant subspaces (i.e. no invariant subspaces other than $\{0\}$ and $H$); the irreducible representations play a role in the subject analogous to those of prime numbers in multiplicative number theory. By the principle of infinite descent, every finite-dimensional unitary representation is then expressible (perhaps non-uniquely) as the direct sum of irreducible representations.
The Peter-Weyl theorem asserts, among other things, that the same claim is true for the regular representation:
Theorem 1 (Peter-Weyl theorem) Let $G$ be a compact group. Then the regular representation $\tau: G \to U(L^2(G))$ is isomorphic to the direct sum of irreducible representations. In fact, one has $\tau \equiv \bigoplus_{\xi \in \hat{G}} \rho_\xi^{\oplus \dim(V_\xi)}$, where $(\rho_\xi)_{\xi \in \hat{G}}$ is an enumeration of the irreducible finite-dimensional unitary representations $\rho_\xi: G \to U(V_\xi)$ of $G$ (up to isomorphism). (It is not difficult to see that such an enumeration exists.)
In the case when $G$ is abelian, the Peter-Weyl theorem is a consequence of the Plancherel theorem; in that case, the irreducible representations are all one-dimensional, and are thus indexed by the space $\hat{G}$ of characters (i.e. continuous homomorphisms into the unit circle $S^1 := \{z \in \mathbb{C} : |z| = 1\}$), known as the Pontryagin dual of $G$. (See for instance my lecture notes on the Fourier transform.) Conversely, the Peter-Weyl theorem can be used to deduce the Plancherel theorem for compact groups, as well as other basic results in Fourier analysis on these groups, such as the Fourier inversion formula.
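In the simplest abelian example $G = \mathbb{Z}/N\mathbb{Z}$, the irreducible representations are the one-dimensional characters $\chi_k(x) = e^{2\pi i k x / N}$, and the Plancherel theorem amounts to these characters forming an orthonormal basis of $L^2(G)$ with respect to the Haar probability (i.e. uniform) measure. This orthonormality can be checked directly; the sketch below is my own illustration.

```python
import cmath, math

# Characters of G = Z/NZ: chi_k(x) = exp(2*pi*i*k*x/N).
N = 6

def char(k, x):
    return cmath.exp(2j * math.pi * k * x / N)

def inner(k, l):
    # <chi_k, chi_l> = (1/N) * sum_x chi_k(x) * conj(chi_l(x)),
    # the L^2(G) inner product with respect to uniform measure.
    return sum(char(k, x) * char(l, x).conjugate() for x in range(N)) / N

# The characters form an orthonormal system: <chi_k, chi_l> = delta_{kl}.
for k in range(N):
    for l in range(N):
        expected = 1.0 if k == l else 0.0
        assert abs(inner(k, l) - expected) < 1e-12
```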
Because the regular representation is faithful (i.e. injective), a corollary of the Peter-Weyl theorem (and a classical theorem of Cartan) is that every compact group can be expressed as the inverse limit of Lie groups, leading to a solution to Hilbert’s fifth problem in the compact case. Furthermore, the compact case is then an important building block in the more general theory surrounding Hilbert’s fifth problem, and in particular a result of Yamabe that any locally compact group contains an open subgroup that is the inverse limit of Lie groups.
I’ve recently become interested in the theory around Hilbert’s fifth problem, due to the existence of a correspondence principle between locally compact groups and approximate groups, which play a fundamental role in arithmetic combinatorics. I hope to elaborate upon this correspondence in a subsequent post, but I will mention that versions of this principle play a crucial role in Gromov’s proof of his theorem on groups of polynomial growth (discussed previously on this blog), and in a more recent paper of Hrushovski on approximate groups (also discussed previously). It is also analogous in many ways to the more well-known Furstenberg correspondence principle between ergodic theory and combinatorics (also discussed previously).
Because of the above motivation, I have decided to write some notes on how the Peter-Weyl theorem is proven. This is utterly standard stuff in abstract harmonic analysis; these notes are primarily for my own benefit, but perhaps they may be of interest to some readers also.
A recurring theme in mathematics is that of duality: a mathematical object can either be described internally (or in physical space, or locally), by describing what physically consists of (or what kind of maps exist into ), or externally (or in frequency space, or globally), by describing what globally interacts or resonates with (or what kind of maps exist out of ). These two fundamentally opposed perspectives on the object are often dual to each other in various ways: performing an operation on may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being an “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:
- Vector space duality A vector space over a field can be described either by the set of vectors inside , or dually by the set of linear functionals from to the field (or equivalently, the set of vectors inside the dual space ). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
- Vector subspace duality In a similar spirit, a subspace of can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement). Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
- Convex duality More generally, a (closed, bounded) convex body in a vector space can be described either by listing a set of (extreme) points whose convex hull is , or else by listing a set of (irreducible) linear inequalities that cut out . The fundamental connection between the two is given by the Farkas lemma.
- Ideal-variety duality In a slightly different direction, an algebraic variety in an affine space can be viewed either “in physical space” or “internally” as a collection of points in , or else “in frequency space” or “externally” as a collection of polynomials on whose simultaneous zero locus cuts out . The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
- Hilbert space duality An element in a Hilbert space can either be thought of in physical space as a vector in that space, or in momentum space as a covector on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
- Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
- Intrinsic-extrinsic duality A (Riemannian) manifold can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives include the Nash embedding theorem and the theorema egregium.
- Group duality A group can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
- Pontryagin group duality A (locally compact Hausdorff) abelian group can be described either by listing its elements , or by listing the characters (i.e. continuous homomorphisms from to the unit circle, or equivalently elements of ). The connection between the two is the focus of abstract harmonic analysis.
- Pontryagin subgroup duality A subgroup of a locally compact abelian group can be described either by generators in , or generators in the orthogonal complement . One of the fundamental connections between the two is the Poisson summation formula.
- Fourier duality A (sufficiently nice) function on a locally compact abelian group (equipped with a Haar measure ) can either be described in physical space (by its values at each element of ) or in frequency space (by the values at elements of the Pontryagin dual ). The fundamental connection between the two is the Fourier inversion formula.
- The uncertainty principle The behaviour of a function at physical scales above (resp. below) a certain scale is almost completely controlled by the behaviour of its Fourier transform at frequency scales below (resp. above) the dual scale and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
- Stone/Gelfand duality A (locally compact Hausdorff) topological space can be viewed in physical space (as a collection of points), or dually, via the algebra of continuous complex-valued functions on that space, or (in the case when is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of ). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.
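The Poisson summation formula mentioned above can be checked concretely for a Gaussian. The following small Python sketch (my own illustration; the scale parameter t = 0.7 is arbitrary) verifies that the sum of f over the integers equals the sum of its Fourier transform over the dual integers:

```python
import math

# Check the Poisson summation formula  sum_n f(n) = sum_k fhat(k)  for the
# Gaussian f(x) = exp(-pi*t*x^2); under the convention
# fhat(xi) = integral of f(x)*exp(-2*pi*i*x*xi) dx, one has
# fhat(xi) = t**(-1/2) * exp(-pi*xi^2/t).
t = 0.7
lhs = sum(math.exp(-math.pi * t * n * n) for n in range(-50, 51))
rhs = sum(math.exp(-math.pi * k * k / t) for k in range(-50, 51)) / math.sqrt(t)
assert abs(lhs - rhs) < 1e-12
```

Both series converge so rapidly that truncating at 50 terms already gives agreement to machine precision.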
I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, which describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle – but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:
- A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
- A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
- Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
- Projecting a function to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
- The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simultaneously localised to both very fine scales and very low frequencies.
- To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
- To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
- Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
- Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
- The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
- If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
- If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
- Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
- In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
- Etc., etc.
- Almost all of the above statements generalise to other locally compact abelian groups than or , in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)
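Several of these heuristics are easy to observe numerically. The sketch below (my own illustration, using numpy's FFT on a discretised line; the grid size and widths are arbitrary) localises a Gaussian to increasingly fine spatial scales and watches its discrete Fourier transform spread out correspondingly, with the product of the two spreads pinned near a constant:

```python
import numpy as np

N = 1024
x = np.arange(N) - N // 2  # centred spatial grid

def spread(g):
    # standard deviation of the probability distribution |g|^2 / ||g||^2
    p = np.abs(g) ** 2 / np.sum(np.abs(g) ** 2)
    return np.sqrt(np.sum(p * x.astype(float) ** 2))

for width in (4.0, 16.0, 64.0):
    f = np.exp(-(x / width) ** 2)  # Gaussian localised at scale `width`
    fhat = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(f)))
    # spatial spread ~ width/2, frequency spread ~ N/(2*pi*width) bins;
    # the normalised product stays near 1/(4*pi) ~ 0.08 for every width
    print(width, spread(f) * spread(fhat) / N)
```

Repeating the computation with sharper cutoffs (e.g. indicator functions) gives strictly larger products, reflecting the fact that Gaussians extremise the Heisenberg inequality.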
I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous Heisenberg inequality is at the centre of this cloud, but is by no means the only aspect of it.
The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, to the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; understanding this principle is a key piece of intuition in the subject that one has to internalise before one can really get to grips with this subject (and also with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present this principle in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of this principle.
In these notes we lay out the basic theory of the Fourier transform, which is of course the most fundamental tool in harmonic analysis and also of major importance in related fields (functional analysis, complex analysis, PDE, number theory, additive combinatorics, representation theory, signal processing, etc.). The Fourier transform, in conjunction with the Fourier inversion formula, allows one to take essentially arbitrary (complex-valued) functions on a group (or more generally, a space that acts on, e.g. a homogeneous space ), and decompose them as a (discrete or continuous) superposition of much more symmetric functions on the domain, such as characters ; the precise superposition is given by Fourier coefficients , which take values in some dual object such as the Pontryagin dual of . Characters behave in a very simple manner with respect to translation (indeed, they are eigenfunctions of the translation action), and so the Fourier transform tends to simplify any mathematical problem which enjoys a translation invariance symmetry (or an approximation to such a symmetry), and is somehow “linear” (i.e. it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution and set-theoretic addition , or the closely related problem of counting solutions to additive problems such as or , where are constrained to lie in specific sets . The Fourier transform is also a particularly powerful tool for solving constant-coefficient linear ODE and PDE (because of the translation invariance), and can also approximately solve some variable-coefficient (or slightly non-linear) equations if the coefficients vary smoothly enough and the nonlinear terms are sufficiently tame.
The Fourier transform also provides an important new way of looking at a function , as it highlights the distribution of in frequency space (the domain of the frequency variable ) rather than physical space (the domain of the physical variable ). A given property of in the physical domain may be transformed to a rather different-looking property of in the frequency domain. For instance:
- Smoothness of in the physical domain corresponds to decay of in the Fourier domain, and conversely. (More generally, fine scale properties of tend to manifest themselves as coarse scale properties of , and conversely.)
- Convolution in the physical domain corresponds to pointwise multiplication in the Fourier domain, and conversely.
- Constant coefficient differential operators such as in the physical domain correspond to multiplication by polynomials such as in the Fourier domain, and conversely.
- More generally, translation invariant operators in the physical domain correspond to multiplication by symbols in the Fourier domain, and conversely.
- Rescaling in the physical domain by an invertible linear transformation corresponds to an inverse (adjoint) rescaling in the Fourier domain.
- Restriction to a subspace (or subgroup) in the physical domain corresponds to projection to the dual quotient space (or quotient group) in the Fourier domain, and conversely.
- Frequency modulation in the physical domain corresponds to translation in the frequency domain, and conversely.
(We will make these statements more precise below.)
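The convolution identity in particular can be sanity-checked in the simplest discrete setting, the cyclic group Z/N, where both sides are finite computations (a numpy sketch of my own):

```python
import numpy as np

# Convolution in the physical domain corresponds to pointwise multiplication
# in the Fourier domain; verify this on Z/N with the DFT.
rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# cyclic convolution computed directly from the definition
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)])

# the same convolution computed on the Fourier side
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

assert np.allclose(conv, via_fft)
```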
On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the norm (or energy) of a function is the same as that of its Fourier transform, and more generally the inner product of two functions is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on (a fact which is variously known as the Plancherel theorem or the Parseval identity). This makes it easier to pass back and forth between the physical domain and frequency domain, so that one can combine techniques that are easy to execute in the physical domain with other techniques that are easy to execute in the frequency domain. (In fact, one can combine the physical and frequency domains together into a product domain known as phase space, and there are entire fields of mathematics (e.g. microlocal analysis, geometric quantisation, time-frequency analysis) devoted to performing analysis on these sorts of spaces directly, but this is beyond the scope of this course.)
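In the discrete setting this unitarity is easy to verify directly: with a 1/sqrt(N) normalisation, the DFT preserves both norms and inner products (again a small numpy sketch of my own):

```python
import numpy as np

# The DFT, normalised by 1/sqrt(N), is a unitary operator: it preserves
# l^2 norms (Plancherel) and inner products (Parseval).
rng = np.random.default_rng(1)
N = 128
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def F(h):
    return np.fft.fft(h) / np.sqrt(N)  # unitary normalisation

assert np.isclose(np.linalg.norm(F(f)), np.linalg.norm(f))   # Plancherel
assert np.isclose(np.vdot(F(g), F(f)), np.vdot(g, f))        # Parseval
```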
In these notes, we briefly discuss the general theory of the Fourier transform, but will mainly focus on the two classical domains for Fourier analysis: the torus , and the Euclidean space . For these domains one has the advantage of being able to perform very explicit algebraic calculations, involving concrete functions such as plane waves or Gaussians .
[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]
Many properties of a (sufficiently nice) function $f: {\mathbb R} \rightarrow {\mathbb C}$ are reflected in its Fourier transform $\hat f$, defined by the formula
$$\hat f(\xi) := \int_{\mathbb R} f(x) e^{-2\pi i x \xi}\, dx.$$
For instance, decay properties of $f$ are reflected in smoothness properties of $\hat f$, as the following table shows:
If $f$ is… | then $\hat f$ is… | and this relates to… |
Square-integrable | square-integrable | Plancherel’s theorem |
Absolutely integrable | continuous | Riemann-Lebesgue lemma |
Rapidly decreasing | smooth | theory of Schwartz functions |
Exponentially decreasing | analytic in a strip | |
Compactly supported | entire and of at most exponential growth | Paley-Wiener theorem |
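The first rows of this table are easy to observe numerically. In the sketch below (my own illustration; the grid size and the cutoff 1/4 are arbitrary choices), the transform of a discontinuous indicator function decays only polynomially in the frequency, while that of a smooth bump supported on the same interval decays faster than any power:

```python
import numpy as np

# Compare the Fourier decay of (a) the indicator of the interval |x| < 1/4,
# which is discontinuous, and (b) the smooth bump exp(-1/(1/16 - x^2))
# supported on the same interval.
N = 1024
x = (np.arange(N) - N // 2) / N  # grid on [-1/2, 1/2)

indicator = (np.abs(x) < 0.25).astype(float)
bump = np.zeros(N)
mask = np.abs(x) < 0.25
bump[mask] = np.exp(-1.0 / (0.0625 - x[mask] ** 2))

def tail_ratio(h):
    # size of the high-frequency coefficients relative to the zero mode
    H = np.abs(np.fft.fft(np.fft.ifftshift(h)))
    return H[100:200].max() / H[0]

# the jump discontinuity forces only ~1/|k| decay, while the smooth bump
# decays faster than any power of 1/|k|
assert tail_ratio(bump) < 1e-5 < tail_ratio(indicator)
```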
Another important relationship between a function and its Fourier transform is the uncertainty principle, which roughly asserts that if a function is highly localised in space, then its Fourier transform must be widely dispersed in space, or to put it another way, $f$ and $\hat f$ cannot both decay too strongly at infinity (except of course in the degenerate case $f = 0$). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise
$$\int_{\mathbb R} |f(x)|^2\, dx = 1$$
then we must have
$$\Big( \int_{\mathbb R} |x|^2 |f(x)|^2\, dx \Big) \cdot \Big( \int_{\mathbb R} |\xi|^2 |\hat f(\xi)|^2\, d\xi \Big) \geq \frac{1}{16\pi^2},$$
thus forcing at least one of $f$ or $\hat f$ to not be too concentrated near the origin.
thus forcing at least one of or to not be too concentrated near the origin. This principle can be proven (for sufficiently nice , initially) by observing the integration by parts identity
and then using Cauchy-Schwarz and the Plancherel identity.
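Spelling this argument out (a sketch for Schwartz $f$ normalised so that $\int |f|^2 = 1$, with the convention $\hat f(\xi) = \int f(x) e^{-2\pi i x \xi}\, dx$, so that $\widehat{f'}(\xi) = 2\pi i \xi \hat f(\xi)$):

```latex
\begin{aligned}
1 = \int_{\mathbb R} |f(x)|^2\,dx
  &= -\int_{\mathbb R} x\,\frac{d}{dx}|f(x)|^2\,dx
     && \text{(integration by parts; boundary terms vanish)}\\
  &= -2\,\mathrm{Re}\int_{\mathbb R} x\,\overline{f(x)}\,f'(x)\,dx
     && \\
  &\leq 2\Big(\int_{\mathbb R} |x|^2|f(x)|^2\,dx\Big)^{1/2}
         \Big(\int_{\mathbb R} |f'(x)|^2\,dx\Big)^{1/2}
     && \text{(Cauchy--Schwarz)}\\
  &= 4\pi\Big(\int_{\mathbb R} |x|^2|f(x)|^2\,dx\Big)^{1/2}
         \Big(\int_{\mathbb R} |\xi|^2|\hat f(\xi)|^2\,d\xi\Big)^{1/2}
     && \text{(Plancherel applied to } f'\text{)},
\end{aligned}
```

which upon squaring gives $\big( \int |x|^2 |f|^2 \big) \big( \int |\xi|^2 |\hat f|^2 \big) \geq 1/(16\pi^2)$.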
Another well known manifestation of the uncertainty principle is the fact that it is not possible for $f$ and $\hat f$ to both be compactly supported (unless of course they vanish entirely). This can in fact be seen from the above table: if $f$ is compactly supported, then $\hat f$ is an entire function; but the zeroes of a non-zero entire function are isolated, yielding a contradiction unless $f$ vanishes. (Indeed, the table also shows that if one of $f$ and $\hat f$ is compactly supported, then the other cannot have exponential decay.)
On the other hand, we have the example of the Gaussian functions $f(x) = e^{-\pi x^2}$, $\hat f(\xi) = e^{-\pi \xi^2}$, which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that $f$ and $\hat f$ can simultaneously decay:
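The self-duality of the Gaussian underlying this example can be tested directly, approximating the Fourier integral by a Riemann sum on a truncated grid (a small numerical sketch of my own; the grid parameters are arbitrary):

```python
import numpy as np

# Check numerically that exp(-pi*x^2) is its own Fourier transform, using a
# Riemann-sum approximation of fhat(xi) = integral f(x)*exp(-2*pi*i*x*xi) dx.
dx = 0.01
x = np.arange(-10, 10, dx)
f = np.exp(-np.pi * x ** 2)

for xi in (0.0, 0.5, 1.0, 2.0):
    fhat = np.sum(f * np.exp(-2j * np.pi * x * xi)) * dx
    assert abs(fhat - np.exp(-np.pi * xi ** 2)) < 1e-8
```

Because the integrand decays so rapidly, both the truncation to [-10, 10] and the discretisation introduce only negligible error here.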
Theorem 1 (Hardy uncertainty principle) Suppose that $f: {\mathbb R} \rightarrow {\mathbb C}$ is a (measurable) function such that $|f(x)| \leq C e^{-\pi x^2}$ and $|\hat f(\xi)| \leq C e^{-\pi \xi^2}$ for all $x, \xi \in {\mathbb R}$ and some $C > 0$. Then $f$ is a scalar multiple of the gaussian $e^{-\pi x^2}$.
This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:
Theorem 2 (Weak Hardy uncertainty principle) Suppose that is a non-zero (measurable) function such that and for all and some . Then for some absolute constant .
Note that the correct value of should be , as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.
In the next few lectures, we will be studying four major classes of function spaces. In decreasing order of generality, these classes are the topological vector spaces, the normed vector spaces, the Banach spaces, and the Hilbert spaces. In order to motivate the discussion of the more general classes of spaces, we will first focus on the most special class – that of (real and complex) Hilbert spaces. These spaces can be viewed as generalisations of (real and complex) Euclidean spaces such as and to infinite-dimensional settings, and indeed much of one’s Euclidean geometry intuition concerning lengths, angles, orthogonality, subspaces, etc. will transfer readily to arbitrary Hilbert spaces; in contrast, this intuition is not always accurate in the more general vector spaces mentioned above. In addition to Euclidean spaces, another fundamental example of Hilbert spaces comes from the Lebesgue spaces of a measure space . (There are of course many other Hilbert spaces of importance in complex analysis, harmonic analysis, and PDE, such as Hardy spaces , Sobolev spaces , and the space of Hilbert-Schmidt operators, but we will not discuss those spaces much in this course. Complex Hilbert spaces also play a fundamental role in the foundations of quantum mechanics, being the natural space to hold all the possible states of a quantum system (possibly after projectivising the Hilbert space), but we will not discuss this subject here.)
Hilbert spaces are the natural abstract framework in which to study two important (and closely related) concepts: orthogonality and unitarity, allowing us to generalise familiar concepts and facts from Euclidean geometry such as the Cartesian coordinate system, rotations and reflections, and the Pythagorean theorem to Hilbert spaces. (For instance, the Fourier transform is a unitary transformation and can thus be viewed as a kind of generalised rotation.) Furthermore, the Hodge duality on Euclidean spaces has a partial analogue for Hilbert spaces, namely the Riesz representation theorem for Hilbert spaces, which makes the theory of duality and adjoints for Hilbert spaces especially simple (when compared with the more subtle theory of duality for, say, Banach spaces). Much later (next quarter, in fact), we will see that this duality allows us to extend the spectral theorem for self-adjoint matrices to that of self-adjoint operators on a Hilbert space.
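In finite dimensions the transfer of Euclidean intuition is concrete enough to verify by direct computation; the following numpy sketch (my own illustration) builds the orthogonal projection onto a random subspace and checks orthogonality and the Pythagorean theorem:

```python
import numpy as np

# Orthogonal projection onto a 3-dimensional subspace of R^6, and the
# Pythagorean theorem  ||v||^2 = ||Pv||^2 + ||v - Pv||^2.
rng = np.random.default_rng(2)
v = rng.standard_normal(6)
A = rng.standard_normal((6, 3))   # columns span a 3-dimensional subspace
Q, _ = np.linalg.qr(A)            # orthonormal basis of that subspace
Pv = Q @ (Q.T @ v)                # orthogonal projection of v

assert np.isclose(np.dot(v - Pv, Pv), 0.0, atol=1e-12)  # v - Pv is orthogonal to Pv
assert np.isclose(np.dot(v, v), np.dot(Pv, Pv) + np.dot(v - Pv, v - Pv))
```

In an infinite-dimensional Hilbert space the same identities hold, with the existence of the projection supplied by the closest-point property of closed subspaces.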
These notes are only the most basic introduction to the theory of Hilbert spaces. In particular, the theory of linear transformations between two Hilbert spaces, which is perhaps the most important aspect of the subject, is not covered much at all here (but I hope to discuss it further in future lectures.)