You are currently browsing the tag archive for the ‘Fourier transform’ tag.

We will shortly turn to the complex-analytic approach to multiplicative number theory, which relies on the basic properties of complex analytic functions. In this supplement to the main notes, we quickly review the portions of complex analysis that we will be using in this course. We will not attempt a comprehensive review of this subject; for instance, we will completely neglect the conformal geometry or Riemann surface aspect of complex analysis, and we will also avoid using the various boundary convergence theorems for Taylor series or Dirichlet series (the latter type of result is traditionally utilised in multiplicative number theory, but I personally find them a little unintuitive to use, and will instead rely on a slightly different set of complex-analytic tools). We will also focus on the “local” structure of complex analytic functions, in particular adopting the philosophy that such functions behave locally like complex polynomials; the classical “global” theory of entire functions, while traditionally used in the theory of the Riemann zeta function, will be downplayed in these notes. On the other hand, we will play up the relationship between complex analysis and Fourier analysis, as we will incline to using the latter tool over the former in some of the subsequent material. (In the traditional approach to the subject, the Mellin transform is used in place of the Fourier transform, but we will not emphasise the role of the Mellin transform here.)

We begin by recalling the notion of a holomorphic function, which will later be shown to be essentially synonymous with that of a complex analytic function.

Definition 1 (Holomorphic function) Let ${\Omega}$ be an open subset of ${{\bf C}}$, and let ${f: \Omega \rightarrow {\bf C}}$ be a function. If ${z \in {\bf C}}$, we say that ${f}$ is complex differentiable at ${z}$ if the limit

$\displaystyle f'(z) := \lim_{h \rightarrow 0; h \in {\bf C} \backslash \{0\}} \frac{f(z+h)-f(z)}{h}$

exists, in which case we refer to ${f'(z)}$ as the (complex) derivative of ${f}$ at ${z}$. If ${f}$ is differentiable at every point ${z}$ of ${\Omega}$, and the derivative ${f': \Omega \rightarrow {\bf C}}$ is continuous, we say that ${f}$ is holomorphic on ${\Omega}$.

Exercise 2 Show that a function ${f: \Omega \rightarrow {\bf C}}$ is holomorphic if and only if the two-variable function ${(x,y) \mapsto f(x+iy)}$ is continuously differentiable on ${\{ (x,y) \in {\bf R}^2: x+iy \in \Omega\}}$ and obeys the Cauchy-Riemann equation

$\displaystyle \frac{\partial}{\partial x} f(x+iy) = \frac{1}{i} \frac{\partial}{\partial y} f(x+iy). \ \ \ \ \ (1)$

Basic examples of holomorphic functions include complex polynomials

$\displaystyle P(z) = a_n z^n + \dots + a_1 z + a_0$

as well as the complex exponential function

$\displaystyle \exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!}$

which are holomorphic on the entire complex plane ${{\bf C}}$ (i.e., they are entire functions). The sum or product of two holomorphic functions is again holomorphic; the quotient of two holomorphic functions is holomorphic so long as the denominator is non-zero. Finally, the composition of two holomorphic functions is holomorphic wherever the composition is defined.

Exercise 3

• (i) Establish Euler’s formula

$\displaystyle \exp(x+iy) = e^x (\cos y + i \sin y)$

for all ${x,y \in {\bf R}}$. (Hint: it is a bit tricky to do this starting from the trigonometric definitions of sine and cosine; I recommend either using the Taylor series formulations of these functions instead, or alternatively relying on the ordinary differential equations obeyed by sine and cosine.)

• (ii) Show that every non-zero complex number ${z}$ has a complex logarithm ${\log(z)}$ such that ${\exp(\log(z))=z}$, and that this logarithm is unique up to integer multiples of ${2\pi i}$.
• (iii) Show that there exists a unique principal branch ${\hbox{Log}(z)}$ of the complex logarithm in the region ${{\bf C} \backslash (-\infty,0]}$, defined by requiring ${\hbox{Log}(z)}$ to be a logarithm of ${z}$ with imaginary part between ${-\pi}$ and ${\pi}$. Show that this principal branch is holomorphic with derivative ${1/z}$.

In real analysis, we have the fundamental theorem of calculus, which asserts that

$\displaystyle \int_a^b F'(t)\ dt = F(b) - F(a)$

whenever ${[a,b]}$ is a real interval and ${F: [a,b] \rightarrow {\bf R}}$ is a continuously differentiable function. The complex analogue of this fact is that

$\displaystyle \int_\gamma F'(z)\ dz = F(\gamma(1)) - F(\gamma(0)) \ \ \ \ \ (2)$

whenever ${F: \Omega \rightarrow {\bf C}}$ is a holomorphic function, and ${\gamma: [0,1] \rightarrow \Omega}$ is a contour in ${\Omega}$, by which we mean a piecewise continuously differentiable function, and the contour integral ${\int_\gamma f(z)\ dz}$ for a continuous function ${f}$ is defined via change of variables as

$\displaystyle \int_\gamma f(z)\ dz := \int_0^1 f(\gamma(t)) \gamma'(t)\ dt.$

The complex fundamental theorem of calculus (2) follows easily from the real fundamental theorem and the chain rule.

In real analysis, we have the rather trivial fact that the integral of a continuous function on a closed contour is always zero:

$\displaystyle \int_a^b f(t)\ dt + \int_b^a f(t)\ dt = 0.$

In complex analysis, the analogous fact is significantly more powerful, and is known as Cauchy’s theorem:

Theorem 4 (Cauchy’s theorem) Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function in a simply connected open set ${\Omega}$, and let ${\gamma: [0,1] \rightarrow \Omega}$ be a closed contour in ${\Omega}$ (thus ${\gamma(1)=\gamma(0)}$). Then ${\int_\gamma f(z)\ dz = 0}$.

Exercise 5 Use Stokes’ theorem to give a proof of Cauchy’s theorem.

A useful reformulation of Cauchy’s theorem is that of contour shifting: if ${f: \Omega \rightarrow {\bf C}}$ is a holomorphic function on a open set ${\Omega}$, and ${\gamma, \tilde \gamma}$ are two contours in an open set ${\Omega}$ with ${\gamma(0)=\tilde \gamma(0)}$ and ${\gamma(1) = \tilde \gamma(1)}$, such that ${\gamma}$ can be continuously deformed into ${\tilde \gamma}$, then ${\int_\gamma f(z)\ dz = \int_{\tilde \gamma} f(z)\ dz}$. A basic application of contour shifting is the Cauchy integral formula:

Theorem 6 (Cauchy integral formula) Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function in a simply connected open set ${\Omega}$, and let ${\gamma: [0,1] \rightarrow \Omega}$ be a closed contour which is simple (thus ${\gamma}$ does not traverse any point more than once, with the exception of the endpoint ${\gamma(0)=\gamma(1)}$ that is traversed twice), and which encloses a bounded region ${U}$ in the anticlockwise direction. Then for any ${z_0 \in U}$, one has

$\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz= 2\pi i f(z_0).$

Proof: Let ${\varepsilon > 0}$ be a sufficiently small quantity. By contour shifting, one can replace the contour ${\gamma}$ by the sum (concatenation) of three contours: a contour ${\rho}$ from ${\gamma(0)}$ to ${z_0+\varepsilon}$, a contour ${C_\varepsilon}$ traversing the circle ${\{z: |z-z_0|=\varepsilon\}}$ once anticlockwise, and the reversal ${-\rho}$ of the contour ${\rho}$ that goes from ${z_0+\varepsilon}$ to ${\gamma_0}$. The contributions of the contours ${\rho, -\rho}$ cancel each other, thus

$\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz = \int_{C_\varepsilon} \frac{f(z)}{z-z_0}\ dz.$

By a change of variables, the right-hand side can be expanded as

$\displaystyle 2\pi i \int_0^1 f(z_0 + \varepsilon e^{2\pi i t})\ dt.$

Sending ${\varepsilon \rightarrow 0}$, we obtain the claim. $\Box$

The Cauchy integral formula has many consequences. Specialising to the case when ${\gamma}$ traverses a circle ${\{ z: |z-z_0|=r\}}$ around ${z_0}$, we conclude the mean value property

$\displaystyle f(z_0) = \int_0^1 f(z_0 + re^{2\pi i t})\ dt \ \ \ \ \ (3)$

whenever ${f}$ is holomorphic in a neighbourhood of the disk ${\{ z: |z-z_0| \leq r \}}$. In a similar spirit, we have the maximum principle for holomorphic functions:

Lemma 7 (Maximum principle) Let ${\Omega}$ be a simply connected open set, and let ${\gamma}$ be a simple closed contour in ${\Omega}$ enclosing a bounded region ${U}$ anti-clockwise. Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function. If we have the bound ${|f(z)| \leq M}$ for all ${z}$ on the contour ${\gamma}$, then we also have the bound ${|f(z_0)| \leq M}$ for all ${z_0 \in U}$.

Proof: We use an argument of Landau. Fix ${z_0 \in U}$. From the Cauchy integral formula and the triangle inequality we have the bound

$\displaystyle |f(z_0)| \leq C_{z_0,\gamma} M$

for some constant ${C_{z_0,\gamma} > 0}$ depending on ${z_0}$ and ${\gamma}$. This ostensibly looks like a weaker bound than what we want, but we can miraculously make the constant ${C_{z_0,\gamma}}$ disappear by the “tensor power trick“. Namely, observe that if ${f}$ is a holomorphic function bounded in magnitude by ${M}$ on ${\gamma}$, and ${n}$ is a natural number, then ${f^n}$ is a holomorphic function bounded in magnitude by ${M^n}$ on ${\gamma}$. Applying the preceding argument with ${f, M}$ replaced by ${f^n, M^n}$ we conclude that

$\displaystyle |f(z_0)|^n \leq C_{z_0,\gamma} M^n$

and hence

$\displaystyle |f(z_0)| \leq C_{z_0,\gamma}^{1/n} M.$

Sending ${n \rightarrow \infty}$, we obtain the claim. $\Box$

Another basic application of the integral formula is

Corollary 8 Every holomorphic function ${f: \Omega \rightarrow {\bf C}}$ is complex analytic, thus it has a convergent Taylor series around every point ${z_0}$ in the domain. In particular, holomorphic functions are smooth, and the derivative of a holomorphic function is again holomorphic.

Conversely, it is easy to see that complex analytic functions are holomorphic. Thus, the terms “complex analytic” and “holomorphic” are synonymous, at least when working on open domains. (On a non-open set ${\Omega}$, saying that ${f}$ is analytic on ${\Omega}$ is equivalent to asserting that ${f}$ extends to a holomorphic function of an open neighbourhood of ${\Omega}$.) This is in marked contrast to real analysis, in which a function can be continuously differentiable, or even smooth, without being real analytic.

Proof: By translation, we may suppose that ${z_0=0}$. Let ${C_r}$ be a a contour traversing the circle ${\{ z: |z|=r\}}$ that is contained in the domain ${\Omega}$, then by the Cauchy integral formula one has

$\displaystyle f(z) = \frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w-z}\ dw$

for all ${z}$ in the disk ${\{ z: |z| < r \}}$. As ${f}$ is continuously differentiable (and hence continuous) on ${C_r}$, it is bounded. From the geometric series formula

$\displaystyle \frac{1}{w-z} = \frac{1}{w} + \frac{1}{w^2} z + \frac{1}{w^3} z^2 + \dots$

and dominated convergence, we conclude that

$\displaystyle f(z) = \sum_{n=0}^\infty (\frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w^{n+1}}\ dw) z^n$

with the right-hand side an absolutely convergent series for ${|z| < r}$, and the claim follows. $\Box$

Exercise 9 Establish the generalised Cauchy integral formulae

$\displaystyle f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\gamma \frac{f(z)}{(z-z_0)^{k+1}}\ dz$

for any non-negative integer ${k}$, where ${f^{(k)}}$ is the ${k}$-fold complex derivative of ${f}$.

This in turn leads to a converse to Cauchy’s theorem, known as Morera’s theorem:

Corollary 10 (Morera’s theorem) Let ${f: \Omega \rightarrow {\bf C}}$ be a continuous function on an open set ${\Omega}$ with the property that ${\int_\gamma f(z)\ dz = 0}$ for all closed contours ${\gamma: [0,1] \rightarrow \Omega}$. Then ${f}$ is holomorphic.

Proof: We can of course assume ${\Omega}$ to be non-empty and connected (hence path-connected). Fix a point ${z_0 \in \Omega}$, and define a “primitive” ${F: \Omega \rightarrow {\bf C}}$ of ${f}$ by defining ${F(z_1) = \int_\gamma f(z)\ dz}$, with ${\gamma: [0,1] \rightarrow \Omega}$ being any contour from ${z_0}$ to ${z_1}$ (this is well defined by hypothesis). By mimicking the proof of the real fundamental theorem of calculus, we see that ${F}$ is holomorphic with ${F'=f}$, and the claim now follows from Corollary 8. $\Box$

An important consequence of Morera’s theorem for us is

Corollary 11 (Locally uniform limit of holomorphic functions is holomorphic) Let ${f_n: \Omega \rightarrow {\bf C}}$ be holomorphic functions on an open set ${\Omega}$ which converge locally uniformly to a function ${f: \Omega \rightarrow {\bf C}}$. Then ${f}$ is also holomorphic on ${\Omega}$.

Proof: By working locally we may assume that ${\Omega}$ is a ball, and in particular simply connected. By Cauchy’s theorem, ${\int_\gamma f_n(z)\ dz = 0}$ for all closed contours ${\gamma}$ in ${\Omega}$. By local uniform convergence, this implies that ${\int_\gamma f(z)\ dz = 0}$ for all such contours, and the claim then follows from Morera’s theorem. $\Box$

Now we study the zeroes of complex analytic functions. If a complex analytic function ${f}$ vanishes at a point ${z_0}$, but is not identically zero in a neighbourhood of that point, then by Taylor expansion we see that ${f}$ factors in a sufficiently small neighbourhood of ${z_0}$ as

$\displaystyle f(z) = (z-z_0)^n g(z_0) \ \ \ \ \ (4)$

for some natural number ${n}$ (which we call the order or multiplicity of the zero at ${f}$) and some function ${g}$ that is complex analytic and non-zero near ${z_0}$; this generalises the factor theorem for polynomials. In particular, the zero ${z_0}$ is isolated if ${f}$ does not vanish identically near ${z_0}$. We conclude that if ${\Omega}$ is connected and ${f}$ vanishes on a neighbourhood of some point ${z_0}$ in ${\Omega}$, then it must vanish on all of ${\Omega}$ (since the maximal connected neighbourhood of ${z_0}$ in ${\Omega}$ on which ${f}$ vanishes cannot have any boundary point in ${\Omega}$). This implies unique continuation of analytic functions: if two complex analytic functions on ${\Omega}$ agree on a non-empty open set, then they agree everywhere. In particular, if a complex analytic function does not vanish everywhere, then all of its zeroes are isolated, so in particular it has only finitely many zeroes on any given compact set.

Recall that a rational function is a function ${f}$ which is a quotient ${g/h}$ of two polynomials (at least outside of the set where ${h}$ vanishes). Analogously, let us define a meromorphic function on an open set ${\Omega}$ to be a function ${f: \Omega \backslash S \rightarrow {\bf C}}$ defined outside of a discrete subset ${S}$ of ${\Omega}$ (the singularities of ${f}$), which is locally the quotient ${g/h}$ of holomorphic functions, in the sense that for every ${z_0 \in \Omega}$, one has ${f=g/h}$ in a neighbourhood of ${z_0}$ excluding ${S}$, with ${g, h}$ holomorphic near ${z_0}$ and with ${h}$ non-vanishing outside of ${S}$. If ${z_0 \in S}$ and ${g}$ has a zero of equal or higher order than ${h}$ at ${z_0}$, then the singularity is removable and one can extend the meromorphic function holomorphically across ${z_0}$ (by the holomorphic factor theorem (4)); otherwise, the singularity is non-removable and is known as a pole, whose order is equal to the difference between the order of ${h}$ and the order of ${g}$ at ${z_0}$. (If one wished, one could extend meromorphic functions to the poles by embedding ${{\bf C}}$ in the Riemann sphere ${{\bf C} \cup \{\infty\}}$ and mapping each pole to ${\infty}$, but we will not do so here. One could also consider non-meromorphic functions with essential singularities at various points, but we will have no need to analyse such singularities in this course.) If the order of a pole or zero is one, we say that it is simple; if it is two, we say it is double; and so forth.

Exercise 12 Show that the space of meromorphic functions on a non-empty open set ${\Omega}$, quotiented by almost everywhere equivalence, forms a field.

By quotienting two Taylor series, we see that if a meromorphic function ${f}$ has a pole of order ${n}$ at some point ${z_0}$, then it has a Laurent expansion

$\displaystyle f = \sum_{m=-n}^\infty a_m (z-z_0)^m,$

absolutely convergent in a neighbourhood of ${z_0}$ excluding ${z_0}$ itself, and with ${a_{-n}}$ non-zero. The Laurent coefficient ${a_{-1}}$ has a special significance, and is called the residue of the meromorphic function ${f}$ at ${z_0}$, which we will denote as ${\hbox{Res}(f;z_0)}$. The importance of this coefficient comes from the following significant generalisation of the Cauchy integral formula, known as the residue theorem:

Exercise 13 (Residue theorem) Let ${f}$ be a meromorphic function on a simply connected domain ${\Omega}$, and let ${\gamma}$ be a closed contour in ${\Omega}$ enclosing a bounded region ${U}$ anticlockwise, and avoiding all the singularities of ${f}$. Show that

$\displaystyle \int_\gamma f(z)\ dz = 2\pi i \sum_\rho \hbox{Res}(f;\rho)$

where ${\rho}$ is summed over all the poles of ${f}$ that lie in ${U}$.

The residue theorem is particularly useful when applied to logarithmic derivatives ${f'/f}$ of meromorphic functions ${f}$, because the residue is of a specific form:

Exercise 14 Let ${f}$ be a meromorphic function on an open set ${\Omega}$ that does not vanish identically. Show that the only poles of ${f'/f}$ are simple poles (poles of order ${1}$), occurring at the poles and zeroes of ${f}$ (after all removable singularities have been removed). Furthermore, the residue of ${f'/f}$ at a pole ${z_0}$ is an integer, equal to the order of zero of ${f}$ if ${f}$ has a zero at ${z_0}$, or equal to negative the order of pole at ${f}$ if ${f}$ has a pole at ${z_0}$.

Remark 15 The fact that residues of logarithmic derivatives of meromorphic functions are automatically integers is a remarkable feature of the complex analytic approach to multiplicative number theory, which is difficult (though not entirely impossible) to duplicate in other approaches to the subject. Here is a sample application of this integrality, which is challenging to reproduce by non-complex-analytic means: if ${f}$ is meromorphic near ${z_0}$, and one has the bound ${|\frac{f'}{f}(z_0+t)| \leq \frac{0.9}{t} + O(1)}$ as ${t \rightarrow 0^+}$, then ${\frac{f'}{f}}$ must in fact stay bounded near ${z_0}$, because the only integer of magnitude less than ${0.9}$ is zero.

Roth’s theorem on arithmetic progressions asserts that every subset of the integers ${{\bf Z}}$ of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:

Theorem 1 (Roth’s theorem) Let ${G = (G,+)}$ be a compact abelian group, with Haar probability measure ${\mu}$, which is ${2}$-divisible (i.e. the map ${x \mapsto 2x}$ is surjective) and let ${A}$ be a measurable subset of ${G}$ with ${\mu(A) \geq \alpha}$ for some ${0 < \alpha < 1}$. Then we have

$\displaystyle \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(x) d\mu(r) \gg_\alpha 1,$

where ${X \gg_\alpha Y}$ denotes the bound ${X \geq c_\alpha Y}$ for some ${c_\alpha > 0}$ depending only on ${\alpha}$.

This theorem is usually formulated in the case that ${G}$ is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group ${G = {\bf Z}/N{\bf Z}}$ of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of ${2}$-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant ${c_\alpha}$ on ${\alpha}$, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the ${2}$-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the ${2r}$ shift in that case.

We can deduce Theorem 1 from the following more general Khintchine-type statement. Let ${\hat G}$ denote the Pontryagin dual of a compact abelian group ${G}$, that is to say the set of all continuous homomorphisms ${\xi: x \mapsto \xi \cdot x}$ from ${G}$ to the (additive) unit circle ${{\bf R}/{\bf Z}}$. Thus ${\hat G}$ is a discrete abelian group, and functions ${f \in L^2(G)}$ have a Fourier transform ${\hat f \in \ell^2(\hat G)}$ defined by

$\displaystyle \hat f(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\ d\mu(x).$

If ${G}$ is ${2}$-divisible, then ${\hat G}$ is ${2}$-torsion-free in the sense that the map ${\xi \mapsto 2 \xi}$ is injective. For any finite set ${S \subset \hat G}$ and any radius ${\rho>0}$, define the Bohr set

$\displaystyle B(S,\rho) := \{ x \in G: \sup_{\xi \in S} \| \xi \cdot x \|_{{\bf R}/{\bf Z}} < \rho \}$

where ${\|\theta\|_{{\bf R}/{\bf Z}}}$ denotes the distance of ${\theta}$ to the nearest integer. We refer to the cardinality ${|S|}$ of ${S}$ as the rank of the Bohr set. We record a simple volume bound on Bohr sets:

Lemma 2 (Volume packing bound) Let ${G}$ be a compact abelian group with Haar probability measure ${\mu}$. For any Bohr set ${B(S,\rho)}$, we have

$\displaystyle \mu( B( S, \rho ) ) \gg_{|S|, \rho} 1.$

Proof: We can cover the torus ${({\bf R}/{\bf Z})^S}$ by ${O_{|S|,\rho}(1)}$ translates ${\theta+Q}$ of the cube ${Q := \{ (\theta_\xi)_{\xi \in S} \in ({\bf R}/{\bf Z})^S: \sup_{\xi \in S} \|\theta_\xi\|_{{\bf R}/{\bf Z}} < \rho/2 \}}$. Then the sets ${\{ x \in G: (\xi \cdot x)_{\xi \in S} \in \theta + Q \}}$ form an cover of ${G}$. But all of these sets lie in a translate of ${B(S,\rho)}$, and the claim then follows from the translation invariance of ${\mu}$. $\Box$

Given any Bohr set ${B(S,\rho)}$, we define a normalised “Lipschitz” cutoff function ${\nu_{B(S,\rho)}: G \rightarrow {\bf R}}$ by the formula

$\displaystyle \nu_{B(S,\rho)}(x) = c_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})_+ \ \ \ \ \ (1)$

where ${c_{B(S,\rho)}}$ is the constant such that

$\displaystyle \int_G \nu_{B(S,\rho)}\ d\mu = 1,$

thus

$\displaystyle c_{B(S,\rho)} = \left( \int_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})\ d\mu(x) \right)^{-1}.$

The function ${\nu_{B(S,\rho)}}$ should be viewed as an ${L^1}$-normalised “tent function” cutoff to ${B(S,\rho)}$. Note from Lemma 2 that

$\displaystyle 1 \ll_{|S|,\rho} c_{B(S,\rho)} \ll_{|S|,\rho} 1. \ \ \ \ \ (2)$

We then have the following sharper version of Theorem 1:

Theorem 3 (Roth-Khintchine theorem) Let ${G = (G,+)}$ be a ${2}$-divisible compact abelian group, with Haar probability measure ${\mu}$, and let ${\epsilon>0}$. Then for any measurable function ${f: G \rightarrow [0,1]}$, there exists a Bohr set ${B(S,\rho)}$ with ${|S| \ll_\epsilon 1}$ and ${\rho \gg_\epsilon 1}$ such that

$\displaystyle \int_G \int_G f(x) f(x+r) f(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \ \ \ \ \ (3)$

$\displaystyle \geq (\int_G f\ d\mu)^3 - O(\epsilon)$

where ${*}$ denotes the convolution operation

$\displaystyle f*g(x) := \int_G f(y) g(x-y)\ d\mu(y).$

A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with ${f := 1_A}$ and ${\epsilon}$ equal to a small multiple of ${\alpha^3}$ to conclude that there is a Bohr set ${B(S,\rho)}$ with ${|S| \ll_\alpha 1}$ and ${\rho \gg_\alpha 1}$ such that

$\displaystyle \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \gg \alpha^3.$

But from (2) we have the pointwise bound ${\nu_{B(S,\rho)}*\nu_{B(S,\rho)} \ll_\alpha 1}$, and Theorem 1 follows.

Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set ${B(S,\rho)}$ to capture all the “large Fourier coefficients” of ${f}$, but such that a certain “dilate” of ${B(S,\rho)}$ does not capture much more Fourier energy of ${f}$ than ${B(S,\rho)}$ itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of ${f}$ into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the ${\nu_{B(S,\rho)}}$ considered above to achieve a similar effect.

The classification of finite simple groups (CFSG), first announced in 1983 but only fully completed in 2004, is one of the monumental achievements of twentieth century mathematics. Spanning hundreds of papers and tens of thousands of pages, it has been called the “enormous theorem”. A “second generation” proof of the theorem is nearly completed which is a little shorter (estimated at about five thousand pages in length), but currently there is no reasonably sized proof of the classification.

An important precursor of the CFSG is the Feit-Thompson theorem from 1962-1963, which asserts that every finite group of odd order is solvable, or equivalently that every non-abelian finite simple group has even order. This is an immediate consequence of CFSG, and conversely the Feit-Thompson theorem is an essential starting point in the proof of the classification, since it allows one to reduce matters to groups of even order for which key additional tools (such as the Brauer-Fowler theorem) become available. The original proof of the Feit-Thompson theorem is 255 pages long, which is significantly shorter than the proof of the CFSG, but still far from short. While parts of the proof of the Feit-Thompson theorem have been simplified (and it has recently been converted, after six years of effort, into an argument that has been verified by the proof assistant Coq), the available proofs of this theorem are still extremely lengthy by any reasonable standard.

However, there is a significantly simpler special case of the Feit-Thompson theorem that was established previously by Suzuki in 1957, which was influential in the proof of the more general Feit-Thompson theorem (and thus indirectly to the proof of CFSG). Define a CA-group to be a group ${G}$ with the property that the centraliser ${C_G(x) := \{ g \in G: gx=xg \}}$ of any non-identity element ${x \in G}$ is abelian; equivalently, the commuting relation ${x \sim y}$ (defined as the relation that holds when ${x}$ commutes with ${y}$, thus ${xy=yx}$) is an equivalence relation on the non-identity elements ${G \backslash \{1\}}$ of ${G}$. Trivially, every abelian group is CA. A non-abelian example of a CA-group is the ${ax+b}$ group of invertible affine transformations ${x \mapsto ax+b}$ on a field ${F}$. A little less obviously, the special linear group ${SL_2(F_q)}$ over a finite field ${F_q}$ is a CA-group when ${q}$ is a power of two. The finite simple groups of Lie type are not, in general, CA-groups, but when the rank is bounded they tend to behave as if they were “almost CA”; the centraliser of a generic element in ${SL_d(F_q)}$, for instance, when ${d}$ is bounded and ${q}$ is large), is typically a maximal torus (because most elements in ${SL_d(F_q)}$ are regular semisimple) which is certainly abelian. In view of the CFSG, we thus see that CA or nearly CA groups form an important subclass of the simple groups, and it is thus of interest to study them separately. To this end, we have

Theorem 1 (Suzuki’s theorem on CA-groups) Every finite CA-group of odd order is solvable.

Of course, this theorem is superceded by the more general Feit-Thompson theorem, but Suzuki’s proof is substantially shorter (the original proof is nine pages) and will be given in this post. (See this survey of Solomon for some discussion of the link between Suzuki’s argument and the Feit-Thompson argument.) Suzuki’s analysis can be pushed further to give an essentially complete classification of all the finite CA-groups (of either odd or even order), but we will not pursue these matters here.

Moving even further down the ladder of simple precursors of CSFG is the following theorem of Frobenius from 1901. Define a Frobenius group to be a finite group ${G}$ which has a subgroup ${H}$ (called the Frobenius complement) with the property that all the non-trivial conjugates ${gHg^{-1}}$ of ${H}$ for ${g \in G \backslash H}$, intersect ${H}$ only at the origin. For instance the ${ax+b}$ group is also a Frobenius group (take ${H}$ to be the affine transformations that fix a specified point ${x_0 \in F}$, e.g. the origin). This example suggests that there is some overlap between the notions of a Frobenius group and a CA group. Indeed, note that if ${G}$ is a CA-group and ${H}$ is a maximal abelian subgroup of ${G}$, then any conjugate ${gHg^{-1}}$ of ${H}$ that is not identical to ${H}$ will intersect ${H}$ only at the origin (because ${H}$ and each of its conjugates consist of equivalence classes under the commuting relation ${\sim}$, together with the identity). So if a maximal abelian subgroup ${H}$ of a CA-group is its own normaliser (thus ${N(H) := \{ g \in G: gH=Hg\}}$ is equal to ${H}$), then the group is a Frobenius group.

Frobenius’ theorem places an unexpectedly strong amount of structure on a Frobenius group:

Theorem 2 (Frobenius’ theorem) Let ${G}$ be a Frobenius group with Frobenius complement ${H}$. Then there exists a normal subgroup ${K}$ of ${G}$ (called the Frobenius kernel of ${G}$) such that ${G}$ is the semi-direct product ${H \ltimes K}$ of ${H}$ and ${K}$.

Roughly speaking, this theorem indicates that all Frobenius groups “behave” like the ${ax+b}$ example (which is a quintessential example of a semi-direct product).

Note that if every CA-group of odd order was either Frobenius or abelian, then Theorem 2 would imply Theorem 1 by an induction on the order of ${G}$, since any subgroup of a CA-group is clearly again a CA-group. Indeed, the proof of Suzuki’s theorem does basically proceed by this route (Suzuki’s arguments do indeed imply that CA-groups of odd order are Frobenius or abelian, although we will not quite establish that fact here).

Frobenius’ theorem can be reformulated in the following concrete combinatorial form:

Theorem 3 (Frobenius’ theorem, equivalent version) Let ${G}$ be a group of permutations acting transitively on a finite set ${X}$, with the property that any non-identity permutation in ${G}$ fixes at most one point in ${X}$. Then the set of permutations in ${G}$ that fix no points in ${X}$, together with the identity, is closed under composition.

Again, a good example to keep in mind for this theorem is when ${G}$ is the group of affine permutations on a field ${F}$ (i.e. the ${ax+b}$ group for that field), and ${X}$ is the set of points on that field. In that case, the set of permutations in ${G}$ that do not fix any points are the non-trivial translations.

To deduce Theorem 3 from Theorem 2, one applies Theorem 2 to the stabiliser of a single point in ${X}$. Conversely, to deduce Theorem 2 from Theorem 3, set ${X := G/H = \{ gH: g \in G \}}$ to be the space of left-cosets of ${H}$, with the obvious left ${G}$-action; one easily verifies that this action is faithful, transitive, and each non-identity element ${g}$ of ${G}$ fixes at most one left-coset of ${H}$ (basically because it lies in at most one conjugate of ${H}$). If we let ${K}$ be the elements of ${G}$ that do not fix any point in ${X}$, plus the identity, then by Theorem 3 ${K}$ is closed under composition; it is also clearly closed under inverse and conjugation, and is hence a normal subgroup of ${G}$. From construction ${K}$ is the identity plus the complement of all the ${|G|/|H|}$ conjugates of ${H}$, which are all disjoint except at the identity, so by counting elements we see that

$\displaystyle |K| = |G| - \frac{|G|}{|H|}(|H|-1) = |G|/|H|.$

As ${H}$ normalises ${K}$ and is disjoint from ${K}$, we thus see that ${KH = H \ltimes K}$ is all of ${G}$, giving Theorem 2.

Despite the appealingly concrete and elementary form of Theorem 3, the only known proofs of that theorem (or equivalently, Theorem 2) in its full generality proceed via the machinery of group characters (which one can think of as a version of Fourier analysis for nonabelian groups). On the other hand, once one establishes the basic theory of these characters (reviewed below the fold), the proof of Frobenius’ theorem is very short, which gives quite a striking example of the power of character theory. The proof of Suzuki’s theorem also proceeds via character theory, and is basically a more involved version of the Frobenius argument; again, no character-free proof of Suzuki’s theorem is currently known. (The proofs of Feit-Thompson and CFSG also involve characters, but those proofs also contain many other arguments of much greater complexity than the character-based portions of the proof.)

It seems to me that the above four theorems (Frobenius, Suzuki, Feit-Thompson, and CFSG) provide a ladder of sorts (with exponentially increasing complexity at each step) to the full classification, and that any new approach to the classification might first begin by revisiting the earlier theorems on this ladder and finding new proofs of these results first (in particular, if one had a “robust” proof of Suzuki’s theorem that also gave non-trivial control on “almost CA-groups” – whatever that means – then this might lead to a new route to classifying the finite simple groups of Lie type and bounded rank). But even for the simplest two results on this ladder – Frobenius and Suzuki – it seems remarkably difficult to find any proof that is not essentially the character-based proof. (Even trying to replace character theory by its close cousin, representation theory, doesn’t seem to work unless one gives in to the temptation to take traces everywhere and put the characters back in; it seems that rather than abandon characters altogether, one needs to find some sort of “robust” generalisation of existing character-based methods.) In any case, I am recording here the standard character-based proofs of the theorems of Frobenius and Suzuki below the fold. There is nothing particularly novel here, but I wanted to collect all the relevant material in one place, largely for my own benefit.

Let ${A, B}$ be ${n \times n}$ Hermitian matrices, with eigenvalues ${\lambda_1(A) \leq \ldots \leq \lambda_n(A)}$ and ${\lambda_1(B) \leq \ldots\leq \lambda_n(B)}$. The Harish-ChandraItzykson-Zuber integral formula exactly computes the integral

$\displaystyle \int_{U(n)} \exp( t \hbox{tr}( A U B U^* ) )\ dU$

where ${U}$ is integrated over the Haar probability measure of the unitary group ${U(n)}$ and ${t}$ is a non-zero complex parameter, as the expression

$\displaystyle c_n \frac{ \det( \exp( t \lambda_i(A) \lambda_j(B) ) )_{1 \leq i,j \leq n} }{t^{(n^2-n)/2} \Delta(\lambda(A)) \Delta(\lambda(B))}$

when the eigenvalues of ${A,B}$ are simple, where ${\Delta}$ denotes the Vandermonde determinant

$\displaystyle \Delta(\lambda(A)) := \prod_{1 \leq i

and ${c_n}$ is the constant

$\displaystyle c_n := \prod_{i=1}^{n-1} i!.$

There are at least two standard ways to prove this formula in the literature. One way is by applying the Duistermaat-Heckman theorem to the pushforward of Liouville measure on the coadjoint orbit ${{\mathcal O}_B := \{ UBU^*: U \in U(n) \}}$ (or more precisely, a rotation of such an orbit by ${i}$) under the moment map ${M \mapsto \hbox{diag}(M)}$, and then using a stationary phase expansion. Another way, which I only learned about recently, is to use the formulae for evolution of eigenvalues under Dyson Brownian motion (as well as the closely related formulae for the GUE ensemble), which were derived in this previous blog post. Both of these approaches can be found in several places in the literature (the former being observed in the original paper of Duistermaat and Heckman, and the latter observed in the paper of Itzykson and Zuber as well as in this later paper of Johansson), but I thought I would record both of these here for my own benefit.

The Harish-Chandra-Itzykson-Zuber formula can be extended to other compact Lie groups than ${U(n)}$. At first glance, this might suggest that these formulae could be of use in the study of the GOE ensemble, but unfortunately the Lie algebra associated to ${O(n)}$ corresponds to real anti-symmetric matrices rather than real symmetric matrices. This also occurs in the ${U(n)}$ case, but there one can simply multiply by ${i}$ to rotate a complex skew-Hermitian matrix into a complex Hermitian matrix. This is consistent, though, with the fact that the (somewhat rarely studied) anti-symmetric GOE ensemble has cleaner formulae (in particular, having a determinantal structure similar to GUE) than the (much more commonly studied) symmetric GOE ensemble.

Let ${G}$ be a compact group. (Throughout this post, all topological groups are assumed to be Hausdorff.) Then ${G}$ has a number of unitary representations, i.e. continuous homomorphisms ${\rho: G \rightarrow U(H)}$ to the group ${U(H)}$ of unitary operators on a Hilbert space ${H}$, equipped with the strong operator topology. In particular, one has the left-regular representation ${\tau: G \rightarrow U(L^2(G))}$, where we equip ${G}$ with its normalised Haar measure ${\mu}$ (and the Borel ${\sigma}$-algebra) to form the Hilbert space ${L^2(G)}$, and ${\tau}$ is the translation operation

$\displaystyle \tau(g) f(x) := f(g^{-1} x).$

We call two unitary representations ${\rho: G \rightarrow U(H)}$ and ${\rho': G \rightarrow U(H')}$ isomorphic if one has ${\rho'(g) = U \rho(g) U^{-1}}$ for some unitary transformation ${U: H \rightarrow H'}$, in which case we write ${\rho \equiv \rho'}$.

Given two unitary representations ${\rho: G \rightarrow U(H)}$ and ${\rho': G \rightarrow U(H')}$, one can form their direct sum ${\rho \oplus \rho': G \rightarrow U(H \oplus H')}$ in the obvious manner: ${\rho \oplus \rho'(g)(v) := (\rho(g) v, \rho'(g) v)}$. Conversely, if a unitary representation ${\rho: G \rightarrow U(H)}$ has a closed invariant subspace ${V \subset H}$ of ${H}$ (thus ${\rho(g) V \subset V}$ for all ${g \in G}$), then the orthogonal complement ${V^\perp}$ is also invariant, leading to a decomposition ${\rho \equiv \rho\downharpoonright_V \oplus \rho\downharpoonright_{V^\perp}}$ of ${\rho}$ into the subrepresentations ${\rho\downharpoonright_V: G \rightarrow U(V)}$, ${\rho\downharpoonright_{V^\perp}: G \rightarrow U(V^\perp)}$. Accordingly, we will call a unitary representation ${\rho: G \rightarrow U(H)}$ irreducible if ${H}$ is nontrivial (i.e. ${H \neq \{0\}}$) and there are no nontrivial invariant subspaces (i.e. no invariant subspaces other than ${\{0\}}$ and ${H}$); the irreducible representations play a role in the subject analogous to those of prime numbers in multiplicative number theory. By the principle of infinite descent, every finite-dimensional unitary representation is then expressible (perhaps non-uniquely) as the direct sum of irreducible representations.

The Peter-Weyl theorem asserts, among other things, that the same claim is true for the regular representation:

Theorem 1 (Peter-Weyl theorem) Let ${G}$ be a compact group. Then the regular representation ${\tau: G \rightarrow U(L^2(G))}$ is isomorphic to the direct sum of irreducible representations. In fact, one has ${\tau \equiv \bigoplus_{\xi \in \hat G} \rho_\xi^{\oplus \hbox{dim}(V_\xi)}}$, where ${(\rho_\xi)_{\xi \in \hat G}}$ is an enumeration of the irreducible finite-dimensional unitary representations ${\rho_\xi: G \rightarrow U(V_\xi)}$ of ${G}$ (up to isomorphism). (It is not difficult to see that such an enumeration exists.)

In the case when ${G}$ is abelian, the Peter-Weyl theorem is a consequence of the Plancherel theorem; in that case, the irreducible representations are all one dimensional, and are thus indexed by the space ${\hat G}$ of characters ${\xi: G \rightarrow {\bf R}/{\bf Z}}$ (i.e. continuous homomorphisms into the unit circle ${{\bf R}/{\bf Z}}$), known as the Pontryagin dual of ${G}$. (See for instance my lecture notes on the Fourier transform.) Conversely, the Peter-Weyl theorem can be used to deduce the Plancherel theorem for compact groups, as well as other basic results in Fourier analysis on these groups, such as the Fourier inversion formula.

Because the regular representation is faithful (i.e. injective), a corollary of the Peter-Weyl theorem (and a classical theorem of Cartan) is that every compact group can be expressed as the inverse limit of Lie groups, leading to a solution to Hilbert’s fifth problem in the compact case. Furthermore, the compact case is then an important building block in the more general theory surrounding Hilbert’s fifth problem, and in particular a result of Yamabe that any locally compact group contains an open subgroup that is the inverse limit of Lie groups.

I’ve recently become interested in the theory around Hilbert’s fifth problem, due to the existence of a correspondence principle between locally compact groups and approximate groups, which play a fundamental role in arithmetic combinatorics. I hope to elaborate upon this correspondence in a subsequent post, but I will mention that versions of this principle play a crucial role in Gromov’s proof of his theorem on groups of polynomial growth (discussed previously on this blog), and in a more recent paper of Hrushovski on approximate groups (also discussed previously). It is also analogous in many ways to the more well-known Furstenberg correspondence principle between ergodic theory and combinatorics (also discussed previously).

Because of the above motivation, I have decided to write some notes on how the Peter-Weyl theorem is proven. This is utterly standard stuff in abstract harmonic analysis; these notes are primarily for my own benefit, but perhaps they may be of interest to some readers also.

A recurring theme in mathematics is that of duality: a mathematical object ${X}$ can either be described internally (or in physical space, or locally), by describing what ${X}$ physically consists of (or what kind of maps exist into ${X}$), or externally (or in frequency space, or globally), by describing what ${X}$ globally interacts or resonates with (or what kind of maps exist out of ${X}$). These two fundamentally opposed perspectives on the object ${X}$ are often dual to each other in various ways: performing an operation on ${X}$ may transform it one way in physical space, but in a dual way in frequency space, with the frequency space description often being a “inversion” of the physical space description. In several important cases, one is fortunate enough to have some sort of fundamental theorem connecting the internal and external perspectives. Here are some (closely inter-related) examples of this perspective:

1. Vector space duality A vector space ${V}$ over a field ${F}$ can be described either by the set of vectors inside ${V}$, or dually by the set of linear functionals ${\lambda: V \rightarrow F}$ from ${V}$ to the field ${F}$ (or equivalently, the set of vectors inside the dual space ${V^*}$). (If one is working in the category of topological vector spaces, one would work instead with continuous linear functionals; and so forth.) A fundamental connection between the two is given by the Hahn-Banach theorem (and its relatives).
2. Vector subspace duality In a similar spirit, a subspace ${W}$ of ${V}$ can be described either by listing a basis or a spanning set, or dually by a list of linear functionals that cut out that subspace (i.e. a spanning set for the orthogonal complement ${W^\perp := \{ \lambda \in V^*: \lambda(w)=0 \hbox{ for all } w \in W \})}$. Again, the Hahn-Banach theorem provides a fundamental connection between the two perspectives.
3. Convex duality More generally, a (closed, bounded) convex body ${K}$ in a vector space ${V}$ can be described either by listing a set of (extreme) points whose convex hull is ${K}$, or else by listing a set of (irreducible) linear inequalities that cut out ${K}$. The fundamental connection between the two is given by the Farkas lemma.
4. Ideal-variety duality In a slightly different direction, an algebraic variety ${V}$ in an affine space ${A^n}$ can be viewed either “in physical space” or “internally” as a collection of points in ${V}$, or else “in frequency space” or “externally” as a collection of polynomials on ${A^n}$ whose simultaneous zero locus cuts out ${V}$. The fundamental connection between the two perspectives is given by the nullstellensatz, which then leads to many of the basic fundamental theorems in classical algebraic geometry.
5. Hilbert space duality An element ${v}$ in a Hilbert space ${H}$ can either be thought of in physical space as a vector in that space, or in momentum space as a covector ${w \mapsto \langle v, w \rangle}$ on that space. The fundamental connection between the two is given by the Riesz representation theorem for Hilbert spaces.
6. Semantic-syntactic duality Much more generally still, a mathematical theory can either be described internally or syntactically via its axioms and theorems, or externally or semantically via its models. The fundamental connection between the two perspectives is given by the Gödel completeness theorem.
7. Intrinsic-extrinsic duality A (Riemannian) manifold ${M}$ can either be viewed intrinsically (using only concepts that do not require an ambient space, such as the Levi-Civita connection), or extrinsically, for instance as the level set of some defining function in an ambient space. Some important connections between the two perspectives includes the Nash embedding theorem and the theorema egregium.
8. Group duality A group ${G}$ can be described either via presentations (lists of generators, together with relations between them) or representations (realisations of that group in some more concrete group of transformations). A fundamental connection between the two is Cayley’s theorem. Unfortunately, in general it is difficult to build upon this connection (except in special cases, such as the abelian case), and one cannot always pass effortlessly from one perspective to the other.
9. Pontryagin group duality A (locally compact Hausdorff) abelian group ${G}$ can be described either by listing its elements ${g \in G}$, or by listing the characters ${\chi: G \rightarrow {\bf R}/{\bf Z}}$ (i.e. continuous homomorphisms from ${G}$ to the unit circle, or equivalently elements of ${\hat G}$). The connection between the two is the focus of abstract harmonic analysis.
10. Pontryagin subgroup duality A subgroup ${H}$ of a locally compact abelian group ${G}$ can be described either by generators in ${H}$, or generators in the orthogonal complement ${H^\perp := \{ \xi \in \hat G: \xi \cdot h = 0 \hbox{ for all } h \in H \}}$. One of the fundamental connections between the two is the Poisson summation formula.
11. Fourier duality A (sufficiently nice) function ${f: G \rightarrow {\bf C}}$ on a locally compact abelian group ${G}$ (equipped with a Haar measure ${\mu}$) can either be described in physical space (by its values ${f(x)}$ at each element ${x}$ of ${G}$) or in frequency space (by the values ${\hat f(\xi) = \int_G f(x) e( - \xi \cdot x )\ d\mu(x)}$ at elements ${\xi}$ of the Pontryagin dual ${\hat G}$). The fundamental connection between the two is the Fourier inversion formula.
12. The uncertainty principle The behaviour of a function ${f}$ at physical scales above (resp. below) a certain scale ${R}$ is almost completely controlled by the behaviour of its Fourier transform ${\hat f}$ at frequency scales below (resp. above) the dual scale ${1/R}$ and vice versa, thanks to various mathematical manifestations of the uncertainty principle. (The Poisson summation formula can also be viewed as a variant of this principle, using subgroups instead of scales.)
13. Stone/Gelfand duality A (locally compact Hausdorff) topological space ${X}$ can be viewed in physical space (as a collection of points), or dually, via the ${C^*}$ algebra ${C(X)}$ of continuous complex-valued functions on that space, or (in the case when ${X}$ is compact and totally disconnected) via the boolean algebra of clopen sets (or equivalently, the idempotents of ${C(X)}$). The fundamental connection between the two is given by the Stone representation theorem or the (commutative) Gelfand-Naimark theorem.

I have discussed a fair number of these examples in previous blog posts (indeed, most of the links above are to my own blog). In this post, I would like to discuss the uncertainty principle, that describes the dual relationship between physical space and frequency space. There are various concrete formalisations of this principle, most famously the Heisenberg uncertainty principle and the Hardy uncertainty principle – but in many situations, it is the heuristic formulation of the principle that is more useful and insightful than any particular rigorous theorem that attempts to capture that principle. Unfortunately, it is a bit tricky to formulate this heuristic in a succinct way that covers all the various applications of that principle; the Heisenberg inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is a good start, but it only captures a portion of what the principle tells us. Consider for instance the following (deliberately vague) statements, each of which can be viewed (heuristically, at least) as a manifestation of the uncertainty principle:

1. A function which is band-limited (restricted to low frequencies) is featureless and smooth at fine scales, but can be oscillatory (i.e. containing plenty of cancellation) at coarse scales. Conversely, a function which is smooth at fine scales will be almost entirely restricted to low frequencies.
2. A function which is restricted to high frequencies is oscillatory at fine scales, but is negligible at coarse scales. Conversely, a function which is oscillatory at fine scales will be almost entirely restricted to high frequencies.
3. Projecting a function to low frequencies corresponds to averaging out (or spreading out) that function at fine scales, leaving only the coarse scale behaviour.
4. Projecting a frequency to high frequencies corresponds to removing the averaged coarse scale behaviour, leaving only the fine scale oscillation.
5. The number of degrees of freedom of a function is bounded by the product of its spatial uncertainty and its frequency uncertainty (or more generally, by the volume of the phase space uncertainty). In particular, there are not enough degrees of freedom for a non-trivial function to be simulatenously localised to both very fine scales and very low frequencies.
6. To control the coarse scale (or global) averaged behaviour of a function, one essentially only needs to know the low frequency components of the function (and vice versa).
7. To control the fine scale (or local) oscillation of a function, one only needs to know the high frequency components of the function (and vice versa).
8. Localising a function to a region of physical space will cause its Fourier transform (or inverse Fourier transform) to resemble a plane wave on every dual region of frequency space.
9. Averaging a function along certain spatial directions or at certain scales will cause the Fourier transform to become localised to the dual directions and scales. The smoother the averaging, the sharper the localisation.
10. The smoother a function is, the more rapidly decreasing its Fourier transform (or inverse Fourier transform) is (and vice versa).
11. If a function is smooth or almost constant in certain directions or at certain scales, then its Fourier transform (or inverse Fourier transform) will decay away from the dual directions or beyond the dual scales.
12. If a function has a singularity spanning certain directions or certain scales, then its Fourier transform (or inverse Fourier transform) will decay slowly along the dual directions or within the dual scales.
13. Localisation operations in position approximately commute with localisation operations in frequency so long as the product of the spatial uncertainty and the frequency uncertainty is significantly larger than one.
14. In the high frequency (or large scale) limit, position and frequency asymptotically behave like a pair of classical observables, and partial differential equations asymptotically behave like classical ordinary differential equations. At lower frequencies (or finer scales), the former becomes a “quantum mechanical perturbation” of the latter, with the strength of the quantum effects increasing as one moves to increasingly lower frequencies and finer spatial scales.
15. Etc., etc.
16. Almost all of the above statements generalise to other locally compact abelian groups than ${{\bf R}}$ or ${{\bf R}^n}$, in which the concept of a direction or scale is replaced by that of a subgroup or an approximate subgroup. (In particular, as we will see below, the Poisson summation formula can be viewed as another manifestation of the uncertainty principle.)

I think of all of the above (closely related) assertions as being instances of “the uncertainty principle”, but it seems difficult to combine them all into a single unified assertion, even at the heuristic level; they seem to be better arranged as a cloud of tightly interconnected assertions, each of which is reinforced by several of the others. The famous inequality ${\Delta x \cdot \Delta \xi \gtrsim 1}$ is at the centre of this cloud, but is by no means the only aspect of it.

The uncertainty principle (as interpreted in the above broad sense) is one of the most fundamental principles in harmonic analysis (and more specifically, to the subfield of time-frequency analysis), second only to the Fourier inversion formula (and more generally, Plancherel’s theorem) in importance; understanding this principle is a key piece of intuition in the subject that one has to internalise before one can really get to grips with this subject (and also with closely related subjects, such as semi-classical analysis and microlocal analysis). Like many fundamental results in mathematics, the principle is not actually that difficult to understand, once one sees how it works; and when one needs to use it rigorously, it is usually not too difficult to improvise a suitable formalisation of the principle for the occasion. But, given how vague this principle is, it is difficult to present this principle in a traditional “theorem-proof-remark” manner. Even in the more informal format of a blog post, I was surprised by how challenging it was to describe my own understanding of this piece of mathematics in a linear fashion, despite (or perhaps because of) it being one of the most central and basic conceptual tools in my own personal mathematical toolbox. In the end, I chose to give below a cloud of interrelated discussions about this principle rather than a linear development of the theory, as this seemed to more closely align with the nature of this principle.

In these notes we lay out the basic theory of the Fourier transform, which is of course the most fundamental tool in harmonic analysis and also of major importance in related fields (functional analysis, complex analysis, PDE, number theory, additive combinatorics, representation theory, signal processing, etc.). The Fourier transform, in conjunction with the Fourier inversion formula, allows one to take essentially arbitrary (complex-valued) functions on a group ${G}$ (or more generally, a space ${X}$ that ${G}$ acts on, e.g. a homogeneous space ${G/H}$), and decompose them as a (discrete or continuous) superposition of much more symmetric functions on the domain, such as characters ${\chi: G \rightarrow S^1}$; the precise superposition is given by Fourier coefficients ${\hat f(\xi)}$, which take values in some dual object such as the Pontryagin dual ${\hat G}$ of ${G}$. Characters behave in a very simple manner with respect to translation (indeed, they are eigenfunctions of the translation action), and so the Fourier transform tends to simplify any mathematical problem which enjoys a translation invariance symmetry (or an approximation to such a symmetry), and is somehow “linear” (i.e. it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution ${f, g \mapsto f*g}$ and set-theoretic addition ${A, B \mapsto A+B}$, or the closely related problem of counting solutions to additive problems such as ${x = a_1 + a_2 + a_3}$ or ${x = a_1 - a_2}$, where ${a_1, a_2, a_3}$ are constrained to lie in specific sets ${A_1, A_2, A_3}$. The Fourier transform is also a particularly powerful tool for solving constant-coefficient linear ODE and PDE (because of the translation invariance), and can also approximately solve some variable-coefficient (or slightly non-linear) equations if the coefficients vary smoothly enough and the nonlinear terms are sufficiently tame.

The Fourier transform ${\hat f(\xi)}$ also provides an important new way of looking at a function ${f(x)}$, as it highlights the distribution of ${f}$ in frequency space (the domain of the frequency variable ${\xi}$) rather than physical space (the domain of the physical variable ${x}$). A given property of ${f}$ in the physical domain may be transformed to a rather different-looking property of ${\hat f}$ in the frequency domain. For instance:

• Smoothness of ${f}$ in the physical domain corresponds to decay of ${\hat f}$ in the Fourier domain, and conversely. (More generally, fine scale properties of ${f}$ tend to manifest themselves as coarse scale properties of ${\hat f}$, and conversely.)
• Convolution in the physical domain corresponds to pointwise multiplication in the Fourier domain, and conversely.
• Constant coefficient differential operators such as ${d/dx}$ in the physical domain corresponds to multiplication by polynomials such as ${2\pi i \xi}$ in the Fourier domain, and conversely.
• More generally, translation invariant operators in the physical domain correspond to multiplication by symbols in the Fourier domain, and conversely.
• Rescaling in the physical domain by an invertible linear transformation corresponds to an inverse (adjoint) rescaling in the Fourier domain.
• Restriction to a subspace (or subgroup) in the physical domain corresponds to projection to the dual quotient space (or quotient group) in the Fourier domain, and conversely.
• Frequency modulation in the physical domain corresponds to translation in the frequency domain, and conversely.

(We will make these statements more precise below.)

On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the ${L^2}$ norm (or energy) of a function ${f}$ is the same as that of its Fourier transform, and more generally the inner product ${\langle f, g \rangle}$ of two functions ${f}$ is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on ${L^2}$ (a fact which is variously known as the Plancherel theorem or the Parseval identity). This makes it easier to pass back and forth between the physical domain and frequency domain, so that one can combine techniques that are easy to execute in the physical domain with other techniques that are easy to execute in the frequency domain. (In fact, one can combine the physical and frequency domains together into a product domain known as phase space, and there are entire fields of mathematics (e.g. microlocal analysis, geometric quantisation, time-frequency analysis) devoted to performing analysis on these sorts of spaces directly, but this is beyond the scope of this course.)

In these notes, we briefly discuss the general theory of the Fourier transform, but will mainly focus on the two classical domains for Fourier analysis: the torus ${{\Bbb T}^d := ({\bf R}/{\bf Z})^d}$, and the Euclidean space ${{\bf R}^d}$. For these domains one has the advantage of being able to perform very explicit algebraic calculations, involving concrete functions such as plane waves ${x \mapsto e^{2\pi i x \cdot \xi}}$ or Gaussians ${x \mapsto A^{d/2} e^{-\pi A |x|^2}}$.

[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]

Many properties of a (sufficiently nice) function ${f: {\mathbb R} \rightarrow {\mathbb C}}$ are reflected in its Fourier transform ${\hat f: {\mathbb R} \rightarrow {\mathbb C}}$, defined by the formula

$\displaystyle \hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (1)$

For instance, decay properties of ${f}$ are reflected in smoothness properties of ${\hat f}$, as the following table shows:

 If ${f}$ is… then ${\hat f}$ is… and this relates to… Square-integrable square-integrable Plancherel’s theorem Absolutely integrable continuous Riemann-Lebesgue lemma Rapidly decreasing smooth theory of Schwartz functions Exponentially decreasing analytic in a strip Compactly supported entire and at most exponential growth Paley-Wiener theorem

Another important relationship between a function ${f}$ and its Fourier transform ${\hat f}$ is the uncertainty principle, which roughly asserts that if a function ${f}$ is highly localised in space, then its Fourier transform ${\hat f}$ must be widely dispersed in space, or to put it another way, ${f}$ and ${\hat f}$ cannot both decay too strongly at infinity (except of course in the degenerate case ${f=0}$). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise

$\displaystyle \int_{{\mathbb R}} |f(x)|^2\ dx = \int_{\mathbb R} |\hat f(\xi)|^2\ d\xi = 1$

then we must have

$\displaystyle (\int_{\mathbb R} |x|^2 |f(x)|^2\ dx) \cdot (\int_{\mathbb R} |\xi|^2 |\hat f(\xi)|^2\ d\xi)\geq \frac{1}{(4\pi)^2}$

thus forcing at least one of ${f}$ or ${\hat f}$ to not be too concentrated near the origin. This principle can be proven (for sufficiently nice ${f}$, initially) by observing the integration by parts identity

$\displaystyle \langle xf, f' \rangle = \int_{\mathbb R} x f(x) \overline{f'(x)}\ dx = - \frac{1}{2} \int_{\mathbb R} |f(x)|^2\ dx$

and then using Cauchy-Schwarz and the Plancherel identity.

Another well known manifestation of the uncertainty principle is the fact that it is not possible for ${f}$ and ${\hat f}$ to both be compactly supported (unless of course they vanish entirely). This can be in fact be seen from the above table: if ${f}$ is compactly supported, then ${\hat f}$ is an entire function; but the zeroes of a non-zero entire function are isolated, yielding a contradiction unless ${f}$ vanishes. (Indeed, the table also shows that if one of ${f}$ and ${\hat f}$ is compactly supported, then the other cannot have exponential decay.)

On the other hand, we have the example of the Gaussian functions ${f(x) = e^{-\pi a x^2}}$, ${\hat f(\xi) = \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a }}$, which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that ${f}$ and ${\hat f}$ can simultaneously decay:

Theorem 1 (Hardy uncertainty principle) Suppose that ${f}$ is a (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}}$ for all ${x, \xi}$ and some ${C, C', a > 0}$. Then ${f(x)}$ is a scalar multiple of the gaussian ${e^{-\pi ax^2}}$.

This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:

Theorem 2 (Weak Hardy uncertainty principle) Suppose that ${f}$ is a non-zero (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi b \xi^2}}$ for all ${x, \xi}$ and some ${C, C', a, b > 0}$. Then ${ab \leq C_0}$ for some absolute constant ${C_0}$.

Note that the correct value of ${C_0}$ should be ${1}$, as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still might be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.

In the next few lectures, we will be studying four major classes of function spaces. In decreasing order of generality, these classes are the topological vector spaces, the normed vector spaces, the Banach spaces, and the Hilbert spaces. In order to motivate the discussion of the more general classes of spaces, we will first focus on the most special class – that of (real and complex) Hilbert spaces. These spaces can be viewed as generalisations of (real and complex) Euclidean spaces such as ${\Bbb R}^n$ and ${\Bbb C}^n$ to infinite-dimensional settings, and indeed much of one’s Euclidean geometry intuition concerning lengths, angles, orthogonality, subspaces, etc. will transfer readily to arbitrary Hilbert spaces; in contrast, this intuition is not always accurate in the more general vector spaces mentioned above. In addition to Euclidean spaces, another fundamental example of Hilbert spaces comes from the Lebesgue spaces $L^2(X,{\mathcal X},\mu)$ of a measure space $(X,{\mathcal X},\mu)$. (There are of course many other Hilbert spaces of importance in complex analysis, harmonic analysis, and PDE, such as Hardy spaces ${\mathcal H}^2$, Sobolev spaces $H^s = W^{s,2}$, and the space $HS$ of Hilbert-Schmidt operators, but we will not discuss those spaces much in this course.  Complex Hilbert spaces also play a fundamental role in the foundations of quantum mechanics, being the natural space to hold all the possible states of a quantum system (possibly after projectivising the Hilbert space), but we will not discuss this subject here.)

Hilbert spaces are the natural abstract framework in which to study two important (and closely related) concepts: orthogonality and unitarity, allowing us to generalise familiar concepts and facts from Euclidean geometry such as the Cartesian coordinate system, rotations and reflections, and the Pythagorean theorem to Hilbert spaces. (For instance, the Fourier transform is a unitary transformation and can thus be viewed as a kind of generalised rotation.) Furthermore, the Hodge duality on Euclidean spaces has a partial analogue for Hilbert spaces, namely the Riesz representation theorem for Hilbert spaces, which makes the theory of duality and adjoints for Hilbert spaces especially simple (when compared with the more subtle theory of duality for, say, Banach spaces). Much later (next quarter, in fact), we will see that this duality allows us to extend the spectral theorem for self-adjoint matrices to that of self-adjoint operators on a Hilbert space.

These notes are only the most basic introduction to the theory of Hilbert spaces.  In particular, the theory of linear transformations between two Hilbert spaces, which is perhaps the most important aspect of the subject, is not covered much at all here (but I hope to discuss it further in future lectures.)

Title: Use basic examples to calibrate exponents

Motivation: In the more quantitative areas of mathematics, such as analysis and combinatorics, one has to frequently keep track of a large number of exponents in one’s identities, inequalities, and estimates.  For instance, if one is studying a set of N elements, then many expressions that one is faced with will often involve some power $N^p$ of N; if one is instead studying a function f on a measure space X, then perhaps it is an $L^p$ norm $\|f\|_{L^p(X)}$ which will appear instead.  The exponent $p$ involved will typically evolve slowly over the course of the argument, as various algebraic or analytic manipulations are applied.  In some cases, the exact value of this exponent is immaterial, but at other times it is crucial to have the correct value of $p$ at hand.   One can (and should) of course carefully go through one’s arguments line by line to work out the exponents correctly, but it is all too easy to make a sign error or other mis-step at one of the lines, causing all the exponents on subsequent lines to be incorrect.  However, one can guard against this (and avoid some tedious line-by-line exponent checking) by continually calibrating these exponents at key junctures of the arguments by using basic examples of the object of study (sets, functions, graphs, etc.) as test cases.  This is a simple trick, but it lets one avoid many unforced errors with exponents, and also lets one compute more rapidly.

Quick description: When trying to quickly work out what an exponent p in an estimate, identity, or inequality should be without deriving that statement line-by-line, test that statement with a simple example which has non-trivial behaviour with respect to that exponent p, but trivial behaviour with respect to as many other components of that statement as one is able to manage.   The “non-trivial” behaviour should be parametrised by some very large or very small parameter.  By matching the dependence on this parameter on both sides of the estimate, identity, or inequality, one should recover p (or at least a good prediction as to what p should be).

General discussion: The test examples should be as basic as possible; ideally they should have trivial behaviour in all aspects except for one feature that relates to the exponent p that one is trying to calibrate, thus being only “barely” non-trivial.   When the object of study is a function, then (appropriately rescaled, or otherwise modified) bump functions are very typical test objects, as are Dirac masses, constant functions, Gaussians, or other functions that are simple and easy to compute with.  In additive combinatorics, when the object of study is a subset of a group, then subgroups, arithmetic progressions, or random sets are typical test objects.  In graph theory, typical examples of test objects include complete graphs, complete bipartite graphs, and random graphs. And so forth.

This trick is closely related to that of using dimensional analysis to recover exponents; indeed, one can view dimensional analysis as the special case of exponent calibration when using test objects which are non-trivial in one dimensional aspect (e.g. they exist at a single very large or very small length scale) but are otherwise of a trivial or “featureless” nature.   But the calibration trick is more general, as it can involve parameters (such as probabilities, angles, or eccentricities) which are not commonly associated with the physical concept of a dimension.  And personally, I find example-based calibration to be a much more satisfying (and convincing) explanation of an exponent than a calibration arising from formal dimensional analysis.

When one is trying to calibrate an inequality or estimate, one should try to pick a basic example which one expects to saturate that inequality or estimate, i.e. an example for which the inequality is close to being an equality.  Otherwise, one would only expect to obtain some partial information on the desired exponent p (e.g. a lower bound or an upper bound only).  Knowing the examples that saturate an estimate that one is trying to prove is also useful for several other reasons – for instance, it strongly suggests that any technique which is not efficient when applied to the saturating example, is unlikely to be strong enough to prove the estimate in general, thus eliminating fruitless approaches to a problem and (hopefully) refocusing one’s attention on those strategies which actually have a chance of working.

Calibration is best used for the type of quick-and-dirty calculations one uses when trying to rapidly map out an argument that one has roughly worked out already, but without precise details; in particular, I find it particularly useful when writing up a rapid prototype.  When the time comes to write out the paper in full detail, then of course one should instead carefully work things out line by line, but if all goes well, the exponents obtained in that process should match up with the preliminary guesses for those exponents obtained by calibration, which adds confidence that there are no exponent errors have been committed.