
An extremely large portion of mathematics is concerned with locating solutions to equations such as

$\displaystyle f(x) = 0$

or

$\displaystyle \Phi(x) = x \ \ \ \ \ (1)$

for ${x}$ in some suitable domain space (either finite-dimensional or infinite-dimensional), and various maps ${f}$ or ${\Phi}$. To solve the fixed point equation (1), the simplest general method available is the fixed point iteration method: one starts with an initial approximate solution ${x_0}$ to (1), so that ${\Phi(x_0) \approx x_0}$, and then recursively constructs the sequence ${x_1, x_2, x_3, \dots}$ by ${x_n := \Phi(x_{n-1})}$. If ${\Phi}$ behaves enough like a “contraction”, and the domain is complete, then one can expect the ${x_n}$ to converge to a limit ${x}$, which should then be a solution to (1). For instance, if ${\Phi: X \rightarrow X}$ is a map from a metric space ${X = (X,d)}$ to itself, which is a contraction in the sense that

$\displaystyle d( \Phi(x), \Phi(y) ) \leq (1-\eta) d(x,y)$

for all ${x,y \in X}$ and some ${\eta>0}$, then with ${x_n}$ as above we have

$\displaystyle d( x_{n+1}, x_n ) \leq (1-\eta) d(x_n, x_{n-1} )$

for any ${n}$, and so the distances ${d(x_n, x_{n-1} )}$ between successive elements of the sequence decay at least geometrically. This leads to the contraction mapping theorem, which has many important consequences, such as the inverse function theorem and the Picard existence theorem.
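The geometric decay of successive distances can be observed numerically. The following is a minimal sketch, assuming the illustrative choice ${\Phi = \cos}$ on ${[0,1]}$ (not an example from the text); there ${|\Phi'(x)| \leq \sin(1) < 1}$, so one may take ${1-\eta = \sin(1)}$. The helper name `fixed_point_iterates` is hypothetical.

```python
import math

def fixed_point_iterates(phi, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_n = phi(x_{n-1})."""
    xs = [x0]
    for _ in range(steps):
        xs.append(phi(xs[-1]))
    return xs

# Illustrative choice (an assumption, not from the text): phi = cos maps
# [0, 1] into itself and is a contraction there with constant sin(1).
xs = fixed_point_iterates(math.cos, 1.0, 100)

# Distances between successive iterates, and their consecutive ratios.
gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]
ratios = [g2 / g1 for g1, g2 in zip(gaps, gaps[1:]) if g1 > 1e-12]

print("approximate fixed point:", xs[-1])
print("largest observed ratio :", max(ratios))
```

Each ratio stays below the contraction constant ${\sin(1)}$, which is the discrete form of the inequality ${d(x_{n+1},x_n) \leq (1-\eta) d(x_n,x_{n-1})}$.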

A slightly more complicated instance of this strategy arises when trying to linearise a complex map ${f: U \rightarrow {\bf C}}$ defined in a neighbourhood ${U}$ of a fixed point. For simplicity we normalise the fixed point to be the origin, thus ${0 \in U}$ and ${f(0)=0}$. When studying the complex dynamics ${f^2 = f \circ f}$, ${f^3 = f \circ f \circ f}$, ${\dots}$ of such a map, it can be useful to try to conjugate ${f}$ to another function ${g = \psi^{-1} \circ f \circ \psi}$, where ${\psi}$ is a holomorphic function defined and invertible near ${0}$ with ${\psi(0)=0}$, since the dynamics of ${g}$ will be conjugate to that of ${f}$. Note that if ${f(0)=0}$ and ${f'(0)=\lambda}$, then from the chain rule any conjugate ${g}$ of ${f}$ will also have ${g(0)=0}$ and ${g'(0)=\lambda}$. Thus, the “simplest” function one can hope to conjugate ${f}$ to is the linear function ${z \mapsto \lambda z}$. Let us say that ${f}$ is linearisable (around ${0}$) if it is conjugate to ${z \mapsto \lambda z}$ in some neighbourhood of ${0}$. Equivalently, ${f}$ is linearisable if there is a solution to the Schröder equation

$\displaystyle f( \psi(z) ) = \psi(\lambda z) \ \ \ \ \ (2)$

for some ${\psi: U' \rightarrow {\bf C}}$ defined and invertible in a neighbourhood ${U'}$ of ${0}$ with ${\psi(0)=0}$, and all ${z}$ sufficiently close to ${0}$. (The Schröder equation is normalised somewhat differently in the literature, but this form is equivalent to the usual form, at least when ${\lambda}$ is non-zero.) Note that if ${\psi}$ solves the above equation, then so does ${z \mapsto \psi(cz)}$ for any non-zero ${c}$, so we may normalise ${\psi'(0)=1}$ in addition to ${\psi(0)=0}$, which also ensures local invertibility from the inverse function theorem. (Note from winding number considerations that ${\psi}$ cannot be invertible near zero if ${\psi'(0)}$ vanishes.)

We have the following basic result of Koenigs:

Theorem 1 (Koenigs’ linearisation theorem) Let ${f: U \rightarrow {\bf C}}$ be a holomorphic function defined near ${0}$ with ${f(0)=0}$ and ${f'(0)=\lambda}$. If ${0 < |\lambda| < 1}$ (attracting case) or ${1 < |\lambda| < \infty}$ (repelling case), then ${f}$ is linearisable near zero.

Proof: Observe that if ${f, \psi, \lambda}$ solve (2), then ${f^{-1}, \psi^{-1}, \lambda^{-1}}$ solve (2) also (in a sufficiently small neighbourhood of zero). Thus we may reduce to the attractive case ${0 < |\lambda| < 1}$.

Let ${r>0}$ be a sufficiently small radius, and let ${X}$ denote the space of holomorphic functions ${\psi: B(0,r) \rightarrow {\bf C}}$ on the complex disk ${B(0,r) := \{z \in {\bf C}: |z| < r \}}$ with ${\psi(0)=0}$ and ${\psi'(0)=1}$. We can view the Schröder equation (2) as a fixed point equation

$\displaystyle \psi = \Phi(\psi)$

where ${\Phi: X' \rightarrow X}$ is the partially defined function on ${X}$ that maps a function ${\psi: B(0,r) \rightarrow {\bf C}}$ to the function ${\Phi(\psi): B(0,r) \rightarrow {\bf C}}$ defined by

$\displaystyle \Phi(\psi)(z) := f^{-1}( \psi( \lambda z ) ),$

assuming that ${f^{-1}}$ is well-defined on the image ${\psi(B(0,|\lambda| r))}$ (this is why ${\Phi}$ is only partially defined).

We can solve this equation by the fixed point iteration method, if ${r}$ is small enough. Namely, we start with ${\psi_0: B(0,r) \rightarrow {\bf C}}$ being the identity map, and set ${\psi_1 := \Phi(\psi_0), \psi_2 := \Phi(\psi_1)}$, etc. We equip ${X}$ with the uniform metric ${d( \psi, \tilde \psi ) := \sup_{z \in B(0,r)} |\psi(z) - \tilde \psi(z)|}$. Observe that if ${d( \psi, \psi_0 ), d(\tilde \psi, \psi_0) \leq r}$, and ${r}$ is small enough, then ${\psi, \tilde \psi}$ take values in ${B(0,2r)}$, and ${\Phi(\psi), \Phi(\tilde \psi)}$ are well-defined and lie in ${X}$. Also, since ${f^{-1}}$ is smooth and has derivative ${\lambda^{-1}}$ at ${0}$, we have

$\displaystyle |f^{-1}(z) - f^{-1}(w)| \leq (1+\varepsilon) |\lambda|^{-1} |z-w|$

if ${z, w \in B(0,r)}$, ${\varepsilon>0}$ and ${r}$ is sufficiently small depending on ${\varepsilon}$. This is not yet enough to establish the required contraction (thanks to Mario Bonk for pointing this out); but observe that the function ${\frac{\psi(z)-\tilde \psi(z)}{z^2}}$ is holomorphic on ${B(0,r)}$ and bounded by ${d(\psi,\tilde \psi)/r^2}$ on the boundary of this ball (or slightly within this boundary), so by the maximum principle we see that

$\displaystyle |\frac{\psi(z)-\tilde \psi(z)}{z^2}| \leq \frac{1}{r^2} d(\psi,\tilde \psi)$

on all of ${B(0,r)}$, and in particular

$\displaystyle |\psi(z)-\tilde \psi(z)| \leq |\lambda|^2 d(\psi,\tilde \psi)$

on ${B(0,|\lambda| r)}$. Putting all this together, we see that

$\displaystyle d( \Phi(\psi), \Phi(\tilde \psi)) \leq (1+\varepsilon) |\lambda| d(\psi, \tilde \psi);$

since ${|\lambda|<1}$, we thus obtain a contraction on the ball ${\{ \psi \in X: d(\psi,\psi_0) \leq r \}}$ if ${\varepsilon}$ is small enough (and ${r}$ sufficiently small depending on ${\varepsilon}$). From this (and the completeness of ${X}$, which follows from Morera’s theorem) we see that the iteration ${\psi_n}$ converges (exponentially fast) to a limit ${\psi \in X}$ which is a fixed point of ${\Phi}$, and thus solves Schröder’s equation, as required. $\Box$
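The iteration in this proof can be tried out numerically. Unwinding ${\psi_n = \Phi(\psi_{n-1})}$ with ${\psi_0}$ the identity gives ${\psi_n(z) = f^{-n}(\lambda^n z)}$. The sketch below assumes the illustrative Möbius map ${f(z) = \lambda z/(1-z)}$ with ${\lambda = 1/2}$ (chosen only because its inverse ${f^{-1}(w) = w/(\lambda+w)}$ is explicit; it is not taken from the text), and the helper name `koenigs_psi` is hypothetical. For this particular ${f}$ one can check by hand that ${\psi(z) = z/(1+z/(1-\lambda))}$ solves (2), which gives an independent comparison.

```python
def koenigs_psi(f_inv, lam, z, n=60):
    """Approximate psi_n(z) = f_inv applied n times to lam**n * z.

    This is the n-th fixed point iterate psi_n = Phi(psi_{n-1}) started
    from the identity map psi_0, evaluated at the single point z.
    """
    w = lam ** n * z          # psi_0 evaluated at lam^n z
    for _ in range(n):        # apply f^{-1} a total of n times
        w = f_inv(w)
    return w

# Assumed example map (attracting fixed point at 0 with multiplier 1/2).
lam = 0.5
f = lambda z: lam * z / (1 - z)
f_inv = lambda w: w / (lam + w)
psi = lambda z: koenigs_psi(f_inv, lam, z)

z = 0.1 + 0.05j
schroeder_residual = abs(f(psi(z)) - psi(lam * z))   # defect in equation (2)
closed_form_gap = abs(psi(z) - z / (1 + z / (1 - lam)))

print("residual of Schroeder equation:", schroeder_residual)
print("distance to closed form       :", closed_form_gap)
```

Both quantities come out at the level of rounding error, consistent with the exponentially fast convergence asserted in the proof.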

Koenigs’ linearisation theorem leaves open the indifferent case when ${|\lambda|=1}$. In the rationally indifferent case when ${\lambda^n=1}$ for some natural number ${n}$, there is an obvious obstruction to linearisability, namely that a linearisable ${f}$ must obey ${f^n = 1}$, i.e. ${f^n}$ must be the identity map near ${0}$ (in particular, linearisation is not possible in this case when ${f}$ is a non-trivial rational function). An obstruction is also present in some irrationally indifferent cases (where ${|\lambda|=1}$ but ${\lambda^n \neq 1}$ for any natural number ${n}$), if ${\lambda}$ is sufficiently close to various roots of unity; the first result of this form is due to Cremer, and the optimal result of this type for quadratic maps was established by Yoccoz. In the other direction, we have the following result of Siegel:

Theorem 2 (Siegel’s linearisation theorem) Let ${f: U \rightarrow {\bf C}}$ be a holomorphic function defined near ${0}$ with ${f(0)=0}$ and ${f'(0)=\lambda}$. If ${|\lambda|=1}$ and one has the Diophantine condition ${\frac{1}{|\lambda^n-1|} \leq C n^C}$ for all natural numbers ${n}$ and some constant ${C>0}$, then ${f}$ is linearisable at ${0}$.

The Diophantine condition can be relaxed to a more general condition involving the rational exponents of the phase ${\theta}$ of ${\lambda = e^{2\pi i \theta}}$; this was worked out by Brjuno, with the condition matching the one later obtained by Yoccoz. Amusingly, while the set of Diophantine numbers (and hence the set of linearisable ${\lambda}$) has full measure on the unit circle, the set of non-linearisable ${\lambda}$ is generic (the complement of countably many nowhere dense sets) due to the above-mentioned work of Cremer, leading to a striking disparity between the measure-theoretic and category notions of “largeness”.
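To get a feel for the Diophantine condition, one can tabulate the small divisors ${|\lambda^n - 1|}$ numerically. The sketch below assumes the golden-mean phase ${\theta = (\sqrt{5}-1)/2}$ as an illustrative example (not taken from the text) and computes ${n |\lambda^n-1|}$ for a range of ${n}$; a positive lower bound on this quantity corresponds to the condition ${\frac{1}{|\lambda^n-1|} \leq C n^C}$ with exponent ${1}$. The function name `small_divisor_profile` is hypothetical.

```python
import cmath
import math

def small_divisor_profile(theta, n_max):
    """Return the list of n * |lambda^n - 1| for n = 1..n_max, lambda = e^{2 pi i theta}."""
    return [
        n * abs(cmath.exp(2j * math.pi * n * theta) - 1)
        for n in range(1, n_max + 1)
    ]

golden = (math.sqrt(5) - 1) / 2          # assumed example phase
profile = small_divisor_profile(golden, 2000)
worst = min(profile)

print("min of n*|lambda^n - 1| over n <= 2000:", worst)
```

A finite computation of this kind only illustrates the condition over the tested range; it is of course not a proof that it holds for all ${n}$.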

Siegel’s theorem does not seem to be provable using a fixed point iteration method. However, it can be established by modifying another basic method to solve equations, namely Newton’s method. Let us first review how this method works to solve the equation ${f(x)=0}$ for some smooth function ${f: I \rightarrow {\bf R}}$ defined on an interval ${I}$. We suppose we have some initial approximant ${x_0 \in I}$ to this equation, with ${f(x_0)}$ small but not necessarily zero. To make the analysis more quantitative, let us suppose that the interval ${[x_0-r_0,x_0+r_0]}$ lies in ${I}$ for some ${r_0>0}$, and we have the estimates

$\displaystyle |f(x_0)| \leq \delta_0 r_0$

$\displaystyle |f'(x)| \geq \eta_0$

$\displaystyle |f''(x)| \leq \frac{1}{\eta_0 r_0}$

for some ${\delta_0 > 0}$ and ${0 < \eta_0 < 1/2}$ and all ${x \in [x_0-r_0,x_0+r_0]}$ (the factors of ${r_0}$ are present to make ${\delta_0,\eta_0}$ “dimensionless”).

Lemma 3 Under the above hypotheses, we can find ${x_1}$ with ${|x_1 - x_0| \leq \eta_0 r_0}$ such that

$\displaystyle |f(x_1)| \ll \delta_0^2 \eta_0^{-O(1)} r_0.$

In particular, setting ${r_1 := (1-\eta_0) r_0}$, ${\eta_1 := \eta_0/2}$, and ${\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}$, we have ${[x_1-r_1,x_1+r_1] \subset [x_0-r_0,x_0+r_0] \subset I}$, and

$\displaystyle |f(x_1)| \leq \delta_1 r_1$

$\displaystyle |f'(x)| \geq \eta_1$

$\displaystyle |f''(x)| \leq \frac{1}{\eta_1 r_1}$

for all ${x \in [x_1-r_1,x_1+r_1]}$.

The crucial point here is that the new error ${\delta_1}$ is roughly the square of the previous error ${\delta_0}$. This leads to extremely fast (double-exponential) improvement in the error upon iteration, which is more than enough to absorb the exponential losses coming from the ${\eta_0^{-O(1)}}$ factor.

Proof: If ${\delta_0 > c \eta_0^{C}}$ for some absolute constants ${C,c>0}$ then we may simply take ${x_1 := x_0}$, so we may assume that ${\delta_0 \leq c \eta_0^{C}}$ for some small ${c>0}$ and large ${C>0}$. Using the Newton approximation ${f(x_0+h) \approx f(x_0) + h f'(x_0)}$ we are led to the choice

$\displaystyle x_1 := x_0 - \frac{f(x_0)}{f'(x_0)}$

for ${x_1}$. From the hypotheses on ${f}$ and the smallness hypothesis on ${\delta_0}$ we certainly have ${|x_1-x_0| \leq \eta_0 r_0}$. From Taylor’s theorem with remainder we have

$\displaystyle f(x_1) = f(x_0) - \frac{f(x_0)}{f'(x_0)} f'(x_0) + O( \frac{1}{\eta_0 r_0} |\frac{f(x_0)}{f'(x_0)}|^2 )$

$\displaystyle = O( \frac{1}{\eta_0 r_0} (\frac{\delta_0 r_0}{\eta_0})^2 )$

and the claim follows. $\Box$

We can iterate this procedure; starting with ${x_0,\eta_0,r_0,\delta_0}$ as above, we obtain a sequence of nested intervals ${[x_n-r_n,x_n+r_n]}$ with ${|f(x_n)| \leq \delta_n r_n}$, and with ${\eta_n,r_n,\delta_n,x_n}$ evolving by the recursive equations and estimates

$\displaystyle \eta_n = \eta_{n-1} / 2$

$\displaystyle r_n = (1 - \eta_{n-1}) r_{n-1}$

$\displaystyle \delta_n = O( \delta_{n-1}^2 \eta_{n-1}^{-O(1)} )$

$\displaystyle |x_n - x_{n-1}| \leq \eta_{n-1} r_{n-1}.$

If ${\delta_0}$ is sufficiently small depending on ${\eta_0}$, we see that ${\delta_n}$ converges rapidly to zero (indeed, we can inductively obtain a bound of the form ${\delta_n \leq \eta_0^{C (2^n + n)}}$ for some large absolute constant ${C}$ if ${\delta_0}$ is small enough), and ${x_n}$ converges to a limit ${x \in I}$ which then solves the equation ${f(x)=0}$ by the continuity of ${f}$.
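The squaring of the error at each step is easy to see in practice. Here is a minimal sketch, assuming the illustrative function ${f(x) = x^2 - 2}$ and starting point ${x_0 = 1.5}$ (neither is from the text); the helper name `newton_residuals` is hypothetical. It records ${|f(x_n)|}$ along the Newton iteration ${x_n := x_{n-1} - f(x_{n-1})/f'(x_{n-1})}$.

```python
import math

def newton_residuals(f, fprime, x0, steps):
    """Run Newton's method and return the list of pairs (x_n, |f(x_n)|)."""
    x = x0
    history = [(x, abs(f(x)))]
    for _ in range(steps):
        x = x - f(x) / fprime(x)      # the Newton update from Lemma 3
        history.append((x, abs(f(x))))
    return history

# Assumed example: solve x^2 - 2 = 0 starting near the root sqrt(2).
f = lambda x: x * x - 2.0
fprime = lambda x: 2.0 * x
history = newton_residuals(f, fprime, 1.5, 5)

for n, (x, res) in enumerate(history):
    print(n, x, res)
```

In this example each residual is bounded by the square of the previous one until rounding error takes over, which is the double-exponential improvement described above.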

As I recently learned from Zhiqiang Li, a similar scheme works to prove Siegel’s theorem, as can be found for instance in this text of Carleson and Gamelin. The key is the following analogue of Lemma 3.

Lemma 4 Let ${\lambda}$ be a complex number with ${|\lambda|=1}$ and ${\frac{1}{|\lambda^n-1|} \ll n^{O(1)}}$ for all natural numbers ${n}$. Let ${r_0>0}$, and let ${f_0: B(0,r_0) \rightarrow {\bf C}}$ be a holomorphic function with ${f_0(0)=0}$, ${f'_0(0)=\lambda}$, and

$\displaystyle |f_0(z) - \lambda z| \leq \delta_0 r_0 \ \ \ \ \ (3)$

for all ${z \in B(0,r_0)}$ and some ${\delta_0>0}$. Let ${0 < \eta_0 \leq 1/2}$, and set ${r_1 := (1-\eta_0) r_0}$. Then there exists an injective holomorphic function ${\psi_1: B(0, r_1) \rightarrow B(0, r_0)}$ and a holomorphic function ${f_1: B(0,r_1) \rightarrow {\bf C}}$ such that

$\displaystyle f_0( \psi_1(z) ) = \psi_1(f_1(z)) \ \ \ \ \ (4)$

for all ${z \in B(0,r_1)}$, and such that

$\displaystyle |\psi_1(z) - z| \ll \delta_0 \eta_0^{-O(1)} r_1$

and

$\displaystyle |f_1(z) - \lambda z| \leq \delta_1 r_1$

for all ${z \in B(0,r_1)}$ and some ${\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}$.

Proof: By scaling we may normalise ${r_0=1}$. If ${\delta_0 > c \eta_0^C}$ for some constants ${c,C>0}$, then we can simply take ${\psi_1}$ to be the identity and ${f_1=f_0}$, so we may assume that ${\delta_0 \leq c \eta_0^C}$ for some small ${c>0}$ and large ${C>0}$.

To motivate the choice of ${\psi_1}$, we write ${f_0(z) = \lambda z + \hat f_0(z)}$ and ${\psi_1(z) = z + \hat \psi_1(z)}$, with ${\hat f_0}$ and ${\hat \psi_1}$ viewed as small. We would like to have ${f_0(\psi_1(z)) \approx \psi_1(\lambda z)}$, which expands as

$\displaystyle \lambda z + \lambda \hat \psi_1(z) + \hat f_0( z + \hat \psi_1(z) ) \approx \lambda z + \hat \psi_1(\lambda z).$

As ${\hat f_0}$ and ${\hat \psi_1}$ are both small, we can heuristically approximate ${\hat f_0(z + \hat \psi_1(z) ) \approx \hat f_0(z)}$ up to quadratic errors (compare with the Newton approximation ${f(x_0+h) \approx f(x_0) + h f'(x_0)}$), and arrive at the equation

$\displaystyle \hat \psi_1(\lambda z) - \lambda \hat \psi_1(z) = \hat f_0(z). \ \ \ \ \ (5)$

This equation can be solved by Taylor series; the function ${\hat f_0}$ vanishes to second order at the origin and thus has a Taylor expansion

$\displaystyle \hat f_0(z) = \sum_{n=2}^\infty a_n z^n$

and then ${\hat \psi_1}$ has a Taylor expansion

$\displaystyle \hat \psi_1(z) = \sum_{n=2}^\infty \frac{a_n}{\lambda^n - \lambda} z^n.$

We take this as our definition of ${\hat \psi_1}$, define ${\psi_1(z) := z + \hat \psi_1(z)}$, and then define ${f_1}$ implicitly via (4).

Let us now justify that this choice works. By (3) and the generalised Cauchy integral formula, we have ${|a_n| \leq \delta_0}$ for all ${n}$; by the Diophantine assumption on ${\lambda}$, we thus have ${|\frac{a_n}{\lambda^n - \lambda}| \ll \delta_0 n^{O(1)}}$. In particular, ${\hat \psi_1}$ converges on ${B(0,1)}$, and on the disk ${B(0, (1-\eta_0/4))}$ (say) we have the bounds

$\displaystyle |\hat \psi_1(z)|, |\hat \psi'_1(z)| \ll \delta_0 \sum_{n=2}^\infty n^{O(1)} (1-\eta_0/4)^n \ll \eta_0^{-O(1)} \delta_0. \ \ \ \ \ (6)$

In particular, as ${\delta_0}$ is so small, we see that ${\psi_1}$ maps ${B(0, (1-\eta_0/4))}$ injectively to ${B(0,1)}$ and ${B(0,1-\eta_0)}$ to ${B(0,1-3\eta_0/4)}$, and the inverse ${\psi_1^{-1}}$ maps ${B(0, (1-\eta_0/2))}$ to ${B(0, (1-\eta_0/4))}$. From (3) we see that ${f_0}$ maps ${B(0,1-3\eta_0/4)}$ to ${B(0,1-\eta_0/2)}$, and so if we set ${f_1: B(0,1-\eta_0) \rightarrow B(0,1-\eta_0/4)}$ to be the function ${f_1 := \psi_1^{-1} \circ f_0 \circ \psi_1}$, then ${f_1}$ is a holomorphic function obeying (4). Expanding (4) in terms of ${\hat f_0}$ and ${\hat \psi_1}$ as before, and also writing ${f_1(z) = \lambda z + \hat f_1(z)}$, we have

$\displaystyle \lambda z + \lambda \hat \psi_1(z) + \hat f_0( z + \hat \psi_1(z) ) = \lambda z + \hat f_1(z) + \hat \psi_1(\lambda z + \hat f_1(z))$

for ${z \in B(0, 1-\eta_0)}$, which by (5) simplifies to

$\displaystyle \hat f_1(z) = \hat f_0( z + \hat \psi_1(z) ) - \hat f_0(z) + \hat \psi_1(\lambda z) - \hat \psi_1(\lambda z + \hat f_1(z)).$

From (6), the fundamental theorem of calculus, and the smallness of ${\delta_0}$ we have

$\displaystyle |\hat \psi_1(\lambda z) - \hat \psi_1(\lambda z + \hat f_1(z))| \leq \frac{1}{2} |\hat f_1(z)|$

and thus

$\displaystyle |\hat f_1(z)| \leq 2 |\hat f_0( z + \hat \psi_1(z) ) - \hat f_0(z)|.$

From (3) and the Cauchy integral formula we have ${\hat f'_0(z) = O( \delta_0 \eta_0^{-O(1)})}$ on (say) ${B(0,1-\eta_0/4)}$, and so from (6) and the fundamental theorem of calculus we conclude that

$\displaystyle |\hat f_1(z)| \ll \delta_0^2 \eta_0^{-O(1)}$

on ${B(0,1-\eta_0)}$, and the claim follows. $\Box$
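A single step of this scheme can be checked numerically. The sketch below assumes an illustrative map ${f_0(z) = \lambda z + \epsilon z^2}$ with golden-mean phase and ${\epsilon = 10^{-3}}$ (these choices, and all function names, are hypothetical). It builds ${\hat \psi_1(z) = \frac{a_2}{\lambda^2-\lambda} z^2}$ from the Taylor formula above, inverts ${\psi_1}$ by a simple fixed point iteration (a convenience of this sketch rather than a step of the proof), and compares the sizes of ${f_0(z)-\lambda z}$ and ${f_1(z) - \lambda z}$ on the circle ${|z| = 1/2}$.

```python
import cmath
import math

theta = (math.sqrt(5) - 1) / 2            # assumed Diophantine phase
lam = cmath.exp(2j * math.pi * theta)
eps = 1e-3                                # assumed size of the nonlinearity

def f0(z):
    return lam * z + eps * z * z          # hat f_0(z) = eps * z^2, so a_2 = eps

b2 = eps / (lam ** 2 - lam)               # Taylor coefficient a_2 / (lam^2 - lam)

def psi1(z):
    return z + b2 * z * z                 # psi_1 = identity + hat psi_1

def psi1_inv(y, iters=50):
    """Solve w + b2*w^2 = y for the root near y by fixed point iteration."""
    w = y
    for _ in range(iters):
        w = y - b2 * w * w
    return w

def f1(z):
    return psi1_inv(f0(psi1(z)))          # f_1 = psi_1^{-1} o f_0 o psi_1

circle = [0.5 * cmath.exp(2j * math.pi * k / 64) for k in range(64)]
err0 = max(abs(f0(z) - lam * z) for z in circle)
err1 = max(abs(f1(z) - lam * z) for z in circle)
# Defect in the linearised equation (5); it should vanish up to rounding.
defect5 = max(abs(b2 * (lam * z) ** 2 - lam * b2 * z * z - eps * z * z) for z in circle)

print("sup |f0(z) - lam z| on |z|=1/2:", err0)
print("sup |f1(z) - lam z| on |z|=1/2:", err1)
```

In this example the new error is of the order of ${\epsilon^2}$ rather than ${\epsilon}$, in line with the bound ${\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}$.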

If we set ${\eta_0 := 1/2}$, ${f_0 := f}$, and ${\delta_0>0}$ to be sufficiently small, then (since ${f(z)-\lambda z}$ vanishes to second order at the origin), the hypotheses of this lemma will be obeyed for some sufficiently small ${r_0}$. Iterating the lemma (and halving ${\eta_0}$ repeatedly), we can then find sequences ${\eta_n, \delta_n, r_n > 0}$, injective holomorphic functions ${\psi_n: B(0,r_n) \rightarrow B(0,r_{n-1})}$ and holomorphic functions ${f_n: B(0,r_n) \rightarrow {\bf C}}$ such that one has the recursive identities and estimates

$\displaystyle \eta_n = \eta_{n-1} / 2$

$\displaystyle r_n = (1 - \eta_{n-1}) r_{n-1}$

$\displaystyle \delta_n = O( \delta_{n-1}^2 \eta_{n-1}^{-O(1)} )$

$\displaystyle |\psi_n(z) - z| \ll \delta_{n-1} \eta_{n-1}^{-O(1)} r_n$

$\displaystyle |f_n(z) - \lambda z| \leq \delta_n r_n$

$\displaystyle f_{n-1}( \psi_n(z) ) = \psi_n(f_n(z))$

for all ${n \geq 1}$ and ${z \in B(0,r_n)}$. By construction, ${r_n}$ decreases to a positive radius ${r_\infty}$ that is a constant multiple of ${r_0}$, while (for ${\delta_0}$ small enough) ${\delta_n}$ converges double-exponentially to zero, so in particular ${f_n(z)}$ converges uniformly to ${\lambda z}$ on ${B(0,r_\infty)}$. Also, since ${\psi_n}$ is close enough to the identity, the compositions ${\Psi_n := \psi_1 \circ \dots \circ \psi_n}$ are uniformly convergent on ${B(0,r_\infty/2)}$ with ${\Psi_n(0)=0}$ and ${\Psi'_n(0)=1}$. From this we have

$\displaystyle f( \Psi_n(z) ) = \Psi_n(f_n(z))$

on ${B(0,r_\infty/4)}$, and on taking limits using Morera’s theorem we obtain a holomorphic function ${\Psi}$ defined near ${0}$ with ${\Psi(0)=0}$, ${\Psi'(0)=1}$, and

$\displaystyle f( \Psi(z) ) = \Psi(\lambda z),$

obtaining the required linearisation.

Remark 5 The idea of using a Newton-type method to obtain error terms that decay double-exponentially, and can therefore absorb exponential losses in the iteration, also occurs in KAM theory and in Nash-Moser iteration, presumably due to Siegel’s influence on Moser. (I discuss Nash-Moser iteration in this note that I wrote back in 2006.)

In Notes 2, the Riemann zeta function ${\zeta}$ (and more generally, the Dirichlet ${L}$-functions ${L(\cdot,\chi)}$) were extended meromorphically into the region ${\{ s: \hbox{Re}(s) > 0 \}}$ in and to the right of the critical strip. This is a sufficient amount of meromorphic continuation for many applications in analytic number theory, such as establishing the prime number theorem and its variants. The zeroes of the zeta function in the critical strip ${\{ s: 0 < \hbox{Re}(s) < 1 \}}$ are known as the non-trivial zeroes of ${\zeta}$, and thanks to the truncated explicit formulae developed in Notes 2, they control the asymptotic distribution of the primes (up to small errors).

The ${\zeta}$ function obeys the trivial functional equation

$\displaystyle \zeta(\overline{s}) = \overline{\zeta(s)} \ \ \ \ \ (1)$

for all ${s}$ in its domain of definition. Indeed, as ${\zeta(s)}$ is real-valued when ${s}$ is real, the function ${\zeta(s) - \overline{\zeta(\overline{s})}}$ vanishes on the real line and is also meromorphic, and hence vanishes everywhere. Similarly one has the functional equation

$\displaystyle \overline{L(s, \chi)} = L(\overline{s}, \overline{\chi}). \ \ \ \ \ (2)$

From these equations we see that the zeroes of the zeta function are symmetric across the real axis, and the zeroes of ${L(\cdot,\chi)}$ are the reflection of the zeroes of ${L(\cdot,\overline{\chi})}$ across this axis.

It is a remarkable fact that these functions obey an additional, and more non-trivial, functional equation, this time establishing a symmetry across the critical line ${\{ s: \hbox{Re}(s) = \frac{1}{2} \}}$ rather than the real axis. One consequence of this symmetry is that the zeta function and ${L}$-functions may be extended meromorphically to the entire complex plane. For the zeta function, the functional equation was discovered by Riemann, and reads as follows:

Theorem 1 (Functional equation for the Riemann zeta function) The Riemann zeta function ${\zeta}$ extends meromorphically to the entire complex plane, with a simple pole at ${s=1}$ and no other poles. Furthermore, one has the functional equation

$\displaystyle \zeta(s) = \alpha(s) \zeta(1-s) \ \ \ \ \ (3)$

or equivalently

$\displaystyle \zeta(1-s) = \alpha(1-s) \zeta(s) \ \ \ \ \ (4)$

for all complex ${s}$ other than ${s=0,1}$, where ${\alpha}$ is the function

$\displaystyle \alpha(s) := 2^s \pi^{s-1} \sin( \frac{\pi s}{2}) \Gamma(1-s). \ \ \ \ \ (5)$

Here ${\cos(z) := \frac{e^{iz} + e^{-iz}}{2}}$, ${\sin(z) := \frac{e^{iz}-e^{-iz}}{2i}}$ are the complex-analytic extensions of the classical trigonometric functions ${\cos(x), \sin(x)}$, and ${\Gamma}$ is the Gamma function, whose definition and properties we review below the fold.
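As a quick sanity check of (3) and (5), one can evaluate ${\alpha(s) \zeta(1-s)}$ at negative odd integers ${s}$, where ${1-s > 1}$ and the Dirichlet series for ${\zeta(1-s)}$ converges. The sketch below is a minimal illustration using only real arithmetic; the helper `zeta_real` (a partial sum with a few Euler-Maclaurin tail corrections) is a hypothetical stand-in and not a general-purpose zeta implementation. The comparison values ${\zeta(-1) = -\frac{1}{12}}$ and ${\zeta(-3) = \frac{1}{120}}$ are classical facts quoted from outside the text.

```python
import math

def zeta_real(s, N=1000):
    """zeta(s) for real s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = math.fsum(n ** -s for n in range(1, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return head + tail

def alpha(s):
    """The factor alpha(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s) from (5)."""
    return 2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2) * math.gamma(1 - s)

# Right-hand side of (3) at s = -1 and s = -3.
zeta_minus_1 = alpha(-1) * zeta_real(2)
zeta_minus_3 = alpha(-3) * zeta_real(4)

print("alpha(-1) * zeta(2) =", zeta_minus_1)
print("alpha(-3) * zeta(4) =", zeta_minus_3)
```

The same function `alpha` vanishes (up to rounding) at negative even integers because of the sine factor, which is the source of the trivial zeroes of ${\zeta}$.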

The functional equation can be placed in a more symmetric form as follows:

Corollary 2 (Functional equation for the Riemann xi function) The Riemann xi function

$\displaystyle \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s) \ \ \ \ \ (6)$

is analytic on the entire complex plane ${{\bf C}}$ (after removing all removable singularities), and obeys the functional equations

$\displaystyle \xi(\overline{s}) = \overline{\xi(s)}$

and

$\displaystyle \xi(s) = \xi(1-s). \ \ \ \ \ (7)$

In particular, the zeroes of ${\xi}$ consist precisely of the non-trivial zeroes of ${\zeta}$, and are symmetric about both the real axis and the critical line. Also, ${\xi}$ is real-valued on the critical line and on the real axis.

Corollary 2 is an easy consequence of Theorem 1 together with the duplication theorem for the Gamma function, and the fact that ${\zeta}$ has no zeroes to the right of the critical strip, and is left as an exercise to the reader (Exercise 19). The functional equation in Theorem 1 has many proofs, but most of them are related in one way or another to the Poisson summation formula

$\displaystyle \sum_n f(n) = \sum_m \hat f(2\pi m) \ \ \ \ \ (8)$

(Theorem 34 from Supplement 2, at least in the case when ${f}$ is twice continuously differentiable and compactly supported), which can be viewed as a Fourier-analytic link between the coarse-scale distribution of the integers and the fine-scale distribution of the integers. Indeed, there is a quick heuristic proof of the functional equation that comes from formally applying the Poisson summation formula to the function ${1_{x>0} \frac{1}{x^s}}$, and noting that the functions ${x \mapsto \frac{1}{x^s}}$ and ${\xi \mapsto \frac{1}{\xi^{1-s}}}$ are formally Fourier transforms of each other, up to some Gamma function factors, as well as some trigonometric factors arising from the distinction between the real line and the half-line. Such a heuristic proof can indeed be made rigorous, and we do so below the fold, while also providing Riemann’s two classical proofs of the functional equation.

From the functional equation (and the poles of the Gamma function), one can see that ${\zeta}$ has trivial zeroes at the negative even integers ${-2,-4,-6,\dots}$, in addition to the non-trivial zeroes in the critical strip. More generally, the following table summarises the zeroes and poles of the various special functions appearing in the functional equation, after they have been meromorphically extended to the entire complex plane, and with zeroes classified as “non-trivial” or “trivial” depending on whether they lie in the critical strip or not. (Exponential functions such as ${2^{s-1}}$ or ${\pi^{-s}}$ have no zeroes or poles, and will be ignored in this table; the zeroes and poles of rational functions such as ${s(s-1)}$ are self-evident and will also not be displayed here.)

| Function | Non-trivial zeroes | Trivial zeroes | Poles |
|---|---|---|---|
| ${\zeta(s)}$ | Yes | ${-2,-4,-6,\dots}$ | ${1}$ |
| ${\zeta(1-s)}$ | Yes | ${3,5,7,\dots}$ | ${0}$ |
| ${\sin(\pi s/2)}$ | No | Even integers | No |
| ${\cos(\pi s/2)}$ | No | Odd integers | No |
| ${\sin(\pi s)}$ | No | Integers | No |
| ${\Gamma(s)}$ | No | No | ${0,-1,-2,\dots}$ |
| ${\Gamma(s/2)}$ | No | No | ${0,-2,-4,\dots}$ |
| ${\Gamma(1-s)}$ | No | No | ${1,2,3,\dots}$ |
| ${\Gamma((1-s)/2)}$ | No | No | ${1,3,5,\dots}$ |
| ${\xi(s)}$ | Yes | No | No |

Among other things, this table indicates that the Gamma and trigonometric factors in the functional equation are tied to the trivial zeroes and poles of zeta, but have no direct bearing on the distribution of the non-trivial zeroes, which is the most important feature of the zeta function for the purposes of analytic number theory, beyond the fact that they are symmetric about the real axis and critical line. In particular, the Riemann hypothesis is not going to be resolved just from further analysis of the Gamma function!

The zeta function computes the “global” sum ${\sum_n \frac{1}{n^s}}$, with ${n}$ ranging all the way from ${1}$ to infinity. However, by some Fourier-analytic (or complex-analytic) manipulation, it is possible to use the zeta function to also control more “localised” sums, such as ${\sum_n \frac{1}{n^s} \psi(\log n - \log N)}$ for some ${N \gg 1}$ and some smooth compactly supported function ${\psi: {\bf R} \rightarrow {\bf C}}$. It turns out that the functional equation (3) for the zeta function localises to this context, giving an approximate functional equation which roughly speaking takes the form

$\displaystyle \sum_n \frac{1}{n^s} \psi( \log n - \log N ) \approx \alpha(s) \sum_m \frac{1}{m^{1-s}} \psi( \log M - \log m )$

whenever ${s=\sigma+it}$ and ${NM = \frac{|t|}{2\pi}}$; see Theorem 38 below for a precise formulation of this equation. Unsurprisingly, this form of the functional equation is also very closely related to the Poisson summation formula (8), indeed it is essentially a special case of that formula (or more precisely, of the van der Corput ${B}$-process). This useful identity relates long smoothed sums of ${\frac{1}{n^s}}$ to short smoothed sums of ${\frac{1}{m^{1-s}}}$ (or vice versa), and can thus be used to shorten exponential sums involving terms such as ${\frac{1}{n^s}}$, which is useful when obtaining some of the more advanced estimates on the Riemann zeta function.

We will give two other basic uses of the functional equation. The first is to get a good count (as opposed to merely an upper bound) on the density of zeroes in the critical strip, establishing the Riemann-von Mangoldt formula that the number ${N(T)}$ of zeroes of imaginary part between ${0}$ and ${T}$ is ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}$ for large ${T}$. The other is to obtain untruncated versions of the explicit formula from Notes 2, giving a remarkable exact formula for sums involving the von Mangoldt function in terms of zeroes of the Riemann zeta function. These results are not strictly necessary for most of the material in the rest of the course, but certainly help to clarify the nature of the Riemann zeta function and its relation to the primes.
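The main term of the Riemann-von Mangoldt formula is simple to evaluate. The sketch below computes ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi}}$ for a given ${T}$; the function name is hypothetical. For comparison, standard tables (quoted from outside the text, not derived here) list ${29}$ zeroes with imaginary part between ${0}$ and ${100}$.

```python
import math

def riemann_von_mangoldt_main_term(T):
    """Main term (T/2pi) log(T/2pi) - T/2pi of the zero-counting formula."""
    x = T / (2 * math.pi)
    return x * math.log(x) - x

main_100 = riemann_von_mangoldt_main_term(100.0)
tabulated_count_100 = 29      # tabulated value, an outside assumption

print("main term at T = 100:", main_100)
print("difference from tabulated count:", tabulated_count_100 - main_100)
```

The difference is well within the ${O(\log T)}$ error term, as the formula predicts.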

In view of the material in previous notes, it should not be surprising that there are analogues of all of the above theory for Dirichlet ${L}$-functions ${L(\cdot,\chi)}$. We will restrict attention to primitive characters ${\chi}$, since the ${L}$-function for imprimitive characters merely differs from the ${L}$-function of the associated primitive factor by a finite Euler product; indeed, if ${\chi = \chi' \chi_0}$ for some principal ${\chi_0}$ whose modulus ${q_0}$ is coprime to that of ${\chi'}$, then

$\displaystyle L(s,\chi) = L(s,\chi') \prod_{p|q_0} (1 - \frac{\chi'(p)}{p^s}) \ \ \ \ \ (9)$

(cf. equation (45) of Notes 2).

The main new feature is that the Poisson summation formula needs to be “twisted” by a Dirichlet character ${\chi}$, and this boils down to the problem of understanding the finite (additive) Fourier transform of a Dirichlet character. This is achieved by the classical theory of Gauss sums, which we review below the fold. There is one new wrinkle; the value of ${\chi(-1) \in \{-1,+1\}}$ plays a role in the functional equation. More precisely, we have

Theorem 3 (Functional equation for ${L}$-functions) Let ${\chi}$ be a primitive character of modulus ${q}$ with ${q>1}$. Then ${L(s,\chi)}$ extends to an entire function on the complex plane, with

$\displaystyle L(s,\chi) = \varepsilon(\chi) 2^s \pi^{s-1} q^{1/2-s} \sin(\frac{\pi}{2}(s+\kappa)) \Gamma(1-s) L(1-s,\overline{\chi})$

or equivalently

$\displaystyle L(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) 2^{1-s} \pi^{-s} q^{s-1/2} \sin(\frac{\pi}{2}(1-s+\kappa)) \Gamma(s) L(s,\chi)$

for all ${s}$, where ${\kappa}$ is equal to ${0}$ in the even case ${\chi(-1)=+1}$ and ${1}$ in the odd case ${\chi(-1)=-1}$, and

$\displaystyle \varepsilon(\chi) := \frac{\tau(\chi)}{i^\kappa \sqrt{q}} \ \ \ \ \ (10)$

where ${\tau(\chi)}$ is the Gauss sum

$\displaystyle \tau(\chi) := \sum_{n \in {\bf Z}/q{\bf Z}} \chi(n) e(n/q). \ \ \ \ \ (11)$

and ${e(x) := e^{2\pi ix}}$, with the convention that the ${q}$-periodic function ${n \mapsto e(n/q)}$ is also (by abuse of notation) applied to ${n}$ in the cyclic group ${{\bf Z}/q{\bf Z}}$.
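The quantities ${\tau(\chi)}$, ${\kappa}$ and ${\varepsilon(\chi)}$ in (10), (11) can be computed directly for small moduli. The sketch below assumes the quadratic (Legendre symbol) character modulo an odd prime ${p}$ as the example of a primitive character; the function names are hypothetical. For these real characters, Gauss's classical evaluation of the quadratic Gauss sum (a fact quoted from outside the text) gives ${|\tau(\chi)| = \sqrt{p}}$ and ${\varepsilon(\chi) = 1}$, which the code reproduces numerically.

```python
import cmath
import math

def legendre(n, p):
    """Quadratic character mod an odd prime p (the Legendre symbol)."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

def gauss_sum(chi, q):
    """tau(chi) = sum over n mod q of chi(n) e(n/q), as in (11)."""
    return sum(chi(n) * cmath.exp(2j * math.pi * n / q) for n in range(q))

def root_number(chi, q):
    """epsilon(chi) = tau(chi) / (i^kappa sqrt(q)), as in (10)."""
    kappa = 0 if chi(-1) == 1 else 1      # even or odd character
    return gauss_sum(chi, q) / (1j ** kappa * math.sqrt(q))

results = {}
for p in (5, 7, 13):
    chi = lambda n, p=p: legendre(n, p)
    results[p] = (gauss_sum(chi, p), root_number(chi, p))
    print(p, results[p])
```

For complex primitive characters ${\varepsilon(\chi)}$ is still of modulus one but is in general not equal to ${1}$.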

From this functional equation and (2) we see that, as with the Riemann zeta function, the non-trivial zeroes of ${L(s,\chi)}$ (defined as the zeroes within the critical strip ${\{ s: 0 < \hbox{Re}(s) < 1 \}}$) are symmetric around the critical line (and, if ${\chi}$ is real, are also symmetric around the real axis). In addition, ${L(s,\chi)}$ acquires trivial zeroes at the negative even integers and at zero if ${\chi(-1)=1}$, and at the negative odd integers if ${\chi(-1)=-1}$. For imprimitive ${\chi}$, we see from (9) that ${L(s,\chi)}$ also acquires some additional trivial zeroes on the left edge of the critical strip.

There is also a symmetric version of this equation, analogous to Corollary 2:

Corollary 4 Let ${\chi,q,\varepsilon(\chi)}$ be as above, and set

$\displaystyle \xi(s,\chi) := (q/\pi)^{(s+\kappa)/2} \Gamma((s+\kappa)/2) L(s,\chi),$

then ${\xi(\cdot,\chi)}$ is entire with ${\xi(s,\chi) = \varepsilon(\chi) \xi(1-s,\overline{\chi})}$.

For further detail on the functional equation and its implications, I recommend the classic text of Titchmarsh or the text of Davenport.

In Notes 1, we approached multiplicative number theory (the study of multiplicative functions ${f: {\bf N} \rightarrow {\bf C}}$ and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions ${\sum_{n \leq x} f(n)}$ and logarithmic sums ${\sum_{n \leq x} \frac{f(n)}{n}}$. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series ${{\mathcal D} f}$, defined (at least for ${s}$ of sufficiently large real part) by the formula

$\displaystyle {\mathcal D} f(s) := \sum_n \frac{f(n)}{n^s}.$

These series also made an appearance in the elementary approach to the subject, but only for real ${s}$ that were larger than ${1}$. But now we will exploit the freedom to extend the variable ${s}$ to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as ${\sum_{n\leq x} f(n)}$ or ${\sum_{n\leq x} \frac{f(n)}{n}}$ from control on the Dirichlet series. Crucially, for many key functions ${f}$ of number-theoretic interest, the Dirichlet series ${{\mathcal D} f}$ can be analytically (or at least meromorphically) continued to the left of the line ${\{ s: \hbox{Re}(s) = 1 \}}$. The zeroes and poles of the resulting meromorphic continuations of ${{\mathcal D} f}$ (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of ${f}$; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function ${\zeta}$ are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function ${\Lambda}$ (and hence to the primes) of the form

$\displaystyle \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (1)$

where the sum is over zeroes ${\rho}$ (counting multiplicity) of the Riemann zeta function ${\zeta = {\mathcal D} 1}$ (with the sum often restricted so that ${\rho}$ has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that

$\displaystyle \sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\ dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\ dy \ \ \ \ \ (2)$

for suitable “test functions” ${g}$ (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (3)$

as ${x \rightarrow \infty}$, with the size of the error term ${o(x)}$ closely tied to the location of the zeroes ${\rho}$ of the Riemann zeta function.
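As a sanity check on (3), one can tabulate the von Mangoldt function with a sieve and compare ${\sum_{n \leq x} \Lambda(n)}$ with ${x}$. The following is a minimal sketch; the names `von_mangoldt_table` and `chebyshev_psi` are ours, and a numerical experiment of this sort is of course no substitute for a proof.

```python
import math

def von_mangoldt_table(N):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= N.

    Lambda(n) = log p if n = p^k for a prime p and some k >= 1, else 0.
    """
    lam = [0.0] * (N + 1)
    is_composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if is_composite[p]:
            continue
        # p is prime: mark its multiples, then record log p at its powers.
        for m in range(p * p, N + 1, p):
            is_composite[m] = True
        pk = p
        while pk <= N:
            lam[pk] = math.log(p)
            pk *= p
    return lam

def chebyshev_psi(x):
    """The summatory function sum_{n <= x} Lambda(n) for an integer x."""
    return sum(von_mangoldt_table(x))

for x in (10, 1000, 100_000):
    print(x, chebyshev_psi(x), chebyshev_psi(x) / x)
```

Note that ${\exp(\sum_{n \leq x} \Lambda(n))}$ is the least common multiple of ${1,\dots,x}$, which gives an exact check for small ${x}$.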

The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation

$\displaystyle -\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \ \ \ \ \ (4)$

for the Dirichlet series ${{\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}}$ of the von Mangoldt function; note that (4) is formally the special case of (2) when ${g(n) = n^{-s}}$. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.

More generally, one has an explicit formula

$\displaystyle \Lambda(n) \chi(n) \approx - \sum_\rho n^{\rho-1} \ \ \ \ \ (5)$

for any (non-principal) Dirichlet character ${\chi}$, where ${\rho}$ now ranges over the zeroes of the associated Dirichlet ${L}$-function ${L(s,\chi) := {\mathcal D} \chi(s)}$; we view this formula as a “twist” of (1) by the Dirichlet character ${\chi}$. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that

$\displaystyle \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \ \ \ \ \ (6)$

as ${x \rightarrow \infty}$, whenever ${a\ (q)}$ is a fixed primitive residue class. Again, the size of the error term ${o(x)}$ here is closely tied to the location of the zeroes of the Dirichlet ${L}$-function, with particular importance given to whether there is a zero very close to ${s=1}$ (such a zero is known as an exceptional zero or Siegel zero).
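The same kind of experiment can be run for (6). The sketch below (with hypothetical helper names `psi_in_progression` and `euler_phi`) sums ${\Lambda(n)}$ over a residue class and compares the result with ${x/\phi(q)}$; it illustrates the statement numerically and says nothing about the size of the error term.

```python
import math

def psi_in_progression(x, a, q):
    """Sum of Lambda(n) over n <= x with n = a (mod q), via a simple sieve."""
    total = 0.0
    is_composite = [False] * (x + 1)
    for p in range(2, x + 1):
        if is_composite[p]:
            continue
        for m in range(p * p, x + 1, p):
            is_composite[m] = True
        # Each prime power p^k <= x contributes log p to its residue class.
        pk = p
        while pk <= x:
            if pk % q == a % q:
                total += math.log(p)
            pk *= p
    return total

def euler_phi(q):
    """Euler totient: the number of primitive residue classes mod q."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

x, q = 100_000, 4
for a in (1, 3):
    print(a, psi_in_progression(x, a, q), x / euler_phi(q))
```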

While any information on the behaviour of zeta functions or ${L}$-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are

1. The region on or near the point ${s=1}$.
2. The region on or near the right edge ${\{ 1+it: t \in {\bf R} \}}$ of the critical strip ${\{ s: 0 \leq \hbox{Re}(s) \leq 1 \}}$.
3. The right half ${\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}}$ of the critical strip.
4. The region on or near the critical line ${\{ \frac{1}{2} + it: t \in {\bf R} \}}$ that bisects the critical strip.
5. Everywhere else.

For instance:

1. We will shortly show that the Riemann zeta function ${\zeta}$ has a simple pole at ${s=1}$ with residue ${1}$, which is already sufficient to recover many of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function ${\tau}$. For Dirichlet ${L}$-functions, the behaviour is instead controlled by the quantity ${L(1,\chi)}$ discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
2. The zeta function is also known to have no zeroes on the right edge ${\{1+it: t \in {\bf R}\}}$ of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for ${\zeta}$ into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for ${L}$-functions and the prime number theorem in arithmetic progressions.
3. The (as yet unproven) Riemann hypothesis prohibits ${\zeta}$ from having any zeroes within the right half ${\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}}$ of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for ${L}$-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
4. Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for ${L}$-functions and primes in short arithmetic progressions.
5. The functional equation of the zeta function describes the behaviour of ${\zeta}$ to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for ${L}$-functions.

Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function ${\zeta(\sigma+it) = \sum_n n^{-\sigma-it}}$ and all of the ${L}$-functions ${L(\sigma+it,\chi) = \sum_n \chi(n) n^{-\sigma-it}}$ for various Dirichlet characters ${\chi}$ into a single object, viewing ${n \mapsto \chi(n) n^{-it}}$ as a general multiplicative character on the adeles; thus the imaginary coordinate ${t}$ and the Dirichlet character ${\chi}$ are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet ${L}$-functions. (The non-Archimedean character ${\chi(n)}$ and the Archimedean character ${n^{it}}$ behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)

Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.

As a consequence of this hierarchy of importance, information about the ${\zeta}$ function away from the critical strip, such as Euler’s identity

$\displaystyle \zeta(2) = \frac{\pi^2}{6}$

or equivalently

$\displaystyle 1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}$

or the infamous identity

$\displaystyle \zeta(-1) = -\frac{1}{12},$

which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as

$\displaystyle 1 + 2 + 3 + \dots = -\frac{1}{12},$

is of relatively little direct importance in analytic prime number theory, although it is still of interest for some other, non-number-theoretic, applications. (The quantity ${\zeta(2)}$ does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value ${L(1,\chi)}$ of an ${L}$-function at ${s=1}$ turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower bound on this quantity coming from Siegel’s theorem, discussed below the fold.

For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory”.

We will shortly turn to the complex-analytic approach to multiplicative number theory, which relies on the basic properties of complex analytic functions. In this supplement to the main notes, we quickly review the portions of complex analysis that we will be using in this course. We will not attempt a comprehensive review of this subject; for instance, we will completely neglect the conformal geometry or Riemann surface aspect of complex analysis, and we will also avoid using the various boundary convergence theorems for Taylor series or Dirichlet series (the latter type of result is traditionally utilised in multiplicative number theory, but I personally find them a little unintuitive to use, and will instead rely on a slightly different set of complex-analytic tools). We will also focus on the “local” structure of complex analytic functions, in particular adopting the philosophy that such functions behave locally like complex polynomials; the classical “global” theory of entire functions, while traditionally used in the theory of the Riemann zeta function, will be downplayed in these notes. On the other hand, we will play up the relationship between complex analysis and Fourier analysis, as we will incline to using the latter tool over the former in some of the subsequent material. (In the traditional approach to the subject, the Mellin transform is used in place of the Fourier transform, but we will not emphasise the role of the Mellin transform here.)

We begin by recalling the notion of a holomorphic function, which will later be shown to be essentially synonymous with that of a complex analytic function.

Definition 1 (Holomorphic function) Let ${\Omega}$ be an open subset of ${{\bf C}}$, and let ${f: \Omega \rightarrow {\bf C}}$ be a function. If ${z \in \Omega}$, we say that ${f}$ is complex differentiable at ${z}$ if the limit

$\displaystyle f'(z) := \lim_{h \rightarrow 0; h \in {\bf C} \backslash \{0\}} \frac{f(z+h)-f(z)}{h}$

exists, in which case we refer to ${f'(z)}$ as the (complex) derivative of ${f}$ at ${z}$. If ${f}$ is differentiable at every point ${z}$ of ${\Omega}$, and the derivative ${f': \Omega \rightarrow {\bf C}}$ is continuous, we say that ${f}$ is holomorphic on ${\Omega}$.

Exercise 2 Show that a function ${f: \Omega \rightarrow {\bf C}}$ is holomorphic if and only if the two-variable function ${(x,y) \mapsto f(x+iy)}$ is continuously differentiable on ${\{ (x,y) \in {\bf R}^2: x+iy \in \Omega\}}$ and obeys the Cauchy-Riemann equation

$\displaystyle \frac{\partial}{\partial x} f(x+iy) = \frac{1}{i} \frac{\partial}{\partial y} f(x+iy). \ \ \ \ \ (1)$
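The Cauchy-Riemann equation is easy to probe numerically with finite differences. The sketch below uses a hypothetical helper `cauchy_riemann_defect` that measures the gap between the two sides of the equation at a point; it should be tiny for a holomorphic function such as ${\exp}$, and of size about ${2}$ for the non-holomorphic function ${z \mapsto \overline{z}}$.

```python
import cmath

def cauchy_riemann_defect(f, x, y, h=1e-6):
    """|d/dx f(x+iy) - (1/i) d/dy f(x+iy)|, estimated by central differences."""
    dfdx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    dfdy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return abs(dfdx - dfdy / 1j)

# Holomorphic example: the defect is at the level of rounding error.
print(cauchy_riemann_defect(cmath.exp, 0.3, 0.7))

# Non-holomorphic example: complex conjugation violates the equation.
print(cauchy_riemann_defect(lambda z: z.conjugate(), 0.3, 0.7))
```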

Basic examples of holomorphic functions include complex polynomials

$\displaystyle P(z) = a_n z^n + \dots + a_1 z + a_0$

as well as the complex exponential function

$\displaystyle \exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!}$

which are holomorphic on the entire complex plane ${{\bf C}}$ (i.e., they are entire functions). The sum or product of two holomorphic functions is again holomorphic; the quotient of two holomorphic functions is holomorphic so long as the denominator is non-zero. Finally, the composition of two holomorphic functions is holomorphic wherever the composition is defined.

Exercise 3

• (i) Establish Euler’s formula

$\displaystyle \exp(x+iy) = e^x (\cos y + i \sin y)$

for all ${x,y \in {\bf R}}$. (Hint: it is a bit tricky to do this starting from the trigonometric definitions of sine and cosine; I recommend either using the Taylor series formulations of these functions instead, or alternatively relying on the ordinary differential equations obeyed by sine and cosine.)

• (ii) Show that every non-zero complex number ${z}$ has a complex logarithm ${\log(z)}$ such that ${\exp(\log(z))=z}$, and that this logarithm is unique up to integer multiples of ${2\pi i}$.
• (iii) Show that there exists a unique principal branch ${\hbox{Log}(z)}$ of the complex logarithm in the region ${{\bf C} \backslash (-\infty,0]}$, defined by requiring ${\hbox{Log}(z)}$ to be a logarithm of ${z}$ with imaginary part between ${-\pi}$ and ${\pi}$. Show that this principal branch is holomorphic with derivative ${1/z}$.
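The three parts of this exercise can be checked numerically (which is not a proof) using Python's standard `cmath` module, whose `cmath.log` implements the principal branch with a cut along the negative real axis. The helper names below are illustrative assumptions.

```python
import cmath
import math

def euler_rhs(x, y):
    """Right-hand side e^x (cos y + i sin y) of Euler's formula."""
    return math.exp(x) * complex(math.cos(y), math.sin(y))

def log_branches(z, ks=(-1, 0, 1)):
    """A few logarithms of z: the principal one shifted by 2 pi i k."""
    return [cmath.log(z) + 2j * math.pi * k for k in ks]

def numerical_derivative(g, z, h=1e-6):
    """Central difference quotient of g at z along the real direction."""
    return (g(z + h) - g(z - h)) / (2 * h)

z = complex(-1.5, 2.0)
print(cmath.exp(complex(0.5, 2.0)), euler_rhs(0.5, 2.0))   # part (i)
print([cmath.exp(w) for w in log_branches(z)])             # part (ii): all equal z
print(numerical_derivative(cmath.log, z), 1 / z)           # part (iii)
```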

In real analysis, we have the fundamental theorem of calculus, which asserts that

$\displaystyle \int_a^b F'(t)\ dt = F(b) - F(a)$

whenever ${[a,b]}$ is a real interval and ${F: [a,b] \rightarrow {\bf R}}$ is a continuously differentiable function. The complex analogue of this fact is that

$\displaystyle \int_\gamma F'(z)\ dz = F(\gamma(1)) - F(\gamma(0)) \ \ \ \ \ (2)$

whenever ${F: \Omega \rightarrow {\bf C}}$ is a holomorphic function, and ${\gamma: [0,1] \rightarrow \Omega}$ is a contour in ${\Omega}$, by which we mean a piecewise continuously differentiable function, and the contour integral ${\int_\gamma f(z)\ dz}$ for a continuous function ${f}$ is defined via change of variables as

$\displaystyle \int_\gamma f(z)\ dz := \int_0^1 f(\gamma(t)) \gamma'(t)\ dt.$

The complex fundamental theorem of calculus (2) follows easily from the real fundamental theorem and the chain rule.
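Here is a small numerical illustration of the definition of the contour integral and of (2). The function `contour_integral` is a hypothetical helper applying the midpoint rule to ${\int_0^1 f(\gamma(t)) \gamma'(t)\ dt}$; the contour (a parabolic arc from ${0}$ to ${1+i}$) and the choice ${F(z) = z^3}$ are arbitrary test data.

```python
def contour_integral(f, gamma, dgamma, n=20_000):
    """Midpoint-rule approximation of int_0^1 f(gamma(t)) gamma'(t) dt."""
    h = 1.0 / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * h
        total += f(gamma(t)) * dgamma(t)
    return total * h

F = lambda z: z ** 3            # primitive
dF = lambda z: 3 * z ** 2       # its complex derivative
gamma = lambda t: complex(t, t * t)        # parabola from 0 to 1+i
dgamma = lambda t: complex(1.0, 2.0 * t)   # derivative of gamma

lhs = contour_integral(dF, gamma, dgamma)
rhs = F(gamma(1.0)) - F(gamma(0.0))
print(lhs, rhs)
```

Replacing `gamma` by any other contour with the same endpoints should leave `lhs` unchanged up to discretisation error, in line with (2).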

In real analysis, we have the rather trivial fact that the integral of a continuous function on a closed contour is always zero:

$\displaystyle \int_a^b f(t)\ dt + \int_b^a f(t)\ dt = 0.$

In complex analysis, the analogous fact is significantly more powerful, and is known as Cauchy’s theorem:

Theorem 4 (Cauchy’s theorem) Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function in a simply connected open set ${\Omega}$, and let ${\gamma: [0,1] \rightarrow \Omega}$ be a closed contour in ${\Omega}$ (thus ${\gamma(1)=\gamma(0)}$). Then ${\int_\gamma f(z)\ dz = 0}$.

Exercise 5 Use Stokes’ theorem to give a proof of Cauchy’s theorem.

A useful reformulation of Cauchy’s theorem is that of contour shifting: if ${f: \Omega \rightarrow {\bf C}}$ is a holomorphic function on an open set ${\Omega}$, and ${\gamma, \tilde \gamma}$ are two contours in ${\Omega}$ with ${\gamma(0)=\tilde \gamma(0)}$ and ${\gamma(1) = \tilde \gamma(1)}$, such that ${\gamma}$ can be continuously deformed into ${\tilde \gamma}$, then ${\int_\gamma f(z)\ dz = \int_{\tilde \gamma} f(z)\ dz}$. A basic application of contour shifting is the Cauchy integral formula:

Theorem 6 (Cauchy integral formula) Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function in a simply connected open set ${\Omega}$, and let ${\gamma: [0,1] \rightarrow \Omega}$ be a closed contour which is simple (thus ${\gamma}$ does not traverse any point more than once, with the exception of the endpoint ${\gamma(0)=\gamma(1)}$ that is traversed twice), and which encloses a bounded region ${U}$ in the anticlockwise direction. Then for any ${z_0 \in U}$, one has

$\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz= 2\pi i f(z_0).$

Proof: Let ${\varepsilon > 0}$ be a sufficiently small quantity. By contour shifting, one can replace the contour ${\gamma}$ by the sum (concatenation) of three contours: a contour ${\rho}$ from ${\gamma(0)}$ to ${z_0+\varepsilon}$, a contour ${C_\varepsilon}$ traversing the circle ${\{z: |z-z_0|=\varepsilon\}}$ once anticlockwise, and the reversal ${-\rho}$ of the contour ${\rho}$ that goes from ${z_0+\varepsilon}$ to ${\gamma(0)}$. The contributions of the contours ${\rho, -\rho}$ cancel each other, thus

$\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz = \int_{C_\varepsilon} \frac{f(z)}{z-z_0}\ dz.$

By a change of variables, the right-hand side can be expanded as

$\displaystyle 2\pi i \int_0^1 f(z_0 + \varepsilon e^{2\pi i t})\ dt.$

Sending ${\varepsilon \rightarrow 0}$, we obtain the claim. $\Box$
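The Cauchy integral formula can be tested numerically on a circular contour. In the sketch below, `circle_integral` is a hypothetical helper that applies the trapezoid rule in the angular variable (which converges very rapidly for integrands holomorphic near the circle); the test function ${\exp}$ and the points ${z_0}$ are arbitrary choices.

```python
import cmath
import math

def circle_integral(g, center, radius, n=2000):
    """Approximate the anticlockwise integral of g over |z - center| = radius."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = center + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += g(z) * dz
    return total

f = cmath.exp
z_inside = complex(0.3, 0.2)    # enclosed by the unit circle
z_outside = complex(2.0, 0.0)   # not enclosed

inside = circle_integral(lambda z: f(z) / (z - z_inside), 0, 1.0)
outside = circle_integral(lambda z: f(z) / (z - z_outside), 0, 1.0)
print(inside, 2j * math.pi * f(z_inside))
print(outside)  # Cauchy's theorem: the integrand is holomorphic inside
```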

The Cauchy integral formula has many consequences. Specialising to the case when ${\gamma}$ traverses a circle ${\{ z: |z-z_0|=r\}}$ around ${z_0}$, we conclude the mean value property

$\displaystyle f(z_0) = \int_0^1 f(z_0 + re^{2\pi i t})\ dt \ \ \ \ \ (3)$

whenever ${f}$ is holomorphic in a neighbourhood of the disk ${\{ z: |z-z_0| \leq r \}}$. In a similar spirit, we have the maximum principle for holomorphic functions:

Lemma 7 (Maximum principle) Let ${\Omega}$ be a simply connected open set, and let ${\gamma}$ be a simple closed contour in ${\Omega}$ enclosing a bounded region ${U}$ anti-clockwise. Let ${f: \Omega \rightarrow {\bf C}}$ be a holomorphic function. If we have the bound ${|f(z)| \leq M}$ for all ${z}$ on the contour ${\gamma}$, then we also have the bound ${|f(z_0)| \leq M}$ for all ${z_0 \in U}$.

Proof: We use an argument of Landau. Fix ${z_0 \in U}$. From the Cauchy integral formula and the triangle inequality we have the bound

$\displaystyle |f(z_0)| \leq C_{z_0,\gamma} M$

for some constant ${C_{z_0,\gamma} > 0}$ depending on ${z_0}$ and ${\gamma}$. This ostensibly looks like a weaker bound than what we want, but we can miraculously make the constant ${C_{z_0,\gamma}}$ disappear by the “tensor power trick”. Namely, observe that if ${f}$ is a holomorphic function bounded in magnitude by ${M}$ on ${\gamma}$, and ${n}$ is a natural number, then ${f^n}$ is a holomorphic function bounded in magnitude by ${M^n}$ on ${\gamma}$. Applying the preceding argument with ${f, M}$ replaced by ${f^n, M^n}$ we conclude that

$\displaystyle |f(z_0)|^n \leq C_{z_0,\gamma} M^n$

and hence

$\displaystyle |f(z_0)| \leq C_{z_0,\gamma}^{1/n} M.$

Sending ${n \rightarrow \infty}$, we obtain the claim. $\Box$
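As a quick illustration (not a proof), one can sample a holomorphic function on the unit circle and on a grid of interior points and compare the maxima, and also watch the constant ${C^{1/n}}$ from the tensor power trick tend to ${1}$. The test function ${f(z) = e^z + z^2}$, the grid spacing, and the value ${C = 10}$ are arbitrary assumptions made for this sketch.

```python
import cmath
import math

def f(z):
    return cmath.exp(z) + z * z

# Sample |f| on the unit circle (the contour gamma).
boundary_max = max(abs(f(cmath.exp(2j * math.pi * k / 2000))) for k in range(2000))

# Sample |f| on a grid of points strictly inside the unit disk.
interior = [complex(0.05 * a, 0.05 * b)
            for a in range(-19, 20) for b in range(-19, 20)
            if math.hypot(0.05 * a, 0.05 * b) < 0.95]
interior_max = max(abs(f(z)) for z in interior)
print(interior_max, boundary_max)

# Tensor power trick: a fixed constant C disappears after taking n-th roots.
print([10.0 ** (1.0 / n) for n in (1, 10, 100, 1000)])
```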

Another basic application of the integral formula is

Corollary 8 Every holomorphic function ${f: \Omega \rightarrow {\bf C}}$ is complex analytic, thus it has a convergent Taylor series around every point ${z_0}$ in the domain. In particular, holomorphic functions are smooth, and the derivative of a holomorphic function is again holomorphic.

Conversely, it is easy to see that complex analytic functions are holomorphic. Thus, the terms “complex analytic” and “holomorphic” are synonymous, at least when working on open domains. (On a non-open set ${\Omega}$, saying that ${f}$ is analytic on ${\Omega}$ is equivalent to asserting that ${f}$ extends to a holomorphic function of an open neighbourhood of ${\Omega}$.) This is in marked contrast to real analysis, in which a function can be continuously differentiable, or even smooth, without being real analytic.

Proof: By translation, we may suppose that ${z_0=0}$. Let ${C_r}$ be a contour traversing the circle ${\{ z: |z|=r\}}$ that is contained in the domain ${\Omega}$; then by the Cauchy integral formula one has

$\displaystyle f(z) = \frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w-z}\ dw$

for all ${z}$ in the disk ${\{ z: |z| < r \}}$. As ${f}$ is continuously differentiable (and hence continuous) on ${C_r}$, it is bounded. From the geometric series formula

$\displaystyle \frac{1}{w-z} = \frac{1}{w} + \frac{1}{w^2} z + \frac{1}{w^3} z^2 + \dots$

and dominated convergence, we conclude that

$\displaystyle f(z) = \sum_{n=0}^\infty (\frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w^{n+1}}\ dw) z^n$

with the right-hand side an absolutely convergent series for ${|z| < r}$, and the claim follows. $\Box$

Exercise 9 Establish the generalised Cauchy integral formulae

$\displaystyle f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\gamma \frac{f(z)}{(z-z_0)^{k+1}}\ dz$

for any non-negative integer ${k}$, where ${f^{(k)}}$ is the ${k}$-fold complex derivative of ${f}$.
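The Taylor coefficients ${\frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w^{n+1}}\ dw}$ appearing in the proof of Corollary 8, and hence the derivatives ${f^{(k)}(0)}$ in this exercise, can be computed numerically. The sketch below parametrises ${w = re^{i\theta}}$, so that ${dw = i w\ d\theta}$ and the integral becomes an average of ${f(w)/w^k}$ over the circle; `taylor_coefficient` is a hypothetical helper name, and ${\exp}$ is used as test data since its coefficients ${1/k!}$ are known.

```python
import cmath
import math

def taylor_coefficient(f, k, r=1.0, n=2000):
    """k-th Taylor coefficient of f at 0 via the Cauchy integral over |w| = r.

    With w = r e^{i theta} one has dw = i w d(theta), so
    (1/(2 pi i)) * int f(w) / w^(k+1) dw  is the mean of f(w) / w^k.
    """
    total = 0j
    for j in range(n):
        w = r * cmath.exp(2j * math.pi * j / n)
        total += f(w) / w ** k
    return total / n

for k in range(6):
    a_k = taylor_coefficient(cmath.exp, k)
    # f^{(k)}(0) = k! * a_k, which should be 1 for the exponential function.
    print(k, a_k, math.factorial(k) * a_k)
```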

This in turn leads to a converse to Cauchy’s theorem, known as Morera’s theorem:

Corollary 10 (Morera’s theorem) Let ${f: \Omega \rightarrow {\bf C}}$ be a continuous function on an open set ${\Omega}$ with the property that ${\int_\gamma f(z)\ dz = 0}$ for all closed contours ${\gamma: [0,1] \rightarrow \Omega}$. Then ${f}$ is holomorphic.

Proof: We can of course assume ${\Omega}$ to be non-empty and connected (hence path-connected). Fix a point ${z_0 \in \Omega}$, and define a “primitive” ${F: \Omega \rightarrow {\bf C}}$ of ${f}$ by defining ${F(z_1) = \int_\gamma f(z)\ dz}$, with ${\gamma: [0,1] \rightarrow \Omega}$ being any contour from ${z_0}$ to ${z_1}$ (this is well defined by hypothesis). By mimicking the proof of the real fundamental theorem of calculus, we see that ${F}$ is holomorphic with ${F'=f}$, and the claim now follows from Corollary 8. $\Box$

An important consequence of Morera’s theorem for us is

Corollary 11 (Locally uniform limit of holomorphic functions is holomorphic) Let ${f_n: \Omega \rightarrow {\bf C}}$ be holomorphic functions on an open set ${\Omega}$ which converge locally uniformly to a function ${f: \Omega \rightarrow {\bf C}}$. Then ${f}$ is also holomorphic on ${\Omega}$.

Proof: By working locally we may assume that ${\Omega}$ is a ball, and in particular simply connected. By Cauchy’s theorem, ${\int_\gamma f_n(z)\ dz = 0}$ for all closed contours ${\gamma}$ in ${\Omega}$. By local uniform convergence, this implies that ${\int_\gamma f(z)\ dz = 0}$ for all such contours, and the claim then follows from Morera’s theorem. $\Box$

Now we study the zeroes of complex analytic functions. If a complex analytic function ${f}$ vanishes at a point ${z_0}$, but is not identically zero in a neighbourhood of that point, then by Taylor expansion we see that ${f}$ factors in a sufficiently small neighbourhood of ${z_0}$ as

$\displaystyle f(z) = (z-z_0)^n g(z) \ \ \ \ \ (4)$

for some natural number ${n}$ (which we call the order or multiplicity of the zero of ${f}$ at ${z_0}$) and some function ${g}$ that is complex analytic and non-zero near ${z_0}$; this generalises the factor theorem for polynomials. In particular, the zero ${z_0}$ is isolated if ${f}$ does not vanish identically near ${z_0}$. We conclude that if ${\Omega}$ is connected and ${f}$ vanishes on a neighbourhood of some point ${z_0}$ in ${\Omega}$, then it must vanish on all of ${\Omega}$ (since the maximal connected neighbourhood of ${z_0}$ in ${\Omega}$ on which ${f}$ vanishes cannot have any boundary point in ${\Omega}$). This implies unique continuation of analytic functions: if two complex analytic functions on ${\Omega}$ agree on a non-empty open set, then they agree everywhere. In particular, if a complex analytic function does not vanish everywhere, then all of its zeroes are isolated, so in particular it has only finitely many zeroes on any given compact set.

Recall that a rational function is a function ${f}$ which is a quotient ${g/h}$ of two polynomials (at least outside of the set where ${h}$ vanishes). Analogously, let us define a meromorphic function on an open set ${\Omega}$ to be a function ${f: \Omega \backslash S \rightarrow {\bf C}}$ defined outside of a discrete subset ${S}$ of ${\Omega}$ (the singularities of ${f}$), which is locally the quotient ${g/h}$ of holomorphic functions, in the sense that for every ${z_0 \in \Omega}$, one has ${f=g/h}$ in a neighbourhood of ${z_0}$ excluding ${S}$, with ${g, h}$ holomorphic near ${z_0}$ and with ${h}$ non-vanishing outside of ${S}$. If ${z_0 \in S}$ and ${g}$ has a zero of equal or higher order than ${h}$ at ${z_0}$, then the singularity is removable and one can extend the meromorphic function holomorphically across ${z_0}$ (by the holomorphic factor theorem (4)); otherwise, the singularity is non-removable and is known as a pole, whose order is equal to the difference between the order of ${h}$ and the order of ${g}$ at ${z_0}$. (If one wished, one could extend meromorphic functions to the poles by embedding ${{\bf C}}$ in the Riemann sphere ${{\bf C} \cup \{\infty\}}$ and mapping each pole to ${\infty}$, but we will not do so here. One could also consider non-meromorphic functions with essential singularities at various points, but we will have no need to analyse such singularities in this course.) If the order of a pole or zero is one, we say that it is simple; if it is two, we say it is double; and so forth.

Exercise 12 Show that the space of meromorphic functions on a non-empty open set ${\Omega}$, quotiented by almost everywhere equivalence, forms a field.

By quotienting two Taylor series, we see that if a meromorphic function ${f}$ has a pole of order ${n}$ at some point ${z_0}$, then it has a Laurent expansion

$\displaystyle f = \sum_{m=-n}^\infty a_m (z-z_0)^m,$

absolutely convergent in a neighbourhood of ${z_0}$ excluding ${z_0}$ itself, and with ${a_{-n}}$ non-zero. The Laurent coefficient ${a_{-1}}$ has a special significance, and is called the residue of the meromorphic function ${f}$ at ${z_0}$, which we will denote as ${\hbox{Res}(f;z_0)}$. The importance of this coefficient comes from the following significant generalisation of the Cauchy integral formula, known as the residue theorem:

Exercise 13 (Residue theorem) Let ${f}$ be a meromorphic function on a simply connected domain ${\Omega}$, and let ${\gamma}$ be a closed contour in ${\Omega}$ enclosing a bounded region ${U}$ anticlockwise, and avoiding all the singularities of ${f}$. Show that

$\displaystyle \int_\gamma f(z)\ dz = 2\pi i \sum_\rho \hbox{Res}(f;\rho)$

where ${\rho}$ is summed over all the poles of ${f}$ that lie in ${U}$.
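A numerical check of the residue theorem on circles: the sketch below integrates the (arbitrarily chosen) meromorphic function ${f(z) = e^z / (z(z-2))}$, which has simple poles at ${0}$ and ${2}$ with residues ${-\frac{1}{2}}$ and ${\frac{e^2}{2}}$ respectively. The helper `circle_integral` is a hypothetical trapezoid-rule integrator, not something from the notes.

```python
import cmath
import math

def circle_integral(g, radius, n=4000):
    """Approximate the anticlockwise integral of g over the circle |z| = radius."""
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        dz = 1j * z * (2 * math.pi / n)
        total += g(z) * dz
    return total

def f(z):
    return cmath.exp(z) / (z * (z - 2))

res_at_0 = -0.5                 # e^0 / (0 - 2)
res_at_2 = math.exp(2) / 2      # e^2 / 2

small = circle_integral(f, 1.0)   # encloses only the pole at 0
large = circle_integral(f, 3.0)   # encloses both poles
print(small, 2j * math.pi * res_at_0)
print(large, 2j * math.pi * (res_at_0 + res_at_2))
```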

The residue theorem is particularly useful when applied to logarithmic derivatives ${f'/f}$ of meromorphic functions ${f}$, because the residue is of a specific form:

Exercise 14 Let ${f}$ be a meromorphic function on an open set ${\Omega}$ that does not vanish identically. Show that the only poles of ${f'/f}$ are simple poles (poles of order ${1}$), occurring at the poles and zeroes of ${f}$ (after all removable singularities have been removed). Furthermore, the residue of ${f'/f}$ at a pole ${z_0}$ is an integer, equal to the order of the zero of ${f}$ if ${f}$ has a zero at ${z_0}$, or equal to the negative of the order of the pole of ${f}$ if ${f}$ has a pole at ${z_0}$.
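Combined with the residue theorem, this exercise says that ${\frac{1}{2\pi i} \int_\gamma \frac{f'}{f}(z)\ dz}$ counts the zeroes minus the poles of ${f}$ enclosed by ${\gamma}$, with multiplicity. Here is a minimal numerical sketch for the arbitrarily chosen example ${f(z) = z^2 (z-3)/(z-\frac{1}{2})}$, whose logarithmic derivative is written out by hand; the function names are illustrative assumptions.

```python
import cmath
import math

def log_derivative(z):
    # f(z) = z^2 (z - 3) / (z - 1/2), so f'/f = 2/z + 1/(z - 3) - 1/(z - 1/2).
    return 2 / z + 1 / (z - 3) - 1 / (z - 0.5)

def zeros_minus_poles(radius, n=4000):
    """(1/(2 pi i)) times the integral of f'/f over the circle |z| = radius."""
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        dz = 1j * z * (2 * math.pi / n)
        total += log_derivative(z) * dz
    return total / (2j * math.pi)

print(zeros_minus_poles(1.0))  # double zero at 0, simple pole at 1/2
print(zeros_minus_poles(4.0))  # additionally encloses the simple zero at 3
```

The computed values are integers up to discretisation and rounding error, reflecting the integrality of the residues.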

Remark 15 The fact that residues of logarithmic derivatives of meromorphic functions are automatically integers is a remarkable feature of the complex analytic approach to multiplicative number theory, which is difficult (though not entirely impossible) to duplicate in other approaches to the subject. Here is a sample application of this integrality, which is challenging to reproduce by non-complex-analytic means: if ${f}$ is meromorphic near ${z_0}$, and one has the bound ${|\frac{f'}{f}(z_0+t)| \leq \frac{0.9}{t} + O(1)}$ as ${t \rightarrow 0^+}$, then ${\frac{f'}{f}}$ must in fact stay bounded near ${z_0}$, because the only integer of magnitude less than ${0.9}$ is zero.

Given a function ${f: X \rightarrow Y}$ between two sets ${X, Y}$, we can form the graph

$\displaystyle \Sigma := \{ (x,f(x)): x\in X \},$

which is a subset of the Cartesian product ${X \times Y}$.

There are a number of “closed graph theorems” in mathematics which relate the regularity properties of the function ${f}$ with the closure properties of the graph ${\Sigma}$, assuming some “completeness” properties of the domain ${X}$ and range ${Y}$. The most famous of these is the closed graph theorem from functional analysis, which I phrase as follows:

Theorem 1 (Closed graph theorem (functional analysis)) Let ${X, Y}$ be complete normed vector spaces over the reals (i.e. Banach spaces). Then a function ${f: X \rightarrow Y}$ is a continuous linear transformation if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is both linearly closed (i.e. it is a linear subspace of ${X \times Y}$) and topologically closed (i.e. closed in the product topology of ${X \times Y}$).

I like to think of this theorem as linking together qualitative and quantitative notions of regularity preservation properties of an operator ${f}$; see this blog post for further discussion.

The theorem is equivalent to the assertion that any continuous linear bijection ${f: X \rightarrow Y}$ from one Banach space to another is necessarily an isomorphism in the sense that the inverse map is also continuous and linear. Indeed, to see that this claim implies the closed graph theorem, one applies it to the projection from ${\Sigma}$ to ${X}$, which is a continuous linear bijection; conversely, to deduce this claim from the closed graph theorem, observe that the graph of the inverse ${f^{-1}}$ is the reflection of the graph of ${f}$. As such, the closed graph theorem is a corollary of the open mapping theorem, which asserts that any continuous linear surjection from one Banach space to another is open. (Conversely, one can deduce the open mapping theorem from the closed graph theorem by quotienting out the kernel of the continuous surjection to get a bijection.)

It turns out that there is a closed graph theorem (or equivalent reformulations of that theorem, such as an assertion that bijective morphisms between sufficiently “complete” objects are necessarily isomorphisms, or as an open mapping theorem) in many other categories in mathematics as well. Here are some easy ones:

Theorem 2 (Closed graph theorem (linear algebra)) Let ${X, Y}$ be vector spaces over a field ${k}$. Then a function ${f: X \rightarrow Y}$ is a linear transformation if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is linearly closed.

Theorem 3 (Closed graph theorem (group theory)) Let ${X, Y}$ be groups. Then a function ${f: X \rightarrow Y}$ is a group homomorphism if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is closed under the group operations (i.e. it is a subgroup of ${X \times Y}$).
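For small finite groups, Theorem 3 can be verified by brute force. The sketch below works with the cyclic groups ${{\bf Z}/n{\bf Z}}$ and ${{\bf Z}/m{\bf Z}}$ (an assumption made purely for ease of coding), represents a function as a tuple of values, and uses the fact that a non-empty finite subset of a group that is closed under the group law is automatically a subgroup.

```python
from itertools import product

def is_homomorphism(f, n, m):
    """f: Z/n -> Z/m, given as a tuple of length n, for the additive groups."""
    return all(f[(x + y) % n] == (f[x] + f[y]) % m
               for x in range(n) for y in range(n))

def graph_is_subgroup(f, n, m):
    """Is the graph {(x, f(x))} closed under addition in Z/n x Z/m?"""
    graph = {(x, f[x]) for x in range(n)}
    # A finite non-empty subset closed under the group law is a subgroup.
    return all(((x1 + x2) % n, (y1 + y2) % m) in graph
               for (x1, y1) in graph for (x2, y2) in graph)

# Exhaustive check over all 6^4 functions from Z/4 to Z/6.
all_agree = all(is_homomorphism(f, 4, 6) == graph_is_subgroup(f, 4, 6)
                for f in product(range(6), repeat=4))
print(all_agree)
print(is_homomorphism((0, 3, 0, 3), 4, 6), is_homomorphism((1, 2, 3, 4), 4, 6))
```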

Theorem 4 (Closed graph theorem (order theory)) Let ${X, Y}$ be totally ordered sets. Then a function ${f: X \rightarrow Y}$ is monotone increasing if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is totally ordered (using the product order on ${X \times Y}$).

Remark 1 Similar results to the above three theorems (with similarly easy proofs) hold for other algebraic structures, such as rings (using the usual product of rings), modules, algebras, Lie algebras, groupoids, or even categories (a map between categories is a functor iff its graph is again a category). (ADDED IN VIEW OF COMMENTS: further examples include affine spaces and ${G}$-sets (sets with an action of a given group ${G}$).) There are also various approximate versions of this theorem that are useful in arithmetic combinatorics, that relate the property of a map ${f}$ being an “approximate homomorphism” in some sense with its graph being an “approximate group” in some sense. This is particularly useful for this subfield of mathematics because there are currently more theorems about approximate groups than about approximate homomorphisms, so that one can profitably use closed graph theorems to transfer results about the former to results about the latter.

A slightly more sophisticated result in the same vein:

Theorem 5 (Closed graph theorem (point set topology)) Let ${X, Y}$ be compact Hausdorff spaces. Then a function ${f: X \rightarrow Y}$ is continuous if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is topologically closed.

Indeed, the “only if” direction is easy, while for the “if” direction, note that if ${\Sigma}$ is a closed subset of ${X \times Y}$, then it is compact Hausdorff, and the projection map from ${\Sigma}$ to ${X}$ is then a bijective continuous map between compact Hausdorff spaces, which is then closed, thus open, and hence a homeomorphism, giving the claim.

Note that the compactness hypothesis is necessary: for instance, the function ${f: {\bf R} \rightarrow {\bf R}}$ defined by ${f(x) := 1/x}$ for ${x \neq 0}$ and ${f(0) := 0}$ has a closed graph, but is discontinuous.

A similar result (but relying on a much deeper theorem) is available in algebraic geometry, as I learned after asking this MathOverflow question:

Theorem 6 (Closed graph theorem (algebraic geometry)) Let ${X, Y}$ be normal projective varieties over an algebraically closed field ${k}$ of characteristic zero. Then a function ${f: X \rightarrow Y}$ is a regular map if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is Zariski-closed.

Proof: (Sketch) For the only if direction, note that the map ${x \mapsto (x,f(x))}$ is a regular map from the projective variety ${X}$ to the projective variety ${X \times Y}$ and is thus a projective morphism, hence is proper. In particular, the image ${\Sigma}$ of ${X}$ under this map is Zariski-closed.

Conversely, if ${\Sigma}$ is Zariski-closed, then it is also a projective variety, and the projection ${(x,y) \mapsto x}$ is a projective morphism from ${\Sigma}$ to ${X}$, which is clearly quasi-finite; by the characteristic zero hypothesis, it is also separated. Applying (Grothendieck’s form of) Zariski’s main theorem, this projection is the composition of an open immersion and a finite map. As projective varieties are complete, the open immersion is an isomorphism, and so the projection from ${\Sigma}$ to ${X}$ is finite. Being injective and separable, the degree of this finite map must be one, and hence ${k(\Sigma)}$ and ${k(X)}$ are isomorphic, hence (by normality of ${X}$) ${k[\Sigma]}$ is contained in (the image of) ${k[X]}$, which makes the map from ${X}$ to ${\Sigma}$ regular, which makes ${f}$ regular. $\Box$

The counterexample of the map ${f: k \rightarrow k}$ given by ${f(x) := 1/x}$ for ${x \neq 0}$ and ${f(0) := 0}$ demonstrates why the projective hypothesis is necessary. The necessity of the normality condition (or more precisely, a weak normality condition) is demonstrated by (the projective version of) the map ${(t^2,t^3) \mapsto t}$ from the cuspidal curve ${\{ (t^2,t^3): t \in k \}}$ to ${k}$. (If one restricts attention to smooth varieties, though, normality becomes automatic.) The necessity of characteristic zero is demonstrated by (the projective version of) the inverse of the Frobenius map ${x \mapsto x^p}$ on a field ${k}$ of characteristic ${p}$.

There are also a number of closed graph theorems for topological groups, of which the following is typical (see Exercise 3 of these previous blog notes):

Theorem 7 (Closed graph theorem (topological group theory)) Let ${X, Y}$ be ${\sigma}$-compact, locally compact Hausdorff groups. Then a function ${f: X \rightarrow Y}$ is a continuous homomorphism if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is both group-theoretically closed and topologically closed.

The hypotheses of being ${\sigma}$-compact, locally compact, and Hausdorff can be relaxed somewhat, but I doubt that they can be eliminated entirely (though I do not have a ready counterexample for this).

In several complex variables, it is a classical theorem (see e.g. Lemma 4 of this blog post) that a holomorphic function from a domain in ${{\bf C}^n}$ to ${{\bf C}^n}$ is locally injective if and only if it is a local diffeomorphism (i.e. its derivative is everywhere non-singular). This leads to a closed graph theorem for complex manifolds:

Theorem 8 (Closed graph theorem (complex manifolds)) Let ${X, Y}$ be complex manifolds. Then a function ${f: X \rightarrow Y}$ is holomorphic if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is a complex manifold (using the complex structure inherited from ${X \times Y}$) of the same dimension as ${X}$.

Indeed, one applies the previous observation to the projection from ${\Sigma}$ to ${X}$. The dimension requirement is needed, as can be seen from the example of the map ${f: {\bf C} \rightarrow {\bf C}}$ defined by ${f(z) =1/z}$ for ${z \neq 0}$ and ${f(0)=0}$.

(ADDED LATER:) There is a real analogue to the above theorem:

Theorem 9 (Closed graph theorem (real manifolds)) Let ${X, Y}$ be real manifolds. Then a function ${f: X \rightarrow Y}$ is continuous if and only if the graph ${\Sigma := \{ (x,f(x)): x \in X \}}$ is a real manifold of the same dimension as ${X}$.

This theorem can be proven by applying invariance of domain (discussed in this previous post) to the projection of ${\Sigma}$ to ${X}$, to show that it is open if ${\Sigma}$ has the same dimension as ${X}$.

Note though that the analogous claim for smooth real manifolds fails: the function ${f: {\bf R} \rightarrow {\bf R}}$ defined by ${f(x) := x^{1/3}}$ has a smooth graph, but is not itself smooth.

(ADDED YET LATER:) Here is an easy closed graph theorem in the symplectic category:

Theorem 10 (Closed graph theorem (symplectic geometry)) Let ${X = (X,\omega_X)}$ and ${Y = (Y,\omega_Y)}$ be smooth symplectic manifolds of the same dimension. Then a smooth map ${f: X \rightarrow Y}$ is a symplectic morphism (i.e. ${f^* \omega_Y = \omega_X}$) if and only if the graph ${\Sigma := \{(x,f(x)): x \in X \}}$ is a Lagrangian submanifold of ${X \times Y}$ with the symplectic form ${\omega_X \oplus -\omega_Y}$.
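In the simplest linear model case, Theorem 10 reduces to a determinant computation. The sketch below assumes ${X = Y = {\bf R}^2}$ with the standard form ${\omega(u,v) = u_1 v_2 - u_2 v_1}$ and a linear map ${x \mapsto Ax}$ with integer entries (so that the arithmetic is exact); these choices, and the function names, are illustrative only.

```python
def omega(u, v):
    """Standard symplectic form on R^2."""
    return u[0] * v[1] - u[1] * v[0]

def apply(A, x):
    """Apply the 2x2 matrix A (a pair of rows) to the vector x."""
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

def graph_is_lagrangian(A):
    """Test omega_X (+) -omega_Y on a basis of the graph of x -> A x.

    The graph is the 2-dimensional subspace of R^4 spanned by (e1, A e1)
    and (e2, A e2); it is Lagrangian iff the form vanishes on this pair.
    """
    e1, e2 = (1, 0), (0, 1)
    return omega(e1, e2) - omega(apply(A, e1), apply(A, e2)) == 0

def is_symplectic(A):
    """In dimension 2, A^* omega = omega is equivalent to det A = 1."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0] == 1
```

For instance, the matrix with rows `(2, 1)` and `(1, 1)` has determinant one and passes both tests, while the dilation with rows `(2, 0)` and `(0, 1)` fails both.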

In view of the symplectic rigidity phenomenon, it is likely that the smoothness hypotheses on ${f,X,Y}$ can be relaxed substantially, but I will not try to formulate such a result here.

There are presumably many further examples of closed graph theorems (or closely related theorems, such as criteria for inverting a morphism, or open mapping type theorems) throughout mathematics; I would be interested to know of further examples.


One of the most well known problems from ancient Greek mathematics was that of trisecting an angle by straightedge and compass, which was eventually proven impossible in 1837 by Pierre Wantzel, using methods from Galois theory.

Formally, one can set up the problem as follows. Define a configuration to be a finite collection ${{\mathcal C}}$ of points, lines, and circles in the Euclidean plane. Define a construction step to be one of the following operations to enlarge the collection ${{\mathcal C}}$:

• (Straightedge) Given two distinct points ${A, B}$ in ${{\mathcal C}}$, form the line ${\overline{AB}}$ that connects ${A}$ and ${B}$, and add it to ${{\mathcal C}}$.
• (Compass) Given two distinct points ${A, B}$ in ${{\mathcal C}}$, and given a third point ${O}$ in ${{\mathcal C}}$ (which may or may not equal ${A}$ or ${B}$), form the circle with centre ${O}$ and radius equal to the length ${|AB|}$ of the line segment joining ${A}$ and ${B}$, and add it to ${{\mathcal C}}$.
• (Intersection) Given two distinct curves ${\gamma, \gamma'}$ in ${{\mathcal C}}$ (thus ${\gamma}$ is either a line or a circle in ${{\mathcal C}}$, and similarly for ${\gamma'}$), select a point ${P}$ that is common to both ${\gamma}$ and ${\gamma'}$ (there are at most two such points), and add it to ${{\mathcal C}}$.

We say that a point, line, or circle is constructible by straightedge and compass from a configuration ${{\mathcal C}}$ if it can be obtained from ${{\mathcal C}}$ after applying a finite number of construction steps.

Problem 1 (Angle trisection) Let ${A, B, C}$ be distinct points in the plane. Is it always possible to construct by straightedge and compass from ${A,B,C}$ a line ${\ell}$ through ${A}$ that trisects the angle ${\angle BAC}$, in the sense that the angle between ${\ell}$ and ${BA}$ is one third of the angle of ${\angle BAC}$?

Thanks to Wantzel’s result, the answer to this problem is known to be “no” in general; a generic angle ${\angle BAC}$ cannot be trisected by straightedge and compass. (On the other hand, some special angles can certainly be trisected by straightedge and compass, such as a right angle. Also, one can certainly trisect generic angles using other methods than straightedge and compass; see the Wikipedia page on angle trisection for some examples of this.)

The impossibility of angle trisection stands in sharp contrast to the easy construction of angle bisection via straightedge and compass, which we briefly review as follows:

1. Start with three points ${A, B, C}$.
2. Form the circle ${c_0}$ with centre ${A}$ and radius ${AB}$, and intersect it with the line ${\overline{AC}}$. Let ${D}$ be the point in this intersection that lies on the same side of ${A}$ as ${C}$. (${D}$ may well be equal to ${C}$).
3. Form the circle ${c_1}$ with centre ${B}$ and radius ${AB}$, and the circle ${c_2}$ with centre ${D}$ and radius ${AB}$. Let ${E}$ be the point of intersection of ${c_1}$ and ${c_2}$ that is not ${A}$.
4. The line ${\ell := \overline{AE}}$ will then bisect the angle ${\angle BAC}$.
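The four steps above can be sketched numerically by identifying the plane with the complex numbers. This is only an illustration under assumptions made here: the angle ${\angle BAC}$ is taken strictly between ${0}$ and ${180^\circ}$ (so that ${B \neq D}$ and the two circles meet in two points), and the helper names are hypothetical.

```python
import cmath

def circle_intersections(c1, c2, r):
    """Both intersection points of two circles of equal radius r (c1 != c2)."""
    d = abs(c2 - c1)
    mid = (c1 + c2) / 2
    h = (r * r - (d / 2) ** 2) ** 0.5      # half-length of the common chord
    normal = 1j * (c2 - c1) / d            # unit vector perpendicular to c1c2
    return mid + h * normal, mid - h * normal

def bisect(A, B, C):
    """Return the point E of step 3, so that the line AE bisects angle BAC."""
    r = abs(B - A)
    D = A + r * (C - A) / abs(C - A)       # step 2: point of c_0 on the ray AC
    P, Q = circle_intersections(B, D, r)   # step 3: c_1 and c_2 meet at A and E
    return P if abs(P - A) > abs(Q - A) else Q

def angle(A, P, Q):
    """Signed angle PAQ in radians."""
    return cmath.phase((Q - A) / (P - A))
```

One can then compare `angle(A, B, bisect(A, B, C))` with half of `angle(A, B, C)` for a few sample triples.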

The key difference between angle trisection and angle bisection ultimately boils down to the following trivial number-theoretic fact:

Lemma 2 There is no power of ${2}$ that is evenly divisible by ${3}$.

Proof: Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic. $\Box$

In contrast, there are of course plenty of powers of ${2}$ that are evenly divisible by ${2}$, and this is ultimately why angle bisection is easy while angle trisection is hard.

The standard way in which Lemma 2 is used to demonstrate the impossibility of angle trisection is via Galois theory. The implication is quite short if one knows this theory, but quite opaque otherwise. We briefly sketch the proof of this implication here, though we will not need it in the rest of the discussion. Firstly, Lemma 2 implies the following fact about field extensions.

Corollary 3 Let ${F}$ be a field, and let ${E}$ be an extension of ${F}$ that can be constructed out of ${F}$ by a finite sequence of quadratic extensions. Then ${E}$ does not contain any cubic extensions ${K}$ of ${F}$.

Proof: If ${E}$ contained a cubic extension ${K}$ of ${F}$, then the dimension of ${E}$ over ${F}$ would be a multiple of three. On the other hand, if ${E}$ is obtained from ${F}$ by a tower of quadratic extensions, then the dimension of ${E}$ over ${F}$ is a power of two. The claim then follows from Lemma 2. $\Box$

To conclude the proof, one then notes that any point, line, or circle that can be constructed from a configuration ${{\mathcal C}}$ is definable in a field obtained from the coefficients of all the objects in ${{\mathcal C}}$ after taking a finite number of quadratic extensions, whereas a trisection of an angle ${\angle BAC}$ will generically only be definable in a cubic extension of the field generated by the coordinates of ${A,B,C}$.

The Galois theory method also allows one to obtain many other impossibility results of this type, most famously the Abel-Ruffini theorem on the insolvability of the quintic equation by radicals. For this reason (and also because of the many applications of Galois theory to number theory and other branches of mathematics), the Galois theory argument is the “right” way to prove the impossibility of angle trisection within the broader framework of modern mathematics. However, this argument has the drawback that it requires one to first understand Galois theory (or at least field theory), which is usually not presented until an advanced undergraduate algebra or number theory course, whilst the angle trisection problem requires only high-school level mathematics to formulate. Even if one is allowed to “cheat” and sweep several technicalities under the rug, one still needs to possess a fair amount of solid intuition about advanced algebra in order to appreciate the proof. (This was undoubtedly one reason why, even after Wantzel’s impossibility result was published, a large amount of effort was still expended by amateur mathematicians to try to trisect a general angle.)

In this post I would therefore like to present a different proof (or perhaps more accurately, a disguised version of the standard proof) of the impossibility of angle trisection by straightedge and compass, that avoids explicit mention of Galois theory (though it is never far beneath the surface). With “cheats”, the proof is actually quite simple and geometric (except for Lemma 2, which is still used at a crucial juncture), based on the basic geometric concept of monodromy; unfortunately, some technical work is needed to remove these cheats.

To describe the intuitive idea of the proof, let us return to the angle bisection construction, that takes a triple ${A, B, C}$ of points as input and returns a bisecting line ${\ell}$ as output. We iterate the construction to create a quadrisecting line ${m}$, via the following sequence of steps that extend the original bisection construction:

1. Start with three points ${A, B, C}$.
2. Form the circle ${c_0}$ with centre ${A}$ and radius ${AB}$, and intersect it with the line ${\overline{AC}}$. Let ${D}$ be the point in this intersection that lies on the same side of ${A}$ as ${C}$. (${D}$ may well be equal to ${C}$).
3. Form the circle ${c_1}$ with centre ${B}$ and radius ${AB}$, and the circle ${c_2}$ with centre ${D}$ and radius ${AB}$. Let ${E}$ be the point of intersection of ${c_1}$ and ${c_2}$ that is not ${A}$.
4. Let ${F}$ be the point on the line ${\ell := \overline{AE}}$ which lies on ${c_0}$, and is on the same side of ${A}$ as ${E}$.
5. Form the circle ${c_3}$ with centre ${F}$ and radius ${AB}$. Let ${G}$ be the point of intersection of ${c_1}$ and ${c_3}$ that is not ${A}$.
6. The line ${m := \overline{AG}}$ will then quadrisect the angle ${\angle BAC}$.

Let us fix the points ${A}$ and ${B}$, but not ${C}$, and view ${m}$ (as well as intermediate objects such as ${D}$, ${c_2}$, ${E}$, ${\ell}$, ${F}$, ${c_3}$, ${G}$) as a function of ${C}$.

Let us now do the following: we begin rotating ${C}$ counterclockwise around ${A}$, which drags around the other objects ${D}$, ${c_2}$, ${E}$, ${\ell}$, ${F}$, ${c_3}$, ${G}$ that were constructed by ${C}$ accordingly. For instance, here is an early stage of this rotation process, when the angle ${\angle BAC}$ has become obtuse:

Now for the slightly tricky bit. We are going to keep rotating ${C}$ beyond a half-rotation of ${180^\circ}$, so that ${\angle BAC}$ now becomes a reflex angle. At this point, a singularity occurs; the point ${E}$ collides into ${A}$, and so there is an instant in which the line ${\ell = \overline{AE}}$ is not well-defined. However, this turns out to be a removable singularity (and the easiest way to demonstrate this will be to tap the power of complex analysis, as complex numbers can easily route around such a singularity), and we can blast through it to the other side, giving a picture like this:

Note that we have now deviated from the original construction in that ${F}$ and ${E}$ are no longer on the same side of ${A}$; we are thus now working in a continuation of that construction rather than with the construction itself. Nevertheless, we can still work with this continuation (much as, say, one works with analytic continuations of infinite series such as ${\sum_{n=1}^\infty \frac{1}{n^s}}$ beyond their original domain of definition).

We now keep rotating ${C}$ around ${A}$. Here, ${\angle BAC}$ is approaching a full rotation of ${360^\circ}$:

When ${\angle BAC}$ reaches a full rotation, a different singularity occurs: ${c_1}$ and ${c_2}$ coincide. Nevertheless, this is also a removable singularity, and we blast through to beyond a full rotation:

And now ${C}$ is back where it started, as are ${D}$, ${c_2}$, ${E}$, and ${\ell}$… but the point ${F}$ has moved, from one intersection point of ${\ell \cap c_0}$ to the other. As a consequence, ${c_3}$, ${G}$, and ${m}$ have also changed, with ${m}$ being at right angles to where it was before. (In the jargon of modern mathematics, the quadrisection construction has a non-trivial monodromy.)

But nothing stops us from rotating ${C}$ some more. If we continue this procedure, we see that after two full rotations of ${C}$ around ${A}$, all points, lines, and circles constructed from ${A, B, C}$ have returned to their original positions. Because of this, we shall say that the quadrisection construction described above is periodic with period ${2}$.
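The period-two behaviour can be imitated numerically. The following sketch is a simplified model rather than the construction itself: it assumes the normalisation ${A=0}$, ${B=1}$, under which ${D}$ is a unit complex number, ${F}$ is a square root of ${D}$, and the unit direction ${g}$ of the line ${m}$ is a square root of ${F}$. The singularities are passed by always choosing the square root closest to the previous one; the names `continue_root` and `rotate_C` are hypothetical.

```python
import cmath
import math

def continue_root(prev, target):
    """Return the square root of `target` that is closest to `prev`."""
    r = cmath.sqrt(target)
    return r if abs(r - prev) <= abs(-r - prev) else -r

def rotate_C(turns, steps_per_turn=3600, theta0=1.0):
    """Rotate C by `turns` full turns around A, dragging F and g along.

    Returns (D, F, g) with A = 0, B = 1: D is the unit vector towards C,
    F satisfies F^2 = D, and g (unit direction of m) satisfies g^2 = F.
    """
    D = cmath.exp(1j * theta0)
    F = cmath.exp(1j * theta0 / 2)
    g = cmath.exp(1j * theta0 / 4)
    for k in range(1, turns * steps_per_turn + 1):
        theta = theta0 + 2 * math.pi * k / steps_per_turn
        D = cmath.exp(1j * theta)
        F = continue_root(F, D)   # follow F continuously
        g = continue_root(g, F)   # follow the direction of m continuously
    return D, F, g
```

In this model one full turn returns ${D}$ but replaces ${F}$ by ${-F}$ and turns ${g}$ by a right angle; after two full turns ${F}$ is back, and ${g}$ has only changed sign, so that the line ${m}$ it spans is back as well.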

Similarly, if one performs an octisection of the angle ${\angle BAC}$ by bisecting the quadrisection, one can verify that this octisection is periodic with period ${4}$; it takes four full rotations of ${C}$ around ${A}$ before the configuration returns to where it started. More generally, one can show

Proposition 4 Any construction of straightedge and compass from the points ${A,B,C}$ is periodic with period equal to a power of ${2}$.

The reason for this, ultimately, is that any two circles or lines will intersect each other in at most two points, and so at each step of a straightedge-and-compass construction there is an ambiguity of at most ${2! = 2}$. Each rotation of ${C}$ around ${A}$ can potentially flip one of these points to the other, but then if one rotates again, the point returns to its original position, and then one can analyse the next point in the construction in the same fashion until one obtains the proposition.

But now consider a putative trisection operation, that starts with an arbitrary angle ${\angle BAC}$ and somehow uses some sequence of straightedge and compass constructions to end up with a trisecting line ${\ell}$:

What is the period of this construction? If we continuously rotate ${C}$ around ${A}$, we observe that a full rotation of ${C}$ only causes the trisecting line ${\ell}$ to rotate by a third of a full rotation (i.e. by ${120^\circ}$):

Because of this, we see that the period of any construction that contains ${\ell}$ must be a multiple of ${3}$. But this contradicts Proposition 4 and Lemma 2.

Below the fold, I will make the above proof rigorous. Unfortunately, in doing so, I had to again leave the world of high-school mathematics, as one needs a little bit of algebraic geometry and complex analysis to resolve the issues with singularities that we saw in the above sketch. Still, I feel that at an intuitive level at least, this argument is more geometric and accessible than the Galois-theoretic argument (though anyone familiar with Galois theory will note that there is really not that much difference between the proofs, ultimately, as one has simply replaced the Galois group with a closely related monodromy group instead).

The Riemann zeta function ${\zeta(s)}$ is defined in the region ${\hbox{Re}(s)>1}$ by the absolutely convergent series

$\displaystyle \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \ldots. \ \ \ \ \ (1)$

For instance, it is known that ${\zeta(2)=\pi^2/6}$, and thus

$\displaystyle \sum_{n=1}^\infty \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots = \frac{\pi^2}{6}. \ \ \ \ \ (2)$

For ${\hbox{Re}(s) \leq 1}$, the series on the right-hand side of (1) is no longer absolutely convergent, or even conditionally convergent. Nevertheless, the ${\zeta}$ function can be extended to this region (with a pole at ${s=1}$) by analytic continuation. For instance, it can be shown that after analytic continuation, one has ${\zeta(0) = -1/2}$, ${\zeta(-1) = -1/12}$, and ${\zeta(-2)=0}$, and more generally

$\displaystyle \zeta(-s) = - \frac{B_{s+1}}{s+1} \ \ \ \ \ (3)$

for ${s=1,2,\ldots}$, where ${B_n}$ are the Bernoulli numbers. If one formally applies (1) at these values of ${s}$, one obtains the somewhat bizarre formulae

$\displaystyle \sum_{n=1}^\infty 1 = 1 + 1 + 1 + \ldots = -1/2 \ \ \ \ \ (4)$

$\displaystyle \sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots = -1/12 \ \ \ \ \ (5)$

$\displaystyle \sum_{n=1}^\infty n^2 = 1 + 4 + 9 + \ldots = 0 \ \ \ \ \ (6)$

and

$\displaystyle \sum_{n=1}^\infty n^s = 1 + 2^s + 3^s + \ldots = -\frac{B_{s+1}}{s+1}. \ \ \ \ \ (7)$
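The right-hand sides of (5), (6), (7) are easy to tabulate exactly. The sketch below computes Bernoulli numbers by the standard recurrence ${\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0}$ for ${m \geq 1}$ (which yields ${B_1 = -1/2}$; this convention does not affect (3), which only involves ${B_2, B_3, \ldots}$). The function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] as exact fractions (with B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        # solve sum_{j=0}^{m} C(m+1, j) B_j = 0 for B_m
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def zeta_at_negative(s):
    """Value of zeta(-s) for s = 1, 2, ... according to formula (3)."""
    B = bernoulli_numbers(s + 1)
    return -B[s + 1] / (s + 1)
```

Here `zeta_at_negative(1)` and `zeta_at_negative(2)` reproduce the values ${-1/12}$ and ${0}$ quoted above.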

Clearly, these formulae do not make sense if one stays within the traditional way to evaluate infinite series, and so it seems that one is forced to use the somewhat unintuitive analytic continuation interpretation of such sums to make these formulae rigorous. But as it stands, the formulae look “wrong” for several reasons. Most obviously, the summands on the left are all positive, but the right-hand sides can be zero or negative. A little more subtly, the identities do not appear to be consistent with each other. For instance, if one adds (4) to (5), one obtains

$\displaystyle \sum_{n=1}^\infty (n+1) = 2 + 3 + 4 + \ldots = -7/12 \ \ \ \ \ (8)$

whereas if one subtracts ${1}$ from (5) one obtains instead

$\displaystyle \sum_{n=2}^\infty n = 0 + 2 + 3 + 4 + \ldots = -13/12 \ \ \ \ \ (9)$

and the two equations seem inconsistent with each other.

However, it is possible to interpret (4), (5), (6) by purely real-variable methods, without recourse to complex analysis methods such as analytic continuation, thus giving an “elementary” interpretation of these sums that only requires undergraduate calculus; we will later also explain how this interpretation deals with the apparent inconsistencies pointed out above.

To see this, let us first consider a convergent sum such as (2). The classical interpretation of this formula is the assertion that the partial sums

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots + \frac{1}{N^2}$

converge to ${\pi^2/6}$ as ${N \rightarrow \infty}$, or in other words that

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)$

where ${o(1)}$ denotes a quantity that goes to zero as ${N \rightarrow \infty}$. Actually, by using the integral test estimate

$\displaystyle \sum_{n=N+1}^\infty \frac{1}{n^2} \leq \int_N^\infty \frac{dx}{x^2} = \frac{1}{N}$

we have the sharper result

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + O(\frac{1}{N}).$

Thus we can view ${\frac{\pi^2}{6}}$ as the leading coefficient of the asymptotic expansion of the partial sums of ${\sum_{n=1}^\infty 1/n^2}$.
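The ${O(1/N)}$ error term is easy to observe numerically. The sketch below compares the tail ${\frac{\pi^2}{6} - \sum_{n=1}^N \frac{1}{n^2}}$ against the integral test bound ${1/N}$, and also against the matching lower bound ${\int_{N+1}^\infty \frac{dx}{x^2} = \frac{1}{N+1}}$, which is a standard companion estimate added here rather than one used in the text.

```python
import math

def tail(N):
    """pi^2/6 minus the N-th partial sum of sum 1/n^2 (floating point)."""
    partial = sum(1.0 / (n * n) for n in range(1, N + 1))
    return math.pi ** 2 / 6 - partial

def tail_within_integral_bounds(N):
    """Check 1/(N+1) < tail(N) < 1/N."""
    return 1.0 / (N + 1) < tail(N) < 1.0 / N
```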

One can then try to inspect the partial sums of the expressions in (4), (5), (6), but the coefficients bear no obvious relationship to the right-hand sides:

$\displaystyle \sum_{n=1}^N 1 = N$

$\displaystyle \sum_{n=1}^N n = \frac{1}{2} N^2 + \frac{1}{2} N$

$\displaystyle \sum_{n=1}^N n^2 = \frac{1}{3} N^3 + \frac{1}{2} N^2 + \frac{1}{6} N.$

For (7), the classical Faulhaber formula (or Bernoulli formula) gives

$\displaystyle \sum_{n=1}^N n^s = \frac{1}{s+1} \sum_{j=0}^s \binom{s+1}{j} B_j N^{s+1-j} \ \ \ \ \ (10)$

$\displaystyle = \frac{1}{s+1} N^{s+1} + \frac{1}{2} N^s + \frac{s}{12} N^{s-1} + \ldots + B_s N$

for ${s \geq 2}$, which has a vague resemblance to (7), but again the connection is not particularly clear.
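Formula (10) can be confirmed in exact arithmetic for small ${s}$ and ${N}$. Note that the coefficient ${\frac{1}{2}}$ of ${N^s}$ in the expansion corresponds to the convention ${B_1 = +1/2}$, which the sketch below adopts; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def bernoulli_plus(n_max):
    """Return [B_0, ..., B_{n_max}] with the convention B_1 = +1/2."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        # standard recurrence, which produces B_1 = -1/2
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    if n_max >= 1:
        B[1] = Fraction(1, 2)   # switch to the convention matching (10)
    return B

def faulhaber(s, N):
    """Right-hand side of (10), evaluated exactly."""
    B = bernoulli_plus(s)
    return sum(comb(s + 1, j) * B[j] * Fraction(N) ** (s + 1 - j)
               for j in range(s + 1)) / (s + 1)
```

Comparing `faulhaber(s, N)` with the direct sum `sum(n**s for n in range(1, N + 1))` gives agreement for every small case one cares to try.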

The problem here is the discrete nature of the partial sum

$\displaystyle \sum_{n=1}^N n^s = \sum_{n \leq N} n^s,$

which (if ${N}$ is viewed as a real number) has jump discontinuities at each positive integer value of ${N}$. These discontinuities yield various artefacts when trying to approximate this sum by a polynomial in ${N}$. (These artefacts also occur in (2), but happen in that case to be obscured in the error term ${O(1/N)}$; but for the divergent sums (4), (5), (6), (7), they are large enough to cause real trouble.)

However, these issues can be resolved by replacing the abruptly truncated partial sums ${\sum_{n=1}^N n^s}$ with smoothed sums ${\sum_{n=1}^\infty \eta(n/N) n^s}$, where ${\eta: {\bf R}^+ \rightarrow {\bf R}}$ is a cutoff function, or more precisely a compactly supported bounded function that equals ${1}$ at ${0}$. The case when ${\eta}$ is the indicator function ${1_{[0,1]}}$ then corresponds to the traditional partial sums, with all the attendant discretisation artefacts; but if one chooses a smoother cutoff, then these artefacts begin to disappear (or at least become lower order), and the true asymptotic expansion becomes more manifest.

Note that smoothing does not affect the asymptotic value of sums that were already absolutely convergent, thanks to the dominated convergence theorem. For instance, we have

$\displaystyle \sum_{n=1}^\infty \eta(n/N) \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)$

whenever ${\eta}$ is a cutoff function (since ${\eta(n/N) \rightarrow 1}$ pointwise as ${N \rightarrow \infty}$ and is uniformly bounded). If ${\eta}$ is equal to ${1}$ on a neighbourhood of the origin, then the integral test argument recovers the ${O(1/N)}$ decay rate:

$\displaystyle \sum_{n=1}^\infty \eta(n/N) \frac{1}{n^2} = \frac{\pi^2}{6} + O(\frac{1}{N}).$

However, smoothing can greatly improve the convergence properties of a divergent sum. The simplest example is Grandi’s series

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} = 1 - 1 + 1 - \ldots.$

The partial sums

$\displaystyle \sum_{n=1}^N (-1)^{n-1} = \frac{1}{2} + \frac{1}{2} (-1)^{N-1}$

oscillate between ${1}$ and ${0}$, and so this series is not conditionally convergent (and certainly not absolutely convergent). However, if one performs analytic continuation on the series

$\displaystyle \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n^s} = 1 - \frac{1}{2^s} + \frac{1}{3^s} - \ldots$

and sets ${s = 0}$, one obtains a formal value of ${1/2}$ for this series. This value can also be obtained by smooth summation. Indeed, for any cutoff function ${\eta}$, we can regroup

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) =$

$\displaystyle \frac{\eta(1/N)}{2} + \sum_{m=1}^\infty \frac{\eta((2m-1)/N) - 2\eta(2m/N) + \eta((2m+1)/N)}{2}.$

If ${\eta}$ is twice continuously differentiable (i.e. ${\eta \in C^2}$), then from Taylor expansion we see that the summand has size ${O(1/N^2)}$, and also (from the compact support of ${\eta}$) is only non-zero when ${m=O(N)}$. This leads to the asymptotic

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) = \frac{1}{2} + O( \frac{1}{N} )$

and so we recover the value of ${1/2}$ as the leading term of the asymptotic expansion.
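This asymptotic can be observed numerically. The sketch below uses the particular cutoff ${\eta(x) := (1-x^2)^3}$ for ${0 \leq x < 1}$ and ${\eta(x) := 0}$ for ${x \geq 1}$, which is twice continuously differentiable; this choice of ${\eta}$ is an assumption made for illustration, and any other ${C^2}$ cutoff would serve.

```python
def eta(x):
    """A C^2 cutoff: equals 1 at 0 and vanishes, with two derivatives, at 1."""
    return (1 - x * x) ** 3 if 0 <= x < 1 else 0.0

def smoothed_grandi(N):
    """Smoothed sum of (-1)^(n-1) eta(n/N) over n >= 1."""
    # eta(n/N) = 0 for n >= N, so only finitely many terms contribute
    return sum((-1) ** (n - 1) * eta(n / N) for n in range(1, N))
```

For this cutoff, `smoothed_grandi(N)` stays within ${2/N}$ of ${1/2}$ at ${N = 100}$ and ${N = 1000}$, consistent with the ${O(1/N)}$ error term.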

Exercise 1 Show that if ${\eta}$ is merely once continuously differentiable (i.e. ${\eta \in C^1}$), then we have a similar asymptotic, but with an error term of ${o(1)}$ instead of ${O(1/N)}$. This is an instance of a more general principle that smoother cutoffs lead to better error terms, though the improvement sometimes stops after some degree of regularity.

Remark 1 The most famous instance of smoothed summation is Cesàro summation, which corresponds to the cutoff function ${\eta(x) := (1-x)_+}$. Unsurprisingly, when Cesàro summation is applied to Grandi’s series, one again recovers the value of ${1/2}$.

If we now revisit the divergent series (4), (5), (6), (7) with smooth summation in mind, we finally begin to see the origin of the right-hand sides. Indeed, for any fixed smooth cutoff function ${\eta}$, we will shortly show that

$\displaystyle \sum_{n=1}^\infty \eta(n/N) = -\frac{1}{2} + C_{\eta,0} N + O(\frac{1}{N}) \ \ \ \ \ (11)$

$\displaystyle \sum_{n=1}^\infty n \eta(n/N) = -\frac{1}{12} + C_{\eta,1} N^2 + O(\frac{1}{N}) \ \ \ \ \ (12)$

$\displaystyle \sum_{n=1}^\infty n^2 \eta(n/N) = C_{\eta,2} N^3 + O(\frac{1}{N}) \ \ \ \ \ (13)$

and more generally

$\displaystyle \sum_{n=1}^\infty n^s \eta(n/N) = -\frac{B_{s+1}}{s+1} + C_{\eta,s} N^{s+1} + O(\frac{1}{N}) \ \ \ \ \ (14)$

for any fixed ${s=1,2,3,\ldots}$ where ${C_{\eta,s}}$ is the Archimedean factor

$\displaystyle C_{\eta,s} := \int_0^\infty x^s \eta(x)\ dx \ \ \ \ \ (15)$

(which is also essentially the Mellin transform of ${\eta}$). Thus we see that the values (4), (5), (6), (7) obtained by analytic continuation are nothing more than the constant terms of the asymptotic expansion of the smoothed partial sums. This is not a coincidence; we will explain the equivalence of these two interpretations of such sums (in the model case when the analytic continuation has only finitely many poles and does not grow too fast at infinity) below the fold.
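The asymptotics (11) and (12) can be tested in exact rational arithmetic. The sketch below again uses the illustrative cutoff ${\eta(x) := (1-x^2)^3}$ on ${[0,1)}$, extended by zero; this is only ${C^2}$ rather than smooth, so its use here is an assumption that happens to work for this particular polynomial. The Archimedean factors (15) for this ${\eta}$ are ${C_{\eta,0} = 16/35}$ and ${C_{\eta,1} = 1/8}$, computed by hand.

```python
from fractions import Fraction

def eta(x):
    """Polynomial cutoff (1 - x^2)^3 on [0, 1), zero afterwards."""
    return (1 - x * x) ** 3 if x < 1 else Fraction(0)

def smoothed_sum(s, N):
    """Exact value of sum_{n >= 1} n^s eta(n/N); terms with n >= N vanish."""
    return sum(Fraction(n) ** s * eta(Fraction(n, N)) for n in range(1, N))

# Archimedean factors (15) for this particular eta
C_ETA = {0: Fraction(16, 35),   # integral of (1 - x^2)^3 over [0, 1]
         1: Fraction(1, 8)}     # integral of x (1 - x^2)^3 over [0, 1]

def constant_term(s, N):
    """Smoothed sum minus the leading term C_{eta,s} N^(s+1), for s in {0, 1}."""
    return smoothed_sum(s, N) - C_ETA[s] * N ** (s + 1)
```

As ${N}$ grows, `constant_term(0, N)` approaches ${-1/2}$ and `constant_term(1, N)` approaches ${-1/12}$, in agreement with (11) and (12).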

This interpretation clears up the apparent inconsistencies alluded to earlier. For instance, the sum ${\sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots}$ consists only of non-negative terms, as do its smoothed partial sums ${\sum_{n=1}^\infty n \eta(n/N)}$ (if ${\eta}$ is non-negative). Comparing this with (12), we see that this forces the highest-order term ${C_{\eta,1} N^2}$ to be non-negative (as indeed it is), but does not prohibit the lower-order constant term ${-\frac{1}{12}}$ from being negative (which of course it is).

Similarly, if we add together (12) and (11) we obtain

$\displaystyle \sum_{n=1}^\infty (n+1) \eta(n/N) = -\frac{7}{12} + C_{\eta,1} N^2 + C_{\eta,0} N + O(\frac{1}{N}) \ \ \ \ \ (16)$

while if we subtract ${1}$ from (12) we obtain

$\displaystyle \sum_{n=2}^\infty n \eta(n/N) = -\frac{13}{12} + C_{\eta,1} N^2 + O(\frac{1}{N}). \ \ \ \ \ (17)$

These two asymptotics are not inconsistent with each other; indeed, if we shift the index of summation in (17), we can write

$\displaystyle \sum_{n=2}^\infty n \eta(n/N) = \sum_{n=1}^\infty (n+1) \eta((n+1)/N) \ \ \ \ \ (18)$

and so we now see that the discrepancy between the two sums in (8), (9) comes from the shifting of the cutoff ${\eta(n/N)}$, which is invisible in the formal expressions in (8), (9) but becomes manifestly present in the smoothed sum formulation.

Exercise 2 By Taylor expanding ${\eta((n+1)/N)}$ and using (11), (18), show that (16) and (17) are indeed consistent with each other, and in particular one can deduce the latter from the former.

Jean-Pierre Serre (whose papers are, of course, always worth reading) recently posted a lovely lecture on the arXiv entitled “How to use finite fields for problems concerning infinite fields”. In it, he describes several ways in which algebraic statements over fields of zero characteristic, such as ${{\mathbb C}}$, can be deduced from their positive characteristic counterparts such as ${F_{p^m}}$, despite the fact that there is no non-trivial field homomorphism between the two types of fields. In particular finitary tools, including such basic concepts as cardinality, can now be deployed to establish infinitary results. This leads to some simple and elegant proofs of non-trivial algebraic results which are not easy to establish by other means.

One deduction of this type is based on the idea that positive characteristic fields can partially model zero characteristic fields, and proceeds like this: if a certain algebraic statement failed over (say) ${{\mathbb C}}$, then there should be a “finitary algebraic” obstruction that “witnesses” this failure over ${{\mathbb C}}$. Because this obstruction is both finitary and algebraic, it must also be definable in some (large) finite characteristic, thus leading to a comparable failure over a finite characteristic field. Taking contrapositives, one obtains the claim.

Algebra is definitely not my own field of expertise, but it is interesting to note that similar themes have also come up in my own area of additive combinatorics (and more generally arithmetic combinatorics), because the combinatorics of addition and multiplication on finite sets is definitely of a “finitary algebraic” nature. For instance, a recent paper of Vu, Wood, and Wood establishes a finitary “Freiman-type” homomorphism from (finite subsets of) the complex numbers to large finite fields that allows them to pull back many results in arithmetic combinatorics in finite fields (e.g. the sum-product theorem) to the complex plane. (Van Vu and I also used a similar trick to control the singularity property of random sign matrices by first mapping them into finite fields in which cardinality arguments became available.) And I have a particular fondness for correspondences between finitary and infinitary mathematics; the correspondence Serre discusses is slightly different from the one I discuss for instance in here or here, although there seems to be a common theme of “compactness” (or of model theory) tying these correspondences together.

As one of his examples, Serre cites one of my own favourite results in algebra, discovered independently by Ax and by Grothendieck (and then rediscovered many times since). Here is a special case of that theorem:

Theorem 1 (Ax-Grothendieck theorem, special case) Let ${P: {\mathbb C}^n \rightarrow {\mathbb C}^n}$ be a polynomial map from a complex vector space to itself. If ${P}$ is injective, then ${P}$ is bijective.

The full version of the theorem allows one to replace ${{\mathbb C}^n}$ by an algebraic variety ${X}$ over any algebraically closed field, and for ${P}$ to be a morphism from the algebraic variety ${X}$ to itself, but for simplicity I will just discuss the above special case. This theorem is not at all obvious; it is not too difficult (see Lemma 4 below) to show that the Jacobian of ${P}$ is non-degenerate, but this does not come close to solving the problem since one would then be faced with the notorious Jacobian conjecture. Also, the claim fails if “polynomial” is replaced by “holomorphic”, due to the existence of Fatou-Bieberbach domains.

In this post I would like to give the proof of Theorem 1 based on finite fields as mentioned by Serre, as well as another elegant proof of Rudin that combines algebra with some elementary complex variable methods. (There are several other proofs of this theorem and its generalisations, for instance a topological proof by Borel, which I will not discuss here.)
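To make the role of cardinality concrete, here is a minimal Python sketch of the finite-field analogue of Theorem 1: over ${{\mathbb F}_p}$ there are only finitely many points, so an injective polynomial self-map is automatically bijective by pigeonhole. The helper name `is_injective_and_surjective` and the two sample maps are illustrative assumptions, not taken from Serre's lecture, and the brute-force check is only meant for small ${p}$ and ${n}$.

```python
from itertools import product


def is_injective_and_surjective(poly_map, p, n):
    """Brute-force check of a self-map of F_p^n (hypothetical helper).

    Elements of F_p are represented by the integers 0, ..., p-1.
    """
    points = list(product(range(p), repeat=n))
    images = {poly_map(pt, p) for pt in points}
    injective = len(images) == len(points)
    surjective = images == set(points)
    return injective, surjective


def shear(pt, p):
    # (x, y) -> (x, y + x^2): a polynomial map with polynomial inverse.
    x, y = pt
    return (x % p, (y + x * x) % p)


def square(pt, p):
    # (x, y) -> (x^2, y): identifies x and -x, so not injective for odd p.
    x, y = pt
    return ((x * x) % p, y % p)


if __name__ == "__main__":
    for p in (3, 5, 7):
        for f in (shear, square):
            inj, sur = is_injective_and_surjective(f, p, 2)
            # On a finite set, injectivity forces surjectivity.
            assert (not inj) or sur
            print(p, f.__name__, inj, sur)
```

Of course this only illustrates the trivial finite step; the substance of the argument is in transferring a hypothetical counterexample over ${{\mathbb C}}$ to a finite field.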

Update, March 8: Some corrections to the finite field proof. Thanks to Matthias Aschenbrenner also for clarifying the relationship with Tarski’s theorem and some further references.

[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]

Many properties of a (sufficiently nice) function ${f: {\mathbb R} \rightarrow {\mathbb C}}$ are reflected in its Fourier transform ${\hat f: {\mathbb R} \rightarrow {\mathbb C}}$, defined by the formula

$\displaystyle \hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (1)$

For instance, decay properties of ${f}$ are reflected in smoothness properties of ${\hat f}$, as the following table shows:

| If ${f}$ is… | then ${\hat f}$ is… | and this relates to… |
|---|---|---|
| Square-integrable | square-integrable | Plancherel’s theorem |
| Absolutely integrable | continuous | Riemann-Lebesgue lemma |
| Rapidly decreasing | smooth | theory of Schwartz functions |
| Exponentially decreasing | analytic in a strip | |
| Compactly supported | entire and at most exponential growth | Paley-Wiener theorem |

Another important relationship between a function ${f}$ and its Fourier transform ${\hat f}$ is the uncertainty principle, which roughly asserts that if a function ${f}$ is highly localised in space, then its Fourier transform ${\hat f}$ must be widely dispersed in frequency, or to put it another way, ${f}$ and ${\hat f}$ cannot both decay too strongly at infinity (except of course in the degenerate case ${f=0}$). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise

$\displaystyle \int_{{\mathbb R}} |f(x)|^2\ dx = \int_{\mathbb R} |\hat f(\xi)|^2\ d\xi = 1$

then we must have

$\displaystyle \left(\int_{\mathbb R} |x|^2 |f(x)|^2\ dx\right) \cdot \left(\int_{\mathbb R} |\xi|^2 |\hat f(\xi)|^2\ d\xi\right) \geq \frac{1}{(4\pi)^2}$

thus forcing at least one of ${f}$ or ${\hat f}$ to not be too concentrated near the origin. This principle can be proven (for sufficiently nice ${f}$, initially) by observing the integration by parts identity

$\displaystyle \mathrm{Re} \langle xf, f' \rangle = \mathrm{Re} \int_{\mathbb R} x f(x) \overline{f'(x)}\ dx = - \frac{1}{2} \int_{\mathbb R} |f(x)|^2\ dx$

and then using Cauchy-Schwarz and the Plancherel identity.
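
For the reader's convenience, here is one way to fill in these last two steps; this is only a sketch, under the added assumption that ${f}$ is sufficiently nice (say, a Schwartz function). With the normalisation ${\int_{\mathbb R} |f(x)|^2\ dx = 1}$, the identity above gives ${|\langle xf, f' \rangle| \geq \frac{1}{2}}$, and so by Cauchy-Schwarz

$\displaystyle \frac{1}{2} \leq \|xf\|_{L^2({\mathbb R})} \|f'\|_{L^2({\mathbb R})}.$

Under the convention (1), the Fourier transform of ${f'}$ is ${2\pi i \xi \hat f(\xi)}$, so Plancherel gives ${\|f'\|_{L^2({\mathbb R})} = 2\pi \|\xi \hat f\|_{L^2({\mathbb R})}}$, and hence

$\displaystyle \|xf\|_{L^2({\mathbb R})} \cdot \|\xi \hat f\|_{L^2({\mathbb R})} \geq \frac{1}{4\pi}.$

Squaring both sides yields the stated lower bound of ${\frac{1}{(4\pi)^2}}$.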

Another well known manifestation of the uncertainty principle is the fact that it is not possible for ${f}$ and ${\hat f}$ to both be compactly supported (unless of course they vanish entirely). This can in fact be seen from the above table: if ${f}$ is compactly supported, then ${\hat f}$ is an entire function; but the zeroes of a non-zero entire function are isolated, so ${\hat f}$ cannot vanish on an open set, yielding a contradiction unless ${f}$ vanishes. (Indeed, the table also shows that if one of ${f}$ and ${\hat f}$ is compactly supported, then the other cannot have exponential decay.)

On the other hand, we have the example of the Gaussian functions ${f(x) = e^{-\pi a x^2}}$, ${\hat f(\xi) = \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a }}$, which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that ${f}$ and ${\hat f}$ can simultaneously decay:
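As a sanity check on the Gaussian formula under the normalisation (1), here is a small Python sketch that approximates the Fourier integral by the trapezoid rule and compares it with ${\frac{1}{\sqrt{a}} e^{-\pi \xi^2/a}}$. The function names, the truncation window `half_width`, and the step count are illustrative assumptions; the quadrature is only trusted here because Gaussians decay so rapidly.

```python
import cmath
import math


def fourier_transform(f, xi, half_width=10.0, steps=4000):
    """Approximate f_hat(xi) = integral of f(x) exp(-2 pi i x xi) dx.

    Uses the trapezoid rule on [-half_width, half_width]; this assumes
    f is negligible outside that window.
    """
    h = 2 * half_width / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * h


def gaussian(a):
    # f(x) = exp(-pi a x^2)
    return lambda x: math.exp(-math.pi * a * x * x)


def gaussian_hat(a):
    # Claimed transform: exp(-pi xi^2 / a) / sqrt(a)
    return lambda xi: math.exp(-math.pi * xi * xi / a) / math.sqrt(a)


if __name__ == "__main__":
    a = 2.0
    for xi in (0.0, 0.5, 1.0, 2.0):
        numeric = fourier_transform(gaussian(a), xi)
        exact = gaussian_hat(a)(xi)
        print(xi, abs(numeric - exact))
```

Note that the decay rates ${a}$ and ${1/a}$ of ${f}$ and ${\hat f}$ multiply to ${1}$, whatever the value of ${a}$.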

Theorem 1 (Hardy uncertainty principle) Suppose that ${f}$ is a (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}}$ for all ${x, \xi}$ and some ${C, C', a > 0}$. Then ${f(x)}$ is a scalar multiple of the Gaussian ${e^{-\pi ax^2}}$.

This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for the sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:

Theorem 2 (Weak Hardy uncertainty principle) Suppose that ${f}$ is a non-zero (measurable) function such that ${|f(x)| \leq C e^{-\pi a x^2 }}$ and ${|\hat f(\xi)| \leq C' e^{-\pi b \xi^2}}$ for all ${x, \xi}$ and some ${C, C', a, b > 0}$. Then ${ab \leq C_0}$ for some absolute constant ${C_0}$.

Note that the correct value of ${C_0}$ should be ${1}$, as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.

In the second of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli expanded on the themes in the first lecture, in particular providing more details as to the recent (not yet published) results of Lanzani and Stein on the boundedness of the Cauchy integral on domains in several complex variables.