
By an odd coincidence, I stumbled upon a second question in as many weeks about power series, and once again the only way I know how to prove the result is by complex methods; once again, I am leaving it here as a challenge to any interested readers, and I would be particularly interested in knowing of a proof that was not based on complex analysis (or thinly disguised versions thereof), or of a reference to previous literature where something like this identity has occurred. (I suspect for instance that something like this may have shown up before in free probability, based on the answer to part (ii) of the problem.)

Here is a purely algebraic form of the problem:

Problem 1 Let ${F = F(z)}$ be a formal function of one variable ${z}$. Suppose that ${G = G(z)}$ is the formal function defined by

$\displaystyle G := \sum_{n=1}^\infty \left( \frac{F^n}{n!} \right)^{(n-1)}$

$\displaystyle = F + \left(\frac{F^2}{2}\right)' + \left(\frac{F^3}{6}\right)'' + \dots$

$\displaystyle = F + FF' + (F (F')^2 + \frac{1}{2} F^2 F'') + \dots,$

where we use ${f^{(k)}}$ to denote the ${k}$-fold derivative of ${f}$ with respect to the variable ${z}$.

• (i) Show that ${F}$ can be formally recovered from ${G}$ by the formula

$\displaystyle F = \sum_{n=1}^\infty (-1)^{n-1} \left( \frac{G^n}{n!} \right)^{(n-1)}$

$\displaystyle = G - \left(\frac{G^2}{2}\right)' + \left(\frac{G^3}{6}\right)'' - \dots$

$\displaystyle = G - GG' + (G (G')^2 + \frac{1}{2} G^2 G'') - \dots.$

• (ii) There is a remarkable further formal identity relating ${F(z)}$ with ${G(z)}$ that does not explicitly involve any infinite summation. What is this identity?

To rigorously formulate part (i) of this problem, one could work in the commutative differential ring of formal infinite series generated by polynomial combinations of ${F}$ and its derivatives (with no constant term). Part (ii) is a bit trickier to formulate in this abstract ring; the identity in question is easier to state if ${F, G}$ are formal power series, or (even better) convergent power series, as it involves operations such as composition or inversion that can be more easily defined in those latter settings.

To illustrate Problem 1(i), let us compute up to third order in ${F}$, using ${{\mathcal O}(F^4)}$ to denote any quantity involving four or more factors of ${F}$ and its derivatives, and similarly for other exponents than ${4}$. Then we have

$\displaystyle G = F + FF' + (F (F')^2 + \frac{1}{2} F^2 F'') + {\mathcal O}(F^4)$

and hence

$\displaystyle G' = F' + (F')^2 + FF'' + {\mathcal O}(F^3)$

$\displaystyle G'' = F'' + {\mathcal O}(F^2);$

multiplying, we have

$\displaystyle GG' = FF' + F (F')^2 + F^2 F'' + F (F')^2 + {\mathcal O}(F^4)$

and

$\displaystyle G (G')^2 + \frac{1}{2} G^2 G'' = F (F')^2 + \frac{1}{2} F^2 F'' + {\mathcal O}(F^4)$

and hence after a lot of canceling

$\displaystyle G - GG' + (G (G')^2 + \frac{1}{2} G^2 G'') = F + {\mathcal O}(F^4).$

Thus Problem 1(i) holds up to errors of ${{\mathcal O}(F^4)}$ at least. In principle one can continue verifying Problem 1(i) to increasingly high order in ${F}$, but the computations rapidly become quite lengthy, and I do not know of a direct way to ensure that one always obtains the required cancellation at the end of the computation.
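The order-by-order check above can be mechanised for a specific choice of ${F}$. The following is a minimal sketch rather than a proof: it assumes ${F = t f(z)}$ for a sample polynomial ${f(z) = 1 + 2z - z^3}$ (an arbitrary illustrative choice, as is the bookkeeping parameter ${t}$ that counts factors of ${F}$), keeps all terms up to ${t^N}$ in exact rational arithmetic, and confirms that applying the alternating-sign transform to ${G}$ returns ${F}$ up to ${{\mathcal O}(F^{N+1})}$.

```python
from fractions import Fraction
from math import factorial

N = 6  # keep terms with at most N factors of F (i.e. up to t^N)

# A truncated series is a list of N+1 dicts; entry k is the t^k coefficient,
# stored as a polynomial in z of the form {exponent: coefficient}.

def zero():
    return [dict() for _ in range(N + 1)]

def clean(p):
    return [{e: c for e, c in layer.items() if c != 0} for layer in p]

def add(p, q):
    out = zero()
    for k in range(N + 1):
        for layer in (p[k], q[k]):
            for e, c in layer.items():
                out[k][e] = out[k].get(e, 0) + c
    return clean(out)

def mul(p, q):
    out = zero()
    for i in range(N + 1):
        for j in range(N + 1 - i):  # discard everything beyond t^N
            for e1, c1 in p[i].items():
                for e2, c2 in q[j].items():
                    out[i + j][e1 + e2] = out[i + j].get(e1 + e2, 0) + c1 * c2
    return clean(out)

def deriv(p):
    """Derivative with respect to z."""
    return [{e - 1: e * c for e, c in layer.items() if e > 0} for layer in p]

def transform(p, sign):
    """sum_{n=1}^{N} sign^(n-1) (p^n / n!)^{(n-1)}, exact up to t^N when p = O(t)."""
    total = zero()
    power = p  # holds p^n
    for n in range(1, N + 1):
        term = power
        for _ in range(n - 1):
            term = deriv(term)
        coeff = Fraction(sign ** (n - 1), factorial(n))
        term = [{e: coeff * c for e, c in layer.items()} for layer in term]
        total = add(total, term)
        power = mul(power, p)
    return total

# Hypothetical test case: F = t * (1 + 2z - z^3).
F = zero()
F[1] = {0: Fraction(1), 1: Fraction(2), 3: Fraction(-1)}

G = transform(F, +1)       # the series defining G
F_back = transform(G, -1)  # the claimed inversion formula from part (i)

print("second-order part of G (should equal F F'):", G[2])
print("recovered F up to order", N, ":", F_back == F)
```

Raising `N` or changing the sample polynomial exercises more of the cancellation, though of course no finite check of this kind establishes the identity in general.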

Problem 1(i) can also be posed in formal power series: if

$\displaystyle F(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots$

is a formal power series with no constant term with complex coefficients ${a_1, a_2, \dots}$ with ${|a_1|<1}$, then one can verify that the series

$\displaystyle G := \sum_{n=1}^\infty \left( \frac{F^n}{n!} \right)^{(n-1)}$

makes sense as a formal power series with no constant term, thus

$\displaystyle G(z) = b_1 z + b_2 z^2 + b_3 z^3 + \dots.$

For instance it is not difficult to show that ${b_1 = \frac{a_1}{1-a_1}}$. If one further has ${|b_1| < 1}$, then it turns out that

$\displaystyle F = \sum_{n=1}^\infty (-1)^{n-1} \left( \frac{G^n}{n!} \right)^{(n-1)}$

as formal power series. Currently the only way I know how to show this is by first proving the claim for power series with a positive radius of convergence using the Cauchy integral formula, but even this is a bit tricky unless one has managed to guess the identity in (ii) first. (In fact, the way I discovered this problem was by first trying to solve (a variant of) the identity in (ii) by Taylor expansion in the course of attacking another problem, and obtaining the transform in Problem 1 as a consequence.)

The transform that takes ${F}$ to ${G}$ resembles both the exponential function

$\displaystyle \exp(F) = \sum_{n=0}^\infty \frac{F^n}{n!}$

and Taylor’s formula

$\displaystyle F(z) = \sum_{n=0}^\infty \frac{F^{(n)}(0)}{n!} z^n$

but does not seem to be directly connected to either (this is more apparent once one knows the identity in (ii)).

In the previous set of notes we introduced the notion of a complex diffeomorphism ${f: U \rightarrow V}$ between two open subsets ${U,V}$ of the complex plane ${{\bf C}}$ (or more generally, two Riemann surfaces): an invertible holomorphic map whose inverse was also holomorphic. (Actually, the last part is automatic, thanks to Exercise 40 of Notes 4.) Such maps are also known as biholomorphic maps or conformal maps (although in some literature the notion of “conformal map” is expanded to permit maps such as the complex conjugation map ${z \mapsto \overline{z}}$ that are angle-preserving but not orientation-preserving, as well as maps such as the exponential map ${z \mapsto \exp(z)}$ from ${{\bf C}}$ to ${{\bf C} \backslash \{0\}}$ that are only locally injective rather than globally injective). Such complex diffeomorphisms can be used in complex analysis (or in the analysis of harmonic functions) to change the underlying domain ${U}$ to a domain that may be more convenient for calculations, thanks to the following basic lemma:

Lemma 1 (Holomorphicity and harmonicity are conformal invariants) Let ${\phi: U \rightarrow V}$ be a complex diffeomorphism between two Riemann surfaces ${U,V}$.

• (i) If ${f: V \rightarrow W}$ is a function to another Riemann surface ${W}$, then ${f}$ is holomorphic if and only if ${f \circ \phi: U \rightarrow W}$ is holomorphic.
• (ii) If ${U,V}$ are open subsets of ${{\bf C}}$ and ${u: V \rightarrow {\bf R}}$ is a function, then ${u}$ is harmonic if and only if ${u \circ \phi: U \rightarrow {\bf R}}$ is harmonic.

Proof: Part (i) is immediate since the composition of two holomorphic functions is holomorphic (apply this to both ${\phi}$ and ${\phi^{-1}}$). For part (ii), observe that if ${u: V \rightarrow {\bf R}}$ is harmonic then on any ball ${B(z_0,r)}$ in ${V}$, ${u}$ is the real part of some holomorphic function ${f: B(z_0,r) \rightarrow {\bf C}}$ thanks to Exercise 62 of Notes 3. By part (i), ${f \circ \phi: \phi^{-1}(B(z_0,r)) \rightarrow {\bf C}}$ is also holomorphic. Taking real parts we see that ${u \circ \phi}$ is harmonic on the preimage ${\phi^{-1}(B(z_0,r))}$ of each ball ${B(z_0,r)}$ in ${V}$, and hence harmonic on all of ${U}$, giving one direction of (ii); the other direction is proven similarly. $\Box$

Exercise 2 Establish Lemma 1(ii) by direct calculation, avoiding the use of holomorphic functions. (Hint: the calculations are cleanest if one uses Wirtinger derivatives, as per Exercise 27 of Notes 1.)

Exercise 3 Let ${\phi: U \rightarrow V}$ be a complex diffeomorphism between two open subsets ${U,V}$ of ${{\bf C}}$, let ${z_0}$ be a point in ${U}$, let ${m}$ be a natural number, and let ${f: V \rightarrow {\bf C} \cup \{\infty\}}$ be holomorphic. Show that ${f: V \rightarrow {\bf C} \cup \{\infty\}}$ has a zero (resp. a pole) of order ${m}$ at ${\phi(z_0)}$ if and only if ${f \circ \phi: U \rightarrow {\bf C} \cup \{\infty\}}$ has a zero (resp. a pole) of order ${m}$ at ${z_0}$.

From Lemma 1(ii) we can now define the notion of a harmonic function ${u: M \rightarrow {\bf R}}$ on a Riemann surface ${M}$; such a function ${u}$ is harmonic if, for every coordinate chart ${\phi_\alpha: U_\alpha \rightarrow V_\alpha}$ in some atlas, the map ${u \circ \phi_\alpha^{-1}: V_\alpha \rightarrow {\bf R}}$ is harmonic. Lemma 1(ii) ensures that this definition of harmonicity does not depend on the choice of atlas. Similarly, using Exercise 3 one can define what it means for a holomorphic map ${f: M \rightarrow {\bf C} \cup \{\infty\}}$ on a Riemann surface ${M}$ to have a pole or zero of a given order at a point ${p_0 \in M}$, with the definition being independent of the choice of atlas.

In view of Lemma 1, it is thus natural to ask which Riemann surfaces are complex diffeomorphic to each other, and more generally to understand the space of holomorphic maps from one given Riemann surface to another. We will initially focus attention on three important model Riemann surfaces:

• (i) (Elliptic model) The Riemann sphere ${{\bf C} \cup \{\infty\}}$;
• (ii) (Parabolic model) The complex plane ${{\bf C}}$; and
• (iii) (Hyperbolic model) The unit disk ${D(0,1)}$.

The designation of these model Riemann surfaces as elliptic, parabolic, and hyperbolic comes from Riemannian geometry, where it is natural to endow each of these surfaces with a constant curvature Riemannian metric which is positive, zero, or negative in the elliptic, parabolic, and hyperbolic cases respectively. However, we will not discuss Riemannian geometry further here.

All three model Riemann surfaces are simply connected, but none of them are complex diffeomorphic to any other; indeed, there are no non-constant holomorphic maps from the Riemann sphere to the plane or the disk, nor are there any non-constant holomorphic maps from the plane to the disk (although there are plenty of holomorphic maps going in the opposite directions). The complex automorphisms (that is, the complex diffeomorphisms from a surface to itself) of each of the three surfaces can be classified explicitly. The automorphisms of the Riemann sphere turn out to be the Möbius transformations ${z \mapsto \frac{az+b}{cz+d}}$ with ${ad-bc \neq 0}$, also known as fractional linear transformations. The automorphisms of the complex plane are the linear transformations ${z \mapsto az+b}$ with ${a \neq 0}$, and the automorphisms of the disk are the fractional linear transformations of the form ${z \mapsto e^{i\theta} \frac{\alpha - z}{1 - \overline{\alpha} z}}$ for ${\theta \in {\bf R}}$ and ${\alpha \in D(0,1)}$. Holomorphic maps ${f: D(0,1) \rightarrow D(0,1)}$ from the disk ${D(0,1)}$ to itself that fix the origin obey a basic but incredibly important estimate known as the Schwarz lemma: they are “dominated” by the identity function ${z \mapsto z}$ in the sense that ${|f(z)| \leq |z|}$ for all ${z \in D(0,1)}$. Among other things, this lemma gives guidance to determine when a given Riemann surface is complex diffeomorphic to a disk; we shall discuss this point further below.
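As a numerical sanity check (not a proof) of the stated form of the disk automorphisms, the sketch below samples one such map and confirms that it sends interior points to interior points and the unit circle to itself. The specific values of ${\theta}$ and ${\alpha}$, and the helper name `disk_automorphism`, are illustrative assumptions.

```python
import cmath
import random

def disk_automorphism(theta, alpha):
    """Return the map z -> e^{i theta} (alpha - z) / (1 - conj(alpha) z), for |alpha| < 1."""
    def phi(z):
        return cmath.exp(1j * theta) * (alpha - z) / (1 - alpha.conjugate() * z)
    return phi

random.seed(0)
phi = disk_automorphism(theta=0.7, alpha=0.3 + 0.4j)

# Random points with |z| < 1 should be mapped to points with |phi(z)| < 1.
interior = [cmath.rect(0.999 * random.random(), random.uniform(0, 2 * cmath.pi))
            for _ in range(1000)]
max_modulus = max(abs(phi(z)) for z in interior)

# Points on the unit circle should stay on the unit circle.
boundary_error = max(abs(abs(phi(cmath.exp(1j * 0.01 * k))) - 1) for k in range(629))

# With theta = 0 the map is its own inverse, so it is in particular a bijection.
psi = disk_automorphism(theta=0.0, alpha=0.3 + 0.4j)
involution_error = max(abs(psi(psi(z)) - z) for z in interior)

print(max_modulus, boundary_error, involution_error)
```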

It is a beautiful and fundamental fact in complex analysis that these three model Riemann surfaces are in fact an exhaustive list of the simply connected Riemann surfaces, up to complex diffeomorphism. More precisely, we have the Riemann mapping theorem and the uniformisation theorem:

Theorem 4 (Riemann mapping theorem) Let ${U}$ be a simply connected open subset of ${{\bf C}}$ that is not all of ${{\bf C}}$. Then ${U}$ is complex diffeomorphic to ${D(0,1)}$.

Theorem 5 (Uniformisation theorem) Let ${M}$ be a simply connected Riemann surface. Then ${M}$ is complex diffeomorphic to ${{\bf C} \cup \{\infty\}}$, ${{\bf C}}$, or ${D(0,1)}$.

As we shall see, every connected Riemann surface can be viewed as the quotient of its simply connected universal cover by a discrete group of automorphisms known as deck transformations. This in principle gives a complete classification of Riemann surfaces up to complex diffeomorphism, although the situation is still somewhat complicated in the hyperbolic case because of the wide variety of discrete groups of automorphisms available in that case.

We will prove the Riemann mapping theorem in these notes, using the elegant argument of Koebe that is based on the Schwarz lemma and Montel’s theorem (Exercise 57 of Notes 4). The uniformisation theorem is however more difficult to establish; we discuss some components of a proof (based on the Perron method of subharmonic functions) here, but stop short of providing a complete proof.

The above theorems show that it is in principle possible to conformally map various domains into model domains such as the unit disk, but the proofs of these theorems do not readily produce explicit conformal maps for this purpose. For some domains we can just write down a suitable such map. For instance:

Exercise 6 (Cayley transform) Let ${{\bf H} := \{ z \in {\bf C}: \mathrm{Im} z > 0 \}}$ be the upper half-plane. Show that the Cayley transform ${\phi: {\bf H} \rightarrow D(0,1)}$, defined by

$\displaystyle \phi(z) := \frac{z-i}{z+i},$

is a complex diffeomorphism from the upper half-plane ${{\bf H}}$ to the disk ${D(0,1)}$, with inverse map ${\phi^{-1}: D(0,1) \rightarrow {\bf H}}$ given by

$\displaystyle \phi^{-1}(w) := i \frac{1+w}{1-w}.$
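Before attempting the exercise, one can test the two formulas numerically. The sketch below (an illustration only; the sample points are arbitrary choices) checks that a grid of points in the upper half-plane lands inside the unit disk and that the stated inverse undoes the map.

```python
def cayley(z):
    """The Cayley transform z -> (z - i) / (z + i)."""
    return (z - 1j) / (z + 1j)

def cayley_inverse(w):
    """The claimed inverse w -> i (1 + w) / (1 - w)."""
    return 1j * (1 + w) / (1 - w)

# Sample points with positive imaginary part.
samples = [complex(x, y)
           for x in (-5.0, -1.0, 0.0, 0.5, 3.0)
           for y in (0.01, 0.5, 1.0, 10.0)]

all_in_disk = all(abs(cayley(z)) < 1 for z in samples)
round_trip_error = max(abs(cayley_inverse(cayley(z)) - z) for z in samples)

print(all_in_disk, round_trip_error)
print(cayley(1j))  # the point i is sent to the centre of the disk
```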

Exercise 7 Show that for any real numbers ${a < b}$, the strip ${\{ z \in {\bf C}: a < \mathrm{Re}(z) < b \}}$ is complex diffeomorphic to the disk ${D(0,1)}$. (Hint: use the complex exponential and a linear transformation to map the strip onto the half-plane ${{\bf H}}$.)

We will discuss some other explicit conformal maps in this set of notes, such as the Schwarz-Christoffel maps that transform the upper half-plane ${{\bf H}}$ to polygonal regions. Further examples of conformal mapping can be found in the text of Stein-Shakarchi.

My colleague Tom Liggett recently posed to me the following problem about power series in one real variable ${x}$. Observe that the power series

$\displaystyle \sum_{n=0}^\infty (-1)^n\frac{x^n}{n!}$

has very rapidly decaying coefficients (of order ${O(1/n!)}$), leading to an infinite radius of convergence; also, as the series converges to ${e^{-x}}$, the series decays very rapidly as ${x}$ approaches ${+\infty}$. The problem is whether this is essentially the only example of this type. More precisely:

Problem 1 Let ${a_0, a_1, \dots}$ be a bounded sequence of real numbers, and suppose that the power series

$\displaystyle f(x) := \sum_{n=0}^\infty a_n\frac{x^n}{n!}$

(which has an infinite radius of convergence) decays like ${O(e^{-x})}$ as ${x \rightarrow +\infty}$, in the sense that the function ${e^x f(x)}$ remains bounded as ${x \rightarrow +\infty}$. Must the sequence ${a_n}$ be of the form ${a_n = C (-1)^n}$ for some constant ${C}$?
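To get a feel for how restrictive the decay hypothesis is, here is a small numerical experiment; it is only an illustration and says nothing about the general answer. It evaluates ${e^x f(x)}$ for the model sequence ${a_n = (-1)^n}$ and for a hypothetical perturbation in which the single coefficient ${a_2}$ is increased by ${1}$, so that ${f(x) = e^{-x} + x^2/2}$. Partial sums are computed in exact rational arithmetic to avoid cancellation errors; the truncation at 200 terms is an assumption that is ample for the values of ${x}$ used.

```python
from fractions import Fraction
import math

def series_value(coeffs, x, terms=200):
    """Partial sum of sum_n a_n x^n / n!, computed exactly with rationals."""
    x = Fraction(x)
    total, term = Fraction(0), Fraction(1)  # term holds x^n / n!
    for n in range(terms):
        total += coeffs(n) * term
        term = term * x / (n + 1)
    return total

def alternating(n):
    return (-1) ** n

def perturbed(n):
    # Same sequence with a_2 increased by 1 (still a bounded sequence).
    return (-1) ** n + (1 if n == 2 else 0)

# Map x to the pair (e^x f(x) for a_n = (-1)^n, e^x f(x) for the perturbed sequence).
scaled = {x: (math.exp(x) * float(series_value(alternating, x)),
              math.exp(x) * float(series_value(perturbed, x)))
          for x in (5, 10, 20, 30)}

for x, (model, other) in scaled.items():
    print(x, model, other)
```

The first column stays at ${1}$, while the second grows like ${e^x x^2/2}$, so this particular perturbation violates the hypothesis of the problem.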

As it turns out, the problem has a very nice solution using complex analysis methods, which by coincidence I happen to be teaching right now. I am therefore posing as a challenge to my complex analysis students and to other readers of this blog to answer the above problem by complex methods; feel free to post solutions in the comments below (and in particular, if you don’t want to be spoiled, you should probably refrain from reading the comments). In fact, the only way I know how to solve this problem currently is by complex methods; I would be interested in seeing a purely real-variable solution that is not simply a thinly disguised version of a complex-variable argument.

(To be fair to my students, the complex variable argument does require one additional tool that is not directly covered in my notes. That tool can be found here.)

In the previous set of notes we saw that functions ${f: U \rightarrow {\bf C}}$ that were holomorphic on an open set ${U}$ enjoyed a large number of useful properties, particularly if the domain ${U}$ was simply connected. In many situations, though, we need to consider functions ${f}$ that are only holomorphic (or even well-defined) on most of a domain ${U}$, thus they are actually functions ${f: U \backslash S \rightarrow {\bf C}}$ outside of some small singular set ${S}$ inside ${U}$. (In this set of notes we only consider interior singularities; one can also discuss singular behaviour at the boundary of ${U}$, but this is a whole separate topic and will not be pursued here.) Since we have only defined the notion of holomorphicity on open sets, we will require the singular sets ${S}$ to be closed, so that the domain ${U \backslash S}$ on which ${f}$ remains holomorphic is still open. A typical class of examples are the functions of the form ${\frac{f(z)}{z-z_0}}$ that were already encountered in the Cauchy integral formula; if ${f: U \rightarrow {\bf C}}$ is holomorphic and ${z_0 \in U}$, such a function would be holomorphic save for a singularity at ${z_0}$. Another basic class of examples are the rational functions ${P(z)/Q(z)}$, which are holomorphic outside of the zeroes of the denominator ${Q}$.

Singularities come in varying levels of “badness” in complex analysis. The least harmful type of singularity is the removable singularity – a point ${z_0}$ which is an isolated singularity (i.e., an isolated point of the singular set ${S}$) where the function ${f}$ is undefined, but for which one can extend the function across the singularity in such a fashion that the function becomes holomorphic in a neighbourhood of the singularity. A typical example is that of the complex sinc function ${\frac{\sin(z)}{z}}$, which has a removable singularity at the origin ${0}$, which can be removed by declaring the sinc function to equal ${1}$ at ${0}$. The detection of isolated removable singularities can be accomplished by Riemann’s theorem on removable singularities (Exercise 35 from Notes 3): if a holomorphic function ${f: U \backslash S \rightarrow {\bf C}}$ is bounded near an isolated singularity ${z_0 \in S}$, then the singularity at ${z_0}$ may be removed.

After removable singularities, the mildest form of singularity one can encounter is that of a pole – an isolated singularity ${z_0}$ such that ${f(z)}$ can be factored as ${f(z) = \frac{g(z)}{(z-z_0)^m}}$ for some ${m \geq 1}$ (known as the order of the pole), where ${g}$ has a removable singularity at ${z_0}$ (and is non-zero at ${z_0}$ once the singularity is removed). Such functions have already made a frequent appearance in previous notes, particularly the case of simple poles when ${m=1}$. The behaviour near ${z_0}$ of a function ${f}$ with a pole of order ${m}$ is well understood: for instance, ${|f(z)|}$ goes to infinity as ${z}$ approaches ${z_0}$ (at a rate comparable to ${|z-z_0|^{-m}}$). These singularities are not, strictly speaking, removable; but if one compactifies the range ${{\bf C}}$ of the holomorphic function ${f: U \backslash S \rightarrow {\bf C}}$ to a slightly larger space ${{\bf C} \cup \{\infty\}}$ known as the Riemann sphere, then the singularity can be removed. In particular, functions ${f: U \backslash S \rightarrow {\bf C}}$ which only have isolated singularities that are either poles or removable can be extended to holomorphic functions ${f: U \rightarrow {\bf C} \cup \{\infty\}}$ to the Riemann sphere. Such functions are known as meromorphic functions, and are nearly as well-behaved as holomorphic functions in many ways. In fact, in one key respect, the family of meromorphic functions is better: the meromorphic functions on ${U}$ turn out to form a field; in particular, the quotient of two meromorphic functions is again meromorphic (if the denominator is not identically zero).

Unfortunately, there are isolated singularities that are neither removable nor poles, and are known as essential singularities. A typical example is the function ${f(z) = e^{1/z}}$, which turns out to have an essential singularity at ${z=0}$. The behaviour of such essential singularities is quite wild; we will show here the Casorati-Weierstrass theorem, which shows that the image of ${f}$ near the essential singularity is dense in the complex plane, as well as the more difficult great Picard theorem which asserts that in fact the image can omit at most one point in the complex plane. Nevertheless, around any isolated singularity (even the essential ones) ${z_0}$, it is possible to expand ${f}$ as a variant of a Taylor series known as a Laurent series ${\sum_{n=-\infty}^\infty a_n (z-z_0)^n}$. The ${\frac{1}{z-z_0}}$ coefficient ${a_{-1}}$ of this series is particularly important for contour integration purposes, and is known as the residue of ${f}$ at the isolated singularity ${z_0}$. These residues play a central role in a common generalisation of Cauchy’s theorem and the Cauchy integral formula known as the residue theorem, which is a particularly useful tool for computing (or at least transforming) contour integrals of meromorphic functions, and has proven to be a particularly popular technique to use in analytic number theory. Within complex analysis, one important consequence of the residue theorem is the argument principle, which gives a topological (and analytical) way to control the zeroes and poles of a meromorphic function.
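For the example ${f(z) = e^{1/z}}$, the Laurent coefficients at the origin can be estimated numerically from the standard contour integral formula ${a_n = \frac{1}{2\pi i} \int_{|z|=r} \frac{f(z)}{z^{n+1}}\ dz}$. The sketch below is an illustration under stated assumptions: it discretises the circle ${|z|=1}$ into equally spaced sample points (the number of samples and the helper name `laurent_coefficient` are arbitrary choices) and compares the results with the values ${a_{-k} = 1/k!}$ read off from the exponential series.

```python
import cmath
from math import factorial

def laurent_coefficient(f, n, radius=1.0, samples=256):
    """Estimate a_n = (1 / 2 pi i) * integral of f(z) / z^(n+1) over |z| = radius.

    Writing z = radius * e^{i theta} gives dz = i z d(theta), so the integral
    becomes the average of f(z) / z^n over the circle, approximated here by
    an equally spaced sum.
    """
    total = 0j
    for k in range(samples):
        z = radius * cmath.exp(2j * cmath.pi * k / samples)
        total += f(z) / z ** n
    return total / samples

def f(z):
    return cmath.exp(1 / z)

residue = laurent_coefficient(f, -1)  # the coefficient of 1/z
coefficients = {n: laurent_coefficient(f, n) for n in (-3, -2, -1, 0, 1)}

print("residue at 0:", residue)
for n, a in coefficients.items():
    print(n, a)
print("expected a_{-3}:", 1 / factorial(3))
```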

Finally, there are the non-isolated singularities. Little can be said about these singularities in general (for instance, the residue theorem does not directly apply in the presence of such singularities), but certain types of non-isolated singularities are still relatively easy to understand. One particularly common example of such a non-isolated singularity arises when trying to invert a non-injective function, such as the complex exponential ${z \mapsto \exp(z)}$ or a power function ${z \mapsto z^n}$, leading to branches of multivalued functions such as the complex logarithm ${z \mapsto \log(z)}$ or the ${n^{th}}$ root function ${z \mapsto z^{1/n}}$ respectively. Such branches will typically have a non-isolated singularity along a branch cut; this branch cut can be moved around the complex domain by switching from one branch to another, but usually cannot be eliminated entirely, unless one is willing to lift up the domain ${U}$ to a more general type of domain known as a Riemann surface. As such, one can view branch cuts as being an “artificial” form of singularity, being an artefact of a choice of local coordinates of a Riemann surface, rather than reflecting any intrinsic singularity of the function itself. The further study of Riemann surfaces is an important topic in complex analysis (as well as the related fields of complex geometry and algebraic geometry), but unfortunately this topic will probably be postponed to the next course in this sequence (which I will not be teaching).

Having discussed differentiation of complex mappings in the preceding notes, we now turn to the integration of complex maps. We first briefly review the situation of integration of (suitably regular) real functions ${f: [a,b] \rightarrow {\bf R}}$ of one variable. Actually there are three closely related concepts of integration that arise in this setting:

• (i) The signed definite integral ${\int_a^b f(x)\ dx}$, which is usually interpreted as the Riemann integral (or equivalently, the Darboux integral), which can be defined as the limit (if it exists) of the Riemann sums

$\displaystyle \sum_{j=1}^n f(x_j^*) (x_j - x_{j-1}) \ \ \ \ \ (1)$

where ${a = x_0 < x_1 < \dots < x_n = b}$ is some partition of ${[a,b]}$, ${x_j^*}$ is an element of the interval ${[x_{j-1},x_j]}$, and the limit is taken as the maximum mesh size ${\max_{1 \leq j \leq n} |x_j - x_{j-1}|}$ goes to zero. It is convenient to adopt the convention that ${\int_b^a f(x)\ dx := - \int_a^b f(x)\ dx}$ for ${a < b}$; alternatively one can interpret ${\int_b^a f(x)\ dx}$ as the limit of the Riemann sums (1), where now the (reversed) partition ${b = x_0 > x_1 > \dots > x_n = a}$ goes leftwards from ${b}$ to ${a}$, rather than rightwards from ${a}$ to ${b}$.

• (ii) The unsigned definite integral ${\int_{[a,b]} f(x)\ dx}$, usually interpreted as the Lebesgue integral. The precise definition of this integral is a little complicated (see e.g. this previous post), but roughly speaking the idea is to approximate ${f}$ by simple functions ${\sum_{i=1}^n c_i 1_{E_i}}$ for some coefficients ${c_i \in {\bf R}}$ and sets ${E_i \subset [a,b]}$, and then approximate the integral ${\int_{[a,b]} f(x)\ dx}$ by the quantities ${\sum_{i=1}^n c_i m(E_i)}$, where ${m(E_i)}$ is the Lebesgue measure of ${E_i}$. In contrast to the signed definite integral, no orientation is imposed or used on the underlying domain of integration, which is viewed as an “undirected” set ${[a,b]}$.
• (iii) The indefinite integral or antiderivative ${\int f(x)\ dx}$, defined as any function ${F: [a,b] \rightarrow {\bf R}}$ whose derivative ${F'}$ exists and is equal to ${f}$ on ${[a,b]}$. Famously, the antiderivative is only defined up to the addition of an arbitrary constant ${C}$, thus for instance ${\int x\ dx = \frac{1}{2} x^2 + C}$.
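The sign convention in (i) can be seen directly from the Riemann sums (1). The following minimal sketch (the choice of ${f(x) = x^2}$ on ${[0,1]}$, the midpoint sample points ${x_j^*}$, and the helper name `riemann_sum` are illustrative assumptions) evaluates the same sum over a partition running rightwards and over the reversed partition running leftwards.

```python
def riemann_sum(f, points):
    """Sum of f(x_j^*) (x_j - x_{j-1}), taking x_j^* to be the midpoint of each interval."""
    return sum(f((a + b) / 2) * (b - a) for a, b in zip(points, points[1:]))

def f(x):
    return x * x

n = 1000
forward = [j / n for j in range(n + 1)]  # 0 = x_0 < x_1 < ... < x_n = 1
backward = forward[::-1]                 # 1 = x_0 > x_1 > ... > x_n = 0

left_to_right = riemann_sum(f, forward)   # approximates the integral from 0 to 1
right_to_left = riemann_sum(f, backward)  # approximates the integral from 1 to 0

print(left_to_right, right_to_left)
```

Each increment ${x_j - x_{j-1}}$ changes sign under the reversal while the sample values do not, which is why the two sums are negatives of each other.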

There are some other variants of the above integrals (e.g. the Henstock-Kurzweil integral, discussed for instance in this previous post), which can handle slightly different classes of functions and have slightly different properties than the standard integrals listed here, but we will not need to discuss such alternative integrals in this course (with the exception of some improper and principal value integrals, which we will encounter in later notes).

The above three notions of integration are closely related to each other. For instance, if ${f: [a,b] \rightarrow {\bf R}}$ is a Riemann integrable function, then the signed definite integral and unsigned definite integral coincide (when the former is oriented correctly), thus

$\displaystyle \int_a^b f(x)\ dx = \int_{[a,b]} f(x)\ dx$

and

$\displaystyle \int_b^a f(x)\ dx = -\int_{[a,b]} f(x)\ dx.$

If ${f: [a,b] \rightarrow {\bf R}}$ is continuous, then by the fundamental theorem of calculus, it possesses an antiderivative ${F = \int f(x)\ dx}$, which is well defined up to an additive constant ${C}$, and

$\displaystyle \int_c^d f(x)\ dx = F(d) - F(c)$

for any ${c,d \in [a,b]}$, thus for instance ${\int_a^b f(x)\ dx = F(b) - F(a)}$ and ${\int_b^a f(x)\ dx = F(a) - F(b)}$.

All three of the above integration concepts have analogues in complex analysis. By far the most important notion will be the complex analogue of the signed definite integral, namely the contour integral ${\int_\gamma f(z)\ dz}$, in which the directed line segment from one real number ${a}$ to another ${b}$ is now replaced by a type of curve in the complex plane known as a contour. The contour integral can be viewed as the special case of the more general line integral ${\int_\gamma f(z) dx + g(z) dy}$, that is of particular relevance in complex analysis. There are also analogues of the Lebesgue integral, namely the arclength measure integrals ${\int_\gamma f(z)\ |dz|}$ and the area integrals ${\int_\Omega f(x+iy)\ dx dy}$, but these play only an auxiliary role in the subject. Finally, we still have the notion of an antiderivative ${F(z)}$ (also known as a primitive) of a complex function ${f(z)}$.

As it turns out, the fundamental theorem of calculus continues to hold in the complex plane: under suitable regularity assumptions on a complex function ${f}$ and a primitive ${F}$ of that function, one has

$\displaystyle \int_\gamma f(z)\ dz = F(z_1) - F(z_0)$

whenever ${\gamma}$ is a contour from ${z_0}$ to ${z_1}$ that lies in the domain of ${f}$. In particular, functions ${f}$ that possess a primitive must be conservative in the sense that ${\int_\gamma f(z)\ dz = 0}$ for any closed contour. This property of being conservative is not typical, in that “most” functions ${f}$ will not be conservative. However, there is a remarkable and far-reaching theorem, the Cauchy integral theorem (also known as the Cauchy-Goursat theorem), which asserts that any holomorphic function is conservative, so long as the domain is simply connected (or if one restricts attention to contractible closed contours). We will explore this theorem and several of its consequences in the next set of notes.
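These statements are easy to probe numerically. The sketch below is a rough illustration (the contours, the step count, and the helper name `contour_integral` are assumptions made for the example): it approximates contour integrals by Riemann sums, checks path independence for ${f(z) = z^2}$ with primitive ${F(z) = z^3/3}$, and shows that ${1/z}$ and ${\overline{z}}$ fail to be conservative on the unit circle.

```python
import cmath

def contour_integral(f, gamma, steps=20000):
    """Approximate the integral of f(z) dz along gamma: [0, 1] -> C by a midpoint Riemann sum."""
    total = 0j
    prev = gamma(0.0)
    for k in range(1, steps + 1):
        cur = gamma(k / steps)
        total += f((prev + cur) / 2) * (cur - prev)
        prev = cur
    return total

def segment(t):
    """Straight line from 0 to 1 + i."""
    return t * (1 + 1j)

def detour(t):
    """Piecewise linear path from 0 to 1 and then from 1 to 1 + i."""
    return 2 * t if t <= 0.5 else 1 + (2 * t - 1) * 1j

def circle(t):
    """Unit circle traversed once anticlockwise (a closed contour)."""
    return cmath.exp(2j * cmath.pi * t)

def square(z):
    return z * z

along_segment = contour_integral(square, segment)
along_detour = contour_integral(square, detour)
primitive_difference = (1 + 1j) ** 3 / 3 - 0  # F(z_1) - F(z_0) with F(z) = z^3 / 3

closed_square = contour_integral(square, circle)                      # close to 0
closed_inverse = contour_integral(lambda z: 1 / z, circle)            # close to 2 pi i
closed_conjugate = contour_integral(lambda z: z.conjugate(), circle)  # close to 2 pi i

print(along_segment, along_detour, primitive_difference)
print(closed_square, closed_inverse, closed_conjugate)
```

The function ${1/z}$ is holomorphic on the punctured plane, which is not simply connected, so its failure to be conservative there is consistent with the Cauchy integral theorem as stated.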

At the core of almost any undergraduate real analysis course are the concepts of differentiation and integration, with these two basic operations being tied together by the fundamental theorem of calculus (and its higher dimensional generalisations, such as Stokes’ theorem). Similarly, the notion of the complex derivative and the complex line integral (that is to say, the contour integral) lie at the core of any introductory complex analysis course. Once again, they are tied to each other by the fundamental theorem of calculus; but in the complex case there is a further variant of the fundamental theorem, namely Cauchy’s theorem, which endows complex differentiable functions with many important and surprising properties that are often not shared by their real differentiable counterparts. We will give complex differentiable functions another name to emphasise this extra structure, by referring to such functions as holomorphic functions. (This term is also useful to distinguish these functions from the slightly less well-behaved meromorphic functions, which we will discuss in later notes.)

In this set of notes we will focus solely on the concept of complex differentiation, deferring the discussion of contour integration to the next set of notes. To begin with, the theory of complex differentiation will greatly resemble the theory of real differentiation; the definitions look almost identical, and well known laws of differential calculus such as the product rule, quotient rule, and chain rule carry over verbatim to the complex setting, and the theory of complex power series is similarly almost identical to the theory of real power series. However, when one compares the “one-dimensional” differentiation theory of the complex numbers with the “two-dimensional” differentiation theory of two real variables, we find that the dimensional discrepancy forces complex differentiable functions to obey a real-variable constraint, namely the Cauchy-Riemann equations. These equations make complex differentiable functions substantially more “rigid” than their real-variable counterparts; they imply for instance that the imaginary part of a complex differentiable function is essentially determined (up to constants) by the real part, and vice versa. Furthermore, even when considered separately, the real and imaginary components of complex differentiable functions are forced to obey the strong constraint of being harmonic. In later notes we will see these constraints manifest themselves in integral form, particularly through Cauchy’s theorem and the closely related Cauchy integral formula.

Despite all the constraints that holomorphic functions have to obey, a surprisingly large number of the functions of a complex variable that one actually encounters in applications turn out to be holomorphic. For instance, any polynomial ${z \mapsto P(z)}$ with complex coefficients will be holomorphic, as will the complex exponential ${z \mapsto \exp(z)}$. From this and the laws of differential calculus one can then generate many further holomorphic functions. Also, as we will show presently, complex power series will automatically be holomorphic inside their disk of convergence. On the other hand, there are certainly basic complex functions of interest that are not holomorphic, such as the complex conjugation function ${z \mapsto \overline{z}}$, the absolute value function ${z \mapsto |z|}$, or the real and imaginary part functions ${z \mapsto \mathrm{Re}(z), z \mapsto \mathrm{Im}(z)}$. We will also encounter functions that are only holomorphic at some portions of the complex plane, but not on others; for instance, rational functions will be holomorphic except at those few points where the denominator vanishes, and are prime examples of the meromorphic functions mentioned previously. Later on we will also consider functions such as branches of the logarithm or square root, which will be holomorphic outside of a branch cut corresponding to the choice of branch. It is a basic but important skill in complex analysis to be able to quickly recognise which functions are holomorphic and which ones are not, as many of useful theorems available to the former (such as Cauchy’s theorem) break down spectacularly for the latter. Indeed, in my experience, one of the most common “rookie errors” that beginning complex analysis students make is the error of attempting to apply a theorem about holomorphic functions to a function that is not at all holomorphic. 
This stands in contrast to the situation in real analysis, in which one can often obtain correct conclusions by formally applying the laws of differential or integral calculus to functions that might not actually be differentiable or integrable in a classical sense. (This latter phenomenon, by the way, can be largely explained using the theory of distributions, as covered for instance in this previous post, but this is beyond the scope of the current course.)
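As a rough numerical illustration of the distinction above, one can compare difference quotients taken along the real and imaginary directions: for a complex differentiable function they must agree in the limit, which is the content of the Cauchy-Riemann equations. The following is only a heuristic sketch; the helper names, the step size `h` and the tolerance `tol` are hypothetical choices, and a finite-difference test at one point is of course not a proof of holomorphicity.

```python
import cmath

def directional_quotients(f, z, h=1e-6):
    """Difference quotients of f at z along the real and imaginary directions."""
    along_real = (f(z + h) - f(z)) / h
    along_imag = (f(z + 1j * h) - f(z)) / (1j * h)
    return along_real, along_imag

def looks_holomorphic_at(f, z, tol=1e-4):
    """Heuristic test: the two quotients must agree if f is complex differentiable at z."""
    a, b = directional_quotients(f, z)
    return abs(a - b) < tol

z0 = 0.3 + 0.7j
print(looks_holomorphic_at(cmath.exp, z0))                     # complex exponential
print(looks_holomorphic_at(lambda z: z**3 - 2 * z + 1, z0))    # a polynomial
print(looks_holomorphic_at(lambda z: z.conjugate(), z0))       # complex conjugation
print(looks_holomorphic_at(abs, z0))                           # absolute value
```

For the conjugation map the two quotients are ${1}$ and ${-1}$ respectively, so the test fails at every point, in line with the discussion above.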

Remark 1 In this set of notes it will be convenient to impose some unnecessarily generous regularity hypotheses (e.g. continuous second differentiability) on the holomorphic functions one is studying in order to make the proofs simpler. In later notes, we will discover that these hypotheses are in fact redundant, due to the phenomenon of elliptic regularity that ensures that holomorphic functions are automatically smooth.

Kronecker is famously reported to have said, “God created the natural numbers; all else is the work of man”. The truth of this statement (literal or otherwise) is debatable; but one can certainly view the other standard number systems ${{\bf Z}, {\bf Q}, {\bf R}, {\bf C}}$ as (iterated) completions of the natural numbers ${{\bf N}}$ in various senses. For instance:

• The integers ${{\bf Z}}$ are the additive completion of the natural numbers ${{\bf N}}$ (the minimal additive group that contains a copy of ${{\bf N}}$).
• The rationals ${{\bf Q}}$ are the multiplicative completion of the integers ${{\bf Z}}$ (the minimal field that contains a copy of ${{\bf Z}}$).
• The reals ${{\bf R}}$ are the metric completion of the rationals ${{\bf Q}}$ (the minimal complete metric space that contains a copy of ${{\bf Q}}$).
• The complex numbers ${{\bf C}}$ are the algebraic completion of the reals ${{\bf R}}$ (the minimal algebraically closed field that contains a copy of ${{\bf R}}$).

These descriptions of the standard number systems are elegant and conceptual, but not entirely suitable for constructing the number systems in a non-circular manner from more primitive foundations. For instance, one cannot quite define the reals ${{\bf R}}$ from scratch as the metric completion of the rationals ${{\bf Q}}$, because the definition of a metric space itself requires the notion of the reals! (One can of course construct ${{\bf R}}$ by other means, for instance by using Dedekind cuts or by using uniform spaces in place of metric spaces.) The definition of the complex numbers as the algebraic completion of the reals does not suffer from such a circularity issue, but a certain amount of field theory is required to work with this definition initially. For the purposes of quickly constructing the complex numbers, it is thus more traditional to first define ${{\bf C}}$ as a quadratic extension of the reals ${{\bf R}}$, and more precisely as the extension ${{\bf C} = {\bf R}(i)}$ formed by adjoining a square root ${i}$ of ${-1}$ to the reals, that is to say a solution to the equation ${i^2+1=0}$. It is not immediately obvious that this extension is in fact algebraically closed; this is the content of the famous fundamental theorem of algebra, which we will prove later in this course.
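To make the quadratic extension ${{\bf R}(i)}$ concrete, here is a minimal sketch that models an element ${a+bi}$ as a pair of real numbers, with multiplication determined by the rule ${i^2=-1}$. The class name `QuadExt` and the use of floating point numbers in place of genuine reals are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuadExt:
    """Element a + b*i of R(i), stored as the pair (a, b) of real numbers."""
    a: float
    b: float

    def __add__(self, other):
        return QuadExt(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b i)(c + d i) = (ac - bd) + (ad + bc) i, using i^2 = -1
        return QuadExt(self.a * other.a - self.b * other.b,
                       self.a * other.b + self.b * other.a)

ONE = QuadExt(1.0, 0.0)
I = QuadExt(0.0, 1.0)

print(I * I + ONE)                           # the zero element: i^2 + 1 = 0
print(QuadExt(1.0, 2.0) * QuadExt(3.0, 4.0)) # agrees with (1+2j)*(3+4j) in Python
```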

The two equivalent definitions of ${{\bf C}}$ – as the algebraic closure, and as a quadratic extension, of the reals respectively – each reveal important features of the complex numbers in applications. Because ${{\bf C}}$ is algebraically closed, all polynomials over the complex numbers split completely, which leads to a good spectral theory for both finite-dimensional matrices and infinite-dimensional operators; in particular, one expects to be able to diagonalise most matrices and operators. Applying this theory to constant coefficient ordinary differential equations leads to a unified theory of such solutions, in which real-variable ODE behaviour such as exponential growth or decay, polynomial growth, and sinusoidal oscillation all become aspects of a single object, the complex exponential ${z \mapsto e^z}$ (or more generally, the matrix exponential ${A \mapsto \exp(A)}$). Applying this theory more generally to diagonalise arbitrary translation-invariant operators over some locally compact abelian group, one arrives at Fourier analysis, which is thus most naturally phrased in terms of complex-valued functions rather than real-valued ones. If one drops the assumption that the underlying group is abelian, one instead discovers the representation theory of unitary representations, which is simpler to study than the real-valued counterpart of orthogonal representations. For closely related reasons, the theory of complex Lie groups is simpler than that of real Lie groups.

Meanwhile, the fact that the complex numbers are a quadratic extension of the reals lets one view the complex numbers geometrically as a two-dimensional plane over the reals (the Argand plane). Whereas a point singularity in the real line disconnects that line, a point singularity in the Argand plane leaves the rest of the plane connected (although, importantly, the punctured plane is no longer simply connected). As we shall see, this fact causes singularities in complex analytic functions to be better behaved than singularities of real analytic functions, ultimately leading to the powerful residue calculus for computing complex integrals. Remarkably, this calculus, when combined with the quintessentially complex-variable technique of contour shifting, can also be used to compute some (though certainly not all) definite integrals of real-valued functions that would be much more difficult to compute by purely real-variable methods; this is a prime example of Hadamard’s famous dictum that “the shortest path between two truths in the real domain passes through the complex domain”.

Another important geometric feature of the Argand plane is the angle between two tangent vectors to a point in the plane. As it turns out, the operation of multiplication by a complex scalar preserves the magnitude and orientation of such angles; the same fact is true for any non-degenerate complex analytic mapping, as can be seen by performing a Taylor expansion to first order. This fact ties the study of complex mappings closely to that of the conformal geometry of the plane (and more generally, of two-dimensional surfaces and domains). In particular, one can use complex analytic maps to conformally transform one two-dimensional domain to another, leading among other things to the famous Riemann mapping theorem, and to the classification of Riemann surfaces.

If one Taylor expands complex analytic maps to second order rather than first order, one discovers a further important property of these maps, namely that they are harmonic. This fact makes the class of complex analytic maps extremely rigid and well behaved analytically; indeed, the entire theory of elliptic PDE now comes into play, giving useful properties such as elliptic regularity and the maximum principle. In fact, due to the magic of residue calculus and contour shifting, we already obtain these properties for maps that are merely complex differentiable rather than complex analytic, which leads to the striking fact that complex differentiable functions are automatically analytic (in contrast to the real-variable case, in which real differentiable functions can be very far from being analytic).

The geometric structure of the complex numbers (and more generally of complex manifolds and complex varieties), when combined with the algebraic closure of the complex numbers, leads to the beautiful subject of complex algebraic geometry, which motivates the much more general theory developed in modern algebraic geometry. However, we will not develop the algebraic geometry aspects of complex analysis here.

Last, but not least, because of the good behaviour of Taylor series in the complex plane, complex analysis is an excellent setting in which to manipulate various generating functions, particularly Fourier series ${\sum_n a_n e^{2\pi i n \theta}}$ (which can be viewed as boundary values of power (or Laurent) series ${\sum_n a_n z^n}$), as well as Dirichlet series ${\sum_n \frac{a_n}{n^s}}$. The theory of contour integration provides a very useful dictionary between the asymptotic behaviour of the sequence ${a_n}$, and the complex analytic behaviour of the Dirichlet or Fourier series, particularly with regard to its poles and other singularities. This turns out to be a particularly handy dictionary in analytic number theory, for instance relating the distribution of the primes to the Riemann zeta function. Nowadays, many of the analytic number theory results first obtained through complex analysis (such as the prime number theorem) can also be obtained by more “real-variable” methods; however the complex-analytic viewpoint is still extremely valuable and illuminating.

We will frequently touch upon many of these connections to other fields of mathematics in these lecture notes. However, these are mostly side remarks intended to provide context, and it is certainly possible to skip most of these tangents and focus purely on the complex analysis material in these notes if desired.

Note: complex analysis is a very visual subject, and one should draw plenty of pictures while learning it. I am however not planning to put too many pictures in these notes, partly as it is somewhat inconvenient to do so on this blog from a technical perspective, but also because pictures that one draws on one’s own are likely to be far more useful to you than pictures that were supplied by someone else.

An extremely large portion of mathematics is concerned with locating solutions to equations such as

$\displaystyle f(x) = 0$

or

$\displaystyle \Phi(x) = x \ \ \ \ \ (1)$

for ${x}$ in some suitable domain space (either finite-dimensional or infinite-dimensional), and various maps ${f}$ or ${\Phi}$. To solve the fixed point equation (1), the simplest general method available is the fixed point iteration method: one starts with an initial approximate solution ${x_0}$ to (1), so that ${\Phi(x_0) \approx x_0}$, and then recursively constructs the sequence ${x_1, x_2, x_3, \dots}$ by ${x_n := \Phi(x_{n-1})}$. If ${\Phi}$ behaves enough like a “contraction”, and the domain is complete, then one can expect the ${x_n}$ to converge to a limit ${x}$, which should then be a solution to (1). For instance, if ${\Phi: X \rightarrow X}$ is a map from a metric space ${X = (X,d)}$ to itself, which is a contraction in the sense that

$\displaystyle d( \Phi(x), \Phi(y) ) \leq (1-\eta) d(x,y)$

for all ${x,y \in X}$ and some ${\eta>0}$, then with ${x_n}$ as above we have

$\displaystyle d( x_{n+1}, x_n ) \leq (1-\eta) d(x_n, x_{n-1} )$

for any ${n}$, and so the distances ${d(x_n, x_{n-1} )}$ between successive elements of the sequence decay at least geometrically. This leads to the contraction mapping theorem, which has many important consequences, such as the inverse function theorem and the Picard existence theorem.
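The fixed point iteration method is easy to experiment with numerically. The sketch below uses the hypothetical test map ${\Phi = \cos}$ on ${[0,1]}$, which maps that interval to itself and has derivative bounded by ${\sin(1) < 1}$ there, so it is a contraction with ${1-\eta = \sin(1)}$; the function name `fixed_point_iterate` and the stopping tolerance are illustrative choices only.

```python
import math

def fixed_point_iterate(phi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_n = phi(x_{n-1}) until successive iterates differ by less than tol."""
    x = x0
    gaps = []                      # the distances d(x_n, x_{n-1}) at each step
    for _ in range(max_iter):
        x_next = phi(x)
        gaps.append(abs(x_next - x))
        x = x_next
        if gaps[-1] < tol:
            break
    return x, gaps

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1 there.
x, gaps = fixed_point_iterate(math.cos, 1.0)
print(x, abs(math.cos(x) - x))     # approximate fixed point and its residual
print(gaps[:5])                    # successive gaps shrink geometrically
```

Each gap is at most ${\sin(1)}$ times the previous one, which is the geometric decay described above.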

A slightly more complicated instance of this strategy arises when trying to linearise a complex map ${f: U \rightarrow {\bf C}}$ defined in a neighbourhood ${U}$ of a fixed point. For simplicity we normalise the fixed point to be the origin, thus ${0 \in U}$ and ${f(0)=0}$. When studying the complex dynamics ${f^2 = f \circ f}$, ${f^3 = f \circ f \circ f}$, ${\dots}$ of such a map, it can be useful to try to conjugate ${f}$ to another function ${g = \psi^{-1} \circ f \circ \psi}$, where ${\psi}$ is a holomorphic function defined and invertible near ${0}$ with ${\psi(0)=0}$, since the dynamics of ${g}$ will be conjugate to that of ${f}$. Note that if ${f(0)=0}$ and ${f'(0)=\lambda}$, then from the chain rule any conjugate ${g}$ of ${f}$ will also have ${g(0)=0}$ and ${g'(0)=\lambda}$. Thus, the “simplest” function one can hope to conjugate ${f}$ to is the linear function ${z \mapsto \lambda z}$. Let us say that ${f}$ is linearisable (around ${0}$) if it is conjugate to ${z \mapsto \lambda z}$ in some neighbourhood of ${0}$. Equivalently, ${f}$ is linearisable if there is a solution to the Schröder equation

$\displaystyle f( \psi(z) ) = \psi(\lambda z) \ \ \ \ \ (2)$

for some ${\psi: U' \rightarrow {\bf C}}$ defined and invertible in a neighbourhood ${U'}$ of ${0}$ with ${\psi(0)=0}$, and all ${z}$ sufficiently close to ${0}$. (The Schröder equation is normalised somewhat differently in the literature, but this form is equivalent to the usual form, at least when ${\lambda}$ is non-zero.) Note that if ${\psi}$ solves the above equation, then so does ${z \mapsto \psi(cz)}$ for any non-zero ${c}$, so we may normalise ${\psi'(0)=1}$ in addition to ${\psi(0)=0}$, which also ensures local invertibility from the inverse function theorem. (Note from winding number considerations that ${\psi}$ cannot be invertible near zero if ${\psi'(0)}$ vanishes.)

We have the following basic result of Koenigs:

Theorem 1 (Koenigs’ linearisation theorem) Let ${f: U \rightarrow {\bf C}}$ be a holomorphic function defined near ${0}$ with ${f(0)=0}$ and ${f'(0)=\lambda}$. If ${0 < |\lambda| < 1}$ (attracting case) or ${1 < |\lambda| < \infty}$ (repelling case), then ${f}$ is linearisable near zero.

Proof: Observe that if ${f, \psi, \lambda}$ solve (2), then ${f^{-1}, \psi, \lambda^{-1}}$ solve (2) also (in a sufficiently small neighbourhood of zero), as can be seen by applying ${f^{-1}}$ to both sides of (2) and replacing ${z}$ by ${\lambda^{-1} z}$. Thus we may reduce to the attracting case ${0 < |\lambda| < 1}$.

Let ${r>0}$ be a sufficiently small radius, and let ${X}$ denote the space of holomorphic functions ${\psi: B(0,r) \rightarrow {\bf C}}$ on the complex disk ${B(0,r) := \{z \in {\bf C}: |z| < r \}}$ with ${\psi(0)=0}$ and ${\psi'(0)=1}$. We can view the Schröder equation (2) as a fixed point equation

$\displaystyle \psi = \Phi(\psi)$

where ${\Phi: X' \rightarrow X}$ is the partially defined function on ${X}$ that maps a function ${\psi: B(0,r) \rightarrow {\bf C}}$ to the function ${\Phi(\psi): B(0,r) \rightarrow {\bf C}}$ defined by

$\displaystyle \Phi(\psi)(z) := f^{-1}( \psi( \lambda z ) ),$

assuming that ${f^{-1}}$ is well-defined on the image ${\psi(B(0,|\lambda| r))}$ (this is why ${\Phi}$ is only partially defined).

We can solve this equation by the fixed point iteration method, if ${r}$ is small enough. Namely, we start with ${\psi_0: B(0,r) \rightarrow {\bf C}}$ being the identity map, and set ${\psi_1 := \Phi(\psi_0), \psi_2 := \Phi(\psi_1)}$, etc. We equip ${X}$ with the uniform metric ${d( \psi, \tilde \psi ) := \sup_{z \in B(0,r)} |\psi(z) - \tilde \psi(z)|}$. Observe that if ${d( \psi, \psi_0 ), d(\tilde \psi, \psi_0) \leq r}$, and ${r}$ is small enough, then ${\psi, \tilde \psi}$ take values in ${B(0,2r)}$, and ${\Phi(\psi), \Phi(\tilde \psi)}$ are well-defined and lie in ${X}$. Also, since ${f^{-1}}$ is smooth and has derivative ${\lambda^{-1}}$ at ${0}$, we have

$\displaystyle |f^{-1}(z) - f^{-1}(w)| \leq (1+\varepsilon) |\lambda|^{-1} |z-w|$

if ${z, w \in B(0,r)}$, ${\varepsilon>0}$ and ${r}$ is sufficiently small depending on ${\varepsilon}$. This is not yet enough to establish the required contraction (thanks to Mario Bonk for pointing this out); but observe that the function ${\frac{\psi(z)-\tilde \psi(z)}{z^2}}$ is holomorphic on ${B(0,r)}$ and bounded by ${d(\psi,\tilde \psi)/r^2}$ on the boundary of this ball (or slightly within this boundary), so by the maximum principle we see that

$\displaystyle |\frac{\psi(z)-\tilde \psi(z)}{z^2}| \leq \frac{1}{r^2} d(\psi,\tilde \psi)$

on all of ${B(0,r)}$, and in particular

$\displaystyle |\psi(z)-\tilde \psi(z)| \leq |\lambda|^2 d(\psi,\tilde \psi)$

on ${B(0,|\lambda| r)}$. Putting all this together, we see that

$\displaystyle d( \Phi(\psi), \Phi(\tilde \psi)) \leq (1+\varepsilon) |\lambda| d(\psi, \tilde \psi);$

since ${|\lambda|<1}$, we thus obtain a contraction on the ball ${\{ \psi \in X: d(\psi,\psi_0) \leq r \}}$ if ${\varepsilon}$ is small enough (and ${r}$ sufficiently small depending on ${\varepsilon}$). From this (and the completeness of ${X}$, which follows from Morera’s theorem) we see that the iteration ${\psi_n}$ converges (exponentially fast) to a limit ${\psi \in X}$ which is a fixed point of ${\Phi}$, and thus solves Schröder’s equation, as required. $\Box$
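For numerical experimentation in the attracting case, it is convenient to use the classical limit formula ${\phi(z) = \lim_{n \rightarrow \infty} f^n(z)/\lambda^n}$ rather than the contraction argument of the above proof; the function ${\phi}$ obeys ${\phi(f(z)) = \lambda \phi(z)}$, so that its local inverse ${\psi = \phi^{-1}}$ solves (2). The sketch below checks this identity for the hypothetical test map ${f(z) = \lambda z + z^2}$; the particular values of ${\lambda}$, ${z}$, and the number of iterations are arbitrary illustrative choices.

```python
def f(z, lam):
    """Hypothetical test map with f(0) = 0 and f'(0) = lam."""
    return lam * z + z * z

def koenigs_phi(z, lam, n=60):
    """Approximate phi(z) = lim f^n(z) / lam^n in the attracting case 0 < |lam| < 1."""
    for _ in range(n):
        z = f(z, lam)
    return z / lam**n

lam = 0.5 + 0.2j
z = 0.05 - 0.03j
lhs = koenigs_phi(f(z, lam), lam)     # phi(f(z))
rhs = lam * koenigs_phi(z, lam)       # lambda * phi(z)
print(abs(lhs - rhs))                 # discrepancy in phi(f(z)) = lambda phi(z)
print(koenigs_phi(z, lam))            # close to z, since phi'(0) = 1
```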

Koenigs’ linearisation theorem leaves open the indifferent case when ${|\lambda|=1}$. In the rationally indifferent case when ${\lambda^n=1}$ for some natural number ${n}$, there is an obvious obstruction to linearisability, namely that linearisability would force the iterate ${f^n}$ to be the identity map near ${0}$ (in particular, linearisation is not possible in this case when ${f}$ is a non-trivial rational function). An obstruction is also present in some irrationally indifferent cases (where ${|\lambda|=1}$ but ${\lambda^n \neq 1}$ for any natural number ${n}$), if ${\lambda}$ is sufficiently close to various roots of unity; the first result of this form is due to Cremer, and the optimal result of this type for quadratic maps was established by Yoccoz. In the other direction, we have the following result of Siegel:

Theorem 2 (Siegel’s linearisation theorem) Let ${f: U \rightarrow {\bf C}}$ be a holomorphic function defined near ${0}$ with ${f(0)=0}$ and ${f'(0)=\lambda}$. If ${|\lambda|=1}$ and one has the Diophantine condition ${\frac{1}{|\lambda^n-1|} \leq C n^C}$ for all natural numbers ${n}$ and some constant ${C>0}$, then ${f}$ is linearisable at ${0}$.

The Diophantine condition can be relaxed to a more general condition involving the rational exponents of the phase ${\theta}$ of ${\lambda = e^{2\pi i \theta}}$; this was worked out by Brjuno, with the condition matching the one later obtained by Yoccoz. Amusingly, while the set of Diophantine numbers (and hence the set of linearisable ${\lambda}$) has full measure on the unit circle, the set of non-linearisable ${\lambda}$ is generic (the complement of countably many nowhere dense sets) due to the above-mentioned work of Cremer, leading to a striking disparity between the measure-theoretic and category notions of “largeness”.
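To get a feel for the small divisors ${\lambda^n - 1}$ appearing in the Diophantine condition, one can tabulate ${n |\lambda^n - 1|}$ numerically. The sketch below does this for two hypothetical phases: the golden mean ${\theta = (\sqrt{5}-1)/2}$, which is badly approximable by rationals, and a phase within ${10^{-9}}$ of ${1/3}$. This only illustrates how proximity to a root of unity produces a small divisor at a specific ${n}$; it is not a test of linearisability, and floating point arithmetic limits how large ${n}$ can meaningfully be taken.

```python
import cmath
import math

def small_divisor_profile(theta, n_max):
    """Return the minimum over 1 <= n <= n_max of n * |lambda^n - 1|, lambda = e^{2 pi i theta}."""
    best = math.inf
    for n in range(1, n_max + 1):
        divisor = abs(cmath.exp(2j * math.pi * n * theta) - 1)
        best = min(best, n * divisor)
    return best

golden = (math.sqrt(5) - 1) / 2      # badly approximable phase
near_rational = 1 / 3 + 1e-9         # phase very close to a cube root of unity

print(small_divisor_profile(golden, 10_000))          # stays bounded away from zero
print(small_divisor_profile(near_rational, 10_000))   # very small, because of n = 3
```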

Siegel’s theorem does not seem to be provable using a fixed point iteration method. However, it can be established by modifying another basic method to solve equations, namely Newton’s method. Let us first review how this method works to solve the equation ${f(x)=0}$ for some smooth function ${f: I \rightarrow {\bf R}}$ defined on an interval ${I}$. We suppose we have some initial approximant ${x_0 \in I}$ to this equation, with ${f(x_0)}$ small but not necessarily zero. To make the analysis more quantitative, let us suppose that the interval ${[x_0-r_0,x_0+r_0]}$ lies in ${I}$ for some ${r_0>0}$, and we have the estimates

$\displaystyle |f(x_0)| \leq \delta_0 r_0$

$\displaystyle |f'(x)| \geq \eta_0$

$\displaystyle |f''(x)| \leq \frac{1}{\eta_0 r_0}$

for some ${\delta_0 > 0}$ and ${0 < \eta_0 < 1/2}$ and all ${x \in [x_0-r_0,x_0+r_0]}$ (the factors of ${r_0}$ are present to make ${\delta_0,\eta_0}$ “dimensionless”).

Lemma 3 Under the above hypotheses, we can find ${x_1}$ with ${|x_1 - x_0| \leq \eta_0 r_0}$ such that

$\displaystyle |f(x_1)| \ll \delta_0^2 \eta_0^{-O(1)} r_0.$

In particular, setting ${r_1 := (1-\eta_0) r_0}$, ${\eta_1 := \eta_0/2}$, and ${\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}$, we have ${[x_1-r_1,x_1+r_1] \subset [x_0-r_0,x_0+r_0] \subset I}$, and

$\displaystyle |f(x_1)| \leq \delta_1 r_1$

$\displaystyle |f'(x)| \geq \eta_1$

$\displaystyle |f''(x)| \leq \frac{1}{\eta_1 r_1}$

for all ${x \in [x_1-r_1,x_1+r_1]}$.

The crucial point here is that the new error ${\delta_1}$ is roughly the square of the previous error ${\delta_0}$. This leads to extremely fast (double-exponential) improvement in the error upon iteration, which is more than enough to absorb the exponential losses coming from the ${\eta_0^{-O(1)}}$ factor.

Proof: If ${\delta_0 > c \eta_0^{C}}$ for some absolute constants ${C,c>0}$ then we may simply take ${x_0=x_1}$, so we may assume that ${\delta_0 \leq c \eta_0^{C}}$ for some small ${c>0}$ and large ${C>0}$. Using the Newton approximation ${f(x_0+h) \approx f(x_0) + h f'(x_0)}$ we are led to the choice

$\displaystyle x_1 := x_0 - \frac{f(x_0)}{f'(x_0)}$

for ${x_1}$. From the hypotheses on ${f}$ and the smallness hypothesis on ${\delta_0}$ we certainly have ${|x_1-x_0| \leq \eta_0 r_0}$. From Taylor’s theorem with remainder we have

$\displaystyle f(x_1) = f(x_0) - \frac{f(x_0)}{f'(x_0)} f'(x_0) + O( \frac{1}{\eta_0 r_0} |\frac{f(x_0)}{f'(x_0)}|^2 )$

$\displaystyle = O( \frac{1}{\eta_0 r_0} (\frac{\delta_0 r_0}{\eta_0})^2 )$

and the claim follows. $\Box$

We can iterate this procedure; starting with ${x_0,\eta_0,r_0,\delta_0}$ as above, we obtain a sequence of nested intervals ${[x_n-r_n,x_n+r_n]}$ with ${|f(x_n)| \leq \delta_n r_n}$, and with ${\eta_n,r_n,\delta_n,x_n}$ evolving by the recursive equations and estimates

$\displaystyle \eta_n = \eta_{n-1} / 2$

$\displaystyle r_n = (1 - \eta_{n-1}) r_{n-1}$

$\displaystyle \delta_n = O( \delta_{n-1}^2 \eta_{n-1}^{-O(1)} )$

$\displaystyle |x_n - x_{n-1}| \leq \eta_{n-1} r_{n-1}.$

If ${\delta_0}$ is sufficiently small depending on ${\eta_0}$, we see that ${\delta_n}$ converges rapidly to zero (indeed, we can inductively obtain a bound of the form ${\delta_n \leq \eta_0^{C (2^n + n)}}$ for some large absolute constant ${C}$ if ${\delta_0}$ is small enough), and ${x_n}$ converges to a limit ${x \in I}$ which then solves the equation ${f(x)=0}$ by the continuity of ${f}$.
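The squaring of the error at each step is easy to observe in practice. Here is a minimal sketch of Newton's method on the hypothetical test problem ${f(x) = x^2 - 2}$ with ${x_0 = 1.5}$, recording the residuals ${|f(x_n)|}$; the function name `newton` and the number of steps are illustrative assumptions, and no attempt is made to track the parameters ${\eta_n, r_n}$ from the lemma.

```python
def newton(f, fprime, x0, steps=6):
    """Run Newton's iteration x_{n+1} = x_n - f(x_n)/f'(x_n), recording |f(x_n)|."""
    x = x0
    residuals = [abs(f(x))]
    for _ in range(steps):
        x = x - f(x) / fprime(x)
        residuals.append(abs(f(x)))
    return x, residuals

# Hypothetical test problem: f(x) = x^2 - 2 near x0 = 1.5, with root sqrt(2).
x, residuals = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.5)
for n, r in enumerate(residuals):
    print(n, r)
```

For this particular ${f}$ one has the exact identity ${f(x_{n+1}) = f(x_n)^2 / (4 x_n^2)}$, so each residual is roughly one eighth of the square of the previous one until machine precision is reached.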

As I recently learned from Zhiqiang Li, a similar scheme works to prove Siegel’s theorem, as can be found for instance in this text of Carleson and Gamelin. The key is the following analogue of Lemma 3.

Lemma 4 Let ${\lambda}$ be a complex number with ${|\lambda|=1}$ and ${\frac{1}{|\lambda^n-1|} \ll n^{O(1)}}$ for all natural numbers ${n}$. Let ${r_0>0}$, and let ${f_0: B(0,r_0) \rightarrow {\bf C}}$ be a holomorphic function with ${f_0(0)=0}$, ${f'_0(0)=\lambda}$, and

$\displaystyle |f_0(z) - \lambda z| \leq \delta_0 r_0 \ \ \ \ \ (3)$

for all ${z \in B(0,r_0)}$ and some ${\delta_0>0}$. Let ${0 < \eta_0 \leq 1/2}$, and set ${r_1 := (1-\eta_0) r_0}$. Then there exists an injective holomorphic function ${\psi_1: B(0, r_1) \rightarrow B(0, r_0)}$ and a holomorphic function ${f_1: B(0,r_1) \rightarrow {\bf C}}$ such that

$\displaystyle f_0( \psi_1(z) ) = \psi_1(f_1(z)) \ \ \ \ \ (4)$

for all ${z \in B(0,r_1)}$, and such that

$\displaystyle |\psi_1(z) - z| \ll \delta_0 \eta_0^{-O(1)} r_1$

and

$\displaystyle |f_1(z) - \lambda z| \leq \delta_1 r_1$

for all ${z \in B(0,r_1)}$ and some ${\delta_1 = O(\delta_0^2 \eta_0^{-O(1)})}$.

Proof: By scaling we may normalise ${r_0=1}$. If ${\delta_0 > c \eta_0^C}$ for some constants ${c,C>0}$, then we can simply take ${\psi_1}$ to be the identity and ${f_1=f_0}$, so we may assume that ${\delta_0 \leq c \eta_0^C}$ for some small ${c>0}$ and large ${C>0}$.

To motivate the choice of ${\psi_1}$, we write ${f_0(z) = \lambda z + \hat f_0(z)}$ and ${\psi_1(z) = z + \hat \psi_1(z)}$, with ${\hat f_0}$ and ${\hat \psi_1}$ viewed as small. We would like to have ${f_0(\psi_1(z)) \approx \psi_1(\lambda z)}$, which expands as

$\displaystyle \lambda z + \lambda \hat \psi_1(z) + \hat f_0( z + \hat \psi_1(z) ) \approx \lambda z + \hat \psi_1(\lambda z).$

As ${\hat f_0}$ and ${\hat \psi_1}$ are both small, we can heuristically approximate ${\hat f_0(z + \hat \psi_1(z) ) \approx \hat f_0(z)}$ up to quadratic errors (compare with the Newton approximation ${f(x_0+h) \approx f(x_0) + h f'(x_0)}$), and arrive at the equation

$\displaystyle \hat \psi_1(\lambda z) - \lambda \hat \psi_1(z) = \hat f_0(z). \ \ \ \ \ (5)$

This equation can be solved by Taylor series; the function ${\hat f_0}$ vanishes to second order at the origin and thus has a Taylor expansion

$\displaystyle \hat f_0(z) = \sum_{n=2}^\infty a_n z^n$

and then ${\hat \psi_1}$ has a Taylor expansion

$\displaystyle \hat \psi_1(z) = \sum_{n=2}^\infty \frac{a_n}{\lambda^n - \lambda} z^n.$

We take this as our definition of ${\hat \psi_1}$, define ${\psi_1(z) := z + \hat \psi_1(z)}$, and then define ${f_1}$ implicitly via (4).

Let us now justify that this choice works. By (3) and the generalised Cauchy integral formula, we have ${|a_n| \leq \delta_0}$ for all ${n}$; by the Diophantine assumption on ${\lambda}$, we thus have ${|\frac{a_n}{\lambda^n - \lambda}| \ll \delta_0 n^{O(1)}}$. In particular, ${\hat \psi_1}$ converges on ${B(0,1)}$, and on the disk ${B(0, (1-\eta_0/4))}$ (say) we have the bounds

$\displaystyle |\hat \psi_1(z)|, |\hat \psi'_1(z)| \ll \delta_0 \sum_{n=2}^\infty n^{O(1)} (1-\eta_0/4)^n \ll \eta_0^{-O(1)} \delta_0. \ \ \ \ \ (6)$

In particular, as ${\delta_0}$ is so small, we see that ${\psi_1}$ maps ${B(0, (1-\eta_0/4))}$ injectively to ${B(0,1)}$ and ${B(0,1-\eta_0)}$ to ${B(0,1-3\eta_0/4)}$, and the inverse ${\psi_1^{-1}}$ maps ${B(0, (1-\eta_0/2))}$ to ${B(0, (1-\eta_0/4))}$. From (3) we see that ${f_0}$ maps ${B(0,1-3\eta_0/4)}$ to ${B(0,1-\eta_0/2)}$, and so if we set ${f_1: B(0,1-\eta_0) \rightarrow B(0,1-\eta_0/4)}$ to be the function ${f_1 := \psi_1^{-1} \circ f_0 \circ \psi_1}$, then ${f_1}$ is a holomorphic function obeying (4). Expanding (4) in terms of ${\hat f_0}$ and ${\hat \psi_1}$ as before, and also writing ${f_1(z) = \lambda z + \hat f_1(z)}$, we have

$\displaystyle \lambda z + \lambda \hat \psi_1(z) + \hat f_0( z + \hat \psi_1(z) ) = \lambda z + \hat f_1(z) + \hat \psi_1(\lambda z + \hat f_1(z))$

for ${z \in B(0, 1-\eta_0)}$, which by (5) simplifies to

$\displaystyle \hat f_1(z) = \hat f_0( z + \hat \psi_1(z) ) - \hat f_0(z) + \hat \psi_1(\lambda z) - \hat \psi_1(\lambda z + \hat f_1(z)).$

From (6), the fundamental theorem of calculus, and the smallness of ${\delta_0}$ we have

$\displaystyle |\hat \psi_1(\lambda z) - \hat \psi_1(\lambda z + \hat f_1(z))| \leq \frac{1}{2} |\hat f_1(z)|$

and thus

$\displaystyle |\hat f_1(z)| \leq 2 |\hat f_0( z + \hat \psi_1(z) ) - \hat f_0(z)|.$

From (3) and the Cauchy integral formula we have ${\hat f'_0(z) = O( \delta_0 \eta_0^{-O(1)})}$ on (say) ${B(0,1-\eta_0/4)}$, and so from (6) and the fundamental theorem of calculus we conclude that

$\displaystyle |\hat f_1(z)| \ll \delta_0^2 \eta_0^{-O(1)}$

on ${B(0,1-\eta_0)}$, and the claim follows. $\Box$
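The linearised equation (5) at the heart of this proof can be checked numerically on truncated Taylor series. In the sketch below, the coefficients ${a_n}$ of ${\hat f_0}$ and the phase of ${\lambda}$ are hypothetical choices, and the helper names are illustrative; the coefficients of ${\hat \psi_1}$ are computed by the formula ${a_n / (\lambda^n - \lambda)}$ used in the proof, and the residual of (5) is evaluated at a sample point.

```python
import cmath
import math

def solve_linearised(a, lam):
    """Map coefficients a[n] of hat f_0 (n >= 2) to the coefficients
    a[n] / (lam^n - lam) of hat psi_1, both stored as dicts indexed by n."""
    return {n: a_n / (lam**n - lam) for n, a_n in a.items()}

def evaluate(coeffs, z):
    """Evaluate the polynomial sum_n coeffs[n] * z^n."""
    return sum(c * z**n for n, c in coeffs.items())

theta = (math.sqrt(5) - 1) / 2                  # hypothetical Diophantine phase
lam = cmath.exp(2j * math.pi * theta)
a = {2: 0.01, 3: -0.02j, 4: 0.005 + 0.003j}     # hypothetical small hat f_0
b = solve_linearised(a, lam)

z = 0.4 - 0.2j
# Residual of (5): hat psi_1(lam z) - lam hat psi_1(z) - hat f_0(z)
residual = evaluate(b, lam * z) - lam * evaluate(b, z) - evaluate(a, z)
print(abs(residual))
```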

If we set ${\eta_0 := 1/2}$, ${f_0 := f}$, and ${\delta_0>0}$ to be sufficiently small, then (since ${f(z)-\lambda z}$ vanishes to second order at the origin), the hypotheses of this lemma will be obeyed for some sufficiently small ${r_0}$. Iterating the lemma (and halving ${\eta_0}$ repeatedly), we can then find sequences ${\eta_n, \delta_n, r_n > 0}$, injective holomorphic functions ${\psi_n: B(0,r_n) \rightarrow B(0,r_{n-1})}$ and holomorphic functions ${f_n: B(0,r_n) \rightarrow {\bf C}}$ such that one has the recursive identities and estimates

$\displaystyle \eta_n = \eta_{n-1} / 2$

$\displaystyle r_n = (1 - \eta_{n-1}) r_{n-1}$

$\displaystyle \delta_n = O( \delta_{n-1}^2 \eta_{n-1}^{-O(1)} )$

$\displaystyle |\psi_n(z) - z| \ll \delta_{n-1} \eta_{n-1}^{-O(1)} r_n$

$\displaystyle |f_n(z) - \lambda z| \leq \delta_n r_n$

$\displaystyle f_{n-1}( \psi_n(z) ) = \psi_n(f_n(z))$

for all ${n \geq 1}$ and ${z \in B(0,r_n)}$. By construction, ${r_n}$ decreases to a positive radius ${r_\infty}$ that is a constant multiple of ${r_0}$, while (for ${\delta_0}$ small enough) ${\delta_n}$ converges double-exponentially to zero, so in particular ${f_n(z)}$ converges uniformly to ${\lambda z}$ on ${B(0,r_\infty)}$. Also, since ${\psi_n}$ is close enough to the identity, the compositions ${\Psi_n := \psi_1 \circ \dots \circ \psi_n}$ are uniformly convergent on ${B(0,r_\infty/2)}$ with ${\Psi_n(0)=0}$ and ${\Psi'_n(0)=1}$. From this we have

$\displaystyle f( \Psi_n(z) ) = \Psi_n(f_n(z))$

on ${B(0,r_\infty/4)}$, and on taking limits using Morera’s theorem we obtain a holomorphic function ${\Psi}$ defined near ${0}$ with ${\Psi(0)=0}$, ${\Psi'(0)=1}$, and

$\displaystyle f( \Psi(z) ) = \Psi(\lambda z),$

obtaining the required linearisation.

Remark 5 The idea of using a Newton-type method to obtain error terms that decay double-exponentially, and can therefore absorb exponential losses in the iteration, also occurs in KAM theory and in Nash-Moser iteration, presumably due to Siegel’s influence on Moser. (I discuss Nash-Moser iteration in this note that I wrote back in 2006.)

In Notes 2, the Riemann zeta function ${\zeta}$ (and more generally, the Dirichlet ${L}$-functions ${L(\cdot,\chi)}$) were extended meromorphically into the region ${\{ s: \hbox{Re}(s) > 0 \}}$ in and to the right of the critical strip. This is a sufficient amount of meromorphic continuation for many applications in analytic number theory, such as establishing the prime number theorem and its variants. The zeroes of the zeta function in the critical strip ${\{ s: 0 < \hbox{Re}(s) < 1 \}}$ are known as the non-trivial zeroes of ${\zeta}$, and thanks to the truncated explicit formulae developed in Notes 2, they control the asymptotic distribution of the primes (up to small errors).

The ${\zeta}$ function obeys the trivial functional equation

$\displaystyle \zeta(\overline{s}) = \overline{\zeta(s)} \ \ \ \ \ (1)$

for all ${s}$ in its domain of definition. Indeed, as ${\zeta(s)}$ is real-valued when ${s}$ is real, the function ${\zeta(s) - \overline{\zeta(\overline{s})}}$ vanishes on the real line and is also meromorphic, and hence vanishes everywhere. Similarly one has the functional equation

$\displaystyle \overline{L(s, \chi)} = L(\overline{s}, \overline{\chi}). \ \ \ \ \ (2)$

From these equations we see that the zeroes of the zeta function are symmetric across the real axis, and the zeroes of ${L(\cdot,\chi)}$ are the reflection of the zeroes of ${L(\cdot,\overline{\chi})}$ across this axis.

It is a remarkable fact that these functions obey an additional, and more non-trivial, functional equation, this time establishing a symmetry across the critical line ${\{ s: \hbox{Re}(s) = \frac{1}{2} \}}$ rather than the real axis. One consequence of this symmetry is that the zeta function and ${L}$-functions may be extended meromorphically to the entire complex plane. For the zeta function, the functional equation was discovered by Riemann, and reads as follows:

Theorem 1 (Functional equation for the Riemann zeta function) The Riemann zeta function ${\zeta}$ extends meromorphically to the entire complex plane, with a simple pole at ${s=1}$ and no other poles. Furthermore, one has the functional equation

$\displaystyle \zeta(s) = \alpha(s) \zeta(1-s) \ \ \ \ \ (3)$

or equivalently

$\displaystyle \zeta(1-s) = \alpha(1-s) \zeta(s) \ \ \ \ \ (4)$

for all complex ${s}$ other than ${s=0,1}$, where ${\alpha}$ is the function

$\displaystyle \alpha(s) := 2^s \pi^{s-1} \sin( \frac{\pi s}{2}) \Gamma(1-s). \ \ \ \ \ (5)$

Here ${\cos(z) := \frac{e^{iz} + e^{-iz}}{2}}$, ${\sin(z) := \frac{e^{iz}-e^{-iz}}{2i}}$ are the complex-analytic extensions of the classical trigonometric functions ${\cos(x), \sin(x)}$, and ${\Gamma}$ is the Gamma function, whose definition and properties we review below the fold.
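As a quick sanity check of (3) and (5) at a few real arguments, one can combine Euler's classical values ${\zeta(2) = \pi^2/6}$ and ${\zeta(4) = \pi^4/90}$ with the functional equation to recover the well-known values ${\zeta(-1) = -1/12}$ and ${\zeta(-3) = 1/120}$. The sketch below does this with the Python standard library only; the function name `alpha` mirrors (5) but is restricted (as an assumption of this sketch) to real ${s}$ away from the poles of ${\Gamma(1-s)}$.

```python
import math

def alpha(s):
    """alpha(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s), for real s with 1 - s not a pole of Gamma."""
    return 2.0**s * math.pi**(s - 1) * math.sin(math.pi * s / 2) * math.gamma(1 - s)

# Classical values of zeta at positive even integers (Euler).
zeta_2 = math.pi**2 / 6
zeta_4 = math.pi**4 / 90

zeta_minus_1 = alpha(-1) * zeta_2   # zeta(-1) = alpha(-1) zeta(2)
zeta_minus_3 = alpha(-3) * zeta_4   # zeta(-3) = alpha(-3) zeta(4)
print(zeta_minus_1, zeta_minus_3)
print(alpha(-2))                    # the factor sin(-pi) makes this vanish up to rounding
```

The vanishing of ${\alpha(-2)}$ reflects the zero of ${\sin(\pi s/2)}$ at even integers.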

The functional equation can be placed in a more symmetric form as follows:

Corollary 2 (Functional equation for the Riemann xi function) The Riemann xi function

$\displaystyle \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s) \ \ \ \ \ (6)$

is analytic on the entire complex plane ${{\bf C}}$ (after removing all removable singularities), and obeys the functional equations

$\displaystyle \xi(\overline{s}) = \overline{\xi(s)}$

and

$\displaystyle \xi(s) = \xi(1-s). \ \ \ \ \ (7)$

In particular, the zeroes of ${\xi}$ consist precisely of the non-trivial zeroes of ${\zeta}$, and are symmetric about both the real axis and the critical line. Also, ${\xi}$ is real-valued on the critical line and on the real axis.

Corollary 2 is an easy consequence of Theorem 1 together with the duplication theorem for the Gamma function, and the fact that ${\zeta}$ has no zeroes to the right of the critical strip, and is left as an exercise to the reader (Exercise 19). The functional equation in Theorem 1 has many proofs, but most of them are related in one way or another to the Poisson summation formula

$\displaystyle \sum_n f(n) = \sum_m \hat f(2\pi m) \ \ \ \ \ (8)$

(Theorem 34 from Supplement 2, at least in the case when ${f}$ is twice continuously differentiable and compactly supported), which can be viewed as a Fourier-analytic link between the coarse-scale distribution of the integers and the fine-scale distribution of the integers. Indeed, there is a quick heuristic proof of the functional equation that comes from formally applying the Poisson summation formula to the function ${1_{x>0} \frac{1}{x^s}}$, and noting that the functions ${x \mapsto \frac{1}{x^s}}$ and ${\xi \mapsto \frac{1}{\xi^{1-s}}}$ are formally Fourier transforms of each other, up to some Gamma function factors, as well as some trigonometric factors arising from the distinction between the real line and the half-line. Such a heuristic proof can indeed be made rigorous, and we do so below the fold, while also providing Riemann’s two classical proofs of the functional equation.
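To see the coarse-scale versus fine-scale duality in (8) concretely, one can test the formula on a Gaussian ${f(x) = e^{-ax^2}}$, whose Fourier transform (in the normalisation ${\hat f(\xi) = \int_{\bf R} f(x) e^{-i\xi x}\ dx}$ implicit in (8)) is ${\hat f(\xi) = \sqrt{\pi/a}\, e^{-\xi^2/4a}}$. This is a sketch under assumptions: the Gaussian is not compactly supported, so it lies outside the case quoted above, although the formula is well known to hold for such rapidly decreasing functions; the names `integer_side` and `frequency_side` are illustrative.

```python
import math

def integer_side(a, terms=50):
    """Left-hand side of (8) for f(x) = exp(-a x^2), truncated to |n| <= terms."""
    return sum(math.exp(-a * n * n) for n in range(-terms, terms + 1))

def frequency_side(a, terms=50):
    """Right-hand side of (8), using hat f(xi) = sqrt(pi/a) exp(-xi^2 / (4a))."""
    return math.sqrt(math.pi / a) * sum(
        math.exp(-(2 * math.pi * m) ** 2 / (4 * a)) for m in range(-terms, terms + 1)
    )

for a in (0.1, 1.0, 10.0):
    print(a, integer_side(a), frequency_side(a))
```

For small ${a}$ the Gaussian is wide, so many integers ${n}$ contribute on the left while essentially only ${m=0}$ contributes on the right; for large ${a}$ the roles are reversed.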

From the functional equation (and the poles of the Gamma function), one can see that ${\zeta}$ has trivial zeroes at the negative even integers ${-2,-4,-6,\dots}$, in addition to the non-trivial zeroes in the critical strip. More generally, the following table summarises the zeroes and poles of the various special functions appearing in the functional equation, after they have been meromorphically extended to the entire complex plane, and with zeroes classified as “non-trivial” or “trivial” depending on whether they lie in the critical strip or not. (Exponential functions such as ${2^{s-1}}$ or ${\pi^{-s}}$ have no zeroes or poles, and will be ignored in this table; the zeroes and poles of rational functions such as ${s(s-1)}$ are self-evident and will also not be displayed here.)

| Function | Non-trivial zeroes | Trivial zeroes | Poles |
|---|---|---|---|
| ${\zeta(s)}$ | Yes | ${-2,-4,-6,\dots}$ | ${1}$ |
| ${\zeta(1-s)}$ | Yes | ${3,5,\dots}$ | ${0}$ |
| ${\sin(\pi s/2)}$ | No | Even integers | No |
| ${\cos(\pi s/2)}$ | No | Odd integers | No |
| ${\sin(\pi s)}$ | No | Integers | No |
| ${\Gamma(s)}$ | No | No | ${0,-1,-2,\dots}$ |
| ${\Gamma(s/2)}$ | No | No | ${0,-2,-4,\dots}$ |
| ${\Gamma(1-s)}$ | No | No | ${1,2,3,\dots}$ |
| ${\Gamma((1-s)/2)}$ | No | No | ${1,3,5,\dots}$ |
| ${\xi(s)}$ | Yes | No | No |

Among other things, this table indicates that the Gamma and trigonometric factors in the functional equation are tied to the trivial zeroes and poles of zeta, but have no direct bearing on the distribution of the non-trivial zeroes, which is the most important feature of the zeta function for the purposes of analytic number theory, beyond the fact that they are symmetric about the real axis and critical line. In particular, the Riemann hypothesis is not going to be resolved just from further analysis of the Gamma function!

The zeta function computes the “global” sum ${\sum_n \frac{1}{n^s}}$, with ${n}$ ranging all the way from ${1}$ to infinity. However, by some Fourier-analytic (or complex-analytic) manipulation, it is possible to use the zeta function to also control more “localised” sums, such as ${\sum_n \frac{1}{n^s} \psi(\log n - \log N)}$ for some ${N \gg 1}$ and some smooth compactly supported function ${\psi: {\bf R} \rightarrow {\bf C}}$. It turns out that the functional equation (3) for the zeta function localises to this context, giving an approximate functional equation which roughly speaking takes the form

$\displaystyle \sum_n \frac{1}{n^s} \psi( \log n - \log N ) \approx \alpha(s) \sum_m \frac{1}{m^{1-s}} \psi( \log M - \log m )$

whenever ${s=\sigma+it}$ and ${NM = \frac{|t|}{2\pi}}$; see Theorem 38 below for a precise formulation of this equation. Unsurprisingly, this form of the functional equation is also very closely related to the Poisson summation formula (8), indeed it is essentially a special case of that formula (or more precisely, of the van der Corput ${B}$-process). This useful identity relates long smoothed sums of ${\frac{1}{n^s}}$ to short smoothed sums of ${\frac{1}{m^{1-s}}}$ (or vice versa), and can thus be used to shorten exponential sums involving terms such as ${\frac{1}{n^s}}$, which is useful when obtaining some of the more advanced estimates on the Riemann zeta function.

We will give two other basic uses of the functional equation. The first is to get a good count (as opposed to merely an upper bound) on the density of zeroes in the critical strip, establishing the Riemann-von Mangoldt formula that the number ${N(T)}$ of zeroes of imaginary part between ${0}$ and ${T}$ is ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}$ for large ${T}$. The other is to obtain untruncated versions of the explicit formula from Notes 2, giving a remarkable exact formula for sums involving the von Mangoldt function in terms of zeroes of the Riemann zeta function. These results are not strictly necessary for most of the material in the rest of the course, but certainly help to clarify the nature of the Riemann zeta function and its relation to the primes.
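The main term in the Riemann-von Mangoldt formula is already visible at very small heights. The following sketch compares it with a direct count, using the ordinates of the first ten non-trivial zeroes in the upper half-plane; these ordinates are standard tabulated values quoted here to six decimal places (they are not derived in these notes), and since the list stops at the tenth zero the count is only meaningful for ${T \leq 50}$. The names `ZERO_ORDINATES`, `main_term` and `count_zeroes` are illustrative.

```python
import math

# Imaginary parts of the first ten zeroes of zeta in the upper half-plane
# (standard tabulated values, rounded to six decimals).
ZERO_ORDINATES = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
                  37.586178, 40.918719, 43.327073, 48.005151, 49.773832]

def main_term(T):
    """(T / 2 pi) log(T / 2 pi) - T / 2 pi."""
    x = T / (2 * math.pi)
    return x * math.log(x) - x

def count_zeroes(T):
    """N(T) for T <= 50, counted from the table above."""
    return sum(1 for gamma in ZERO_ORDINATES if 0 < gamma <= T)

for T in (20, 30, 40, 50):
    print(T, count_zeroes(T), round(main_term(T), 3))
```

In this range the discrepancy between the count and the main term stays well below ${\log T}$, consistent with the ${O(\log T)}$ error term.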

In view of the material in previous notes, it should not be surprising that there are analogues of all of the above theory for Dirichlet ${L}$-functions ${L(\cdot,\chi)}$. We will restrict attention to primitive characters ${\chi}$, since the ${L}$-function for imprimitive characters merely differs from the ${L}$-function of the associated primitive factor by a finite Euler product; indeed, if ${\chi = \chi' \chi_0}$ for some principal ${\chi_0}$ whose modulus ${q_0}$ is coprime to that of ${\chi'}$, then

$\displaystyle L(s,\chi) = L(s,\chi') \prod_{p|q_0} (1 - \frac{\chi'(p)}{p^s}) \ \ \ \ \ (9)$

(cf. equation (45) of Notes 2).
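The relation (9) is easy to test numerically at a point of absolute convergence, reading each local factor as ${1 - \frac{\chi'(p)}{p^s}}$, which is what comparing the Euler products of ${L(s,\chi)}$ and ${L(s,\chi')}$ gives. In this sketch the choices are purely illustrative: ${\chi'}$ is the non-principal character of modulus ${3}$, ${\chi_0}$ is the principal character of modulus ${q_0 = 10}$, ${s=2}$, and the ${L}$-functions are approximated by partial sums up to a hypothetical cutoff `N`.

```python
import math

def chi_prime(n):
    """The non-principal (primitive) Dirichlet character of modulus 3."""
    return (0, 1, -1)[n % 3]

def chi_0(n, q0=10):
    """The principal character of modulus q0."""
    return 1 if math.gcd(n, q0) == 1 else 0

def chi(n):
    """The imprimitive character chi = chi' chi_0 of modulus 30."""
    return chi_prime(n) * chi_0(n)

def L_partial(character, s, N):
    """Partial sum of the Dirichlet series sum_n character(n) / n^s."""
    return sum(character(n) / n ** s for n in range(1, N + 1))

s, N = 2.0, 300_000
lhs = L_partial(chi, s, N)
rhs = L_partial(chi_prime, s, N)
for p in (2, 5):                      # the primes dividing q0 = 10
    rhs *= 1 - chi_prime(p) / p ** s
print(lhs, rhs)
```

Here ${\chi'(2) = \chi'(5) = -1}$, so both local factors exceed ${1}$; dropping ${\chi'(p)}$ from the factors would visibly break the agreement.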

The main new feature is that the Poisson summation formula needs to be “twisted” by a Dirichlet character ${\chi}$, and this boils down to the problem of understanding the finite (additive) Fourier transform of a Dirichlet character. This is achieved by the classical theory of Gauss sums, which we review below the fold. There is one new wrinkle; the value of ${\chi(-1) \in \{-1,+1\}}$ plays a role in the functional equation. More precisely, we have

Theorem 3 (Functional equation for ${L}$-functions) Let ${\chi}$ be a primitive character of modulus ${q}$ with ${q>1}$. Then ${L(s,\chi)}$ extends to an entire function on the complex plane, with

$\displaystyle L(s,\chi) = \varepsilon(\chi) 2^s \pi^{s-1} q^{1/2-s} \sin(\frac{\pi}{2}(s+\kappa)) \Gamma(1-s) L(1-s,\overline{\chi})$

or equivalently

$\displaystyle L(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) 2^{1-s} \pi^{-s} q^{s-1/2} \sin(\frac{\pi}{2}(1-s+\kappa)) \Gamma(s) L(s,\chi)$

for all ${s}$, where ${\kappa}$ is equal to ${0}$ in the even case ${\chi(-1)=+1}$ and ${1}$ in the odd case ${\chi(-1)=-1}$, and

$\displaystyle \varepsilon(\chi) := \frac{\tau(\chi)}{i^\kappa \sqrt{q}} \ \ \ \ \ (10)$

where ${\tau(\chi)}$ is the Gauss sum

$\displaystyle \tau(\chi) := \sum_{n \in {\bf Z}/q{\bf Z}} \chi(n) e(n/q). \ \ \ \ \ (11)$

and ${e(x) := e^{2\pi ix}}$, with the convention that the ${q}$-periodic function ${n \mapsto e(n/q)}$ is also (by abuse of notation) applied to ${n}$ in the cyclic group ${{\bf Z}/q{\bf Z}}$.
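The quantities (10) and (11) are finite sums and can be computed directly. The sketch below does so for three illustrative primitive characters (the quadratic characters of modulus ${5}$ and ${7}$, and a complex character of modulus ${5}$); the helper names are made up for this example. It relies on two classical facts from the theory of Gauss sums that are not proved in this excerpt: ${|\tau(\chi)| = \sqrt{q}}$ for primitive ${\chi}$, so that ${|\varepsilon(\chi)|=1}$, and Gauss's evaluation of the quadratic Gauss sum, which gives ${\varepsilon(\chi) = 1}$ for the quadratic character of prime modulus.

```python
import cmath
import math

def e(x):
    """e(x) := exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)

def gauss_sum(chi, q):
    """tau(chi) from (11): sum of chi(n) e(n/q) over residues n mod q."""
    return sum(chi(n) * e(n / q) for n in range(q))

def epsilon(chi, q):
    """epsilon(chi) from (10), with kappa read off from chi(-1)."""
    kappa = 0 if chi(-1) == 1 else 1
    return gauss_sum(chi, q) / (1j ** kappa * math.sqrt(q))

def legendre(p):
    """The quadratic character modulo an odd prime p."""
    def chi(n):
        n %= p
        if n == 0:
            return 0
        return 1 if pow(n, (p - 1) // 2, p) == 1 else -1
    return chi

def quartic_mod_5(n):
    """A complex primitive character of modulus 5, determined by chi(2) = i."""
    return {0: 0, 1: 1, 2: 1j, 4: -1, 3: -1j}[n % 5]

for name, chi, q in (("quadratic mod 5", legendre(5), 5),
                     ("quadratic mod 7", legendre(7), 7),
                     ("quartic mod 5", quartic_mod_5, 5)):
    print(name, abs(gauss_sum(chi, q)), epsilon(chi, q))
```

The quadratic character of modulus ${5}$ is even and that of modulus ${7}$ is odd, so the two cases ${\kappa=0}$ and ${\kappa=1}$ of (10) are both exercised.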

From this functional equation and (2) we see that, as with the Riemann zeta function, the non-trivial zeroes of ${L(s,\chi)}$ (defined as the zeroes within the critical strip ${\{ s: 0 < \hbox{Re}(s) < 1 \}}$) are symmetric around the critical line (and, if ${\chi}$ is real, are also symmetric around the real axis). In addition, ${L(s,\chi)}$ acquires trivial zeroes at the negative even integers and at zero if ${\chi(-1)=1}$, and at the negative odd integers if ${\chi(-1)=-1}$. For imprimitive ${\chi}$, we see from (9) that ${L(s,\chi)}$ also acquires some additional trivial zeroes on the left edge of the critical strip.

There is also a symmetric version of this equation, analogous to Corollary 2:

Corollary 4 Let ${\chi,q,\varepsilon(\chi)}$ be as above, and set

$\displaystyle \xi(s,\chi) := (q/\pi)^{(s+\kappa)/2} \Gamma((s+\kappa)/2) L(s,\chi),$

then ${\xi(\cdot,\chi)}$ is entire with ${\xi(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) \xi(s,\chi)}$.

For further detail on the functional equation and its implications, I recommend the classic text of Titchmarsh or the text of Davenport.

In Notes 1, we approached multiplicative number theory (the study of multiplicative functions ${f: {\bf N} \rightarrow {\bf C}}$ and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions ${\sum_{n \leq x} f(n)}$ and logarithmic sums ${\sum_{n \leq x} \frac{f(n)}{n}}$. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series ${{\mathcal D} f}$, defined (at least for ${s}$ of sufficiently large real part) by the formula

$\displaystyle {\mathcal D} f(s) := \sum_n \frac{f(n)}{n^s}.$

These series also made an appearance in the elementary approach to the subject, but only for real ${s}$ that were larger than ${1}$. But now we will exploit the freedom to extend the variable ${s}$ to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as ${\sum_{n\leq x} f(n)}$ or ${\sum_{n\leq x} \frac{f(n)}{n}}$ from control on the Dirichlet series. Crucially, for many key functions ${f}$ of number-theoretic interest, the Dirichlet series ${{\mathcal D} f}$ can be analytically (or at least meromorphically) continued to the left of the line ${\{ s: \hbox{Re}(s) = 1 \}}$. The zeroes and poles of the resulting meromorphic continuations of ${{\mathcal D} f}$ (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of ${f}$; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function ${\zeta}$ are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function ${\Lambda}$ (and hence to the primes) of the form

$\displaystyle \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (1)$

where the sum is over zeroes ${\rho}$ (counting multiplicity) of the Riemann zeta function ${\zeta = {\mathcal D} 1}$ (with the sum often restricted so that ${\rho}$ has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that

$\displaystyle \sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\ dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\ dy \ \ \ \ \ (2)$

for suitable “test functions” ${g}$ (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (3)$

as ${x \rightarrow \infty}$, with the size of the error term ${o(x)}$ closely tied to the location of the zeroes ${\rho}$ of the Riemann zeta function.

The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation

$\displaystyle -\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \ \ \ \ \ (4)$

for the Dirichlet series ${{\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}}$ of the von Mangoldt function; note that (4) is formally the special case of (2) when ${g(n) = n^{-s}}$. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.
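The identity ${{\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}}$ can be checked numerically at a point of absolute convergence. The sketch below works at ${s=4}$ with all three Dirichlet series truncated at a hypothetical cutoff `N`, and uses the term-by-term derivative ${\zeta'(s) = -\sum_n \frac{\log n}{n^s}}$ (valid for ${\hbox{Re}(s) > 1}$); the function names are illustrative.

```python
import math

def von_mangoldt(n):
    """Lambda(n): log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            # n was a power of p exactly when nothing is left over.
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor up to sqrt(n), so n itself is prime

def dirichlet_partial(f, s, N):
    """Partial sum of the Dirichlet series D f(s) = sum_n f(n) / n^s."""
    return sum(f(n) / n ** s for n in range(1, N + 1))

s, N = 4.0, 2000
D_lambda = dirichlet_partial(von_mangoldt, s, N)
zeta_s = dirichlet_partial(lambda n: 1.0, s, N)
zeta_prime_s = -dirichlet_partial(math.log, s, N)
print(D_lambda, -zeta_prime_s / zeta_s)
```

At ${s=4}$ the neglected tails are of size roughly ${\frac{\log N}{N^3}}$, so the two printed numbers should agree to many decimal places.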

More generally, one has an explicit formula

$\displaystyle \Lambda(n) \chi(n) \approx - \sum_\rho n^{\rho-1} \ \ \ \ \ (5)$

for any (non-principal) Dirichlet character ${\chi}$, where ${\rho}$ now ranges over the zeroes of the associated Dirichlet ${L}$-function ${L(s,\chi) := {\mathcal D} \chi(s)}$; we view this formula as a “twist” of (1) by the Dirichlet character ${\chi}$. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that

$\displaystyle \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \ \ \ \ \ (6)$

as ${x \rightarrow \infty}$, whenever ${a\ (q)}$ is a fixed primitive residue class. Again, the size of the error term ${o(x)}$ here is closely tied to the location of the zeroes of the Dirichlet ${L}$-function, with particular importance given to whether there is a zero very close to ${s=1}$ (such a zero is known as an exceptional zero or Siegel zero).
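The asymptotics (3) and (6) can be observed directly by tabulating the von Mangoldt function with a sieve. In this sketch the cutoff ${x = 10^5}$ and the modulus ${q=4}$ are arbitrary illustrative choices, the function names are made up, and of course a finite computation says nothing about the ${o(x)}$ error terms.

```python
import math

def von_mangoldt_table(x):
    """Return lam with lam[n] = Lambda(n) for 0 <= n <= x, via a prime sieve."""
    lam = [0.0] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(2 * p, x + 1, p):
                is_prime[m] = False
            pk = p
            while pk <= x:          # Lambda(p^k) = log p for every prime power
                lam[pk] = math.log(p)
                pk *= p
    return lam

def psi(lam, q=1, a=0):
    """Sum of Lambda(n) over n <= x with n = a (q); q = 1 gives the sum in (3)."""
    return sum(lam[n] for n in range(a % q, len(lam), q))

def phi(q):
    """Euler's totient function."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

x, q = 100_000, 4
lam = von_mangoldt_table(x)
print("psi(x) / x =", psi(lam) / x)
for a in range(q):
    print(a, psi(lam, q, a), x / phi(q))
```

The primitive classes ${1\ (4)}$ and ${3\ (4)}$ each carry about ${\frac{x}{\phi(4)} = \frac{x}{2}}$, whereas the non-primitive class ${2\ (4)}$ only sees the single prime ${2}$, which is why (6) is restricted to primitive residue classes.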

While any information on the behaviour of zeta functions or ${L}$-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are

1. The region on or near the point ${s=1}$.
2. The region on or near the right edge ${\{ 1+it: t \in {\bf R} \}}$ of the critical strip ${\{ s: 0 \leq \hbox{Re}(s) \leq 1 \}}$.
3. The right half ${\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}}$ of the critical strip.
4. The region on or near the critical line ${\{ \frac{1}{2} + it: t \in {\bf R} \}}$ that bisects the critical strip.
5. Everywhere else.

For instance:

1. We will shortly show that the Riemann zeta function ${\zeta}$ has a simple pole at ${s=1}$ with residue ${1}$, which is already sufficient to recover many of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function ${\tau}$. For Dirichlet ${L}$-functions, the behaviour is instead controlled by the quantity ${L(1,\chi)}$ discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
2. The zeta function is also known to have no zeroes on the right edge ${\{1+it: t \in {\bf R}\}}$ of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for ${\zeta}$ into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for ${L}$-functions and the prime number theorem in arithmetic progressions.
3. The (as yet unproven) Riemann hypothesis prohibits ${\zeta}$ from having any zeroes within the right half ${\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}}$ of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for ${L}$-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
4. Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for ${L}$-functions and primes in short arithmetic progressions.
5. The functional equation of the zeta function describes the behaviour of ${\zeta}$ to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for ${L}$-functions.

Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function ${\zeta(\sigma+it) = \sum_n n^{-\sigma-it}}$ and all of the ${L}$-functions ${L(\sigma+it,\chi) = \sum_n \chi(n) n^{-\sigma-it}}$ for various Dirichlet characters ${\chi}$ into a single object, viewing ${n \mapsto \chi(n) n^{-it}}$ as a general multiplicative character on the adeles; thus the imaginary coordinate ${t}$ and the Dirichlet character ${\chi}$ are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet ${L}$-functions. (The non-Archimedean character ${\chi(n)}$ and the Archimedean character ${n^{it}}$ behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)

Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.

As a consequence of this hierarchy of importance, information about the ${\zeta}$ function away from the critical strip, such as Euler’s identity

$\displaystyle \zeta(2) = \frac{\pi^2}{6}$

or equivalently

$\displaystyle 1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}$

or the infamous identity

$\displaystyle \zeta(-1) = -\frac{1}{12},$

which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as

$\displaystyle 1 + 2 + 3 + \dots = -\frac{1}{12},$

is of relatively little direct importance in analytic prime number theory, although such identities are still of interest for some other, non-number-theoretic, applications. (The quantity ${\zeta(2)}$ does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value ${L(1,\chi)}$ of an ${L}$-function at ${s=1}$ turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower-bound on this quantity coming from Siegel’s theorem, discussed below the fold.

For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory”.