— 1. Jensen’s formula —

Suppose ${f}$ is a non-zero rational function ${f =P/Q}$. Then by the fundamental theorem of algebra one can write

$\displaystyle f(z) = c \frac{\prod_\rho (z-\rho)}{\prod_\zeta (z-\zeta)}$

for some non-zero constant ${c}$, where ${\rho}$ ranges over the zeroes of ${P}$ (counting multiplicity) and ${\zeta}$ ranges over the zeroes of ${Q}$ (counting multiplicity), and assuming ${z}$ avoids the zeroes of ${Q}$. Taking absolute values and then logarithms, we arrive at the formula

$\displaystyle \log |f(z)| = \log |c| + \sum_\rho \log|z-\rho| - \sum_\zeta \log |z-\zeta|, \ \ \ \ \ (1)$

as long as ${z}$ avoids the zeroes of both ${P}$ and ${Q}$. (In this set of notes we use ${\log}$ for the natural logarithm when applied to a positive real number, and ${\mathrm{Log}}$ for the standard branch of the complex logarithm (which extends ${\log}$); the multi-valued complex logarithm ${\log}$ will only be used in passing.) Alternatively, taking logarithmic derivatives, we arrive at the closely related formula

$\displaystyle \frac{f'(z)}{f(z)} = \sum_\rho \frac{1}{z-\rho} - \sum_\zeta \frac{1}{z-\zeta}, \ \ \ \ \ (2)$

again for ${z}$ avoiding the zeroes of both ${P}$ and ${Q}$. Thus we see that the zeroes and poles of a rational function ${f}$ describe the behaviour of that rational function, as well as close relatives of that function such as the log-magnitude ${\log|f|}$ and log-derivative ${\frac{f'}{f}}$. We have already seen these sorts of formulae arise in our treatment of the argument principle in 246A Notes 4.
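As a quick numerical sanity check of (1) and (2), the following Python sketch evaluates both sides for one arbitrarily chosen rational function. The zeroes, poles, constant ${c}$ and test point are illustrative assumptions rather than anything taken from these notes, and the derivative is only approximated by a central difference.

```python
import math

# Hypothetical data: zeroes of P, zeroes of Q, and the constant c.
ZEROS = [1 + 1j, -2 + 0j, 0.5j]
POLES = [3 + 0j, -1 - 1j]
C = 2.0


def f(z):
    """Evaluate f(z) = c * prod(z - rho) / prod(z - zeta)."""
    value = complex(C)
    for rho in ZEROS:
        value *= z - rho
    for zeta in POLES:
        value /= z - zeta
    return value


def log_magnitude_from_roots(z):
    """Right-hand side of (1)."""
    return (math.log(abs(C))
            + sum(math.log(abs(z - rho)) for rho in ZEROS)
            - sum(math.log(abs(z - zeta)) for zeta in POLES))


def log_derivative_from_roots(z):
    """Right-hand side of (2)."""
    return (sum(1 / (z - rho) for rho in ZEROS)
            - sum(1 / (z - zeta) for zeta in POLES))


def log_derivative_numeric(z, h=1e-6):
    """Central-difference approximation to f'(z)/f(z)."""
    return (f(z + h) - f(z - h)) / (2 * h * f(z))


z = 0.3 + 0.7j  # test point away from all zeroes and poles
print(abs(math.log(abs(f(z))) - log_magnitude_from_roots(z)))
print(abs(log_derivative_numeric(z) - log_derivative_from_roots(z)))
```

Both printed discrepancies should be tiny; the second is limited by the finite-difference step rather than by the identity itself.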

Exercise 1 Let ${P(z)}$ be a complex polynomial of degree ${n \geq 1}$.
• (i) (Gauss-Lucas theorem) Show that the complex roots of ${P'(z)}$ are contained in the closed convex hull of the complex roots of ${P(z)}$.
• (ii) (Laguerre separation theorem) If all the complex roots of ${P(z)}$ are contained in a disk ${D(z_0,r)}$, and ${\zeta \not \in D(z_0,r)}$, then all the complex roots of ${nP(z) + (\zeta - z) P'(z)}$ are also contained in ${D(z_0,r)}$. (Hint: apply a suitable Möbius transformation to move ${\zeta}$ to infinity, and then apply part (i) to a polynomial that emerges after applying this transformation.)

There are a number of useful ways to extend these formulae to more general meromorphic functions than rational functions. Firstly there is a very handy “local” variant of (1) known as Jensen’s formula:

Theorem 2 (Jensen’s formula) Let ${f}$ be a meromorphic function on an open neighbourhood of a disk ${\overline{D(z_0,r)} = \{ z: |z-z_0| \leq r \}}$, with all removable singularities removed. Then, if ${z_0}$ is neither a zero nor a pole of ${f}$, we have

$\displaystyle \log |f(z_0)| = \int_0^1 \log |f(z_0+re^{2\pi i t})|\ dt + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{|\rho-z_0|}{r} \ \ \ \ \ (3)$

$\displaystyle - \sum_{\zeta: |\zeta-z_0| \leq r} \log \frac{|\zeta-z_0|}{r}$

where ${\rho}$ and ${\zeta}$ range over the zeroes and poles of ${f}$ respectively (counting multiplicity) in the disk ${\overline{D(z_0,r)}}$.

One can view (3) as a truncated (or localised) variant of (1). Note also that the summands ${\log \frac{|\rho-z_0|}{r}, \log \frac{|\zeta-z_0|}{r}}$ are always non-positive.
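Jensen’s formula (3) can be tested numerically on a concrete example. The sketch below uses a hypothetical meromorphic function with two zeroes and one pole inside the unit disk and one zero outside it; the choice of function, of ${z_0=0}$, ${r=1}$, and of an equispaced Riemann sum for the boundary integral are all assumptions made purely for illustration.

```python
import cmath
import math

# Hypothetical example: f(z) = exp(z) (z - 0.5)(z + 0.3i)(z - 2) / (z - 0.2),
# with z_0 = 0 and r = 1, so two zeroes and one pole lie inside the disk.
ZEROS = [0.5 + 0j, -0.3j, 2 + 0j]
POLES = [0.2 + 0j]
Z0, R = 0j, 1.0


def f(z):
    value = cmath.exp(z)
    for rho in ZEROS:
        value *= z - rho
    for zeta in POLES:
        value /= z - zeta
    return value


def circle_mean_log_abs(n_points=4096):
    """Equispaced Riemann sum for int_0^1 log|f(z0 + r e^{2 pi i t})| dt."""
    total = 0.0
    for k in range(n_points):
        z = Z0 + R * cmath.exp(2j * math.pi * k / n_points)
        total += math.log(abs(f(z)))
    return total / n_points


def jensen_rhs():
    """Right-hand side of (3): boundary mean plus zero terms minus pole terms."""
    rhs = circle_mean_log_abs()
    rhs += sum(math.log(abs(rho - Z0) / R) for rho in ZEROS if abs(rho - Z0) <= R)
    rhs -= sum(math.log(abs(zeta - Z0) / R) for zeta in POLES if abs(zeta - Z0) <= R)
    return rhs


lhs = math.log(abs(f(Z0)))
print(lhs, jensen_rhs())
```

The zero at ${2}$ lies outside the disk, so it only enters through the boundary integral and not through the sum over ${\rho}$.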

Proof: By perturbing ${r}$ slightly if necessary, we may assume that none of the zeroes or poles of ${f}$ (which form a discrete set) lie on the boundary circle ${\{ z: |z-z_0| = r \}}$. By translating and rescaling, we may then normalise ${z_0=0}$ and ${r=1}$, thus our task is now to show that

$\displaystyle \log |f(0)| = \int_0^1 \log |f(e^{2\pi i t})|\ dt + \sum_{\rho: |\rho| < 1} \log |\rho| - \sum_{\zeta: |\zeta| < 1} \log |\zeta|. \ \ \ \ \ (4)$

We may remove the poles and zeroes inside the disk ${D(0,1)}$ by the useful device of Blaschke products. Suppose for instance that ${f}$ has a zero ${\rho}$ inside the disk ${D(0,1)}$. Observe that the function

$\displaystyle B_\rho(z) := \frac{\rho - z}{1 - \overline{\rho} z} \ \ \ \ \ (5)$

has magnitude ${1}$ on the unit circle ${\{ z: |z| = 1\}}$, equals ${\rho}$ at the origin, has a simple zero at ${\rho}$, but has no other zeroes or poles inside the disk. Thus Jensen’s formula (4) already holds if ${f}$ is replaced by ${B_\rho}$. To prove (4) for ${f}$, it thus suffices to prove it for ${f/B_\rho}$, which effectively deletes a zero ${\rho}$ inside the disk ${D(0,1)}$ from ${f}$ (and replaces it instead with its inversion ${1/\overline{\rho}}$). Similarly we may remove all the poles inside the disk. As a meromorphic function only has finitely many poles and zeroes inside a compact set, we may thus reduce to the case when ${f}$ has no poles or zeroes on or inside the disk ${D(0,1)}$, at which point our goal is simply to show that

$\displaystyle \log |f(0)| = \int_0^1 \log |f(e^{2\pi i t})|\ dt.$

Since ${f}$ has no zeroes or poles inside the disk, it has a holomorphic logarithm ${F}$ (Exercise 46 of 246A Notes 4). In particular, ${\log |f|}$ is the real part of ${F}$. The claim now follows by applying the mean value property (Exercise 17 of 246A Notes 3) to ${\log |f|}$. $\Box$
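The three properties of the Blaschke factor (5) used in the proof (unit modulus on the circle, value ${\rho}$ at the origin, and a zero at ${\rho}$ with the only pole at the reflected point ${1/\overline{\rho}}$) are easy to confirm numerically. In this minimal sketch the value of ${\rho}$ and the number of sample points on the circle are arbitrary assumptions.

```python
import cmath
import math


def blaschke(rho, z):
    """The Blaschke factor B_rho(z) = (rho - z) / (1 - conj(rho) z) from (5)."""
    return (rho - z) / (1 - rho.conjugate() * z)


rho = 0.4 - 0.3j  # hypothetical zero inside the unit disk

# Largest deviation of |B_rho| from 1 over sample points of the unit circle.
max_deviation = max(
    abs(abs(blaschke(rho, cmath.exp(2j * math.pi * k / 360))) - 1)
    for k in range(360)
)

value_at_origin = blaschke(rho, 0j)   # should equal rho
value_at_rho = blaschke(rho, rho)     # should vanish
pole = 1 / rho.conjugate()            # reflected point, outside the disk
print(max_deviation, value_at_origin, value_at_rho, abs(pole))
```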

An important special case of Jensen’s formula arises when ${f}$ is holomorphic in a neighborhood of ${\overline{D(z_0,r)}}$, in which case there are no contributions from poles and one simply has

$\displaystyle \int_0^1 \log |f(z_0+re^{2\pi i t})|\ dt = \log |f(z_0)| + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{r}{|\rho-z_0|}. \ \ \ \ \ (6)$

This is quite a useful formula, mainly because the summands ${\log \frac{r}{|\rho-z_0|}}$ are non-negative. Here are some quick applications of this formula:

Exercise 3 Use (6) to give another proof of Liouville’s theorem: a bounded holomorphic function ${f}$ on the entire complex plane is necessarily constant.

Exercise 4 Use Jensen’s formula to prove the fundamental theorem of algebra: a complex polynomial ${P(z)}$ of degree ${n}$ has exactly ${n}$ complex zeroes (counting multiplicity), and can thus be factored as ${P(z) = c (z-z_1) \dots (z-z_n)}$ for some complex numbers ${c,z_1,\dots,z_n}$ with ${c \neq 0}$. (Note that the fundamental theorem was invoked previously in this section, but only for motivational purposes, so the proof here is non-circular.)

Exercise 5 (Shifted Jensen’s formula) Let ${f}$ be a meromorphic function on an open neighbourhood of a disk ${\{ z: |z-z_0| \leq r \}}$, with all removable singularities removed. Show that

$\displaystyle \log |f(z)| = \int_0^1 \log |f(z_0+re^{2\pi i t})| \mathrm{Re} \frac{r e^{2\pi i t} + (z-z_0)}{r e^{2\pi i t} - (z-z_0)}\ dt \ \ \ \ \ (7)$

$\displaystyle + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{|\rho-z|}{|r - \rho^* (z-z_0)|}$

$\displaystyle - \sum_{\zeta: |\zeta-z_0| \leq r} \log \frac{|\zeta-z|}{|r - \zeta^* (z-z_0)|}$

for all ${z}$ in the open disk ${\{ z: |z-z_0| < r\}}$ that are not zeroes or poles of ${f}$, where ${\rho^* = \frac{\overline{\rho-z_0}}{r}}$ and ${\zeta^* = \frac{\overline{\zeta-z_0}}{r}}$. (The function ${\Re \frac{r e^{2\pi i t} + (z-z_0)}{r e^{2\pi i t} - (z-z_0)}}$ appearing in the integrand is sometimes known as the Poisson kernel, particularly if one normalises so that ${z_0=0}$ and ${r=1}$.)
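Before attempting the exercise, one can sanity-check (7) numerically. The sketch below does so for a hypothetical function with one zero and one pole in a disk that is deliberately not centred at the origin, so that the definitions of ${\rho^*}$ and ${\zeta^*}$ are exercised; the function, the disk, the evaluation point and the quadrature rule are all illustrative assumptions.

```python
import cmath
import math

# Hypothetical data: disk centre and radius, one zero and one pole inside it.
Z0, R = 1 + 1j, 2.0
ZEROS = [1.5 + 1j]
POLES = [1 + 2j]


def f(z):
    value = cmath.exp(z)
    for rho in ZEROS:
        value *= z - rho
    for zeta in POLES:
        value /= z - zeta
    return value


def poisson_kernel(t, z):
    """Re (r e^{2 pi i t} + (z - z0)) / (r e^{2 pi i t} - (z - z0))."""
    boundary = R * cmath.exp(2j * math.pi * t)
    return ((boundary + (z - Z0)) / (boundary - (z - Z0))).real


def blaschke_term(root, z):
    """log(|root - z| / |r - root^* (z - z0)|) with root^* = conj(root - z0) / r."""
    root_star = (root - Z0).conjugate() / R
    return math.log(abs(root - z) / abs(R - root_star * (z - Z0)))


def shifted_jensen_rhs(z, n_points=4096):
    """Right-hand side of (7), with the integral replaced by a Riemann sum."""
    integral = 0.0
    for k in range(n_points):
        t = k / n_points
        boundary_point = Z0 + R * cmath.exp(2j * math.pi * t)
        integral += math.log(abs(f(boundary_point))) * poisson_kernel(t, z)
    integral /= n_points
    return (integral
            + sum(blaschke_term(rho, z) for rho in ZEROS)
            - sum(blaschke_term(zeta, z) for zeta in POLES))


z = 0.5 + 0.5j  # a point of the open disk that is neither a zero nor a pole
print(math.log(abs(f(z))), shifted_jensen_rhs(z))
```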

Exercise 6 (Bounded type)
• (i) If ${f}$ is a bounded holomorphic function on ${D(0,1)}$ that is not identically zero, show that ${\liminf_{r \rightarrow 1^-} \int_0^{2\pi} \log |f(re^{i\theta})|\ d\theta > -\infty}$.
• (ii) If ${f}$ is a meromorphic function on ${D(0,1)}$ that is the ratio of two bounded holomorphic functions that are not identically zero, show that ${\limsup_{r \rightarrow 1^-} \int_0^{2\pi} |\log |f(re^{i\theta})||\ d\theta < \infty}$. (Functions ${f}$ of this form are said to be of bounded type and lie in the Nevanlinna class for the unit disk ${D(0,1)}$.)

Exercise 7 (Smoothed out Jensen formula) Let ${f}$ be a meromorphic function on an open set ${U}$, and let ${\phi: U \rightarrow {\bf C}}$ be a smooth compactly supported function. Show that

$\displaystyle \sum_\rho \phi(\rho) - \sum_\zeta \phi(\zeta) = \frac{-1}{2\pi} \int\int_U ((\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}) \phi(x+iy)) \frac{f'}{f}(x+iy)\ dx dy$

$\displaystyle = \frac{1}{2\pi} \int\int_U ((\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}) \phi(x+iy)) \log |f(x+iy)|\ dx dy$

where ${\rho, \zeta}$ range over the zeroes and poles of ${f}$ (respectively) in the support of ${\phi}$. Informally argue why this identity is consistent with Jensen’s formula.

When applied to entire functions ${f}$, Jensen’s formula relates the order of growth of ${f}$ near infinity with the density of zeroes of ${f}$. Here is a typical result:

Proposition 8 Let ${f: {\bf C} \rightarrow {\bf C}}$ be an entire function, not identically zero, that obeys a growth bound ${|f(z)| \leq C \exp( |z|^\rho)}$ for some ${C, \rho > 0}$ and all ${z}$. Then there exists a constant ${C'>0}$ such that ${D(0,R)}$ has at most ${C' R^\rho}$ zeroes (counting multiplicity) for any ${R \geq 1}$.

Entire functions that obey a growth bound of the form ${|f(z)| \leq C_\varepsilon \exp( |z|^{\rho+\varepsilon})}$ for every ${\varepsilon>0}$ and ${z}$ (where ${C_\varepsilon}$ depends on ${\varepsilon}$) are said to be of order at most ${\rho}$. The above proposition shows that for such functions that are not identically zero, the number of zeroes in a disk of radius ${R}$ does not grow much faster than ${R^\rho}$. This is often a useful preliminary upper bound on the zeroes of entire functions, as the order of an entire function tends to be relatively easy to compute in practice.

Proof: First suppose that ${f(0)}$ is non-zero. From (6) applied with ${r=2R}$ and ${z_0=0}$ one has

$\displaystyle \int_0^1 \log(C \exp( (2R)^\rho ) )\ dt \geq \log |f(0)| + \sum_{\rho: |\rho| \leq 2R} \log \frac{2R}{|\rho|}.$

Every zero in ${D(0,R)}$ contributes at least ${\log 2}$ to the sum on the right-hand side, while all other zeroes contribute a non-negative quantity, thus

$\displaystyle \log C + (2R)^\rho \geq \log |f(0)| + N_R \log 2$

where ${N_R}$ denotes the number of zeroes in ${D(0,R)}$. This gives the claim for ${f(0) \neq 0}$. When ${f(0)=0}$, one can shift ${f}$ by a small amount to make ${f}$ non-zero at the origin (using the fact that zeroes of holomorphic functions not identically zero are isolated), modifying ${C}$ in the process, and then repeating the previous arguments. $\Box$
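To see the size of the bound in a concrete case, take the entire function ${\cos z}$, which obeys ${|\cos z| \leq \exp(|z|)}$ (so ${C=1}$ and the exponent is ${1}$), equals ${1}$ at the origin, and has simple zeroes at ${(k+\frac{1}{2})\pi}$ for integer ${k}$. Applying (6) with ${r=2R}$ as in the proof bounds the number ${N_R}$ of zeroes in ${D(0,R)}$ by ${(\log C + (2R)^\rho - \log |f(0)|)/\log 2}$. The choice of ${\cos z}$ and of the radii below is an assumption made for illustration; the sketch simply compares the true count with this bound.

```python
import math


def count_zeros_of_cos(radius):
    """Number of zeroes (k + 1/2) * pi of cos z with modulus less than radius."""
    count = 0
    k = 0
    while (k + 0.5) * math.pi < radius:
        count += 2  # the zero (k + 1/2) pi and its mirror image -(k + 1/2) pi
        k += 1
    return count


def jensen_bound(radius, growth_constant=1.0, order=1.0, abs_f_at_0=1.0):
    """(log C + (2R)^rho - log|f(0)|) / log 2, the bound on N_R from the proof."""
    return (math.log(growth_constant) + (2 * radius) ** order
            - math.log(abs_f_at_0)) / math.log(2)


for radius in (1, 10, 100, 1000):
    print(radius, count_zeros_of_cos(radius), jensen_bound(radius))
```

Both quantities grow linearly in ${R}$, as the proposition predicts for a function of order ${1}$; the bound is off only by a constant factor.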

Just as (3) and (7) give truncated variants of (1), we can create truncated versions of (2). The following crude truncation is adequate for many applications:

Theorem 9 (Truncated formula for log-derivative) Let ${0 < c_2 < c_1 < 1}$ be constants, and let ${f}$ be a holomorphic function on an open neighbourhood of a disk ${\{ z: |z-z_0| \leq r \}}$ that is not identically zero on this disk. Suppose that one has a bound of the form ${|f(z)| \leq M^{O_{c_1,c_2}(1)} |f(z_0)|}$ for some ${M \geq 1}$ and all ${z}$ on the circle ${\{ z: |z-z_0| = r\}}$. Then one has the approximate formula

$\displaystyle \frac{f'(z)}{f(z)} = \sum_{\rho: |\rho - z_0| \leq c_1 r} \frac{1}{z-\rho} + O_{c_1,c_2}( \frac{\log M}{r} )$

for all ${z}$ in the disk ${\{ z: |z-z_0| < c_2 r \}}$ other than zeroes of ${f}$. Furthermore, the number of zeroes ${\rho}$ in the above sum is ${O_{c_1,c_2}(\log M)}$.

Proof: To abbreviate notation, we allow all implied constants in this proof to depend on ${c_1,c_2}$.

We mimic the proof of Jensen’s formula. Firstly, we may translate and rescale so that ${z_0=0}$ and ${r=1}$, so we have ${|f(z)| \leq M^{O(1)} |f(0)|}$ when ${|z|=1}$, and our main task is to show that

$\displaystyle \frac{f'(z)}{f(z)} - \sum_{\rho: |\rho| \leq c_1} \frac{1}{z-\rho} = O( \log M ) \ \ \ \ \ (8)$

for ${|z| \leq c_2}$. Note that if ${f(0)=0}$ then ${f}$ vanishes on the unit circle and hence (by the maximum principle) vanishes identically on the disk, a contradiction, so we may assume ${f(0) \neq 0}$. From hypothesis we then have

$\displaystyle \log |f(z)| \leq \log |f(0)| + O(\log M)$

on the unit circle, and so from Jensen’s formula (3) we see that

$\displaystyle \sum_{\rho: |\rho| \leq 1} \log \frac{1}{|\rho|} = O(\log M). \ \ \ \ \ (9)$

In particular we see that the number of zeroes with ${|\rho| \leq c_1}$ is ${O(\log M)}$, as claimed.

Suppose ${f}$ has a zero ${\rho}$ with ${c_1 < |\rho| \leq 1}$. If we factor ${f = B_\rho g}$, where ${B_\rho}$ is the Blaschke product (5), then

$\displaystyle \frac{f'}{f} = \frac{B'_\rho}{B_\rho} + \frac{g'}{g}$

$\displaystyle = \frac{g'}{g} + \frac{1}{z-\rho} - \frac{1}{z-1/\overline{\rho}}.$

Observe from Taylor expansion that the distance between ${\rho}$ and ${1/\overline{\rho}}$ is ${O( \log \frac{1}{|\rho|} )}$, and hence ${\frac{1}{z-\rho} - \frac{1}{z-1/\overline{\rho}} = O( \log \frac{1}{|\rho|} )}$ for ${|z| \leq c_2}$. Thus we see from (9) that we may use Blaschke products to remove all the zeroes in the annulus ${c_1 < |\rho| \leq 1}$ while only affecting the left-hand side of (8) by ${O( \log M)}$; also, removing the Blaschke products does not affect ${|f(z)|}$ on the unit circle, and only affects ${\log |f(0)|}$ by ${O(\log M)}$ thanks to (9). Thus we may assume without loss of generality that there are no zeroes in this annulus.
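The Taylor expansion step can be made explicit: since ${1/\overline{\rho}}$ lies on the same ray as ${\rho}$, the distance between them is ${\frac{1}{|\rho|} - |\rho| = 2 \sinh(u)}$ with ${u = \log \frac{1}{|\rho|}}$, and ${2\sinh(u)/u}$ is bounded on any bounded range of ${u}$. The sketch below tabulates this ratio on the annulus; the value chosen for ${c_1}$ and the sampling grid are illustrative assumptions.

```python
import math

c1 = 0.5  # hypothetical value of the constant c_1


def ratio(x):
    """(1/x - x) / log(1/x): distance from rho to 1/conj(rho), divided by
    log(1/|rho|), as a function of x = |rho| in (0, 1)."""
    return (1 / x - x) / math.log(1 / x)


samples = [c1 + (1 - c1) * k / 1000 for k in range(1, 1000)]  # points of (c1, 1)
worst = max(ratio(x) for x in samples)
print(worst, ratio(c1))
```

The ratio increases as ${|\rho|}$ decreases, so on the annulus it is controlled by its value at ${|\rho| = c_1}$, which is where the dependence of the implied constant on ${c_1}$ comes from.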

Similarly, given a zero ${\rho}$ with ${|\rho| \leq c_1}$, we have ${\frac{1}{z-1/\overline{\rho}} = O(1)}$, so using Blaschke products to remove all of these zeroes also only affects the left-hand side of (8) by ${O(\log M)}$ (since the number of zeroes here is ${O(\log M)}$), with ${\log |f(0)|}$ also modified by at most ${O(\log M)}$. Thus we may assume in fact that ${f}$ has no zeroes whatsoever within the unit disk. We may then also normalise ${f(0) = 1}$, then ${\log |f(e^{2\pi i t})| \leq O(\log M)}$ for all ${t \in [0,1]}$. By Jensen’s formula again, we have

$\displaystyle \int_0^1 \log |f(e^{2\pi i t})|\ dt = 0$

and thus (by using the identity ${|x| = 2 \max(x,0) - x}$ for any real ${x}$)

$\displaystyle \int_0^1 |\log |f(e^{2\pi i t})||\ dt \ll \log M. \ \ \ \ \ (10)$

On the other hand, from (7) we have

$\displaystyle \log |f(z)| = \int_0^1 \log |f(e^{2\pi i t})| \Re \frac{e^{2\pi i t} + z}{e^{2\pi i t} - z}\ dt$

which implies from (10) that ${\log |f(z)|}$ and its first derivatives are ${O( \log M )}$ on the disk ${\{ z: |z| \leq c_2 \}}$. But recall from the proof of Jensen’s formula that ${\frac{f'}{f}}$ is the derivative of a logarithm ${\log f}$ of ${f}$, whose real part is ${\log |f|}$. By the Cauchy-Riemann equations for ${\log f}$, we conclude that ${\frac{f'}{f} = O(\log M)}$ on the disk ${\{ z: |z| \leq c_2 \}}$, as required. $\Box$

Exercise 10
• (i) (Borel-Carathéodory theorem) If ${f: U \rightarrow {\bf C}}$ is analytic on an open neighborhood ${U}$ of a disk ${\overline{D(z_0,R)}}$, show that for any ${0 < r < R}$ one has

$\displaystyle \sup_{z \in D(z_0,r)} |f(z)| \leq \frac{2r}{R-r} \sup_{z \in \overline{D(z_0,R)}} \mathrm{Re} f(z) + \frac{R+r}{R-r} |f(z_0)|.$

(Hint: one can normalise ${z_0=0}$, ${R=1}$, ${f(0)=0}$, and ${\sup_{|z-z_0| \leq R} \mathrm{Re} f(z)=1}$. Now ${f}$ maps the unit disk to the half-plane ${\{ \mathrm{Re} z \leq 1 \}}$. Use a Möbius transformation to map the half-plane to the unit disk and then use the Schwarz lemma.)
• (ii) Use (i) to give an alternate way to conclude the proof of Theorem 9.

A variant of the above argument allows one to make precise the heuristic that holomorphic functions locally look like polynomials:

Exercise 11 (Local Weierstrass factorisation) Let the notation and hypotheses be as in Theorem 9. Then show that

$\displaystyle f(z) = P(z) \exp( g(z) )$

for all ${z}$ in the disk ${\{ z: |z-z_0| < c_2 r \}}$, where ${P}$ is a polynomial whose zeroes are precisely the zeroes of ${f}$ in ${\{ z: |z-z_0| \leq c_1r \}}$ (counting multiplicity) and ${g}$ is a holomorphic function on ${\{ z: |z-z_0| < c_2 r \}}$ of magnitude ${O_{c_1,c_2}( \log M )}$ and first derivative ${O_{c_1,c_2}( \log M / r )}$ on this disk. Furthermore, show that the degree of ${P}$ is ${O_{c_1,c_2}(\log M)}$.

Exercise 12 (Preliminary Beurling factorisation) Let ${H^\infty(D(0,1))}$ denote the space of bounded analytic functions ${f: D(0,1) \rightarrow {\bf C}}$ on the unit disk; this is a normed vector space with norm

$\displaystyle \|f\|_{H^\infty(D(0,1))} := \sup_{z \in D(0,1)} |f(z)|.$

• (i) If ${f \in H^\infty(D(0,1))}$ is not identically zero, and ${z_n}$ denote the zeroes of ${f}$ in ${D(0,1)}$ counting multiplicity, show that

$\displaystyle \sum_n (1-|z_n|) < \infty$

and

$\displaystyle \sup_{0 < r < 1} \int_0^{2\pi} | \log |f(re^{i\theta})| |\ d\theta < \infty.$

• (ii) Let the notation be as in (i). If we define the Blaschke product

$\displaystyle B(z) := z^m \prod_{|z_n| \neq 0} \frac{|z_n|}{z_n} \frac{z_n-z}{1-\overline{z_n} z}$

where ${m}$ is the order of vanishing of ${f}$ at zero, show that this product converges absolutely to a meromorphic function on ${{\bf C}}$ outside of the ${1/\overline{z_n}}$, and that ${|f(z)| \leq \|f\|_{H^\infty(D(0,1))} |B(z)|}$ for all ${z \in D(0,1)}$. (It may be easier to work with finite Blaschke products first to obtain this bound.)
• (iii) Continuing the notation from (i), establish a factorisation ${f(z) = B(z) \exp(g(z))}$ for some holomorphic function ${g: D(0,1) \rightarrow {\bf C}}$ with ${\mathrm{Re}\, g(z) \leq \log \|f\|_{H^\infty(D(0,1))}}$ for all ${z\in D(0,1)}$.
• (iv) (Theorem of F. and M. Riesz, special case) If ${f \in H^\infty(D(0,1))}$ is not identically zero and extends continuously to the boundary ${\{e^{i\theta}: 0 \leq \theta < 2\pi\}}$, show that the set ${\{ 0 \leq \theta < 2\pi: f(e^{i\theta})=0 \}}$ has zero measure.

Remark 13 The factorisation (iii) can be refined further, with ${g}$ being the Poisson integral of some finite measure on the unit circle. Using the Lebesgue decomposition of this finite measure into absolutely continuous and singular parts one ends up factorising ${H^\infty(D(0,1))}$ functions into “outer functions” and “inner functions”, giving the Beurling factorisation of ${H^\infty}$. There are also extensions to larger spaces ${H^p(D(0,1))}$ than ${H^\infty(D(0,1))}$ (which are to ${H^\infty}$ as ${L^p}$ is to ${L^\infty}$), known as Hardy spaces. We will not discuss this topic further here, but see for instance this text of Garnett for a treatment.

Exercise 14 (Littlewood’s lemma) Let ${f}$ be holomorphic on an open neighbourhood of a rectangle ${R = \{ \sigma+it: \sigma_0 \leq \sigma \leq \sigma_1; 0 \leq t \leq T \}}$ for some ${\sigma_0 < \sigma_1}$ and ${T>0}$, with ${f}$ non-vanishing on the boundary of the rectangle. Show that

$\displaystyle 2\pi \sum_\rho (\mathrm{Re}(\rho)-\sigma_0) = \int_0^T \log |f(\sigma_0+it)|\ dt - \int_0^T \log |f(\sigma_1+it)|\ dt$

$\displaystyle + \int_{\sigma_0}^{\sigma_1} \mathrm{arg} f(\sigma+iT)\ d\sigma - \int_{\sigma_0}^{\sigma_1} \mathrm{arg} f(\sigma)\ d\sigma$

where ${\rho}$ ranges over the zeroes of ${f}$ inside ${R}$ (counting multiplicity) and one uses a branch of ${\mathrm{arg} f}$ which is continuous on the upper, lower, and right edges of ${R}$. (This lemma is a popular tool to explore the zeroes of Dirichlet series such as the Riemann zeta function.)

Just a short announcement that next quarter I will be continuing the recently concluded 246A complex analysis class as 246B. Topics I plan to cover:

• Schwarz-Christoffel transformations and the uniformisation theorem (using the remainder of the 246A notes);
• Jensen’s formula and factorization theorems (particularly Weierstrass and Hadamard); the Gamma function;
• Connections with the Fourier transform on the real line;
• Elliptic functions and their relatives;
• (if time permits) the Riemann zeta function and the prime number theorem.

Notes for the later material will appear on this blog in due course.

I’ve just uploaded to the arXiv my paper “Sendov’s conjecture for sufficiently high degree polynomials”. This paper is a contribution to an old conjecture of Sendov on the zeroes of polynomials:

Conjecture 1 (Sendov’s conjecture) Let ${f: {\bf C} \rightarrow {\bf C}}$ be a polynomial of degree ${n \geq 2}$ that has all zeroes in the closed unit disk ${\{ z: |z| \leq 1 \}}$. If ${\lambda_0}$ is one of these zeroes, then ${f'}$ has at least one zero in ${\{z: |z-\lambda_0| \leq 1\}}$.

It is common in the literature on this problem to normalise ${f}$ to be monic, and to rotate the zero ${\lambda_0}$ to be an element ${a}$ of the unit interval ${[0,1]}$. As it turns out, the location of ${a}$ on this unit interval ${[0,1]}$ ends up playing an important role in the arguments.

Many cases of this conjecture are already known, for instance

• When ${n<9}$ (Brown-Xiang 1999);
• When ${a=0}$ (Gauss-Lucas theorem);
• When ${a \leq \frac{1}{n-1}}$ (Bojanov 2011);
• When ${c \leq a \leq 1-c}$ for a fixed ${c>0}$, and ${n}$ is sufficiently large depending on ${c}$ (Dégot 2014);
• When ${C n^{-1/7} \leq a \leq 1 - C n^{-1/4}}$ for a sufficiently large absolute constant ${C}$ (Chalebgwa 2020);
• When ${a=1}$ (Rubinstein 1968; Goodman-Rahman-Ratti 1969; Joyal 1969);
• When ${a \geq 1-\varepsilon_n}$, where ${\varepsilon_n>0}$ is sufficiently small depending on ${n}$ (Miller 1993; Vajaitu-Zaharescu 1993);
• When ${a \geq 1 - \frac{1}{2 n^9 4^n}}$ (Chijiwa 2011);
• When ${a \geq 1 - \frac{90}{n^{12} \log n}}$ (Kasmalkar 2014).

In particular, in high degrees the only cases left uncovered by prior results are when ${a}$ is close (but not too close) to ${0}$, or when ${a}$ is close (but not too close) to ${1}$; see Figure 1 of my paper.

Our main result covers the high degree case uniformly for all values of ${a \in [0,1]}$:

Theorem 2 There exists an absolute constant ${n_0}$ such that Sendov’s conjecture holds for all ${n \geq n_0}$.

In principle, this reduces the verification of Sendov’s conjecture to a finite time computation, although our arguments use compactness methods and thus do not easily provide an explicit value of ${n_0}$. I believe that the compactness arguments can be replaced with quantitative substitutes that provide an explicit ${n_0}$, but the value of ${n_0}$ produced is likely to be extremely large (certainly much larger than ${9}$).

Because of the previous results (particularly those of Chalebgwa and Chijiwa), we will only need to establish the following two subcases of the above theorem:

Theorem 3 (Sendov’s conjecture near the origin) Under the additional hypothesis ${a = o(1/\log n)}$, Sendov’s conjecture holds for sufficiently large ${n}$.

Theorem 4 (Sendov’s conjecture near the unit circle) Under the additional hypothesis ${1-o(1) \leq a \leq 1 - \varepsilon_0^n}$ for a fixed ${\varepsilon_0>0}$, Sendov’s conjecture holds for sufficiently large ${n}$.

We approach these theorems using the “compactness and contradiction” strategy, assuming that there is a sequence of counterexamples whose degrees ${n}$ go to infinity, using various compactness theorems to extract various asymptotic objects in the limit ${n \rightarrow \infty}$, and somehow using these objects to derive a contradiction. There are many ways to effect such a strategy; we will use a formalism that I call “cheap nonstandard analysis” and which is common in the PDE literature, in which one repeatedly passes to subsequences as necessary whenever one invokes a compactness theorem to create a limit object. However, the particular choice of asymptotic formalism one selects is not of essential importance for the arguments.

I also found it useful to use the language of probability theory. Given a putative counterexample ${f}$ to Sendov’s conjecture, let ${\lambda}$ be a zero of ${f}$ (chosen uniformly at random among the ${n}$ zeroes of ${f}$, counting multiplicity), and let ${\zeta}$ similarly be a uniformly random zero of ${f'}$. We introduce the logarithmic potentials

$\displaystyle U_\lambda(z) := {\bf E} \log \frac{1}{|z-\lambda|}; \quad U_\zeta(z) := {\bf E} \log \frac{1}{|z-\zeta|}$

and the Stieltjes transforms

$\displaystyle s_\lambda(z) := {\bf E} \frac{1}{z-\lambda}; \quad s_\zeta(z) := {\bf E} \frac{1}{z-\zeta}.$

Standard calculations using the fundamental theorem of algebra yield the basic identities

$\displaystyle U_\lambda(z) = \frac{1}{n} \log \frac{1}{|f(z)|}; \quad U_\zeta(z) = \frac{1}{n-1} \log \frac{n}{|f'(z)|}$

and

$\displaystyle s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}; \quad s_\zeta(z) = \frac{1}{n-1} \frac{f''(z)}{f'(z)} \ \ \ \ \ (1)$

and in particular the random variables ${\lambda, \zeta}$ are linked to each other by the identity

$\displaystyle U_\lambda(z) - \frac{n-1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|. \ \ \ \ \ (2)$
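These identities can be checked on a small explicit example. The sketch below uses the hypothetical monic cubic ${f(z) = z^3 - 3b^2 z}$ with ${b = 1/2}$, whose zeroes ${0, \pm b\sqrt{3}}$ and critical points ${\pm b}$ are known in closed form; the polynomial and the test point are assumptions chosen for convenience, not an example from the paper.

```python
import math

# Hypothetical monic cubic f(z) = z^3 - 3 b^2 z with b = 1/2.
B = 0.5
N = 3
ZEROS_F = [0j, complex(B * math.sqrt(3)), complex(-B * math.sqrt(3))]
ZEROS_DF = [complex(B), complex(-B)]  # zeroes of f'(z) = 3 (z - b)(z + b)


def f(z):
    return z ** 3 - 3 * B ** 2 * z


def df(z):
    return 3 * z ** 2 - 3 * B ** 2


def d2f(z):
    return 6 * z


def log_potential(points, z):
    """E log(1 / |z - X|) for X uniform on the given list of points."""
    return -sum(math.log(abs(z - p)) for p in points) / len(points)


def stieltjes(points, z):
    """E 1 / (z - X) for X uniform on the given list of points."""
    return sum(1 / (z - p) for p in points) / len(points)


z = 0.2 + 0.9j  # test point away from all zeroes
u_lam, u_zeta = log_potential(ZEROS_F, z), log_potential(ZEROS_DF, z)
s_lam, s_zeta = stieltjes(ZEROS_F, z), stieltjes(ZEROS_DF, z)

# Discrepancies in the basic identities, in (1), and in the linking identity (2).
print(abs(u_lam - math.log(1 / abs(f(z))) / N))
print(abs(u_zeta - math.log(N / abs(df(z))) / (N - 1)))
print(abs(s_lam - df(z) / (N * f(z))))
print(abs(s_zeta - d2f(z) / ((N - 1) * df(z))))
print(abs(u_lam - (N - 1) / N * u_zeta - math.log(abs(s_lam)) / N))
```

In this example every zero of ${f}$ lies in the closed unit disk and is within distance ${1}$ of a zero of ${f'}$, consistent with Sendov’s conjecture in degree ${3}$.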

On the other hand, the hypotheses of Sendov’s conjecture (and the Gauss-Lucas theorem) place ${\lambda,\zeta}$ inside the unit disk ${\{ z:|z| \leq 1\}}$. Applying Prokhorov’s theorem, and passing to a subsequence, one can then assume that the random variables ${\lambda,\zeta}$ converge in distribution to some limiting random variables ${\lambda^{(\infty)}, \zeta^{(\infty)}}$ (possibly defined on a different probability space than the original variables ${\lambda,\zeta}$), also living almost surely inside the unit disk. Standard potential theory then gives the convergence

$\displaystyle U_\lambda(z) \rightarrow U_{\lambda^{(\infty)}}(z); \quad U_\zeta(z) \rightarrow U_{\zeta^{(\infty)}}(z) \ \ \ \ \ (3)$

and

$\displaystyle s_\lambda(z) \rightarrow s_{\lambda^{(\infty)}}(z); \quad s_\zeta(z) \rightarrow s_{\zeta^{(\infty)}}(z) \ \ \ \ \ (4)$

at least in the local ${L^1}$ sense. Among other things, we then conclude from the identity (2) and some elementary inequalities that

$\displaystyle U_{\lambda^{(\infty)}}(z) = U_{\zeta^{(\infty)}}(z)$

for all ${|z|>1}$. This turns out to have an appealing interpretation in terms of Brownian motion: if one takes two Brownian motions in the complex plane, one originating from ${\lambda^{(\infty)}}$ and one originating from ${\zeta^{(\infty)}}$, then the location where these Brownian motions first exit the unit disk ${\{ z: |z| \leq 1 \}}$ will have the same distribution. (In our paper we actually replace Brownian motion with the closely related formalism of balayage.) This turns out to connect the random variables ${\lambda^{(\infty)}}$, ${\zeta^{(\infty)}}$ quite closely to each other. In particular, with this observation and some additional arguments involving both the unique continuation property for harmonic functions and Grace’s theorem (discussed in this previous post), with the latter drawn from the prior work of Dégot, we can get very good control on these distributions:

Theorem 5
• (i) If ${a = o(1)}$, then ${\lambda^{(\infty)}, \zeta^{(\infty)}}$ almost surely lie in the semicircle ${\{ e^{i\theta}: \pi/2 \leq \theta \leq 3\pi/2\}}$ and have the same distribution.
• (ii) If ${a = 1-o(1)}$, then ${\lambda^{(\infty)}}$ is uniformly distributed on the circle ${\{ z: |z|=1\}}$, and ${\zeta^{(\infty)}}$ is almost surely zero.

In case (i) (and strengthening the hypothesis ${a=o(1)}$ to ${a=o(1/\log n)}$ to control some technical contributions of “outlier” zeroes of ${f}$), we can use this information about ${\lambda^{(\infty)}}$ and (4) to ensure that the normalised logarithmic derivative ${\frac{1}{n} \frac{f'}{f} = s_\lambda}$ has a non-negative winding number in a certain small (but not too small) circle around the origin, which by the argument principle is inconsistent with the hypothesis that ${f}$ has a zero at ${a = o(1)}$ and that ${f'}$ has no zeroes near ${a}$. This is how we establish Theorem 3.

Case (ii) turns out to be more delicate. This is because there are a number of “near-counterexamples” to Sendov’s conjecture that are compatible with the hypotheses and conclusion of case (ii). The simplest such example is ${f(z) = z^n - 1}$, where the zeroes ${\lambda}$ of ${f}$ are uniformly distributed amongst the ${n^{th}}$ roots of unity (including at ${a=1}$), and the zeroes of ${f'}$ are all located at the origin. In my paper I also discuss a variant of this construction, in which ${f'}$ has zeroes mostly near the origin, but also acquires a bounded number of zeroes at various locations ${\lambda_1+o(1),\dots,\lambda_m+o(1)}$ inside the unit disk. Specifically, we take

$\displaystyle f(z) := \left(z + \frac{c_2}{n}\right)^{n-m} P(z) - \left(a + \frac{c_2}{n}\right)^{n-m} P(a)$

where ${a = 1 - \frac{c_1}{n}}$ for some constants ${0 < c_1 < c_2}$ and

$\displaystyle P(z) := (z-\lambda_1) \dots (z-\lambda_m).$

By a perturbative analysis to locate the zeroes of ${f}$, one eventually would be able to arrive at a true counterexample to Sendov’s conjecture if these locations ${\lambda_1,\dots,\lambda_m}$ were in the open lune

$\displaystyle \{ \lambda: |\lambda| < 1 < |\lambda-1| \}$

and if one had the inequality

$\displaystyle c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| < 0 \ \ \ \ \ (5)$

for all ${0 \leq \theta \leq 2\pi}$. However, if one takes the mean of this inequality in ${\theta}$, one arrives at the inequality

$\displaystyle c_2 - c_1 + \sum_{j=1}^m \log |1 - \lambda_j| < 0$

which is incompatible with the hypotheses ${c_2 > c_1}$ and ${|\lambda_j-1| > 1}$. In order to extend this argument to more general polynomials ${f}$, we require a stability analysis of the endpoint equation

$\displaystyle c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| = 0 \ \ \ \ \ (6)$

where we now only assume the closed conditions ${c_2 \geq c_1}$ and ${|\lambda_j-1| \geq 1}$. The above discussion then places all the zeros ${\lambda_j}$ on the arc

$\displaystyle \{ \lambda: |\lambda| < 1 = |\lambda-1|\} \ \ \ \ \ (7)$

and if one also takes the second Fourier coefficient of (6) one also obtains the vanishing second moment

$\displaystyle \sum_{j=1}^m \lambda_j^2 = 0.$

These two conditions are incompatible with each other (except in the degenerate case when all the ${\lambda_j}$ vanish), because all the non-zero elements ${\lambda}$ of the arc (7) have argument in ${\pm [\pi/3,\pi/2]}$, so in particular their square ${\lambda^2}$ will have negative real part. It turns out that one can adapt this argument to the more general potential counterexamples to Sendov’s conjecture (in the form of Theorem 4). The starting point is to use (1), (4), and Theorem 5(ii) to obtain good control on ${f''/f'}$, which one then integrates and exponentiates to get good control on ${f'}$, and then on a second integration one gets enough information about ${f}$ to pin down the location of its zeroes to high accuracy. The constraint that these zeroes lie inside the unit disk then gives an inequality resembling (5), and an adaptation of the above stability analysis is then enough to conclude. The arguments here are inspired by the previous arguments of Miller, which treated the case when ${a}$ was extremely close to ${1}$ via a similar perturbative analysis; the main novelty is to control the error terms not in terms of the magnitude of the largest zero ${\zeta}$ of ${f'}$ (which is difficult to manage when ${n}$ gets large), but rather by the variance of those zeroes, which ends up being a more tractable expression to keep track of.
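Two of the elementary facts used in this heuristic discussion are easy to test numerically: that the mean in ${\theta}$ of the left-hand side of (5) is ${c_2 - c_1 + \sum_j \log |1-\lambda_j|}$ (which is positive under the stated hypotheses), and that non-zero points of the arc (7) have squares with negative real part. In the sketch below the constants ${c_1, c_2}$, the points ${\lambda_j}$, and the sampling grids are hypothetical choices made for illustration.

```python
import cmath
import math

# Hypothetical parameters with 0 < c1 < c2 and points in the open lune
# |lambda| < 1 < |lambda - 1|.
C1, C2 = 1.0, 2.0
LAMBDAS = [-0.3 + 0.4j, -0.5 - 0.2j]


def lhs_of_5(theta):
    """Left-hand side of (5) at the angle theta."""
    total = C2 - C1 - C2 * math.cos(theta)
    for lam in LAMBDAS:
        total += math.log(abs((1 - lam) / (cmath.exp(1j * theta) - lam)))
    return total


def mean_over_theta(n_points=4096):
    """Equispaced average of the left-hand side of (5) over [0, 2 pi)."""
    return sum(lhs_of_5(2 * math.pi * k / n_points) for k in range(n_points)) / n_points


predicted_mean = C2 - C1 + sum(math.log(abs(1 - lam)) for lam in LAMBDAS)
print(mean_over_theta(), predicted_mean)

# Sample the arc (7): lambda = 1 + e^{i phi} has |lambda - 1| = 1, and
# |lambda| < 1 exactly when 2 pi / 3 < phi < 4 pi / 3.
arc = []
for k in range(1, 200):
    phi = 2 * math.pi / 3 + k * (2 * math.pi / 3) / 200
    lam = 1 + cmath.exp(1j * phi)
    if abs(lam) > 1e-9:  # skip the degenerate point lambda = 0 (phi = pi)
        arc.append(lam)
largest_real_part_of_square = max((lam ** 2).real for lam in arc)
print(largest_real_part_of_square)
```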

Consider a disk ${D(z_0,r) := \{ z: |z-z_0| < r \}}$ in the complex plane. If one applies an affine-linear map ${f(z) = az+b}$ to this disk, one obtains

$\displaystyle f(D(z_0,r)) = D(f(z_0), |f'(z_0)| r).$

For maps that are merely holomorphic instead of affine-linear, one has some variants of this assertion, which I am recording here mostly for my own reference:

Theorem 1 (Holomorphic images of disks) Let ${D(z_0,r)}$ be a disk in the complex plane, and ${f: D(z_0,r) \rightarrow {\bf C}}$ be a holomorphic function with ${f'(z_0) \neq 0}$.
• (i) (Open mapping theorem or inverse function theorem) ${f(D(z_0,r))}$ contains a disk ${D(f(z_0),\varepsilon)}$ for some ${\varepsilon>0}$. (In fact there is even a holomorphic right inverse of ${f}$ from ${D(f(z_0), \varepsilon)}$ to ${D(z_0,r)}$.)
• (ii) (Bloch theorem) ${f(D(z_0,r))}$ contains a disk ${D(w, c |f'(z_0)| r)}$ for some absolute constant ${c>0}$ and some ${w \in {\bf C}}$. (In fact there is even a holomorphic right inverse of ${f}$ from ${D(w, c |f'(z_0)| r)}$ to ${D(z_0,r)}$.)
• (iii) (Koebe quarter theorem) If ${f}$ is injective, then ${f(D(z_0,r))}$ contains the disk ${D(f(z_0), \frac{1}{4} |f'(z_0)| r)}$.
• (iv) If ${f}$ is a polynomial of degree ${n}$, then ${f(D(z_0,r))}$ contains the disk ${D(f(z_0), \frac{1}{n} |f'(z_0)| r)}$.
• (v) If one has a bound of the form ${|f'(z)| \leq A |f'(z_0)|}$ for all ${z \in D(z_0,r)}$ and some ${A>1}$, then ${f(D(z_0,r))}$ contains the disk ${D(f(z_0), \frac{c}{A} |f'(z_0)| r)}$ for some absolute constant ${c>0}$. (In fact there is even a holomorphic right inverse of ${f}$ from ${D(f(z_0), \frac{c}{A} |f'(z_0)| r)}$ to ${D(z_0,r)}$.)

Parts (i), (ii), (iii) of this theorem are standard, as indicated by the given links. I found part (iv) as (a consequence of) Theorem 2 of this paper of Degot, who remarks that it “seems not already known in spite of its simplicity”; an equivalent form of this result also appears in Lemma 4 of this paper of Miller. The proof is simple:

Proof: (Proof of (iv)) Let ${w \in D(f(z_0), \frac{1}{n} |f'(z_0)| r)}$, then we have a lower bound for the log-derivative of ${f(z)-w}$ at ${z_0}$:

$\displaystyle \frac{|f'(z_0)|}{|f(z_0)-w|} > \frac{n}{r}$

(with the convention that the left-hand side is infinite when ${f(z_0)=w}$). But by the fundamental theorem of algebra we have

$\displaystyle \frac{f'(z_0)}{f(z_0)-w} = \sum_{j=1}^n \frac{1}{z_0-\zeta_j}$

where ${\zeta_1,\dots,\zeta_n}$ are the roots of the polynomial ${f(z)-w}$ (counting multiplicity). By the pigeonhole principle, there must therefore exist a root ${\zeta_j}$ of ${f(z) - w}$ such that

$\displaystyle \frac{1}{|z_0-\zeta_j|} > \frac{1}{r}$

and hence ${\zeta_j \in D(z_0,r)}$. Thus ${f(D(z_0,r))}$ contains ${w}$, and the claim follows. $\Box$
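As a sanity check on part (iv), here is a minimal numerical sketch (not part of the original argument). It samples points ${w}$ from the disk ${D(f(z_0), \frac{1}{n}|f'(z_0)| r)}$ and confirms that ${f(z)-w}$ has a root in ${D(z_0,r)}$. The function name `covers_disk`, the sample polynomial, and the safety factor `0.999` (used to stay away from the boundary in floating point) are illustrative assumptions.

```python
import numpy as np

def covers_disk(coeffs, z0, r, samples=200, seed=0):
    """Numerically test that f(D(z0, r)) contains D(f(z0), |f'(z0)| r / n).

    coeffs lists the polynomial coefficients, highest degree first.
    Returns True if, for every sampled w, f(z) - w has a root inside D(z0, r).
    """
    f = np.poly1d(coeffs)
    n = f.order
    radius = abs(f.deriv()(z0)) * r / n
    rng = np.random.default_rng(seed)
    for _ in range(samples):
        # Sample w from the (slightly shrunken) disk D(f(z0), radius).
        rho = 0.999 * radius * np.sqrt(rng.uniform(0.0, 1.0))
        theta = rng.uniform(0.0, 2.0 * np.pi)
        w = f(z0) + rho * np.exp(1j * theta)
        roots = (f - w).roots
        if not np.any(np.abs(roots - z0) < r):
            return False
    return True

# Hypothetical example: f(z) = z^3 - 2z + 5 on the disk D(0.3 + 0.2i, 1.5).
print(covers_disk([1, 0, -2, 5], z0=0.3 + 0.2j, r=1.5))
```

Of course such a test can only fail to find a counterexample; the proof above is what establishes the claim.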

The constant ${\frac{1}{n}}$ in (iv) is completely sharp: if ${f(z) = z^n}$ and ${z_0}$ is non-zero then ${f(D(z_0,|z_0|))}$ contains the disk

$\displaystyle D(f(z_0), \frac{1}{n} |f'(z_0)| r) = D( z_0^n, |z_0|^n)$

but avoids the origin, thus does not contain any disk of the form ${D( z_0^n, |z_0|^n+\varepsilon)}$. This example also shows that despite parts (ii), (iii) of the theorem, one cannot hope for a general inclusion of the form

$\displaystyle f(D(z_0,r)) \supset D(f(z_0), c |f'(z_0)| r )$

for an absolute constant ${c>0}$.

Part (v) is implicit in the standard proof of Bloch’s theorem (part (ii)), and is easy to establish:

Proof: (Proof of (v)) From the Cauchy inequalities one has ${f''(z) = O(\frac{A}{r} |f'(z_0)|)}$ for ${z \in D(z_0,r/2)}$, hence by Taylor’s theorem with remainder ${f(z) = f(z_0) + f'(z_0) (z-z_0) (1 + O( A \frac{|z-z_0|}{r} ) )}$ for ${z \in D(z_0, r/2)}$. By Rouche’s theorem, this implies that the function ${f(z)-w}$ has a unique zero in ${D(z_0, 2cr/A)}$ for any ${w \in D(f(z_0), cr|f'(z_0)|/A)}$, if ${c>0}$ is a sufficiently small absolute constant. The claim follows. $\Box$

Note that part (v) implies part (i). A standard point picking argument also lets one deduce part (ii) from part (v):

Proof: (Proof of (ii)) By shrinking ${r}$ slightly if necessary we may assume that ${f}$ extends analytically to the closure of the disk ${D(z_0,r)}$. Let ${c}$ be the constant in (v) with ${A=2}$; we will prove (ii) with ${c}$ replaced by ${c/2}$. If we have ${|f'(z)| \leq 2 |f'(z_0)|}$ for all ${z \in D(z_0,r/2)}$ then we are done by (v), so we may assume without loss of generality that there is ${z_1 \in D(z_0,r/2)}$ such that ${|f'(z_1)| > 2 |f'(z_0)|}$. If ${|f'(z)| \leq 2 |f'(z_1)|}$ for all ${z \in D(z_1,r/4)}$ then by (v) we have

$\displaystyle f( D(z_0, r) ) \supset f( D(z_1,r/2) ) \supset D( f(z_1), \frac{c}{2} |f'(z_1)| \frac{r}{2} )$

$\displaystyle \supset D( f(z_1), \frac{c}{2} |f'(z_0)| r )$

and we are again done. Hence we may assume without loss of generality that there is ${z_2 \in D(z_1,r/4)}$ such that ${|f'(z_2)| > 2 |f'(z_1)|}$. Iterating this procedure in the obvious fashion we either are done, or obtain a Cauchy sequence ${z_0, z_1, \dots}$ in ${D(z_0,r)}$ such that ${f'(z_j)}$ goes to infinity as ${j \rightarrow \infty}$, which contradicts the analytic nature of ${f}$ (and hence continuous nature of ${f'}$) on the closure of ${D(z_0,r)}$. This gives the claim. $\Box$

Here is another classical result stated by Alexander (and then proven by Kakeya and by Szegő, but also implied by a classical theorem of Grace and Heawood) that is broadly compatible with parts (iii), (iv) of the above theorem:

Proposition 2 Let ${D(z_0,r)}$ be a disk in the complex plane, and ${f: D(z_0,r) \rightarrow {\bf C}}$ be a polynomial of degree ${n \geq 1}$ with ${f'(z) \neq 0}$ for all ${z \in D(z_0,r)}$. Then ${f}$ is injective on ${D(z_0, r\sin\frac{\pi}{n})}$.

The radius ${\sin \frac{\pi}{n}}$ is best possible, for the polynomial ${f(z) = z^n}$ has ${f'}$ non-vanishing on ${D(1,1)}$, but one has ${f(\cos(\pi/n) e^{i \pi/n}) = f(\cos(\pi/n) e^{-i\pi/n})}$, and ${\cos(\pi/n) e^{i \pi/n}, \cos(\pi/n) e^{-i\pi/n}}$ lie on the boundary of ${D(1,\sin \frac{\pi}{n})}$.
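The sharpness example is easy to confirm numerically. The following small sketch (with the hypothetical helper name `sharpness_points`) checks, for a few values of ${n \geq 3}$, that the two points ${\cos(\pi/n) e^{\pm i\pi/n}}$ are distinct, have the same image under ${z \mapsto z^n}$, and lie at distance ${\sin\frac{\pi}{n}}$ from ${1}$.

```python
import cmath
import math

def sharpness_points(n):
    """Return the two boundary points cos(pi/n) e^{+-i pi/n} from the example."""
    theta = math.pi / n
    z_plus = math.cos(theta) * cmath.exp(1j * theta)
    z_minus = math.cos(theta) * cmath.exp(-1j * theta)
    return z_plus, z_minus

for n in (3, 5, 8):
    zp, zm = sharpness_points(n)
    collision = abs(zp ** n - zm ** n)                        # should be ~0
    boundary_gap = abs(abs(zp - 1) - math.sin(math.pi / n))   # should be ~0
    print(n, collision, boundary_gap)
```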

If one narrows ${\sin \frac{\pi}{n}}$ slightly to ${\sin \frac{\pi}{2n}}$ then one can quickly prove this proposition as follows. Suppose for contradiction that there exist distinct ${z_1, z_2 \in D(z_0, r\sin\frac{\pi}{2n})}$ with ${f(z_1)=f(z_2)}$, thus if we let ${\gamma}$ be the line segment contour from ${z_1}$ to ${z_2}$ then ${\int_\gamma f'(z)\ dz = 0}$. However, by assumption we may factor ${f'(z) = c (z-\zeta_1) \dots (z-\zeta_{n-1})}$ where all the ${\zeta_j}$ lie outside of ${D(z_0,r)}$. Elementary trigonometry then tells us that the argument of ${z-\zeta_j}$ only varies by less than ${\frac{\pi}{n}}$ as ${z}$ traverses ${\gamma}$, hence the argument of ${f'(z)}$ only varies by less than ${\pi}$. Thus ${f'(z)}$ takes values in an open half-plane avoiding the origin and so it is not possible for ${\int_\gamma f'(z)\ dz}$ to vanish.

To recover the best constant of ${\sin \frac{\pi}{n}}$ requires some effort. By taking contrapositives and applying an affine rescaling and some trigonometry, the proposition can be deduced from the following result, known variously as the Grace-Heawood theorem or the complex Rolle theorem.

Proposition 3 (Grace-Heawood theorem) Let ${f: {\bf C} \rightarrow {\bf C}}$ be a polynomial of degree ${n \geq 1}$ such that ${f(1)=f(-1)}$. Then ${f'}$ has a zero in the closure of ${D( 0, \cot \frac{\pi}{n} )}$.

This is in turn implied by a remarkable and powerful theorem of Grace (which we shall prove shortly). Given two polynomials ${f,g}$ of degree at most ${n}$, define the apolar form ${(f,g)_n}$ by

$\displaystyle (f,g)_n := \sum_{k=0}^n (-1)^k f^{(k)}(0) g^{(n-k)}(0). \ \ \ \ \ (1)$

Theorem 4 (Grace’s theorem) Let ${C}$ be a circle or line in ${{\bf C}}$, dividing ${{\bf C} \backslash C}$ into two open connected regions ${\Omega_1, \Omega_2}$. Let ${f,g}$ be two polynomials of degree at most ${n \geq 1}$, with all the zeroes of ${f}$ lying in ${\Omega_1}$ and all the zeroes of ${g}$ lying in ${\Omega_2}$. Then ${(f,g)_n \neq 0}$.

(Contrapositively: if ${(f,g)_n=0}$, then the zeroes of ${f}$ cannot be separated from the zeroes of ${g}$ by a circle or line.)

Indeed, a brief calculation reveals the identity

$\displaystyle f(1) - f(-1) = (f', g)_{n-1}$

where ${g}$ is the degree ${n-1}$ polynomial

$\displaystyle g(z) := \frac{1}{n!} ((z+1)^n - (z-1)^n).$

The zeroes of ${g}$ are ${i \cot \frac{\pi j}{n}}$ for ${j=1,\dots,n-1}$, so the Grace-Heawood theorem follows by applying Grace’s theorem with ${C}$ equal to the boundary of ${D(0, \cot \frac{\pi}{n})}$.
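The "brief calculation" can be checked in exact arithmetic. Below is a minimal sketch, with polynomials stored as coefficient lists (lowest degree first); the helper names `apolar`, `derivative`, `grace_heawood_g` and `evaluate`, and the sample polynomial, are illustrative choices rather than anything canonical.

```python
from fractions import Fraction
from math import comb, factorial

def derivs_at_zero(coeffs, n):
    """Return [f(0), f'(0), ..., f^(n)(0)] for f(z) = sum_k coeffs[k] z^k."""
    padded = list(coeffs) + [0] * (n + 1 - len(coeffs))
    return [factorial(k) * padded[k] for k in range(n + 1)]

def apolar(f, g, n):
    """Apolar form (f, g)_n = sum_k (-1)^k f^(k)(0) g^(n-k)(0)."""
    df, dg = derivs_at_zero(f, n), derivs_at_zero(g, n)
    return sum((-1) ** k * df[k] * dg[n - k] for k in range(n + 1))

def derivative(coeffs):
    """Coefficients of f'."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def grace_heawood_g(n):
    """Coefficients of g(z) = ((z+1)^n - (z-1)^n) / n!, which has degree n-1."""
    return [Fraction(comb(n, k) * (1 - (-1) ** (n - k)), factorial(n))
            for k in range(n)]

def evaluate(coeffs, z):
    return sum(c * z ** k for k, c in enumerate(coeffs))

f = [3, -1, 4, 1, -5, 9]          # an arbitrary degree 5 example
n = len(f) - 1
lhs = evaluate(f, 1) - evaluate(f, -1)
rhs = apolar(derivative(f), grace_heawood_g(n), n - 1)
print(lhs, rhs)                   # the two values should agree
```

Only the odd-degree coefficients of ${f}$ contribute to ${f(1)-f(-1)}$, which matches the fact that ${g^{(n-1-k)}(0)}$ vanishes unless ${k}$ is even.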

The same method of proof gives the following nice consequence:

Theorem 5 (Perpendicular bisector theorem) Let ${f: {\bf C} \rightarrow {\bf C}}$ be a polynomial such that ${f(z_1)=f(z_2)}$ for some distinct ${z_1,z_2}$. Then the zeroes of ${f'}$ cannot all lie on one side of the perpendicular bisector of ${z_1,z_2}$. For instance, if ${f(1)=f(-1)}$, then the zeroes of ${f'}$ cannot all lie in the halfplane ${\{ z: \mathrm{Re} z > 0 \}}$ or the halfplane ${\{ z: \mathrm{Re} z < 0 \}}$.

I’d be interested in seeing a proof of this latter theorem that did not proceed via Grace’s theorem.

Now we give a proof of Grace’s theorem. The case ${n=1}$ can be established by direct computation, so suppose inductively that ${n>1}$ and that the claim has already been established for ${n-1}$. Given the involvement of circles and lines it is natural to suspect that a Möbius transformation symmetry is involved. This is indeed the case and can be made precise as follows. Let ${V_n}$ denote the vector space of polynomials ${f}$ of degree at most ${n}$, then the apolar form is a bilinear form ${(,)_n: V_n \times V_n \rightarrow {\bf C}}$. Each translation ${z \mapsto z+a}$ on the complex plane induces a corresponding map on ${V_n}$, mapping each polynomial ${f}$ to its shift ${\tau_a f(z) := f(z-a)}$. We claim that the apolar form is invariant with respect to these translations:

$\displaystyle ( \tau_a f, \tau_a g )_n = (f,g)_n.$

Taking derivatives in ${a}$, it suffices to establish the skew-adjointness relation

$\displaystyle (f', g)_n + (f,g')_n = 0$

but this is clear from the alternating form of (1).

Next, we see that the inversion map ${z \mapsto 1/z}$ also induces a corresponding map on ${V_n}$, mapping each polynomial ${f \in V_n}$ to its inversion ${\iota f(z) := z^n f(1/z)}$. From (1) we see that this map also (projectively) preserves the apolar form:

$\displaystyle (\iota f, \iota g)_n = (-1)^n (f,g)_n.$
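Both symmetries of the apolar form can be verified in exact arithmetic on a small example. The sketch below again uses coefficient lists of length ${n+1}$ (lowest degree first); `shift` and `invert` are hypothetical helper names implementing ${\tau_a}$ and ${\iota}$ respectively.

```python
from fractions import Fraction
from math import comb, factorial

def apolar(f, g):
    """Apolar form (f, g)_n for coefficient lists of length n + 1."""
    n = len(f) - 1
    return sum((-1) ** k * factorial(k) * f[k] * factorial(n - k) * g[n - k]
               for k in range(n + 1))

def shift(f, a):
    """Coefficients of tau_a f(z) = f(z - a), via the binomial theorem."""
    out = [0] * len(f)
    for k, c in enumerate(f):
        for j in range(k + 1):
            out[j] += c * comb(k, j) * (-a) ** (k - j)
    return out

def invert(f):
    """Coefficients of iota f(z) = z^n f(1/z): reverse the coefficient list."""
    return f[::-1]

f = [2, -1, 0, 3]                 # arbitrary elements of V_3
g = [1, 4, -2, 5]
a = Fraction(7, 3)
n = len(f) - 1

print(apolar(shift(f, a), shift(g, a)) == apolar(f, g))
print(apolar(invert(f), invert(g)) == (-1) ** n * apolar(f, g))
```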

More generally, the group of Möbius transformations on the Riemann sphere acts projectively on ${V_n}$, with each Möbius transformation ${T: {\bf C} \rightarrow {\bf C}}$ mapping each ${f \in V_n}$ to ${Tf(z) := g_T(z) f(T^{-1} z)}$, where ${g_T}$ is the unique (up to constants) rational function that makes this a map from ${V_n}$ to ${V_n}$ (its divisor is ${n(T \infty) - n(\infty)}$). Since the Möbius transformations are generated by translations and inversion, we see that the action of Möbius transformations projectively preserves the apolar form; also, we see that this action of ${T}$ on ${V_n}$ moves the zeroes of each ${f \in V_n}$ by ${T}$ (viewing polynomials of degree less than ${n}$ in ${V_n}$ as having zeroes at infinity). In particular, the hypotheses and conclusions of Grace’s theorem are preserved by this Möbius action. We can then apply such a transformation to move one of the zeroes of ${f}$ to infinity (thus making ${f}$ a polynomial of degree ${n-1}$), so that ${C}$ must now be a circle, with the zeroes of ${g}$ inside the circle and the remaining zeroes of ${f}$ outside the circle. But then

$\displaystyle (f,g)_n = (f, g')_{n-1}.$

By the Gauss-Lucas theorem, the zeroes of ${g'}$ are also inside ${C}$. The claim now follows from the induction hypothesis.

Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the minor process applied to a large random Hermitian ${N \times N}$ matrix ${A_N}$, in which one takes the successive upper left ${n \times n}$ minors ${A_n}$ of ${A_N}$ and computes their eigenvalues ${\lambda_1(A_n) \leq \dots \leq \lambda_n(A_n)}$ in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities

$\displaystyle \lambda_i(A_{n+1}) \leq \lambda_i(A_n) \leq \lambda_{i+1}(A_{n+1})$

for ${1 \leq i \leq n < N}$, and are often arranged in a triangular array known as a Gelfand-Tsetlin pattern, as discussed in these previous blog posts.
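For readers who want to see the interlacing in action, here is a short numerical sketch. The matrix size, the random seed and the helper names `minor_spectra` and `interlaces` are arbitrary illustrative choices.

```python
import numpy as np

def minor_spectra(A):
    """Eigenvalues (ascending) of the upper left n x n minors, n = 1, ..., N."""
    N = A.shape[0]
    return [np.linalg.eigvalsh(A[:n, :n]) for n in range(1, N + 1)]

def interlaces(spectra, tol=1e-10):
    """Check lambda_i(A_{n+1}) <= lambda_i(A_n) <= lambda_{i+1}(A_{n+1})."""
    for lam_n, lam_next in zip(spectra, spectra[1:]):
        lower_ok = np.all(lam_next[:-1] <= lam_n + tol)
        upper_ok = np.all(lam_n <= lam_next[1:] + tol)
        if not (lower_ok and upper_ok):
            return False
    return True

rng = np.random.default_rng(0)
N = 6
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (X + X.conj().T) / 2          # a random Hermitian matrix
print(interlaces(minor_spectra(A)))
```

Listing `minor_spectra(A)` row by row gives exactly the triangular array of a Gelfand-Tsetlin pattern.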

When ${N}$ is large and the matrix ${A_N}$ is a random matrix with empirical spectral distribution converging to some compactly supported probability measure ${\mu}$ on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble ${A_N}$), a “concentration of measure” effect occurs, with the spectral distribution of the minors ${A_n}$ for ${n = \lfloor N/k\rfloor}$ for any fixed ${k \geq 1}$ converging to a specific measure ${k^{-1}_* \mu^{\boxplus k}}$ that depends only on ${\mu}$ and ${k}$. The reason for this notation is that there is a surprising description of this measure ${k^{-1}_* \mu^{\boxplus k}}$ when ${k}$ is a natural number, namely it is the free convolution ${\mu^{\boxplus k}}$ of ${k}$ copies of ${\mu}$, pushed forward by the dilation map ${x \mapsto k^{-1} x}$. For instance, if ${\mu}$ is the Wigner semicircular measure ${d\mu_{sc} = \frac{1}{2\pi} (4-x^2)^{1/2}_+\ dx}$, then ${k^{-1}_* \mu_{sc}^{\boxplus k} = k^{-1/2}_* \mu_{sc}}$. At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).
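The semicircular example can be illustrated with a quick simulation. The sketch below is a rough check under stated assumptions, not a proof: it samples a GUE-type matrix normalised so that its spectrum is approximately semicircular with unit variance, and compares the second moment of the spectrum of an ${\lfloor N/k \rfloor \times \lfloor N/k \rfloor}$ minor with the value ${1/k}$ predicted by ${k^{-1/2}_* \mu_{sc}}$. The names `gue` and `minor_second_moment` and the parameters are illustrative assumptions.

```python
import numpy as np

def gue(N, rng):
    """Sample an N x N Hermitian matrix with E|H_ij|^2 = 1 for all i, j."""
    X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (X + X.conj().T) / 2

def minor_second_moment(N, k, seed=0):
    """Second moment of the spectrum of the upper left floor(N/k) minor of H / sqrt(N)."""
    rng = np.random.default_rng(seed)
    A = gue(N, rng) / np.sqrt(N)   # spectrum approximately semicircular on [-2, 2]
    n = N // k
    eigs = np.linalg.eigvalsh(A[:n, :n])
    return float(np.mean(eigs ** 2))

# The prediction k^{-1/2}_* mu_sc has variance 1/k.
print(minor_second_moment(N=400, k=4), 1 / 4)
```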

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power ${\mu^{\boxplus k}}$ of ${\mu}$ can be extended to non-integer ${k \geq 1}$, thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

$\displaystyle G_\mu(z) := \int_{\bf R} \frac{d\mu(x)}{z-x}$

of the measure ${\mu}$, and ${\mu^{\boxplus k}}$ can be defined by solving the Burgers-type equation

$\displaystyle (k \partial_k + z \partial_z) G_{\mu^{\boxplus k}}(z) = \frac{\partial_z G_{\mu^{\boxplus k}}(z)}{G_{\mu^{\boxplus k}}(z)} \ \ \ \ \ (1)$

with initial condition ${G_{\mu^{\boxplus 1}} = G_\mu}$ (see this previous blog post for a derivation). This equation can be solved explicitly using the ${R}$-transform ${R_\mu}$ of ${\mu}$, defined by solving the equation

$\displaystyle \frac{1}{G_\mu(z)} + R_\mu(G_\mu(z)) = z$

for sufficiently large ${z}$, in which case one can show that

$\displaystyle R_{\mu^{\boxplus k}}(z) = k R_\mu(z).$

(In the case of the semicircular measure ${\mu_{sc}}$, the ${R}$-transform is simply the identity: ${R_{\mu_{sc}}(z)=z}$.)
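In the semicircular case everything is explicit, which gives a convenient test of the formulas above. Since ${R_{\mu_{sc}^{\boxplus k}}(z) = kz}$, the defining equation becomes ${\frac{1}{G} + kG = z}$, whose solution decaying at infinity is ${G(z) = \frac{z - \sqrt{z^2-4k}}{2k}}$. The sketch below checks numerically (by finite differences, with an arbitrarily chosen step size) that this ${G}$ obeys the Burgers-type equation (1); the function names are hypothetical.

```python
import cmath

def cauchy_semicircle(z, k):
    """Solution of 1/G + k G = z that decays like 1/z, for z off [-2 sqrt(k), 2 sqrt(k)]."""
    edge = 2 * k ** 0.5
    root = cmath.sqrt(z - edge) * cmath.sqrt(z + edge)   # branch with root ~ z at infinity
    return (z - root) / (2 * k)

def r_transform_residual(z, k):
    """Residual of 1/G + R(G) = z with R(w) = k w."""
    G = cauchy_semicircle(z, k)
    return 1 / G + k * G - z

def burgers_residual(z, k, h=1e-5):
    """Residual of (k d/dk + z d/dz) G = (dG/dz) / G, using central differences."""
    G = cauchy_semicircle(z, k)
    dG_dk = (cauchy_semicircle(z, k + h) - cauchy_semicircle(z, k - h)) / (2 * h)
    dG_dz = (cauchy_semicircle(z + h, k) - cauchy_semicircle(z - h, k)) / (2 * h)
    return (k * dG_dk + z * dG_dz) - dG_dz / G

z, k = 3 + 1j, 1.5
print(abs(r_transform_residual(z, k)), abs(burgers_residual(z, k)))
```

Both residuals should be zero up to rounding and discretisation error.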

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if ${A}$ is a noncommutative random variable in a noncommutative probability space ${({\mathcal A},\tau)}$ with distribution ${\mu}$, and ${p}$ is a real projection operator free of ${A}$ with trace ${1/k}$, then the “minor” ${[pAp]}$ of ${A}$ (viewed as an element of a new noncommutative probability space ${({\mathcal A}_p, \tau_p)}$ whose elements are minors ${[pXp]}$, ${X \in {\mathcal A}}$ with trace ${\tau_p([pXp]) := k \tau(pXp)}$) has the law of ${k^{-1}_* \mu^{\boxplus k}}$ (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers ${\mu^{\boxplus k}}$ is monotonicity of the free entropy

$\displaystyle \chi(\mu) = \int_{\bf R} \int_{\bf R} \log|s-t|\ d\mu(s) d\mu(t) + \frac{3}{4} + \frac{1}{2} \log 2\pi$

and free Fisher information

$\displaystyle \Phi(\mu) = \frac{4\pi^2}{3} \int_{\bf R} \left(\frac{d\mu}{dx}\right)^3\ dx$

which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of the free Fisher information as presented in Voiculescu’s paper.) Namely, it was shown by Shlyakhtenko that the quantity ${\chi(k^{-1/2}_* \mu^{\boxplus k})}$ is monotone non-decreasing for integer ${k}$, and the Fisher information ${\Phi(k^{-1/2}_* \mu^{\boxplus k})}$ is monotone non-increasing for integer ${k}$. This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that were established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shlyakhtenko to fractional ${k \geq 1}$. We give two proofs of this fact, one using free probability machinery, and a more self-contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the free score ${J(X)}$ of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

$\displaystyle \frac{d}{d\varepsilon} \tau( Z P( X + \varepsilon Z) )|_{\varepsilon=0} = \tau( J(X) P(X) )$

whenever ${P}$ is a polynomial and ${Z}$ is a semicircular element free of ${X}$. If ${X}$ has an absolutely continuous law ${\mu = f\ dx}$ for a sufficiently regular ${f}$, one can calculate ${J(X)}$ explicitly as ${J(X) = 2\pi Hf(X)}$, where ${Hf}$ is the Hilbert transform of ${f}$, and the Fisher information is given by the formula

$\displaystyle \Phi(X) = \tau( J(X)^2 ).$

One can also define a notion of relative free score ${J(X:B)}$ relative to some subalgebra ${B}$ of noncommutative random variables.

The free score interacts very well with the free minor process ${X \mapsto [pXp]}$, in particular by standard calculations one can establish the identity

$\displaystyle J( [pXp] : [pBp] ) = k {\bf E}( [p J(X:B) p] | [pXp], [pBp] )$

whenever ${X}$ is a noncommutative random variable, ${B}$ is an algebra of noncommutative random variables, and ${p}$ is a real projection of trace ${1/k}$ that is free of both ${X}$ and ${B}$. The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on ${L^2}$). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when ${\mu}$ is a semicircular measure up to affine rescaling.

After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in ${k}$ and using the differential equation (1). It turns out that if ${d\mu = f\ dx}$ for sufficiently regular ${f}$, then there is an identity

$\displaystyle \partial_k \Phi( k^{-1/2}_* \mu^{\boxplus k} ) = -\frac{1}{2\pi^2} \lim_{\varepsilon \rightarrow 0} \sum_{\alpha,\beta = \pm} \int_{\bf R} \int_{\bf R} f(x) f(y) K(x+i\alpha \varepsilon, y+i\beta \varepsilon)\ dx dy \ \ \ \ \ (2)$

where ${K}$ is the kernel

$\displaystyle K(z,w) := \frac{1}{G(z) G(w)} (\frac{G(z)-G(w)}{z-w} + G(z) G(w))^2$

and ${G(z) := G_\mu(z)}$. It is not difficult to show that ${K(z,\overline{w})}$ is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel ${K}$ and the identity (2).

These monotonicity properties hint at the minor process ${A \mapsto [pAp]}$ being associated to some sort of “gradient flow” in the ${k}$ parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space ${({\mathcal A}_p, \tau_p)}$ even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity ${\lambda(s,y)}$ on the “Gelfand-Tsetlin pyramid”

$\displaystyle \Delta = \{ (s,y): 0 < s < 1; 0 < y < s \}$

by the formula

$\displaystyle \mu^{\boxplus 1/s}((-\infty,\lambda(s,y)/s])=y/s,$

which is well defined if the density of ${\mu}$ is sufficiently well behaved. The random matrix interpretation of ${\lambda(s,y)}$ is that it is the asymptotic location of the ${\lfloor yN\rfloor^{th}}$ eigenvalue of the ${\lfloor sN \rfloor \times \lfloor sN \rfloor}$ upper left minor of a random ${N \times N}$ matrix ${A_N}$ with asymptotic empirical spectral distribution ${\mu}$ and with unitarily invariant distribution, thus ${\lambda}$ is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become

$\displaystyle 0 \leq \partial_s \lambda \leq \partial_y \lambda.$

After a lengthy calculation (involving extensive use of the chain rule and product rule), the equation (1) is equivalent to the Euler-Lagrange equation

$\displaystyle \partial_s L_{\lambda_s}(\partial_s \lambda, \partial_y \lambda) + \partial_y L_{\lambda_y}(\partial_s \lambda, \partial_y \lambda) = 0$

where ${L}$ is the Lagrangian density

$\displaystyle L(\lambda_s, \lambda_y) := \log \lambda_y + \log \sin( \pi \frac{\lambda_s}{\lambda_y} ).$

Thus the minor process is formally a critical point of the integral ${\int_\Delta L(\partial_s \lambda, \partial_y \lambda)\ ds dy}$. The quantity ${\partial_y \lambda}$ measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio ${\frac{\partial_s \lambda}{\partial_y \lambda}}$ measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density ${L}$ somehow measures the entropy density of this process.

Kari Astala, Steffen Rohde, Eero Saksman and I have (finally!) uploaded to the arXiv our preprint “Homogenization of iterated singular integrals with applications to random quasiconformal maps“. This project started (and was largely completed) over a decade ago, but for various reasons it was not finalised until very recently. The motivation for this project was to study the behaviour of “random” quasiconformal maps. Recall that a (smooth) quasiconformal map is a homeomorphism ${f: {\bf C} \rightarrow {\bf C}}$ that obeys the Beltrami equation

$\displaystyle \frac{\partial f}{\partial \overline{z}} = \mu \frac{\partial f}{\partial z}$

for some Beltrami coefficient ${\mu: {\bf C} \rightarrow D(0,1)}$; this can be viewed as a deformation of the Cauchy-Riemann equation ${\frac{\partial f}{\partial \overline{z}} = 0}$. Assuming that ${f(z)}$ is asymptotic to ${z}$ at infinity, one can (formally, at least) solve for ${f}$ in terms of ${\mu}$ using the Beurling transform

$\displaystyle Tf(z) := \frac{\partial}{\partial z}\left(\frac{\partial}{\partial \overline{z}}\right)^{-1} f(z) = -\frac{1}{\pi} p.v. \int_{\bf C} \frac{f(w)}{(w-z)^2}\ dw$

by the Neumann series

$\displaystyle \frac{\partial f}{\partial \overline{z}} = \mu + \mu T \mu + \mu T \mu T \mu + \dots.$

We looked at the question of the asymptotic behaviour of ${f}$ if ${\mu = \mu_\delta}$ is a random field that oscillates at some fine spatial scale ${\delta>0}$. A simple model to keep in mind is

$\displaystyle \mu_\delta(z) = \varphi(z) \sum_{n \in {\bf Z}^2} \epsilon_n 1_{n\delta + [0,\delta]^2}(z) \ \ \ \ \ (1)$

where ${\epsilon_n = \pm 1}$ are independent random signs and ${\varphi: {\bf C} \rightarrow D(0,1)}$ is a bump function. For models such as these, we show that a homogenisation occurs in the limit ${\delta \rightarrow 0}$; each multilinear expression

$\displaystyle \mu_\delta T \mu_\delta \dots T \mu_\delta \ \ \ \ \ (2)$

converges weakly in probability (and almost surely, if we restrict ${\delta}$ to a lacunary sequence) to a deterministic limit, and the associated quasiconformal map ${f = f_\delta}$ similarly converges weakly in probability (or almost surely). (Results of this latter type were also recently obtained by Ivrii and Markovic by a more geometric method which is simpler, but is applied to a narrower class of Beltrami coefficients.) In the specific case (1), the limiting quasiconformal map is just the identity map ${f(z)=z}$, but if for instance one replaces the ${\epsilon_n}$ by non-symmetric random variables then one can have significantly more complicated limits. The convergence theorem for multilinear expressions such as (2) is not specific to the Beurling transform ${T}$; any other translation and dilation invariant singular integral can be used here.

The random expression (2) is somewhat reminiscent of a moment of a random matrix, and one can start computing it analogously. For instance, if one has a decomposition ${\mu_\delta = \sum_{n \in {\bf Z}^2} \mu_{\delta,n}}$ such as (1), then (2) expands out as a sum

$\displaystyle \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mu_{\delta,n_1} T \mu_{\delta,n_2} \dots T \mu_{\delta,n_k}$

The random fluctuations of this sum can be treated by a routine second moment estimate, and the main task is to show that the expected value

$\displaystyle \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} \dots T \mu_{\delta,n_k}) \ \ \ \ \ (3)$

becomes asymptotically independent of ${\delta}$.

If all the ${n_1,\dots,n_k}$ were distinct then one could use independence to factor the expectation to get

$\displaystyle \sum_{n_1,\dots,n_k \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1}) T \mathop{\bf E}(\mu_{\delta,n_2}) \dots T \mathop{\bf E}(\mu_{\delta,n_k})$

which is a relatively straightforward expression to calculate (particularly in the model (1), where all the expectations here in fact vanish). The main difficulty is that there are a number of configurations in (3) in which various of the ${n_j}$ collide with each other, preventing one from easily factoring the expression. A typical problematic contribution for instance would be a sum of the form

$\displaystyle \sum_{n_1,n_2 \in {\bf Z}^2: n_1 \neq n_2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_1} T \mu_{\delta,n_2}). \ \ \ \ \ (4)$

This is an example of what we call a non-split sum. This can be compared with the split sum

$\displaystyle \sum_{n_1,n_2 \in {\bf Z}^2: n_1 \neq n_2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_2}). \ \ \ \ \ (5)$

If we ignore the constraint ${n_1 \neq n_2}$ in the latter sum, then it splits into

$\displaystyle f_\delta T g_\delta$

where

$\displaystyle f_\delta := \sum_{n_1 \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_1})$

and

$\displaystyle g_\delta := \sum_{n_2 \in {\bf Z}^2} \mathop{\bf E}(\mu_{\delta,n_2} T \mu_{\delta,n_2})$

and one can hope to treat this sum by an induction hypothesis. (To actually deal with constraints such as ${n_1 \neq n_2}$ requires an inclusion-exclusion argument that creates some notational headaches but is ultimately manageable.) As the name suggests, the non-split configurations such as (4) cannot be factored in this fashion, and are the most difficult to handle. A direct computation using the triangle inequality (and a certain amount of combinatorics and induction) reveals that these sums are somewhat localised, in that dyadic portions such as

$\displaystyle \sum_{n_1,n_2 \in {\bf Z}^2: |n_1 - n_2| \sim R} \mathop{\bf E}(\mu_{\delta,n_1} T \mu_{\delta,n_2} T \mu_{\delta,n_1} T \mu_{\delta,n_2})$

exhibit power decay in ${R}$ (when measured in suitable function space norms), basically because of the large number of times one has to transition back and forth between ${n_1}$ and ${n_2}$. Thus, morally at least, the dominant contribution to a non-split sum such as (4) comes from the local portion when ${n_2=n_1+O(1)}$. From the translation and dilation invariance of ${T}$ this type of expression then simplifies to something like

$\displaystyle \varphi(z)^4 \sum_{n \in {\bf Z}^2} \eta( \frac{n-z}{\delta} )$

(plus negligible errors) for some reasonably decaying function ${\eta}$, and this can be shown to converge to a weak limit as ${\delta \rightarrow 0}$.

In principle all of these limits are computable, but the combinatorics is remarkably complicated, and while there is certainly some algebraic structure to the calculations, it does not seem to be easily describable in terms of an existing framework (e.g., that of free probability).

A useful rule of thumb in complex analysis is that holomorphic functions ${f(z)}$ behave like large degree polynomials ${P(z)}$. This can be evidenced for instance at a “local” level by the Taylor series expansion for a complex analytic function in the disk, or at a “global” level by factorisation theorems such as the Weierstrass factorisation theorem (or the closely related Hadamard factorisation theorem). One can truncate these theorems in a variety of ways (e.g., Taylor’s theorem with remainder) to be able to approximate a holomorphic function by a polynomial on various domains.

In some cases it can be convenient instead to work with polynomials ${P(Z)}$ of another variable ${Z}$ such as ${Z = e^{2\pi i z}}$ (or more generally ${Z=e^{2\pi i z/N}}$ for a scaling parameter ${N}$). In the case of the Riemann zeta function, defined by meromorphic continuation of the formula

$\displaystyle \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)$

one ends up having the following heuristic approximation in the neighbourhood of a point ${\frac{1}{2}+it}$ on the critical line:

Heuristic 1 (Polynomial approximation) Let ${T \ggg 1}$ be a height, let ${t}$ be a “typical” element of ${[T,2T]}$, and let ${1 \lll N \ll \log T}$ be an integer. Let ${\phi_t = \phi_{t,T}: {\bf C} \rightarrow {\bf C}}$ be the linear change of variables

$\displaystyle \phi_t(z) := \frac{1}{2} + it - \frac{2\pi i z}{\log T}.$

Then one has an approximation

$\displaystyle \zeta( \phi_t(z) ) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (2)$

for ${z = o(N)}$ and some polynomial ${P_t = P_{t,T}}$ of degree ${N}$.

The requirement ${z=o(N)}$ is necessary since the right-hand side is periodic with period ${N}$ in the ${z}$ variable (or period ${\frac{2\pi i N}{\log T}}$ in the ${s = \phi_t(z)}$ variable), whereas the zeta function is not expected to have any such periodicity, even approximately.

Let us give two non-rigorous justifications of this heuristic. Firstly, it is standard that inside the critical strip (with ${\mathrm{Im}(s) = O(T)}$) we have an approximate form

$\displaystyle \zeta(s) \approx \sum_{n \leq T} \frac{1}{n^s}$

of (11). If we group the integers ${n}$ from ${1}$ to ${T}$ into ${N}$ bins depending on what powers of ${T^{1/N}}$ they lie between, we thus have

$\displaystyle \zeta(s) \approx \sum_{j=0}^N \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^s}$

For ${s = \phi_t(z)}$ with ${z = o(N)}$ and ${T^{j/N} \leq n < T^{(j+1)/N}}$ we heuristically have

$\displaystyle \frac{1}{n^s} \approx \frac{1}{n^{\frac{1}{2}+it}} e^{2\pi i j z / N}$

and so

$\displaystyle \zeta(s) \approx \sum_{j=0}^N a_j(t) (e^{2\pi i z/N})^j$

where ${a_j(t)}$ are the partial Dirichlet series

$\displaystyle a_j(t) \approx \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^{\frac{1}{2}+it}}. \ \ \ \ \ (3)$

This gives the desired polynomial approximation.
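The key step in this first justification is the replacement of ${n^{-s}}$ by ${n^{-\frac{1}{2}-it} e^{2\pi i j z/N}}$ on the ${j^{th}}$ bin. Writing ${s = \phi_t(z)}$, one has exactly ${n^{-s} = n^{-\frac{1}{2}-it} e^{2\pi i z \log n / \log T}}$, so for real ${z}$ the error in the phase factor is at most ${2\pi |z|/N}$, which is small when ${z = o(N)}$. The following sketch checks this bound numerically; the values of ${T}$, ${N}$, ${z}$ are toy choices for illustration only and are not in any asymptotic regime.

```python
import cmath
import math

def exact_factor(n, z, T):
    """The exact ratio n^{-phi_t(z)} / n^{-(1/2 + it)} = exp(2 pi i z log n / log T)."""
    return cmath.exp(2j * math.pi * z * math.log(n) / math.log(T))

def binned_factor(n, z, T, N):
    """The approximation exp(2 pi i j z / N), where T^{j/N} <= n < T^{(j+1)/N}."""
    j = math.floor(N * math.log(n) / math.log(T))
    return cmath.exp(2j * math.pi * j * z / N)

T, N, z = 10 ** 6, 10, 0.7
worst = max(abs(exact_factor(n, z, T) - binned_factor(n, z, T, N))
            for n in range(1, 2000))
bound = 2 * math.pi * abs(z) / N
print(worst, bound)               # worst should not exceed bound
```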

A second non-rigorous justification is as follows. From factorisation theorems such as the Hadamard factorisation theorem we expect to have

$\displaystyle \zeta(s) \propto \prod_\rho (s-\rho) \times \dots$

where ${\rho}$ runs over the non-trivial zeroes of ${\zeta}$, and there are some additional factors arising from the trivial zeroes and poles of ${\zeta}$ which we will ignore here; we will also completely ignore the issue of how to renormalise the product to make it converge properly. In the region ${s = \frac{1}{2} + it + o( N / \log T) = \phi_t( \{ z: z = o(N) \})}$, the dominant contribution to this product (besides multiplicative constants) should arise from zeroes ${\rho}$ that are also in this region. The Riemann-von Mangoldt formula suggests that for “typical” ${t}$ one should have about ${N}$ such zeroes. If one lets ${\rho_1,\dots,\rho_N}$ be any enumeration of ${N}$ zeroes closest to ${\frac{1}{2}+it}$, and then repeats this set of zeroes periodically by period ${\frac{2\pi i N}{\log T}}$, one then expects to have an approximation of the form

$\displaystyle \zeta(s) \propto \prod_{j=1}^N \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) )$

again ignoring all issues of convergence. If one writes ${s = \phi_t(z)}$ and ${\rho_j = \phi_t(\lambda_j)}$, then Euler’s famous product formula for sine basically gives

$\displaystyle \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) ) \propto \prod_{k \in {\bf Z}} (z - (\lambda_j+2\pi k N) )$

$\displaystyle \propto (e^{2\pi i z/N} - e^{2\pi i \lambda_j/N})$

(here we are glossing over some technical issues regarding renormalisation of the infinite products, which can be dealt with by studying the asymptotics as ${\mathrm{Im}(z) \rightarrow \infty}$) and hence we expect

$\displaystyle \zeta(s) \propto \prod_{j=1}^N (e^{2\pi i z/N} - e^{2\pi i \lambda_j/N}).$

This again gives the desired polynomial approximation.

Below the fold we give a rigorous version of the second argument suitable for “microscale” analysis. More precisely, we will show

Theorem 2 Let ${N = N(T)}$ be an integer going sufficiently slowly to infinity. Let ${W_0 \ll N}$ go to infinity sufficiently slowly depending on ${N}$. Let ${t}$ be drawn uniformly at random from ${[T,2T]}$. Then with probability ${1-o(1)}$ (in the limit ${T \rightarrow \infty}$), and possibly after adjusting ${N}$ by ${1}$, there exists a polynomial ${P_t(Z)}$ of degree ${N}$ and obeying the functional equation (9) below, such that

$\displaystyle \zeta( \phi_t(z) ) = (1+o(1)) P_t( e^{2\pi i z/N} ) \ \ \ \ \ (4)$

whenever ${|z| \leq W_0}$.

It should be possible to refine the arguments to extend this theorem to the mesoscale setting by letting ${N}$ be anything growing like ${o(\log T)}$, and ${W_0}$ anything growing like ${o(N)}$; also we should be able to delete the need to adjust ${N}$ by ${1}$. We have not attempted these optimisations here.

Many conjectures and arguments involving the Riemann zeta function can be heuristically translated into arguments involving the polynomials ${P_t(Z)}$, which one can view as random degree ${N}$ polynomials if ${t}$ is interpreted as a random variable drawn uniformly at random from ${[T,2T]}$. These can be viewed as providing a “toy model” for the theory of the Riemann zeta function, in which the complex analysis is simplified to the study of the zeroes and coefficients of this random polynomial (for instance, the role of the gamma function is now played by a monomial in ${Z}$). This model also makes the zeta function theory more closely resemble the function field analogues of this theory (in which the analogue of the zeta function is also a polynomial (or a rational function) in some variable ${Z}$, as per the Weil conjectures). The parameter ${N}$ is at our disposal to choose, and reflects the scale ${\approx N/\log T}$ at which one wishes to study the zeta function. For “macroscopic” questions, at which one wishes to understand the zeta function at unit scales, it is natural to take ${N \approx \log T}$ (or very slightly larger), while for “microscopic” questions one would take ${N}$ close to ${1}$ and only growing very slowly with ${T}$. For the intermediate “mesoscopic” scales one would take ${N}$ somewhere between ${1}$ and ${\log T}$. Unfortunately, the statistical properties of ${P_t}$ are only understood well at a conjectural level at present; even if one assumes the Riemann hypothesis, our understanding of ${P_t}$ is largely restricted to the computation of low moments (e.g., the second or fourth moments) of various linear statistics of ${P_t}$ and related functions (e.g., ${1/P_t}$, ${P'_t/P_t}$, or ${\log P_t}$).

Let’s now heuristically explore the polynomial analogues of this theory in a bit more detail. The Riemann hypothesis basically corresponds to the assertion that all the ${N}$ zeroes of the polynomial ${P_t(Z)}$ lie on the unit circle ${|Z|=1}$ (which, after the change of variables ${Z = e^{2\pi i z/N}}$, corresponds to ${z}$ being real); in a similar vein, the GUE hypothesis corresponds to ${P_t(Z)}$ having the asymptotic law of a random scalar ${a_N(t)}$ times the characteristic polynomial of a random unitary ${N \times N}$ matrix. Next, we consider what happens to the functional equation

$\displaystyle \zeta(s) = \chi(s) \zeta(1-s) \ \ \ \ \ (5)$

where

$\displaystyle \chi(s) := 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s).$

A routine calculation involving Stirling’s formula reveals that

$\displaystyle \chi(\frac{1}{2}+it) = (1+o(1)) e^{-2\pi i L(t)} \ \ \ \ \ (6)$

with ${L(t) := \frac{t}{2\pi} \log \frac{t}{2\pi} - \frac{t}{2\pi} + \frac{7}{8}}$; one also has the closely related approximation

$\displaystyle \frac{\chi'}{\chi}(s) = -\log T + O(1) \ \ \ \ \ (7)$

and hence

$\displaystyle \chi(\phi_t(z)) = (1+o(1)) e^{-2\pi i L(t)} e^{2\pi i z} \ \ \ \ \ (8)$

when ${z = o(\log T)}$. Since ${\zeta(1-s) = \overline{\zeta(\overline{1-s})}}$, applying (5) with ${s = \phi_t(z)}$ and using the approximation (2) suggests a functional equation for ${P_t}$:

$\displaystyle P_t(e^{2\pi i z/N}) = e^{-2\pi i L(t)} e^{2\pi i z} \overline{P_t(e^{2\pi i \overline{z}/N})}$

or in terms of ${Z := e^{2\pi i z/N}}$,

$\displaystyle P_t(Z) = e^{-2\pi i L(t)} Z^N \overline{P_t}(1/Z) \ \ \ \ \ (9)$

where ${\overline{P_t}(Z) := \overline{P_t(\overline{Z})}}$ is the polynomial ${P_t}$ with all the coefficients replaced by their complex conjugate. Thus if we write

$\displaystyle P_t(Z) = \sum_{j=0}^N a_j(t) Z^j$

then the functional equation can be written as

$\displaystyle a_j(t) = e^{-2\pi i L(t)} \overline{a_{N-j}(t)}.$

We remark that if we use the heuristic (3) (interpreting the cutoffs in the ${n}$ summation in a suitably vague fashion) then this equation can be viewed as an instance of the Poisson summation formula.

Another consequence of the functional equation is that the zeroes of ${P_t}$ are symmetric with respect to inversion ${Z \mapsto 1/\overline{Z}}$ across the unit circle. This is of course consistent with the Riemann hypothesis, but does not obviously imply it. The phase ${L(t)}$ is of little consequence in this functional equation; one could easily conceal it by working with the phase rotation ${e^{\pi i L(t)} P_t}$ of ${P_t}$ instead.
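As a quick numerical sanity check of the passage from the coefficient relation to the functional equation (9), here is a minimal sketch. It assumes an arbitrary degree ${N=6}$ and an arbitrary phase standing in for ${e^{-2\pi i L(t)}}$; the helper names `eval_poly` and `self_inversive_coeffs` are purely illustrative. It draws random coefficients obeying ${a_j = e^{-2\pi i L(t)} \overline{a_{N-j}}}$ and compares both sides of (9) at random points ${Z}$.

```python
import cmath
import math
import random

def eval_poly(coeffs, Z):
    """Evaluate sum_j coeffs[j] * Z**j by Horner's rule."""
    acc = 0j
    for a in reversed(coeffs):
        acc = acc * Z + a
    return acc

def self_inversive_coeffs(N, phase, rng):
    """Random a_0..a_N obeying a_j = phase * conj(a_{N-j}), where |phase| = 1."""
    root = cmath.sqrt(phase)
    coeffs = [0j] * (N + 1)
    for j in range(N // 2 + 1):
        if 2 * j == N:
            # the middle coefficient is forced to be a real multiple of sqrt(phase)
            coeffs[j] = root * rng.gauss(0.0, 1.0)
        else:
            a = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            coeffs[j] = a
            coeffs[N - j] = phase * a.conjugate()
    return coeffs

rng = random.Random(0)
N = 6
phase = cmath.exp(-2j * math.pi * 0.3)  # stands in for e^{-2 pi i L(t)}; 0.3 is arbitrary
coeffs = self_inversive_coeffs(N, phase, rng)

# Compare P(Z) with phase * Z^N * conj(P(1/conj(Z))) at random points of the annulus 1/2 <= |Z| <= 2.
max_error = 0.0
for _ in range(20):
    Z = cmath.rect(rng.uniform(0.5, 2.0), rng.uniform(0.0, 2 * math.pi))
    lhs = eval_poly(coeffs, Z)
    rhs = phase * Z ** N * eval_poly(coeffs, 1 / Z.conjugate()).conjugate()
    max_error = max(max_error, abs(lhs - rhs) / (1 + abs(lhs)))
print(max_error)
```

Since the two sides agree identically, ${P_t(Z)}$ vanishes exactly when ${P_t(1/\overline{Z})}$ does, which is the inversion symmetry of the zeroes.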

A further consequence of the functional equation is that ${e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})}$ is real for any ${\theta \in {\bf R}}$; the same is then true for its derivative ${e^{\pi i L(t)} e^{-i N \theta/2} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta}))}$ with respect to ${\theta}$. Among other things, this implies that ${P'_t(e^{i\theta})}$ cannot vanish unless ${P_t(e^{i\theta})}$ does also; thus the zeroes of ${P'_t}$ will not lie on the unit circle except where ${P_t}$ has repeated zeroes. The analogous statement is true for ${\zeta}$; the zeroes of ${\zeta'}$ will not lie on the critical line except where ${\zeta}$ has repeated zeroes.

Relating to this fact, it is a classical result of Speiser that the Riemann hypothesis is true if and only if all the zeroes of the derivative ${\zeta'}$ of the zeta function in the critical strip lie on or to the right of the critical line. The analogous result for polynomials is

Proposition 3 We have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P'_t(Z) = 0 \}$

(where all zeroes are counted with multiplicity.) In particular, the zeroes of ${P_t(Z)}$ all lie on the unit circle if and only if the zeroes of ${P'_t(Z)}$ lie in the closed unit disk.

Proof: From the functional equation we have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P_t(Z) = 0 \}.$

Thus it will suffice to show that ${P_t}$ and ${P'_t}$ have the same number of zeroes outside the closed unit disk.

Set ${f(z) := z \frac{P'_t(z)}{P_t(z)}}$; then ${f}$ is a rational function that does not have a zero or pole at infinity. For ${e^{i\theta}}$ not a zero of ${P_t}$, we have already seen that ${e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})}$ and ${e^{\pi i L(t)} e^{-i N \theta/2} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta}))}$ are real, so on dividing we see that ${i f(e^{i\theta}) - \frac{iN}{2}}$ is always real, that is to say

$\displaystyle \mathrm{Re} f(e^{i\theta}) = \frac{N}{2}.$

(This can also be seen by writing ${f(e^{i\theta}) = \sum_\lambda \frac{1}{1-e^{-i\theta} \lambda}}$, where ${\lambda}$ runs over the zeroes of ${P_t}$, and using the fact that these zeroes are symmetric with respect to reflection across the unit circle.) When ${e^{i\theta}}$ is a zero of ${P_t}$, ${f(z)}$ has a simple pole at ${e^{i\theta}}$ with residue a positive multiple of ${e^{i\theta}}$, and so ${f(z)}$ stays on the right half-plane if one traverses a semicircular arc around ${e^{i\theta}}$ outside the unit disk. From this and continuity we see that ${f}$ stays on the right-half plane in a circle slightly larger than the unit circle, and hence by the argument principle it has the same number of zeroes and poles outside of this circle, giving the claim. $\Box$
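Proposition 3 can be tested on a small example. The sketch below takes the hypothetical polynomial ${P(Z) = (Z-2)(Z-\frac{1}{2})(Z-1)(Z+1)}$, which obeys a functional equation of the form (9) with phase ${-1}$ and has exactly two zeroes on the unit circle. It counts the zeroes of ${P'}$ inside the unit disk with the argument principle, by measuring how often the image of the unit circle winds around the origin. The function names are illustrative, and the count assumes that ${P'}$ has no zero on the circle itself.

```python
import cmath
import math

def eval_poly(coeffs, Z):
    """Evaluate sum_j coeffs[j] * Z**j by Horner's rule."""
    acc = 0j
    for a in reversed(coeffs):
        acc = acc * Z + a
    return acc

def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [j * a for j, a in enumerate(coeffs)][1:]

def zeros_inside_unit_circle(coeffs, samples=4000):
    """Winding number of the image of the unit circle around 0 (argument principle)."""
    total = 0.0
    prev = eval_poly(coeffs, 1.0 + 0j)
    for k in range(1, samples + 1):
        cur = eval_poly(coeffs, cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(cur / prev)  # small phase increment between neighbouring samples
        prev = cur
    return round(total / (2 * math.pi))

# P(Z) = (Z - 2)(Z - 1/2)(Z - 1)(Z + 1) = Z^4 - 2.5 Z^3 + 2.5 Z - 1, lowest degree first
P = [-1.0, 2.5, 0.0, -2.5, 1.0]
N = len(P) - 1
dP = derivative(P)

zeros_inside = zeros_inside_unit_circle(dP)
zeros_outside = (len(dP) - 1) - zeros_inside  # P' has degree N - 1 and no zero on the circle here
predicted_on_circle = N - 2 * zeros_outside
print(zeros_outside, predicted_on_circle)
```

Here ${P'}$ has one zero outside the closed unit disk, so the proposition predicts ${4 - 2 = 2}$ zeroes of ${P}$ on the unit circle, matching the zeroes ${\pm 1}$ built into the example.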

From the functional equation and the chain rule, ${Z \neq 0}$ is a zero of ${P'_t}$ if and only if ${1/\overline{Z}}$ is a zero of ${N P_t(Z) - Z P'_t(Z)}$ (indeed, differentiating (9) gives ${P'_t(Z) = e^{-2\pi i L(t)} Z^{N-1} ( N \overline{P_t}(1/Z) - \frac{1}{Z} \overline{P_t}'(1/Z) )}$). We can thus write the above proposition in the equivalent form

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| < 1: NP_t(Z) - ZP'_t(Z) = 0 \}.$

One can use this identity to get a lower bound on the number of zeroes of ${P_t}$ by the method of mollifiers. Namely, for any other polynomial ${M_t}$, we clearly have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \}$

$\displaystyle \geq N - 2 \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - ZP'_t(Z)) = 0 \}.$

By Jensen’s formula (and noting that ${NP_t(Z) - ZP'_t(Z)}$ equals ${NP_t(0)}$ at ${Z=0}$), we have for any ${r>1}$ that

$\displaystyle \log |M_t(0)| |NP_t(0)|$

$\displaystyle \leq -(\log r) \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - ZP'_t(Z)) = 0 \}$

$\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(re^{i\theta}) - re^{i\theta} P'_t(re^{i\theta}))|\ d\theta.$

We therefore have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N + \frac{2}{\log r} \log |M_t(0)| |NP_t(0)|$

$\displaystyle - \frac{1}{\log r} \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(re^{i\theta}) - re^{i\theta} P'_t(re^{i\theta}))|^2\ d\theta.$

As the logarithm function is concave, we can apply Jensen’s inequality to conclude

$\displaystyle {\bf E} \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N$

$\displaystyle + {\bf E} \frac{2}{\log r} \log |M_t(0)| |NP_t(0)|$

$\displaystyle - \frac{1}{\log r} \log \left( \frac{1}{2\pi} \int_0^{2\pi} {\bf E} |M_t(re^{i\theta})(NP_t(re^{i\theta}) - re^{i\theta} P'_t(re^{i\theta}))|^2\ d\theta\right).$

where the expectation is over the ${t}$ parameter. It turns out that by choosing the mollifier ${M_t}$ carefully in order to make ${M_t P_t}$ behave like the function ${1}$ (while keeping the degree of ${M_t}$ small enough that one can compute the second moment here), and then optimising in ${r}$, one can use this inequality to get a positive fraction of zeroes of ${P_t}$ on the unit circle on average. This is the polynomial analogue of a classical argument of Levinson, who used this to show that at least one third of the zeroes of the Riemann zeta function are on the critical line; all later improvements on this fraction have been based on some version of Levinson’s method, mainly focusing on more advanced choices for the mollifier ${M_t}$ and of the differential operator ${N - \partial_z}$ that implicitly appears in the above approach. (The most recent lower bound I know of is ${0.4191637}$, due to Pratt and Robles. In principle (as observed by Farmer) this bound can get arbitrarily close to ${1}$ if one is allowed to use arbitrarily long mollifiers, but establishing this seems of comparable difficulty to unsolved problems such as the pair correlation conjecture; see this paper of Radziwill for more discussion.) A variant of these techniques can also establish “zero density estimates” of the following form: for any ${W \geq 1}$, the number of zeroes of ${P_t}$ that lie further than ${\frac{W}{N}}$ from the unit circle is of order ${O( e^{-cW} N )}$ on average for some absolute constant ${c>0}$. Thus, roughly speaking, most zeroes of ${P_t}$ lie within ${O(1/N)}$ of the unit circle. (Analogues of these results for the Riemann zeta function were worked out by Selberg, by Jutila, and by Conrey, with increasingly strong values of ${c}$.)
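The way Jensen’s formula is used above (dropping the zeroes in the annulus ${1 \leq |Z| < r}$ and bounding each remaining term ${\log \frac{r}{|\rho|}}$ from below by ${\log r}$) can be checked on a toy example. The sketch below uses a hypothetical cubic ${F}$ with zeroes at ${\frac{1}{2}}$, ${2}$ and ${-\frac{i}{4}}$, in place of the product of the mollifier with the other factor, and a uniform Riemann sum for the circle average; all names are illustrative.

```python
import cmath
import math

roots = [0.5, 2.0, -0.25j]  # zeroes of the toy polynomial F
lead = 3.0                  # leading coefficient of F

def F(Z):
    """Toy polynomial lead * prod (Z - rho)."""
    value = lead
    for rho in roots:
        value *= (Z - rho)
    return value

def circle_mean_log_abs(r, samples=4096):
    """Approximate (1/2pi) * integral of log|F(r e^{i theta})| by a uniform Riemann sum."""
    total = 0.0
    for k in range(samples):
        total += math.log(abs(F(r * cmath.exp(2j * math.pi * k / samples))))
    return total / samples

r = 1.5
lhs = math.log(abs(F(0)))
mean = circle_mean_log_abs(r)

# Jensen's formula: log|F(0)| = mean - sum over zeroes with |rho| < r of log(r/|rho|).
jensen_rhs = mean - sum(math.log(r / abs(rho)) for rho in roots if abs(rho) < r)

# The weaker inequality used in the text only counts zeroes in the unit disk, each weighted by log r.
count_unit_disk = sum(1 for rho in roots if abs(rho) < 1)
upper = mean - math.log(r) * count_unit_disk
print(lhs, jensen_rhs, upper)
```

The first two printed numbers agree up to quadrature error, while the third is an upper bound for them, as in the displayed inequality.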

The zeroes of ${P'_t}$ tend to live somewhat closer to the origin than the zeroes of ${P_t}$. Suppose for instance that we write

$\displaystyle P_t(Z) = \sum_{j=0}^N a_j(t) Z^j = a_N(t) \prod_{j=1}^N (Z - \lambda_j)$

where ${\lambda_1,\dots,\lambda_N}$ are the zeroes of ${P_t(Z)}$, then by evaluating at zero we see that

$\displaystyle \lambda_1 \dots \lambda_N = (-1)^N a_0(t) / a_N(t)$

and the right-hand side is of unit magnitude by the functional equation. However, if we differentiate

$\displaystyle P'_t(Z) = \sum_{j=1}^N a_j(t) j Z^{j-1} = N a_N(t) \prod_{j=1}^{N-1} (Z - \lambda'_j)$

where ${\lambda'_1,\dots,\lambda'_{N-1}}$ are the zeroes of ${P'_t}$, then by evaluating at zero we now see that

$\displaystyle \lambda'_1 \dots \lambda'_{N-1} = (-1)^{N-1} a_1(t) / N a_N(t).$

The right-hand side would now be typically expected to be of size ${O(1/N) \approx \exp(- \log N)}$, and so on average we expect the ${\lambda'_j}$ to have magnitude like ${\exp( - \frac{\log N}{N} )}$, that is to say pushed inwards from the unit circle by a distance roughly ${\frac{\log N}{N}}$. The analogous result for the Riemann zeta function is that the zeroes of ${\zeta'(s)}$ at height ${\sim T}$ lie at a distance roughly ${\frac{\log\log T}{\log T}}$ to the right of the critical line on the average; see this paper of Levinson and Montgomery for a precise statement.
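To make the arithmetic in this heuristic concrete: if the product of the ${N-1}$ zeroes of ${P'_t}$ has magnitude ${|a_1(t)|/(N|a_N(t)|)}$, then their geometric mean modulus is the ${(N-1)}$-th root of that quantity. The sketch below assumes the illustrative normalisation ${|a_1(t)| = |a_N(t)|}$ (an assumption made only for this example) and compares the resulting distance from the unit circle with ${\frac{\log N}{N}}$.

```python
import math

def geometric_mean_modulus(N):
    """Geometric mean of |lambda'_j| when |a_1| = |a_N|, namely (1/N)^(1/(N-1))."""
    return (1.0 / N) ** (1.0 / (N - 1))

rows = []
for N in (10, 100, 1000, 10000):
    exact = geometric_mean_modulus(N)
    heuristic = math.exp(-math.log(N) / N)
    # columns: N, exact mean modulus, exp(-log N / N), distance to the circle, log N / N
    rows.append((N, exact, heuristic, 1 - exact, math.log(N) / N))

for row in rows:
    print(row)
```

For large ${N}$ the last two columns agree to leading order, which is the inward displacement of roughly ${\frac{\log N}{N}}$ described above.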

Important note: As this is not a course in probability, we will try to avoid developing the general theory of stochastic calculus (which includes such concepts as filtrations, martingales, and Ito calculus). This will unfortunately limit what we can actually prove rigorously, and so at some places the arguments will be somewhat informal in nature. A rigorous treatment of many of the topics here can be found for instance in Lawler’s Conformally Invariant Processes in the Plane, from which much of the material here is drawn.

In these notes, random variables will be denoted in boldface.

Definition 1 A real random variable ${\mathbf{X}}$ is said to be normally distributed with mean ${x_0 \in {\bf R}}$ and variance ${\sigma^2 > 0}$ if one has

$\displaystyle \mathop{\bf E} F(\mathbf{X}) = \frac{1}{\sqrt{2\pi} \sigma} \int_{\bf R} e^{-(x-x_0)^2/2\sigma^2} F(x)\ dx$

for all test functions ${F \in C_c({\bf R})}$. Similarly, a complex random variable ${\mathbf{Z}}$ is said to be normally distributed with mean ${z_0 \in {\bf C}}$ and variance ${\sigma^2>0}$ if one has

$\displaystyle \mathop{\bf E} F(\mathbf{Z}) = \frac{1}{\pi \sigma^2} \int_{\bf C} e^{-|z-z_0|^2/\sigma^2} F(z)\ dx dy$

for all test functions ${F \in C_c({\bf C})}$, where ${dx dy}$ is the area element on ${{\bf C}}$.

A real Brownian motion with base point ${x_0 \in {\bf R}}$ is a random, almost surely continuous function ${\mathbf{B}^{x_0}: [0,+\infty) \rightarrow {\bf R}}$ (using the locally uniform topology on continuous functions) with the property that (almost surely) ${\mathbf{B}^{x_0}(0) = x_0}$, and for any sequence of times ${0 \leq t_0 < t_1 < t_2 < \dots < t_n}$, the increments ${\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})}$ for ${i=1,\dots,n}$ are independent real random variables that are normally distributed with mean zero and variance ${t_i - t_{i-1}}$. Similarly, a complex Brownian motion with base point ${z_0 \in {\bf C}}$ is a random, almost surely continuous function ${\mathbf{B}^{z_0}: [0,+\infty) \rightarrow {\bf C}}$ with the property that (almost surely) ${\mathbf{B}^{z_0}(0) = z_0}$, and for any sequence of times ${0 \leq t_0 < t_1 < t_2 < \dots < t_n}$, the increments ${\mathbf{B}^{z_0}(t_i) - \mathbf{B}^{z_0}(t_{i-1})}$ for ${i=1,\dots,n}$ are independent complex random variables that are normally distributed with mean zero and variance ${t_i - t_{i-1}}$.

Remark 2 Thanks to the central limit theorem, the hypothesis that the increments ${\mathbf{B}^{x_0}(t_i) - \mathbf{B}^{x_0}(t_{i-1})}$ be normally distributed can be dropped from the definition of a Brownian motion, so long as one retains the independence and the normalisation of the mean and variance (technically one also needs some uniform integrability on the increments beyond the second moment, but we will not detail this here). A similar statement is also true for the complex Brownian motion (where now we need to normalise the variances and covariances of the real and imaginary parts of the increments).

Real and complex Brownian motions exist from any base point ${x_0}$ or ${z_0}$; see e.g. this previous blog post for a construction. We have the following simple invariances:

Exercise 3

• (i) (Translation invariance) If ${\mathbf{B}^{x_0}}$ is a real Brownian motion with base point ${x_0 \in {\bf R}}$, and ${h \in {\bf R}}$, show that ${\mathbf{B}^{x_0}+h}$ is a real Brownian motion with base point ${x_0+h}$. Similarly, if ${\mathbf{B}^{z_0}}$ is a complex Brownian motion with base point ${z_0 \in {\bf C}}$, and ${h \in {\bf C}}$, show that ${\mathbf{B}^{z_0}+h}$ is a complex Brownian motion with base point ${z_0+h}$.
• (ii) (Dilation invariance) If ${\mathbf{B}^{0}}$ is a real Brownian motion with base point ${0}$, and ${\lambda \in {\bf R}}$ is non-zero, show that ${t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{2})}$ is also a real Brownian motion with base point ${0}$. Similarly, if ${\mathbf{B}^0}$ is a complex Brownian motion with base point ${0}$, and ${\lambda \in {\bf C}}$ is non-zero, show that ${t \mapsto \lambda \mathbf{B}^0(t / |\lambda|^{2})}$ is also a complex Brownian motion with base point ${0}$.
• (iii) (Real and imaginary parts) If ${\mathbf{B}^0}$ is a complex Brownian motion with base point ${0}$, show that ${\sqrt{2} \mathrm{Re} \mathbf{B}^0}$ and ${\sqrt{2} \mathrm{Im} \mathbf{B}^0}$ are independent real Brownian motions with base point ${0}$. Conversely, if ${\mathbf{B}^0_1, \mathbf{B}^0_2}$ are independent real Brownian motions of base point ${0}$, show that ${\frac{1}{\sqrt{2}} (\mathbf{B}^0_1 + i \mathbf{B}^0_2)}$ is a complex Brownian motion with base point ${0}$.
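The normalisation in part (iii) of this exercise can be illustrated by simulation. The following is only a Monte Carlo sketch, not a proof: it samples the value at a single time ${t}$ of the process ${\frac{1}{\sqrt{2}} (\mathbf{B}^0_1 + i \mathbf{B}^0_2)}$ and checks empirically that the complex variance is about ${t}$, that the real and imaginary parts each have variance about ${\frac{t}{2}}$, and that they are uncorrelated. The sample size and the seed are arbitrary choices.

```python
import math
import random

def complex_brownian_sample(t, rng):
    """Sample (B_1(t) + i B_2(t)) / sqrt(2) with B_1(t), B_2(t) independent real N(0, t)."""
    s = math.sqrt(t)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) / math.sqrt(2)

rng = random.Random(0)
t = 2.0
samples = [complex_brownian_sample(t, rng) for _ in range(40000)]
n = len(samples)

mean_abs_sq = sum(abs(z) ** 2 for z in samples) / n   # estimates E|B(t)|^2, expected near t
var_re = sum(z.real ** 2 for z in samples) / n        # expected near t/2
var_im = sum(z.imag ** 2 for z in samples) / n        # expected near t/2
cov = sum(z.real * z.imag for z in samples) / n       # expected near 0
print(mean_abs_sq, var_re, var_im, cov)
```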

The next lemma is a special case of the optional stopping theorem.

Lemma 4 (Optional stopping identities)

• (i) (Real case) Let ${\mathbf{B}^{x_0}}$ be a real Brownian motion with base point ${x_0 \in {\bf R}}$. Let ${\mathbf{t}}$ be a bounded stopping time – a bounded random variable with the property that for any time ${t \geq 0}$, the event that ${\mathbf{t} \leq t}$ is determined by the values of the trajectory ${\mathbf{B}^{x_0}}$ for times up to ${t}$ (or more precisely, this event is measurable with respect to the ${\sigma}$-algebra generated by this portion of the trajectory). Then

$\displaystyle \mathop{\bf E} \mathbf{B}^{x_0}(\mathbf{t}) = x_0$

and

$\displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^2 - \mathbf{t} = 0$

and

$\displaystyle \mathop{\bf E} (\mathbf{B}^{x_0}(\mathbf{t})-x_0)^4 = O( \mathop{\bf E} \mathbf{t}^2 ).$

• (ii) (Complex case) Let ${\mathbf{B}^{z_0}}$ be a complex Brownian motion with base point ${z_0 \in {\bf C}}$. Let ${\mathbf{t}}$ be a bounded stopping time – a bounded random variable with the property that for any time ${t \geq 0}$, the event that ${\mathbf{t} \leq t}$ is determined by the values of the trajectory ${\mathbf{B}^{z_0}}$ for times up to ${t}$. Then

$\displaystyle \mathop{\bf E} \mathbf{B}^{z_0}(\mathbf{t}) = z_0$

$\displaystyle \mathop{\bf E} (\mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0$

$\displaystyle \mathop{\bf E} (\mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0))^2 - \frac{1}{2} \mathbf{t} = 0$

$\displaystyle \mathop{\bf E} \mathrm{Re}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) \mathrm{Im}(\mathbf{B}^{z_0}(\mathbf{t})-z_0) = 0$

$\displaystyle \mathop{\bf E} |\mathbf{B}^{z_0}(\mathbf{t})-z_0|^4 = O( \mathop{\bf E} \mathbf{t}^2 ).$

Proof: (Slightly informal) We just prove (i) and leave (ii) as an exercise. By translation invariance we can take ${x_0=0}$. Let ${T}$ be an upper bound for ${\mathbf{t}}$. Since ${\mathbf{B}^0(T)}$ is a real normally distributed variable with mean zero and variance ${T}$, we have

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T ) = 0$

and

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^2 = T$

and

$\displaystyle \mathop{\bf E} \mathbf{B}^0( T )^4 = 3T^2.$

By the law of total expectation, we thus have

$\displaystyle \mathop{\bf E} \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = 0$

and

$\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = T$

and

$\displaystyle \mathop{\bf E} \mathop{\bf E}((\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = 3T^2$

where the inner conditional expectations are with respect to the event that ${(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))}$ attains a particular point in ${[0,T] \times {\bf R}}$. However, from the independent increment nature of Brownian motion, once one conditions ${(\mathbf{t}, \mathbf{B}^{0}(\mathbf{t}))}$ to a fixed point ${(t, x)}$, the random variable ${\mathbf{B}^0(T)}$ becomes a real normally distributed variable with mean ${x}$ and variance ${T-t}$. Thus we have

$\displaystyle \mathop{\bf E}(\mathbf{B}^0( T ) | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})$

and

$\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^2 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})^2 + T - \mathbf{t}$

and

$\displaystyle \mathop{\bf E}( (\mathbf{B}^0( T ))^4 | \mathbf{t}, \mathbf{B}^{0}(\mathbf{t}) ) = \mathbf{B}^{0}(\mathbf{t})^4 + 6(T - \mathbf{t}) \mathbf{B}^{0}(\mathbf{t})^2 + 3(T - \mathbf{t})^2$

which give the first two claims, and (after some algebra) the identity

$\displaystyle \mathop{\bf E} \mathbf{B}^{0}(\mathbf{t})^4 - 6 \mathbf{t} \mathbf{B}^{0}(\mathbf{t})^2 + 3 \mathbf{t}^2 = 0$

which then also gives the third claim (after using the Cauchy-Schwarz inequality to bound the expectation of ${\mathbf{t} \mathbf{B}^{0}(\mathbf{t})^2}$). $\Box$
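The first two identities of Lemma 4(i) can be illustrated by simulation. The sketch below replaces Brownian motion by a simple random walk with steps ${\pm h}$ taken every ${h^2}$ units of time (a discrete stand-in assumed only for this example, for which the analogous identities also hold), and takes the bounded stopping time to be the first time the walk reaches ${\pm 1}$, capped at time ${T=1}$. The step size, the barrier, the number of runs and the seed are arbitrary illustrative choices.

```python
import random

def stopped_walk(rng, h=0.1, T=1.0, barrier=1.0):
    """Run a +-h random walk from 0 with time step h*h until |x| >= barrier or time T.

    Returns (stopping time, position at the stopping time)."""
    max_steps = round(T / (h * h))
    limit = round(barrier / h)
    k = 0  # position in units of h, kept as an integer to avoid rounding drift
    n = 0  # number of steps taken
    while n < max_steps and abs(k) < limit:
        k += 1 if rng.random() < 0.5 else -1
        n += 1
    return n * h * h, k * h

rng = random.Random(1)
runs = [stopped_walk(rng) for _ in range(10000)]

mean_pos = sum(x for _, x in runs) / len(runs)               # estimates E B(t), expected near 0
mean_sq_minus_t = sum(x * x - t for t, x in runs) / len(runs)  # estimates E (B(t)^2 - t), expected near 0
print(mean_pos, mean_sq_minus_t)
```

Both printed averages should be close to zero, up to Monte Carlo error of order ${10^{-2}}$ for this sample size.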

Exercise 5 Prove the second part of Lemma 4.

We now approach conformal maps from yet another perspective. Given an open subset ${U}$ of the complex numbers ${{\bf C}}$, define a univalent function on ${U}$ to be a holomorphic function ${f: U \rightarrow {\bf C}}$ that is also injective. We will primarily be studying this concept in the case when ${U}$ is the unit disk ${D(0,1) := \{ z \in {\bf C}: |z| < 1 \}}$.

Clearly, a univalent function ${f: D(0,1) \rightarrow {\bf C}}$ on the unit disk is a conformal map from ${D(0,1)}$ to the image ${f(D(0,1))}$; in particular, ${f(D(0,1))}$ is simply connected, and not all of ${{\bf C}}$ (since otherwise the inverse map ${f^{-1}: {\bf C} \rightarrow D(0,1)}$ would violate Liouville’s theorem). In the converse direction, the Riemann mapping theorem tells us that every open simply connected proper subset ${V \subsetneq {\bf C}}$ of the complex numbers is the image of a univalent function on ${D(0,1)}$. Furthermore, if ${V}$ contains the origin, then the univalent function ${f: D(0,1) \rightarrow {\bf C}}$ with this image becomes unique once we normalise ${f(0) = 0}$ and ${f'(0) > 0}$. Thus the Riemann mapping theorem provides a one-to-one correspondence between open simply connected proper subsets of the complex plane containing the origin, and univalent functions ${f: D(0,1) \rightarrow {\bf C}}$ with ${f(0)=0}$ and ${f'(0)>0}$. We will focus particular attention on the univalent functions ${f: D(0,1) \rightarrow {\bf C}}$ with the normalisation ${f(0)=0}$ and ${f'(0)=1}$; such functions will be called schlicht functions.

One basic example of a univalent function on ${D(0,1)}$ is the Cayley transform ${z \mapsto \frac{1+z}{1-z}}$, which is a Möbius transformation from ${D(0,1)}$ to the right half-plane ${\{ \mathrm{Re}(z) > 0 \}}$. (The slight variant ${z \mapsto \frac{1-z}{1+z}}$ is also referred to as the Cayley transform, as is the closely related map ${z \mapsto \frac{z-i}{z+i}}$, which maps the upper half-plane to ${D(0,1)}$.) One can square this map to obtain a further univalent function ${z \mapsto \left( \frac{1+z}{1-z} \right)^2}$, which now maps ${D(0,1)}$ to the complex numbers with the negative real axis ${(-\infty,0]}$ removed. One can normalise this function to be schlicht to obtain the Koebe function

$\displaystyle f(z) := \frac{1}{4}\left( \left( \frac{1+z}{1-z} \right)^2 - 1\right) = \frac{z}{(1-z)^2}, \ \ \ \ \ (1)$

which now maps ${D(0,1)}$ to the complex numbers with the half-line ${(-\infty,-1/4]}$ removed. A little more generally, for any ${\theta \in {\bf R}}$ we have the rotated Koebe function

$\displaystyle f(z) := \frac{z}{(1 - e^{i\theta} z)^2} \ \ \ \ \ (2)$

that is a schlicht function that maps ${D(0,1)}$ to the complex numbers with the half-line ${\{ -re^{-i\theta}: r \geq 1/4\}}$ removed.

Every schlicht function ${f: D(0,1) \rightarrow {\bf C}}$ has a convergent Taylor expansion

$\displaystyle f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots$

for some complex coefficients ${a_1,a_2,\dots}$ with ${a_1=1}$. For instance, the Koebe function has the expansion

$\displaystyle f(z) = z + 2 z^2 + 3 z^3 + \dots = \sum_{n=1}^\infty n z^n$

and similarly the rotated Koebe function has the expansion

$\displaystyle f(z) = z + 2 e^{i\theta} z^2 + 3 e^{2i\theta} z^3 + \dots = \sum_{n=1}^\infty n e^{i(n-1)\theta} z^n.$
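These expansions are easy to confirm by machine. The sketch below (with the illustrative function name `rotated_koebe_coeffs` and an arbitrary angle ${\theta = 0.7}$) squares the geometric series for ${\frac{1}{1 - e^{i\theta} z}}$ and reads off the Taylor coefficients of the rotated Koebe function (2); taking ${\theta = 0}$ recovers the Koebe function (1).

```python
import cmath

def rotated_koebe_coeffs(theta, count):
    """Taylor coefficients a_1..a_count of z / (1 - w z)^2, where w = e^{i theta}."""
    w = cmath.exp(1j * theta)
    geom = [w ** k for k in range(count)]  # coefficients of 1 / (1 - w z)
    # Cauchy product of the geometric series with itself gives 1 / (1 - w z)^2.
    square = [sum(geom[j] * geom[m - j] for j in range(m + 1)) for m in range(count)]
    return square  # multiplying by z shifts the index, so a_n = square[n - 1]

koebe = rotated_koebe_coeffs(0.0, 8)
theta = 0.7
rotated = rotated_koebe_coeffs(theta, 8)

for n, (a, b) in enumerate(zip(koebe, rotated), start=1):
    print(n, a, abs(b))
```

In both cases the ${n}$-th coefficient has modulus exactly ${n}$, which is the extremal value permitted by the Bieberbach conjecture.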

Intuitively, the Koebe function and its rotations should be the “largest” schlicht functions available. This is formalised by the famous Bieberbach conjecture, which asserts that for any schlicht function, the coefficients ${a_n}$ should obey the bound ${|a_n| \leq n}$ for all ${n}$. After a large number of partial results, this conjecture was eventually solved by de Branges; see for instance this survey of Korevaar or this survey of Koepf for a history.

It turns out that to resolve these sorts of questions, it is convenient to restrict attention to schlicht functions ${g: D(0,1) \rightarrow {\bf C}}$ that are odd, thus ${g(-z)=-g(z)}$ for all ${z}$, and the Taylor expansion now reads

$\displaystyle g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots$

for some complex coefficients ${b_1,b_3,\dots}$ with ${b_1=1}$. One can transform a general schlicht function ${f: D(0,1) \rightarrow {\bf C}}$ to an odd schlicht function ${g: D(0,1) \rightarrow {\bf C}}$ by observing that the function ${f(z^2)/z^2: D(0,1) \rightarrow {\bf C}}$, after removing the singularity at zero, is a non-zero function that equals ${1}$ at the origin, and thus (as ${D(0,1)}$ is simply connected) has a unique holomorphic square root ${(f(z^2)/z^2)^{1/2}}$ that also equals ${1}$ at the origin. If one then sets

$\displaystyle g(z) := z (f(z^2)/z^2)^{1/2} \ \ \ \ \ (3)$

it is not difficult to verify that ${g}$ is an odd schlicht function which additionally obeys the equation

$\displaystyle f(z^2) = g(z)^2. \ \ \ \ \ (4)$

Conversely, given an odd schlicht function ${g}$, the formula (4) uniquely determines a schlicht function ${f}$.

For instance, if ${f}$ is the Koebe function (1), ${g}$ becomes

$\displaystyle g(z) = \frac{z}{1-z^2} = z + z^3 + z^5 + \dots, \ \ \ \ \ (5)$

which maps ${D(0,1)}$ to the complex numbers with two slits ${\{ \pm iy: y \geq 1/2 \}}$ removed, and if ${f}$ is the rotated Koebe function (2), ${g}$ becomes

$\displaystyle g(z) = \frac{z}{1- e^{i\theta} z^2} = z + e^{i\theta} z^3 + e^{2i\theta} z^5 + \dots. \ \ \ \ \ (6)$
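The passage from ${f}$ to ${g}$ in (3) can be carried out at the level of Taylor coefficients. Writing ${u = z^2}$, one needs the power series square root of ${f(u)/u = \sum_{m \geq 0} a_{m+1} u^m}$ with constant term ${1}$, whose coefficients are then ${b_1, b_3, b_5, \dots}$. The sketch below (illustrative function names, exact rational arithmetic) does this recursively and checks the result against (5) for the Koebe function.

```python
from fractions import Fraction

def odd_schlicht_coeffs(a, terms):
    """Given [a_1, a_2, ...] with a_1 = 1, return [b_1, b_3, b_5, ...] for g(z) = z (f(z^2)/z^2)^{1/2}."""
    h = [Fraction(x) for x in a[:terms]]  # h[m] = a_{m+1}, the coefficients of f(u)/u
    c = [Fraction(1)]                     # square root series, normalised to equal 1 at the origin
    for m in range(1, terms):
        # Matching the u^m coefficient of c(u)^2 = h(u): 2 c_m + sum_{0<j<m} c_j c_{m-j} = h_m.
        cross = sum(c[j] * c[m - j] for j in range(1, m))
        c.append((h[m] - cross) / 2)
    return c

def square_back(b):
    """Coefficients a_1, a_2, ... of f recovered from f(z^2) = g(z)^2."""
    return [sum(b[j] * b[n - j] for j in range(n + 1)) for n in range(len(b))]

koebe = list(range(1, 9))          # a_n = n for the Koebe function
b = odd_schlicht_coeffs(koebe, 8)  # coefficients b_1, b_3, ..., b_15
print(b)
print(square_back(b))
```

For the Koebe function every ${b_{2k-1}}$ comes out equal to ${1}$, in agreement with (5), and squaring recovers ${a_n = n}$ as required by (4).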

De Branges established the Bieberbach conjecture by first proving an analogous conjecture for odd schlicht functions known as Robertson’s conjecture. More precisely, we have

Theorem 1 (de Branges’ theorem) Let ${n \geq 1}$ be a natural number.

• (i) (Robertson conjecture) If ${g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots}$ is an odd schlicht function, then

$\displaystyle \sum_{k=1}^n |b_{2k-1}|^2 \leq n.$

• (ii) (Bieberbach conjecture) If ${f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots}$ is a schlicht function, then

$\displaystyle |a_n| \leq n.$

It is easy to see that the Robertson conjecture for a given value of ${n}$ implies the Bieberbach conjecture for the same value of ${n}$. Indeed, if ${f(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots}$ is schlicht, and ${g(z) = b_1 z + b_3 z^3 + b_5 z^5 + \dots}$ is the odd schlicht function given by (3), then from extracting the ${z^{2n}}$ coefficient of (4) we obtain a formula

$\displaystyle a_n = \sum_{j=1}^n b_{2j-1} b_{2(n+1-j)-1}$

for the coefficients of ${f}$ in terms of the coefficients of ${g}$. Applying the Cauchy-Schwarz inequality, we derive the Bieberbach conjecture for this value of ${n}$ from the Robertson conjecture for the same value of ${n}$. We remark that Littlewood and Paley had conjectured a stronger form ${|b_{2k-1}| \leq 1}$ of Robertson’s conjecture, but this was disproved for ${k=3}$ by Fekete and Szegö.
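Here is a small worked instance of this deduction, using exact arithmetic. It takes the schlicht function ${f(z) = \frac{z}{1-z}}$ (a choice made for this example, not taken from the discussion above), for which ${a_n = 1}$ and the associated odd function is ${g(z) = z (1-z^2)^{-1/2}}$ with ${b_{2k+1} = \binom{2k}{k} 4^{-k}}$. The sketch checks the convolution formula for ${a_n}$ and the chain of inequalities ${|a_n| \leq \sum_{k=1}^n |b_{2k-1}|^2 \leq n}$ coming from Cauchy-Schwarz and the Robertson bound.

```python
from fractions import Fraction
from math import comb

def bieberbach_from_odd(b, n):
    """a_n = sum_{j=1}^n b_{2j-1} b_{2(n+1-j)-1}, where b[k] stores b_{2k+1}."""
    return sum(b[j] * b[n - 1 - j] for j in range(n))

# Coefficients b_1, b_3, ... of g(z) = z (1 - z^2)^{-1/2}, the odd function attached to f(z) = z/(1 - z).
b = [Fraction(comb(2 * k, k), 4 ** k) for k in range(10)]

checks = []
for n in range(1, 11):
    a_n = bieberbach_from_odd(b, n)
    robertson_sum = sum(x * x for x in b[:n])  # sum_{k=1}^n |b_{2k-1}|^2
    checks.append((n, a_n, robertson_sum))
    print(n, a_n, float(robertson_sum))
```

Each ${a_n}$ comes out equal to ${1}$, and the Robertson sums stay between ${|a_n|}$ and ${n}$, as the argument predicts.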

To prove the Robertson and Bieberbach conjectures, one first takes a logarithm and deduces both conjectures from a similar conjecture about the Taylor coefficients of ${\log \frac{f(z)}{z}}$, known as the Milin conjecture. Next, one continuously enlarges the image ${f(D(0,1))}$ of the schlicht function to cover all of ${{\bf C}}$; done properly, this places the schlicht function ${f}$ as the initial function ${f = f_0}$ in a sequence ${(f_t)_{t \geq 0}}$ of univalent maps ${f_t: D(0,1) \rightarrow {\bf C}}$ known as a Loewner chain. The functions ${f_t}$ obey a useful differential equation known as the Loewner equation, that involves an unspecified forcing term ${\mu_t}$ (or ${\theta(t)}$, in the case that the image is a slit domain) coming from the boundary; this in turn gives useful differential equations for the Taylor coefficients of ${f(z)}$, ${g(z)}$, or ${\log \frac{f(z)}{z}}$. After some elementary calculus manipulations to “integrate” these equations, the Bieberbach, Robertson, and Milin conjectures are then reduced to establishing the non-negativity of a certain explicit hypergeometric function, which is non-trivial to prove (and will not be done here, except for small values of ${n}$) but for which several proofs exist in the literature.

The theory of Loewner chains subsequently became fundamental to a more recent topic in complex analysis, that of the Schramm-Loewner evolution (SLE), which is the focus of the next and final set of notes.

We now leave the topic of Riemann surfaces, and turn to the (loosely related) topic of conformal mapping (and quasiconformal mapping). Recall that a conformal map ${f: U \rightarrow V}$ from an open subset ${U}$ of the complex plane to another open set ${V}$ is a map that is holomorphic and bijective, which (by Rouché’s theorem) also forces the derivative of ${f}$ to be nowhere vanishing. We then say that the two open sets ${U,V}$ are conformally equivalent. From the Cauchy-Riemann equations we see that conformal maps are orientation-preserving and angle-preserving; from the Newton approximation ${f( z_0 + \Delta z) = f(z_0) + f'(z_0) \Delta z + O( |\Delta z|^2)}$ we see that they almost preserve small circles, indeed for ${\varepsilon}$ small the circle ${\{ z: |z-z_0| = \varepsilon\}}$ will approximately map to ${\{ w: |w - f(z_0)| = |f'(z_0)| \varepsilon \}}$.
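The claim about small circles can be checked numerically. The sketch below uses the exponential map and the base point ${z_0 = 0.3 + 0.2i}$ (both arbitrary choices made for illustration) and measures how far the image of the circle of radius ${\varepsilon}$ deviates from the circle of radius ${|f'(z_0)| \varepsilon}$ about ${f(z_0)}$; the Newton approximation predicts a deviation of size ${O(\varepsilon^2)}$.

```python
import cmath
import math

def circle_image_deviation(f, df, z0, eps, samples=64):
    """Largest deviation of |f(z) - f(z0)| from |f'(z0)| * eps over the circle |z - z0| = eps."""
    target = abs(df(z0)) * eps
    worst = 0.0
    for k in range(samples):
        z = z0 + eps * cmath.exp(2j * math.pi * k / samples)
        worst = max(worst, abs(abs(f(z) - f(z0)) - target))
    return worst

z0 = 0.3 + 0.2j
# The exponential function is its own derivative, so it is passed for both f and df.
dev_coarse = circle_image_deviation(cmath.exp, cmath.exp, z0, 1e-2)
dev_fine = circle_image_deviation(cmath.exp, cmath.exp, z0, 1e-3)
print(dev_coarse, dev_fine, dev_coarse / dev_fine)
```

Shrinking ${\varepsilon}$ by a factor of ${10}$ shrinks the deviation by a factor of roughly ${100}$, consistent with the quadratic error term.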

In previous quarters, we proved a fundamental theorem about this concept, the Riemann mapping theorem:

Theorem 1 (Riemann mapping theorem) Let ${U}$ be a simply connected open subset of ${{\bf C}}$ that is not all of ${{\bf C}}$. Then ${U}$ is conformally equivalent to the unit disk ${D(0,1)}$.

This theorem was proven in these 246A lecture notes, using an argument of Koebe. At a very high level, one can sketch Koebe’s proof of the Riemann mapping theorem as follows: among all the injective holomorphic maps ${f: U \rightarrow D(0,1)}$ from ${U}$ to ${D(0,1)}$ that map some fixed point ${z_0 \in U}$ to ${0}$, pick one that maximises the magnitude ${|f'(z_0)|}$ of the derivative (ignoring for this discussion the issue of proving that a maximiser exists). If ${f(U)}$ avoids some point in ${D(0,1)}$, one can compose ${f}$ with various holomorphic maps and use Schwarz’s lemma and the chain rule to increase ${|f'(z_0)|}$ without destroying injectivity; see the previous lecture notes for details. The conformal map ${\phi: U \rightarrow D(0,1)}$ is unique up to Möbius automorphisms of the disk; one can fix the map by picking two distinct points ${z_0,z_1}$ in ${U}$, and requiring ${\phi(z_0)}$ to be zero and ${\phi(z_1)}$ to be positive real.

It is a beautiful observation of Thurston that the concept of a conformal mapping has a discrete counterpart, namely the mapping of one circle packing to another. Furthermore, one can run a version of Koebe’s argument (using now a discrete version of Perron’s method) to prove the Riemann mapping theorem through circle packings. In principle, this leads to a mostly elementary approach to conformal geometry, based on extremely classical mathematics that goes all the way back to Apollonius. However, in order to prove the basic existence and uniqueness theorems of circle packing, as well as the convergence to conformal maps in the continuous limit, it seems to be necessary (or at least highly convenient) to use much more modern machinery, including the theory of quasiconformal mapping, and also the Riemann mapping theorem itself (so in particular we are not structuring these notes to provide a completely independent proof of that theorem, though this may well be possible).

To make the above discussion more precise we need some notation.

Definition 2 (Circle packing) A (finite) circle packing is a finite collection ${(C_j)_{j \in J}}$ of circles ${C_j = \{ z \in {\bf C}: |z-z_j| = r_j\}}$ in the complex numbers indexed by some finite set ${J}$, whose interiors are all disjoint (but which are allowed to be tangent to each other), and whose union is connected. The nerve of a circle packing is the finite graph whose vertices ${\{z_j: j \in J \}}$ are the centres of the circle packing, with two such centres connected by an edge if the circles are tangent. (In these notes all graphs are undirected, finite and simple, unless otherwise specified.)

It is clear that the nerve of a circle packing is connected and planar, since one can draw the nerve by placing each vertex (tautologically) in its location in the complex plane, and drawing each edge by the line segment between the centres of the circles it connects (this line segment will pass through the point of tangency of the two circles). Later in these notes we will also have to consider some infinite circle packings, most notably the infinite regular hexagonal circle packing.
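To make Definition 2 concrete, here is a small sketch (with hypothetical helper names `nerve` and `is_connected`, and a hand-picked example packing) that takes a finite list of circles, checks that their interiors are disjoint, and returns the nerve as a set of edges. Tangency is tested up to a floating point tolerance `tol`, which is an assumption of the sketch rather than part of the definition.

```python
import math
from itertools import combinations

def nerve(circles, tol=1e-9):
    """circles: dict mapping a label to (centre as complex, radius).
    Returns the set of edges {i, j} for tangent circles; raises
    ValueError if two circles have overlapping interiors."""
    edges = set()
    for (i, (zi, ri)), (j, (zj, rj)) in combinations(circles.items(), 2):
        gap = abs(zi - zj) - (ri + rj)
        if gap < -tol:
            raise ValueError(f"circles {i} and {j} have overlapping interiors")
        if gap <= tol:  # externally tangent, up to rounding
            edges.add(frozenset((i, j)))
    return edges

def is_connected(vertices, edges):
    """Depth-first search on the nerve; the union of the circles is
    connected exactly when this graph is connected."""
    vertices = list(vertices)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        v = stack.pop()
        for e in edges:
            if v in e:
                (w,) = e - {v}
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == len(vertices)

# Illustrative packing: three mutually tangent unit circles, plus a
# fourth unit circle tangent only to the first.
packing = {
    "a": (0j, 1.0),
    "b": (2 + 0j, 1.0),
    "c": (1 + math.sqrt(3) * 1j, 1.0),
    "d": (-2 + 0j, 1.0),
}
edges = nerve(packing)
print(sorted(tuple(sorted(e)) for e in edges))
print(is_connected(packing, edges))
```

In this example the nerve is a triangle on the centres of `a`, `b`, `c` with one pendant edge from `a` to `d`.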

The first basic theorem in the subject is the following converse statement:

Theorem 3 (Circle packing theorem) Every connected planar graph is the nerve of a circle packing.

Of course, there can be multiple circle packings associated to a given connected planar graph; indeed, since reflections across a line and Möbius transformations map circles to circles (or lines), they will map circle packings to circle packings (unless one or more of the circles is sent to a line). It turns out that once one adds enough edges to the planar graph, the circle packing is otherwise rigid:

Theorem 4 (Koebe-Andreev-Thurston theorem) If a connected planar graph is maximal (i.e., no further edge can be added to it without destroying planarity), then the circle packing given by the above theorem is unique up to reflections and Möbius transformations.

Exercise 5 Let ${G}$ be a connected planar graph with ${n \geq 3}$ vertices. Show that the following are equivalent:

• (i) ${G}$ is a maximal planar graph.
• (ii) ${G}$ has ${3n-6}$ edges.
• (iii) Every drawing ${D}$ of ${G}$ divides the plane into faces that have three edges each. (This includes one unbounded face.)
• (iv) At least one drawing ${D}$ of ${G}$ divides the plane into faces that have three edges each.

(Hint: use Euler’s formula ${V-E+F=2}$, where ${F}$ is the number of faces including the unbounded face.)
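As a sanity check on the counts in this exercise (and not a solution to it), one can tabulate ${E}$ and ${F}$ for a few small graphs. The sketch below assumes its input is already known to be a connected planar graph; it does not test planarity, and the helper name `euler_counts` is hypothetical. It computes ${F}$ from Euler’s formula and compares ${E}$ with ${3n-6}$.

```python
def euler_counts(n, edges):
    """For a connected planar graph on n vertices (planarity is assumed,
    not checked), return (E, F, E == 3n - 6), where F is obtained from
    Euler's formula V - E + F = 2."""
    e = len({frozenset(edge) for edge in edges})
    f = 2 - n + e
    return e, f, e == 3 * n - 6

# Complete graph K4 (the tetrahedron).
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# Octahedron: vertices 0..5, every pair adjacent except antipodes i, 5 - i.
octahedron = [(i, j) for i in range(6) for j in range(i + 1, 6) if i + j != 5]
# A 4-cycle, which is planar but not maximal.
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]

for name, n, g in [("K4", 4, k4), ("octahedron", 6, octahedron),
                   ("4-cycle", 4, four_cycle)]:
    e, f, maximal_count = euler_counts(n, g)
    # When all faces are triangles, double counting gives 3F = 2E.
    print(name, "E =", e, "F =", f, "E == 3n-6:", maximal_count,
          "3F == 2E:", 3 * f == 2 * e)
```

For the two triangulated examples, the identity ${3F = 2E}$ (each face has three edges, each edge borders two faces) combines with Euler’s formula to give ${E = 3n-6}$, which is the mechanism behind the hint.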

Thurston conjectured that circle packings can be used to approximate the conformal map arising in the Riemann mapping theorem. Here is an informal statement:

Conjecture 6 (Informal Thurston conjecture) Let ${U}$ be a simply connected domain, with two distinct points ${z_0,z_1}$. Let ${\phi: U \rightarrow D(0,1)}$ be the conformal map from ${U}$ to ${D(0,1)}$ that maps ${z_0}$ to the origin and ${z_1}$ to a positive real. For any small ${\varepsilon>0}$, let ${{\mathcal C}_\varepsilon}$ be the portion of the regular hexagonal circle packing by circles of radius ${\varepsilon}$ that are contained in ${U}$, and let ${{\mathcal C}'_\varepsilon}$ be a circle packing of ${D(0,1)}$ with the same nerve as ${{\mathcal C}_\varepsilon}$ and with all “boundary circles” tangent to the boundary of ${D(0,1)}$, giving rise to an “approximate map” ${\phi_\varepsilon: U_\varepsilon \rightarrow D(0,1)}$ defined on the subset ${U_\varepsilon}$ of ${U}$ consisting of the circles of ${{\mathcal C}_\varepsilon}$, their interiors, and the interstitial regions between triples of mutually tangent circles. Normalise this map so that ${\phi_\varepsilon(z_0)}$ is zero and ${\phi_\varepsilon(z_1)}$ is a positive real. Then ${\phi_\varepsilon}$ converges to ${\phi}$ as ${\varepsilon \rightarrow 0}$.
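The source packing ${{\mathcal C}_\varepsilon}$ in this conjecture is easy to generate explicitly. The sketch below is only an illustration under stated assumptions: the domain ${U}$ is taken to be the open square ${(-1,1)^2}$, the hexagonal packing is normalised to have a circle centred at the origin, and the names `hex_packing`, `disk_in_square` and `bound` are hypothetical. It does not construct the target packing ${{\mathcal C}'_\varepsilon}$, which requires solving the circle packing problem itself.

```python
import cmath
import math

def hex_packing(eps, disk_in_U, bound):
    """Centres of the circles of radius eps in the regular hexagonal
    packing (centres 2*eps*(m + n*omega), omega = exp(i*pi/3)) whose
    closed disks lie in U.  disk_in_U(c, eps) decides containment, and
    bound is any radius with U contained in D(0, bound)."""
    omega = cmath.exp(1j * math.pi / 3)
    n_max = int(bound / eps) + 1  # enough lattice indices to cover D(0, bound)
    centres = []
    for m in range(-n_max, n_max + 1):
        for n in range(-n_max, n_max + 1):
            c = 2 * eps * (m + n * omega)
            if disk_in_U(c, eps):
                centres.append(c)
    return centres

def disk_in_square(c, eps):
    """Containment test for the illustrative domain U = (-1, 1)^2."""
    return abs(c.real) + eps < 1 and abs(c.imag) + eps < 1

for eps in (0.1, 0.05, 0.01):
    centres = hex_packing(eps, disk_in_square, bound=math.sqrt(2))
    # Each circle owns a hexagonal cell of area 2*sqrt(3)*eps^2, so this
    # fraction of the area of U should approach 1 as eps -> 0.
    covered = len(centres) * 2 * math.sqrt(3) * eps**2 / 4
    print(f"eps={eps:g}: {len(centres)} circles, cell-area fraction {covered:.3f}")
```

The printed fraction measures how much of ${U}$ is accounted for by ${U_\varepsilon}$ (up to boundary effects), which is one way to see why ${U_\varepsilon}$ exhausts ${U}$ as ${\varepsilon \rightarrow 0}$.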

A rigorous version of this conjecture was proven by Rodin and Sullivan. Besides some elementary geometric lemmas (regarding the relative sizes of various configurations of tangent circles), the main ingredients are a rigidity result for the regular hexagonal circle packing, and the theory of quasiconformal maps. Quasiconformal maps are what seem on the surface to be a very broad generalisation of the notion of a conformal map. Informally, conformal maps take infinitesimal circles to infinitesimal circles, whereas quasiconformal maps take infinitesimal circles to infinitesimal ellipses of bounded eccentricity. In terms of Wirtinger derivatives, conformal maps obey the Cauchy-Riemann equation ${\frac{\partial \phi}{\partial \overline{z}} = 0}$, while (sufficiently smooth) quasiconformal maps only obey an inequality ${|\frac{\partial \phi}{\partial \overline{z}}| \leq \frac{K-1}{K+1} |\frac{\partial \phi}{\partial z}|}$ for some constant ${K \geq 1}$ (with ${K=1}$ recovering the conformal case). As such, quasiconformal maps are considerably more plentiful than conformal maps, and in particular it is possible to create piecewise smooth quasiconformal maps by gluing together various simple maps such as affine maps or Möbius transformations; such piecewise maps will naturally arise when trying to rigorously build the map ${\phi_\varepsilon}$ alluded to in the above conjecture. On the other hand, it turns out that quasiconformal maps still have many vestiges of the rigidity properties enjoyed by conformal maps; for instance, there are quasiconformal analogues of fundamental theorems in conformal mapping such as the Schwarz reflection principle, Liouville’s theorem, or Hurwitz’s theorem. Among other things, these quasiconformal rigidity theorems allow one to create conformal maps from the limit of quasiconformal maps in many circumstances, and this is how the Thurston conjecture will be proven. A key technical tool in establishing these sorts of rigidity theorems will be the theory of an important quasiconformal (quasi-)invariant, the conformal modulus (or, equivalently, the extremal length, which is the reciprocal of the modulus).
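The relation between the Wirtinger inequality and ellipses of bounded eccentricity can be seen on the simplest example, a real-linear map ${\phi(z) = a z + b \overline{z}}$ with ${|b| < |a|}$, for which ${\frac{\partial \phi}{\partial z} = a}$ and ${\frac{\partial \phi}{\partial \overline{z}} = b}$. The sketch below (with illustrative coefficients and hypothetical helper names `dilatation` and `axis_ratio`) computes the smallest admissible ${K}$ from the inequality, and compares it with the ratio of the major to minor axis of the image of the unit circle, measured by sampling.

```python
import cmath
import math

def dilatation(a, b):
    """For phi(z) = a*z + b*conj(z) with |b| < |a|, return (mu, K) where
    mu = |d(phi)/d(zbar)| / |d(phi)/dz| and K = (1 + mu) / (1 - mu),
    so that mu = (K - 1) / (K + 1)."""
    mu = abs(b) / abs(a)
    return mu, (1 + mu) / (1 - mu)

def axis_ratio(a, b, samples=3600):
    """Ratio of the largest to smallest |phi(z)| over sampled points of
    the unit circle, i.e. the axis ratio of the image ellipse."""
    vals = [abs(a * cmath.exp(1j * t) + b * cmath.exp(-1j * t))
            for t in (2 * math.pi * k / samples for k in range(samples))]
    return max(vals) / min(vals)

# Illustrative (hypothetical) coefficients.
a, b = 2 + 0j, 0.5j
mu, K = dilatation(a, b)
print("mu =", mu, "K =", K)
print("measured axis ratio =", axis_ratio(a, b))
```

Here the image of the unit circle is an ellipse with semi-axes ${|a|+|b|}$ and ${|a|-|b|}$, so the axis ratio is exactly ${K = \frac{|a|+|b|}{|a|-|b|}}$; setting ${b=0}$ gives ${K=1}$ and the map is conformal.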