In this previous blog post I noted the following easy application of Cauchy-Schwarz:

Lemma 1 (Van der Corput inequality) Let {v,u_1,\dots,u_n} be unit vectors in a Hilbert space {H}. Then

\displaystyle  (\sum_{i=1}^n |\langle v, u_i \rangle_H|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H|.

Proof: The left-hand side may be written as {|\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2} for some unit complex numbers {\epsilon_i}. By Cauchy-Schwarz we have

\displaystyle  |\langle v, \sum_{i=1}^n \epsilon_i u_i \rangle_H|^2 \leq \langle \sum_{i=1}^n \epsilon_i u_i, \sum_{j=1}^n \epsilon_j u_j \rangle_H

and the claim now follows from the triangle inequality. \Box
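As a quick numerical sanity check of Lemma 1, here is a minimal Python sketch that works in the real Hilbert space {{\bf R}^5} with randomly generated unit vectors. The dimension, the number of vectors, the seed, and the helper names (`inner`, `random_unit_vector`, `van_der_corput_sides`) are illustrative assumptions and not part of the argument above.

```python
import math
import random

def inner(x, y):
    """Real inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def random_unit_vector(dim, rng):
    """Gaussian vector normalised to unit length."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(inner(x, x))
    return [a / norm for a in x]

def van_der_corput_sides(v, us):
    """Return (lhs, rhs) of Lemma 1 for unit vectors v, u_1, ..., u_n."""
    lhs = sum(abs(inner(v, u)) for u in us) ** 2
    rhs = sum(abs(inner(ui, uj)) for ui in us for uj in us)
    return lhs, rhs

rng = random.Random(0)          # fixed seed, purely for reproducibility
v = random_unit_vector(5, rng)
us = [random_unit_vector(5, rng) for _ in range(8)]
lhs, rhs = van_der_corput_sides(v, us)
print(lhs <= rhs + 1e-12)       # small tolerance for floating-point rounding
```

Of course a random test of this kind proves nothing; it is only meant to make the shape of the inequality concrete.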

As a corollary, correlation becomes transitive in a statistical sense (even though it is not transitive in an absolute sense):

Corollary 2 (Statistical transitivity of correlation) Let {v,u_1,\dots,u_n} be unit vectors in a Hilbert space {H} such that {|\langle v,u_i \rangle_H| \geq \delta} for all {i=1,\dots,n} and some {0 < \delta \leq 1}. Then we have {|\langle u_i, u_j \rangle_H| \geq \delta^2/2} for at least {\delta^2 n^2/2} of the pairs {(i,j) \in \{1,\dots,n\}^2}.

Proof: From the lemma, we have

\displaystyle  \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle_H| \geq \delta^2 n^2.

The contribution of those {i,j} with {|\langle u_i, u_j \rangle_H| < \delta^2/2} is at most {\delta^2 n^2/2}, and all the remaining summands are at most {1}, giving the claim. \Box

One drawback with this corollary is that it does not tell us which pairs {u_i,u_j} correlate. In particular, if the vector {v} also correlates with a separate collection {w_1,\dots,w_n} of unit vectors, the pairs {(i,j)} for which {u_i,u_j} correlate may have no intersection whatsoever with the pairs in which {w_i,w_j} correlate (except of course on the diagonal {i=j} where they must correlate).

While working on an ongoing research project, I recently found that there is a very simple way to get around the latter problem by exploiting the tensor power trick:

Corollary 3 (Simultaneous statistical transitivity of correlation) Let {v, u^k_i} be unit vectors in a Hilbert space for {i=1,\dots,n} and {k=1,\dots,K} such that {|\langle v, u^k_i \rangle_H| \geq \delta_k} for all {i=1,\dots,n}, {k=1,\dots,K} and some {0 < \delta_k \leq 1}. Then there are at least {(\delta_1 \dots \delta_K)^2 n^2/2} pairs {(i,j) \in \{1,\dots,n\}^2} such that {\prod_{k=1}^K |\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2}. In particular (by Cauchy-Schwarz) we have {|\langle u^k_i, u^k_j \rangle_H| \geq (\delta_1 \dots \delta_K)^2/2} for all {k}.

Proof: Apply Corollary 2 to the unit vectors {v^{\otimes K}} and {u^1_i \otimes \dots \otimes u^K_i}, {i=1,\dots,n} in the tensor power Hilbert space {H^{\otimes K}}. \Box
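To see Corollary 3 in action, one can count the good pairs directly, using the fact that the inner product of two pure tensors is the product of the factorwise inner products. The sketch below is a hedged illustration in the real Hilbert space {{\bf R}^6}: the values of {\delta_k}, the sizes, the seed, and the helpers `correlated_unit_vector` and `count_good_pairs` are all hypothetical choices made for the example.

```python
import math
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalise(x):
    norm = math.sqrt(inner(x, x))
    return [a / norm for a in x]

def correlated_unit_vector(v, delta, rng):
    """Unit vector u = delta*v + sqrt(1-delta^2)*w with w a unit vector
    orthogonal to v, so that <v, u> = delta up to rounding."""
    g = [rng.gauss(0.0, 1.0) for _ in v]
    c = inner(g, v)
    w = normalise([a - c * b for a, b in zip(g, v)])
    s = math.sqrt(1.0 - delta * delta)
    return [delta * a + s * b for a, b in zip(v, w)]

def count_good_pairs(families, deltas):
    """Count pairs (i, j) with prod_k |<u^k_i, u^k_j>| >= (prod delta_k)^2 / 2,
    and return that count with the lower bound (prod delta_k)^2 n^2 / 2."""
    n = len(families[0])
    threshold = math.prod(deltas) ** 2 / 2
    count = 0
    for i in range(n):
        for j in range(n):
            p = math.prod(abs(inner(fam[i], fam[j])) for fam in families)
            if p >= threshold:
                count += 1
    return count, threshold * n * n

rng = random.Random(1)
deltas = [0.6, 0.7]             # illustrative values of delta_1, delta_2
n, dim = 20, 6
v = normalise([rng.gauss(0.0, 1.0) for _ in range(dim)])
families = [[correlated_unit_vector(v, d, rng) for _ in range(n)] for d in deltas]
good, bound = count_good_pairs(families, deltas)
print(good, ">=", bound)
```

Note that the product over {k} computed in `count_good_pairs` is exactly the inner product of the tensors {u^1_i \otimes \dots \otimes u^K_i} and {u^1_j \otimes \dots \otimes u^K_j}, so the tensor power never needs to be formed explicitly.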

It is surprisingly difficult to obtain even a qualitative version of the above conclusion (namely, if {v} correlates with all of the {u^k_i}, then there are many pairs {(i,j)} for which {u^k_i} correlates with {u^k_j} for all {k} simultaneously) without some version of the tensor power trick. For instance, even the powerful Szemerédi regularity lemma, when applied to the set of pairs {i,j} for which one has correlation of {u^k_i}, {u^k_j} for a single {k}, does not seem to be sufficient. However, there is a reformulation of the argument using the Schur product theorem as a substitute for (or really, a disguised version of) the tensor power trick. For simplicity of notation let us just work with real Hilbert spaces to illustrate the argument. We start with the identity

\displaystyle  \langle u^k_i, u^k_j \rangle_H = \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H + \langle \pi(u^k_i), \pi(u^k_j) \rangle_H

where {\pi} is the orthogonal projection to the complement of {v}. This implies a Gram matrix inequality

\displaystyle  (\langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ 0

for each {k} where {A \succ B} denotes the claim that {A-B} is positive semi-definite. By the Schur product theorem, we conclude that

\displaystyle  (\prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H)_{1 \leq i,j \leq n} \succ (\prod_{k=1}^K \langle v, u^k_i \rangle_H \langle v, u^k_j \rangle_H)_{1 \leq i,j \leq n}

and hence for a suitable choice of signs {\epsilon_1,\dots,\epsilon_n},

\displaystyle  \sum_{1 \leq i, j \leq n} \epsilon_i \epsilon_j \prod_{k=1}^K \langle u^k_i, u^k_j \rangle_H \geq \delta_1^2 \dots \delta_K^2 n^2.

One now argues as in the proof of Corollary 2.
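The Schur product step can also be probed numerically. The following sketch (with hypothetical sizes, seed, and helper names) evaluates the quadratic form of the difference between the entrywise product of the Gram matrices and the entrywise product of the rank-one matrices at random test vectors; positive semi-definiteness predicts that this form is never negative, up to rounding.

```python
import math
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def random_unit_vector(dim, rng):
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(inner(x, x))
    return [a / norm for a in x]

def schur_gap_form(v, families, x):
    """Quadratic form x^T (P - Q) x, where P and Q are the entrywise products
    over k of the Gram matrices and of the rank-one matrices respectively."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = math.prod(inner(fam[i], fam[j]) for fam in families)
            q = math.prod(inner(v, fam[i]) * inner(v, fam[j]) for fam in families)
            total += x[i] * x[j] * (p - q)
    return total

rng = random.Random(2)
dim, n, K = 4, 6, 3             # illustrative sizes
v = random_unit_vector(dim, rng)
families = [[random_unit_vector(dim, rng) for _ in range(n)] for _ in range(K)]
worst = min(
    schur_gap_form(v, families, [rng.uniform(-1.0, 1.0) for _ in range(n)])
    for _ in range(200)
)
print(worst >= -1e-9)           # tolerance for floating-point rounding
```

Random test vectors cannot certify positive semi-definiteness, so this is only a plausibility check rather than a substitute for the Schur product theorem.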

A separate application of tensor powers to amplify correlations was also noted in this previous blog post giving a cheap version of the Kabatjanskii-Levenstein bound, but this seems to not be directly related to this current application.

The (classical) Möbius function {\mu: {\bf N} \rightarrow {\bf Z}} is the unique function that obeys the classical Möbius inversion formula:

Proposition 1 (Classical Möbius inversion) Let {f,g: {\bf N} \rightarrow A} be functions from the natural numbers to an additive group {A}. Then the following two claims are equivalent:
  • (i) {f(n) = \sum_{d|n} g(d)} for all {n \in {\bf N}}.
  • (ii) {g(n) = \sum_{d|n} \mu(n/d) f(d)} for all {n \in {\bf N}}.
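Proposition 1 is easy to test on a computer. Here is a minimal sketch, taking {A} to be the integers and {g} to be an arbitrary test function; the names `mobius`, `divisors`, `summatory` and `invert` are illustrative.

```python
def mobius(n):
    """Classical Möbius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result        # one remaining prime factor
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def summatory(g, n):
    """f(n) = sum_{d | n} g(d), as in claim (i)."""
    return sum(g(d) for d in divisors(n))

def invert(f, n):
    """sum_{d | n} mu(n/d) f(d), as in claim (ii)."""
    return sum(mobius(n // d) * f(d) for d in divisors(n))

g = lambda n: n * n + 1         # arbitrary integer-valued test function
f = lambda n: summatory(g, n)
recovered = [invert(f, n) for n in range(1, 31)]
expected = [g(n) for n in range(1, 31)]
print(recovered == expected)
```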

There is a generalisation of this formula to (finite) posets, due to Hall, in which one sums over chains {n_0 > \dots > n_k} in the poset:

Proposition 2 (Poset Möbius inversion) Let {{\mathcal N}} be a finite poset, and let {f,g: {\mathcal N} \rightarrow A} be functions from that poset to an additive group {A}. Then the following two claims are equivalent:
  • (i) {f(n) = \sum_{d \leq n} g(d)} for all {n \in {\mathcal N}}, where {d} is understood to range in {{\mathcal N}}.
  • (ii) {g(n) = \sum_{k=0}^\infty (-1)^k \sum_{n = n_0 > n_1 > \dots > n_k} f(n_k)} for all {n \in {\mathcal N}}, where in the inner sum {n_0,\dots,n_k} are understood to range in {{\mathcal N}} with the indicated ordering.
(Note from the finite nature of {{\mathcal N}} that the inner sum in (ii) is vacuous for all but finitely many {k}.)

Comparing Proposition 2 with Proposition 1, it is natural to refer to the function {\mu(d,n) := \sum_{k=0}^\infty (-1)^k \sum_{n = n_0 > n_1 > \dots > n_k = d} 1} as the Möbius function of the poset; the condition (ii) can then be written as

\displaystyle  g(n) = \sum_{d \leq n} \mu(d,n) f(d).

Proof: If (i) holds, then we have

\displaystyle  g(n) = f(n) - \sum_{d<n} g(d) \ \ \ \ \ (1)

for any {n \in {\mathcal N}}. Iterating this we obtain (ii). Conversely, from (ii) and separating out the {k=0} term, and grouping all the other terms based on the value of {d:=n_1}, we obtain (1), and hence (i). \Box

In fact it is not completely necessary that the poset {{\mathcal N}} be finite; an inspection of the proof shows that it suffices that every element {n} of the poset has only finitely many predecessors {\{ d \in {\mathcal N}: d < n \}}.

It is not difficult to see that Proposition 2 includes Proposition 1 as a special case, after verifying the combinatorial fact that the quantity

\displaystyle  \sum_{k=0}^\infty (-1)^k \sum_{d=n_k | n_{k-1} | \dots | n_1 | n_0 = n} 1

is equal to {\mu(n/d)} when {d} divides {n}, and vanishes otherwise.
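This combinatorial fact can be checked by brute force for small {n}. In the sketch below (helper names are illustrative), the signed chain count is computed recursively by conditioning on the first step {n_1} of the chain, which is the same grouping used in the proof of Proposition 2.

```python
from functools import lru_cache

def mobius(n):
    """Classical Möbius function, computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

@lru_cache(maxsize=None)
def chain_mobius(d, n):
    """Signed count of chains d = n_k | ... | n_1 | n_0 = n of distinct elements."""
    if d == n:
        return 1                # only the chain with k = 0
    if n % d != 0:
        return 0                # no chains at all
    # Condition on m = n_1, a proper divisor of n that is a multiple of d;
    # removing the first step shortens the chain by one and flips the sign.
    return -sum(chain_mobius(d, m) for m in range(d, n)
                if n % m == 0 and m % d == 0)

agree = all(
    chain_mobius(d, n) == (mobius(n // d) if n % d == 0 else 0)
    for n in range(1, 61)
    for d in range(1, n + 1)
)
print(agree)
```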

I recently discovered that Proposition 2 can also lead to a useful variant of the inclusion-exclusion principle. The classical version of this principle can be phrased in terms of indicator functions: if {A_1,\dots,A_\ell} are subsets of some set {X}, then

\displaystyle  \prod_{j=1}^\ell (1-1_{A_j}) = \sum_{k=0}^\ell (-1)^k \sum_{1 \leq j_1 < \dots < j_k \leq \ell} 1_{A_{j_1} \cap \dots \cap A_{j_k}}.

In particular, if there is a finite measure {\nu} on {X} for which {A_1,\dots,A_\ell} are all measurable, we have

\displaystyle  \nu(X \backslash \bigcup_{j=1}^\ell A_j) = \sum_{k=0}^\ell (-1)^k \sum_{1 \leq j_1 < \dots < j_k \leq \ell} \nu( A_{j_1} \cap \dots \cap A_{j_k} ).

One drawback of this formula is that there are exponentially many terms on the right-hand side: {2^\ell} of them, in fact. However, in many cases of interest there are “collisions” between the intersections {A_{j_1} \cap \dots \cap A_{j_k}} (for instance, perhaps many of the pairwise intersections {A_i \cap A_j} agree), in which case there is an opportunity to collect terms and hopefully achieve some cancellation. It turns out that it is possible to use Proposition 2 to do this, in which one only needs to sum over chains in the resulting poset of intersections:

Proposition 3 (Hall-type inclusion-exclusion principle) Let {A_1,\dots,A_\ell} be subsets of some set {X}, and let {{\mathcal N}} be the finite poset formed by intersections of some of the {A_i} (with the convention that {X} is the empty intersection), ordered by set inclusion. Then for any {E \in {\mathcal N}}, one has

\displaystyle  1_E \prod_{F \subsetneq E} (1 - 1_F) = \sum_{k=0}^\ell (-1)^k \sum_{E = E_0 \supsetneq E_1 \supsetneq \dots \supsetneq E_k} 1_{E_k} \ \ \ \ \ (2)

where {F, E_0,\dots,E_k} are understood to range in {{\mathcal N}}. In particular (setting {E} to be the empty intersection) if the {A_j} are all proper subsets of {X} then we have

\displaystyle  \prod_{j=1}^\ell (1-1_{A_j}) = \sum_{k=0}^\ell (-1)^k \sum_{X = E_0 \supsetneq E_1 \supsetneq \dots \supsetneq E_k} 1_{E_k}. \ \ \ \ \ (3)

In particular, if there is a finite measure {\nu} on {X} for which {A_1,\dots,A_\ell} are all measurable, we have

\displaystyle  \nu(X \backslash \bigcup_{j=1}^\ell A_j) = \sum_{k=0}^\ell (-1)^k \sum_{X = E_0 \supsetneq E_1 \supsetneq \dots \supsetneq E_k} \nu(E_k).

Using the Möbius function {\mu} on the poset {{\mathcal N}}, one can write these formulae as

\displaystyle  1_E \prod_{F \subsetneq E} (1 - 1_F) = \sum_{F \subseteq E} \mu(F,E) 1_F,

\displaystyle  \prod_{j=1}^\ell (1-1_{A_j}) = \sum_F \mu(F,X) 1_F

and

\displaystyle  \nu(X \backslash \bigcup_{j=1}^\ell A_j) = \sum_F \mu(F,X) \nu(F).

Proof: It suffices to establish (2) (to derive (3) from (2) observe that all the {F \subsetneq X} are contained in one of the {A_j}, so the effect of {1-1_F} may be absorbed into {1 - 1_{A_j}}). Applying Proposition 2, this is equivalent to the assertion that

\displaystyle  1_E = \sum_{F \subseteq E} 1_F \prod_{G \subsetneq F} (1 - 1_G)

for all {E \in {\mathcal N}}. But this amounts to the assertion that for each {x \in E}, there is precisely one {F \subseteq E} in {{\mathcal N}} with the property that {x \in F} and {x \not \in G} for any {G \subsetneq F} in {{\mathcal N}}, namely one can take {F} to be the intersection of all {G \subseteq E} in {{\mathcal N}} such that {G} contains {x}. \Box

Example 4 If {A_1,A_2,A_3 \subsetneq X} with {A_1 \cap A_2 = A_1 \cap A_3 = A_2 \cap A_3 = A_*}, and {A_1,A_2,A_3,A_*} are all distinct, then we have for any finite measure {\nu} on {X} that makes {A_1,A_2,A_3} measurable that

\displaystyle  \nu(X \backslash (A_1 \cup A_2 \cup A_3)) = \nu(X) - \nu(A_1) - \nu(A_2) \ \ \ \ \ (4)

\displaystyle  - \nu(A_3) - \nu(A_*) + 3 \nu(A_*)

due to the four chains {X \supsetneq A_1}, {X \supsetneq A_2}, {X \supsetneq A_3}, {X \supsetneq A_*} of length one, and the three chains {X \supsetneq A_1 \supsetneq A_*}, {X \supsetneq A_2 \supsetneq A_*}, {X \supsetneq A_3 \supsetneq A_*} of length two. Note that this expansion just has six terms in it, as opposed to the {2^3=8} given by the usual inclusion-exclusion formula, though of course one can reduce the number of terms by combining the {\nu(A_*)} factors. This may not seem particularly impressive, especially if one views the term {3 \nu(A_*)} as really being three terms instead of one, but if we add a fourth set {A_4 \subsetneq X} with {A_i \cap A_j = A_*} for all {1 \leq i < j \leq 4}, the formula now becomes

\displaystyle  \nu(X \backslash (A_1 \cup A_2 \cup A_3 \cup A_4)) = \nu(X) - \nu(A_1) - \nu(A_2) \ \ \ \ \ (5)

\displaystyle  - \nu(A_3) - \nu(A_4) - \nu(A_*) + 4 \nu(A_*)

and we begin to see more cancellation as we now have just seven terms (or ten if we count {4 \nu(A_*)} as four terms) instead of {2^4 = 16} terms.
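Here is a small Python sketch of Proposition 3 on a concrete instance of the four-set example, with {\nu} taken to be counting measure. The particular sets and the helper names `intersection_poset` and `chain_sum` are assumptions made for illustration; the recursion in `chain_sum` conditions on the first step {E_1} of the chain.

```python
from itertools import combinations

def intersection_poset(X, sets):
    """All distinct intersections of subfamilies of `sets`;
    the empty intersection is X itself."""
    poset = {frozenset(X)}
    for r in range(1, len(sets) + 1):
        for family in combinations(sets, r):
            poset.add(frozenset(X).intersection(*family))
    return poset

def chain_sum(E, poset, measure):
    """Sum of (-1)^k * measure(E_k) over all chains E = E_0, E_1, ..., E_k
    of strictly decreasing elements of the poset."""
    total = measure(E)                      # the chain with k = 0
    for F in poset:
        if F < E:                           # F is a proper subset of E
            total -= chain_sum(F, poset, measure)
    return total

X = set(range(10))
A_star = {0, 1}
sets = [A_star | {2}, A_star | {3, 4}, A_star | {5}, A_star | {6, 7}]
poset = intersection_poset(X, sets)
lhs = len(X - set().union(*sets))           # counting measure of the complement
rhs = chain_sum(frozenset(X), poset, len)
print(lhs, rhs)                             # both sides of (5)
```

The poset here has only six elements ({X}, the four sets {A_j}, and {A_*}), which is why the sum over chains is so much shorter than the sixteen-term classical expansion.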

Example 5 (Variant of Legendre sieve) If {q_1,\dots,q_\ell > 1} are natural numbers, and {a_1,a_2,\dots} is some sequence of complex numbers with only finitely many terms non-zero, then by applying the above proposition to the sets {A_j := q_j {\bf N}} and with {\nu} equal to counting measure weighted by the {a_n} we obtain a variant of the Legendre sieve

\displaystyle  \sum_{n: (n,q_1 \dots q_\ell) = 1} a_n = \sum_{k=0}^\ell (-1)^k \sum_{1 |' d_1 |' \dots |' d_k} \sum_{n: d_k |n} a_n

where {d_1,\dots,d_k} range over the set {{\mathcal N}} formed by taking least common multiples of the {q_j} (with the understanding that the empty least common multiple is {1}), and {d |' n} denotes the assertion that {d} divides {n} but is strictly less than {n}. I am curious to know if this version of the Legendre sieve already appears in the literature (and similarly for the other applications of Proposition 2 given here).
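The sieve identity can be tested in the same way. In the sketch below the moduli {6, 10, 15} are chosen (as an illustrative assumption) because all of their pairwise least common multiples collide at {30}, and the weights {a_n = n} for {n \leq 100} are arbitrary. The left-hand side is computed directly from the sets {A_j = q_j {\bf N}}, that is to say as the sum over those {n} divisible by none of the {q_j}; when the {q_j} are prime this is the same as the coprimality condition.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def lcm_all(values):
    """Least common multiple of a tuple of integers (1 for the empty tuple)."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)

def lcm_poset(qs):
    """All least common multiples of subfamilies of the moduli."""
    return {lcm_all(family)
            for r in range(len(qs) + 1)
            for family in combinations(qs, r)}

def sieve_sum(d, poset, a):
    """Signed sum over chains d = d_0 |' d_1 |' ... |' d_k in the poset
    of the quantity sum_{n : d_k | n} a_n."""
    total = sum(x for n, x in a.items() if n % d == 0)
    for e in poset:
        if e != d and e % d == 0:           # e is a proper multiple of d
            total -= sieve_sum(e, poset, a)
    return total

qs = [6, 10, 15]                            # illustrative moduli with colliding lcms
a = {n: n for n in range(1, 101)}           # illustrative weights a_n
poset = lcm_poset(qs)
direct = sum(x for n, x in a.items() if all(n % q != 0 for q in qs))
sieved = sieve_sum(1, poset, a)
print(direct, sieved)
```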

If the poset {{\mathcal N}} has bounded depth then the number of terms in Proposition 3 can end up being just polynomially large in {\ell} rather than exponentially large. Indeed, if all chains {X \supsetneq E_1 \supsetneq \dots \supsetneq E_k} in {{\mathcal N}} have length {k} at most {k_0} then the number of terms here is at most {1 + \ell + \dots + \ell^{k_0}}. (The examples (4), (5) are ones in which the depth is equal to two.) I hope to report in a later post on how this version of inclusion-exclusion with polynomially many terms can be useful in an application.

Actually in our application we need an abstraction of the above formula, in which the indicator functions are replaced by more abstract idempotents:

Proposition 6 (Hall-type inclusion-exclusion principle for idempotents) Let {A_1,\dots,A_\ell} be pairwise commuting elements of some ring {R} with identity, which are all idempotent (thus {A_j A_j = A_j} for {j=1,\dots,\ell}). Let {{\mathcal N}} be the finite poset formed by products of the {A_i} (with the convention that {1} is the empty product), ordered by declaring {E \leq F} when {EF = E} (note that all the elements of {{\mathcal N}} are idempotent so this is a partial ordering). Then for any {E \in {\mathcal N}}, one has

\displaystyle  E \prod_{F < E} (1-F) = \sum_{k=0}^\ell (-1)^k \sum_{E = E_0 > E_1 > \dots > E_k} E_k. \ \ \ \ \ (6)

where {F, E_0,\dots,E_k} are understood to range in {{\mathcal N}}. In particular (setting {E=1}) if all the {A_j} are not equal to {1} then we have

\displaystyle  \prod_{j=1}^\ell (1-A_j) = \sum_{k=0}^\ell (-1)^k \sum_{1 = E_0 > E_1 > \dots > E_k} E_k.

Morally speaking this proposition is equivalent to the previous one after applying a “spectral theorem” to simultaneously diagonalise all of the {A_j}, but it is quicker to just adapt the previous proof to establish this proposition directly. Using the Möbius function {\mu} for {{\mathcal N}}, we can rewrite these formulae as

\displaystyle  E \prod_{F < E} (1-F) = \sum_{F \leq E} \mu(F,E) F

and

\displaystyle  \prod_{j=1}^\ell (1-A_j) = \sum_F \mu(F,1) F.

Proof: Again it suffices to verify (6). Using Proposition 2 as before, it suffices to show that

\displaystyle  E = \sum_{F \leq E} F \prod_{G < F} (1 - G) \ \ \ \ \ (7)

for all {E \in {\mathcal N}} (all sums and products are understood to range in {{\mathcal N}}). We can expand

\displaystyle  E = E \prod_{G < E} (G + (1-G)) = \sum_{{\mathcal A}} (\prod_{G \in {\mathcal A}} G) (\prod_{G < E: G \not \in {\mathcal A}} (1-G)) \ \ \ \ \ (8)

where {{\mathcal A}} ranges over all subsets of {\{ G \in {\mathcal N}: G \leq E \}} that contain {E}. For such an {{\mathcal A}}, if we write {F := \prod_{G \in {\mathcal A}} G}, then {F} is the greatest lower bound of {{\mathcal A}}, and we observe that {F (\prod_{G < E: G \not \in {\mathcal A}} (1-G))} vanishes whenever {{\mathcal A}} fails to contain some {G \in {\mathcal N}} with {F \leq G \leq E}. Thus the only {{\mathcal A}} that give non-zero contributions to (8) are the intervals of the form {\{ G \in {\mathcal N}: F \leq G \leq E\}} for some {F \leq E} (which then forms the greatest lower bound for that interval), and the claim (7) follows (after noting that {F (1-G) = F (1-FG)} for any {F,G \in {\mathcal N}}). \Box

Previous set of notes: Notes 3. Next set of notes: 246C Notes 1.

One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:

Theorem 1 (Prime number theorem) Let {\pi(x)} denote the number of primes less than a given real number {x}. Then

\displaystyle  \lim_{x \rightarrow \infty} \frac{\pi(x)}{x/\ln x} = 1

(or in asymptotic notation, {\pi(x) = (1+o(1)) \frac{x}{\ln x}} as {x \rightarrow \infty}).

(Actually, it turns out to be slightly more natural to replace the approximation {\frac{x}{\ln x}} in the prime number theorem by the logarithmic integral {\int_2^x \frac{dt}{\ln t}}, which turns out to be a more precise approximation, but we will not stress this point here.)

The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function {\zeta}. Initially, it is only defined on the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}:

Definition 2 (Riemann zeta function, preliminary definition) Let {s \in {\bf C}} be such that {\mathrm{Re} s > 1}. Then we define

\displaystyle  \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \ \ \ \ \ (1)

Note that the series is locally uniformly convergent in the half-plane {\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}, so in particular {\zeta} is holomorphic on this region. In previous notes we have already evaluated some special values of this function:

\displaystyle  \zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \ \ \ \ \ (2)

However, it turns out that the zeroes (and pole) of this function are of far greater importance to analytic number theory, particularly with regards to the study of the prime numbers.

The Riemann zeta function has several remarkable properties, some of which we summarise here:

Theorem 3 (Basic properties of the Riemann zeta function)
  • (i) (Euler product formula) For any {s \in {\bf C}} with {\mathrm{Re} s > 1}, we have

    \displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (3)

    where the product is absolutely convergent (and locally uniform in {s}) and is over the prime numbers {p = 2, 3, 5, \dots}.
  • (ii) (Trivial zero-free region) {\zeta(s)} has no zeroes in the region {\{s: \mathrm{Re}(s) > 1 \}}.
  • (iii) (Meromorphic continuation) {\zeta} has a unique meromorphic continuation to the complex plane (which by abuse of notation we also call {\zeta}), with a simple pole at {s=1} and no other poles. Furthermore, the Riemann xi function

    \displaystyle  \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s) \ \ \ \ \ (4)

    is an entire function of order {1} (after removing all singularities). The function {(s-1) \zeta(s)} is an entire function of order one after removing the singularity at {s=1}.
  • (iv) (Functional equation) After applying the meromorphic continuation from (iii), we have

    \displaystyle  \zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s) \ \ \ \ \ (5)

    for all {s \in {\bf C}} (excluding poles). Equivalently, we have

    \displaystyle  \xi(s) = \xi(1-s) \ \ \ \ \ (6)

    for all {s \in {\bf C}}. (The equivalence between (5) and (6) is a routine consequence of the Euler reflection formula and the Legendre duplication formula; see Exercises 26 and 31 of Notes 1.)

Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.

The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number {n} is uniquely representable as a product {n = \prod_p p^{a_p}} over primes, where the {a_p} are natural numbers, all but finitely many of which are zero. Writing this representation as {\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}}, we see that

\displaystyle  \sum_{n \in S_{x,m}} \frac{1}{n^s} = \prod_{p \leq x} \sum_{a=0}^m \frac{1}{p^{as}}

whenever {x \geq 1}, {m \geq 0}, and {S_{x,m}} consists of all the natural numbers of the form {n = \prod_{p \leq x} p^{a_p}} for some {a_p \leq m}. Sending {m} and {x} to infinity, we conclude from monotone convergence and the geometric series formula that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \sum_{a=0}^\infty \frac{1}{p^{as}} =\prod_p (1 - \frac{1}{p^s})^{-1}

whenever {s>1} is real, and then from dominated convergence we see that the same formula holds for complex {s} with {\mathrm{Re} s > 1} as well. Local uniform convergence then follows from the product form of the Weierstrass {M}-test (Exercise 19 of Notes 1).

The claim (ii) is immediate from (i) since the Euler product {\prod_p (1-\frac{1}{p^s})^{-1}} is absolutely convergent and all terms are non-zero. \Box
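One can watch the Euler product formula (3) converge numerically at {s=2}, where the common value is {\pi^2/6} by (2). The following is a rough sketch only; the truncation points and the helper names `primes_up_to`, `zeta_partial` and `euler_product` are illustrative assumptions.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # cross out p^2, p^2 + p, ... with a slice of matching length
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def zeta_partial(s, terms):
    """Partial sum of the Dirichlet series (1)."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))

def euler_product(s, limit):
    """Partial Euler product over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** (-s)
    return product

series_value = zeta_partial(2.0, 100000)
product_value = euler_product(2.0, 100000)
exact = math.pi ** 2 / 6
print(series_value, product_value, exact)
```

Both truncations converge only polynomially fast in the cutoff, so one should not expect more than four or five correct digits from this experiment.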

We remark that by sending {s} to {1} in Theorem 3(i) we conclude that

\displaystyle  \sum_{n=1}^\infty \frac{1}{n} = \prod_p (1-\frac{1}{p})^{-1}

and from the divergence of the harmonic series we then conclude Euler’s theorem {\sum_p \frac{1}{p} = \infty}. This can be viewed as a weak version of the prime number theorem, and already illustrates the potential applicability of the Riemann zeta function to control the distribution of the prime numbers.

The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers {n} occurring in the sum (1), or as a consequence of various integral representations of {\zeta} (or slight modifications thereof). We will focus in this set of notes on a particular representation of {\zeta} as essentially the Mellin transform of the theta function {\theta} that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)

Henceforth we work with the meromorphic continuation of {\zeta}. The functional equation (iv), when combined with special values of {\zeta} such as (2), gives some additional values of {\zeta} outside of its initial domain {\{s: \mathrm{Re} s > 1\}}, most famously

\displaystyle  \zeta(-1) = -\frac{1}{12}.

If one formally compares this formula with (1), one arrives at the infamous identity

\displaystyle  1 + 2 + 3 + \dots = -\frac{1}{12}

although this identity has to be interpreted in a suitable non-classical sense in order for it to be rigorous (see this previous blog post for further discussion).
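The value {\zeta(-1) = -\frac{1}{12}} can be read off from the functional equation (5) with {s=-1} and the special value {\zeta(2) = \frac{\pi^2}{6}} from (2). Here is a short numerical sketch; the helper name `zeta_via_functional_equation` is hypothetical, and the value of {\zeta(1-s)} is supplied by hand rather than computed.

```python
import math

def zeta_via_functional_equation(s, zeta_one_minus_s):
    """Right-hand side of (5), given a known value of zeta(1 - s)."""
    return (2.0 ** s * math.pi ** (s - 1.0)
            * math.sin(math.pi * s / 2.0)
            * math.gamma(1.0 - s)
            * zeta_one_minus_s)

zeta_minus_1 = zeta_via_functional_equation(-1.0, math.pi ** 2 / 6)
zeta_minus_3 = zeta_via_functional_equation(-3.0, math.pi ** 4 / 90)
print(zeta_minus_1, zeta_minus_3)
```

The second line applies the same recipe with {s=-3} and {\zeta(4) = \frac{\pi^4}{90}}, which gives {\zeta(-3) = \frac{1}{120}}.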

From Theorem 3 and the non-vanishing nature of {\Gamma}, we see that {\zeta} has simple zeroes (known as trivial zeroes) at the negative even integers {-2, -4, \dots}, and all other zeroes (the non-trivial zeroes) inside the critical strip {\{ s \in {\bf C}: 0 \leq \mathrm{Re} s \leq 1 \}}. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if {\rho} is a non-trivial zero then so is {1-\rho}; also, we have the identity

\displaystyle  \zeta(s) = \overline{\zeta(\overline{s})} \ \ \ \ \ (7)

for all {s>1} by (1), hence for all {s} (except the pole at {s=1}) by meromorphic continuation. Thus if {\rho} is a non-trivial zero then so is {\overline{\rho}}. We conclude that the set of non-trivial zeroes is symmetric by reflection by both the real axis and the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}. We have the following infamous conjecture:

Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of {\zeta} lie on the critical line {\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}.

This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.

Return now to the original region where {\mathrm{Re} s > 1}. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that

\displaystyle  -\log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s})

for suitable branches of the complex logarithm, and then on taking derivatives (using for instance the generalised Cauchy integral formula and Fubini’s theorem to justify the interchange of summation and derivative) we see that

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\ln p/p^s}{1 - \frac{1}{p^s}}.

From the geometric series formula we have

\displaystyle  \frac{\ln p/p^s}{1 - \frac{1}{p^s}} = \sum_{j=1}^\infty \frac{\ln p}{p^{js}}

and so (by another application of Fubini’s theorem) we have the identity

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}, \ \ \ \ \ (8)

for {\mathrm{Re} s > 1}, where the von Mangoldt function {\Lambda(n)} is defined to equal {\Lambda(n) = \ln p} whenever {n = p^j} is a power {p^j} of a prime {p} for some {j=1,2,\dots}, and {\Lambda(n)=0} otherwise. The contribution of the higher prime powers {p^2, p^3, \dots} is negligible in practice, and as a first approximation one can think of the von Mangoldt function as the indicator function of the primes, weighted by the logarithm function.

The series {\sum_{n=1}^\infty \frac{1}{n^s}} and {\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}} that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples:

Exercise 5 (Standard Dirichlet series) Let {s} be a complex number with {\mathrm{Re} s > 1}.
  • (i) Show that {-\zeta'(s) = \sum_{n=1}^\infty \frac{\ln n}{n^s}}.
  • (ii) Show that {\zeta^2(s) = \sum_{n=1}^\infty \frac{\tau(n)}{n^s}}, where {\tau(n) := \sum_{d|n} 1} is the divisor function of {n} (the number of divisors of {n}).
  • (iii) Show that {\frac{1}{\zeta(s)} = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}}, where {\mu(n)} is the Möbius function, defined to equal {(-1)^k} when {n} is the product of {k} distinct primes for some {k \geq 0}, and {0} otherwise.
  • (iv) Show that {\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\lambda(n)}{n^s}}, where {\lambda(n)} is the Liouville function, defined to equal {(-1)^k} when {n} is the product of {k} (not necessarily distinct) primes for some {k \geq 0}.
  • (v) Show that {\log \zeta(s) = \sum_{n=1}^\infty \frac{\Lambda(n)/\ln n}{n^s}}, where {\log \zeta} is the holomorphic branch of the logarithm that is real for {s>1}, and with the convention that {\Lambda(n)/\ln n} vanishes for {n=1}.
  • (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function {\Lambda: {\bf N} \rightarrow {\bf R}} such that

    \displaystyle  \ln n = \sum_{d|n} \Lambda(d)

    for every positive integer {n}. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic.
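As a sanity check for part (vi) (not a solution to the exercise), the identity {\ln n = \sum_{d|n} \Lambda(d)} can be verified numerically for small {n}. The function name `von_mangoldt` and the range of {n} tested are illustrative.

```python
import math

def von_mangoldt(n):
    """ln p if n = p^j for a prime p and some j >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p                     # strip out every factor of p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)                      # no factor up to sqrt(n): n is prime

def log_via_divisors(n):
    """sum_{d | n} Lambda(d)."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

max_error = max(abs(log_via_divisors(n) - math.log(n)) for n in range(1, 201))
print(max_error)
```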

Given the appearance of the von Mangoldt function {\Lambda}, it is natural to reformulate the prime number theorem in terms of this function:

Theorem 6 (Prime number theorem, von Mangoldt form) One has

\displaystyle  \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1

(or in asymptotic notation, {\sum_{n\leq x} \Lambda(n) = x + o(x)} as {x \rightarrow \infty}).

Let us see how Theorem 6 implies Theorem 1. Firstly, for any {x \geq 2}, we can write

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + \sum_{j=2}^\infty \sum_{p \leq x^{1/j}} \ln p.

The sum {\sum_{p \leq x^{1/j}} \ln p} is non-zero for only {O(\ln x)} values of {j}, and is of size {O( x^{1/2} \ln x )}, thus

\displaystyle  \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + O( x^{1/2} \ln^2 x ).

Since {x^{1/2} \ln^2 x = o(x)}, we conclude from Theorem 6 that

\displaystyle  \sum_{p \leq x} \ln p = x + o(x)

as {x \rightarrow \infty}. Next, observe from the fundamental theorem of calculus that

\displaystyle  \frac{1}{\ln p} - \frac{1}{\ln x} = \int_p^x \frac{1}{\ln^2 y} \frac{dy}{y}.

Multiplying by {\ln p} and summing over all primes {p \leq x}, we conclude that

\displaystyle  \pi(x) - \frac{\sum_{p \leq x} \ln p}{\ln x} = \int_2^x \sum_{p \leq y} \ln p \frac{1}{\ln^2 y} \frac{dy}{y}.

From Theorem 6 we certainly have {\sum_{p \leq y} \ln p = O(y)}, thus

\displaystyle  \pi(x) - \frac{x + o(x)}{\ln x} = O( \int_2^x \frac{dy}{\ln^2 y} ).

By splitting the integral into the ranges {2 \leq y \leq \sqrt{x}} and {\sqrt{x} < y \leq x} we see that the right-hand side is {o(x/\ln x)}, and Theorem 1 follows.
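To get a feel for the quantities in this argument, here is a rough numerical sketch at {x = 10^5} (an arbitrary choice). It computes {\sum_{n \leq x} \Lambda(n)}, the sum {\sum_{p \leq x} \ln p}, and {\pi(x)}; the helper names `primes_up_to` and `chebyshev_psi` are illustrative.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def chebyshev_psi(x, primes):
    """sum_{n <= x} Lambda(n): each prime p contributes ln p
    once for every power p^j <= x."""
    total = 0.0
    for p in primes:
        if p > x:
            break
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

x = 100000
primes = primes_up_to(x)
psi = chebyshev_psi(x, primes)
theta = sum(math.log(p) for p in primes)    # sum of ln p over primes p <= x
prime_count = len(primes)                   # pi(x)
print(psi / x, theta / x, prime_count * math.log(x) / x)
```

At this height the first two ratios are already close to {1}, while {\pi(x) \ln x / x} is still noticeably larger than {1}, consistent with the remark that the logarithmic integral is a more precise approximation to {\pi(x)} than {\frac{x}{\ln x}}.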

Exercise 7 Show that Theorem 1 conversely implies Theorem 6.

The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between {\zeta} and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:

Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer {x > 1}, we have

\displaystyle  \sum_{n \leq x} \Lambda(n) = x - \lim_{T \rightarrow \infty} \sum_{\rho: |\hbox{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln( 1 - x^{-2} )

where {\rho} ranges over the non-trivial zeroes of {\zeta} with imaginary part in {[-T,T]}. Furthermore, the convergence of the limit is locally uniform in {x}.

Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out, but the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes {\rho} of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in {x}, one is led to the (quite nonrigorous) approximation

\displaystyle  \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (9)

or (writing {\rho = \sigma+i\gamma})

\displaystyle  \Lambda(n) \approx 1 - \sum_{\sigma+i\gamma} \frac{n^{i\gamma}}{n^{1-\sigma}}.

Thus we see that each zero {\rho = \sigma + i\gamma} induces an oscillation in the von Mangoldt function, with {\gamma} controlling the frequency of the oscillation and {\sigma} the rate at which the oscillation dies out as {n \rightarrow \infty}. This relationship is sometimes known informally as “the music of the primes”.

Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:

Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of {\zeta} on the line {\{ 1+it: t \in {\bf R} \}}.

It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.

Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there was a zero at {1+it}, then there would also be a different zero at {1-it} (note {t} cannot vanish due to the pole at {s=1}), and then the approximation (9) becomes

\displaystyle  \Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2 \cos(t \log n) + \dots.

But the expression {1 - 2 \cos(t \log n)} can be negative for large regions of the variable {n}, whereas {\Lambda(n)} is always non-negative. This conflict eventually leads to a contradiction, but it is not immediately obvious how to make this argument rigorous. We will present here the classical approach to doing so using a trigonometric identity of Mertens.
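As a quick sanity check on the heuristic (and nothing more), one can tabulate the main terms {1 - 2\cos(t \log n)} for a sample value of {t} and see how often they are negative. The value `t = 3.0` and the cutoff `N` are arbitrary choices made for this sketch.

```python
import math

def heuristic(n, t):
    """The main terms 1 - 2 cos(t log n) of the heuristic approximation."""
    return 1.0 - 2.0 * math.cos(t * math.log(n))

t = 3.0      # arbitrary non-zero value, chosen only for illustration
N = 10_000   # arbitrary cutoff
values = [heuristic(n, t) for n in range(1, N + 1)]

# Fraction of n <= N at which the main terms are negative.
negative_fraction = sum(v < 0 for v in values) / N
print(f"fraction of n <= {N} with 1 - 2cos(t log n) < 0: {negative_fraction:.3f}")
```

A positive fraction of negative values is exactly the tension with {\Lambda(n) \geq 0} that the rigorous argument has to exploit.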

In fact, Theorem 9 is basically equivalent to the prime number theorem:

Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real {t}, show that

\displaystyle  -\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} = o( \frac{1}{\sigma-1})

as {\sigma \rightarrow 1^+}, where {o( \frac{1}{\sigma-1})} denotes a quantity that goes to zero as {\sigma \rightarrow 1^+} after being multiplied by {\sigma-1}. Use this to derive Theorem 9.

This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.

This post is only intended as the briefest of introductions to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).

Read the rest of this entry »

Previous set of notes: Notes 2. Next set of notes: Notes 4.

On the real line, the quintessential examples of a periodic function are the (normalised) sine and cosine functions {\sin(2\pi x)}, {\cos(2\pi x)}, which are {1}-periodic in the sense that

\displaystyle  \sin(2\pi(x+1)) = \sin(2\pi x); \quad \cos(2\pi (x+1)) = \cos(2\pi x).

By taking various polynomial combinations of {\sin(2\pi x)} and {\cos(2\pi x)} we obtain more general trigonometric polynomials that are {1}-periodic; and the theory of Fourier series tells us that all other {1}-periodic functions (with reasonable integrability conditions) can be approximated in various senses by such polynomial combinations. Using Euler’s identity, one can use {e^{2\pi ix}} and {e^{-2\pi ix}} in place of {\sin(2\pi x)} and {\cos(2\pi x)} as the basic generating functions here, provided of course one is willing to use complex coefficients instead of real ones. Of course, by rescaling one can also make similar statements for other periods than {1}.

{1}-periodic functions {f: {\bf R} \rightarrow {\bf C}} can also be identified (by abuse of notation) with functions {f: {\bf R}/{\bf Z} \rightarrow {\bf C}} on the quotient space {{\bf R}/{\bf Z}} (known as the additive {1}-torus or additive unit circle), or with functions {f: [0,1] \rightarrow {\bf C}} on the fundamental domain (up to boundary) {[0,1]} of that quotient space with the periodic boundary condition {f(0)=f(1)}. The map {x \mapsto (\cos(2\pi x), \sin(2\pi x))} also identifies the additive unit circle {{\bf R}/{\bf Z}} with the geometric unit circle {S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2=1\} \subset {\bf R}^2}, thanks in large part to the fundamental trigonometric identity {\cos^2 x + \sin^2 x = 1}; this can also be identified with the multiplicative unit circle {S^1 = \{ z \in {\bf C}: |z|=1 \}}. (Usually by abuse of notation we refer to all of these three sets simultaneously as the “unit circle”.) Trigonometric polynomials on the additive unit circle then correspond to ordinary polynomials of the real coordinates {x,y} of the geometric unit circle, or Laurent polynomials of the complex variable {z}.

What about periodic functions on the complex plane? We can start with singly periodic functions {f: {\bf C} \rightarrow {\bf C}} which obey a periodicity relationship {f(z+\omega)=f(z)} for all {z} in the domain and some period {\omega \in {\bf C} \backslash \{0\}}; such functions can also be viewed as functions on the “additive cylinder” {\omega {\bf Z} \backslash {\bf C}} (or equivalently {{\bf C} / \omega {\bf Z}}). We can rescale {\omega=1} as before. For holomorphic functions, we have the following characterisations:

Proposition 1 (Description of singly periodic holomorphic functions)
  • (i) Every {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} has an absolutely convergent expansion

    \displaystyle  f(z) = \sum_{n=-\infty}^\infty a_n e^{2\pi i nz} = \sum_{n=-\infty}^\infty a_n q^n \ \ \ \ \ (1)

    where {q} is the nome {q := e^{2\pi i z}}, and the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} = \limsup_{n \rightarrow +\infty} |a_{-n}|^{1/n} = 0. \ \ \ \ \ (2)

    Conversely, every doubly infinite sequence {(a_n)_{n \in {\bf Z}}} of coefficients obeying (2) gives rise to a {1}-periodic entire function {f: {\bf C} \rightarrow {\bf C}} via the formula (1).
  • (ii) Every bounded {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} on the upper half-plane {\{ z: \mathrm{Im}(z) > 0\}} has an expansion

    \displaystyle  f(z) = \sum_{n=0}^\infty a_n e^{2\pi i nz} = \sum_{n=0}^\infty a_n q^n \ \ \ \ \ (3)

    where the {a_n} are complex coefficients such that

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq 1. \ \ \ \ \ (4)

    Conversely, every infinite sequence {(a_n)_{n=0}^\infty} obeying (4) gives rise, via the formula (3), to a {1}-periodic holomorphic function {f: {\bf H} \rightarrow {\bf C}} which is bounded away from the real axis (i.e., bounded on {\{ z: \mathrm{Im}(z) \geq \varepsilon\}} for every {\varepsilon > 0}).
In both cases, the coefficients {a_n} can be recovered from {f} by the Fourier inversion formula

\displaystyle  a_n = \int_{\gamma_{z_0 \rightarrow z_0+1}} f(z) e^{-2\pi i nz}\ dz \ \ \ \ \ (5)

for any {z_0} in {{\bf C}} (in case (i)) or {{\bf H}} (in case (ii)).

Proof: If {f: {\bf C} \rightarrow {\bf C}} is {1}-periodic, then it can be expressed as {f(z) = F(q) = F(e^{2\pi i z})} for some function {F: {\bf C} \backslash \{0\} \rightarrow {\bf C}} on the “multiplicative cylinder” {{\bf C} \backslash \{0\}}, since the fibres of the map {z \mapsto e^{2\pi i z}} are cosets of the integers {{\bf Z}}, on which {f} is constant by hypothesis. As the map {z \mapsto e^{2\pi i z}} is a covering map from {{\bf C}} to {{\bf C} \backslash \{0\}}, we see that {F} will be holomorphic if and only if {f} is. Thus {F} must have a Laurent series expansion {F(q) = \sum_{n=-\infty}^\infty a_n q^n} with coefficients {a_n} obeying (2), which gives (1), and the inversion formula (5) follows from the usual contour integration formula for Laurent series coefficients. The converse direction to (i) also follows by reversing the above arguments.

For part (ii), we observe that the map {z \mapsto e^{2\pi i z}} is also a covering map from {{\bf H}} to the punctured disk {D(0,1) \backslash \{0\}}, so we can argue as before except that now {F} is a bounded holomorphic function on the punctured disk. By the Riemann singularity removal theorem (Exercise 35 of 246A Notes 3) {F} extends to be holomorphic on all of {D(0,1)}, and thus has a Taylor expansion {F(q) = \sum_{n=0}^\infty a_n q^n} for some coefficients {a_n} obeying (4). The argument now proceeds as with part (i). \Box
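The inversion formula (5) is easy to test numerically. The sketch below takes a {1}-periodic entire function with known coefficients (the specific function, the base point `z0` and the helper name `fourier_coefficient` are all choices made for this illustration) and recovers the coefficients from a Riemann sum along the horizontal segment from {z_0} to {z_0+1}.

```python
import cmath

def f(z):
    """A 1-periodic entire function with a_{-1} = 2, a_0 = 3, a_2 = 5 (others zero)."""
    q = cmath.exp(2j * cmath.pi * z)   # the nome q = e^{2 pi i z}
    return 2 / q + 3 + 5 * q ** 2

def fourier_coefficient(func, n, z0, samples=64):
    """Approximate (5) by a Riemann sum along the segment from z0 to z0 + 1."""
    total = 0j
    for k in range(samples):
        z = z0 + k / samples
        total += func(z) * cmath.exp(-2j * cmath.pi * n * z)
    return total / samples

z0 = 0.3 + 0.1j   # arbitrary base point; (5) asserts the answer is independent of it
coeffs = {n: fourier_coefficient(f, n, z0) for n in range(-2, 4)}
for n, a in coeffs.items():
    print(n, a)
```

For a trigonometric polynomial such as this one, an equally spaced Riemann sum with more sample points than the degree is exact up to rounding, which is why so few samples suffice here.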

The additive cylinder {{\bf Z} \backslash {\bf C}} and the multiplicative cylinder {{\bf C} \backslash \{0\}} can both be identified (on the level of smooth manifolds, at least) with the geometric cylinder {\{ (x,y,z) \in {\bf R}^3: x^2+y^2=1\}}, but we will not use this identification here.

Now let us turn attention to doubly periodic functions of a complex variable {z}, that is to say functions {f} that obey two periodicity relations

\displaystyle  f(z+\omega_1) = f(z); \quad f(z+\omega_2) = f(z)

for all {z \in {\bf C}} and some periods {\omega_1,\omega_2 \in {\bf C}}, which to avoid degeneracies we will assume to be linearly independent over the reals (thus {\omega_1,\omega_2} are non-zero and the ratio {\omega_2/\omega_1} is not real). One can rescale {\omega_1,\omega_2} by a common scaling factor {\lambda \in {\bf C} \backslash \{0\}} to normalise either {\omega_1=1} or {\omega_2=1}, but one of course cannot simultaneously normalise both parameters in this fashion. As in the singly periodic case, such functions can also be identified with functions on the additive {2}-torus {\Lambda \backslash {\bf C}}, where {\Lambda} is the lattice {\Lambda := \omega_1 {\bf Z} + \omega_2 {\bf Z}}, or with functions {f} on the solid parallelogram bounded by the contour {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} (a fundamental domain up to boundary for that torus), obeying the boundary periodicity conditions

\displaystyle  f(z+\omega_1) = f(z)

for {z} in the edge {\gamma_{\omega_2 \rightarrow 0}}, and

\displaystyle  f(z+\omega_2) = f(z)

for {z} in the edge {\gamma_{0 \rightarrow \omega_1}}.

Within the world of holomorphic functions, the collection of doubly periodic functions is boring:

Proposition 2 Let {f: {\bf C} \rightarrow {\bf C}} be an entire doubly periodic function (with periods {\omega_1,\omega_2} linearly independent over {{\bf R}}). Then {f} is constant.

In the language of Riemann surfaces, this proposition asserts that the torus {\Lambda \backslash {\bf C}} is a non-hyperbolic Riemann surface; it cannot be holomorphically mapped non-trivially into a bounded subset of the complex plane.

Proof: The fundamental domain (up to boundary) enclosed by {\gamma_{0 \rightarrow \omega_1 \rightarrow \omega_1+\omega_2 \rightarrow \omega_2 \rightarrow 0}} is compact, hence {f} is bounded on this domain, hence bounded on all of {{\bf C}} by double periodicity. The claim now follows from Liouville’s theorem. (One could alternatively have argued here using the compactness of the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}}.) \Box

To obtain more interesting examples of doubly periodic functions, one must therefore turn to the world of meromorphic functions – or equivalently, holomorphic functions into the Riemann sphere {{\bf C} \cup \{\infty\}}. As it turns out, a particularly fundamental example of such a function is the Weierstrass elliptic function

\displaystyle  \wp(z) := \frac{1}{z^2} + \sum_{z_0 \in \Lambda \backslash 0} \left( \frac{1}{(z-z_0)^2} - \frac{1}{z_0^2} \right) \ \ \ \ \ (6)

which plays a role in doubly periodic functions analogous to the role of {x \mapsto \cos(2\pi x)} for {1}-periodic real functions. This function will have a double pole at the origin {0}, and more generally at all other points on the lattice {\Lambda}, but no other poles. The derivative

\displaystyle  \wp'(z) = -2 \sum_{z_0 \in \Lambda} \frac{1}{(z-z_0)^3} \ \ \ \ \ (7)

of the Weierstrass function is another doubly periodic meromorphic function, now with a triple pole at every point of {\Lambda}, and plays a role analogous to {x \mapsto \sin(2\pi x)}. Remarkably, all the other doubly periodic meromorphic functions with these periods will turn out to be rational combinations of {\wp} and {\wp'}; furthermore, in analogy with the identity {\cos^2 x+ \sin^2 x = 1}, one has an identity of the form

\displaystyle  \wp'(z)^2 = 4 \wp(z)^3 - g_2 \wp(z) - g_3 \ \ \ \ \ (8)

for all {z \in {\bf C}} (avoiding poles) and some complex numbers {g_2,g_3} that depend on the lattice {\Lambda}. Indeed, much as the map {x \mapsto (\cos 2\pi x, \sin 2\pi x)} creates a diffeomorphism between the additive unit circle {{\bf R}/{\bf Z}} and the geometric unit circle {\{ (x,y) \in{\bf R}^2: x^2+y^2=1\}}, the map {z \mapsto (\wp(z), \wp'(z))} turns out to be a complex diffeomorphism between the torus {(\omega_1 {\bf Z} + \omega_2 {\bf Z}) \backslash {\bf C}} and the elliptic curve

\displaystyle  \{ (z, w) \in {\bf C}^2: w^2 = 4z^3 - g_2 z - g_3 \} \cup \{\infty\}

with the convention that {(\wp,\wp')} maps the origin {\omega_1 {\bf Z} + \omega_2 {\bf Z}} of the torus to the point {\infty} at infinity. (Indeed, one can view elliptic curves as “multiplicative tori”, and both the additive and multiplicative tori can be identified as smooth manifolds with the more familiar geometric torus, but we will not use such an identification here.) This fundamental identification between elliptic curves and tori motivates many of the further remarkable properties of elliptic curves; for instance, the fact that tori are obviously an abelian group gives rise to an abelian group law on elliptic curves (and this law can be interpreted as an analogue of the trigonometric sum identities for {\wp, \wp'}). The description of the various meromorphic functions on the torus also helps motivate the more general Riemann-Roch theorem that is a fundamental law governing meromorphic functions on other compact Riemann surfaces (and is discussed further in these 246C notes).

So far we have focused on studying a single torus {\Lambda \backslash {\bf C}}. However, another important mathematical object of study is the space of all such tori, modulo isomorphism; this is a basic example of a moduli space, known as the (classical, level one) modular curve {X_0(1)}. This curve can be described in a number of ways. On the one hand, it can be viewed as the upper half-plane {{\bf H} = \{ z: \mathrm{Im}(z) > 0 \}} quotiented out by the discrete group {SL_2({\bf Z})}; on the other hand, by using the {j}-invariant, it can be identified with the complex plane {{\bf C}}; alternatively, one can compactify the modular curve and identify this compactification with the Riemann sphere {{\bf C} \cup \{\infty\}}. (This identification, by the way, produces a very short proof of the little and great Picard theorems, which we proved in 246A Notes 4.)

Functions on the modular curve (such as the {j}-invariant) can be viewed as {SL_2({\bf Z})}-invariant functions on {{\bf H}}, and include the important class of modular functions; they naturally generalise to the larger class of (weakly) modular forms, which are functions on {{\bf H}} which transform in a very specific way under {SL_2({\bf Z})}-action, and which are ubiquitous throughout mathematics, and particularly in number theory. Basic examples of modular forms include the Eisenstein series, which are also the Laurent coefficients of the Weierstrass elliptic functions {\wp}. More number theoretic examples of modular forms include (suitable powers of) theta functions {\theta}, and the modular discriminant {\Delta}. Modular forms are {1}-periodic functions on the half-plane, and hence by Proposition 1 come with Fourier coefficients {a_n}; these coefficients often turn out to encode a surprising amount of number-theoretic information; a dramatic example of this is the famous modularity theorem, (a special case of which was) used amongst other things to establish Fermat’s last theorem. Modular forms can be generalised to other discrete groups than {SL_2({\bf Z})} (such as congruence groups) and to other domains than the half-plane {{\bf H}}, leading to the important larger class of automorphic forms, which are of major importance in number theory and representation theory, but which are well outside the scope of this course to discuss.
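Returning for a moment to the differential equation (8) for {\wp}: it can be sanity-checked numerically by truncating the lattice sums (6) and (7). The sketch below is only an illustration under stated assumptions: the periods, the test point `z` and the truncation level are arbitrary, and the formulas {g_2 = 60 \sum_{z_0 \in \Lambda \backslash 0} z_0^{-4}}, {g_3 = 140 \sum_{z_0 \in \Lambda \backslash 0} z_0^{-6}} are the standard expressions for the invariants, which are not stated in the text above.

```python
def lattice_points(omega1, omega2, N):
    """Non-zero lattice points m*omega1 + n*omega2 with |m|, |n| <= N."""
    return [m * omega1 + n * omega2
            for m in range(-N, N + 1)
            for n in range(-N, N + 1)
            if (m, n) != (0, 0)]

def wp(z, pts):
    """Symmetric truncation of the series (6)."""
    return 1 / z**2 + sum(1 / (z - w)**2 - 1 / w**2 for w in pts)

def wp_prime(z, pts):
    """Symmetric truncation of the series (7)."""
    return -2 / z**3 - 2 * sum(1 / (z - w)**3 for w in pts)

omega1, omega2 = 1.0, 0.3 + 1.1j     # illustrative periods, independent over the reals
pts = lattice_points(omega1, omega2, 40)

# Standard formulas for the invariants (an assumption of this sketch).
g2 = 60 * sum(w**-4 for w in pts)
g3 = 140 * sum(w**-6 for w in pts)

z = 0.2 + 0.1j                       # arbitrary test point away from the lattice
lhs = wp_prime(z, pts) ** 2
rhs = 4 * wp(z, pts) ** 3 - g2 * wp(z, pts) - g3
relative_error = abs(lhs - rhs) / abs(lhs)
print(relative_error)
```

The two sides agree only up to the truncation error of the lattice sums, so the relative error should be small but not at the level of machine precision.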


Previous set of notes: Notes 1. Next set of notes: Notes 3.

In Exercise 5 (and Lemma 1) of 246A Notes 4 we already observed some links between complex analysis on the disk (or annulus) and Fourier series on the unit circle:

  • (i) Functions {f} that are holomorphic on a disk {\{ |z| < R \}} are expressed by a convergent Fourier series (and also Taylor series) {f(re^{i\theta}) = \sum_{n=0}^\infty r^n a_n e^{in\theta}} for {0 \leq r < R} (so in particular {a_n = \frac{1}{n!} f^{(n)}(0)}), where

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq \frac{1}{R}; \ \ \ \ \ (1)

    conversely, every infinite sequence {(a_n)_{n=0}^\infty} of coefficients obeying (1) arises from such a function {f}.
  • (ii) Functions {f} that are holomorphic on an annulus {\{ r_- < |z| < r_+ \}} are expressed by a convergent Fourier series (and also Laurent series) {f(re^{i\theta}) = \sum_{n=-\infty}^\infty r^n a_n e^{in\theta}}, where

    \displaystyle  \limsup_{n \rightarrow +\infty} |a_n|^{1/n} \leq \frac{1}{r_+}; \limsup_{n \rightarrow -\infty} |a_n|^{1/|n|} \leq r_-; \ \ \ \ \ (2)

    conversely, every doubly infinite sequence {(a_n)_{n=-\infty}^\infty} of coefficients obeying (2) arises from such a function {f}.
  • (iii) In the situation of (ii), there is a unique decomposition {f = f_1 + f_2} where {f_1} extends holomorphically to {\{ z: |z| < r_+\}}, and {f_2} extends holomorphically to {\{ z: |z| > r_-\}} and goes to zero at infinity; these two components are given by the formulae

    \displaystyle  f_1(z) = \sum_{n=0}^\infty a_n z^n = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

    where {\gamma} is any anticlockwise contour in {\{ z: |z| < r_+\}} enclosing {z}, and

    \displaystyle  f_2(z) = \sum_{n=-\infty}^{-1} a_n z^n = - \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{w-z}\ dw

    where {\gamma} is any anticlockwise contour in {\{ z: |z| > r_-\}} enclosing {0} but not {z}.

This connection lets us interpret various facts about Fourier series through the lens of complex analysis, at least for some special classes of Fourier series. For instance, the Fourier inversion formula {a_n = \frac{1}{2\pi} \int_0^{2\pi} f(e^{i\theta}) e^{-in\theta}\ d\theta} becomes the Cauchy-type formula for the Laurent or Taylor coefficients of {f}, in the event that the coefficients are doubly infinite and obey (2) for some {r_- < 1 < r_+}, or singly infinite and obey (1) for some {R > 1}.
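As a small illustration of this dictionary, the following sketch compares the Fourier coefficients of {\theta \mapsto f(e^{i\theta})} with the Taylor coefficients of {f} for the sample function {f(z) = 1/(2-z)}, which is holomorphic on {\{|z| < 2\}} with {a_n = 2^{-(n+1)}}. The choice of function and the helper name `fourier_coefficient` are ours.

```python
import cmath

def f(z):
    """Holomorphic on the disk |z| < 2, with Taylor coefficients a_n = 2^{-(n+1)}."""
    return 1 / (2 - z)

def fourier_coefficient(func, n, samples=128):
    """Riemann sum for (1/2pi) * integral over [0, 2pi] of func(e^{i theta}) e^{-i n theta}."""
    total = 0j
    for k in range(samples):
        theta = 2 * cmath.pi * k / samples
        total += func(cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return total / samples

numeric = [fourier_coefficient(f, n) for n in range(6)]
exact = [2.0 ** -(n + 1) for n in range(6)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(max_error)
```

Because the coefficients decay geometrically (here (1) holds with {R = 2 > 1}), the equally spaced Riemann sum converges extremely quickly in the number of samples.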

It turns out that there are similar links between complex analysis on a half-plane (or strip) and Fourier integrals on the real line, which we will explore in these notes.

We first fix a normalisation for the Fourier transform. If {f \in L^1({\bf R})} is an absolutely integrable function on the real line, we define its Fourier transform {\hat f: {\bf R} \rightarrow {\bf C}} by the formula

\displaystyle  \hat f(\xi) := \int_{\bf R} f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (3)

From the dominated convergence theorem {\hat f} will be a bounded continuous function; from the Riemann-Lebesgue lemma it also decays to zero as {\xi \rightarrow \pm \infty}. My choice to place the {2\pi} in the exponent is a personal preference (it is slightly more convenient for some harmonic analysis formulae such as the identities (4), (5), (6) below), though in the complex analysis and PDE literature there are also some slight advantages in omitting this factor. In any event it is not difficult to adapt the discussion in these notes for other choices of normalisation. It is of interest to extend the Fourier transform beyond the {L^1({\bf R})} class into other function spaces, such as {L^2({\bf R})} or the space of tempered distributions, but we will not pursue this direction here; see for instance these lecture notes of mine for a treatment.

Exercise 1 (Fourier transform of Gaussian) If {a} is a complex number with {\mathrm{Re} a>0} and {f} is the Gaussian function {f(x) := e^{-\pi a x^2}}, show that the Fourier transform {\hat f} is given by the Gaussian {\hat f(\xi) = a^{-1/2} e^{-\pi \xi^2/a}}, where we use the standard branch for {a^{-1/2}}.
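A numerical check is no substitute for the proof requested in the exercise, but it is a useful way to confirm the normalisation (3) and the choice of branch. The sketch below approximates the integral by a Riemann sum; the sample values of `a` and `xi`, the truncation `half_width` and the step size are arbitrary choices.

```python
import cmath

def gaussian_ft_numeric(a, xi, half_width=10.0, step=0.01):
    """Riemann-sum approximation of (3) for f(x) = exp(-pi a x^2)."""
    steps = int(round(2 * half_width / step))
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * step
        total += cmath.exp(-cmath.pi * a * x * x - 2j * cmath.pi * x * xi)
    return total * step

def gaussian_ft_exact(a, xi):
    """The closed form from the exercise; complex ** uses the principal branch."""
    return a ** -0.5 * cmath.exp(-cmath.pi * xi * xi / a)

a = 1 + 0.5j    # any complex number with positive real part
xi = 0.7
error = abs(gaussian_ft_numeric(a, xi) - gaussian_ft_exact(a, xi))
print(error)
```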

The Fourier transform has many remarkable properties. On the one hand, as long as the function {f} is sufficiently “reasonable”, the Fourier transform enjoys a number of very useful identities, such as the Fourier inversion formula

\displaystyle  f(x) = \int_{\bf R} \hat f(\xi) e^{2\pi i x \xi} d\xi, \ \ \ \ \ (4)

the Plancherel identity

\displaystyle  \int_{\bf R} |f(x)|^2\ dx = \int_{\bf R} |\hat f(\xi)|^2\ d\xi, \ \ \ \ \ (5)

and the Poisson summation formula

\displaystyle  \sum_{n \in {\bf Z}} f(n) = \sum_{k \in {\bf Z}} \hat f(k). \ \ \ \ \ (6)
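For instance, the Poisson summation formula (6) can be tested on the Gaussian {f(x) = e^{-\pi a x^2}} with real {a > 0}, using the transform {\hat f(\xi) = a^{-1/2} e^{-\pi \xi^2/a}} from Exercise 1. This is only a numerical illustration; the value `a = 2.0` and the truncation of the sums are arbitrary.

```python
import math

def theta_sum(a, terms=50):
    """Sum over |n| <= terms of exp(-pi a n^2); the omitted tail is negligible."""
    return sum(math.exp(-math.pi * a * n * n) for n in range(-terms, terms + 1))

a = 2.0
lhs = theta_sum(a)                       # sum of f(n)
rhs = theta_sum(1 / a) / math.sqrt(a)    # sum of f_hat(k) = a^{-1/2} exp(-pi k^2 / a)
print(lhs, rhs, abs(lhs - rhs))
```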

On the other hand, the Fourier transform also intertwines various qualitative properties of a function {f} with “dual” qualitative properties of its Fourier transform {\hat f}; in particular, “decay” properties of {f} tend to be associated with “regularity” properties of {\hat f}, and vice versa. For instance, the Fourier transforms of rapidly decreasing functions tend to be smooth. There are complex analysis counterparts of this Fourier dictionary, in which “decay” properties are described in terms of exponentially decaying pointwise bounds, and “regularity” properties are expressed using holomorphicity on various strips, half-planes, or the entire complex plane. The following exercise gives some examples of this:

Exercise 2 (Decay of {f} implies regularity of {\hat f}) Let {f \in L^1({\bf R})} be an absolutely integrable function.
  • (i) If {f} has super-exponential decay in the sense that {f(x) \lesssim_{f,M} e^{-M|x|}} for all {x \in {\bf R}} and {M>0} (that is to say one has {|f(x)| \leq C_{f,M} e^{-M|x|}} for some finite quantity {C_{f,M}} depending only on {f,M}), then {\hat f} extends uniquely to an entire function {\hat f : {\bf C} \rightarrow {\bf C}}. Furthermore, this function continues to be defined by (3).
  • (ii) If {f} is supported on a compact interval {[a,b]} then the entire function {\hat f} from (i) obeys the bounds {\hat f(\xi) \lesssim_f \max( e^{2\pi a \mathrm{Im} \xi}, e^{2\pi b \mathrm{Im} \xi} )} for {\xi \in {\bf C}}. In particular, if {f} is supported in {[-M,M]} then {\hat f(\xi) \lesssim_f e^{2\pi M |\mathrm{Im}(\xi)|}}.
  • (iii) If {f} obeys the bound {f(x) \lesssim_{f,a} e^{-2\pi a|x|}} for all {x \in {\bf R}} and some {a>0}, then {\hat f} extends uniquely to a holomorphic function {\hat f} on the horizontal strip {\{ \xi: |\mathrm{Im} \xi| < a \}}, and obeys the bound {\hat f(\xi) \lesssim_{f,a} \frac{1}{a - |\mathrm{Im}(\xi)|}} in this strip. Furthermore, this function continues to be defined by (3).
  • (iv) If {f} is supported on {[0,+\infty)} (resp. {(-\infty,0]}), then there is a unique continuous extension of {\hat f} to the lower half-plane {\{ \xi: \mathrm{Im} \xi \leq 0\}} (resp. the upper half-plane {\{ \xi: \mathrm{Im} \xi \geq 0 \}}) which is holomorphic in the interior of this half-plane, and such that {\hat f(\xi) \rightarrow 0} uniformly as {\mathrm{Im} \xi \rightarrow -\infty} (resp. {\mathrm{Im} \xi \rightarrow +\infty}). Furthermore, this function continues to be defined by (3).
Hint: to establish holomorphicity in each of these cases, use Morera’s theorem and the Fubini-Tonelli theorem. For uniqueness, use analytic continuation, or (for part (iv)) the Cauchy integral formula.
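To get a feel for part (ii), one can look at the simplest compactly supported example, the indicator function of an interval {[a,b]}, whose transform (3) can be integrated in closed form. The sketch below checks the bound of part (ii) with the explicit constant {b-a} (the {L^1} norm of the indicator) at a few complex frequencies; the interval and the sample frequencies are arbitrary.

```python
import cmath
import math

def indicator_ft(a, b, xi):
    """Fourier transform (3) of the indicator function of [a, b] at complex xi != 0."""
    return (cmath.exp(-2j * cmath.pi * a * xi)
            - cmath.exp(-2j * cmath.pi * b * xi)) / (2j * cmath.pi * xi)

def exponential_bound(a, b, xi):
    """(b - a) * max(e^{2 pi a Im xi}, e^{2 pi b Im xi})."""
    return (b - a) * max(math.exp(2 * math.pi * a * xi.imag),
                         math.exp(2 * math.pi * b * xi.imag))

a, b = -1.0, 2.0
frequencies = [0.5 + 1j, -1.3 - 0.8j, 2 + 0.1j, 0.01j, 3.0 + 0j]
ratios = [abs(indicator_ft(a, b, xi)) / exponential_bound(a, b, xi)
          for xi in frequencies]
print(ratios)   # each ratio should be at most 1
```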

Later in these notes we will give a partial converse to part (ii) of this exercise, known as the Paley-Wiener theorem; there are also partial converses to the other parts of this exercise.

From (3) we observe the following intertwining property between multiplication by an exponential and complex translation: if {\xi_0} is a complex number and {f: {\bf R} \rightarrow {\bf C}} is an absolutely integrable function such that the modulated function {f_{\xi_0}(x) := e^{2\pi i \xi_0 x} f(x)} is also absolutely integrable, then we have the identity

\displaystyle  \widehat{f_{\xi_0}}(\xi) = \hat f(\xi - \xi_0) \ \ \ \ \ (7)

whenever {\xi} is a complex number such that at least one of the two sides of the equation in (7) is well defined. Thus, multiplication of a function by an exponential weight corresponds (formally, at least) to translation of its Fourier transform. By using contour shifting, we will also obtain a dual relationship: under suitable holomorphicity and decay conditions on {f}, translation by a complex shift will correspond to multiplication of the Fourier transform by an exponential weight. It turns out to be possible to exploit this property to derive many Fourier-analytic identities, such as the inversion formula (4) and the Poisson summation formula (6), which we do later in these notes. (The Plancherel theorem can also be established by complex analytic methods, but this requires a little more effort; see Exercise 8.)
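The identity (7) with a genuinely complex {\xi_0} can be checked numerically on the Gaussian {f(x) = e^{-\pi x^2}}, whose transform {\hat f(\xi) = e^{-\pi \xi^2}} (the case {a=1} of Exercise 1) makes sense for complex {\xi}. The values of `xi0` and `xi`, and the quadrature parameters, are arbitrary choices for this sketch.

```python
import cmath

def ft_numeric(func, xi, half_width=10.0, step=0.01):
    """Riemann-sum approximation of the Fourier integral (3)."""
    steps = int(round(2 * half_width / step))
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * step
        total += func(x) * cmath.exp(-2j * cmath.pi * x * xi)
    return total * step

def gaussian(x):
    return cmath.exp(-cmath.pi * x * x)

def gaussian_hat(xi):
    """Transform of the Gaussian above, evaluated at a possibly complex frequency."""
    return cmath.exp(-cmath.pi * xi * xi)

xi0 = 0.4 + 0.3j   # complex modulation parameter

def modulated(x):
    """The modulated function f_{xi0}(x) = e^{2 pi i xi0 x} f(x)."""
    return cmath.exp(2j * cmath.pi * xi0 * x) * gaussian(x)

xi = 0.25
lhs = ft_numeric(modulated, xi)    # transform of the modulated function at xi
rhs = gaussian_hat(xi - xi0)       # translated transform, as in (7)
print(abs(lhs - rhs))
```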

The material in these notes is loosely adapted from Chapter 4 of Stein-Shakarchi’s “Complex Analysis”.


Marcel Filoche, Svitlana Mayboroda, and I have just uploaded to the arXiv our preprint “The effective potential of an {M}-matrix”. This paper explores the analogue of the effective potential of Schrödinger operators {-\Delta + V} provided by the “landscape function” {u}, when one works with a certain type of self-adjoint matrix known as an {M}-matrix instead of a Schrödinger operator.

Suppose one has an eigenfunction

\displaystyle  (-\Delta + V) \phi = E \phi

of a Schrödinger operator {-\Delta+V}, where {\Delta} is the Laplacian on {{\bf R}^d}, {V: {\bf R}^d \rightarrow {\bf R}} is a potential, and {E} is an energy. Where would one expect the eigenfunction {\phi} to be concentrated? If the potential {V} is smooth and slowly varying, the correspondence principle suggests that the eigenfunction {\phi} should be mostly concentrated in the potential energy wells {\{ x: V(x) \leq E \}}, with an exponentially decaying amount of tunnelling between the wells. One way to rigorously establish such an exponential decay is through an argument of Agmon, which we will sketch later in this post, which gives an exponentially decaying upper bound (in an {L^2} sense) on eigenfunctions {\phi} in terms of the distance to the wells {\{ V \leq E \}} with respect to a certain “Agmon metric” on {{\bf R}^d} determined by the potential {V} and energy level {E} (or any upper bound {\overline{E}} on this energy). Similar exponential decay results can also be obtained for discrete Schrödinger matrix models, in which the domain {{\bf R}^d} is replaced with a discrete set such as the lattice {{\bf Z}^d}, and the Laplacian {\Delta} is replaced by a discrete analogue such as a graph Laplacian.

When the potential {V} is very “rough”, as occurs for instance in the random potentials arising in the theory of Anderson localisation, the Agmon bounds, while still true, become very weak because the wells {\{ V \leq E \}} are dispersed in a fairly dense fashion throughout the domain {{\bf R}^d}, and the eigenfunction can tunnel relatively easily between different wells. However, as was first discovered in 2012 by my two coauthors, in these situations one can replace the rough potential {V} by a smoother effective potential {1/u}, with the eigenfunctions typically localised to a single connected component of the effective wells {\{ 1/u \leq E \}}. In fact, a good choice of effective potential comes from locating the landscape function {u}, which is the solution to the equation {(-\Delta + V) u = 1} with reasonable behavior at infinity, and which is non-negative from the maximum principle, and then the reciprocal {1/u} of this landscape function serves as an effective potential.

There are now several explanations for why this particular choice {1/u} is a good effective potential. Perhaps the simplest (as found for instance in this recent paper of Arnold, David, Jerison, and my two coauthors) is the following observation: if {\phi} is an eigenvector for {-\Delta+V} with energy {E}, then {\phi/u} is an eigenvector for {-\frac{1}{u^2} \mathrm{div}(u^2 \nabla \cdot) + \frac{1}{u}} with the same energy {E}, thus the original Schrödinger operator {-\Delta+V} is conjugate to a (variable coefficient, but still in divergence form) Schrödinger operator with potential {1/u} instead of {V}. Closely related to this, we have the integration by parts identity

\displaystyle  \int_{{\bf R}^d} |\nabla f|^2 + V |f|^2\ dx = \int_{{\bf R}^d} u^2 |\nabla(f/u)|^2 + \frac{1}{u} |f|^2\ dx \ \ \ \ \ (1)

for any reasonable function {f}, thus again highlighting the emergence of the effective potential {1/u}.

These particular explanations seem rather specific to the Schrödinger equation (continuous or discrete); we have for instance not been able to find similar identities to explain an effective potential for the bi-Schrödinger operator {\Delta^2 + V}.

In this paper, we demonstrate the (perhaps surprising) fact that effective potentials continue to exist for operators that bear very little resemblance to Schrödinger operators. Our chosen model is that of an {M}-matrix: a self-adjoint positive definite matrix {A} whose off-diagonal entries are non-positive. This model includes discrete Schrödinger operators (with non-negative potentials) but can allow for significantly more non-local interactions. The analogue of the landscape function would then be the vector {u := A^{-1} 1}, where {1} denotes the vector with all entries {1}. Our main result, roughly speaking, asserts that an eigenvector {A \phi = E \phi} of {A} will then be exponentially localised to the “potential wells” {K := \{ j: \frac{1}{u_j} \leq E \}}, where {u_j} denotes the coordinates of the landscape function {u}. In particular, we establish the inequality

\displaystyle  \sum_k \phi_k^2 e^{2 \rho(k,K) / \sqrt{W}} ( \frac{1}{u_k} - E )_+ \leq W \max_{i,j} |a_{ij}|

if {\phi} is normalised in {\ell^2}, where the connectivity {W} is the maximum number of non-zero entries of {A} in any row or column, {a_{ij}} are the coefficients of {A}, and {\rho} is a certain moderately complicated but explicit metric function on the spatial domain. Informally, this inequality asserts that the eigenfunction {\phi_k} should decay like {e^{-\rho(k,K) / \sqrt{W}}} or faster. Indeed, our numerics show a very strong log-linear relationship between {\phi_k} and {\rho(k,K)}, although it appears that our exponent {1/\sqrt{W}} is not quite optimal. We also provide an associated localisation result which is technical to state but very roughly asserts that a given eigenvector will in fact be localised to a single connected component of {K} unless there is a resonance between two wells (by which we mean that an eigenvalue for a localisation of {A} associated to one well is extremely close to an eigenvalue for a localisation of {A} associated to another well); such localisation is also strongly supported by numerics. (Analogous results for Schrödinger operators had been obtained in the previously mentioned paper of Arnold, David, Jerison, and my two coauthors, and for quantum graphs in a very recent paper of Harrell and Maltsev.)
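The objects in this discussion are easy to compute in small examples. The sketch below builds a toy {M}-matrix (a one-dimensional discrete Schrödinger matrix with a rough random non-negative potential, which is our own test case and not one of the numerical experiments from the paper), solves for the landscape function {u = A^{-1} 1}, and reports how much of the {\ell^2} mass of the lowest eigenvector lies in the wells {K}. It makes no attempt to compute the metric {\rho}.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

# Hypothetical test case: A = (discrete Dirichlet Laplacian) + diag(V), V >= 0.
V = rng.uniform(0.0, 4.0, size=N)
A = np.diag(2.0 + V) - np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)

u = np.linalg.solve(A, np.ones(N))    # landscape function u = A^{-1} 1
effective_potential = 1.0 / u

energies, vectors = np.linalg.eigh(A)
E, phi = energies[0], vectors[:, 0]   # lowest eigenpair; phi has unit l^2 norm
wells = effective_potential <= E      # the set K = { j : 1/u_j <= E }
mass_in_wells = float(np.sum(phi[wells] ** 2))

print(f"E = {E:.4f}, |K| = {int(wells.sum())}, mass of phi inside K = {mass_in_wells:.3f}")
```

One can experiment with other eigenpairs `energies[k], vectors[:, k]` or with less sparse matrices `A`, as long as the off-diagonal entries stay non-positive and `A` stays positive definite.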

Our approach is based on Agmon’s methods, which we interpret as a double commutator method, and relies in particular on exploiting the negative definiteness of certain double commutator operators. In the case of Schrödinger operators {-\Delta+V}, this negative definiteness is provided by the identity

\displaystyle  \langle [[-\Delta+V,g],g] u, u \rangle = -2\int_{{\bf R}^d} |\nabla g|^2 |u|^2\ dx \leq 0 \ \ \ \ \ (2)

for any sufficiently reasonable functions {u, g: {\bf R}^d \rightarrow {\bf R}}, where we view {g} (like {V}) as a multiplier operator. To exploit this, we use the commutator identity

\displaystyle  \langle g [\psi, -\Delta+V] u, g \psi u \rangle = \frac{1}{2} \langle [[-\Delta+V, g \psi],g\psi] u, u \rangle

\displaystyle -\frac{1}{2} \langle [[-\Delta+V, g],g] \psi u, \psi u \rangle

valid for any {g,\psi,u: {\bf R}^d \rightarrow {\bf R}} after a brief calculation. The double commutator identity then tells us that

\displaystyle  \langle g [\psi, -\Delta+V] u, g \psi u \rangle \leq \int_{{\bf R}^d} |\nabla g|^2 |\psi u|^2\ dx.
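The commutator identity used here is purely algebraic, and a matrix version of it can be tested numerically. In the sketch below we replace {-\Delta+V} by an arbitrary real symmetric matrix `A` and the multipliers {g, \psi} by diagonal matrices; this finite-dimensional analogue is an assumption of the illustration rather than something stated above, though it can be verified by expanding both sides entry by entry.

```python
import numpy as np

def comm(X, Y):
    """The commutator [X, Y] = XY - YX."""
    return X @ Y - Y @ X

rng = np.random.default_rng(1)
N = 6
B = rng.normal(size=(N, N))
A = B + B.T                          # stand-in for -Delta + V: any real symmetric matrix
G = np.diag(rng.normal(size=N))      # multiplier g
Psi = np.diag(rng.normal(size=N))    # multiplier psi
u = rng.normal(size=N)

# Left-hand side: < g [psi, A] u, g psi u >
lhs = (G @ comm(Psi, A) @ u) @ (G @ Psi @ u)

# Right-hand side: (1/2) <[[A, g psi], g psi] u, u> - (1/2) <[[A, g], g] psi u, psi u>
rhs = (0.5 * (comm(comm(A, G @ Psi), G @ Psi) @ u) @ u
       - 0.5 * (comm(comm(A, G), G) @ (Psi @ u)) @ (Psi @ u))

print(abs(lhs - rhs))
```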

If we choose {u} to be a non-negative weight and let {\psi := \phi/u} for an eigenfunction {\phi}, then we can write

\displaystyle  [\psi, -\Delta+V] u = [\psi, -\Delta+V - E] u = \psi (-\Delta+V - E) u

and we conclude that

\displaystyle  \int_{{\bf R}^d} \frac{(-\Delta+V-E)u}{u} |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx. \ \ \ \ \ (3)

We have considerable freedom in this inequality to select the functions {u,g}. If we select {u=1}, we obtain the clean inequality

\displaystyle  \int_{{\bf R}^d} (V-E) |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx.

If we take {g} to be a function which equals {1} on the wells {\{ V \leq E \}} but increases exponentially away from these wells, in such a way that

\displaystyle  |\nabla g|^2 \leq \frac{1}{2} (V-E) |g|^2

outside of the wells, we can obtain the estimate

\displaystyle  \int_{V > E} (V-E) |g|^2 |\phi|^2\ dx \leq 2 \int_{V < E} (E-V) |\phi|^2\ dx,

which then gives an exponential type decay of {\phi} away from the wells. This is basically the classic exponential decay estimate of Agmon; one can basically take {g} to be the distance to the wells {\{ V \leq E \}} with respect to the Euclidean metric conformally weighted by a suitably normalised version of {V-E}. If we instead select {u} to be the landscape function {u = (-\Delta+V)^{-1} 1}, (3) then gives

\displaystyle  \int_{{\bf R}^d} (\frac{1}{u} - E) |g|^2 |\phi|^2\ dx \leq \int_{{\bf R}^d} |\nabla g|^2 |\phi|^2\ dx,

and by selecting {g} appropriately this gives an exponential decay estimate away from the effective wells {\{ \frac{1}{u} \leq E \}}, using a metric weighted by {\frac{1}{u}-E}.

It turns out that this argument extends without much difficulty to the {M}-matrix setting. The analogue of the crucial double commutator identity (2) is

\displaystyle  \langle [[A,D],D] u, u \rangle = \sum_{i \neq j} a_{ij} u_i u_j (d_{ii} - d_{jj})^2 \leq 0

for any diagonal matrix {D = \mathrm{diag}(d_{11},\dots,d_{NN})}. The remainder of the Agmon type arguments go through after making the natural modifications.
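As a quick sanity check of the displayed identity, here is a minimal Python sketch that forms the double commutator {[[A,D],D]} by brute force and compares the quadratic form with the explicit sum. The {4 \times 4} matrix `A`, the diagonal entries `d` and the vector `u` are illustrative choices and are not taken from the paper; in particular `u` is chosen with non-negative entries, which (together with the non-positive off-diagonal entries of `A`) is what makes the sign of the sum evident in this toy case.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(X, Y):
    """[X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    n = len(X)
    return [[XY[i][j] - YX[i][j] for j in range(n)] for i in range(n)]


def double_commutator_form(A, d, u):
    """<[[A, D], D] u, u> for D = diag(d) and a real vector u."""
    n = len(A)
    D = [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    C = commutator(commutator(A, D), D)
    return sum(C[i][j] * u[j] * u[i] for i in range(n) for j in range(n))


def explicit_sum(A, d, u):
    """Sum over i != j of a_ij u_i u_j (d_ii - d_jj)^2."""
    n = len(A)
    return sum(A[i][j] * u[i] * u[j] * (d[i] - d[j]) ** 2
               for i in range(n) for j in range(n) if i != j)


# Hypothetical example: symmetric, diagonally dominant, off-diagonal entries <= 0.
A = [[3.0, -1.0, 0.0, -0.5],
     [-1.0, 2.5, -1.0, 0.0],
     [0.0, -1.0, 4.0, -2.0],
     [-0.5, 0.0, -2.0, 3.0]]
d = [0.3, 1.7, -0.4, 2.2]   # diagonal entries of D (arbitrary)
u = [1.0, 0.5, 2.0, 0.25]   # non-negative test vector (assumption of this sketch)

lhs = double_commutator_form(A, d, u)
rhs = explicit_sum(A, d, u)
print(lhs, rhs)
```

The two printed numbers should agree up to rounding and be non-positive; the diagonal entries of `A` play no role, since the {(i,j)} entry of {[[A,D],D]} is {a_{ij} (d_{jj} - d_{ii})^2}.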

Numerically we have also found some aspects of the landscape theory to persist beyond the {M}-matrix setting, even though the double commutators cease being negative definite, so this may not yet be the end of the story, but it does at least demonstrate that the utility of the landscape does not purely rely on identities such as (1).

Previous set of notes: 246A Notes 5. Next set of notes: Notes 2.

— 1. Jensen’s formula —

If {f} is a non-zero rational function {f =P/Q}, then by the fundamental theorem of algebra one can write

\displaystyle  f(z) = c \frac{\prod_\rho (z-\rho)}{\prod_\zeta (z-\zeta)}

for some non-zero constant {c}, where {\rho} ranges over the zeroes of {P} (counting multiplicity) and {\zeta} ranges over the zeroes of {Q} (counting multiplicity), and assuming {z} avoids the zeroes of {Q}. Taking absolute values and then logarithms, we arrive at the formula

\displaystyle  \log |f(z)| = \log |c| + \sum_\rho \log|z-\rho| - \sum_\zeta \log |z-\zeta|, \ \ \ \ \ (1)

as long as {z} avoids the zeroes of both {P} and {Q}. (In this set of notes we use {\log} for the natural logarithm when applied to a positive real number, and {\mathrm{Log}} for the standard branch of the complex logarithm (which extends {\log}); the multi-valued complex logarithm {\log} will only be used in passing.) Alternatively, taking logarithmic derivatives, we arrive at the closely related formula

\displaystyle  \frac{f'(z)}{f(z)} = \sum_\rho \frac{1}{z-\rho} - \sum_\zeta \frac{1}{z-\zeta}, \ \ \ \ \ (2)

again for {z} avoiding the zeroes of both {P} and {Q}. Thus we see that the zeroes and poles of a rational function {f} describe the behaviour of that rational function, as well as close relatives of that function such as the log-magnitude {\log|f|} and log-derivative {\frac{f'}{f}}. We have already seen these sorts of formulae arise in our treatment of the argument principle in 246A Notes 4.

Exercise 1 Let {P(z)} be a complex polynomial of degree {n \geq 1}.
  • (i) (Gauss-Lucas theorem) Show that the complex roots of {P'(z)} are contained in the closed convex hull of the complex roots of {P(z)}.
  • (ii) (Laguerre separation theorem) If all the complex roots of {P(z)} are contained in a disk {D(z_0,r)}, and {\zeta \not \in D(z_0,r)}, then all the complex roots of {nP(z) + (\zeta - z) P'(z)} are also contained in {D(z_0,r)}. (Hint: apply a suitable Möbius transformation to move {\zeta} to infinity, and then apply part (i) to a polynomial that emerges after applying this transformation.)

There are a number of useful ways to extend these formulae to more general meromorphic functions than rational functions. Firstly there is a very handy “local” variant of (1) known as Jensen’s formula:

Theorem 2 (Jensen’s formula) Let {f} be a meromorphic function on an open neighbourhood of a disk {\overline{D(z_0,r)} = \{ z: |z-z_0| \leq r \}}, with all removable singularities removed. Then, if {z_0} is neither a zero nor a pole of {f}, we have

\displaystyle  \log |f(z_0)| = \int_0^1 \log |f(z_0+re^{2\pi i t})|\ dt + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{|\rho-z_0|}{r} \ \ \ \ \ (3)

\displaystyle  - \sum_{\zeta: |\zeta-z_0| \leq r} \log \frac{|\zeta-z_0|}{r}

where {\rho} and {\zeta} range over the zeroes and poles of {f} respectively (counting multiplicity) in the disk {\overline{D(z_0,r)}}.

One can view (3) as a truncated (or localised) variant of (1). Note also that the summands {\log \frac{|\rho-z_0|}{r}, \log \frac{|\zeta-z_0|}{r}} are always non-positive.
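Jensen’s formula (3) is easy to test numerically. The following Python sketch does so for a hypothetical rational function with three zeroes (one of them double) and one pole inside the unit disk and one zero outside, taking {z_0=0} and {r=1}; the specific zeroes and poles, and the choice of an equally spaced Riemann sum for the circle average, are assumptions made for illustration only.

```python
import cmath
import math

# Hypothetical test function: zeroes and poles listed with multiplicity.
zeros = [0.3, 0.2 + 0.5j, 0.2 + 0.5j, 2.0]   # the zero at 2 lies outside the disk
poles = [-0.4 + 0.1j]


def f(z):
    """Rational function with the zeroes and poles listed above."""
    value = 1.0
    for rho in zeros:
        value *= (z - rho)
    for zeta in poles:
        value /= (z - zeta)
    return value


def circle_mean_log_abs(func, z0, r, n=4096):
    """Equally spaced approximation to the integral of log|func| over the circle."""
    return sum(math.log(abs(func(z0 + r * cmath.exp(2j * math.pi * k / n))))
               for k in range(n)) / n


z0, r = 0.0, 1.0
lhs = math.log(abs(f(z0)))
rhs = (circle_mean_log_abs(f, z0, r)
       + sum(math.log(abs(rho - z0) / r) for rho in zeros if abs(rho - z0) <= r)
       - sum(math.log(abs(zeta - z0) / r) for zeta in poles if abs(zeta - z0) <= r))
print(lhs, rhs)
```

The zero at {2} contributes to the circle average but not to the sum over {\rho}, which is exactly the truncation that distinguishes (3) from (1).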

Proof: By perturbing {r} slightly if necessary, we may assume that none of the zeroes or poles of {f} (which form a discrete set) lie on the boundary circle {\{ z: |z-z_0| = r \}}. By translating and rescaling, we may then normalise {z_0=0} and {r=1}, thus our task is now to show that

\displaystyle  \log |f(0)| = \int_0^1 \log |f(e^{2\pi i t})|\ dt + \sum_{\rho: |\rho| < 1} \log |\rho| - \sum_{\zeta: |\zeta| < 1} \log |\zeta|. \ \ \ \ \ (4)

We may remove the poles and zeroes inside the disk {D(0,1)} by the useful device of Blaschke products. Suppose for instance that {f} has a zero {\rho} inside the disk {D(0,1)}. Observe that the function

\displaystyle  B_\rho(z) := \frac{\rho - z}{1 - \overline{\rho} z} \ \ \ \ \ (5)

has magnitude {1} on the unit circle {\{ z: |z| = 1\}}, equals {\rho} at the origin, has a simple zero at {\rho}, but has no other zeroes or poles inside the disk. Thus Jensen’s formula (4) already holds if {f} is replaced by {B_\rho}. To prove (4) for {f}, it thus suffices to prove it for {f/B_\rho}, which effectively deletes a zero {\rho} inside the disk {D(0,1)} from {f} (and replaces it instead with its inversion {1/\overline{\rho}}). Similarly we may remove all the poles inside the disk. As a meromorphic function only has finitely many poles and zeroes inside a compact set, we may thus reduce to the case when {f} has no poles or zeroes on or inside the disk {D(0,1)}, at which point our goal is simply to show that

\displaystyle  \log |f(0)| = \int_0^1 \log |f(e^{2\pi i t})|\ dt.

Since {f} has no zeroes or poles inside the disk, it has a holomorphic logarithm {F} (Exercise 46 of 246A Notes 4). In particular, {\log |f|} is the real part of {F}. The claim now follows by applying the mean value property (Exercise 17 of 246A Notes 3) to {\log |f|}. \Box
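The three properties of the Blaschke factor (5) used in the proof can be confirmed directly. This is only a small sketch; the value of `rho` is an arbitrary point of the unit disk chosen for illustration.

```python
import cmath
import math


def blaschke(rho, z):
    """The Blaschke factor B_rho(z) = (rho - z) / (1 - conj(rho) z) from (5)."""
    return (rho - z) / (1 - rho.conjugate() * z)


rho = 0.4 - 0.3j  # arbitrary point with |rho| < 1 (illustrative choice)

# Largest deviation of |B_rho| from 1 over 360 sample points of the unit circle.
max_dev = max(abs(abs(blaschke(rho, cmath.exp(2j * math.pi * k / 360))) - 1)
              for k in range(360))
at_origin = blaschke(rho, 0)   # should equal rho
at_rho = blaschke(rho, rho)    # should vanish
print(max_dev, at_origin, at_rho)
```

Since {|B_\rho| = 1} on the circle and {B_\rho(0) = \rho}, both sides of (4) equal {\log |\rho|} when {f = B_\rho}, which is the base case used in the reduction.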

An important special case of Jensen’s formula arises when {f} is holomorphic in a neighborhood of {\overline{D(z_0,r)}}, in which case there are no contributions from poles and one simply has

\displaystyle  \int_0^1 \log |f(z_0+re^{2\pi i t})|\ dt = \log |f(z_0)| + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{r}{|\rho-z_0|}. \ \ \ \ \ (6)

This is quite a useful formula, mainly because the summands {\log \frac{r}{|\rho-z_0|}} are non-negative; it can be viewed as a more precise assertion of the subharmonicity of {\log |f|} (see Exercises 60(ix) and 61 of 246A Notes 5). Here are some quick applications of this formula:

Exercise 3 Use (6) to give another proof of Liouville’s theorem: a bounded holomorphic function {f} on the entire complex plane is necessarily constant.

Exercise 4 Use Jensen’s formula to prove the fundamental theorem of algebra: a complex polynomial {P(z)} of degree {n} has exactly {n} complex zeroes (counting multiplicity), and can thus be factored as {P(z) = c (z-z_1) \dots (z-z_n)} for some complex numbers {c,z_1,\dots,z_n} with {c \neq 0}. (Note that the fundamental theorem was invoked previously in this section, but only for motivational purposes, so the proof here is non-circular.)

Exercise 5 (Shifted Jensen’s formula) Let {f} be a meromorphic function on an open neighbourhood of a disk {\{ z: |z-z_0| \leq r \}}, with all removable singularities removed. Show that

\displaystyle  \log |f(z)| = \int_0^1 \log |f(z_0+re^{2\pi i t})| \mathrm{Re} \frac{r e^{2\pi i t} + (z-z_0)}{r e^{2\pi i t} - (z-z_0)}\ dt \ \ \ \ \ (7)

\displaystyle  + \sum_{\rho: |\rho-z_0| \leq r} \log \frac{|\rho-z|}{|r - \rho^* (z-z_0)|}

\displaystyle - \sum_{\zeta: |\zeta-z_0| \leq r} \log \frac{|\zeta-z|}{|r - \zeta^* (z-z_0)|}

for all {z} in the open disk {\{ z: |z-z_0| < r\}} that are not zeroes or poles of {f}, where {\rho^* = \frac{\overline{\rho-z_0}}{r}} and {\zeta^* = \frac{\overline{\zeta-z_0}}{r}}. (The function {\Re \frac{r e^{2\pi i t} + (z-z_0)}{r e^{2\pi i t} - (z-z_0)}} appearing in the integrand is sometimes known as the Poisson kernel, particularly if one normalises so that {z_0=0} and {r=1}.)
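The shifted formula (7) can also be checked numerically in the normalised case {z_0=0}, {r=1}. The sketch below uses the hypothetical test function {f(z) = (z-\rho) e^z} with a single zero {\rho = 1/2} in the disk; the function, the evaluation point and the quadrature rule are all illustrative assumptions.

```python
import cmath
import math

rho = 0.5  # the only zero of the test function inside the unit disk


def f(z):
    """Hypothetical test function (z - rho) * exp(z)."""
    return (z - rho) * cmath.exp(z)


def poisson_kernel(z, t):
    """Re (e^{2 pi i t} + z) / (e^{2 pi i t} - z), for |z| < 1."""
    w = cmath.exp(2j * math.pi * t)
    return ((w + z) / (w - z)).real


def shifted_jensen_rhs(z, n=4096):
    """Right-hand side of (7) with z0 = 0, r = 1, using an equally spaced sum."""
    integral = sum(math.log(abs(f(cmath.exp(2j * math.pi * k / n))))
                   * poisson_kernel(z, k / n) for k in range(n)) / n
    # With z0 = 0 and r = 1, rho^* is just the complex conjugate of rho.
    zero_term = math.log(abs(rho - z) / abs(1 - rho.conjugate() * z))
    return integral + zero_term


z = 0.3 + 0.4j
lhs = math.log(abs(f(z)))
rhs = shifted_jensen_rhs(z)
print(lhs, rhs)
```

Setting {z=0} makes the Poisson kernel identically {1} and recovers the unshifted formula (4).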

Exercise 6 (Bounded type)
  • (i) If {f} is a holomorphic function on {D(0,1)} that is not identically zero, show that {\liminf_{r \rightarrow 1^-} \int_0^{2\pi} \log |f(re^{i\theta})|\ d\theta > -\infty}.
  • (ii) If {f} is a meromorphic function on {D(0,1)} that is the ratio of two bounded holomorphic functions that are not identically zero, show that {\limsup_{r \rightarrow 1^-} \int_0^{2\pi} |\log |f(re^{i\theta})||\ d\theta < \infty}. (Functions {f} of this form are said to be of bounded type and lie in the Nevanlinna class for the unit disk {D(0,1)}.)

Exercise 7 (Smoothed out Jensen formula) Let {f} be a meromorphic function on an open set {U}, and let {\phi: U \rightarrow {\bf C}} be a smooth compactly supported function. Show that

\displaystyle \sum_\rho \phi(\rho) - \sum_\zeta \phi(\zeta)

\displaystyle  = \frac{-1}{2\pi} \int\int_U ((\frac{\partial}{\partial x} + i \frac{\partial}{\partial y}) \phi(x+iy)) \frac{f'}{f}(x+iy)\ dx dy

\displaystyle  = \frac{1}{2\pi} \int\int_U ((\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}) \phi(x+iy)) \log |f(x+iy)|\ dx dy

where {\rho, \zeta} range over the zeroes and poles of {f} (respectively) in the support of {\phi}. Informally argue why this identity is consistent with Jensen’s formula.

When applied to entire functions {f}, Jensen’s formula relates the order of growth of {f} near infinity with the density of zeroes of {f}. Here is a typical result:

Proposition 8 Let {f: {\bf C} \rightarrow {\bf C}} be an entire function, not identically zero, that obeys a growth bound {|f(z)| \leq C \exp( C|z|^\alpha)} for some {C, \alpha > 0} and all {z}. Then there exists a constant {C'>0} such that {D(0,R)} has at most {C' R^\alpha} zeroes (counting multiplicity) for any {R \geq 1}.

Entire functions that obey a growth bound of the form {|f(z)| \leq C_\varepsilon \exp( C_\varepsilon |z|^{\rho+\varepsilon})} for every {\varepsilon>0} and {z} (where {C_\varepsilon} depends on {\varepsilon}) are said to be of order at most {\rho}. The above proposition shows that for such functions that are not identically zero, the number of zeroes in a disk of radius {R} does not grow much faster than {R^\rho}. This is often a useful preliminary upper bound on the zeroes of entire functions, as the order of an entire function tends to be relatively easy to compute in practice.

Proof: First suppose that {f(0)} is non-zero. From (6) applied with {r=2R} and {z_0=0} one has

\displaystyle  \int_0^1 \log(C \exp( C (2R)^\alpha ) )\ dt \geq \log |f(0)| + \sum_{\rho: |\rho| \leq 2R} \log \frac{2R}{|\rho|}.

Every zero in {D(0,R)} contributes at least {\log 2} to the sum on the right-hand side, while all other zeroes contribute a non-negative quantity, thus

\displaystyle  \log C + C (2R)^\alpha \geq \log |f(0)| + N_R \log 2

where {N_R} denotes the number of zeroes in {D(0,R)}. This gives the claim for {f(0) \neq 0}. When {f(0)=0}, one can shift {f} by a small amount to make {f} non-zero at the origin (using the fact that zeroes of holomorphic functions not identically zero are isolated), modifying {C} in the process, and then repeating the previous arguments. \Box
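To see the inequality {\log C + C (2R)^\alpha \geq \log |f(0)| + N_R \log 2} in action, one can try the entire function {f(z) = \cos(\pi z)}, which is an illustrative choice not made in these notes. It has {f(0)=1}, zeroes exactly at the half-integers, and obeys {|\cos(\pi z)| \leq \exp(\pi |z|)}, so the growth bound holds with {C = \pi} and {\alpha = 1}. The Python sketch below compares the true zero count with the resulting bound.

```python
import math


def zeros_of_cos_pi_z_in_disk(R):
    """Number of half-integers k + 1/2 with |k + 1/2| < R (zeroes of cos(pi z))."""
    K = int(R) + 2
    return sum(1 for k in range(-K, K + 1) if abs(k + 0.5) < R)


def jensen_bound(R, C=math.pi, alpha=1.0, log_abs_f0=0.0):
    """Upper bound (log C + C (2R)^alpha - log|f(0)|) / log 2 on N_R."""
    return (math.log(C) + C * (2 * R) ** alpha - log_abs_f0) / math.log(2)


# Rows of (R, actual zero count N_R, bound from the proof).
table = [(R, zeros_of_cos_pi_z_in_disk(R), jensen_bound(R)) for R in (1, 2, 5, 10, 50)]
for row in table:
    print(row)
```

Both columns grow linearly in {R}, as one expects for a function of order {1}, although the constant from this crude argument is far from sharp.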

Just as (3) and (7) give truncated variants of (1), we can create truncated versions of (2). The following crude truncation is adequate for many applications:

Theorem 9 (Truncated formula for log-derivative) Let {f} be a holomorphic function on an open neighbourhood of a disk {\{ z: |z-z_0| \leq r \}} that is not identically zero on this disk. Suppose that one has a bound of the form {|f(z)| \leq M^{O_{c_1,c_2}(1)} |f(z_0)|} for some {M \geq 1} and all {z} on the circle {\{ z: |z-z_0| = r\}}. Let {0 < c_2 < c_1 < 1} be constants. Then one has the approximate formula

\displaystyle  \frac{f'(z)}{f(z)} = \sum_{\rho: |\rho - z_0| \leq c_1 r} \frac{1}{z-\rho} + O_{c_1,c_2}( \frac{\log M}{r} )

for all {z} in the disk {\{ z: |z-z_0| < c_2 r \}} other than zeroes of {f}. Furthermore, the number of zeroes {\rho} in the above sum is {O_{c_1,c_2}(\log M)}.

Proof: To abbreviate notation, we allow all implied constants in this proof to depend on {c_1,c_2}.

We mimic the proof of Jensen’s formula. Firstly, we may translate and rescale so that {z_0=0} and {r=1}, so we have {|f(z)| \leq M^{O(1)} |f(0)|} when {|z|=1}, and our main task is to show that

\displaystyle  \frac{f'(z)}{f(z)} - \sum_{\rho: |\rho| \leq c_1} \frac{1}{z-\rho} = O( \log M ) \ \ \ \ \ (8)

for {|z| \leq c_2}. Note that if {f(0)=0} then {f} vanishes on the unit circle and hence (by the maximum principle) vanishes identically on the disk, a contradiction, so we may assume {f(0) \neq 0}. From hypothesis we then have

\displaystyle  \log |f(z)| \leq \log |f(0)| + O(\log M)

on the unit circle, and so from Jensen’s formula (3) we see that

\displaystyle  \sum_{\rho: |\rho| \leq 1} \log \frac{1}{|\rho|} = O(\log M). \ \ \ \ \ (9)

In particular we see that the number of zeroes with {|\rho| \leq c_1} is {O(\log M)}, as claimed.

Suppose {f} has a zero {\rho} with {c_1 < |\rho| \leq 1}. If we factor {f = B_\rho g}, where {B_\rho} is the Blaschke product (5), then

\displaystyle  \frac{f'}{f} = \frac{B'_\rho}{B_\rho} + \frac{g'}{g}

\displaystyle  = \frac{g'}{g} + \frac{1}{z-\rho} - \frac{1}{z-1/\overline{\rho}}.

Observe from Taylor expansion that the distance between {\rho} and {1/\overline{\rho}} is {O( \log \frac{1}{|\rho|} )}, and hence {\frac{1}{z-\rho} - \frac{1}{z-1/\overline{\rho}} = O( \log \frac{1}{|\rho|} )} for {|z| \leq c_2}. Thus we see from (9) that we may use Blaschke products to remove all the zeroes in the annulus {c_1 < |\rho| \leq 1} while only affecting the left-hand side of (8) by {O( \log M)}; also, removing the Blaschke products does not affect {|f(z)|} on the unit circle, and only affects {\log |f(0)|} by {O(\log M)} thanks to (9). Thus we may assume without loss of generality that there are no zeroes in this annulus.

Similarly, given a zero {\rho} with {|\rho| \leq c_1}, we have {\frac{1}{z-1/\overline{\rho}} = O(1)}, so using Blaschke products to remove all of these zeroes also only affects the left-hand side of (8) by {O(\log M)} (since the number of zeroes here is {O(\log M)}), with {\log |f(0)|} also modified by at most {O(\log M)}. Thus we may assume in fact that {f} has no zeroes whatsoever within the unit disk. We may then also normalise {f(0) = 1}, so that {\log |f(e^{2\pi i t})| \leq O(\log M)} for all {t \in [0,1]}. By Jensen’s formula again, we have

\displaystyle  \int_0^1 \log |f(e^{2\pi i t})|\ dt = 0

and thus (by using the identity {|x| = 2 \max(x,0) - x} for any real {x})

\displaystyle  \int_0^1 |\log |f(e^{2\pi i t})||\ dt \ll \log M. \ \ \ \ \ (10)

On the other hand, from (7) we have

\displaystyle  \log |f(z)| = \int_0^1 \log |f(e^{2\pi i t})| \mathrm{Re} \frac{e^{2\pi i t} + z}{e^{2\pi i t} - z}\ dt

which implies from (10) that {\log |f(z)|} and its first derivatives are {O( \log M )} on the disk {\{ z: |z| \leq c_2 \}}. But recall from the proof of Jensen’s formula that {\frac{f'}{f}} is the derivative of a logarithm {\log f} of {f}, whose real part is {\log |f|}. By the Cauchy-Riemann equations for {\log f}, we conclude that {\frac{f'}{f} = O(\log M)} on the disk {\{ z: |z| \leq c_2 \}}, as required. \Box

Exercise 10
  • (i) (Borel-Carathéodory theorem) If {f: U \rightarrow {\bf C}} is analytic on an open neighborhood of a disk {\overline{D(z_0,R)}}, show that

    \displaystyle  \sup_{z \in D(z_0,r)} |f(z)| \leq \frac{2r}{R-r} \sup_{z \in \overline{D(z_0,R)}} \mathrm{Re} f(z) + \frac{R+r}{R-r} |f(z_0)|.

    (Hint: one can normalise {z_0=0}, {R=1}, {f(0)=0}, and {\sup_{|z-z_0| \leq R} \mathrm{Re} f(z)=1}. Now {f} maps the unit disk to the half-plane {\{ \mathrm{Re} z \leq 1 \}}. Use a Möbius transformation to map the half-plane to the unit disk and then use the Schwarz lemma.)
  • (ii) Use (i) to give an alternate way to conclude the proof of Theorem 9.

A variant of the above argument allows one to make precise the heuristic that holomorphic functions locally look like polynomials:

Exercise 11 (Local Weierstrass factorisation) Let the notation and hypotheses be as in Theorem 9. Then show that

\displaystyle  f(z) = P(z) \exp( g(z) )

for all {z} in the disk {\{ z: |z-z_0| < c_2 r \}}, where {P} is a polynomial whose zeroes are precisely the zeroes of {f} in {\{ z: |z-z_0| \leq c_1r \}} (counting multiplicity) and {g} is a holomorphic function on {\{ z: |z-z_0| < c_2 r \}} of magnitude {O_{c_1,c_2}( \log M )} and first derivative {O_{c_1,c_2}( \log M / r )} on this disk. Furthermore, show that the degree of {P} is {O_{c_1,c_2}(\log M)}.

Exercise 12 (Preliminary Beurling factorisation) Let {H^\infty(D(0,1))} denote the space of bounded analytic functions {f: D(0,1) \rightarrow {\bf C}} on the unit disk; this is a normed vector space with norm

\displaystyle  \|f\|_{H^\infty(D(0,1))} := \sup_{z \in D(0,1)} |f(z)|.

  • (i) If {f \in H^\infty(D(0,1))} is not identically zero, and {z_n} denote the zeroes of {f} in {D(0,1)} counting multiplicity, show that

    \displaystyle  \sum_n (1-|z_n|) < \infty

    and

    \displaystyle  \sup_{1/2 < r < 1} \int_0^{2\pi} | \log |f(re^{i\theta})| |\ d\theta < \infty.

  • (ii) Let the notation be as in (i). If we define the Blaschke product

    \displaystyle  B(z) := z^m \prod_{|z_n| \neq 0} \frac{|z_n|}{z_n} \frac{z_n-z}{1-\overline{z_n} z}

    where {m} is the order of vanishing of {f} at zero, show that this product converges absolutely to a holomorphic function on {D(0,1)}, and that {|f(z)| \leq \|f\|_{H^\infty(D(0,1))} |B(z)|} for all {z \in D(0,1)}. (It may be easier to work with finite Blaschke products first to obtain this bound.)
  • (iii) Continuing the notation from (i), establish a factorisation {f(z) = B(z) \exp(g(z))} for some holomorphic function {g: D(0,1) \rightarrow {\bf C}} with {\mathrm{Re}(g(z)) \leq \log \|f\|_{H^\infty(D(0,1))}} for all {z\in D(0,1)}.
  • (iv) (Theorem of F. and M. Riesz, special case) If {f \in H^\infty(D(0,1))} extends continuously to the boundary {\{e^{i\theta}: 0 \leq \theta < 2\pi\}}, show that the set {\{ 0 \leq \theta < 2\pi: f(e^{i\theta})=0 \}} has zero measure.

Remark 13 The factorisation (iii) can be refined further, with {g} being the Poisson integral of some finite measure on the unit circle. Using the Lebesgue decomposition of this finite measure into absolutely continuous and singular parts one ends up factorising {H^\infty(D(0,1))} functions into “outer functions” and “inner functions”, giving the Beurling factorisation of {H^\infty}. There are also extensions to larger spaces {H^p(D(0,1))} than {H^\infty(D(0,1))} (which are to {H^\infty} as {L^p} is to {L^\infty}), known as Hardy spaces. We will not discuss this topic further here, but see for instance this text of Garnett for a treatment.

Exercise 14 (Littlewood’s lemma) Let {f} be holomorphic on an open neighbourhood of a rectangle {R = \{ \sigma+it: \sigma_0 \leq \sigma \leq \sigma_1; 0 \leq t \leq T \}} for some {\sigma_0 < \sigma_1} and {T>0}, with {f} non-vanishing on the boundary of the rectangle. Show that

\displaystyle  2\pi \sum_\rho (\mathrm{Re}(\rho)-\sigma_0) = \int_0^T \log |f(\sigma_0+it)|\ dt - \int_0^T \log |f(\sigma_1+it)|\ dt

\displaystyle  + \int_{\sigma_0}^{\sigma_1} \mathrm{arg} f(\sigma+iT)\ d\sigma - \int_{\sigma_0}^{\sigma_1} \mathrm{arg} f(\sigma)\ d\sigma

where {\rho} ranges over the zeroes of {f} inside {R} (counting multiplicity) and one uses a branch of {\mathrm{arg} f} which is continuous on the upper, lower, and right edges of {R}. (This lemma is a popular tool to explore the zeroes of Dirichlet series such as the Riemann zeta function.)


Just a short announcement that next quarter I will be continuing the recently concluded 246A complex analysis class as 246B. Topics I plan to cover:

Notes for the later material will appear on this blog in due course.

I’ve just uploaded to the arXiv my paper “Sendov’s conjecture for sufficiently high degree polynomials“. This paper is a contribution to an old conjecture of Sendov on the zeroes of polynomials:

Conjecture 1 (Sendov’s conjecture) Let {f: {\bf C} \rightarrow {\bf C}} be a polynomial of degree {n \geq 2} that has all zeroes in the closed unit disk {\{ z: |z| \leq 1 \}}. If {\lambda_0} is one of these zeroes, then {f'} has at least one zero in {\{z: |z-\lambda_0| \leq 1\}}.

It is common in the literature on this problem to normalise {f} to be monic, and to rotate the zero {\lambda_0} to be an element {a} of the unit interval {[0,1]}. As it turns out, the location of {a} on this unit interval {[0,1]} ends up playing an important role in the arguments.

Many cases of this conjecture are already known, for instance

In particular, in high degrees the only cases left uncovered by prior results are when {a} is close (but not too close) to {0}, or when {a} is close (but not too close) to {1}; see Figure 1 of my paper.

Our main result covers the high degree case uniformly for all values of {a \in [0,1]}:

Theorem 2 There exists an absolute constant {n_0} such that Sendov’s conjecture holds for all {n \geq n_0}.

In principle, this reduces the verification of Sendov’s conjecture to a finite time computation, although our arguments use compactness methods and thus do not easily provide an explicit value of {n_0}. I believe that the compactness arguments can be replaced with quantitative substitutes that provide an explicit {n_0}, but the value of {n_0} produced is likely to be extremely large (certainly much larger than {9}).

Because of the previous results (particularly those of Chalebgwa and Chijiwa), we will only need to establish the following two subcases of the above theorem:

Theorem 3 (Sendov’s conjecture near the origin) Under the additional hypothesis {a = o(1/\log n)}, Sendov’s conjecture holds for sufficiently large {n}.

Theorem 4 (Sendov’s conjecture near the unit circle) Under the additional hypothesis {1-o(1) \leq a \leq 1 - \varepsilon_0^n} for a fixed {\varepsilon_0>0}, Sendov’s conjecture holds for sufficiently large {n}.

We approach these theorems using the “compactness and contradiction” strategy, assuming that there is a sequence of counterexamples whose degrees {n} go to infinity, using various compactness theorems to extract various asymptotic objects in the limit {n \rightarrow \infty}, and somehow using these objects to derive a contradiction. There are many ways to effect such a strategy; we will use a formalism that I call “cheap nonstandard analysis” and which is common in the PDE literature, in which one repeatedly passes to subsequences as necessary whenever one invokes a compactness theorem to create a limit object. However, the particular choice of asymptotic formalism one selects is not of essential importance for the arguments.

I also found it useful to use the language of probability theory. Given a putative counterexample {f} to Sendov’s conjecture, let {\lambda} be a zero of {f} (chosen uniformly at random among the {n} zeroes of {f}, counting multiplicity), and let {\zeta} similarly be a uniformly random zero of {f'}. We introduce the logarithmic potentials

\displaystyle  U_\lambda(z) := {\bf E} \log \frac{1}{|z-\lambda|}; \quad U_\zeta(z) := {\bf E} \log \frac{1}{|z-\zeta|}

and the Stieltjes transforms

\displaystyle  s_\lambda(z) := {\bf E} \frac{1}{z-\lambda}; \quad s_\zeta(z) := {\bf E} \frac{1}{z-\zeta}.

Standard calculations using the fundamental theorem of algebra yield the basic identities

\displaystyle  U_\lambda(z) = \frac{1}{n} \log \frac{1}{|f(z)|}; \quad U_\zeta(z) = \frac{1}{n-1} \log \frac{n}{|f'(z)|}

and

\displaystyle  s_\lambda(z) = \frac{1}{n} \frac{f'(z)}{f(z)}; \quad s_\zeta(z) = \frac{1}{n-1} \frac{f''(z)}{f'(z)} \ \ \ \ \ (1)

and in particular the random variables {\lambda, \zeta} are linked to each other by the identity

\displaystyle  U_\lambda(z) - \frac{n-1}{n} U_\zeta(z) = \frac{1}{n} \log |s_\lambda(z)|. \ \ \ \ \ (2)
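These identities can be checked by hand on the monic polynomial {f(z) = z^n - 1}, whose zeroes are the {n^{th}} roots of unity and whose derivative {f'(z) = n z^{n-1}} has all {n-1} of its zeroes at the origin. The Python sketch below does this numerically; the degree `n` and the evaluation point `z` are arbitrary illustrative choices.

```python
import cmath
import math

n = 7
zeros_f = [cmath.exp(2j * math.pi * k / n) for k in range(n)]  # zeroes of z^n - 1
zeros_df = [0j] * (n - 1)                                      # zeroes of n z^(n-1)


def U(points, z):
    """Logarithmic potential E log(1/|z - p|), p uniform on the list of points."""
    return sum(math.log(1 / abs(z - p)) for p in points) / len(points)


def s(points, z):
    """Stieltjes transform E 1/(z - p), p uniform on the list of points."""
    return sum(1 / (z - p) for p in points) / len(points)


z = 0.7 + 0.9j                      # arbitrary point away from all the zeroes
f = z ** n - 1
df = n * z ** (n - 1)
d2f = n * (n - 1) * z ** (n - 2)

# Basic identities for the potentials and the Stieltjes transforms (1).
err_U_lambda = abs(U(zeros_f, z) - math.log(1 / abs(f)) / n)
err_U_zeta = abs(U(zeros_df, z) - math.log(n / abs(df)) / (n - 1))
err_s_lambda = abs(s(zeros_f, z) - df / (n * f))
err_s_zeta = abs(s(zeros_df, z) - d2f / ((n - 1) * df))

# The linking identity (2).
err_link = abs(U(zeros_f, z) - (n - 1) / n * U(zeros_df, z)
               - math.log(abs(s(zeros_f, z))) / n)
print(err_U_lambda, err_U_zeta, err_s_lambda, err_s_zeta, err_link)
```

All five printed errors should be at the level of floating point rounding. Note that the formula for {U_\lambda} uses that {f} is monic, in line with the normalisation mentioned earlier.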

On the other hand, the hypotheses of Sendov’s conjecture (and the Gauss-Lucas theorem) place {\lambda,\zeta} inside the unit disk {\{ z:|z| \leq 1\}}. Applying Prokhorov’s theorem, and passing to a subsequence, one can then assume that the random variables {\lambda,\zeta} converge in distribution to some limiting random variables {\lambda^{(\infty)}, \zeta^{(\infty)}} (possibly defined on a different probability space than the original variables {\lambda,\zeta}), also living almost surely inside the unit disk. Standard potential theory then gives the convergence

\displaystyle  U_\lambda(z) \rightarrow U_{\lambda^{(\infty)}}(z); \quad U_\zeta(z) \rightarrow U_{\zeta^{(\infty)}}(z) \ \ \ \ \ (3)

and

\displaystyle  s_\lambda(z) \rightarrow s_{\lambda^{(\infty)}}(z); \quad s_\zeta(z) \rightarrow s_{\zeta^{(\infty)}}(z) \ \ \ \ \ (4)

at least in the local {L^1} sense. Among other things, we then conclude from the identity (2) and some elementary inequalities that

\displaystyle  U_{\lambda^{(\infty)}}(z) = U_{\zeta^{(\infty)}}(z)

for all {|z|>1}. This turns out to have an appealing interpretation in terms of Brownian motion: if one takes two Brownian motions in the complex plane, one originating from {\lambda^{(\infty)}} and one originating from {\zeta^{(\infty)}}, then the location where these Brownian motions first exit the unit disk {\{ z: |z| \leq 1 \}} will have the same distribution. (In our paper we actually replace Brownian motion with the closely related formalism of balayage.) This turns out to connect the random variables {\lambda^{(\infty)}}, {\zeta^{(\infty)}} quite closely to each other. In particular, with this observation and some additional arguments involving both the unique continuation property for harmonic functions and Grace’s theorem (discussed in this previous post), with the latter drawn from the prior work of Dégot, we can get very good control on these distributions:

Theorem 5
  • (i) If {a = o(1)}, then {\lambda^{(\infty)}, \zeta^{(\infty)}} almost surely lie in the semicircle {\{ e^{i\theta}: \pi/2 \leq \theta \leq 3\pi/2\}} and have the same distribution.
  • (ii) If {a = 1-o(1)}, then {\lambda^{(\infty)}} is uniformly distributed on the circle {\{ z: |z|=1\}}, and {\zeta^{(\infty)}} is almost surely zero.

In case (i) (and strengthening the hypothesis {a=o(1)} to {a=o(1/\log n)} to control some technical contributions of “outlier” zeroes of {f}), we can use this information about {\lambda^{(\infty)}} and (4) to ensure that the normalised logarithmic derivative {\frac{1}{n} \frac{f'}{f} = s_\lambda} has a non-negative winding number in a certain small (but not too small) circle around the origin, which by the argument principle is inconsistent with the hypothesis that {f} has a zero at {a = o(1)} and that {f'} has no zeroes near {a}. This is how we establish Theorem 3.

Case (ii) turns out to be more delicate. This is because there are a number of “near-counterexamples” to Sendov’s conjecture that are compatible with the hypotheses and conclusion of case (ii). The simplest such example is {f(z) = z^n - 1}, where the zeroes {\lambda} of {f} are uniformly distributed amongst the {n^{th}} roots of unity (including at {a=1}), and the zeroes of {f'} are all located at the origin. In my paper I also discuss a variant of this construction, in which {f'} has zeroes mostly near the origin, but also acquires a bounded number of zeroes at various locations {\lambda_1+o(1),\dots,\lambda_m+o(1)} inside the unit disk. Specifically, we take

\displaystyle  f(z) := \left(z + \frac{c_2}{n}\right)^{n-m} P(z) - \left(a + \frac{c_2}{n}\right)^{n-m} P(a)

where {a = 1 - \frac{c_1}{n}} for some constants {0 < c_1 < c_2} and

\displaystyle  P(z) := (z-\lambda_1) \dots (z-\lambda_m).

By a perturbative analysis to locate the zeroes of {f}, one eventually would be able to arrive at a true counterexample to Sendov’s conjecture if these locations {\lambda_1,\dots,\lambda_m} were in the open lune

\displaystyle  \{ \lambda: |\lambda| < 1 < |\lambda-1| \}

and if one had the inequality

\displaystyle  c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| < 0 \ \ \ \ \ (5)

for all {0 \leq \theta \leq 2\pi}. However, if one takes the mean of this inequality in {\theta}, one arrives at the inequality

\displaystyle  c_2 - c_1 + \sum_{j=1}^m \log |1 - \lambda_j| < 0

which is incompatible with the hypotheses {c_2 > c_1} and {|\lambda_j-1| > 1}. In order to extend this argument to more general polynomials {f}, we require a stability analysis of the endpoint equation

\displaystyle  c_2 - c_1 - c_2 \cos \theta + \sum_{j=1}^m \log \left|\frac{1 - \lambda_j}{e^{i\theta} - \lambda_j}\right| = 0 \ \ \ \ \ (6)

where we now only assume the closed conditions {c_2 \geq c_1} and {|\lambda_j-1| \geq 1}. The above discussion then places all the zeroes {\lambda_j} on the arc

\displaystyle  \{ \lambda: |\lambda| < 1 = |\lambda-1|\} \ \ \ \ \ (7)

and if one also takes the second Fourier coefficient of (6) one also obtains the vanishing second moment

\displaystyle  \sum_{j=1}^m \lambda_j^2 = 0.

These two conditions are incompatible with each other (except in the degenerate case when all the {\lambda_j} vanish), because all the non-zero elements {\lambda} of the arc (7) have argument in {\pm [\pi/3,\pi/2]}, so in particular their square {\lambda^2} will have negative real part. It turns out that one can adapt this argument to the more general potential counterexamples to Sendov’s conjecture (in the form of Theorem 4). The starting point is to use (1), (4), and Theorem 5(ii) to obtain good control on {f''/f'}, which one then integrates and exponentiates to get good control on {f'}, and then on a second integration one gets enough information about {f} to pin down the location of its zeroes to high accuracy. The constraint that these zeroes lie inside the unit disk then gives an inequality resembling (5), and an adaptation of the above stability analysis is then enough to conclude. The arguments here are inspired by the previous arguments of Miller, which treated the case when {a} was extremely close to {1} via a similar perturbative analysis; the main novelty is to control the error terms not in terms of the magnitude of the largest zero {\zeta} of {f'} (which is difficult to manage when {n} gets large), but rather by the variance of those zeroes, which ends up being a more tractable expression to keep track of.
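Two steps of the toy argument above lend themselves to a quick numerical check: that averaging the left-hand side of (5) in {\theta} gives {c_2 - c_1 + \sum_j \log|1-\lambda_j|}, and that non-zero points {\lambda} of the arc (7) have squares with negative real part. The Python sketch below tests both; the values of `c1`, `c2`, the points `lams` in the open lune and the sample points on the arc are illustrative assumptions.

```python
import cmath
import math


def lhs_of_5(theta, c1, c2, lams):
    """Left-hand side of inequality (5) at angle theta."""
    w = cmath.exp(1j * theta)
    return (c2 - c1 - c2 * math.cos(theta)
            + sum(math.log(abs((1 - lam) / (w - lam))) for lam in lams))


def mean_over_theta(c1, c2, lams, n=4096):
    """Equally spaced average of the left-hand side of (5) over the circle."""
    return sum(lhs_of_5(2 * math.pi * k / n, c1, c2, lams) for k in range(n)) / n


c1, c2 = 1.0, 1.5
lams = [-0.3 + 0.2j, 0.1 + 0.8j]   # sample points with |lam| < 1 < |lam - 1|

mean_value = mean_over_theta(c1, c2, lams)
predicted = c2 - c1 + sum(math.log(abs(1 - lam)) for lam in lams)
print(mean_value, predicted)        # equal up to rounding, and positive

# Points 1 + e^{i psi} of the circle |lam - 1| = 1 that also satisfy |lam| < 1.
arc = [1 + cmath.exp(1j * psi) for psi in (2.2, 2.6, 3.0, 3.4, 3.9, 4.1)]
print([(lam * lam).real for lam in arc])   # all negative
```

Since the mean is positive for points in the open lune with {c_2 > c_1}, the strict inequality (5) cannot hold for every {\theta}, which is the obstruction described above.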

Laura Cladek and I have just uploaded to the arXiv our paper “Additive energy of regular measures in one and higher dimensions, and the fractal uncertainty principle“. This paper concerns a continuous version of the notion of additive energy. Given a finite measure {\mu} on {{\bf R}^d} and a scale {r>0}, define the energy {\mathrm{E}(\mu,r)} at scale {r} to be the quantity

\displaystyle  \mathrm{E}(\mu,r) := \mu^4\left( \{ (x_1,x_2,x_3,x_4) \in ({\bf R}^d)^4: |x_1+x_2-x_3-x_4| \leq r \}\right) \ \ \ \ \ (1)

where {\mu^4} is the product measure on {({\bf R}^d)^4} formed from four copies of the measure {\mu} on {{\bf R}^d}. We will be interested in Cantor-type measures {\mu}, supported on a compact set {X \subset B(0,1)} and obeying the Ahlfors-David regularity condition

\displaystyle  \mu(B(x,r)) \leq C r^\delta

for all balls {B(x,r)} and some constants {C, \delta > 0}, as well as the matching lower bound

\displaystyle  \mu(B(x,r)) \geq C^{-1} r^\delta

whenever {x \in X} and {0 < r < 1}. One should think of {X} as a {\delta}-dimensional fractal set, and {\mu} as some vaguely self-similar measure on this set.

Note that once one fixes {x_1,x_2,x_3}, the variable {x_4} in (1) is constrained to a ball of radius {r}, hence we obtain the trivial upper bound

\displaystyle  \mathrm{E}(\mu,r) \leq C^4 r^\delta. \ \ \ \ \ (2)
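
To spell out where the factor {C^4} comes from, here is a short sketch of the computation (the intermediate integral is written out only for illustration). Integrating out the variable {x_4} first, one has

\displaystyle  \mathrm{E}(\mu,r) = \int \int \int \mu( B(x_1+x_2-x_3, r) )\ d\mu(x_1) d\mu(x_2) d\mu(x_3) \leq C r^\delta \mu({\bf R}^d)^3,

and since {\mu} is supported in {B(0,1)}, the upper regularity bound at radius {1} gives {\mu({\bf R}^d) = \mu(B(0,1)) \leq C}, which yields (2).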

If the set {X} contains a lot of “additive structure”, one can expect this bound to be basically sharp; for instance, if {\delta} is an integer, {X} is a {\delta}-dimensional unit disk, and {\mu} is Lebesgue measure on this disk, one can verify that {\mathrm{E}(\mu,r) \sim r^\delta} (where we allow implied constants to depend on {d,\delta}). However, we show that if the dimension is non-integer, then one obtains a gain:

Theorem 1 If {0 < \delta < d} is not an integer, and {X, \mu} are as above, then

\displaystyle  \mathrm{E}(\mu,r) \lesssim_{C,\delta,d} r^{\delta+\beta}

for some {\beta>0} depending only on {C,\delta,d}.
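
To get a feel for the quantity {\mathrm{E}(\mu,r)}, one can experiment with a toy discretisation in which {\mu} is the uniform probability measure on a finite set of points in {{\bf R}}. The following Python sketch is only an illustration under that assumption (the helper names `additive_energy` and `cantor_points` are hypothetical, and counting coincidences among finitely many points is not the regime treated by Theorem 1); it compares an arithmetic progression with a finite stage of the ternary Cantor set having the same number of points.

```python
from bisect import bisect_left, bisect_right


def additive_energy(points, r):
    """E(mu, r) for mu the uniform probability measure on a finite list of reals."""
    # All values x1 + x2 (with multiplicity), sorted for binary search.
    sums = sorted(x + y for x in points for y in points)
    # For each value s = x1 + x2, count the values t = x3 + x4 with |s - t| <= r.
    count = sum(bisect_right(sums, s + r) - bisect_left(sums, s - r) for s in sums)
    return count / len(points) ** 4


def cantor_points(level):
    """Left endpoints of the 2**level intervals at stage `level` of the ternary Cantor set."""
    pts = [0.0]
    for k in range(1, level + 1):
        # Each existing point spawns a copy shifted by 2 * 3^{-k}.
        pts = pts + [p + 2.0 * 3.0 ** (-k) for p in pts]
    return sorted(pts)


progression = [k / 16 for k in range(16)]  # 16 equally spaced points in [0, 1)
cantor = cantor_points(4)                  # 16 points from a stage-4 Cantor set
r = 1e-9                                   # tiny scale: only (near-)exact coincidences count

print("progression:", additive_energy(progression, r))
print("cantor     :", additive_energy(cantor, r))
```

In this toy comparison the progression has more additive coincidences than the Cantor-type set, which is at least consistent with the heuristic that sets with a lot of additive structure have large energy.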

Informally, this asserts that Ahlfors-David regular fractal sets of non-integer dimension cannot behave as if they are approximately closed under addition. In fact the gain {\beta} we obtain is quasipolynomial in the regularity constant {C}:

\displaystyle  \beta = \exp\left( - O_{\delta,d}( 1 + \log^{O_{\delta,d}(1)}(C) ) \right).

(We also obtain a localised version in which the regularity condition is only required to hold at scales between {r} and {1}.) Such a result was previously obtained (with more explicit values of the {O_{\delta,d}()} implied constants) in the one-dimensional case {d=1} by Dyatlov and Zahl; but in higher dimensions there do not appear to have been any results for this general class of sets {X} and measures {\mu}. In the paper of Dyatlov and Zahl it is noted that some dependence on {C} is necessary; in particular, {\beta} cannot be much better than {1/\log C}. This reflects the fact that there are fractal sets that do behave reasonably well with respect to addition (basically because they are built out of long arithmetic progressions at many scales); however, such sets are not very Ahlfors-David regular. Among other things, this result readily implies a dimension expansion result

\displaystyle  \mathrm{dim}( f( X, X) ) \geq \delta + \beta

for any non-degenerate smooth map {f: {\bf R}^d \times {\bf R}^d \rightarrow {\bf R}^d}, including the sum map {f(x,y) := x+y} and (in one dimension) the product map {f(x,y) := x \cdot y}, where the non-degeneracy condition required is that the gradients {D_x f(x,y), D_y f(x,y): {\bf R}^d \rightarrow {\bf R}^d} are invertible for every {x,y}. We refer to the paper for the formal statement.

Our higher-dimensional argument shares many features with that of Dyatlov and Zahl, notably a reliance on the modern tools of additive combinatorics (and specifically the Bogolyubov-Ruzsa lemma of Sanders). However, in one dimension we were also able to find a completely elementary argument, avoiding any particularly advanced additive combinatorics and instead primarily exploiting the order-theoretic properties of the real line, that gave a superior value of {\beta}, namely

\displaystyle  \beta := c \min(\delta,1-\delta) C^{-25}.

One of the main reasons for obtaining such improved energy bounds is that they imply a fractal uncertainty principle in some regimes. We focus attention on the model case of obtaining such an uncertainty principle for the semiclassical Fourier transform

\displaystyle  {\mathcal F}_h f(\xi) := (2\pi h)^{-d/2} \int_{{\bf R}^d} e^{-i x \cdot \xi/h} f(x)\ dx

where {h>0} is a small parameter. If {X, \mu, \delta} are as above, and {X_h} denotes the {h}-neighbourhood of {X}, then from the Hausdorff-Young inequality one obtains the trivial bound

\displaystyle  \| 1_{X_h} {\mathcal F}_h 1_{X_h} \|_{L^2({\bf R}^d) \rightarrow L^2({\bf R}^d)} \lesssim_{C,d} h^{\max\left(\frac{d}{2}-\delta,0\right)}.

(There are also variants involving pairs of sets {X_h, Y_h}, but for simplicity we focus on the uncertainty principle for a single set {X_h}.) The fractal uncertainty principle, when it applies, asserts that one can improve this to

\displaystyle  \| 1_{X_h} {\mathcal F}_h 1_{X_h} \|_{L^2({\bf R}^d) \rightarrow L^2({\bf R}^d)} \lesssim_{C,d} h^{\max\left(\frac{d}{2}-\delta,0\right) + \beta}

for some {\beta>0}; informally, this asserts that a function and its Fourier transform cannot simultaneously be concentrated in the set {X_h} when {\delta \leq \frac{d}{2}}, and that a function cannot be concentrated on {X_h} and have its Fourier transform be of maximum size on {X_h} when {\delta \geq \frac{d}{2}}. A modification of the disk example mentioned previously shows that such a fractal uncertainty principle cannot hold if {\delta} is an integer. However, in one dimension, the fractal uncertainty principle is known to hold for all {0 < \delta < 1}. The above-mentioned results of Dyatlov and Zahl were able to establish this for {\delta} close to {1/2}, and the remaining cases {1/2 < \delta < 1} and {0 < \delta < 1/2} were later established by Bourgain-Dyatlov and Dyatlov-Jin respectively. Such uncertainty principles have applications to hyperbolic dynamics, in particular in establishing spectral gaps for certain Selberg zeta functions.
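
A discrete toy analogue of this type of estimate can be explored numerically. In the sketch below, which is an assumption-laden model rather than the continuous setting of the paper, the semiclassical Fourier transform is replaced by the unitary discrete Fourier transform on {{\bf Z}/N{\bf Z}} with {N = 3^k}, and {X_h} is replaced by the set of residues whose base {3} digits all lie in {\{0,2\}}. All function names are hypothetical, and the norm is only estimated by power iteration (which approaches the true operator norm from below).

```python
import cmath
import math


def cantor_indices(k):
    """Integers in [0, 3**k) whose base-3 digits all lie in {0, 2}."""
    idx = [0]
    for j in range(k):
        idx = idx + [i + 2 * 3 ** j for i in idx]
    return sorted(idx)


def restricted_dft(indices, N):
    """Rows and columns of the unitary N-point DFT matrix indexed by `indices`."""
    return [[cmath.exp(-2j * math.pi * a * b / N) / math.sqrt(N) for b in indices]
            for a in indices]


def operator_norm(A, iterations=200):
    """Estimate the largest singular value of A by power iteration on A^* A."""
    n = len(A)
    v = [1.0 / math.sqrt(n) + 0j] * n  # unit starting vector
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]              # w = A v
        u = [sum(A[i][j].conjugate() * w[i] for i in range(n)) for j in range(n)]  # u = A^* w
        size = math.sqrt(sum(abs(z) ** 2 for z in u))
        if size == 0.0:
            return 0.0
        # For a unit vector v, |A^* A v| is at most the square of the top singular value.
        estimate = math.sqrt(size)
        v = [z / size for z in u]
    return estimate


k = 3
N = 3 ** k
C = cantor_indices(k)
print(len(C), N, operator_norm(restricted_dft(C, N)))
```

Since the restricted matrix is a submatrix of a unitary matrix, its norm is at most {1}, and it is also at most its Hilbert-Schmidt norm {|C|/\sqrt{N}}; these play the role of the trivial bound, and one can vary {k} to see how the estimated norm compares with them.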

It remains a largely open problem to establish a fractal uncertainty principle in higher dimensions. Our results allow one to establish such a principle when the dimension {\delta} is close to {d/2}, and {d} is assumed to be odd (to make {d/2} a non-integer). There is also work of Han and Schlag that obtains such a principle when one of the copies of {X_h} is assumed to have a product structure. We hope to obtain further higher-dimensional fractal uncertainty principles in subsequent work.

We now sketch how our main theorem is proved. In both one dimension and higher dimensions, the main point is to get a preliminary improvement

\displaystyle  \mathrm{E}(\mu,r_0) \leq \varepsilon r_0^\delta \ \ \ \ \ (3)

over the trivial bound (2) for any small {\varepsilon>0}, provided {r_0} is sufficiently small depending on {\varepsilon, \delta, d}; one can then iterate this bound by a fairly standard “induction on scales” argument (which roughly speaking can be used to show that energies {\mathrm{E}(\mu,r)} behave somewhat multiplicatively in the scale parameter {r}) to propagate the bound to a power gain at smaller scales. We found that a particularly clean way to run the induction on scales was via use of the Gowers uniformity norm {U^2}, and particularly via a clean Fubini-type inequality

\displaystyle  \| f \|_{U^2(V \times V')} \leq \|f\|_{U^2(V; U^2(V'))}

(ultimately proven using the Gowers-Cauchy-Schwarz inequality) that allows one to “decouple” coarse and fine scale aspects of the Gowers norms (and hence of additive energies).
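
The link between the {U^2} norm and additive energy is easiest to see in the model setting of a cyclic group {{\bf Z}/N{\bf Z}} (an assumption made here purely for illustration; the paper works with measures on {{\bf R}^d}). There, {\|f\|_{U^2}^4} is the average of {f(x) \overline{f(x+h)} \overline{f(x+k)} f(x+h+k)} over {x,h,k}, it equals the sum of the fourth powers of the magnitudes of the normalised Fourier coefficients of {f}, and for an indicator function {f = 1_A} it equals the number of additive quadruples in {A} divided by {N^3}. The following sketch, with hypothetical function names, checks these identities numerically on a small example.

```python
import cmath
import math


def u2_fourth_power_direct(f):
    """||f||_{U^2}^4 on Z_N, computed as an average over (x, h, k) in Z_N^3."""
    N = len(f)
    total = 0
    for x in range(N):
        for h in range(N):
            for k in range(N):
                total += (f[x] * f[(x + h) % N].conjugate()
                          * f[(x + k) % N].conjugate() * f[(x + h + k) % N])
    return (total / N ** 3).real


def u2_fourth_power_fourier(f):
    """||f||_{U^2}^4 on Z_N, computed from the normalised Fourier coefficients."""
    N = len(f)
    coeffs = [sum(f[x] * cmath.exp(-2j * math.pi * x * xi / N) for x in range(N)) / N
              for xi in range(N)]
    return sum(abs(c) ** 4 for c in coeffs)


def additive_quadruples(A, N):
    """Number of (a, b, c, d) in A^4 with a + b = c + d modulo N."""
    return sum(1 for a in A for b in A for c in A for d in A
               if (a + b - c - d) % N == 0)


N = 9
A = [0, 2, 6, 8]                                        # a small Cantor-like subset of Z_9
indicator = [1.0 if x in A else 0.0 for x in range(N)]  # the function 1_A

print(u2_fourth_power_direct(indicator))
print(u2_fourth_power_fourier(indicator))
print(additive_quadruples(A, N) / N ** 3)
```

All three printed quantities should agree up to rounding error.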

It remains to obtain the preliminary improvement. In one dimension this is done by identifying some “left edges” of the set {X} that supports {\mu}: intervals {[x, x+K^{-n}]} that intersect {X}, but such that a large interval {[x-K^{-n+1},x]} just to the left of this interval is disjoint from {X}. Here {K} is a large constant and {n} is a scale parameter. It is not difficult to show (using in particular the Archimedean nature of the real line) that if one has the Ahlfors-David regularity condition for some {0 < \delta < 1} then left edges exist in abundance at every scale; for instance most points of {X} would be expected to lie in quite a few of these left edges (much as most elements of, say, the ternary Cantor set {\{ \sum_{n=1}^\infty \varepsilon_n 3^{-n}: \varepsilon_n \in \{0,2\} \}} would be expected to contain a lot of {0}s in their base {3} expansion). In particular, most pairs {(x_1,x_2) \in X \times X} would be expected to lie in a pair {[x,x+K^{-n}] \times [y,y+K^{-n}]} of left edges of equal length. The key point is then that if {(x_1,x_2) \in X \times X} lies in such a pair with {K^{-n} \geq r}, then there are relatively few pairs {(x_3,x_4) \in X \times X} at distance {O(K^{-n+1})} from {(x_1,x_2)} for which one has the relation {x_1+x_2 = x_3+x_4 + O(r)}, because {x_3,x_4} will both tend to be to the right of {x_1,x_2} respectively. This causes a decrement in the energy at scale {K^{-n+1}}, and by carefully combining all these energy decrements one can eventually cobble together the energy bound (3).
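
The notion of a left edge can be illustrated on a finite point set. The sketch below uses a simplified stand-in for the definition above (an assumption for illustration, not the formulation used in the paper): it reports those points {x} of the set for which no other point of the set lies in {[x-K^{-n+1},x)}, so that {x} is the leftmost point of the set in an interval starting at {x}. The name `left_edge_points` is hypothetical.

```python
from bisect import bisect_left


def left_edge_points(points, K, n):
    """Points x of a finite set with no other point of the set in [x - K**(-(n-1)), x)."""
    pts = sorted(points)
    gap = float(K) ** (-(n - 1))  # length of the empty interval required on the left
    # bisect_left gives the index of the first point >= x - gap; if that index is the
    # index of x itself, then nothing lies in [x - gap, x).
    return [x for i, x in enumerate(pts) if bisect_left(pts, x - gap) == i]


# Left endpoints of the four intervals at the second stage of the ternary Cantor set.
stage_two = [0.0, 2 / 9, 2 / 3, 8 / 9]

print(left_edge_points(stage_two, 3, 2))  # requires an empty gap of length 1/3 on the left
print(left_edge_points(stage_two, 3, 3))  # requires an empty gap of length 1/9 on the left
```

With {K=3}, the coarser scale {n=2} picks out only the points with a gap of length at least {1/3} to their left, while at the finer scale {n=3} every point of this small example qualifies.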

We were not able to make this argument work in higher dimensions (though perhaps the cases {0 < \delta < 1} and {d-1 < \delta < d} might not be completely out of reach from these methods). Instead we return to additive combinatorics methods. If the claim (3) failed, then by applying the Balog-Szemerédi-Gowers theorem we can show that the set {X} has high correlation with an approximate group {H}, and hence (by the aforementioned Bogolyubov-Ruzsa type theorem of Sanders, which is the main source of the quasipolynomial bounds in our final exponent) {X} will exhibit an approximate “symmetry” along some non-trivial arithmetic progression of some spacing length {r} and some diameter {R \gg r}. The {r}-neighbourhood {X_r} of {X} will then resemble the union of parallel “cylinders” of dimensions {r \times R}. If we focus on a typical {R}-ball of {X_r}, the set now resembles a Cartesian product of an interval of length {R} with a subset of a {d-1}-dimensional hyperplane, which behaves approximately like an Ahlfors-David regular set of dimension {\delta-1} (this already lets us conclude a contradiction if {\delta<1}). Note that if the original dimension {\delta} was non-integer then this new dimension {\delta-1} will also be non-integer. It is then possible to contradict the failure of (3) by appealing to a suitable induction hypothesis at one lower dimension.
