You are currently browsing the monthly archive for December 2014.

In 1946, Ulam, in response to a theorem of Anning and Erdös, posed the following problem:

Problem 1 (Erdös-Ulam problem) Let {S \subset {\bf R}^2} be a set such that the distance between any two points in {S} is rational. Is it true that {S} cannot be (topologically) dense in {{\bf R}^2}?

The paper of Anning and Erdös addressed the case that all the distances between two points in {S} were integer rather than rational in the affirmative.

The Erdös-Ulam problem remains open; it was discussed recently over at Gödel’s lost letter. It is in fact likely (as we shall see below) that the set {S} in the above problem is not only forbidden to be topologically dense, but also cannot be Zariski dense either. If so, then the structure of {S} is quite restricted; it was shown by Solymosi and de Zeeuw that if {S} fails to be Zariski dense, then all but finitely many of the points of {S} must lie on a single line, or a single circle. (Conversely, it is easy to construct examples of dense subsets of a line or circle in which all distances are rational, though in the latter case the square of the radius of the circle must also be rational.)

The main tool of the Solymosi-de Zeeuw analysis was Faltings’ celebrated theorem that every algebraic curve of genus at least two contains only finitely many rational points. The purpose of this post is to observe that an affirmative answer to the full Erdös-Ulam problem similarly follows from the conjectured analogue of Faltings’ theorem for surfaces, namely the following conjecture of Bombieri and Lang:

Conjecture 2 (Bombieri-Lang conjecture) Let {X} be a smooth projective irreducible algebraic surface defined over the rationals {{\bf Q}} which is of general type. Then the set {X({\bf Q})} of rational points of {X} is not Zariski dense in {X}.

In fact, the Bombieri-Lang conjecture has been made for varieties of arbitrary dimension, and for more general number fields than the rationals, but the above special case of the conjecture is the only one needed for this application. We will review what “general type” means (for smooth projective complex varieties, at least) below the fold.

The Bombieri-Lang conjecture is considered to be extremely difficult, in particular being substantially harder than Faltings’ theorem, which is itself a highly non-trivial result. So this implication should not be viewed as a practical route to resolving the Erdös-Ulam problem unconditionally; rather, it is a demonstration of the power of the Bombieri-Lang conjecture. Still, it was an instructive algebraic geometry exercise for me to carry out the details of this implication, which quickly boils down to verifying that a certain quite explicit algebraic surface is of general type (Theorem 4 below). As I am not an expert in the subject, my computations here will be rather tedious and pedestrian; it is likely that they could be made much slicker by exploiting more of the machinery of modern algebraic geometry, and I would welcome any such streamlining by actual experts in this area. (For similar reasons, there may be more typos and errors than usual in this post; corrections are welcome as always.) My calculations here are based on a similar calculation of van Luijk, who used analogous arguments to show (assuming Bombieri-Lang) that the set of perfect cuboids is not Zariski-dense in its projective parameter space.

We also remark that in a recent paper of Makhul and Shaffaf, the Bombieri-Lang conjecture (or more precisely, a weaker consequence of that conjecture) was used to show that if {S} is a subset of {{\bf R}^2} with rational distances which intersects any line in only finitely many points, then there is a uniform bound on the cardinality of the intersection of {S} with any line. I have also recently learned (private communication) that an unpublished work of Shaffaf has obtained a result similar to the one in this post, namely that the Erdös-Ulam conjecture follows from the Bombieri-Lang conjecture, plus an additional conjecture about the rational curves in a specific surface.

Let us now give the elementary reductions to the claim that a certain variety is of general type. For sake of contradiction, let {S} be a dense set such that the distance between any two points is rational. Then {S} certainly contains two points that are a rational distance apart. By applying a translation, rotation, and a (rational) dilation, we may assume that these two points are {(0,0)} and {(1,0)}. As {S} is dense, there is a third point of {S} not on the {x} axis, which after a reflection we can place in the upper half-plane; we will write it as {(a,\sqrt{b})} with {b>0}.

Given any two points {P, Q} in {S}, the quantities {|P|^2, |Q|^2, |P-Q|^2} are rational, and so by the cosine rule the dot product {P \cdot Q} is rational as well. Since {(1,0) \in S}, this implies that the {x}-component of every point {P} in {S} is rational; this in turn implies that the product of the {y}-coordinates of any two points {P,Q} in {S} is rational as well (since this differs from {P \cdot Q} by a rational number). In particular, {a} and {b} are rational, and all of the points in {S} now lie in the lattice {\{ ( x, y\sqrt{b}): x, y \in {\bf Q} \}}. (This fact appears to have first been observed in the 1988 Habilitationsschrift of Kemnitz.)
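As a small sanity check of the cosine-rule step, here is a minimal Python sketch that recovers the {x}-coordinate and the squared {y}-coordinate of a point from its distances to {(0,0)} and {(1,0)}, using exact rational arithmetic. The function name `coords_from_distances` is purely illustrative and does not come from any of the papers cited above.

```python
from fractions import Fraction

def coords_from_distances(r0, r1):
    """Given rational distances r0 = |P| and r1 = |P - (1,0)|,
    return (x, y_squared) for the point P = (x, y).

    The cosine rule gives P . (1,0) = (|P|^2 + 1 - |P - (1,0)|^2) / 2,
    which is rational, and then y^2 = |P|^2 - x^2 is rational too."""
    r0, r1 = Fraction(r0), Fraction(r1)
    x = (r0 * r0 + 1 - r1 * r1) / 2
    y_squared = r0 * r0 - x * x
    return x, y_squared

# Equilateral triangle on (0,0), (1,0): the third vertex is (1/2, sqrt(3/4)).
print(coords_from_distances(1, 1))
# A point at distance 5/4 from the origin and 3/4 from (1,0).
print(coords_from_distances(Fraction(5, 4), Fraction(3, 4)))
```

In the first example {y^2 = 3/4} is rational but not a rational square, which is why the parameter {b} has to be carried along in the rest of the argument.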

Now take four points {(x_j,y_j \sqrt{b})}, {j=1,\dots,4} in {S} in general position (so that the octuplet {(x_1,y_1\sqrt{b},\dots,x_4,y_4\sqrt{b})} avoids any pre-specified hypersurface in {{\bf C}^8}); this can be done if {S} is dense. (If one wished, one could re-use the three previous points {(0,0), (1,0), (a,\sqrt{b})} to be three of these four points, although this ultimately makes little difference to the analysis.) If {(x,y\sqrt{b})} is any point in {S}, then the distances {r_j} from {(x,y\sqrt{b})} to {(x_j,y_j\sqrt{b})} are rationals that obey the equations

\displaystyle (x - x_j)^2 + b (y-y_j)^2 = r_j^2

for {j=1,\dots,4}, and thus determine a rational point in the affine complex variety {V = V_{b,x_1,y_1,x_2,y_2,x_3,y_3,x_4,y_4} \subset {\bf C}^6} defined as

\displaystyle V := \{ (x,y,r_1,r_2,r_3,r_4) \in {\bf C}^6:

\displaystyle (x - x_j)^2 + b (y-y_j)^2 = r_j^2 \hbox{ for } j=1,\dots,4 \}.

By inspecting the projection {(x,y,r_1,r_2,r_3,r_4) \rightarrow (x,y)} from {V} to {{\bf C}^2}, we see that {V} is a branched cover of {{\bf C}^2}, with the generic cover having {2^4=16} points (coming from the different ways to form the square roots {r_1,r_2,r_3,r_4}); in particular, {V} is a complex affine algebraic surface, defined over the rationals. By inspecting the monodromy around the four singular base points {(x,y) = (x_i,y_i)} (which switch the sign of one of the roots {r_i}, while keeping the other three roots unchanged), we see that the variety {V} is connected away from its singular set, and thus irreducible. As {S} is topologically dense in {{\bf R}^2}, it is Zariski-dense in {{\bf C}^2}, and so {S} generates a Zariski-dense set of rational points in {V}. To solve the Erdös-Ulam problem, it thus suffices to show that

Claim 3 For any non-zero rational {b} and for rationals {x_1,y_1,x_2,y_2,x_3,y_3,x_4,y_4} in general position, the rational points of the affine surface {V = V_{b,x_1,y_1,x_2,y_2,x_3,y_3,x_4,y_4}} are not Zariski dense in {V}.
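To make the variety {V} a little more concrete, the following Python sketch lists the rational points of {V} lying over a given rational base point {(x,y)}, by testing whether each {(x-x_j)^2 + b(y-y_j)^2} is a rational square and then enumerating the sign choices. The helper names `rational_sqrt` and `fibre_over` are hypothetical, and the four centres in the example are collinear (so not in general position); they were chosen only because Pythagorean triples make the arithmetic easy to follow.

```python
from fractions import Fraction
from itertools import product
from math import isqrt

def rational_sqrt(q):
    """Return the non-negative rational square root of q, or None if q
    is not the square of a rational number."""
    q = Fraction(q)
    if q < 0:
        return None
    n, d = q.numerator, q.denominator
    rn, rd = isqrt(n), isqrt(d)
    if rn * rn == n and rd * rd == d:
        return Fraction(rn, rd)
    return None

def fibre_over(b, centres, x, y):
    """Return the rational points of V lying over (x, y), i.e. all sign
    choices (r_1, ..., r_4) with (x - x_j)^2 + b (y - y_j)^2 = r_j^2.
    The list is empty if some r_j^2 is not a rational square."""
    roots = []
    for xj, yj in centres:
        r = rational_sqrt((x - xj) ** 2 + b * (y - yj) ** 2)
        if r is None:
            return []
        roots.append(r)
    # Each r_j may be replaced by -r_j; duplicates (when r_j = 0) are merged.
    return sorted({tuple(s * r for s, r in zip(signs, roots))
                   for signs in product((1, -1), repeat=len(roots))})

# Illustrative (degenerate) data: b = 1 and four centres on the x-axis.
centres = [(5, 0), (9, 0), (16, 0), (35, 0)]
points = fibre_over(1, centres, 0, 12)
print(len(points), points[-1])
print(fibre_over(1, centres, 0, 1))
```

Over the base point {(0,12)} all four distances are integers, so the full fibre of {2^4 = 16} points is rational; over {(0,1)} the first squared distance is {26}, which is not a rational square, so the fibre contributes no rational points.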

This is already very close to a claim that can be directly resolved by the Bombieri-Lang conjecture, but {V} is affine rather than projective, and also contains some singularities. The first issue is easy to deal with, by working with the projectivisation

\displaystyle \overline{V} := \{ [X,Y,Z,R_1,R_2,R_3,R_4] \in {\bf CP}^6: Q(X,Y,Z,R_1,R_2,R_3,R_4) = 0 \} \ \ \ \ \ (1)


of {V}, where {Q: {\bf C}^7 \rightarrow {\bf C}^4} is the homogeneous quadratic polynomial

\displaystyle Q(X,Y,Z,R_1,R_2,R_3,R_4) := (Q_j(X,Y,Z,R_1,R_2,R_3,R_4) )_{j=1}^4


\displaystyle Q_j(X,Y,Z,R_1,R_2,R_3,R_4) := (X-x_j Z)^2 + b (Y-y_jZ)^2 - R_j^2

and the projective complex space {{\bf CP}^6} is the space of all equivalence classes {[X,Y,Z,R_1,R_2,R_3,R_4]} of tuples {(X,Y,Z,R_1,R_2,R_3,R_4) \in {\bf C}^7 \backslash \{0\}} up to projective equivalence {(\lambda X, \lambda Y, \lambda Z, \lambda R_1, \lambda R_2, \lambda R_3, \lambda R_4) \sim (X,Y,Z,R_1,R_2,R_3,R_4)}. By identifying the affine point {(x,y,r_1,r_2,r_3,r_4)} with the projective point {[x,y,1,r_1,r_2,r_3,r_4]}, we see that {\overline{V}} consists of the affine variety {V} together with the set {\{ [X,Y,0,R_1,R_2,R_3,R_4]: X^2+bY^2=R_1^2; R_j = \pm R_1 \hbox{ for } j=2,3,4\}}, which is the union of eight curves, each of which lies in the closure of {V}. Thus {\overline{V}} is the projective closure of {V}, and is therefore a complex irreducible projective surface, defined over the rationals. As {\overline{V}} is cut out by four quadric equations in {{\bf CP}^6} and has degree sixteen (as can be seen for instance by inspecting the intersection of {\overline{V}} with a generic perturbation of a fibre over the generically defined projection {[X,Y,Z,R_1,R_2,R_3,R_4] \mapsto [X,Y,Z]}), it is also a complete intersection. To show Claim 3, it then suffices to show that the rational points in {\overline{V}} are not Zariski dense in {\overline{V}}.

Heuristically, the reason why we expect few rational points in {\overline{V}} is as follows. First observe from the projective nature of (1) that every rational point is equivalent to an integer point. But for a septuple {(X,Y,Z,R_1,R_2,R_3,R_4)} of integers of size {O(N)}, the quantity {Q(X,Y,Z,R_1,R_2,R_3,R_4)} is an integer point of {{\bf Z}^4} of size {O(N^2)}, and so should only vanish about {O(N^{-8})} of the time. Hence the number of integer points {(X,Y,Z,R_1,R_2,R_3,R_4) \in {\bf Z}^7} of height comparable to {N} should be about

\displaystyle O(N)^7 \times O(N^{-8}) = O(N^{-1});

this is a convergent sum if {N} ranges over (say) powers of two, and so from standard probabilistic heuristics (see this previous post) we in fact expect only finitely many solutions, in the absence of any special algebraic structure (e.g. the structure of an abelian variety, or a birational reduction to a simpler variety) that could produce an unusually large number of solutions.
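The convergence claim can be illustrated with a few lines of Python. This is only a sketch of the bookkeeping: all implied constants in the {O()} notation are set to one, which is an assumption made purely for illustration.

```python
from fractions import Fraction

def dyadic_heuristic_total(K):
    """Sum the heuristic count N^7 * N^(-8) = 1/N over N = 1, 2, 4, ..., 2^K."""
    return sum(Fraction(1, 2 ** k) for k in range(K + 1))

for K in (5, 10, 20):
    # The partial sums increase towards 2 and never exceed it.
    print(K, float(dyadic_heuristic_total(K)))
```

The partial sums are {2 - 2^{-K}}, so the predicted total number of solutions stays bounded no matter how many dyadic scales are included.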

The Bombieri-Lang conjecture, Conjecture 2, can be viewed as a formalisation of the above heuristics (roughly speaking, it is one of the most optimistic natural conjectures one could make that is compatible with these heuristics while also being invariant under birational equivalence).

Unfortunately, {\overline{V}} contains some singular points. Being a complete intersection, this occurs when the Jacobian matrix of the map {Q: {\bf C}^7 \rightarrow {\bf C}^4} has less than full rank, or equivalently that the gradient vectors

\displaystyle \nabla Q_j = (2(X-x_j Z), 2b(Y-y_j Z), -2x_j (X-x_j Z) - 2by_j (Y-y_j Z), \ \ \ \ \ (2)


\displaystyle 0, \dots, 0, -2R_j, 0, \dots, 0)

for {j=1,\dots,4} are linearly dependent, where the {-2R_j} is in the coordinate position associated to {R_j}. One way in which this can occur is if one of the gradient vectors {\nabla Q_j} vanishes identically. This occurs at precisely {4 \times 2^3 = 32} points, when {[X,Y,Z]} is equal to {[x_j,y_j,1]} for some {j=1,\dots,4}, and one has {R_k = \pm ( (x_j - x_k)^2 + b (y_j - y_k)^2 )^{1/2}} for all {k=1,\dots,4} (so in particular {R_j=0}). Let us refer to these as the obvious singularities; they arise from the geometrically evident fact that the distance function {(x,y\sqrt{b}) \mapsto \sqrt{(x-x_j)^2 + b(y-y_j)^2}} is singular at {(x_j,y_j\sqrt{b})}.
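The vanishing of a gradient at an obvious singularity is easy to confirm by direct computation. The sketch below differentiates {Q_j = (X-x_jZ)^2 + b(Y-y_jZ)^2 - R_j^2} by hand and evaluates the result at one such point. The function names and the sample data (collinear centres with {b=1}, chosen so that every coordinate is an integer) are assumptions made for illustration only; in general the coordinates {R_k} of a singular point need not be rational.

```python
def Q(j, point, b, centres):
    """Value of Q_j at the homogeneous point (X, Y, Z, R_1, ..., R_4)."""
    X, Y, Z, *R = point
    xj, yj = centres[j]
    return (X - xj * Z) ** 2 + b * (Y - yj * Z) ** 2 - R[j] ** 2

def grad_Q(j, point, b, centres):
    """Gradient of Q_j with respect to (X, Y, Z, R_1, ..., R_4),
    obtained by differentiating the formula for Q_j term by term."""
    X, Y, Z, *R = point
    xj, yj = centres[j]
    u, v = X - xj * Z, Y - yj * Z
    grad = [2 * u, 2 * b * v, -2 * xj * u - 2 * b * yj * v] + [0] * len(R)
    grad[3 + j] = -2 * R[j]  # only the R_j slot is non-zero among the R's
    return grad

# Illustrative data: the point over the first centre, with R_1 = 0 and
# R_k equal to the distance from the first centre to the k-th centre.
b = 1
centres = [(5, 0), (9, 0), (16, 0), (35, 0)]
point = (5, 0, 1, 0, 4, 11, 30)

print([Q(j, point, b, centres) for j in range(4)])  # the point lies on the surface
print(grad_Q(0, point, b, centres))                 # this gradient vanishes
print(grad_Q(1, point, b, centres))                 # the others do not
```

Flipping the signs of the three non-zero {R_k} gives the {2^3} singular points over a single centre, and there are four centres, which accounts for the count of {32}.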

The other way in which this could occur is if a non-trivial linear combination of at least two of the gradient vectors vanishes. From (2), this can only occur if {R_j=R_k=0} for some distinct {j,k}, which from (1) implies that

\displaystyle (X - x_j Z) = \pm \sqrt{b} i (Y - y_j Z) \ \ \ \ \ (3)



\displaystyle (X - x_k Z) = \pm \sqrt{b} i (Y - y_k Z) \ \ \ \ \ (4)


for two choices of sign {\pm}. If the signs are equal, then (as {x_j, y_j, x_k, y_k} are in general position) this implies that {Z=0}, and then we have the singular point

\displaystyle [X,Y,Z,R_1,R_2,R_3,R_4] = [\pm \sqrt{b} i, 1, 0, 0, 0, 0, 0]. \ \ \ \ \ (5)


If the non-trivial linear combination involved three or more gradient vectors, then by the pigeonhole principle at least two of the signs involved must be equal, and so the only singular points are (5). So the only remaining possibility is when we have two gradient vectors {\nabla Q_j, \nabla Q_k} that are parallel but non-zero, with the signs in (3), (4) opposing. But then (as {x_j,y_j,x_k,y_k} are in general position) the vectors {(X-x_j Z, Y-y_j Z), (X-x_k Z, Y-y_k Z)} are non-zero and non-parallel to each other, a contradiction. Thus, outside of the {32} obvious singular points mentioned earlier, the only other singular points are the two points (5).

We will shortly show that the {32} obvious singularities are ordinary double points; the surface {\overline{V}} near any of these points is analytically equivalent to an ordinary cone {\{ (x,y,z) \in {\bf C}^3: z^2 = x^2 + y^2 \}} near the origin, which is a cone over a smooth conic curve {\{ (x,y) \in {\bf C}^2: x^2+y^2=1\}}. The two non-obvious singularities (5) are slightly more complicated than ordinary double points: they are elliptic singularities, which approximately resemble a cone over an elliptic curve. (As far as I can tell, this resemblance is exact in the category of real smooth manifolds, but not in the category of algebraic varieties.) If one blows up each of the point singularities of {\overline{V}} separately, no further singularities are created, and one obtains a smooth projective surface {X} (using the Segre embedding as necessary to embed {X} back into projective space, rather than in a product of projective spaces). Away from the singularities, the rational points of {\overline{V}} lift up to rational points of {X}. Assuming the Bombieri-Lang conjecture, we thus are able to answer the Erdös-Ulam problem in the affirmative once we establish

Theorem 4 The blowup {X} of {\overline{V}} is of general type.

This will be done below the fold, by the pedestrian device of explicitly constructing global differential forms on {X}; I will also be working from a complex analysis viewpoint rather than an algebraic geometry viewpoint as I am more comfortable with the former approach. (As mentioned above, though, there may well be a quicker way to establish this result by using more sophisticated machinery.)

I thank Mark Green and David Gieseker for helpful conversations (and a crash course in varieties of general type!).

Remark 5 The above argument shows in fact (assuming Bombieri-Lang) that sets {S \subset {\bf R}^2} with all distances rational cannot be Zariski-dense, and thus (by Solymosi-de Zeeuw) must lie on a single line or circle with only finitely many exceptions. Assuming a stronger version of Bombieri-Lang involving a general number field {K}, we obtain a similar conclusion with “rational” replaced by “lying in {K}” (one has to extend the Solymosi-de Zeeuw analysis to more general number fields, but this should be routine, using the analogue of Faltings’ theorem for such number fields).


Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes”. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

\displaystyle  G(X) := \sup_{p_n, p_{n+1} \leq X} p_{n+1}-p_n

between primes up to {X} exhibited a lower bound of the shape

\displaystyle  G(X) \geq f(X) \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (1)

for some function {f(X)} that went to infinity as {X \rightarrow \infty}; this improved upon previous work of Rankin and other authors, who established the same bound but with {f(X)} replaced by a constant. (Again, see the previous post for a more detailed discussion.)

In our previous papers, we did not specify a particular growth rate for {f(X)}. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results have ineffective constants, rendering our growth function {f(X)} similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking {k}-tuples of fairly large size (about {\log^c x} for some small {c}). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate {f(X)} of shape

\displaystyle  f(X) \asymp \frac{\log \log \log X}{\log\log\log\log X}, \ \ \ \ \ (2)

thus leading to a bound

\displaystyle  G(X) \gg \log X \frac{\log \log X}{\log\log\log X}

on the gaps between primes for large {X}; this is an unpublished calculation of James’.

In this paper we make a further refinement of this calculation to obtain a growth rate

\displaystyle  f(X) \asymp \log \log \log X \ \ \ \ \ (3)

leading to a bound of the form

\displaystyle  G(X) \geq c \log X \frac{\log \log X \log\log\log\log X}{\log\log\log X} \ \ \ \ \ (4)

for large {X} and some small constant {c}. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramér’s conjecture that {G(X)} is comparable to {\log^2 X}); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant {c} here can be replaced by an arbitrarily large constant {C}.
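To get a feel for how these iterated logarithms compare, here is a rough Python sketch that evaluates the shapes of the Rankin-type bound (with {f(X)} replaced by {1}) and of the bound (4) (with {c} replaced by {1}). The function names are illustrative, and the unknown constants are simply dropped, so only the ratio between the two shapes is meaningful.

```python
import math

def iterated_logs(log_X):
    """Return (log X, log log X, log log log X, log log log log X),
    taking log X itself as input so that X may be astronomically large."""
    l1 = log_X
    l2 = math.log(l1)
    l3 = math.log(l2)
    l4 = math.log(l3)  # requires log log log X > 0
    return l1, l2, l3, l4

def rankin_shape(log_X):
    """log X * loglog X * loglogloglog X / (logloglog X)^2."""
    l1, l2, l3, l4 = iterated_logs(log_X)
    return l1 * l2 * l4 / l3 ** 2

def new_shape(log_X):
    """log X * loglog X * loglogloglog X / logloglog X, as in (4) with c = 1."""
    l1, l2, l3, l4 = iterated_logs(log_X)
    return l1 * l2 * l4 / l3

log_X = 100 * math.log(10)  # X = 10^100
print(rankin_shape(log_X), new_shape(log_X), log_X ** 2)
```

The ratio of the two shapes is exactly {\log\log\log X}, which grows extremely slowly, and both remain far below the {\log^2 X} predicted by Cramér’s conjecture.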

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in {[x,y]} by removing one residue class modulo {p} for all primes {p} in (say) {[x/4,x/2]}? Very roughly speaking, if one can solve this problem with {y = g(x) x}, then one can obtain a growth rate on {f(X)} of the shape {f(X) \sim g(\log X)}. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in {[x,y]}, but never mind this detail for now.)
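The sieving problem just described can be experimented with at toy scale. The Python sketch below removes, for each prime {p} in {[x/4,x/2]}, the single residue class modulo {p} that contains the most surviving primes in {(x,y]}. This greedy choice is an assumption made for illustration; it is not the random or semi-random selection used in the papers discussed here, and the function names are hypothetical.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def greedy_sift(x, y):
    """Primes in (x, y] surviving after removing, for each prime p in
    [x/4, x/2], the residue class mod p that holds the most survivors."""
    primes = primes_up_to(y)
    survivors = {q for q in primes if q > x}
    for p in (q for q in primes if x / 4 <= q <= x / 2):
        counts = {}
        for q in survivors:
            counts[q % p] = counts.get(q % p, 0) + 1
        if counts:
            best = max(counts, key=counts.get)
            survivors = {q for q in survivors if q % p != best}
    return survivors

before = [q for q in primes_up_to(300) if q > 100]
after = greedy_sift(100, 300)
print(len(before), len(after))
```

At this scale each residue class captures only a handful of primes, so most primes survive; the point of the "dense clusters" machinery is to find residue classes that capture many more primes than a typical class would.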

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of {k}-tuples in {[x,y]} which contain at least {\gg \log k} primes, for {k} as large as {\log^c x} or so (so that {\log k} is about {\log\log x}). By considering {k}-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime {p} in {[x/4,x/2]} that capture about {\log\log x} primes. In principle, this means that the union of all these residue classes can cover about {\frac{x}{\log x} \log\log x} primes, allowing one to take {g(x)} as large as {\log\log x}, which corresponds to (3). However, there is a catch: the residue classes for different primes {p} may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in the expected number of times each prime would be sifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the {\log\log\log\log X} loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of {n} vertices using as few “edges” from a given hypergraph {H} as possible. If each edge has {m} vertices, then one certainly needs at least {n/m} edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph {H}. As before, random methods tend to require something like {\frac{n}{m} \log r} edges before one expects to cover, say {1-1/r} of the vertices.
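The logarithmic loss for purely random selection can be seen from a one-line computation. The sketch below assumes a simplified model in which each edge is an independent, uniformly random {m}-element subset of the {n} vertices; the function name and the specific numbers are illustrative only.

```python
import math

def expected_uncovered_fraction(n, m, num_edges):
    """Expected fraction of n vertices left uncovered by num_edges independent
    uniformly random m-element edges: a given vertex is missed by a single
    edge with probability 1 - m/n."""
    return (1 - m / n) ** num_edges

n, m, r = 10**6, 10, 100
k = math.ceil(n / m * math.log(r))  # about (n/m) log r edges
print(k, expected_uncovered_fraction(n, m, k))
```

With {(n/m) \log r} random edges the expected uncovered fraction is about {1/r}, compared with the {n/m} edges that a perfectly efficient cover would need.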

However, it turns out to be possible (under reasonable hypotheses on {H}) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from {H} that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph {H}, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.
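The nibbling procedure can be sketched in a few lines of Python. This is a toy version under stated assumptions: the sampling probability `eps`, the number of rounds, and the rule of rejecting a sampled edge that meets one already accepted in the same round are all simplifications chosen here, and none of the quantitative analysis of Pippenger and Spencer is reproduced.

```python
import random

def nibble_cover(edges, eps=0.05, rounds=200, seed=0):
    """Toy semi-random ("nibble") selection of pairwise disjoint edges.

    edges: list of frozensets of vertices. In each round every surviving
    edge is sampled with probability eps; a sampled edge is accepted unless
    it meets an edge accepted earlier in the same round, and afterwards
    every edge meeting the covered vertices is discarded."""
    rng = random.Random(seed)
    alive = list(edges)
    chosen, covered = [], set()
    for _ in range(rounds):
        if not alive:
            break
        for e in alive:
            if rng.random() < eps and covered.isdisjoint(e):
                chosen.append(e)
                covered |= e
        # Remove all edges that intersect the edges selected so far.
        alive = [e for e in alive if covered.isdisjoint(e)]
    return chosen, covered

# Illustrative input: 2000 random 3-element edges on 60 vertices.
rng = random.Random(1)
edges = [frozenset(rng.sample(range(60), 3)) for _ in range(2000)]
chosen, covered = nibble_cover(edges)
print(len(chosen), len(covered))
```

Because the selected edges are pairwise disjoint, the number of covered vertices is exactly three times the number of selected edges, so no edge is wasted on vertices that were already covered.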

In our setup, the vertices are the primes in {[x,y]}, and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime {p}, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly more useful for the covering problem than the small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).

Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than {\log\log x} primes of size {x} (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)

In Notes 2, the Riemann zeta function {\zeta} (and more generally, the Dirichlet {L}-functions {L(\cdot,\chi)}) were extended meromorphically into the region {\{ s: \hbox{Re}(s) > 0 \}} in and to the right of the critical strip. This is a sufficient amount of meromorphic continuation for many applications in analytic number theory, such as establishing the prime number theorem and its variants. The zeroes of the zeta function in the critical strip {\{ s: 0 < \hbox{Re}(s) < 1 \}} are known as the non-trivial zeroes of {\zeta}, and thanks to the truncated explicit formulae developed in Notes 2, they control the asymptotic distribution of the primes (up to small errors).

The {\zeta} function obeys the trivial functional equation

\displaystyle \zeta(\overline{s}) = \overline{\zeta(s)} \ \ \ \ \ (1)


for all {s} in its domain of definition. Indeed, as {\zeta(s)} is real-valued when {s} is real, the function {\zeta(s) - \overline{\zeta(\overline{s})}} vanishes on the real line and is also meromorphic, and hence vanishes everywhere. Similarly one has the functional equation

\displaystyle \overline{L(s, \chi)} = L(\overline{s}, \overline{\chi}). \ \ \ \ \ (2)


From these equations we see that the zeroes of the zeta function are symmetric across the real axis, and the zeroes of {L(\cdot,\chi)} are the reflection of the zeroes of {L(\cdot,\overline{\chi})} across this axis.

It is a remarkable fact that these functions obey an additional, and more non-trivial, functional equation, this time establishing a symmetry across the critical line {\{ s: \hbox{Re}(s) = \frac{1}{2} \}} rather than the real axis. One consequence of this symmetry is that the zeta function and {L}-functions may be extended meromorphically to the entire complex plane. For the zeta function, the functional equation was discovered by Riemann, and reads as follows:

Theorem 1 (Functional equation for the Riemann zeta function) The Riemann zeta function {\zeta} extends meromorphically to the entire complex plane, with a simple pole at {s=1} and no other poles. Furthermore, one has the functional equation

\displaystyle \zeta(s) = \alpha(s) \zeta(1-s) \ \ \ \ \ (3)


or equivalently

\displaystyle \zeta(1-s) = \alpha(1-s) \zeta(s) \ \ \ \ \ (4)


for all complex {s} other than {s=0,1}, where {\alpha} is the function

\displaystyle \alpha(s) := 2^s \pi^{s-1} \sin( \frac{\pi s}{2}) \Gamma(1-s). \ \ \ \ \ (5)


Here {\cos(z) := \frac{e^{iz} + e^{-iz}}{2}}, {\sin(z) := \frac{e^{iz}-e^{-iz}}{2i}} are the complex-analytic extensions of the classical trigonometric functions {\cos(x), \sin(x)}, and {\Gamma} is the Gamma function, whose definition and properties we review below the fold.
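As a quick numerical sanity check of (3) and (5), one can feed the classical values {\zeta(2) = \pi^2/6} and {\zeta(4) = \pi^4/90} into the functional equation and recover {\zeta(-1) = -1/12} and {\zeta(-3) = 1/120}. The Python sketch below does this for real {s} only, using the standard library; the function name `alpha` simply mirrors the notation above.

```python
import math

def alpha(s):
    """alpha(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s), for real s
    such that 1 - s is not a non-positive integer."""
    return 2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2) * math.gamma(1 - s)

# zeta(s) = alpha(s) zeta(1 - s) at s = -1 and s = -3.
zeta_minus_1 = alpha(-1) * math.pi ** 2 / 6    # uses zeta(2)
zeta_minus_3 = alpha(-3) * math.pi ** 4 / 90   # uses zeta(4)
print(zeta_minus_1, zeta_minus_3)

# At s = -2 the sine factor vanishes (up to rounding), giving a trivial zero.
print(alpha(-2))
```

The sine factor in {\alpha} vanishes at the negative even integers while {\zeta(1-s)} stays finite there, which is one way to see where the trivial zeroes come from.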

The functional equation can be placed in a more symmetric form as follows:

Corollary 2 (Functional equation for the Riemann xi function) The Riemann xi function

\displaystyle \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s) \ \ \ \ \ (6)


is analytic on the entire complex plane {{\bf C}} (after removing all removable singularities), and obeys the functional equations

\displaystyle \xi(\overline{s}) = \overline{\xi(s)}


\displaystyle \xi(s) = \xi(1-s). \ \ \ \ \ (7)


In particular, the zeroes of {\xi} consist precisely of the non-trivial zeroes of {\zeta}, and are symmetric about both the real axis and the critical line. Also, {\xi} is real-valued on the critical line and on the real axis.
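The symmetry (7) can likewise be spot-checked at a few real points, using the known values {\zeta(2) = \pi^2/6}, {\zeta(4) = \pi^4/90}, {\zeta(-1) = -1/12} and {\zeta(-3) = 1/120}. This is only a sketch: the helper `xi_from_zeta` takes the value of {\zeta(s)} as an input rather than computing it.

```python
import math

def xi_from_zeta(s, zeta_s):
    """xi(s) = (1/2) s (s-1) pi^(-s/2) Gamma(s/2) zeta(s), for real s at
    which Gamma(s/2) is finite, given the value zeta_s = zeta(s)."""
    return 0.5 * s * (s - 1) * math.pi ** (-s / 2) * math.gamma(s / 2) * zeta_s

pairs = [
    # xi(2) versus xi(1 - 2) = xi(-1)
    (xi_from_zeta(2, math.pi ** 2 / 6), xi_from_zeta(-1, -1 / 12)),
    # xi(4) versus xi(1 - 4) = xi(-3)
    (xi_from_zeta(4, math.pi ** 4 / 90), xi_from_zeta(-3, 1 / 120)),
]
for left, right in pairs:
    print(left, right)
```

Both pairs agree to rounding error, with common values {\pi/6} and {\pi^2/15} respectively.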

Corollary 2 is an easy consequence of Theorem 1 together with the duplication theorem for the Gamma function, and the fact that {\zeta} has no zeroes to the right of the critical strip, and is left as an exercise to the reader (Exercise 19). The functional equation in Theorem 1 has many proofs, but most of them are related in one way or another to the Poisson summation formula

\displaystyle \sum_n f(n) = \sum_m \hat f(2\pi m) \ \ \ \ \ (8)


(Theorem 34 from Supplement 2, at least in the case when {f} is twice continuously differentiable and compactly supported), which can be viewed as a Fourier-analytic link between the coarse-scale distribution of the integers and the fine-scale distribution of the integers. Indeed, there is a quick heuristic proof of the functional equation that comes from formally applying the Poisson summation formula to the function {1_{x>0} \frac{1}{x^s}}, and noting that the functions {x \mapsto \frac{1}{x^s}} and {\xi \mapsto \frac{1}{\xi^{1-s}}} are formally Fourier transforms of each other, up to some Gamma function factors, as well as some trigonometric factors arising from the distinction between the real line and the half-line. Such a heuristic proof can indeed be made rigorous, and we do so below the fold, while also providing Riemann’s two classical proofs of the functional equation.
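The Poisson summation formula (8) is easy to test numerically on a Gaussian. The sketch below assumes the Fourier transform convention {\hat f(\xi) = \int_{\bf R} f(x) e^{-ix\xi}\ dx}, which is the one consistent with the factor {2\pi} in (8); under that convention {f(x) = e^{-ax^2}} has {\hat f(\xi) = \sqrt{\pi/a}\, e^{-\xi^2/4a}}. A Gaussian is not compactly supported, but it decays fast enough for the formula to hold, and the function name `poisson_sides` is illustrative.

```python
import math

def poisson_sides(a, terms=50):
    """Both sides of sum_n f(n) = sum_m hat f(2 pi m) for f(x) = exp(-a x^2),
    with hat f(xi) = sqrt(pi/a) exp(-xi^2 / (4a)). The sums are truncated
    to |n|, |m| <= terms; the discarded tails are negligibly small."""
    lhs = sum(math.exp(-a * n * n) for n in range(-terms, terms + 1))
    rhs = sum(math.sqrt(math.pi / a) * math.exp(-(2 * math.pi * m) ** 2 / (4 * a))
              for m in range(-terms, terms + 1))
    return lhs, rhs

for a in (0.5, 1.0, 3.0):
    print(a, poisson_sides(a))
```

For small {a} the left-hand side has many significant terms while the right-hand side is dominated by {m=0}, and for large {a} the roles are reversed, which is the coarse-scale versus fine-scale duality mentioned above.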

From the functional equation (and the poles of the Gamma function), one can see that {\zeta} has trivial zeroes at the negative even integers {-2,-4,-6,\dots}, in addition to the non-trivial zeroes in the critical strip. More generally, the following table summarises the zeroes and poles of the various special functions appearing in the functional equation, after they have been meromorphically extended to the entire complex plane, and with zeroes classified as “non-trivial” or “trivial” depending on whether they lie in the critical strip or not. (Exponential functions such as {2^{s-1}} or {\pi^{-s}} have no zeroes or poles, and will be ignored in this table; the zeroes and poles of rational functions such as {s(s-1)} are self-evident and will also not be displayed here.)

| Function | Non-trivial zeroes | Trivial zeroes | Poles |
| --- | --- | --- | --- |
| {\zeta(s)} | Yes | {-2,-4,-6,\dots} | {1} |
| {\zeta(1-s)} | Yes | {3,5,\dots} | {0} |
| {\sin(\pi s/2)} | No | Even integers | No |
| {\cos(\pi s/2)} | No | Odd integers | No |
| {\sin(\pi s)} | No | Integers | No |
| {\Gamma(s)} | No | No | {0,-1,-2,\dots} |
| {\Gamma(s/2)} | No | No | {0,-2,-4,\dots} |
| {\Gamma(1-s)} | No | No | {1,2,3,\dots} |
| {\Gamma((1-s)/2)} | No | No | {1,3,5,\dots} |
| {\xi(s)} | Yes | No | No |

Among other things, this table indicates that the Gamma and trigonometric factors in the functional equation are tied to the trivial zeroes and poles of zeta, but have no direct bearing on the distribution of the non-trivial zeroes, which is the most important feature of the zeta function for the purposes of analytic number theory, beyond the fact that they are symmetric about the real axis and critical line. In particular, the Riemann hypothesis is not going to be resolved just from further analysis of the Gamma function!

The zeta function computes the “global” sum {\sum_n \frac{1}{n^s}}, with {n} ranging all the way from {1} to infinity. However, by some Fourier-analytic (or complex-analytic) manipulation, it is possible to use the zeta function to also control more “localised” sums, such as {\sum_n \frac{1}{n^s} \psi(\log n - \log N)} for some {N \gg 1} and some smooth compactly supported function {\psi: {\bf R} \rightarrow {\bf C}}. It turns out that the functional equation (3) for the zeta function localises to this context, giving an approximate functional equation which roughly speaking takes the form

\displaystyle \sum_n \frac{1}{n^s} \psi( \log n - \log N ) \approx \alpha(s) \sum_m \frac{1}{m^{1-s}} \psi( \log M - \log m )

whenever {s=\sigma+it} and {NM = \frac{|t|}{2\pi}}; see Theorem 38 below for a precise formulation of this equation. Unsurprisingly, this form of the functional equation is also very closely related to the Poisson summation formula (8), indeed it is essentially a special case of that formula (or more precisely, of the van der Corput {B}-process). This useful identity relates long smoothed sums of {\frac{1}{n^s}} to short smoothed sums of {\frac{1}{m^{1-s}}} (or vice versa), and can thus be used to shorten exponential sums involving terms such as {\frac{1}{n^s}}, which is useful when obtaining some of the more advanced estimates on the Riemann zeta function.

We will give two other basic uses of the functional equation. The first is to get a good count (as opposed to merely an upper bound) on the density of zeroes in the critical strip, establishing the Riemann-von Mangoldt formula that the number {N(T)} of zeroes of imaginary part between {0} and {T} is {\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)} for large {T}. The other is to obtain untruncated versions of the explicit formula from Notes 2, giving a remarkable exact formula for sums involving the von Mangoldt function in terms of zeroes of the Riemann zeta function. These results are not strictly necessary for most of the material in the rest of the course, but certainly help to clarify the nature of the Riemann zeta function and its relation to the primes.
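To get a feel for the size of the main term in the Riemann-von Mangoldt formula, one can simply evaluate it. The following Python sketch does this; the function name is purely illustrative and not taken from these notes. At {T=100} it returns roughly {28.1}, while standard tables list {29} zeroes up to that height, which is consistent with an {O(\log T)} error.

```python
import math

def riemann_von_mangoldt_main_term(T):
    """Main term (T/2pi) log(T/2pi) - T/2pi of the zero-counting formula."""
    u = T / (2 * math.pi)
    return u * math.log(u) - u

# Print the main term at a few heights T.
for T in (100.0, 1000.0, 10000.0):
    print(T, riemann_von_mangoldt_main_term(T))
```

The sketch only evaluates the main term; it makes no attempt to locate zeroes or to estimate the error term.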

In view of the material in previous notes, it should not be surprising that there are analogues of all of the above theory for Dirichlet {L}-functions {L(\cdot,\chi)}. We will restrict attention to primitive characters {\chi}, since the {L}-function for imprimitive characters merely differs from the {L}-function of the associated primitive factor by a finite Euler product; indeed, if {\chi = \chi' \chi_0} for some principal {\chi_0} whose modulus {q_0} is coprime to that of {\chi'}, then

\displaystyle L(s,\chi) = L(s,\chi') \prod_{p|q_0} (1 - \frac{\chi'(p)}{p^s}) \ \ \ \ \ (9)


(cf. equation (45) of Notes 2).

The main new feature is that the Poisson summation formula needs to be “twisted” by a Dirichlet character {\chi}, and this boils down to the problem of understanding the finite (additive) Fourier transform of a Dirichlet character. This is achieved by the classical theory of Gauss sums, which we review below the fold. There is one new wrinkle; the value of {\chi(-1) \in \{-1,+1\}} plays a role in the functional equation. More precisely, we have

Theorem 3 (Functional equation for {L}-functions) Let {\chi} be a primitive character of modulus {q} with {q>1}. Then {L(s,\chi)} extends to an entire function on the complex plane, with

\displaystyle L(s,\chi) = \varepsilon(\chi) 2^s \pi^{s-1} q^{1/2-s} \sin(\frac{\pi}{2}(s+\kappa)) \Gamma(1-s) L(1-s,\overline{\chi})

or equivalently

\displaystyle L(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) 2^{1-s} \pi^{-s} q^{s-1/2} \sin(\frac{\pi}{2}(1-s+\kappa)) \Gamma(s) L(s,\chi)

for all {s}, where {\kappa} is equal to {0} in the even case {\chi(-1)=+1} and {1} in the odd case {\chi(-1)=-1}, and

\displaystyle \varepsilon(\chi) := \frac{\tau(\chi)}{i^\kappa \sqrt{q}} \ \ \ \ \ (10)


where {\tau(\chi)} is the Gauss sum

\displaystyle \tau(\chi) := \sum_{n \in {\bf Z}/q{\bf Z}} \chi(n) e(n/q). \ \ \ \ \ (11)


and {e(x) := e^{2\pi ix}}, with the convention that the {q}-periodic function {n \mapsto e(n/q)} is also (by abuse of notation) applied to {n} in the cyclic group {{\bf Z}/q{\bf Z}}.
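As a sanity check on the definitions (10) and (11), here is a minimal Python sketch that evaluates the Gauss sum and the quantity {\varepsilon(\chi)} for the two simplest kinds of primitive character of modulus {5}: the (even, real) Legendre symbol and an (odd, complex) quartic character. The representation of a character as a table of values, and the helper names, are assumptions made purely for illustration. One expects {|\tau(\chi)| = \sqrt{q}} and hence {|\varepsilon(\chi)|=1} for primitive {\chi}, which is a standard fact from the theory of Gauss sums.

```python
import cmath
import math

def e(x):
    """The additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)

def gauss_sum(chi, q):
    """tau(chi) = sum over n mod q of chi(n) e(n/q); chi is a table of values."""
    return sum(chi[n % q] * e(n / q) for n in range(q))

def root_number(chi, q):
    """epsilon(chi) = tau(chi) / (i^kappa sqrt(q)), kappa = 0 (even) or 1 (odd)."""
    kappa = 0 if chi[q - 1] == 1 else 1   # chi(-1) is stored at index q - 1
    return gauss_sum(chi, q) / (1j ** kappa * math.sqrt(q))

# Two primitive characters of modulus 5, listed as chi(0), ..., chi(4).
legendre_5 = [0, 1, -1, -1, 1]     # real and even: chi(-1) = +1
quartic_5 = [0, 1, 1j, -1j, -1]    # chi(2) = i, odd: chi(-1) = -1

for name, chi in (("legendre", legendre_5), ("quartic", quartic_5)):
    print(name, abs(gauss_sum(chi, 5)), root_number(chi, 5))
```

For the Legendre symbol the Gauss sum comes out as {\sqrt{5}} itself, so that {\varepsilon(\chi)=1}; for the quartic character one only gets a complex number of unit modulus.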

From this functional equation and (2) we see that, as with the Riemann zeta function, the non-trivial zeroes of {L(s,\chi)} (defined as the zeroes within the critical strip {\{ s: 0 < \hbox{Re}(s) < 1 \}}) are symmetric around the critical line (and, if {\chi} is real, are also symmetric around the real axis). In addition, {L(s,\chi)} acquires trivial zeroes at the negative even integers and at zero if {\chi(-1)=1}, and at the negative odd integers if {\chi(-1)=-1}. For imprimitive {\chi}, we see from (9) that {L(s,\chi)} also acquires some additional trivial zeroes on the left edge of the critical strip.

There is also a symmetric version of this equation, analogous to Corollary 2:

Corollary 4 Let {\chi,q,\varepsilon(\chi)} be as above, and set

\displaystyle \xi(s,\chi) := (q/\pi)^{(s+\kappa)/2} \Gamma((s+\kappa)/2) L(s,\chi),

then {\xi(\cdot,\chi)} is entire with {\xi(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) \xi(s,\chi)}.

For further detail on the functional equation and its implications, I recommend the classic text of Titchmarsh or the text of Davenport.


In Notes 1, we approached multiplicative number theory (the study of multiplicative functions {f: {\bf N} \rightarrow {\bf C}} and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions {\sum_{n \leq x} f(n)} and logarithmic sums {\sum_{n \leq x} \frac{f(n)}{n}}. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series {{\mathcal D} f}, defined (at least for {s} of sufficiently large real part) by the formula

\displaystyle {\mathcal D} f(s) := \sum_n \frac{f(n)}{n^s}.

These series also made an appearance in the elementary approach to the subject, but only for real {s} that were larger than {1}. But now we will exploit the freedom to extend the variable {s} to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as {\sum_{n\leq x} f(n)} or {\sum_{n\leq x} \frac{f(n)}{n}} from control on the Dirichlet series. Crucially, for many key functions {f} of number-theoretic interest, the Dirichlet series {{\mathcal D} f} can be analytically (or at least meromorphically) continued to the left of the line {\{ s: \hbox{Re}(s) = 1 \}}. The zeroes and poles of the resulting meromorphic continuations of {{\mathcal D} f} (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of {f}; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function {\zeta} are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function {\Lambda} (and hence to the primes) of the form

\displaystyle \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (1)


where the sum is over zeroes {\rho} (counting multiplicity) of the Riemann zeta function {\zeta = {\mathcal D} 1} (with the sum often restricted so that {\rho} has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that

\displaystyle \sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\ dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\ dy \ \ \ \ \ (2)


for suitable “test functions” {g} (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem

\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (3)


as {x \rightarrow \infty}, with the size of the error term {o(x)} closely tied to the location of the zeroes {\rho} of the Riemann zeta function.
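The prime number theorem (3) is easy to test numerically for moderate {x}. The following Python sketch sums the von Mangoldt function directly, using a sieve of Eratosthenes; the function names are illustrative assumptions and the code is not optimised in any way.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    if n < 2:
        return []
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out the multiples p*p, p*p + p, ... of p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if is_prime[i]]

def chebyshev_psi(x):
    """Sum of Lambda(n) over n <= x: one log p for each prime power p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        pk = p
        while pk <= x:
            total += math.log(p)
            pk *= p
    return total

for x in (10**3, 10**4, 10**5):
    print(x, chebyshev_psi(x), chebyshev_psi(x) / x)
```

The printed ratio should drift towards {1} as {x} grows, although such experiments of course say nothing rigorous about the size of the {o(x)} error.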

The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation

\displaystyle -\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \ \ \ \ \ (4)


for the Dirichlet series {{\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}} of the von Mangoldt function; note that (4) is formally the special case of (2) when {g(n) = n^{-s}}. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.

More generally, one has an explicit formula

\displaystyle \Lambda(n) \chi(n) \approx - \sum_\rho n^{\rho-1} \ \ \ \ \ (5)


for any (non-principal) Dirichlet character {\chi}, where {\rho} now ranges over the zeroes of the associated Dirichlet {L}-function {L(s,\chi) := {\mathcal D} \chi(s)}; we view this formula as a “twist” of (1) by the Dirichlet character {\chi}. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that

\displaystyle \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \ \ \ \ \ (6)


as {x \rightarrow \infty}, whenever {a\ (q)} is a fixed primitive residue class. Again, the size of the error term {o(x)} here is closely tied to the location of the zeroes of the Dirichlet {L}-function, with particular importance given to whether there is a zero very close to {s=1} (such a zero is known as an exceptional zero or Siegel zero).
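One can similarly test (6) numerically. The sketch below (again in Python, with illustrative function names that are not part of these notes) tabulates the von Mangoldt function up to {x} and sums it over a residue class, so that the result can be compared with {x/\phi(q)}.

```python
import math

def von_mangoldt_table(x):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= x."""
    lam = [0.0] * (x + 1)
    is_composite = bytearray(x + 1)
    for p in range(2, x + 1):
        if is_composite[p]:
            continue
        for m in range(p * p, x + 1, p):
            is_composite[m] = 1
        pk = p
        while pk <= x:          # Lambda(p^k) = log p for every power of p
            lam[pk] = math.log(p)
            pk *= p
    return lam

def psi_in_progression(x, q, a):
    """Sum of Lambda(n) over n <= x with n = a (q)."""
    lam = von_mangoldt_table(x)
    return sum(lam[n] for n in range(1, x + 1) if n % q == a % q)

def euler_phi(q):
    """Euler totient function, by direct count."""
    return sum(1 for r in range(1, q + 1) if math.gcd(r, q) == 1)

x, q = 10**5, 4
for a in (1, 3):
    print(a, psi_in_progression(x, q, a), x / euler_phi(q))
```

For {q=4} both primitive residue classes should come out close to {x/2}, in line with (6).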

While any information on the behaviour of zeta functions or {L}-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are

  1. The region on or near the point {s=1}.
  2. The region on or near the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip {\{ s: 0 \leq \hbox{Re}(s) \leq 1 \}}.
  3. The right half {\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}} of the critical strip.
  4. The region on or near the critical line {\{ \frac{1}{2} + it: t \in {\bf R} \}} that bisects the critical strip.
  5. Everywhere else.

For instance:

  1. We will shortly show that the Riemann zeta function {\zeta} has a simple pole at {s=1} with residue {1}, which is already sufficient to recover many of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function {\tau}. For Dirichlet {L}-functions, the behaviour is instead controlled by the quantity {L(1,\chi)} discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
  2. The zeta function is also known to have no zeroes on the right edge {\{1+it: t \in {\bf R}\}} of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for {\zeta} into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for {L}-functions and the prime number theorem in arithmetic progressions.
  3. The (as yet unproven) Riemann hypothesis prohibits {\zeta} from having any zeroes within the right half {\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}} of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for {L}-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
  4. Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for {L}-functions and primes in short arithmetic progressions.
  5. The functional equation of the zeta function describes the behaviour of {\zeta} to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for {L}-functions.

Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function {\zeta(\sigma+it) = \sum_n n^{-\sigma-it}} and all of the {L}-functions {L(\sigma+it,\chi) = \sum_n \chi(n) n^{-\sigma-it}} for various Dirichlet characters {\chi} into a single object, viewing {n \mapsto \chi(n) n^{-it}} as a general multiplicative character on the adeles; thus the imaginary coordinate {t} and the Dirichlet character {\chi} are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet {L}-functions. (The non-Archimedean character {\chi(n)} and the Archimedean character {n^{it}} behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)

Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.

As a consequence of this hierarchy of importance, information about the {\zeta} function away from the critical strip, such as Euler’s identity

\displaystyle \zeta(2) = \frac{\pi^2}{6}

or equivalently

\displaystyle 1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}

or the infamous identity

\displaystyle \zeta(-1) = -\frac{1}{12},

which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as

\displaystyle 1 + 2 + 3 + \dots = -\frac{1}{12},

are of relatively little direct importance in analytic prime number theory, although they are still of interest for some other, non-number-theoretic, applications. (The quantity {\zeta(2)} does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value {L(1,\chi)} of an {L}-function at {s=1} turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower-bound on this quantity coming from Siegel’s theorem, discussed below the fold.
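Euler’s identity for {\zeta(2)} is at least easy to check numerically, since the Dirichlet series converges absolutely there. Here is a small Python sketch with an illustrative function name; note that no such naive check is possible for {\zeta(-1)}, where the series diverges and the value only makes sense via analytic continuation.

```python
import math

def zeta_partial_sum(s, N):
    """Partial sum of the Dirichlet series for zeta(s), for real s > 1."""
    return sum(1.0 / n ** s for n in range(1, N + 1))

target = math.pi ** 2 / 6
for N in (10, 1000, 100000):
    print(N, zeta_partial_sum(2, N), target - zeta_partial_sum(2, N))
```

The gap between the partial sum and {\pi^2/6} is the tail {\sum_{n>N} \frac{1}{n^2}}, which lies between {\frac{1}{N+1}} and {\frac{1}{N}}, so the convergence is rather slow.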

For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory”.


We will shortly turn to the complex-analytic approach to multiplicative number theory, which relies on the basic properties of complex analytic functions. In this supplement to the main notes, we quickly review the portions of complex analysis that we will be using in this course. We will not attempt a comprehensive review of this subject; for instance, we will completely neglect the conformal geometry or Riemann surface aspect of complex analysis, and we will also avoid using the various boundary convergence theorems for Taylor series or Dirichlet series (the latter type of result is traditionally utilised in multiplicative number theory, but I personally find them a little unintuitive to use, and will instead rely on a slightly different set of complex-analytic tools). We will also focus on the “local” structure of complex analytic functions, in particular adopting the philosophy that such functions behave locally like complex polynomials; the classical “global” theory of entire functions, while traditionally used in the theory of the Riemann zeta function, will be downplayed in these notes. On the other hand, we will play up the relationship between complex analysis and Fourier analysis, as we will incline to using the latter tool over the former in some of the subsequent material. (In the traditional approach to the subject, the Mellin transform is used in place of the Fourier transform, but we will not emphasise the role of the Mellin transform here.)

We begin by recalling the notion of a holomorphic function, which will later be shown to be essentially synonymous with that of a complex analytic function.

Definition 1 (Holomorphic function) Let {\Omega} be an open subset of {{\bf C}}, and let {f: \Omega \rightarrow {\bf C}} be a function. If {z \in \Omega}, we say that {f} is complex differentiable at {z} if the limit

\displaystyle  f'(z) := \lim_{h \rightarrow 0; h \in {\bf C} \backslash \{0\}} \frac{f(z+h)-f(z)}{h}

exists, in which case we refer to {f'(z)} as the (complex) derivative of {f} at {z}. If {f} is differentiable at every point {z} of {\Omega}, and the derivative {f': \Omega \rightarrow {\bf C}} is continuous, we say that {f} is holomorphic on {\Omega}.

Exercise 2 Show that a function {f: \Omega \rightarrow {\bf C}} is holomorphic if and only if the two-variable function {(x,y) \mapsto f(x+iy)} is continuously differentiable on {\{ (x,y) \in {\bf R}^2: x+iy \in \Omega\}} and obeys the Cauchy-Riemann equation

\displaystyle  \frac{\partial}{\partial x} f(x+iy) = \frac{1}{i} \frac{\partial}{\partial y} f(x+iy). \ \ \ \ \ (1)
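The Cauchy-Riemann equation (1) can be tested numerically by approximating the two partial derivatives with central differences. The following Python sketch does this; the function name and the step size {h} are illustrative choices. It is only a plausibility check, not a substitute for the exercise.

```python
import cmath

def cauchy_riemann_defect(f, z, h=1e-6):
    """|f_x - (1/i) f_y| at z, with both partials from central differences."""
    fx = (f(z + h) - f(z - h)) / (2 * h)
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return abs(fx - fy / 1j)

holomorphic = lambda z: z ** 3 + cmath.exp(z)
conjugation = lambda z: z.conjugate()      # not holomorphic anywhere

z = 0.3 + 0.7j
print(cauchy_riemann_defect(holomorphic, z))   # tiny, up to discretisation error
print(cauchy_riemann_defect(conjugation, z))   # equals 2 up to rounding
```

For complex conjugation one has {\frac{\partial}{\partial x} f = 1} and {\frac{1}{i} \frac{\partial}{\partial y} f = -1}, so the defect is {2} at every point.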

Basic examples of holomorphic functions include complex polynomials

\displaystyle  P(z) = a_n z^n + \dots + a_1 z + a_0

as well as the complex exponential function

\displaystyle  \exp(z) := \sum_{n=0}^\infty \frac{z^n}{n!}

which are holomorphic on the entire complex plane {{\bf C}} (i.e., they are entire functions). The sum or product of two holomorphic functions is again holomorphic; the quotient of two holomorphic functions is holomorphic so long as the denominator is non-zero. Finally, the composition of two holomorphic functions is holomorphic wherever the composition is defined.

Exercise 3

  • (i) Establish Euler’s formula

    \displaystyle  \exp(x+iy) = e^x (\cos y + i \sin y)

    for all {x,y \in {\bf R}}. (Hint: it is a bit tricky to do this starting from the trigonometric definitions of sine and cosine; I recommend either using the Taylor series formulations of these functions instead, or alternatively relying on the ordinary differential equations obeyed by sine and cosine.)

  • (ii) Show that every non-zero complex number {z} has a complex logarithm {\log(z)} such that {\exp(\log(z))=z}, and that this logarithm is unique up to integer multiples of {2\pi i}.
  • (iii) Show that there exists a unique principal branch {\hbox{Log}(z)} of the complex logarithm in the region {{\bf C} \backslash (-\infty,0]}, defined by requiring {\hbox{Log}(z)} to be a logarithm of {z} with imaginary part between {-\pi} and {\pi}. Show that this principal branch is holomorphic with derivative {1/z}.
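The statements in this exercise are easy to explore with Python’s standard `cmath` module, whose `log` function returns the principal branch (with the branch cut along the negative real axis). The sketch below checks Euler’s formula at one point, and illustrates that adding integer multiples of {2\pi i} to a logarithm gives another logarithm; the variable names are illustrative.

```python
import cmath
import math

def euler_rhs(x, y):
    """Right-hand side e^x (cos y + i sin y) of Euler's formula."""
    return math.exp(x) * complex(math.cos(y), math.sin(y))

# (i) Euler's formula at a sample point.
print(abs(cmath.exp(complex(0.5, 2.0)) - euler_rhs(0.5, 2.0)))

# (ii), (iii) The principal logarithm and another branch of the logarithm.
w = complex(-3.0, 4.0)
log_w = cmath.log(w)                      # imaginary part lies in (-pi, pi]
other_log = log_w + 2j * math.pi * 3      # shift by an integer multiple of 2 pi i
print(log_w, abs(cmath.exp(log_w) - w), abs(cmath.exp(other_log) - w))
```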

In real analysis, we have the fundamental theorem of calculus, which asserts that

\displaystyle  \int_a^b F'(t)\ dt = F(b) - F(a)

whenever {[a,b]} is a real interval and {F: [a,b] \rightarrow {\bf R}} is a continuously differentiable function. The complex analogue of this fact is that

\displaystyle  \int_\gamma F'(z)\ dz = F(\gamma(1)) - F(\gamma(0)) \ \ \ \ \ (2)

whenever {F: \Omega \rightarrow {\bf C}} is a holomorphic function, and {\gamma: [0,1] \rightarrow \Omega} is a contour in {\Omega}, by which we mean a piecewise continuously differentiable function, and the contour integral {\int_\gamma f(z)\ dz} for a continuous function {f} is defined via change of variables as

\displaystyle  \int_\gamma f(z)\ dz := \int_0^1 f(\gamma(t)) \gamma'(t)\ dt.

The complex fundamental theorem of calculus (2) follows easily from the real fundamental theorem and the chain rule.
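The definition of the contour integral translates directly into a numerical scheme: one discretises the integral over {[0,1]}. The Python sketch below uses the midpoint rule (an arbitrary choice made for simplicity) and checks (2) for {F(z) = z^3} along a parabolic arc from {0} to {1+i}; all names are illustrative.

```python
def contour_integral(f, gamma, dgamma, n=2000):
    """Midpoint-rule approximation of int_0^1 f(gamma(t)) gamma'(t) dt."""
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        total += f(gamma(t)) * dgamma(t)
    return total / n

F = lambda z: z ** 3
dF = lambda z: 3 * z ** 2
gamma = lambda t: complex(t, t ** 2)        # parabolic arc from 0 to 1 + i
dgamma = lambda t: complex(1.0, 2.0 * t)    # its derivative

lhs = contour_integral(dF, gamma, dgamma)
rhs = F(gamma(1.0)) - F(gamma(0.0))
print(lhs, rhs)
```

Replacing `gamma` by any other contour with the same endpoints should give the same answer up to discretisation error, since the right-hand side of (2) only depends on the endpoints.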

In real analysis, we have the rather trivial fact that the integral of a continuous function on a closed contour is always zero:

\displaystyle  \int_a^b f(t)\ dt + \int_b^a f(t)\ dt = 0.

In complex analysis, the analogous fact is significantly more powerful, and is known as Cauchy’s theorem:

Theorem 4 (Cauchy’s theorem) Let {f: \Omega \rightarrow {\bf C}} be a holomorphic function in a simply connected open set {\Omega}, and let {\gamma: [0,1] \rightarrow \Omega} be a closed contour in {\Omega} (thus {\gamma(1)=\gamma(0)}). Then {\int_\gamma f(z)\ dz = 0}.

Exercise 5 Use Stokes’ theorem to give a proof of Cauchy’s theorem.

A useful reformulation of Cauchy’s theorem is that of contour shifting: if {f: \Omega \rightarrow {\bf C}} is a holomorphic function on an open set {\Omega}, and {\gamma, \tilde \gamma} are two contours in {\Omega} with {\gamma(0)=\tilde \gamma(0)} and {\gamma(1) = \tilde \gamma(1)}, such that {\gamma} can be continuously deformed into {\tilde \gamma}, then {\int_\gamma f(z)\ dz = \int_{\tilde \gamma} f(z)\ dz}. A basic application of contour shifting is the Cauchy integral formula:

Theorem 6 (Cauchy integral formula) Let {f: \Omega \rightarrow {\bf C}} be a holomorphic function in a simply connected open set {\Omega}, and let {\gamma: [0,1] \rightarrow \Omega} be a closed contour which is simple (thus {\gamma} does not traverse any point more than once, with the exception of the endpoint {\gamma(0)=\gamma(1)} that is traversed twice), and which encloses a bounded region {U} in the anticlockwise direction. Then for any {z_0 \in U}, one has

\displaystyle  \int_\gamma \frac{f(z)}{z-z_0}\ dz= 2\pi i f(z_0).

Proof: Let {\varepsilon > 0} be a sufficiently small quantity. By contour shifting, one can replace the contour {\gamma} by the sum (concatenation) of three contours: a contour {\rho} from {\gamma(0)} to {z_0+\varepsilon}, a contour {C_\varepsilon} traversing the circle {\{z: |z-z_0|=\varepsilon\}} once anticlockwise, and the reversal {-\rho} of the contour {\rho} that goes from {z_0+\varepsilon} to {\gamma(0)}. The contributions of the contours {\rho, -\rho} cancel each other, thus

\displaystyle \int_\gamma \frac{f(z)}{z-z_0}\ dz = \int_{C_\varepsilon} \frac{f(z)}{z-z_0}\ dz.

By a change of variables, the right-hand side can be expanded as

\displaystyle  2\pi i \int_0^1 f(z_0 + \varepsilon e^{2\pi i t})\ dt.

Sending {\varepsilon \rightarrow 0}, we obtain the claim. \Box
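The Cauchy integral formula is pleasant to verify numerically when {\gamma} is a circle, because the equally spaced Riemann sum converges extremely quickly for such periodic integrands. Here is a minimal Python sketch for {f = \exp} on the unit circle; the helper name and the number of sample points are illustrative assumptions.

```python
import cmath
import math

def circle_integral(g, center, radius, n=400):
    """Approximate the anticlockwise integral of g over |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)     # point on the unit circle
        z = center + radius * w
        dz = 2j * math.pi * radius * w / n      # gamma'(t) dt for gamma(t) = center + radius e(t)
        total += g(z) * dz
    return total

z_inside = 0.3 + 0.2j
z_outside = 2.0 + 0.0j
inside = circle_integral(lambda z: cmath.exp(z) / (z - z_inside), 0j, 1.0)
outside = circle_integral(lambda z: cmath.exp(z) / (z - z_outside), 0j, 1.0)
print(inside, 2j * math.pi * cmath.exp(z_inside))
print(outside)
```

When the point lies outside the circle, the integrand is holomorphic on a neighbourhood of the closed disk, and the integral vanishes by Cauchy’s theorem instead.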

The Cauchy integral formula has many consequences. Specialising to the case when {\gamma} traverses a circle {\{ z: |z-z_0|=r\}} around {z_0}, we conclude the mean value property

\displaystyle  f(z_0) = \int_0^1 f(z_0 + re^{2\pi i t})\ dt \ \ \ \ \ (3)

whenever {f} is holomorphic in a neighbourhood of the disk {\{ z: |z-z_0| \leq r \}}. In a similar spirit, we have the maximum principle for holomorphic functions:

Lemma 7 (Maximum principle) Let {\Omega} be a simply connected open set, and let {\gamma} be a simple closed contour in {\Omega} enclosing a bounded region {U} anti-clockwise. Let {f: \Omega \rightarrow {\bf C}} be a holomorphic function. If we have the bound {|f(z)| \leq M} for all {z} on the contour {\gamma}, then we also have the bound {|f(z_0)| \leq M} for all {z_0 \in U}.

Proof: We use an argument of Landau. Fix {z_0 \in U}. From the Cauchy integral formula and the triangle inequality we have the bound

\displaystyle  |f(z_0)| \leq C_{z_0,\gamma} M

for some constant {C_{z_0,\gamma} > 0} depending on {z_0} and {\gamma}. This ostensibly looks like a weaker bound than what we want, but we can miraculously make the constant {C_{z_0,\gamma}} disappear by the “tensor power trick”. Namely, observe that if {f} is a holomorphic function bounded in magnitude by {M} on {\gamma}, and {n} is a natural number, then {f^n} is a holomorphic function bounded in magnitude by {M^n} on {\gamma}. Applying the preceding argument with {f, M} replaced by {f^n, M^n} we conclude that

\displaystyle  |f(z_0)|^n \leq C_{z_0,\gamma} M^n

and hence

\displaystyle  |f(z_0)| \leq C_{z_0,\gamma}^{1/n} M.

Sending {n \rightarrow \infty}, we obtain the claim. \Box

Another basic application of the integral formula is

Corollary 8 Every holomorphic function {f: \Omega \rightarrow {\bf C}} is complex analytic, thus it has a convergent Taylor series around every point {z_0} in the domain. In particular, holomorphic functions are smooth, and the derivative of a holomorphic function is again holomorphic.

Conversely, it is easy to see that complex analytic functions are holomorphic. Thus, the terms “complex analytic” and “holomorphic” are synonymous, at least when working on open domains. (On a non-open set {\Omega}, saying that {f} is analytic on {\Omega} is equivalent to asserting that {f} extends to a holomorphic function on an open neighbourhood of {\Omega}.) This is in marked contrast to real analysis, in which a function can be continuously differentiable, or even smooth, without being real analytic.

Proof: By translation, we may suppose that {z_0=0}. Let {C_r} be a contour traversing the circle {\{ z: |z|=r\}} that is contained in the domain {\Omega}; then by the Cauchy integral formula one has

\displaystyle  f(z) = \frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w-z}\ dw

for all {z} in the disk {\{ z: |z| < r \}}. As {f} is continuously differentiable (and hence continuous) on {C_r}, it is bounded. From the geometric series formula

\displaystyle  \frac{1}{w-z} = \frac{1}{w} + \frac{1}{w^2} z + \frac{1}{w^3} z^2 + \dots

and dominated convergence, we conclude that

\displaystyle  f(z) = \sum_{n=0}^\infty (\frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w^{n+1}}\ dw) z^n

with the right-hand side an absolutely convergent series for {|z| < r}, and the claim follows. \Box
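The formula for the Taylor coefficients appearing in this proof can be tested numerically. The following Python sketch approximates {\frac{1}{2\pi i} \int_{C_r} \frac{f(w)}{w^{n+1}}\ dw} by an equally spaced sum on the circle and compares the result with the known coefficients {\frac{1}{n!}} of the exponential function; the function name and sample count are illustrative.

```python
import cmath
import math

def taylor_coefficient(f, n, radius=1.0, samples=64):
    """Approximate (1 / 2 pi i) * integral over |w| = radius of f(w) / w^(n+1) dw."""
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * math.pi * k / samples)
        # With dw = 2 pi i w dt, the integrand becomes f(w) / w^n dt.
        total += f(w) / w ** n
    return total / samples

for n in range(6):
    print(n, taylor_coefficient(cmath.exp, n), 1 / math.factorial(n))
```

After the change of variables the integral is just an average of {f(w)/w^n} over the circle, which is why no factor of {2\pi i} appears in the code.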

Exercise 9 Establish the generalised Cauchy integral formulae

\displaystyle  f^{(k)}(z_0) = \frac{k!}{2\pi i} \int_\gamma \frac{f(z)}{(z-z_0)^{k+1}}\ dz

for any non-negative integer {k}, where {f^{(k)}} is the {k}-fold complex derivative of {f}.

This in turn leads to a converse to Cauchy’s theorem, known as Morera’s theorem:

Corollary 10 (Morera’s theorem) Let {f: \Omega \rightarrow {\bf C}} be a continuous function on an open set {\Omega} with the property that {\int_\gamma f(z)\ dz = 0} for all closed contours {\gamma: [0,1] \rightarrow \Omega}. Then {f} is holomorphic.

Proof: We can of course assume {\Omega} to be non-empty and connected (hence path-connected). Fix a point {z_0 \in \Omega}, and define a “primitive” {F: \Omega \rightarrow {\bf C}} of {f} by defining {F(z_1) = \int_\gamma f(z)\ dz}, with {\gamma: [0,1] \rightarrow \Omega} being any contour from {z_0} to {z_1} (this is well defined by hypothesis). By mimicking the proof of the real fundamental theorem of calculus, we see that {F} is holomorphic with {F'=f}, and the claim now follows from Corollary 8. \Box

An important consequence of Morera’s theorem for us is

Corollary 11 (Locally uniform limit of holomorphic functions is holomorphic) Let {f_n: \Omega \rightarrow {\bf C}} be holomorphic functions on an open set {\Omega} which converge locally uniformly to a function {f: \Omega \rightarrow {\bf C}}. Then {f} is also holomorphic on {\Omega}.

Proof: By working locally we may assume that {\Omega} is a ball, and in particular simply connected. By Cauchy’s theorem, {\int_\gamma f_n(z)\ dz = 0} for all closed contours {\gamma} in {\Omega}. By local uniform convergence, this implies that {\int_\gamma f(z)\ dz = 0} for all such contours, and the claim then follows from Morera’s theorem. \Box

Now we study the zeroes of complex analytic functions. If a complex analytic function {f} vanishes at a point {z_0}, but is not identically zero in a neighbourhood of that point, then by Taylor expansion we see that {f} factors in a sufficiently small neighbourhood of {z_0} as

\displaystyle  f(z) = (z-z_0)^n g(z) \ \ \ \ \ (4)

for some natural number {n} (which we call the order or multiplicity of the zero of {f} at {z_0}) and some function {g} that is complex analytic and non-zero near {z_0}; this generalises the factor theorem for polynomials. In particular, the zero {z_0} is isolated if {f} does not vanish identically near {z_0}. We conclude that if {\Omega} is connected and {f} vanishes on a neighbourhood of some point {z_0} in {\Omega}, then it must vanish on all of {\Omega} (since the maximal connected neighbourhood of {z_0} in {\Omega} on which {f} vanishes cannot have any boundary point in {\Omega}). This implies unique continuation of analytic functions: if two complex analytic functions on {\Omega} agree on a non-empty open set, then they agree everywhere. In particular, if a complex analytic function does not vanish identically, then all of its zeroes are isolated, so in particular it has only finitely many zeroes on any given compact set.

Recall that a rational function is a function {f} which is a quotient {g/h} of two polynomials (at least outside of the set where {h} vanishes). Analogously, let us define a meromorphic function on an open set {\Omega} to be a function {f: \Omega \backslash S \rightarrow {\bf C}} defined outside of a discrete subset {S} of {\Omega} (the singularities of {f}), which is locally the quotient {g/h} of holomorphic functions, in the sense that for every {z_0 \in \Omega}, one has {f=g/h} in a neighbourhood of {z_0} excluding {S}, with {g, h} holomorphic near {z_0} and with {h} non-vanishing outside of {S}. If {z_0 \in S} and {g} has a zero of equal or higher order than {h} at {z_0}, then the singularity is removable and one can extend the meromorphic function holomorphically across {z_0} (by the holomorphic factor theorem (4)); otherwise, the singularity is non-removable and is known as a pole, whose order is equal to the difference between the order of {h} and the order of {g} at {z_0}. (If one wished, one could extend meromorphic functions to the poles by embedding {{\bf C}} in the Riemann sphere {{\bf C} \cup \{\infty\}} and mapping each pole to {\infty}, but we will not do so here. One could also consider non-meromorphic functions with essential singularities at various points, but we will have no need to analyse such singularities in this course.) If the order of a pole or zero is one, we say that it is simple; if it is two, we say it is double; and so forth.

Exercise 12 Show that the space of meromorphic functions on a non-empty open set {\Omega}, quotiented by almost everywhere equivalence, forms a field.

By quotienting two Taylor series, we see that if a meromorphic function {f} has a pole of order {n} at some point {z_0}, then it has a Laurent expansion

\displaystyle  f = \sum_{m=-n}^\infty a_m (z-z_0)^m,

absolutely convergent in a neighbourhood of {z_0} excluding {z_0} itself, and with {a_{-n}} non-zero. The Laurent coefficient {a_{-1}} has a special significance, and is called the residue of the meromorphic function {f} at {z_0}, which we will denote as {\hbox{Res}(f;z_0)}. The importance of this coefficient comes from the following significant generalisation of the Cauchy integral formula, known as the residue theorem:

Exercise 13 (Residue theorem) Let {f} be a meromorphic function on a simply connected domain {\Omega}, and let {\gamma} be a closed contour in {\Omega} enclosing a bounded region {U} anticlockwise, and avoiding all the singularities of {f}. Show that

\displaystyle  \int_\gamma f(z)\ dz = 2\pi i \sum_\rho \hbox{Res}(f;\rho)

where {\rho} is summed over all the poles of {f} that lie in {U}.
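As a simple illustration of the residue theorem, consider {f(z) = \frac{1}{z(z-2)} = \frac{1}{2} ( \frac{1}{z-2} - \frac{1}{z} )}, which has simple poles at {0} and {2} with residues {-\frac{1}{2}} and {\frac{1}{2}} respectively. The Python sketch below (with an illustrative helper for integrating over circles) integrates {f} over two circles centred at the origin: the unit circle, which encloses only the pole at {0}, and the circle of radius {3}, which encloses both poles.

```python
import cmath
import math

def circle_integral(g, center, radius, n=400):
    """Approximate the anticlockwise integral of g over |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        z = center + radius * w
        total += g(z) * (2j * math.pi * radius * w / n)   # g(z) dz
    return total

f = lambda z: 1 / (z * (z - 2))

small = circle_integral(f, 0j, 1.0)   # expect 2 pi i * (-1/2) = -pi i
large = circle_integral(f, 0j, 3.0)   # expect 2 pi i * (-1/2 + 1/2) = 0
print(small, large)
```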

The residue theorem is particularly useful when applied to logarithmic derivatives {f'/f} of meromorphic functions {f}, because the residue is of a specific form:

Exercise 14 Let {f} be a meromorphic function on an open set {\Omega} that does not vanish identically. Show that the only poles of {f'/f} are simple poles (poles of order {1}), occurring at the poles and zeroes of {f} (after all removable singularities have been removed). Furthermore, the residue of {f'/f} at a pole {z_0} is an integer, equal to the order of the zero of {f} if {f} has a zero at {z_0}, or equal to the negative of the order of the pole of {f} if {f} has a pole at {z_0}.
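
The integrality of these residues can be observed numerically. The following is a minimal sketch under stated assumptions: the rational function {f(z) = z^3 (z - \frac{1}{2}) / (z - \frac{1}{4})^2} and the helper names are hypothetical choices made for illustration. Inside the unit circle, {f} has zeroes of total order {3 + 1 = 4} and a pole of order {2}, so combining Exercise 14 with the residue theorem one expects {\frac{1}{2\pi i} \int_\gamma \frac{f'}{f}(z)\ dz = 4 - 2 = 2}.

```python
import cmath
import math


def polyval(coeffs, z):
    """Evaluate a polynomial (coefficients listed from highest degree) by Horner's rule."""
    acc = 0j
    for c in coeffs:
        acc = acc * z + c
    return acc


def polyder(coeffs):
    """Coefficients of the derivative polynomial (highest degree first)."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


# Hypothetical example: f = p/q with p = z^3 (z - 1/2) and q = (z - 1/4)^2.
p = [1, -0.5, 0, 0, 0]     # z^4 - z^3/2
q = [1, -0.5, 0.0625]      # z^2 - z/2 + 1/16


def log_derivative(z):
    # For f = p/q one has f'/f = p'/p - q'/q.
    return polyval(polyder(p), z) / polyval(p, z) - polyval(polyder(q), z) / polyval(q, z)


def residue_sum(g, steps=2000):
    """Approximate (1 / 2 pi i) times the integral of g over the unit circle."""
    total = 0j
    for k in range(steps):
        z = cmath.exp(2j * math.pi * k / steps)
        total += g(z) * 1j * z * (2 * math.pi / steps)
    return total / (2j * math.pi)


count = residue_sum(log_derivative)
print(count)  # numerically close to the integer (zeroes) - (poles) inside the circle
```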

Remark 15 The fact that residues of logarithmic derivatives of meromorphic functions are automatically integers is a remarkable feature of the complex analytic approach to multiplicative number theory, which is difficult (though not entirely impossible) to duplicate in other approaches to the subject. Here is a sample application of this integrality, which is challenging to reproduce by non-complex-analytic means: if {f} is meromorphic near {z_0}, and one has the bound {|\frac{f'}{f}(z_0+t)| \leq \frac{0.9}{t} + O(1)} as {t \rightarrow 0^+}, then {\frac{f'}{f}} must in fact stay bounded near {z_0}, because the only integer of magnitude less than {0.9} is zero.


Van Vu and I have just uploaded to the arXiv our paper “Random matrices have simple spectrum”. Recall that an {n \times n} Hermitian matrix is said to have simple eigenvalues if all of its {n} eigenvalues are distinct. This is a very typical property of matrices to have: for instance, as discussed in this previous post, in the space of all {n \times n} Hermitian matrices, the space of matrices without all eigenvalues simple has codimension three, and in the real symmetric case this space has codimension two. In particular, given any random matrix ensemble of Hermitian or real symmetric matrices with an absolutely continuous distribution, we conclude that random matrices drawn from this ensemble will almost surely have simple eigenvalues.

For discrete random matrix ensembles, though, the above argument breaks down, even though general universality heuristics predict that the statistics of discrete ensembles should behave similarly to those of continuous ensembles. A model case here is the adjacency matrix {M_n} of an Erdös-Rényi graph – a graph on {n} vertices in which each pair of vertices is joined by an edge independently with probability {p}. For the purposes of this paper one should view {p} as fixed, e.g. {p=1/2}, while {n} is an asymptotic parameter going to infinity. In this context, our main result is the following (answering a question of Babai):

Theorem 1 With probability {1-o(1)}, {M_n} has simple eigenvalues.
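
For small {n}, one can test whether a given adjacency matrix has simple spectrum exactly, without floating-point eigenvalue computations. The sketch below is an illustration only and is not the method of the paper: it computes the characteristic polynomial in rational arithmetic (by the Faddeev-LeVerrier recursion, a technique chosen here for convenience) and checks whether it shares a non-constant factor with its derivative, which happens exactly when there is a repeated eigenvalue. All function names, the matrix size, and the trial count are assumptions made for the example.

```python
import random
from fractions import Fraction


def er_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph on n vertices."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                a[i][j] = a[j][i] = 1
    return a


def char_poly(a):
    """Coefficients of det(x I - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(a)
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + (previous coefficient) * I
        m = [[sum(a[i][l] * m[l][j] for l in range(n)) for j in range(n)]
             for i in range(n)]
        for i in range(n):
            m[i][i] += coeffs[-1]
        # next coefficient = -trace(A M_k) / k
        trace = sum(a[i][l] * m[l][i] for i in range(n) for l in range(n))
        coeffs.append(-Fraction(trace) / k)
    return coeffs


def poly_rem(num, den):
    """Remainder of num divided by den (den has non-zero leading coefficient)."""
    num = list(num)
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i in range(len(den)):
            num[i] -= factor * den[i]
        num.pop(0)  # the leading coefficient has just been cancelled
    while num and num[0] == 0:
        num.pop(0)  # strip any remaining leading zeroes
    return num


def has_simple_spectrum(a):
    """True if the characteristic polynomial of A has no repeated root."""
    p = char_poly(a)
    n = len(p) - 1
    dp = [c * (n - i) for i, c in enumerate(p[:-1])]
    while dp:  # Euclidean algorithm for gcd(p, p')
        p, dp = dp, poly_rem(p, dp)
    return len(p) == 1  # the gcd is a non-zero constant


rng = random.Random(0)
trials = 50
simple = sum(has_simple_spectrum(er_adjacency(8, 0.5, rng)) for _ in range(trials))
print(f"{simple} of {trials} sampled 8-vertex graphs had simple spectrum")
```

For instance, the triangle {K_3} has eigenvalues {2, -1, -1} and is rejected, while the path on three vertices has eigenvalues {-\sqrt{2}, 0, \sqrt{2}} and is accepted. Theorem 1 is an asymptotic statement, so a small-{n} experiment of this kind only gives a rough feel for the phenomenon.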

Our argument works for more general Wigner-type matrix ensembles, but for the sake of illustration we will stick with the Erdös-Rényi case. Previous work on local universality for such matrix models (e.g. the work of Erdös, Knowles, Yau, and Yin) was able to show that any individual eigenvalue gap {\lambda_{i+1}(M)-\lambda_i(M)} did not vanish with probability {1-o(1)} (in fact {1-O(n^{-c})} for some absolute constant {c>0}), but because there are {n} different gaps that one has to simultaneously ensure to be non-zero, this did not give Theorem 1 as one is forced to apply the union bound.

Our argument in fact gives simplicity of the spectrum with probability {1-O(n^{-A})} for any fixed {A}; in a subsequent paper we also show that it gives a quantitative lower bound on the eigenvalue gaps (analogous to how many results on the singularity probability of random matrices can be upgraded to a bound on the least singular value).

The basic idea of the argument can be sketched as follows. Suppose that {M_n} has a repeated eigenvalue {\lambda}. We split

\displaystyle M_n = \begin{pmatrix} M_{n-1} & X \\ X^T & 0 \end{pmatrix}

for a random {(n-1) \times (n-1)} minor {M_{n-1}} and a random sign vector {X}; crucially, {X} and {M_{n-1}} are independent. If {M_n} has a repeated eigenvalue {\lambda}, then by the Cauchy interlacing law, {M_{n-1}} also has an eigenvalue {\lambda}. We now write down the eigenvector equation for {M_n} at {\lambda}:

\displaystyle \begin{pmatrix} M_{n-1} & X \\ X^T & 0 \end{pmatrix} \begin{pmatrix} v \\ a \end{pmatrix} = \lambda \begin{pmatrix} v \\ a \end{pmatrix}.

Extracting the top {n-1} coefficients, we obtain

\displaystyle (M_{n-1} - \lambda) v + a X = 0.

If we let {w} be a {\lambda}-eigenvector of {M_{n-1}}, then by taking inner products with {w} (and using the symmetry of {M_{n-1}}, which gives {w \cdot (M_{n-1} - \lambda) v = 0}) we conclude that

\displaystyle a (w \cdot X) = 0;

we typically expect {a} to be non-zero, in which case we arrive at

\displaystyle w \cdot X = 0.

In other words, in order for {M_n} to have a repeated eigenvalue, the top right column {X} of {M_n} has to be orthogonal to an eigenvector {w} of the minor {M_{n-1}}. Note that {X} and {w} are going to be independent (once we specify which eigenvector of {M_{n-1}} to take as {w}). On the other hand, thanks to inverse Littlewood-Offord theory (specifically, we use an inverse Littlewood-Offord theorem of Nguyen and Vu), we know that the vector {X} is unlikely to be orthogonal to any given vector {w} independent of {X}, unless the coefficients of {w} are extremely special (specifically, that most of them lie in a generalised arithmetic progression). The main remaining difficulty is then to show that eigenvectors of a random matrix are typically not of this special form, and this relies on a conditioning argument originally used by Komlós to bound the singularity probability of a random sign matrix. (Basically, if an eigenvector has this special form, then one can use a fraction of the rows and columns of the random matrix to determine the eigenvector completely, while still preserving enough randomness in the remaining portion of the matrix so that this vector will in fact not be an eigenvector with high probability.)
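
Both halves of this sketch can be illustrated on toy examples. The code below is a hedged illustration rather than anything from the paper. The first part takes the adjacency matrix of the triangle {K_3} (which has the repeated eigenvalue {-1}) and checks that its top right column {X} is orthogonal to a {(-1)}-eigenvector {w} of the {2 \times 2} minor. The second part computes {{\bf P}( w \cdot X = 0 )} exactly by enumeration for a uniformly random sign vector {X}, comparing a highly structured {w} (all coefficients equal) with an arithmetically spread out one (powers of two). The specific vectors and the helper names `dot` and `orthogonality_probability` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Part 1: the split M_n = [[M_{n-1}, X], [X^T, 0]] for the triangle K_3.
M3 = [[0, 1, 1],
      [1, 0, 1],
      [1, 1, 0]]                      # eigenvalues 2, -1, -1
minor = [row[:2] for row in M3[:2]]   # M_{n-1}
X = [row[2] for row in M3[:2]]        # top right column of M_n
lam = -1                              # the repeated eigenvalue
w = [1, -1]                           # candidate lam-eigenvector of the minor

is_eigenvector = [dot(row, w) for row in minor] == [lam * c for c in w]
w_dot_X = dot(w, X)
print(is_eigenvector, w_dot_X)


# Part 2: exact probability that a uniform sign vector is orthogonal to coeffs.
def orthogonality_probability(coeffs):
    n = len(coeffs)
    hits = sum(1 for signs in product((-1, 1), repeat=n)
               if dot(signs, coeffs) == 0)
    return Fraction(hits, 2 ** n)


structured = [1] * 10                  # coefficients in a very short progression
spread = [2 ** k for k in range(10)]   # 1, 2, 4, ..., 512

print(orthogonality_probability(structured))
print(orthogonality_probability(spread))
```

For the all-ones vector the probability is {\binom{10}{5}/2^{10} = 63/256}, whereas for the powers of two the signed sum is always odd and hence never vanishes. This is only a caricature of the dichotomy that inverse Littlewood-Offord theory makes precise.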

