You are currently browsing the monthly archive for December 2014.
In 1946, Ulam, in response to a theorem of Anning and Erdös, posed the following problem:
Problem 1 (Erdös-Ulam problem) Let
be a set such that the distance between any two points in
is rational. Is it true that
cannot be (topologically) dense in
?
The paper of Anning and Erdös addressed the case that all the distances between two points in were integer rather than rational in the affirmative.
The Erdös-Ulam problem remains open; it was discussed recently over at Gödel’s lost letter. It is in fact likely (as we shall see below) that the set in the above problem is not only forbidden to be topologically dense, but also cannot be Zariski dense either. If so, then the structure of
is quite restricted; it was shown by Solymosi and de Zeeuw that if
fails to be Zariski dense, then all but finitely many of the points of
must lie on a single line, or a single circle. (Conversely, it is easy to construct examples of dense subsets of a line or circle in which all distances are rational, though in the latter case the square of the radius of the circle must also be rational.)
The main tool of the Solymosi-de Zeeuw analysis was Faltings’ celebrated theorem that every algebraic curve of genus at least two contains only finitely many rational points. The purpose of this post is to observe that an affirmative answer to the full Erdös-Ulam problem similarly follows from the conjectured analogue of Falting’s theorem for surfaces, namely the following conjecture of Bombieri and Lang:
Conjecture 2 (Bombieri-Lang conjecture) Let
be a smooth projective irreducible algebraic surface defined over the rationals
which is of general type. Then the set
of rational points of
is not Zariski dense in
.
In fact, the Bombieri-Lang conjecture has been made for varieties of arbitrary dimension, and for more general number fields than the rationals, but the above special case of the conjecture is the only one needed for this application. We will review what “general type” means (for smooth projective complex varieties, at least) below the fold.
The Bombieri-Lang conjecture is considered to be extremely difficult, in particular being substantially harder than Faltings’ theorem, which is itself a highly non-trivial result. So this implication should not be viewed as a practical route to resolving the Erdös-Ulam problem unconditionally; rather, it is a demonstration of the power of the Bombieri-Lang conjecture. Still, it was an instructive algebraic geometry exercise for me to carry out the details of this implication, which quickly boils down to verifying that a certain quite explicit algebraic surface is of general type (Theorem 4 below). As I am not an expert in the subject, my computations here will be rather tedious and pedestrian; it is likely that they could be made much slicker by exploiting more of the machinery of modern algebraic geometry, and I would welcome any such streamlining by actual experts in this area. (For similar reasons, there may be more typos and errors than usual in this post; corrections are welcome as always.) My calculations here are based on a similar calculation of van Luijk, who used analogous arguments to show (assuming Bombieri-Lang) that the set of perfect cuboids is not Zariski-dense in its projective parameter space.
We also remark that in a recent paper of Makhul and Shaffaf, the Bombieri-Lang conjecture (or more precisely, a weaker consequence of that conjecture) was used to show that if is a subset of
with rational distances which intersects any line in only finitely many points, then there is a uniform bound on the cardinality of the intersection of
with any line. I have also recently learned (private communication) that an unpublished work of Shaffaf has obtained a result similar to the one in this post, namely that the Erdös-Ulam conjecture follows from the Bombieri-Lang conjecture, plus an additional conjecture about the rational curves in a specific surface.
Let us now give the elementary reductions to the claim that a certain variety is of general type. For sake of contradiction, let be a dense set such that the distance between any two points is rational. Then
certainly contains two points that are a rational distance apart. By applying a translation, rotation, and a (rational) dilation, we may assume that these two points are
and
. As
is dense, there is a third point of
not on the
axis, which after a reflection we can place in the upper half-plane; we will write it as
with
.
Given any two points in
, the quantities
are rational, and so by the cosine rule the dot product
is rational as well. Since
, this implies that the
-component of every point
in
is rational; this in turn implies that the product of the
-coordinates of any two points
in
is rational as well (since this differs from
by a rational number). In particular,
and
are rational, and all of the points in
now lie in the lattice
. (This fact appears to have first been observed in the 1988 habilitationschrift of Kemnitz.)
Now take four points ,
in
in general position (so that the octuplet
avoids any pre-specified hypersurface in
); this can be done if
is dense. (If one wished, one could re-use the three previous points
to be three of these four points, although this ultimately makes little difference to the analysis.) If
is any point in
, then the distances
from
to
are rationals that obey the equations
for , and thus determine a rational point in the affine complex variety
defined as
By inspecting the projection from
to
, we see that
is a branched cover of
, with the generic cover having
points (coming from the different ways to form the square roots
); in particular,
is a complex affine algebraic surface, defined over the rationals. By inspecting the monodromy around the four singular base points
(which switch the sign of one of the roots
, while keeping the other three roots unchanged), we see that the variety
is connected away from its singular set, and thus irreducible. As
is topologically dense in
, it is Zariski-dense in
, and so
generates a Zariski-dense set of rational points in
. To solve the Erdös-Ulam problem, it thus suffices to show that
Claim 3 For any non-zero rational
and for rationals
in general position, the rational points of the affine surface
is not Zariski dense in
.
This is already very close to a claim that can be directly resolved by the Bombieri-Lang conjecture, but is affine rather than projective, and also contains some singularities. The first issue is easy to deal with, by working with the projectivisation
of , where
is the homogeneous quadratic polynomial
with
and the projective complex space is the space of all equivalence classes
of tuples
up to projective equivalence
. By identifying the affine point
with the projective point
, we see that
consists of the affine variety
together with the set
, which is the union of eight curves, each of which lies in the closure of
. Thus
is the projective closure of
, and is thus a complex irreducible projective surface, defined over the rationals. As
is cut out by four quadric equations in
and has degree sixteen (as can be seen for instance by inspecting the intersection of
with a generic perturbation of a fibre over the generically defined projection
), it is also a complete intersection. To show (3), it then suffices to show that the rational points in
are not Zariski dense in
.
Heuristically, the reason why we expect few rational points in is as follows. First observe from the projective nature of (1) that every rational point is equivalent to an integer point. But for a septuple
of integers of size
, the quantity
is an integer point of
of size
, and so should only vanish about
of the time. Hence the number of integer points
of height comparable to
should be about
this is a convergent sum if ranges over (say) powers of two, and so from standard probabilistic heuristics (see this previous post) we in fact expect only finitely many solutions, in the absence of any special algebraic structure (e.g. the structure of an abelian variety, or a birational reduction to a simpler variety) that could produce an unusually large number of solutions.
The Bombieri-Lang conjecture, Conjecture 2, can be viewed as a formalisation of the above heuristics (roughly speaking, it is one of the most optimistic natural conjectures one could make that is compatible with these heuristics while also being invariant under birational equivalence).
Unfortunately, contains some singular points. Being a complete intersection, this occurs when the Jacobian matrix of the map
has less than full rank, or equivalently that the gradient vectors
for are linearly dependent, where the
is in the coordinate position associated to
. One way in which this can occur is if one of the gradient vectors
vanish identically. This occurs at precisely
points, when
is equal to
for some
, and one has
for all
(so in particular
). Let us refer to these as the obvious singularities; they arise from the geometrically evident fact that the distance function
is singular at
.
The other way in which could occur is if a non-trivial linear combination of at least two of the gradient vectors vanishes. From (2), this can only occur if for some distinct
, which from (1) implies that
for two choices of sign . If the signs are equal, then (as
are in general position) this implies that
, and then we have the singular point
If the non-trivial linear combination involved three or more gradient vectors, then by the pigeonhole principle at least two of the signs involved must be equal, and so the only singular points are (5). So the only remaining possibility is when we have two gradient vectors that are parallel but non-zero, with the signs in (3), (4) opposing. But then (as
are in general position) the vectors
are non-zero and non-parallel to each other, a contradiction. Thus, outside of the
obvious singular points mentioned earlier, the only other singular points are the two points (5).
We will shortly show that the obvious singularities are ordinary double points; the surface
near any of these points is analytically equivalent to an ordinary cone
near the origin, which is a cone over a smooth conic curve
. The two non-obvious singularities (5) are slightly more complicated than ordinary double points, they are elliptic singularities, which approximately resemble a cone over an elliptic curve. (As far as I can tell, this resemblance is exact in the category of real smooth manifolds, but not in the category of algebraic varieties.) If one blows up each of the point singularities of
separately, no further singularities are created, and one obtains a smooth projective surface
(using the Segre embedding as necessary to embed
back into projective space, rather than in a product of projective spaces). Away from the singularities, the rational points of
lift up to rational points of
. Assuming the Bombieri-Lang conjecture, we thus are able to answer the Erdös-Ulam problem in the affirmative once we establish
This will be done below the fold, by the pedestrian device of explicitly constructing global differential forms on ; I will also be working from a complex analysis viewpoint rather than an algebraic geometry viewpoint as I am more comfortable with the former approach. (As mentioned above, though, there may well be a quicker way to establish this result by using more sophisticated machinery.)
I thank Mark Green and David Gieseker for helpful conversations (and a crash course in varieties of general type!).
Remark 5 The above argument shows in fact (assuming Bombieri-Lang) that sets
with all distances rational cannot be Zariski-dense, and thus (by Solymosi-de Zeeuw) must lie on a single line or circle with only finitely many exceptions. Assuming a stronger version of Bombieri-Lang involving a general number field
, we obtain a similar conclusion with “rational” replaced by “lying in
” (one has to extend the Solymosi-de Zeeuw analysis to more general number fields, but this should be routine, using the analogue of Faltings’ theorem for such number fields).
Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes“. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap
between primes up to exhibited a lower bound of the shape
for some function that went to infinity as
; this improved upon previous work of Rankin and other authors, who established the same bound but with
replaced by a constant. (Again, see the previous post for a more detailed discussion.)
In our previous papers, we did not specify a particular growth rate for . In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results both have ineffective constants, rendering our growth function
similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking
-tuples of fairly large size (about
for some small
). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate
of shape
thus leading to a bound
on the gaps between primes for large ; this is an unpublished calculation of James’.
In this paper we make a further refinement of this calculation to obtain a growth rate
leading to a bound of the form
for large and some small constant
. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramer’s conjecture that
is comparable to
); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant
here can be replaced by an arbitrarily large constant
.
The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in by removing one residue class modulo
for all primes
in (say)
? Very roughly speaking, if one can solve this problem with
, then one can obtain a growth rate on
of the shape
. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in
, but never mind this detail for now.)
Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of -tuples in
which contain at least
primes, for
as large as
or so (so that
is about
). By considering
-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime
in
that capture about
primes. In principle, this means that union of all these residue classes can cover about
primes, allowing one to take
as large as
, which corresponds to (3). However, there is a catch: the residue classes for different primes
may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in expected number of times each prime would be shifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the
loss in (2).
The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of vertices using as few “edges” from a given hypergraph
as possible. If each edge has
vertices, then one certainly needs at least
edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph
. As before, random methods tend to require something like
edges before one expects to cover, say
of the vertices.
However, it turns out (under reasonable hypotheses on ) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from
that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph
, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.
In our setup, the vertices are the primes in , and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime
, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly the more useful ones for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).
Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).
To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than primes of size
(which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.
It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)
In Notes 2, the Riemann zeta function (and more generally, the Dirichlet
-functions
) were extended meromorphically into the region
in and to the right of the critical strip. This is a sufficient amount of meromorphic continuation for many applications in analytic number theory, such as establishing the prime number theorem and its variants. The zeroes of the zeta function in the critical strip
are known as the non-trivial zeroes of
, and thanks to the truncated explicit formulae developed in Notes 2, they control the asymptotic distribution of the primes (up to small errors).
The function obeys the trivial functional equation
for all in its domain of definition. Indeed, as
is real-valued when
is real, the function
vanishes on the real line and is also meromorphic, and hence vanishes everywhere. Similarly one has the functional equation
From these equations we see that the zeroes of the zeta function are symmetric across the real axis, and the zeroes of are the reflection of the zeroes of
across this axis.
It is a remarkable fact that these functions obey an additional, and more non-trivial, functional equation, this time establishing a symmetry across the critical line rather than the real axis. One consequence of this symmetry is that the zeta function and
-functions may be extended meromorphically to the entire complex plane. For the zeta function, the functional equation was discovered by Riemann, and reads as follows:
Theorem 1 (Functional equation for the Riemann zeta function) The Riemann zeta function
extends meromorphically to the entire complex plane, with a simple pole at
and no other poles. Furthermore, one has the functional equation
for all complex
other than
, where
is the function
Here
,
are the complex-analytic extensions of the classical trigionometric functions
, and
is the Gamma function, whose definition and properties we review below the fold.
The functional equation can be placed in a more symmetric form as follows:
Corollary 2 (Functional equation for the Riemann xi function) The Riemann xi function
is analytic on the entire complex plane
(after removing all removable singularities), and obeys the functional equations
In particular, the zeroes of
consist precisely of the non-trivial zeroes of
, and are symmetric about both the real axis and the critical line. Also,
is real-valued on the critical line and on the real axis.
Corollary 2 is an easy consequence of Theorem 1 together with the duplication theorem for the Gamma function, and the fact that has no zeroes to the right of the critical strip, and is left as an exercise to the reader (Exercise 19). The functional equation in Theorem 1 has many proofs, but most of them are related in on way or another to the Poisson summation formula
(Theorem 34 from Supplement 2, at least in the case when is twice continuously differentiable and compactly supported), which can be viewed as a Fourier-analytic link between the coarse-scale distribution of the integers and the fine-scale distribution of the integers. Indeed, there is a quick heuristic proof of the functional equation that comes from formally applying the Poisson summation formula to the function
, and noting that the functions
and
are formally Fourier transforms of each other, up to some Gamma function factors, as well as some trigonometric factors arising from the distinction between the real line and the half-line. Such a heuristic proof can indeed be made rigorous, and we do so below the fold, while also providing Riemann’s two classical proofs of the functional equation.
From the functional equation (and the poles of the Gamma function), one can see that has trivial zeroes at the negative even integers
, in addition to the non-trivial zeroes in the critical strip. More generally, the following table summarises the zeroes and poles of the various special functions appearing in the functional equation, after they have been meromorphically extended to the entire complex plane, and with zeroes classified as “non-trivial” or “trivial” depending on whether they lie in the critical strip or not. (Exponential functions such as
or
have no zeroes or poles, and will be ignored in this table; the zeroes and poles of rational functions such as
are self-evident and will also not be displayed here.)
Function | Non-trivial zeroes | Trivial zeroes | Poles |
|
Yes | |
|
|
Yes | |
|
|
No | Even integers | No |
|
No | Odd integers | No |
|
No | Integers | No |
|
No | No | |
|
No | No | |
|
No | No | |
|
No | No | |
|
Yes | No | No |
Among other things, this table indicates that the Gamma and trigonometric factors in the functional equation are tied to the trivial zeroes and poles of zeta, but have no direct bearing on the distribution of the non-trivial zeroes, which is the most important feature of the zeta function for the purposes of analytic number theory, beyond the fact that they are symmetric about the real axis and critical line. In particular, the Riemann hypothesis is not going to be resolved just from further analysis of the Gamma function!
The zeta function computes the “global” sum , with
ranging all the way from
to infinity. However, by some Fourier-analytic (or complex-analytic) manipulation, it is possible to use the zeta function to also control more “localised” sums, such as
for some
and some smooth compactly supported function
. It turns out that the functional equation (3) for the zeta function localises to this context, giving an approximate functional equation which roughly speaking takes the form
whenever and
; see Theorem 39 below for a precise formulation of this equation. Unsurprisingly, this form of the functional equation is also very closely related to the Poisson summation formula (8), indeed it is essentially a special case of that formula (or more precisely, of the van der Corput
-process). This useful identity relates long smoothed sums of
to short smoothed sums of
(or vice versa), and can thus be used to shorten exponential sums involving terms such as
, which is useful when obtaining some of the more advanced estimates on the Riemann zeta function.
We will give two other basic uses of the functional equation. The first is to get a good count (as opposed to merely an upper bound) on the density of zeroes in the critical strip, establishing the Riemann-von Mangoldt formula that the number of zeroes of imaginary part between
and
is
for large
. The other is to obtain untruncated versions of the explicit formula from Notes 2, giving a remarkable exact formula for sums involving the von Mangoldt function in terms of zeroes of the Riemann zeta function. These results are not strictly necessary for most of the material in the rest of the course, but certainly help to clarify the nature of the Riemann zeta function and its relation to the primes.
In view of the material in previous notes, it should not be surprising that there are analogues of all of the above theory for Dirichlet -functions
. We will restrict attention to primitive characters
, since the
-function for imprimitive characters merely differs from the
-function of the associated primitive factor by a finite Euler product; indeed, if
for some principal
whose modulus
is coprime to that of
, then
(cf. equation (45) of Notes 2).
The main new feature is that the Poisson summation formula needs to be “twisted” by a Dirichlet character , and this boils down to the problem of understanding the finite (additive) Fourier transform of a Dirichlet character. This is achieved by the classical theory of Gauss sums, which we review below the fold. There is one new wrinkle; the value of
plays a role in the functional equation. More precisely, we have
Theorem 3 (Functional equation for
-functions) Let
be a primitive character of modulus
with
. Then
extends to an entire function on the complex plane, with
or equivalently
for all
, where
is equal to
in the even case
and
in the odd case
, and
where
is the Gauss sum
and
, with the convention that the
-periodic function
is also (by abuse of notation) applied to
in the cyclic group
.
From this functional equation and (2) we see that, as with the Riemann zeta function, the non-trivial zeroes of (defined as the zeroes within the critical strip
are symmetric around the critical line (and, if
is real, are also symmetric around the real axis). In addition,
acquires trivial zeroes at the negative even integers and at zero if
, and at the negative odd integers if
. For imprimitive
, we see from (9) that
also acquires some additional trivial zeroes on the left edge of the critical strip.
There is also a symmetric version of this equation, analogous to Corollary 2:
Corollary 4 Let
be as above, and set
then
is entire with
.
For further detail on the functional equation and its implications, I recommend the classic text of Titchmarsh or the text of Davenport.
In Notes 1, we approached multiplicative number theory (the study of multiplicative functions and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions
and logarithmic sums
. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series
, defined (at least for
of sufficiently large real part) by the formula
These series also made an appearance in the elementary approach to the subject, but only for real that were larger than
. But now we will exploit the freedom to extend the variable
to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as
or
from control on the Dirichlet series. Crucially, for many key functions
of number-theoretic interest, the Dirichlet series
can be analytically (or at least meromorphically) continued to the left of the line
. The zeroes and poles of the resulting meromorphic continuations of
(and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of
; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function
are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function
(and hence to the primes) of the form
where the sum is over zeroes (counting multiplicity) of the Riemann zeta function
(with the sum often restricted so that
has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that
for suitable “test functions” (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem
as , with the size of the error term
closely tied to the location of the zeroes
of the Riemann zeta function.
The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation
for the Dirichlet series of the von Mangoldt function; note that (4) is formally the special case of (2) when
. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.
More generally, one has an explicit formula
for any (non-principal) Dirichlet character , where
now ranges over the zeroes of the associated Dirichlet
-function
; we view this formula as a “twist” of (1) by the Dirichlet character
. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that
as , whenever
is a fixed primitive residue class. Again, the size of the error term
here is closely tied to the location of the zeroes of the Dirichlet
-function, with particular importance given to whether there is a zero very close to
(such a zero is known as an exceptional zero or Siegel zero).
While any information on the behaviour of zeta functions or -functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are
- The region on or near the point
.
- The region on or near the right edge
of the critical strip
.
- The right half
of the critical strip.
- The region on or near the critical line
that bisects the critical strip.
- Everywhere else.
For instance:
- We will shortly show that the Riemann zeta function
has a simple pole at
with residue
, which is already sufficient to recover much of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function
. For Dirichlet
-functions, the behaviour is instead controlled by the quantity
discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
- The zeta function is also known to have no zeroes on the right edge
of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for
into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for
-functions and the prime number theorem in arithmetic progressions.
- The (as yet unproven) Riemann hypothesis prohibits
from having any zeroes within the right half
of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for
-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
- Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for
-functions and primes in short arithmetic progressions.
- The functional equation of the zeta function describes the behaviour of
to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for
-functions.
Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function
and all of the
-functions
for various Dirichlet characters
into a single object, viewing
as a general multiplicative character on the adeles; thus the imaginary coordinate
and the Dirichlet character
are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet
-functions. (The non-Archimedean character
and the Archimedean character
behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)
Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.
As a consequence of this hierarchy of importance, information about the function away from the critical strip, such as Euler’s identity
or equivalently
or the infamous identity
which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as
are of relatively little direct importance in analytic prime number theory, although they are still of interest for some other, non-number-theoretic, applications. (The quantity does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value
of an
-function at
turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower-bound on this quantity coming from Siegel’s theorem, discussed below the fold.
For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory“.
We will shortly turn to the complex-analytic approach to multiplicative number theory, which relies on the basic properties of complex analytic functions. In this supplement to the main notes, we quickly review the portions of complex analysis that we will be using in this course. We will not attempt a comprehensive review of this subject; for instance, we will completely neglect the conformal geometry or Riemann surface aspect of complex analysis, and we will also avoid using the various boundary convergence theorems for Taylor series or Dirichlet series (the latter type of result is traditionally utilised in multiplicative number theory, but I personally find them a little unintuitive to use, and will instead rely on a slightly different set of complex-analytic tools). We will also focus on the “local” structure of complex analytic functions, in particular adopting the philosophy that such functions behave locally like complex polynomials; the classical “global” theory of entire functions, while traditionally used in the theory of the Riemann zeta function, will be downplayed in these notes. On the other hand, we will play up the relationship between complex analysis and Fourier analysis, as we will incline to using the latter tool over the former in some of the subsequent material. (In the traditional approach to the subject, the Mellin transform is used in place of the Fourier transform, but we will not emphasise the role of the Mellin transform here.)
We begin by recalling the notion of a holomorphic function, which will later be shown to be essentially synonymous with that of a complex analytic function.
Definition 1 (Holomorphic function) Let
be an open subset of
, and let
be a function. If
, we say that
is complex differentiable at
if the limit
exists, in which case we refer to
as the (complex) derivative of
at
. If
is differentiable at every point
of
, and the derivative
is continuous, we say that
is holomorphic on
.
Exercise 2 Show that a function
is holomorphic if and only if the two-variable function
is continuously differentiable on
and obeys the Cauchy-Riemann equation
Basic examples of holomorphic functions include complex polynomials
as well as the complex exponential function
which are holomorphic on the entire complex plane (i.e., they are entire functions). The sum or product of two holomorphic functions is again holomorphic; the quotient of two holomorphic functions is holomorphic so long as the denominator is non-zero. Finally, the composition of two holomorphic functions is holomorphic wherever the composition is defined.
- (i) Establish Euler’s formula
for all
. (Hint: it is a bit tricky to do this starting from the trigonometric definitions of sine and cosine; I recommend either using the Taylor series formulations of these functions instead, or alternatively relying on the ordinary differential equations obeyed by sine and cosine.)
- (ii) Show that every non-zero complex number
has a complex logarithm
such that
, and that this logarithm is unique up to integer multiples of
.
- (iii) Show that there exists a unique principal branch
of the complex logarithm in the region
, defined by requiring
to be a logarithm of
with imaginary part between
and
. Show that this principal branch is holomorphic with derivative
.
In real analysis, we have the fundamental theorem of calculus, which asserts that
whenever is a real interval and
is a continuously differentiable function. The complex analogue of this fact is that
whenever is a holomorphic function, and
is a contour in
, by which we mean a piecewise continuously differentiable function, and the contour integral
for a continuous function
is defined via change of variables as
The complex fundamental theorem of calculus (2) follows easily from the real fundamental theorem and the chain rule.
In real analysis, we have the rather trivial fact that the integral of a continuous function on a closed contour is always zero:
In complex analysis, the analogous fact is significantly more powerful, and is known as Cauchy’s theorem:
Theorem 4 (Cauchy’s theorem) Let
be a holomorphic function in a simply connected open set
, and let
be a closed contour in
(thus
). Then
.
Exercise 5 Use Stokes’ theorem to give a proof of Cauchy’s theorem.
A useful reformulation of Cauchy’s theorem is that of contour shifting: if is a holomorphic function on a open set
, and
are two contours in an open set
with
and
, such that
can be continuously deformed into
, then
. A basic application of contour shifting is the Cauchy integral formula:
Theorem 6 (Cauchy integral formula) Let
be a holomorphic function in a simply connected open set
, and let
be a closed contour which is simple (thus
does not traverse any point more than once, with the exception of the endpoint
that is traversed twice), and which encloses a bounded region
in the anticlockwise direction. Then for any
, one has
Proof: Let be a sufficiently small quantity. By contour shifting, one can replace the contour
by the sum (concatenation) of three contours: a contour
from
to
, a contour
traversing the circle
once anticlockwise, and the reversal
of the contour
that goes from
to
. The contributions of the contours
cancel each other, thus
By a change of variables, the right-hand side can be expanded as
Sending , we obtain the claim.
The Cauchy integral formula has many consequences. Specialising to the case when traverses a circle
around
, we conclude the mean value property
whenever is holomorphic in a neighbourhood of the disk
. In a similar spirit, we have the maximum principle for holomorphic functions:
Lemma 7 (Maximum principle) Let
be a simply connected open set, and let
be a simple closed contour in
enclosing a bounded region
anti-clockwise. Let
be a holomorphic function. If we have the bound
for all
on the contour
, then we also have the bound
for all
.
Proof: We use an argument of Landau. Fix . From the Cauchy integral formula and the triangle inequality we have the bound
for some constant depending on
and
. This ostensibly looks like a weaker bound than what we want, but we can miraculously make the constant
disappear by the “tensor power trick“. Namely, observe that if
is a holomorphic function bounded in magnitude by
on
, and
is a natural number, then
is a holomorphic function bounded in magnitude by
on
. Applying the preceding argument with
replaced by
we conclude that
and hence
Sending , we obtain the claim.
Another basic application of the integral formula is
Corollary 8 Every holomorphic function
is complex analytic, thus it has a convergent Taylor series around every point
in the domain. In particular, holomorphic functions are smooth, and the derivative of a holomorphic function is again holomorphic.
Conversely, it is easy to see that complex analytic functions are holomorphic. Thus, the terms “complex analytic” and “holomorphic” are synonymous, at least when working on open domains. (On a non-open set , saying that
is analytic on
is equivalent to asserting that
extends to a holomorphic function of an open neighbourhood of
.) This is in marked contrast to real analysis, in which a function can be continuously differentiable, or even smooth, without being real analytic.
Proof: By translation, we may suppose that . Let
be a a contour traversing the circle
that is contained in the domain
, then by the Cauchy integral formula one has
for all in the disk
. As
is continuously differentiable (and hence continuous) on
, it is bounded. From the geometric series formula
and dominated convergence, we conclude that
with the right-hand side an absolutely convergent series for , and the claim follows.
Exercise 9 Establish the generalised Cauchy integral formulae
for any non-negative integer
, where
is the
-fold complex derivative of
.
This in turn leads to a converse to Cauchy’s theorem, known as Morera’s theorem:
Corollary 10 (Morera’s theorem) Let
be a continuous function on an open set
with the property that
for all closed contours
. Then
is holomorphic.
Proof: We can of course assume to be non-empty and connected (hence path-connected). Fix a point
, and define a “primitive”
of
by defining
, with
being any contour from
to
(this is well defined by hypothesis). By mimicking the proof of the real fundamental theorem of calculus, we see that
is holomorphic with
, and the claim now follows from Corollary 8.
An important consequence of Morera’s theorem for us is
Corollary 11 (Locally uniform limit of holomorphic functions is holomorphic) Let
be holomorphic functions on an open set
which converge locally uniformly to a function
. Then
is also holomorphic on
.
Proof: By working locally we may assume that is a ball, and in particular simply connected. By Cauchy’s theorem,
for all closed contours
in
. By local uniform convergence, this implies that
for all such contours, and the claim then follows from Morera’s theorem.
Now we study the zeroes of complex analytic functions. If a complex analytic function vanishes at a point
, but is not identically zero in a neighbourhood of that point, then by Taylor expansion we see that
factors in a sufficiently small neighbourhood of
as
for some natural number (which we call the order or multiplicity of the zero at
) and some function
that is complex analytic and non-zero near
; this generalises the factor theorem for polynomials. In particular, the zero
is isolated if
does not vanish identically near
. We conclude that if
is connected and
vanishes on a neighbourhood of some point
in
, then it must vanish on all of
(since the maximal connected neighbourhood of
in
on which
vanishes cannot have any boundary point in
). This implies unique continuation of analytic functions: if two complex analytic functions on
agree on a non-empty open set, then they agree everywhere. In particular, if a complex analytic function does not vanish everywhere, then all of its zeroes are isolated, so in particular it has only finitely many zeroes on any given compact set.
Recall that a rational function is a function which is a quotient
of two polynomials (at least outside of the set where
vanishes). Analogously, let us define a meromorphic function on an open set
to be a function
defined outside of a discrete subset
of
(the singularities of
), which is locally the quotient
of holomorphic functions, in the sense that for every
, one has
in a neighbourhood of
excluding
, with
holomorphic near
and with
non-vanishing outside of
. If
and
has a zero of equal or higher order than
at
, then the singularity is removable and one can extend the meromorphic function holomorphically across
(by the holomorphic factor theorem (4)); otherwise, the singularity is non-removable and is known as a pole, whose order is equal to the difference between the order of
and the order of
at
. (If one wished, one could extend meromorphic functions to the poles by embedding
in the Riemann sphere
and mapping each pole to
, but we will not do so here. One could also consider non-meromorphic functions with essential singularities at various points, but we will have no need to analyse such singularities in this course.) If the order of a pole or zero is one, we say that it is simple; if it is two, we say it is double; and so forth.
Exercise 12 Show that the space of meromorphic functions on a non-empty open set
, quotiented by almost everywhere equivalence, forms a field.
By quotienting two Taylor series, we see that if a meromorphic function has a pole of order
at some point
, then it has a Laurent expansion
absolutely convergent in a neighbourhood of excluding
itself, and with
non-zero. The Laurent coefficient
has a special significance, and is called the residue of the meromorphic function
at
, which we will denote as
. The importance of this coefficient comes from the following significant generalisation of the Cauchy integral formula, known as the residue theorem:
Exercise 13 (Residue theorem) Let
be a meromorphic function on a simply connected domain
, and let
be a closed contour in
enclosing a bounded region
anticlockwise, and avoiding all the singularities of
. Show that
where
is summed over all the poles of
that lie in
.
The residue theorem is particularly useful when applied to logarithmic derivatives of meromorphic functions
, because the residue is of a specific form:
Exercise 14 Let
be a meromorphic function on an open set
that does not vanish identically. Show that the only poles of
are simple poles (poles of order
), occurring at the poles and zeroes of
(after all removable singularities have been removed). Furthermore, the residue of
at a pole
is an integer, equal to the order of zero of
if
has a zero at
, or equal to negative the order of pole at
if
has a pole at
.
Remark 15 The fact that residues of logarithmic derivatives of meromorphic functions are automatically integers is a remarkable feature of the complex analytic approach to multiplicative number theory, which is difficult (though not entirely impossible) to duplicate in other approaches to the subject. Here is a sample application of this integrality, which is challenging to reproduce by non-complex-analytic means: if
is meromorphic near
, and one has the bound
as
, then
must in fact stay bounded near
, because the only integer of magnitude less than
is zero.
Van Vu and I have just uploaded to the arXiv our paper “Random matrices have simple spectrum“. Recall that an Hermitian matrix is said to have simple eigenvalues if all of its
eigenvalues are distinct. This is a very typical property of matrices to have: for instance, as discussed in this previous post, in the space of all
Hermitian matrices, the space of matrices without all eigenvalues simple has codimension three, and for real symmetric cases this space has codimension two. In particular, given any random matrix ensemble of Hermitian or real symmetric matrices with an absolutely continuous distribution, we conclude that random matrices drawn from this ensemble will almost surely have simple eigenvalues.
For discrete random matrix ensembles, though, the above argument breaks down, even though general universality heuristics predict that the statistics of discrete ensembles should behave similarly to those of continuous ensembles. A model case here is the adjacency matrix of an Erdös-Rényi graph – a graph on
vertices in which any pair of vertices has an independent probability
of being in the graph. For the purposes of this paper one should view
as fixed, e.g.
, while
is an asymptotic parameter going to infinity. In this context, our main result is the following (answering a question of Babai):
Our argument works for more general Wigner-type matrix ensembles, but for sake of illustration we will stick with the Erdös-Renyi case. Previous work on local universality for such matrix models (e.g. the work of Erdos, Knowles, Yau, and Yin) was able to show that any individual eigenvalue gap did not vanish with probability
(in fact
for some absolute constant
), but because there are
different gaps that one has to simultaneously ensure to be non-zero, this did not give Theorem 1 as one is forced to apply the union bound.
Our argument in fact gives simplicity of the spectrum with probability for any fixed
; in a subsequent paper we also show that it gives a quantitative lower bound on the eigenvalue gaps (analogous to how many results on the singularity probability of random matrices can be upgraded to a bound on the least singular value).
The basic idea of argument can be sketched as follows. Suppose that has a repeated eigenvalue
. We split
for a random minor
and a random sign vector
; crucially,
and
are independent. If
has a repeated eigenvalue
, then by the Cauchy interlacing law,
also has an eigenvalue
. We now write down the eigenvector equation for
at
:
Extracting the top coefficients, we obtain
If we let be the
-eigenvector of
, then by taking inner products with
we conclude that
we typically expect to be non-zero, in which case we arrive at
In other words, in order for to have a repeated eigenvalue, the top right column
of
has to be orthogonal to an eigenvector
of the minor
. Note that
and
are going to be independent (once we specify which eigenvector of
to take as
). On the other hand, thanks to inverse Littlewood-Offord theory (specifically, we use an inverse Littlewood-Offord theorem of Nguyen and Vu), we know that the vector
is unlikely to be orthogonal to any given vector
independent of
, unless the coefficients of
are extremely special (specifically, that most of them lie in a generalised arithmetic progression). The main remaining difficulty is then to show that eigenvectors of a random matrix are typically not of this special form, and this relies on a conditioning argument originally used by Komlós to bound the singularity probability of a random sign matrix. (Basically, if an eigenvector has this special form, then one can use a fraction of the rows and columns of the random matrix to determine the eigenvector completely, while still preserving enough randomness in the remaining portion of the matrix so that this vector will in fact not be an eigenvector with high probability.)
Recent Comments