You are currently browsing the monthly archive for November 2014.
Analytic number theory is only one of many different approaches to number theory. Another important branch of the subject is algebraic number theory, which studies algebraic structures (e.g. groups, rings, and fields) of number-theoretic interest. With this perspective, the classical field of rationals $\mathbb{Q}$, and the classical ring of integers $\mathbb{Z}$, are placed inside the much larger field $\overline{\mathbb{Q}}$ of algebraic numbers, and the much larger ring $\overline{\mathbb{Z}}$ of algebraic integers, respectively. Recall that an algebraic number is a root of a polynomial with integer coefficients, and an algebraic integer is a root of a monic polynomial with integer coefficients; thus for instance $\sqrt{2}$ is an algebraic integer (a root of $x^2 - 2$), while $\sqrt{2}/2$ is merely an algebraic number (a root of $2x^2 - 1$). For the purposes of this post, we will adopt the concrete (but somewhat artificial) perspective of viewing algebraic numbers and integers as lying inside the complex numbers $\mathbb{C}$, thus $\mathbb{Q} \subset \overline{\mathbb{Q}} \subset \mathbb{C}$ and $\mathbb{Z} \subset \overline{\mathbb{Z}} \subset \mathbb{C}$. (From a modern algebraic perspective, it is better to think of $\overline{\mathbb{Q}}$ as existing as an abstract field separate from $\mathbb{C}$, but which has a number of embeddings into $\mathbb{C}$ (as well as into other fields, such as the completed $p$-adics $\mathbb{C}_p$), no one of which should be considered favoured over any other; cf. this MathOverflow post. But for the rudimentary algebraic number theory in this post, we will not need to work at this level of abstraction.) In particular, we identify the algebraic integer $\sqrt{-d}$ with the complex number $\sqrt{d}\, i$ for any natural number $d$.
Exercise 1 Show that the field $\overline{\mathbb{Q}}$ of algebraic numbers is indeed a field, and that the ring $\overline{\mathbb{Z}}$ of algebraic integers is indeed a ring, and is in fact an integral domain. Also, show that $\mathbb{Z} = \overline{\mathbb{Z}} \cap \mathbb{Q}$, that is to say the ordinary integers are precisely the algebraic integers that are also rational. Because of this, we will sometimes refer to elements of $\mathbb{Z}$ as rational integers.
In practice, the field $\overline{\mathbb{Q}}$ is too big to conveniently work with directly, having infinite dimension (as a vector space) over $\mathbb{Q}$. Thus, algebraic number theory generally restricts attention to intermediate fields $\mathbb{Q} \subset k \subset \overline{\mathbb{Q}}$ between $\mathbb{Q}$ and $\overline{\mathbb{Q}}$, which are of finite dimension over $\mathbb{Q}$; that is to say, finite degree extensions of $\mathbb{Q}$. Such fields are known as algebraic number fields, or number fields for short. Apart from $\mathbb{Q}$ itself, the simplest examples of such number fields are the quadratic fields, which have dimension exactly two over $\mathbb{Q}$.
Exercise 2 Show that if $d$ is a rational number that is not a perfect square, then the field $\mathbb{Q}(\sqrt{d})$ generated by $\mathbb{Q}$ and either of the square roots of $d$ is a quadratic field. Conversely, show that all quadratic fields arise in this fashion. (Hint: show that every element of a quadratic field is a root of a quadratic polynomial over the rationals.)
The ring $\overline{\mathbb{Z}}$ of algebraic integers is similarly too large to conveniently work with directly, so in algebraic number theory one usually works with the rings $\mathcal{O}_k := \overline{\mathbb{Z}} \cap k$ of algebraic integers inside a given number field $k$. One can (and does) study this situation in great generality, but for the purposes of this post we shall restrict attention to a simple but illustrative special case, namely the quadratic fields with a certain type of negative discriminant. (The positive discriminant case will be briefly discussed in Remark 42 below.)
Exercise 3 Let $d$ be a square-free natural number with $d = 1 \pmod 4$ or $d = 2 \pmod 4$. Show that the ring $\mathcal{O}_{\mathbb{Q}(\sqrt{-d})}$ of algebraic integers in $\mathbb{Q}(\sqrt{-d})$ is given by
$$\mathcal{O}_{\mathbb{Q}(\sqrt{-d})} = \mathbb{Z}[\sqrt{-d}] = \{ a + b \sqrt{-d}: a, b \in \mathbb{Z} \}.$$
If instead $d$ is square-free with $d = 3 \pmod 4$, show that the ring is instead given by
$$\mathcal{O}_{\mathbb{Q}(\sqrt{-d})} = \mathbb{Z}[\tfrac{1+\sqrt{-d}}{2}] = \{ a + b \tfrac{1+\sqrt{-d}}{2}: a, b \in \mathbb{Z} \}.$$
What happens if $d$ is not square-free, or negative?
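As a quick numerical sanity check of the dichotomy in this exercise (an illustrative sketch only, taking $d=3$ and $d=5$ as sample values), one can verify that $(1+\sqrt{-3})/2$ is a root of the monic integer polynomial $z^2 - z + 1$, while the analogous element for $d=5$ satisfies a quadratic with a non-integer constant term:

```python
import cmath

# For d = 3 (d = 3 mod 4), the "half-integer" element (1 + sqrt(-3))/2 is an
# algebraic integer: it is a root of the monic integer polynomial z^2 - z + 1.
z = (1 + cmath.sqrt(-3)) / 2
assert abs(z * z - z + 1) < 1e-12

# For d = 5 (d = 1 mod 4), the analogous element satisfies z^2 - z + 3/2 = 0,
# whose constant term is not an integer, consistent with the ring of integers
# being just Z[sqrt(-5)] in this case.
w = (1 + cmath.sqrt(-5)) / 2
assert abs(w * w - w + 1.5) < 1e-12
```

Of course this is only a consistency check for two sample values of $d$, not a proof of the exercise.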
Remark 4 In the case $d = 3 \pmod 4$, it may naively appear more natural to work with the ring $\mathbb{Z}[\sqrt{-d}]$, which is an index two subring of $\mathcal{O}_{\mathbb{Q}(\sqrt{-d})} = \mathbb{Z}[\frac{1+\sqrt{-d}}{2}]$. However, because this ring only captures some of the algebraic integers in $\mathbb{Q}(\sqrt{-d})$ rather than all of them, the algebraic properties of these rings are somewhat worse than those of $\mathcal{O}_{\mathbb{Q}(\sqrt{-d})}$ (in particular, they generally fail to be Dedekind domains) and so are not convenient to work with in algebraic number theory.
We refer to fields of the form $\mathbb{Q}(\sqrt{-d})$ for natural square-free numbers $d$ as quadratic fields of negative discriminant, and similarly refer to the ring of integers $\mathcal{O}_{\mathbb{Q}(\sqrt{-d})}$ as a ring of quadratic integers of negative discriminant. Quadratic fields and quadratic integers of positive discriminant are just as important to analytic number theory as their negative discriminant counterparts, but we will restrict attention to the latter here for simplicity of discussion.
Thus, for instance, when $d = 1$, the ring of integers in $\mathbb{Q}(\sqrt{-1})$ is the ring of Gaussian integers
$$\mathbb{Z}[i] = \{ a + b i: a, b \in \mathbb{Z} \},$$
and when $d = 3$, the ring of integers in $\mathbb{Q}(\sqrt{-3})$ is the ring of Eisenstein integers
$$\mathbb{Z}[\omega] = \{ a + b \omega: a, b \in \mathbb{Z} \},$$
where $\omega := e^{2\pi i/3}$ is a cube root of unity.
As these examples illustrate, the additive structure of a ring of quadratic integers is that of a two-dimensional lattice in $\mathbb{C}$, which is isomorphic as an additive group to $\mathbb{Z}^2$. Thus, from an additive viewpoint, one can view quadratic integers as "two-dimensional" analogues of rational integers. From a multiplicative viewpoint, however, the quadratic integers (and more generally, integers in a number field) behave very similarly to the rational integers (as opposed to being some sort of "higher-dimensional" version of such integers). Indeed, a large part of basic algebraic number theory is devoted to treating the multiplicative theory of integers in number fields in a unified fashion, one that naturally generalises the classical multiplicative theory of the rational integers.
For instance, every rational integer $n$ has an absolute value $|n|$, with the multiplicativity property $|nm| = |n| |m|$ for $n, m \in \mathbb{Z}$, and the positivity property $|n| \geq 1$ for all $n \neq 0$. Among other things, the absolute value detects units: $|n| = 1$ if and only if $n$ is a unit in $\mathbb{Z}$ (that is to say, it is multiplicatively invertible in $\mathbb{Z}$). Similarly, in any ring $\mathcal{O}$ of quadratic integers with negative discriminant, we can assign a norm $N(n)$ to any quadratic integer $n \in \mathcal{O}$ by the formula
$$N(n) := n \overline{n},$$
where $\overline{n}$ is the complex conjugate of $n$. (When working with other number fields than quadratic fields of negative discriminant, one instead defines $N(n)$ to be the product of all the Galois conjugates of $n$.) Thus for instance, when $\mathcal{O} = \mathbb{Z}[\sqrt{-d}]$ one has
$$N(a + b \sqrt{-d}) = a^2 + d b^2.$$
Analogously to the rational integers, we have the multiplicativity property $N(nm) = N(n) N(m)$ for $n, m \in \mathcal{O}$ and the positivity property $N(n) \geq 1$ for $n \neq 0$, and the units in $\mathcal{O}$ are precisely the elements of norm one.
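These properties of the norm are easy to check numerically; the following sketch (with $d = 5$ as a sample value, and our own helper names) verifies multiplicativity on a small grid of elements of $\mathbb{Z}[\sqrt{-5}]$ and confirms that $\pm 1$ are the only units there:

```python
# A sketch of the norm on Z[sqrt(-d)] (here with d = 5 as a running example),
# representing a + b sqrt(-d) as the pair (a, b).
d = 5

def mult(x, y):
    # (a + b sqrt(-d)) * (c + e sqrt(-d)) = (ac - d*be) + (ae + bc) sqrt(-d)
    a, b = x
    c, e = y
    return (a * c - d * b * e, a * e + b * c)

def norm(x):
    a, b = x
    return a * a + d * b * b   # N(a + b sqrt(-d)) = a^2 + d b^2

# multiplicativity N(nm) = N(n) N(m) on a grid of elements
for a in range(-3, 4):
    for b in range(-3, 4):
        for c in range(-3, 4):
            for e in range(-3, 4):
                x, y = (a, b), (c, e)
                assert norm(mult(x, y)) == norm(x) * norm(y)

# the only elements of norm one (hence the only units) are +1 and -1
units = [(a, b) for a in range(-6, 7) for b in range(-6, 7) if norm((a, b)) == 1]
assert sorted(units) == [(-1, 0), (1, 0)]
```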
For the rational integers, we of course have the fundamental theorem of arithmetic, which asserts that every non-zero rational integer can be uniquely factored (up to permutation and units) as the product of irreducible integers, that is to say non-zero, non-unit integers that cannot be factored into the product of integers of strictly smaller norm. As it turns out, the same claim is true for a few additional rings of quadratic integers, such as the Gaussian integers and Eisenstein integers, but fails in general; for instance, in the ring $\mathbb{Z}[\sqrt{-5}]$, we have the famous counterexample
$$6 = 2 \cdot 3 = (1+\sqrt{-5})(1-\sqrt{-5})$$
that decomposes non-uniquely into the product of irreducibles in $\mathbb{Z}[\sqrt{-5}]$. Nevertheless, it is an important fact that the fundamental theorem of arithmetic can be salvaged if one uses an "idealised" notion of a number in a ring of integers $\mathcal{O}$, now known in modern language as an ideal of that ring. For instance, in $\mathbb{Z}[\sqrt{-5}]$, the principal ideal $(6)$ turns out to uniquely factor into the product of (non-principal) ideals $(2, 1+\sqrt{-5})$, $(2, 1-\sqrt{-5})$, $(3, 1+\sqrt{-5})$, $(3, 1-\sqrt{-5})$; see Exercise 27. We will review the basic theory of ideals in number fields (focusing primarily on quadratic fields of negative discriminant) below the fold.
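The failure of unique factorisation in $\mathbb{Z}[\sqrt{-5}]$ can be checked by brute force; the sketch below confirms the identity $6 = 2 \cdot 3 = (1+\sqrt{-5})(1-\sqrt{-5})$ and verifies that no element of $\mathbb{Z}[\sqrt{-5}]$ has norm $2$ or $3$, so that all four factors (of norms $4, 9, 6, 6$) are irreducible:

```python
# Checking the failure of unique factorisation in Z[sqrt(-5)].
def norm(a, b):
    # N(a + b sqrt(-5)) = a^2 + 5 b^2
    return a * a + 5 * b * b

# (1 + sqrt(-5)) * (1 - sqrt(-5)) = (1*1 - 5*1*(-1)) + (1*(-1) + 1*1) sqrt(-5) = 6
assert (1 * 1 - 5 * 1 * (-1), 1 * (-1) + 1 * 1) == (6, 0)

# norms of the four factors: N(2) = 4, N(3) = 9, N(1 +- sqrt(-5)) = 6
assert norm(2, 0) == 4 and norm(3, 0) == 9 and norm(1, 1) == 6 and norm(1, -1) == 6

# a proper factorisation of any of these would need a factor of norm 2 or 3,
# but no element of Z[sqrt(-5)] has such a norm
small_norms = {norm(a, b) for a in range(-4, 5) for b in range(-2, 3)}
assert 2 not in small_norms and 3 not in small_norms
```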
The additive structure of a ring of quadratic integers (or an ideal thereof), together with its norm, can be described by a binary quadratic form
$$Q(x,y) = a x^2 + b x y + c y^2$$
for some integer coefficients $a, b, c$. One can declare two quadratic forms $Q, Q'$ to be equivalent if one can transform one to the other by an invertible linear transformation $T: \mathbb{Z}^2 \rightarrow \mathbb{Z}^2$, so that $Q' = Q \circ T$. For example, the quadratic forms $x^2 + y^2$ and $x^2 + 2xy + 2y^2$ are equivalent, as can be seen by using the invertible linear transformation $(x,y) \mapsto (x+y, y)$. Such equivalences correspond to the different choices of basis available when expressing a ring such as $\mathcal{O}_{\mathbb{Q}(\sqrt{-d})}$ (or an ideal thereof) additively as a copy of $\mathbb{Z}^2$.
There is an important and classical invariant of a quadratic form $a x^2 + b x y + c y^2$, namely the discriminant
$$\Delta := b^2 - 4 a c,$$
which will of course be familiar to most readers via the quadratic formula, which among other things tells us that a quadratic form with positive leading coefficient will be positive definite precisely when its discriminant is negative. It is not difficult (particularly if one exploits the multiplicativity of the determinant of matrices) to show that two equivalent quadratic forms have the same discriminant. Thus for instance any quadratic form equivalent to (1) has discriminant $-4d$, while any quadratic form equivalent to (2) has discriminant $-d$. Thus we see that each ring $\mathcal{O}_{\mathbb{Q}(\sqrt{-d})}$ of quadratic integers is associated with a certain negative discriminant $D$, defined to equal $-4d$ when $d = 1, 2 \pmod 4$ and $-d$ when $d = 3 \pmod 4$.
Exercise 6 (Geometric interpretation of discriminant) Let $Q(x,y) = a x^2 + b x y + c y^2$ be a quadratic form of negative discriminant $D$, and extend it to a real form $Q: \mathbb{R}^2 \rightarrow \mathbb{R}$ in the obvious fashion. Show that for any $X > 0$, the set $\{ (x,y) \in \mathbb{R}^2: Q(x,y) \leq X \}$ is an ellipse of area $\frac{2 \pi X}{\sqrt{|D|}}$.
It is natural to ask the converse question: if two quadratic forms have the same discriminant, are they necessarily equivalent? For certain choices of discriminant, this is the case:
Exercise 7 Show that any quadratic form of discriminant $-4$ is equivalent to the form $x^2 + y^2$, and any quadratic form of discriminant $-3$ is equivalent to $x^2 + xy + y^2$. (Hint: use elementary transformations to try to make the middle coefficient $b$ as small as possible, to the point where one only has to check a finite number of cases; this argument is due to Legendre.) More generally, show that for any negative discriminant $D$, there are only finitely many quadratic forms of that discriminant up to equivalence (a result first established by Gauss).
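The reduction argument in the hint can be carried out mechanically; here is a minimal sketch (our own implementation of the standard reduction steps, not the solution to the exercise), which reduces sample positive definite forms of discriminant $-4$ and $-3$ to the model forms $x^2+y^2$ and $x^2+xy+y^2$:

```python
def reduce_form(a, b, c):
    # Reduce a positive definite form a x^2 + b xy + c y^2 using the two
    # elementary transformations (x, y) -> (x + t y, y) and (x, y) -> (-y, x).
    while True:
        t = (a - b) // (2 * a)              # shift b into the range (-a, a]
        b, c = b + 2 * t * a, a * t * t + b * t + c
        if a > c or (a == c and b < 0):
            a, b, c = c, -b, a              # swap the outer coefficients
        else:
            return (a, b, c)

disc = lambda a, b, c: b * b - 4 * a * c

# forms of discriminant -4 reduce to x^2 + y^2 ...
assert disc(2, 2, 1) == -4
assert reduce_form(2, 2, 1) == (1, 0, 1)

# ... and forms of discriminant -3 reduce to x^2 + xy + y^2
assert disc(7, 5, 1) == -3
assert reduce_form(7, 5, 1) == (1, 1, 1)
```

Both transformations are invertible over $\mathbb{Z}$, so each step preserves the equivalence class (and, as one can check directly, the discriminant).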
Unfortunately, for most choices of discriminant, the converse question fails; for instance, the quadratic forms $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$ both have discriminant $-20$, but are not equivalent (Exercise 38). This particular failure of equivalence turns out to be intimately related to the failure of unique factorisation in the ring $\mathbb{Z}[\sqrt{-5}]$.
It turns out that there is a fundamental connection between quadratic fields, equivalence classes of quadratic forms of a given discriminant, and real Dirichlet characters, thus connecting the material discussed above with the last section of the previous set of notes. Here is a typical instance of this connection:
Proposition 8 Let $\chi_4$ be the non-principal Dirichlet character modulo $4$ (thus $\chi_4(n)$ equals $+1$ when $n = 1 \pmod 4$, $-1$ when $n = 3 \pmod 4$, and $0$ when $n$ is even).
- (i) For any natural number $n$, the number of Gaussian integers $m \in \mathbb{Z}[i]$ with norm $N(m) = n$ is equal to $4 (1 * \chi_4)(n)$. Equivalently, the number of solutions to the equation $n = a^2 + b^2$ with $a, b \in \mathbb{Z}$ is $4 (1 * \chi_4)(n)$. (Here, as in the previous post, the symbol $*$ denotes Dirichlet convolution.)
- (ii) For any natural number $n$, the number of Gaussian integers $m \in \mathbb{Z}[i]$ that divide $n$ (thus $n = m m'$ for some $m' \in \mathbb{Z}[i]$) can similarly be expressed as a Dirichlet convolution involving $\chi_4$.
We will prove this proposition later in these notes. We observe that as a special case of part (i) of this proposition, we recover the Fermat two-square theorem: an odd prime $p$ is expressible as the sum of two squares if and only if $p = 1 \pmod 4$. This proposition should also be compared with the fact, used crucially in the previous post to prove Dirichlet's theorem, that $(1 * \chi)(n)$ is non-negative for any $n$, and at least one when $n$ is a square, for any quadratic character $\chi$.
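The two-square count in part (i) is easy to test numerically; the sketch below (with our own helper functions) checks the divisor-sum formula $r_2(n) = 4 (1 * \chi_4)(n)$ for all $n < 200$, together with the two-square theorem for a few small primes:

```python
import math

def chi4(d):
    # the non-principal character mod 4
    return {0: 0, 1: 1, 2: 0, 3: -1}[d % 4]

def r2(n):
    # count all integer solutions (a, b) to a^2 + b^2 = n, with signs
    s = 0
    for a in range(0, math.isqrt(n) + 1):
        b2 = n - a * a
        b = math.isqrt(b2)
        if b * b == b2:
            s += (2 if a else 1) * (2 if b else 1)
    return s

for n in range(1, 200):
    assert r2(n) == 4 * sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

# an odd prime is a sum of two squares iff it is 1 mod 4
assert r2(5) > 0 and r2(13) > 0 and r2(7) == 0 and r2(11) == 0
```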
As an illustration of the relevance of such connections to analytic number theory, let us now explicitly compute $L(1,\chi_4) := \sum_{n=1}^\infty \frac{\chi_4(n)}{n}$.
Corollary 9 We have $L(1,\chi_4) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots = \frac{\pi}{4}$. This particular identity is also known as the Leibniz formula.
Proof: For a large number $x$, consider the quantity
$$S(x) := \# \{ m \in \mathbb{Z}[i]: N(m) \leq x \}$$
of all the Gaussian integers of norm at most $x$. On the one hand, this is the same as the number of lattice points of $\mathbb{Z}^2$ in the disk of radius $\sqrt{x}$ centred at the origin. Placing a unit square centred at each such lattice point, we obtain a region which differs from the disk by a region contained in an annulus of area $O(\sqrt{x})$. As the area of the disk is $\pi x$, we conclude the Gauss bound
$$S(x) = \pi x + O(\sqrt{x}).$$
On the other hand, by Proposition 8(i) (and removing the $m = 0$ contribution), we see that
$$S(x) = 1 + 4 \sum_{n \leq x} (1 * \chi_4)(n).$$
Now we use the Dirichlet hyperbola method to expand the right-hand side sum, first expressing
$$\sum_{n \leq x} (1 * \chi_4)(n) = \sum_{d \leq \sqrt{x}} \chi_4(d) \sum_{m \leq x/d} 1 + \sum_{m \leq \sqrt{x}} \sum_{\sqrt{x} < d \leq x/m} \chi_4(d),$$
and then using the bounds $\sum_{m \leq y} 1 = y + O(1)$, $\sum_{d \leq y} \chi_4(d) = O(1)$, and $\sum_{d \leq y} \frac{\chi_4(d)}{d} = L(1,\chi_4) + O(1/y)$ from the previous set of notes to conclude that
$$\sum_{n \leq x} (1 * \chi_4)(n) = L(1,\chi_4) x + O(\sqrt{x}).$$
Comparing the two formulae for $S(x)$ and sending $x \rightarrow \infty$, we obtain the claim. $\Box$
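Both halves of this argument can be illustrated numerically; the following sketch checks the Gauss lattice-point bound for the disk and the convergence of the resulting Leibniz series (illustration only, with a generous error margin in place of the $O(\cdot)$ constants):

```python
import math

def lattice_count(x):
    # number of Gaussian integers a + bi with a^2 + b^2 <= x
    count = 0
    m = math.isqrt(x)
    for a in range(-m, m + 1):
        count += 2 * math.isqrt(x - a * a) + 1   # b ranges over |b| <= sqrt(x - a^2)
    return count

x = 10 ** 6
# Gauss bound: the count is pi*x up to an O(sqrt(x)) boundary error
assert abs(lattice_count(x) - math.pi * x) < 10 * math.sqrt(x)

# partial sums of the Leibniz series 1 - 1/3 + 1/5 - ... converge to pi/4
partial = sum((-1) ** k / (2 * k + 1) for k in range(10 ** 5))
assert abs(partial - math.pi / 4) < 1e-4
```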
Exercise 10 Give an alternate proof of Corollary 9 that relies on obtaining asymptotics for the Dirichlet series $\sum_{n=1}^\infty \frac{(1 * \chi_4)(n)}{n^s}$ as $s \rightarrow 1^+$, rather than using the Dirichlet hyperbola method.
Exercise 11 Give a direct proof of Corollary 9 that does not use Proposition 8, instead using Taylor expansion of the complex logarithm $\log(1+z)$. (One can also use Taylor expansions of some other functions related to the complex logarithm here, such as the arctangent function.)
More generally, one can relate $L(1,\chi)$ for a real Dirichlet character $\chi$ with the number of inequivalent quadratic forms of a certain discriminant, via the famous class number formula; we will give a special case of this formula below the fold.
The material here is only a very rudimentary introduction to algebraic number theory, and is not essential to the rest of the course. A slightly expanded version of the material here, from the perspective of analytic number theory, may be found in Sections 5 and 6 of Davenport’s book. A more in-depth treatment of algebraic number theory may be found in a number of texts, e.g. Fröhlich and Taylor.
In analytic number theory, an arithmetic function is simply a function $f: \mathbb{N} \rightarrow \mathbb{C}$ from the natural numbers to the real or complex numbers. (One occasionally also considers arithmetic functions taking values in more general rings than $\mathbb{R}$ or $\mathbb{C}$, as in this previous blog post, but we will restrict attention here to the classical situation of real or complex arithmetic functions.) Experience has shown that a particularly tractable and relevant class of arithmetic functions for analytic number theory are the multiplicative functions, which are arithmetic functions $f$ with the additional property that
$$f(nm) = f(n) f(m)$$
whenever $n, m$ are coprime. (One also considers arithmetic functions, such as the logarithm function or the von Mangoldt function $\Lambda$, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as "derived" versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function
$$\tau(n) := \sum_{d | n} 1$$
that counts the number of divisors of a natural number $n$. (The divisor function is also denoted $d(n)$ in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.
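Multiplicativity of the divisor function is easy to verify by brute force (a small sketch with our own helper; note that the coprimality hypothesis is genuinely needed):

```python
from math import gcd

def tau(n):
    # tau(n) = number of divisors of n (naive O(n) computation)
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# tau(nm) = tau(n) tau(m) whenever gcd(n, m) = 1
for n in range(1, 60):
    for m in range(1, 60):
        if gcd(n, m) == 1:
            assert tau(n * m) == tau(n) * tau(m)

# but not in general: tau(4) = 3 while tau(2) * tau(2) = 4
assert tau(4) == 3 and tau(2) * tau(2) == 4
```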
There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions $f$. In elementary multiplicative number theory, which is the focus of this set of notes, particular emphasis is given to the following two statistics of a given arithmetic function $f$:
- The summatory functions
$$\sum_{n \leq x} f(n)$$
of an arithmetic function $f$, as well as the associated natural density
$$\lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} f(n)$$
(if it exists).
- The logarithmic sums
$$\sum_{n \leq x} \frac{f(n)}{n}$$
of an arithmetic function $f$, as well as the associated logarithmic density
$$\lim_{x \rightarrow \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}$$
(if it exists).
A classical case of interest is when $f = 1_A$ is an indicator function of some set $A$ of natural numbers, in which case we also refer to the natural or logarithmic density of $f$ as the natural or logarithmic density of $A$ respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function $1_{\mathcal{P}}$ of the primes $\mathcal{P}$ is often replaced with the von Mangoldt function $\Lambda$.
Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 7 below.
If an arithmetic function $f$ is multiplicative (or closely related to a multiplicative function), then there is an important further statistic on an arithmetic function beyond the summatory function and the logarithmic sum, namely the Dirichlet series
$$\mathcal{D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}$$
for various real or complex numbers $s$. Under the hypothesis (3), this series is absolutely convergent for real numbers $s > 1$, or more generally for complex numbers $s$ with $\operatorname{Re}(s) > 1$. As we will see below the fold, when $f$ is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.
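For instance, for the constant function $f = 1$ at $s = 2$, the Euler product factorisation $\sum_n n^{-s} = \prod_p (1 - p^{-s})^{-1}$ can be checked numerically against $\zeta(2) = \pi^2/6$ (a rough truncation of both sides, not a rigorous computation):

```python
import math

def primes_up_to(n):
    # simple sieve of Eratosthenes
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

s = 2.0
series = sum(1 / n ** s for n in range(1, 10 ** 6))   # truncated Dirichlet series
product = 1.0
for p in primes_up_to(1000):                          # truncated Euler product
    product *= 1 / (1 - p ** (-s))

# both truncations approximate zeta(2) = pi^2 / 6
assert abs(series - math.pi ** 2 / 6) < 1e-5
assert abs(product - math.pi ** 2 / 6) < 1e-2
```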
In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers $s > 1$ (focusing particularly on the asymptotic behaviour as $s \rightarrow 1^+$); in later notes we will focus instead on the important complex-analytic approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.
Remark 1 The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more "Fourier-analytic" approach, in which one studies convolution-type statistics such as the smoothed sums
$$\sum_n f(n) \psi(\log n - u)$$
as $u \rightarrow \infty$ for various cutoff functions $\psi: \mathbb{R} \rightarrow \mathbb{C}$, such as smooth, compactly supported functions. See for instance this previous blog post for an instance of such an approach. Another related approach is the "pretentious" approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.
To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow the argument $s$ of the Dirichlet series to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get upper bounds on such sums, known as Rankin's trick, which we will discuss later in these notes.
The basic strategy of elementary multiplicative number theory is to first gather useful estimates on the statistics of "smooth" or "non-oscillatory" functions, such as the constant function $n \mapsto 1$, the harmonic function $n \mapsto \frac{1}{n}$, or the logarithm function $n \mapsto \log n$; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function $\Lambda$ that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.
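A minimal sketch of Dirichlet convolution (with our own helper names), verifying the standard identities $1 * 1 = \tau$ and $\mu * 1 = \delta$ (the unit at $n = 1$):

```python
def dirichlet(f, g, n):
    # Dirichlet convolution: (f * g)(n) = sum over divisors d of n of f(d) g(n/d)
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

def mu(n):
    # Mobius function via trial factorisation
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0            # n is divisible by p^2
            result = -result
        p += 1
    return -result if n > 1 else result

one = lambda n: 1
tau = lambda n: sum(1 for d in range(1, n + 1) if n % d == 0)

for n in range(1, 200):
    assert dirichlet(one, one, n) == tau(n)                 # 1 * 1 = tau
    assert dirichlet(mu, one, n) == (1 if n == 1 else 0)    # mu * 1 = delta
```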
Many problems and results in analytic prime number theory can be formulated in the following general form: given a collection of (affine-)linear forms $L_1(n), \dots, L_k(n)$, none of which is a multiple of any other, find a number $n$ such that a certain property $P(L_1(n), \dots, L_k(n))$ of the linear forms is true. For instance:
- For the twin prime conjecture, one can use the linear forms $L_1(n) := n$, $L_2(n) := n+2$, and the property $P$ in question is the assertion that $L_1(n)$ and $L_2(n)$ are both prime.
- For the even Goldbach conjecture, the claim is similar but one uses the linear forms $L_1(n) := n$, $L_2(n) := N - n$ for some even integer $N$.
- For Chen's theorem, we use the same linear forms as in the previous two cases, but now $P$ is the assertion that $L_1(n)$ is prime and $L_2(n)$ is an almost prime (in the sense that it has at most two prime factors).
- In the recent results establishing bounded gaps between primes, we use the linear forms $L_i(n) := n + h_i$ for some admissible tuple $(h_1, \dots, h_k)$, and take $P$ to be the assertion that at least two of $n+h_1, \dots, n+h_k$ are prime.
For these sorts of results, one can try a sieve-theoretic approach, which can broadly be formulated as follows:
- First, one chooses a carefully selected sieve weight $\nu: \mathbb{N} \rightarrow \mathbb{R}^+$, which could for instance be a non-negative function having a divisor sum form
$$\nu(n) = \sum_{d \leq R: d | L_1(n) \cdots L_k(n)} \lambda_d$$
for some coefficients $\lambda_d$, where $R$ is a natural scale parameter. The precise choice of sieve weight is often quite a delicate matter, but will not be discussed here. (In some cases, one may work with multiple sieve weights $\nu_1, \nu_2, \dots$.)
- Next, one uses tools from analytic number theory (such as the Bombieri-Vinogradov theorem) to obtain upper and lower bounds for sums such as
$$\sum_n \nu(n) \ \ \ \ \ (1)$$
or
$$\sum_n \nu(n) 1_{L_i(n) \text{ prime}} \ \ \ \ \ (2)$$
or more generally of the form
$$\sum_n \nu(n) f(L_i(n)) \ \ \ \ \ (3)$$
where $f$ is some "arithmetic" function involving the prime factorisation of $L_i(n)$ (we will be a bit vague about what this means precisely, but a typical choice of $f$ might be a Dirichlet convolution $\alpha * \beta$ of two other arithmetic functions $\alpha, \beta$).
- Using some combinatorial arguments, one manipulates these upper and lower bounds, together with the non-negative nature of $\nu$, to conclude the existence of an $n$ in the support of $\nu$ (or of at least one of the sieve weights being considered) for which $P(L_1(n), \dots, L_k(n))$ holds.
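As a toy illustration of the divisor-sum weights in the first step above, one can take the classical Legendre sieve weight (a deliberately crude choice of our own; the weights used in the results discussed here are far more delicate, and must be truncated to be usable):

```python
from math import prod

def mobius(n):
    # Mobius function via trial factorisation
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

z_primes = [2, 3, 5]      # sieve by the primes up to z = 5
P = prod(z_primes)        # P(z) = 30

def nu(n):
    # Legendre sieve weight: nu(n) = sum over d | gcd(n, P(z)) of mu(d),
    # which equals the indicator that n has no prime factor <= z
    return sum(mobius(d) for d in range(1, P + 1) if P % d == 0 and n % d == 0)

x = 500
sieved = sum(nu(n) for n in range(1, x + 1))
direct = sum(1 for n in range(1, x + 1) if all(n % p for p in z_primes))
assert sieved == direct   # the divisor sum exactly counts the "rough" n here
```

In practice one truncates the divisors $d$ (losing the exact identity but gaining control of error terms), which is why one only obtains upper and lower bounds on sums like (1), (2), (3).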
For instance, in the recent results on bounded gaps between primes, one selects a sieve weight $\nu$ for which one has upper bounds on
$$\sum_n \nu(n)$$
and lower bounds on
$$\sum_n \nu(n) 1_{n+h_i \text{ prime}}$$
for each $i$, so that one can show that the expression
$$\sum_n \nu(n) \left( \sum_{i=1}^k 1_{n+h_i \text{ prime}} - 1 \right)$$
is strictly positive, which implies the existence of an $n$ in the support of $\nu$ such that at least two of $n+h_1, \dots, n+h_k$ are prime. As another example, to prove Chen's theorem to find $n$ such that $n$ is prime and $n+2$ is almost prime, one uses a variety of sieve weights to produce a lower bound for
$$Q_1 := \sum_{n \leq x: n \text{ prime}} 1_{n+2 \text{ rough}}$$
and an upper bound for
$$Q_2 := \sum_{n \leq x: n \text{ prime}} 1_{n+2 = p_1 p_2 p_3 \text{ for some primes } p_1, p_2, p_3 \geq z},$$
where $z$ is some parameter between $x^{1/4}$ and $x^{1/2}$, and "rough" means that all prime factors are at least $z$. One can observe that if $Q_1 - Q_2 > 0$, then there must be at least one $n$ for which $n$ is prime and $n+2$ is almost prime, since for any rough number $m \leq x$, the quantity
$$1_{m \text{ rough}} - 1_{m = p_1 p_2 p_3 \text{ for some primes } p_1, p_2, p_3 \geq z}$$
is only positive when $m$ is an almost prime (if a rough $m \leq x$ has three or more prime factors, then it must be of the form $p_1 p_2 p_3$ for some primes $p_1, p_2, p_3 \geq z$, as four or more prime factors of size at least $z > x^{1/4}$ would make $m$ larger than $x$). The upper and lower bounds on $Q_1, Q_2$ are ultimately produced via asymptotics for expressions of the form (1), (2), (3) for various divisor sums $\nu$ and various arithmetic functions $f$.
Unfortunately, there is an obstruction to sieve-theoretic techniques working for certain types of properties $P$, which Zeb Brady and I recently formalised at an AIM workshop this week. To state the result, we recall the Liouville function $\lambda(n)$, defined by setting $\lambda(n) := (-1)^j$ whenever $n$ is the product of exactly $j$ primes (counting multiplicity). Define a sign pattern to be an element $(\epsilon_1, \dots, \epsilon_k)$ of the discrete cube $\{-1,+1\}^k$. Given a property $P$ of natural numbers $n_1, \dots, n_k$, we say that a sign pattern $(\epsilon_1, \dots, \epsilon_k)$ is forbidden by $P$ if there do not exist any natural numbers $n_1, \dots, n_k$ obeying $P(n_1, \dots, n_k)$ for which
$$(\lambda(n_1), \dots, \lambda(n_k)) = (\epsilon_1, \dots, \epsilon_k).$$
Example 1 Let $P(n_1, n_2, n_3)$ be the property that at least two of $n_1, n_2, n_3$ are prime. Then the sign patterns $(+1,+1,+1)$, $(+1,+1,-1)$, $(+1,-1,+1)$, $(-1,+1,+1)$ are forbidden, because prime numbers have a Liouville function of $-1$, so that $P(n_1, n_2, n_3)$ can only occur when at least two of $\lambda(n_1), \lambda(n_2), \lambda(n_3)$ are equal to $-1$.
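This example can be confirmed by brute force; the sketch below computes the Liouville function directly and checks that, whenever at least two of $n_1, n_2, n_3$ are prime, the realised sign pattern has at most one $+1$ entry (so the four patterns above never occur):

```python
def liouville(n):
    # lambda(n) = (-1)^Omega(n), with Omega counting prime factors with multiplicity
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    if n > 1:
        count += 1
    return (-1) ** count

def is_prime(n):
    return n > 1 and all(n % p for p in range(2, int(n ** 0.5) + 1))

N = 40
for n1 in range(2, N):
    for n2 in range(2, N):
        for n3 in range(2, N):
            if sum(map(is_prime, (n1, n2, n3))) >= 2:
                pattern = tuple(map(liouville, (n1, n2, n3)))
                # primes force lambda = -1, so at most one +1 can appear
                assert pattern.count(1) <= 1
```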
We then have a parity obstruction as soon as $P$ has "too many" forbidden sign patterns, in the following (slightly informal) sense:
Claim 1 (Parity obstruction) Suppose $P$ is such that the convex hull of the forbidden sign patterns of $P$ contains the origin. Then one cannot use the above sieve-theoretic approach to establish the existence of an $n$ such that $P(L_1(n), \dots, L_k(n))$ holds.
Thus for instance, the property in Example 3 is subject to the parity obstruction, since the origin is a convex combination of two of its forbidden sign patterns, whereas the properties in Examples 1, 2 are not. One can also check that the property "at least $m$ of the numbers $n_1, \dots, n_k$ are prime" is subject to the parity obstruction as soon as $m \geq \frac{k}{2} + 1$. Thus, the largest number of elements of a $k$-tuple that one can hope to force to be prime by purely sieve-theoretic methods is $k/2$, rounded up.
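The threshold here can be checked mechanically for small $k$: for the permutation-symmetric property "at least $m$ of $k$ are prime", the forbidden patterns are those with at least $k-m+1$ entries equal to $+1$, and (since this set is permutation-symmetric and contains the all-$(+1)$ pattern) the origin lies in its convex hull exactly when some forbidden pattern has non-positive coordinate sum — average a pattern over permutations, then mix the resulting multiples of $(1,\dots,1)$. A small sketch of this criterion:

```python
from itertools import product

# For "at least m of n_1, ..., n_k are prime", a pattern is forbidden iff it
# has at least k - m + 1 entries equal to +1 (too few -1s to host m primes).
for k in range(2, 9):
    for m in range(1, k + 1):
        forbidden = [eps for eps in product((-1, 1), repeat=k)
                     if eps.count(1) >= k - m + 1]
        # by permutation-averaging, the origin is in the convex hull iff some
        # forbidden pattern has non-positive coordinate sum
        obstructed = min(sum(eps) for eps in forbidden) <= 0
        assert obstructed == (m >= k / 2 + 1)
```

The minimal forbidden sum is $(k-m+1) - (m-1) = k - 2m + 2$, which is non-positive exactly when $m \geq k/2 + 1$, matching the "$k/2$, rounded up" threshold stated above.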
This claim is not precisely a theorem, because it presumes a certain "Liouville pseudorandomness conjecture" (a very close cousin of the more well known "Möbius pseudorandomness conjecture") which is a bit difficult to formalise precisely. However, this conjecture is widely believed by analytic number theorists; see e.g. this blog post for a discussion. (Note though that there are scenarios, most notably the "Siegel zero" scenario, in which there is a severe breakdown of this pseudorandomness conjecture, and the parity obstruction then disappears. A typical instance of this is Heath-Brown's proof of the twin prime conjecture (which would ordinarily be subject to the parity obstruction) under the hypothesis of a Siegel zero.) The obstruction also does not prevent the establishment of an $n$ such that $P(L_1(n), \dots, L_k(n))$ holds by introducing additional sieve axioms beyond upper and lower bounds on quantities such as (1), (2), (3). The proof of the Friedlander-Iwaniec theorem is a good example of this latter scenario.
Now we give a (slightly nonrigorous) proof of the claim.
Proof: (Nonrigorous) Suppose that the convex hull of the forbidden sign patterns contains the origin. Then we can find non-negative numbers $c_\epsilon$ for sign patterns $\epsilon = (\epsilon_1, \dots, \epsilon_k)$, which sum to $1$, are non-zero only for forbidden sign patterns, and which have mean zero in the sense that
$$\sum_\epsilon c_\epsilon \epsilon_i = 0$$
for all $i = 1, \dots, k$. By Fourier expansion (or Lagrange interpolation), one can then write $c_\epsilon$ as a polynomial
$$c_\epsilon = \frac{1}{2^k} \left( 1 + Q(\epsilon_1, \dots, \epsilon_k) \right)$$
where $Q$ is a polynomial in $k$ variables that is a linear combination of monomials $\epsilon_{i_1} \cdots \epsilon_{i_r}$ with $r \geq 2$ and $i_1 < \dots < i_r$ (thus $Q$ has no constant or linear terms, and no monomials with repeated terms). The point is that the mean zero condition allows one to eliminate the linear terms. If we now consider the weight function
$$w(n_1, \dots, n_k) := 2^k c_{(\lambda(n_1), \dots, \lambda(n_k))} = 1 + Q(\lambda(n_1), \dots, \lambda(n_k)),$$
then $w$ is non-negative, is supported solely on those tuples $(n_1, \dots, n_k)$ for which $(\lambda(n_1), \dots, \lambda(n_k))$ is a forbidden pattern, and is equal to $1$ plus a linear combination of monomials $\lambda(n_{i_1}) \cdots \lambda(n_{i_r})$ with $r \geq 2$.
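The Fourier expansion step can be illustrated on a tiny example (a sketch with our own helper names): for $k = 2$, placing weight $1/2$ on each of the patterns $(+1,-1)$ and $(-1,+1)$ gives a mean-zero system whose expansion is $c_\epsilon = \frac{1}{4}(1 - \epsilon_1 \epsilon_2)$, with constant term $2^{-k}$, vanishing linear terms, and a single quadratic monomial:

```python
from itertools import product, combinations

k = 2
c = {(1, -1): 0.5, (-1, 1): 0.5, (1, 1): 0.0, (-1, -1): 0.0}
patterns = list(product((-1, 1), repeat=k))
subsets = [S for r in range(k + 1) for S in combinations(range(k), r)]

def character(eps, S):
    out = 1
    for i in S:
        out *= eps[i]
    return out

# Fourier coefficients on the cube: c-hat(S) = 2^{-k} sum_eps c(eps) prod_{i in S} eps_i
coeff = {S: sum(c[eps] * character(eps, S) for eps in patterns) / 2 ** k
         for S in subsets}

assert abs(coeff[()] - 0.25) < 1e-12                       # constant term 2^{-k}
assert abs(coeff[(0,)]) < 1e-12 and abs(coeff[(1,)]) < 1e-12   # no linear terms
assert abs(coeff[(0, 1)] + 0.25) < 1e-12                   # quadratic monomial

# the expansion reconstructs c, i.e. c_eps = (1/4)(1 - eps_1 eps_2)
for eps in patterns:
    recon = sum(coeff[S] * character(eps, S) for S in subsets)
    assert abs(recon - c[eps]) < 1e-12
```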
The Liouville pseudorandomness principle then predicts that sums of the form
$$\sum_n \nu(n) \lambda(L_{i_1}(n)) \cdots \lambda(L_{i_r}(n))$$
or more generally
$$\sum_n \nu(n) f(L_i(n)) \lambda(L_{i_1}(n)) \cdots \lambda(L_{i_r}(n))$$
should be asymptotically negligible; intuitively, the point here is that the prime factorisation of $L_i(n)$ should not influence the Liouville function of $L_{i_1}(n), \dots, L_{i_r}(n)$, even on the short arithmetic progressions that the divisor sum $\nu$ is built out of, and so any monomial occurring in $Q$ should exhibit strong cancellation in any of the above sums. If one accepts this principle, then all the expressions (1), (2), (3) should be essentially unchanged when $\nu(n)$ is replaced by $\nu(n) w(L_1(n), \dots, L_k(n))$.
Suppose now for sake of contradiction that one could use sieve-theoretic methods to locate an $n$ in the support of some sieve weight $\nu$ obeying $P(L_1(n), \dots, L_k(n))$. Then, by reweighting all sieve weights by the additional multiplicative factor of $w(L_1(n), \dots, L_k(n))$, the same arguments should also be able to locate an $n$ in the support of $\nu w$ for which $P(L_1(n), \dots, L_k(n))$ holds. But $w$ is only supported on those $n$ whose Liouville sign pattern $(\lambda(L_1(n)), \dots, \lambda(L_k(n)))$ is forbidden, a contradiction. $\Box$
Claim 1 is sharp in the following sense: if the convex hull of the forbidden sign patterns of $P$ does not contain the origin, then by the Hahn-Banach theorem (in the hyperplane separation form), there exist real coefficients $c_1, \dots, c_k$ such that
$$c_1 \epsilon_1 + \dots + c_k \epsilon_k \leq -c$$
for all forbidden sign patterns $(\epsilon_1, \dots, \epsilon_k)$ and some $c > 0$. On the other hand, from Liouville pseudorandomness one expects that
$$\sum_n \nu(n) \left( c_1 \lambda(L_1(n)) + \dots + c_k \lambda(L_k(n)) \right) = o\Big( \sum_n \nu(n) \Big),$$
and hence for a positive proportion of $n$ in the support of $\nu$, the sign pattern $(\lambda(L_1(n)), \dots, \lambda(L_k(n)))$ is not a forbidden sign pattern. This does not actually imply that $P(L_1(n), \dots, L_k(n))$ holds, but it does not prevent $P$ from holding purely from parity considerations. Thus, we do not expect a parity obstruction of the type in Claim 1 to hold when the convex hull of forbidden sign patterns does not contain the origin.
Example 4 Let $G$ be a graph on the vertices $\{1, \dots, k\}$, and let $P(n_1, \dots, n_k)$ be the property that one can find an edge $\{i,j\}$ of $G$ with $n_i, n_j$ both prime. We claim that this property is subject to the parity problem precisely when $G$ is two-colourable. Indeed, if $G$ is two-colourable, then we can colour $\{1, \dots, k\}$ into two colours (say, red and green) such that all edges in $G$ connect a red vertex to a green vertex. If we then consider the two sign patterns in which all the red vertices have one sign and the green vertices have the opposite sign, these are two forbidden sign patterns which contain the origin in their convex hull, and so the parity problem applies. Conversely, suppose that $G$ is not two-colourable; then it contains an odd cycle. Any forbidden sign pattern then must contain more $+1$s on this odd cycle than $-1$s (since otherwise two of the $-1$s would be adjacent on this cycle by the pigeonhole principle, and such a pattern is not forbidden), and so by convexity any tuple in the convex hull of the forbidden sign patterns has a positive sum on this odd cycle. Hence the origin is not in the convex hull, and the parity obstruction does not apply. (See also this previous post for a similar obstruction ultimately coming from two-colourability.)
Example 5 An example of a parity-obstructed property (supplied by Zeb Brady) that does not come from two-colourability: we index the numbers $n_{\{i,j\}}$ by the six pairs $\{i,j\}$ drawn from $\{1,2,3,4\}$, and let $P$ be the property that the $n_{\{i,j\}}$ are prime for some collection of pairs $\{i,j\}$ that cover $\{1,2,3,4\}$. For instance, this property holds if $n_{\{1,2\}}, n_{\{3,4\}}$ are both prime, or if $n_{\{1,2\}}, n_{\{1,3\}}, n_{\{1,4\}}$ are all prime, but not if $n_{\{1,2\}}, n_{\{1,3\}}, n_{\{2,3\}}$ are the only primes. An example of a forbidden sign pattern is the pattern where $n_{\{1,2\}}, n_{\{1,3\}}, n_{\{2,3\}}$ are given the sign $-1$, and the other three pairs are given $+1$. Averaging over permutations of $\{1,2,3,4\}$ we see that zero lies in the convex hull, and so this example is blocked by parity. However, there is no sign pattern such that it and its negation are both forbidden, which is another formulation of two-colourability.
Of course, the absence of a parity obstruction does not automatically mean that the desired claim is true. For instance, given an admissible $k$-tuple $(h_1, \dots, h_k)$, parity obstructions do not prevent one from establishing the existence of infinitely many $n$ such that at least three of $n+h_1, \dots, n+h_k$ are prime; however, we are not yet able to actually establish this, even assuming strong sieve-theoretic hypotheses such as the generalised Elliott-Halberstam hypothesis. (However, the argument giving (4) does easily give the far weaker claim that there exist infinitely many $n$ such that at least three of $n+h_1, \dots, n+h_k$ have a Liouville function of $-1$.)
Remark 1 Another way to get past the parity problem in some cases is to take advantage of linear forms that are constant multiples of each other (which correlates the Liouville functions to each other). For instance, on GEH we can find two numbers, each a product of exactly three primes, that differ by exactly a fixed constant; a direct sieve approach using two linear forms fails due to the parity obstruction, but instead one can first find $n$ such that two of the linear forms $n+h_1, \dots, n+h_k$ are prime, and then among the pairs of linear forms $c(n+h_i), c(n+h_j)$ for a suitable constant $c$, one can find a pair of products of exactly three primes that differ by exactly $c(h_j - h_i)$. See this paper of Goldston, Graham, Pintz, and Yildirim for more examples of this type.
I thank John Friedlander and Sid Graham for helpful discussions and encouragement.
In the winter quarter (starting January 5) I will be teaching a graduate topics course entitled "An introduction to analytic prime number theory". As the name suggests, this is a course covering many of the analytic number theory techniques used to study the distribution of the prime numbers $2, 3, 5, 7, 11, \dots$. I will list the topics I intend to cover in this course below the fold. As with my previous courses, I will place lecture notes online on my blog in advance of the physical lectures.
The type of results about primes that one aspires to prove here is well captured by Landau’s classical list of problems:
- Even Goldbach conjecture: every even number greater than two is expressible as the sum of two primes.
- Twin prime conjecture: there are infinitely many pairs $p, p+2$ which are simultaneously prime.
- Legendre's conjecture: for every natural number $n$, there is a prime between $n^2$ and $(n+1)^2$.
- There are infinitely many primes of the form $n^2 + 1$.
All four of Landau’s problems remain open, but we have convincing heuristic evidence that they are all true, and in each of the four cases we have some highly non-trivial partial results, some of which will be covered in this course. We also now have some understanding of the barriers we are facing to fully resolving each of these problems, such as the parity problem; this will also be discussed in the course.
One of the main reasons that the prime numbers are so difficult to deal with rigorously is that they have very little usable algebraic or geometric structure that we know how to exploit; for instance, we do not have any useful prime generating functions. One of course can create non-useful functions of this form, such as the ordered parameterisation $n \mapsto p_n$ that maps each natural number $n$ to the $n$-th prime $p_n$, or one could invoke Matiyasevich's theorem to produce a polynomial of many variables whose only positive values are prime, but these sorts of functions have no usable structure to exploit (for instance, they give no insight into any of the Landau problems listed above; see also Remark 2 below). The various primality tests in the literature, while useful for practical applications (e.g. cryptography) involving primes, have also proven to be of little utility for these sorts of problems; again, see Remark 2. In fact, in order to make plausible heuristic predictions about the primes, it is best to take almost the opposite point of view to the structured viewpoint, using as a starting point the belief that the primes exhibit strong pseudorandomness properties that are largely incompatible with the presence of rigid algebraic or geometric structure. We will discuss such heuristics later in this course.
It may be in the future that some usable structure to the primes (or related objects) will eventually be located (this is for instance one of the motivations in developing a rigorous theory of the “field with one element“, although this theory is far from being fully realised at present). For now, though, analytic and combinatorial methods have proven to be the most effective way forward, as they can often be used even in the near-complete absence of structure.
In this course, we will not discuss combinatorial approaches (such as the deployment of tools from additive combinatorics) in depth, but instead focus on the analytic methods. The basic principles of this approach can be summarised as follows:
- Rather than try to isolate individual primes in ${\mathbb N}$, one works with the set ${\mathcal P} = \{2, 3, 5, 7, \dots\}$ of primes in aggregate, focusing in particular on asymptotic statistics of this set. For instance, rather than try to find a single pair of twin primes, one can focus instead on the count $|\{ n \leq x: n, n+2 \in {\mathcal P} \}|$ of twin primes up to some threshold $x$. Similarly, one can focus on counts such as the number of representations of a given large even number $N$ as the sum of two primes, the number of primes between $n^2$ and $(n+1)^2$, or the number of primes of the form $n^2+1$ up to a threshold $x$, which are the natural counts associated to the other three Landau problems. In all four of Landau’s problems, the basic task is now to obtain a non-trivial lower bound on these counts.
- If one wishes to proceed analytically rather than combinatorially, one should convert all these counts into sums, using the fundamental identity

$|A| = \sum_n 1_A(n)$

(or variants thereof) for the cardinality of subsets $A$ of the natural numbers ${\mathbb N}$, where $1_A$ is the indicator function of $A$ (and $n$ ranges over ${\mathbb N}$). Thus we are now interested in estimating (and particularly in lower bounding) sums such as

$\sum_{n \leq x} 1_{\mathcal P}(n)\, 1_{\mathcal P}(n+2).$
- Once one expresses number-theoretic problems in this fashion, we are naturally led to the more general question of how to accurately estimate (or, less ambitiously, to lower bound or upper bound) sums such as

$\sum_{n \leq x} f(n)$

or more generally bilinear or multilinear sums such as

$\sum_{n \leq x} \sum_{m \leq x} f(n)\, g(m)\, K(n,m)$

for various functions $f, g, K$ of arithmetic interest. (Importantly, one should also generalise to include integrals as well as sums, particularly contour integrals or integrals over the unit circle or real line, but we postpone discussion of these generalisations to later in the course.) Indeed, a huge portion of modern analytic number theory is devoted to precisely this sort of question. In many cases, we can predict an expected main term for such sums, and then the task is to control the error term between the true sum and its expected main term. It is often convenient to normalise the expected main term to be zero or negligible (e.g. by subtracting a suitable constant from $f$), so that one is now trying to show that a sum of signed real numbers (or perhaps complex numbers) is small. In other words, the question becomes one of rigorously establishing a significant amount of cancellation in one’s sums (also referred to as a gain or savings over a benchmark “trivial bound”). Or to phrase it negatively, the task is to rigorously prevent a conspiracy of non-cancellation, caused for instance by two factors in the summand exhibiting an unexpectedly large correlation with each other.
- It is often difficult to discern cancellation (or to prevent conspiracy) directly for a given sum of interest. However, analytic number theory has developed a large number of techniques to relate one sum to another, and then the strategy is to keep transforming the sum into more and more analytically tractable expressions, until one arrives at a sum for which cancellation can be directly exhibited. (Note though that there is often a short-term tradeoff between analytic tractability and algebraic simplicity; in a typical analytic number theory argument, the sums will get expanded and decomposed into many quite messy-looking sub-sums, until at some point one applies some crude estimation to replace these messy sub-sums by tractable ones again.) There are many transformations available, ranging from such basic tools as the triangle inequality, pointwise domination, or the Cauchy-Schwarz inequality to key identities such as multiplicative number theory identities (such as the Vaughan identity and the Heath-Brown identity), Fourier-analytic identities (e.g. Fourier inversion, Poisson summation, or more advanced trace formulae), or complex analytic identities (e.g. the residue theorem, Perron’s formula, or Jensen’s formula). The sheer range of transformations available can be intimidating at first; there is no shortage of transformations and identities in this subject, and if one applies them randomly then one will typically just transform a difficult sum into an even more difficult and intractable expression. However, one can make progress if one is guided by the strategy of isolating and enhancing a desired cancellation (or conspiracy) to the point where it can be easily established (or dispelled), or alternatively to reach the point where no deep cancellation is needed for the application at hand (or equivalently, that no deep conspiracy can disrupt the application).
- One particularly powerful technique (albeit one which, ironically, can be highly “ineffective” in a certain technical sense to be discussed later) is to use one potential conspiracy to defeat another, a technique I refer to as the “dueling conspiracies” method. This technique may be unable to prevent a single strong conspiracy, but it can sometimes be used to prevent two or more such conspiracies from occurring, which is particularly useful if conspiracies come in pairs (e.g. through complex conjugation symmetry, or a functional equation). A related (but more “effective”) strategy is to try to “disperse” a single conspiracy into several distinct conspiracies, which can then be used to defeat each other.
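To make the notion of "cancellation" above concrete, here is a small illustrative computation of my own (not from the text): the partial sums of the Möbius function $\mu$ (the Mertens function) are vastly smaller than the trivial bound $x$, consistent with square-root cancellation heuristics. The choice of $\mu$ and the cutoff are arbitrary:

```python
# Illustration of cancellation in arithmetic sums: the Mertens function
# M(x) = sum_{n <= x} mu(n) is far smaller than the trivial bound x.

def mobius_up_to(n):
    """Compute mu(1..n) by sieving: flip sign at each prime, zero at squares."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for q in range(p, n + 1, p):
                if q > p:
                    is_prime[q] = False
                mu[q] = -mu[q]
            for q in range(p * p, n + 1, p * p):
                mu[q] = 0
    return mu

x = 100_000
mu = mobius_up_to(x)
M = sum(mu[1:])           # Mertens function M(x)
print(M, x ** 0.5)        # |M(x)| is comparable to sqrt(x), far below x
```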
As stated before, the above strategy has not been able to establish any of the four Landau problems as stated. However, these methods can come close to such problems (and we now have some understanding as to why these problems remain out of reach of current methods). For instance, by using these techniques (and a lot of additional effort) one can obtain the following sample partial results in the Landau problems:
- Chen’s theorem: every sufficiently large even number $N$ is expressible as the sum of a prime and an almost prime (the product of at most two primes). The proof proceeds by finding a nontrivial lower bound on $\sum_{n \leq N} 1_{\mathcal P}(n) 1_{\mathcal E}(N-n)$, where ${\mathcal E}$ is the set of almost primes.
- Zhang’s theorem: There exist infinitely many pairs $p_n, p_{n+1}$ of consecutive primes with $p_{n+1} - p_n \leq 7 \times 10^7$. The proof proceeds by giving a positive lower bound on the quantity

$\sum_{x \leq n \leq 2x} \left( \sum_{i=1}^k 1_{\mathcal P}(n + h_i) - 1 \right)$

for large $x$ and certain distinct integers $h_1, \dots, h_k$ between $0$ and $7 \times 10^7$. (The bound $7 \times 10^7$ has since been lowered to $246$.)
- The Baker-Harman-Pintz theorem: for sufficiently large $x$, there is a prime between $x$ and $x + x^{0.525}$. Proven by finding a nontrivial lower bound on $\sum_{x \leq n \leq x + x^{0.525}} 1_{\mathcal P}(n)$.
- The Friedlander-Iwaniec theorem: There are infinitely many primes of the form $a^2 + b^4$. Proven by finding a nontrivial lower bound on $\sum_{a, b \geq 1: a^2 + b^4 \leq x} 1_{\mathcal P}(a^2 + b^4)$.
We will discuss (simpler versions of) several of these results in this course.
Of course, for the above general strategy to have any chance of succeeding, one must at some point use some information about the set ${\mathcal P}$ of primes. As stated previously, usefully structured parametric descriptions of ${\mathcal P}$ do not appear to be available. However, we do have two other fundamental and useful ways to describe ${\mathcal P}$:
- (Sieve theory description) The primes ${\mathcal P}$ consist of those natural numbers greater than one that are not divisible by any smaller prime.
- (Multiplicative number theory description) The primes ${\mathcal P}$ are the multiplicative generators of the natural numbers ${\mathbb N}$: every natural number is uniquely factorisable (up to permutation) into a product of primes (the fundamental theorem of arithmetic).
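Both descriptions translate directly into (inefficient but transparent) algorithms, which makes the contrast concrete. The following sketch is purely illustrative, my own addition rather than anything from the text:

```python
# The two fundamental descriptions of the primes, side by side:
# (1) sieve: numbers > 1 surviving removal of multiples of smaller primes;
# (2) multiplicative: primes as the building blocks of unique factorisation.

def primes_by_sieve(n):
    """Sieve-theoretic description: repeatedly remove multiples of the least survivor."""
    nums = list(range(2, n + 1))
    primes = []
    while nums:
        p = nums[0]
        primes.append(p)
        nums = [m for m in nums if m % p != 0]
    return primes

def factorise(n):
    """Multiplicative description: unique factorisation by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

assert primes_by_sieve(30) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
assert factorise(360) == [2, 2, 2, 3, 3, 5]
# Every n >= 2 factors entirely into primes produced by the sieve:
sieve_primes = set(primes_by_sieve(1000))
assert all(set(factorise(n)) <= sieve_primes for n in range(2, 1001))
```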
The sieve-theoretic description and its variants lead one to a good understanding of the almost primes, which turn out to be excellent tools for controlling the primes themselves, although there are known limitations as to how much information on the primes one can extract from sieve-theoretic methods alone, which we will discuss later in this course. The multiplicative number theory methods lead one (after some complex or Fourier analysis) to the Riemann zeta function (and other L-functions, particularly the Dirichlet L-functions), with the distribution of zeroes (and poles) of these functions playing a particularly decisive role in the multiplicative methods.
Many of our strongest results in analytic prime number theory are ultimately obtained by incorporating some combination of the above two fundamental descriptions of ${\mathcal P}$ (or variants thereof) into the general strategy described above. In contrast, more advanced descriptions of ${\mathcal P}$, such as those coming from the various primality tests available, have (until now, at least) been surprisingly ineffective in practice for attacking problems such as Landau’s problems. One reason for this is that such tests generally involve operations such as exponentiation $a^n$ or the factorial function $n!$, which grow too quickly to be amenable to the analytic techniques discussed above.
To give a simple illustration of these two basic approaches to the primes, let us first give two variants of the usual proof of Euclid’s theorem:

Theorem 1 (Euclid’s theorem) There are infinitely many primes.
Proof: (Multiplicative number theory proof) Suppose for contradiction that there were only finitely many primes $p_1, \dots, p_n$. Then, by the fundamental theorem of arithmetic, every natural number is expressible as the product of (possibly repeated) primes from $p_1, \dots, p_n$. But the natural number $p_1 \dots p_n + 1$ is larger than one, but not divisible by any of the primes $p_1, \dots, p_n$, a contradiction.
(Sieve-theoretic proof) Suppose for contradiction that there were only finitely many primes $p_1, \dots, p_n$. Then, by the Chinese remainder theorem, the set $A$ of natural numbers that are not divisible by any of the $p_1, \dots, p_n$ has density $\prod_{i=1}^n (1 - \frac{1}{p_i})$, that is to say

$\lim_{x \rightarrow \infty} \frac{1}{x} |\{ m \leq x: m \in A \}| = \prod_{i=1}^n \left(1 - \frac{1}{p_i}\right).$

In particular, $A$ has positive density and thus contains an element larger than one. But the least such element is one further prime in addition to $p_1, \dots, p_n$, a contradiction.
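The multiplicative proof is constructive enough to run: starting from any finite list of primes, the Euclid number $p_1 \cdots p_n + 1$ is coprime to every prime on the list, so its smallest prime factor is always new. The following sketch (an illustration of mine, which in fact generates the start of the Euclid-Mullin sequence) makes this concrete:

```python
# Euclid's construction made concrete: p1*...*pn + 1 is not divisible by any
# of p1,...,pn, so each of its prime factors is a "new" prime.

def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

primes = [2]
for _ in range(5):
    q = 1
    for p in primes:
        q *= p
    q += 1                                  # Euclid's number
    assert all(q % p != 0 for p in primes)  # coprime to every known prime
    primes.append(smallest_prime_factor(q))

print(primes)   # each new entry is a prime absent from the earlier list
```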
Remark 1 One can also phrase the proof of Euclid’s theorem in a fashion that largely avoids the use of contradiction; see this previous blog post for more discussion.
Both proofs in fact extend to give a stronger result:

Theorem 2 (Euler’s theorem) The sum $\sum_{p \in {\mathcal P}} \frac{1}{p}$ is divergent.
Proof: (Multiplicative number theory proof) By the fundamental theorem of arithmetic, every natural number is expressible uniquely as the product of primes in increasing order. In particular, we have the identity

$\sum_{n=1}^\infty \frac{1}{n} = \prod_{p \in {\mathcal P}} \left(1 - \frac{1}{p}\right)^{-1}$

(both sides make sense in $[0, +\infty]$ as everything is unsigned). Since the left-hand side is divergent, the right-hand side is as well. But

$\left(1 - \frac{1}{p}\right)^{-1} = \exp\left( \frac{1}{p} + O\left( \frac{1}{p^2} \right) \right)$

and $\sum_{p \in {\mathcal P}} \frac{1}{p^2}$ is convergent, so $\sum_{p \in {\mathcal P}} \frac{1}{p}$ must be divergent.
(Sieve-theoretic proof) Suppose for contradiction that the sum $\sum_{p \in {\mathcal P}} \frac{1}{p}$ is convergent. For each natural number $k$, let $A_k$ be the set of natural numbers not divisible by the first $k$ primes $p_1, \dots, p_k$, and let $A$ be the set of numbers not divisible by any prime in ${\mathcal P}$. As in the previous proof, each $A_k$ has density $\prod_{i=1}^k (1 - \frac{1}{p_i})$. Also, since $\{1, \dots, x\}$ contains at most $\frac{x}{p_j}$ multiples of $p_j$, we have from the union bound that

$\limsup_{x \rightarrow \infty} \frac{1}{x} |\{ m \leq x: m \in A_k \backslash A \}| \leq \sum_{j > k} \frac{1}{p_j}.$
Since $\sum_j \frac{1}{p_j}$ is assumed to be convergent, we conclude that the density of $A_k$ converges to the density of $A$; thus $A$ has density $\prod_{j=1}^\infty (1 - \frac{1}{p_j})$, which is non-zero by the hypothesis that $\sum_j \frac{1}{p_j}$ converges. On the other hand, since the primes are the only numbers greater than one not divisible by smaller primes, $A$ is just $\{1\}$, which has density zero, giving the desired contradiction.
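For a numerical illustration of Euler's theorem (not needed for the proofs above), one can watch the partial sums $\sum_{p \leq x} \frac{1}{p}$ creep upwards like $\log \log x$. The constant $0.2615$ in this sketch of mine is (an approximation to) the Mertens constant, quoted purely for comparison:

```python
# Numerical illustration of Euler's theorem: the partial sums of 1/p keep
# growing, albeit extremely slowly (like log log x plus ~0.2615).

import math

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, ok in enumerate(sieve) if ok]

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    s = sum(1 / p for p in primes_up_to(x))
    print(x, round(s, 4), round(math.log(math.log(x)) + 0.2615, 4))
```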
Remark 2 We have seen how easy it is to prove Euler’s theorem by analytic methods. In contrast, there does not seem to be any known proof of this theorem that proceeds by using any sort of prime-generating formula or a primality test, which is further evidence that such tools are not the most effective way to make progress on problems such as Landau’s problems. (But the weaker theorem of Euclid, Theorem 1, can sometimes be proven by such devices.)
The two proofs of Theorem 2 given above are essentially the same proof, as is hinted at by the geometric series identity

$\left(1 - \frac{1}{p}\right)^{-1} = 1 + \frac{1}{p} + \frac{1}{p^2} + \dots.$
One can also see the Riemann zeta function begin to make an appearance in both proofs. Once one goes beyond Euler’s theorem, though, the sieve-theoretic and multiplicative methods begin to diverge significantly. On one hand, sieve theory can still handle to some extent sets such as the twin primes, despite the lack of multiplicative structure (one simply has to sieve out two residue classes per prime, rather than one); on the other, multiplicative number theory can attain results such as the prime number theorem which purely sieve-theoretic techniques have not been able to establish. The deepest results in analytic number theory will typically require a combination of both sieve-theoretic methods and multiplicative methods in conjunction with the many transforms discussed earlier (and, in many cases, additional inputs from other fields of mathematics such as arithmetic geometry, ergodic theory, or additive combinatorics).
The wave equation is usually expressed in the form

$\partial_{tt} u - \Delta u = 0$

where $u$ is a function of both time $t \in {\mathbb R}$ and space $x \in {\mathbb R}^d$, with $\Delta$ being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain ${\mathbb R}^d$ with some other manifold and replacing the Laplacian $\Delta$ with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on ${\mathbb R}^d$. We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc. One then has a conserved energy

$\int_{{\mathbb R}^d} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

which we can rewrite using integration by parts and the $L^2$ inner product $\langle \cdot, \cdot \rangle$ on ${\mathbb R}^d$ as

$\frac{1}{2} \langle -\Delta u(t), u(t) \rangle + \frac{1}{2} \langle \partial_t u(t), \partial_t u(t) \rangle. \qquad (1)$
A key feature of the wave equation is finite speed of propagation: if, at time $t=0$ (say), the initial position $u(0)$ and initial velocity $\partial_t u(0)$ are both supported in a ball $B(x_0, R)$, then at any later time $t > 0$, the position $u(t)$ and velocity $\partial_t u(t)$ are supported in the larger ball $B(x_0, R+t)$. This can be seen for instance (formally, at least) by inspecting the exterior energy

$\int_{|x - x_0| > R + t} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{2} |\partial_t u(t,x)|^2\ dx$

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time $t = 0$.
The wave equation is second order in time, but one can turn it into a first order system by working with the pair $(u, v)$ rather than just the single field $u$, where $v := \partial_t u$ is the velocity field. The system is then

$\partial_t u = v$
$\partial_t v = \Delta u.$

Finite speed of propagation then tells us that if $u(0), v(0)$ are both supported on $B(x_0, R)$, then $u(t), v(t)$ are supported on $B(x_0, R + |t|)$ for all $t$. One also has time reversal symmetry: if $t \mapsto (u(t), v(t))$ is a solution, then $t \mapsto (u(-t), -v(-t))$ is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times using this symmetry.
If one has an eigenfunction

$-\Delta \phi = \lambda^2 \phi$

of the Laplacian, then we have the explicit solutions

$u(t) = \cos(\lambda t)\, \phi; \quad u(t) = \frac{\sin(\lambda t)}{\lambda}\, \phi$

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.
When one has vanishing initial velocity $\partial_t u(0) = 0$, the solution is given via functional calculus by

$u(t) = \cos(t \sqrt{-\Delta})\, u(0)$

and the propagator $\cos(t \sqrt{-\Delta})$ can be expressed as the average of half-wave operators:

$\cos(t \sqrt{-\Delta}) = \frac{1}{2} \left( e^{i t \sqrt{-\Delta}} + e^{-i t \sqrt{-\Delta}} \right).$
One can view $\cos(t \sqrt{-\Delta})$ as a minor of the full wave propagator

$U(t) := \begin{pmatrix} \cos(t\sqrt{-\Delta}) & \frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}} \\ -\sin(t\sqrt{-\Delta})\, \sqrt{-\Delta} & \cos(t\sqrt{-\Delta}) \end{pmatrix}$

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

$\begin{pmatrix} u(t) \\ \partial_t u(t) \end{pmatrix} = U(t) \begin{pmatrix} u(0) \\ \partial_t u(0) \end{pmatrix}.$

Viewing the contraction $\cos(t\sqrt{-\Delta})$ as a minor of a unitary operator is an instance of the “dilation trick“.
It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable $t$ now lives on the integers ${\mathbb Z}$ rather than on ${\mathbb R}$, and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

$u(t+1) + u(t-1) = 2 P u(t) \qquad (3)$

where $t$ is now an integer, the $u(t)$ take values in some Hilbert space (e.g. $\ell^2$ functions on a graph $G$), and $P$ is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

$u(t + dt) + u(t - dt) = 2 P_{dt} u(t)$

where $dt > 0$ is a small parameter (representing the discretised time step), $t$ now takes values in the integer multiples of $dt$, and $P_{dt}$ is the wave propagator $\cos(dt \sqrt{-\Delta})$ or the heat propagator $e^{\frac{dt^2}{2} \Delta}$ (the two operators are different, but agree to fourth order in $dt$). One can then formally verify that the wave equation emerges from this rescaled system in the limit $dt \rightarrow 0$. (Thus, $P$ is not exactly the direct analogue of the Laplacian $\Delta$, but can be viewed as something like $e^{\frac{dt^2}{2}\Delta}$ in the case of small $dt$, or $e^{\Delta/2}$ if we are not rescaling to the small $dt$ case. The operator $P$ is sometimes known as the diffusion operator.)
Assuming $P$ is self-adjoint, solutions to the system (3) formally conserve the energy

$\langle u(t+1), u(t+1) \rangle + \langle u(t), u(t) \rangle - 2 \langle u(t+1), P u(t) \rangle, \qquad (4)$

as can be verified by pairing (3) against $u(t+1) - u(t-1)$.
This energy is positive semi-definite if $P$ is a contraction. We have the same time reversal symmetry as before: if $t \mapsto u(t)$ solves the system (3), then so does $t \mapsto u(-t)$. If one has an eigenfunction

$P \phi = \cos(\lambda)\, \phi$

to the operator $P$, then one has an explicit solution

$u(t) = \cos(\lambda t)\, \phi$

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.
Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field by one unit. Suppose we know that $P$ has unit speed in the sense that whenever $f$ is supported in a ball $B(x_0, R)$, then $Pf$ is supported in the ball $B(x_0, R+1)$. Then an easy induction shows that if $u(0), u(1)$ are supported in $B(x_0, R), B(x_0, R+1)$ respectively, then $u(t), u(t+1)$ are supported in $B(x_0, R + t), B(x_0, R + t + 1)$ for all natural numbers $t$.
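These discrete facts are easy to test numerically. The following sketch of mine simulates (3) on the cycle ${\mathbb Z}_N$ with $P$ the neighbour-averaging operator, a self-adjoint contraction of unit speed (the value of $N$ and the initial data are arbitrary illustrative choices); it checks both conservation of the energy described above and finite speed of propagation:

```python
# Discrete wave equation u(t+1) + u(t-1) = 2 P u(t) on the cycle Z_N,
# with P the neighbour-averaging operator.

N = 101

def P(f):
    """Average over the two neighbours on the cycle Z_N."""
    return [(f[(i - 1) % N] + f[(i + 1) % N]) / 2 for i in range(N)]

def dot(f, g):
    return sum(a * b for a, b in zip(f, g))

def support_radius(f, centre=0):
    """Largest cycle distance from `centre` to a nonzero coordinate."""
    dists = [min(abs(i - centre), N - abs(i - centre))
             for i, v in enumerate(f) if abs(v) > 1e-12]
    return max(dists, default=0)

def energy(u_next, u_curr):
    """Discrete energy: <u(t+1),u(t+1)> + <u(t),u(t)> - 2<u(t+1), P u(t)>."""
    return dot(u_next, u_next) + dot(u_curr, u_curr) - 2 * dot(u_next, P(u_curr))

u_prev = [0.0] * N
u_prev[0] = 1.0          # u(0): delta at vertex 0
u_curr = P(u_prev)       # u(1): supported in the ball of radius 1

E0 = energy(u_curr, u_prev)
radii = [support_radius(u_curr)]
for t in range(1, 30):
    u_next = [2 * a - b for a, b in zip(P(u_curr), u_prev)]
    assert abs(energy(u_next, u_curr) - E0) < 1e-9   # energy is conserved
    u_prev, u_curr = u_curr, u_next
    radii.append(support_radius(u_curr))

assert all(r <= t + 1 for t, r in enumerate(radii))  # finite speed of propagation
```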
Using Chebyshev polynomials, one can solve (3) explicitly in terms of the initial data $u(0), u(1)$:

$u(t) = T_t(P)\, u(0) + U_{t-1}(P)\, (u(1) - P u(0))$

where $T_t$ and $U_t$ are the Chebyshev polynomials of the first and second kind, thus

$T_t(\cos \theta) = \cos(t\theta); \quad U_{t-1}(\cos \theta) = \frac{\sin(t\theta)}{\sin \theta}. \qquad (5)$

As before, the full propagator, mapping $(u(t), u(t+1))$ to $(u(t+1), u(t+2))$, is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers $P^t$ and the operators $T_t(P)$ are discrete analogues of the heat propagators $e^{\frac{t}{2}\Delta}$ and wave propagators $\cos(t \sqrt{-\Delta})$ respectively.
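In scalar form (diagonalising $P$, so that $P$ acts as multiplication by some $x = \cos \theta \in [-1,1]$), the Chebyshev facts in play reduce to trigonometric identities that can be checked directly. This sketch of mine verifies $T_t(\cos\theta) = \cos(t\theta)$ via the three-term recurrence, together with the random-walk identity $\cos^t \theta = {\bf E} \cos(S_t \theta)$ that drives the Varopoulos-Carne argument below:

```python
# Scalar checks of the Chebyshev facts: T_n(cos t) = cos(nt), and the
# random-walk identity cos^t(theta) = E[cos(S_t theta)], exact expectation
# computed by enumerating all sign patterns.

import math
from itertools import product

def cheb_T(n, x):
    """Chebyshev polynomial of the first kind via T_{n+1} = 2x T_n - T_{n-1}."""
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

theta = 0.7          # arbitrary test angle
x = math.cos(theta)

# T_n(cos theta) = cos(n theta)
for n in range(10):
    assert abs(cheb_T(n, x) - math.cos(n * theta)) < 1e-9

# cos^t(theta) = E[cos(S_t theta)] with S_t a simple random walk of length t
t = 8
signs = list(product([-1, 1], repeat=t))
expectation = sum(math.cos(sum(s) * theta) for s in signs) / len(signs)
assert abs(expectation - x ** t) < 1e-9
```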
One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:
Theorem 1 (Varopoulos-Carne inequality) Let $G$ be a (possibly infinite) regular graph, let $t \geq 1$, and let $x, y$ be vertices in $G$. Then the probability that the simple random walk at $x$ lands at $y$ at time $t$ is at most $2 \exp( - d(x,y)^2 / 2t )$, where $d$ is the graph distance.
This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers ${\mathbb Z}$. Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length $t$ concentrate on the ball of radius $O(\sqrt{t})$ or so centred at the origin of the random walk.
Proof: Let $P$ be the diffusion operator on the graph, thus

$Pf(x) := \frac{1}{D} \sum_{y \sim x} f(y)$

for any $f \in \ell^2(G)$, where $D$ is the degree of the regular graph and the sum is over the vertices $y$ that are adjacent to $x$. This is a self-adjoint contraction of unit speed, and the probability that the random walk at $x$ lands at $y$ at time $t$ is

$\langle P^t \delta_x, \delta_y \rangle$
where $\delta_x, \delta_y$ are the Dirac deltas at $x, y$. Using (5), we can rewrite this as

${\bf E}\, \langle T_{S_t}(P)\, \delta_x, \delta_y \rangle$

where we are now using the energy form (4). We can write

$P^t = {\bf E}\, T_{S_t}(P)$

where $S_t := \epsilon_1 + \dots + \epsilon_t$ is the simple random walk of length $t$ on the integers, that is to say the $\epsilon_1, \dots, \epsilon_t$ are independent uniform Bernoulli signs; this identity follows from (5) and the observation that $\cos^t \theta = {\bf E} \cos(S_t \theta)$. Thus we wish to show that

${\bf E}\, \langle T_{S_t}(P)\, \delta_x, \delta_y \rangle \leq 2 \exp( - d(x,y)^2 / 2t ).$

By finite speed of propagation, the inner product here vanishes if $|S_t| < d(x,y)$. For $|S_t| \geq d(x,y)$ we can use Cauchy-Schwarz and the unitary nature of the full wave propagator (of which $T_{S_t}(P)$ is a minor) to bound the inner product by $1$. Thus the left-hand side may be upper bounded by

${\bf P}( |S_t| \geq d(x,y) )$
and the claim now follows from the Chernoff inequality.
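As a purely illustrative sanity check on Theorem 1 (my addition, not from the text), one can compare exact transition probabilities of the simple random walk on the Cayley graph of ${\mathbb Z}$ (a 2-regular graph) against the bound $2 \exp(-d^2/2t)$:

```python
# Check the Varopoulos-Carne bound on the Cayley graph of Z: the t-step
# transition probability from 0 to y is at most 2 exp(-y^2 / (2t)).

import math

def transition_probs(t):
    """Exact distribution of the simple random walk on Z after t steps."""
    probs = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for pos, pr in probs.items():
            for step in (-1, 1):
                nxt[pos + step] = nxt.get(pos + step, 0.0) + pr / 2
        probs = nxt
    return probs

for t in (5, 10, 20):
    probs = transition_probs(t)
    for y, pr in probs.items():
        bound = 2 * math.exp(-y * y / (2 * t))
        assert pr <= bound + 1e-12
```

The sharpness claimed after the theorem can be glimpsed here as well: for $y$ of size comparable to $\sqrt{t}$, the exact probability and the bound differ only by a bounded factor times $\sqrt{t}$.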
This inequality has many applications, particularly with regards to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.
For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:
Theorem 2 (Continuous Varopoulos-Carne inequality) Let $t > 0$, and let $f, g \in L^2({\mathbb R}^d)$ be supported on compact sets $E, F$ respectively. Then

$|\langle e^{t\Delta} f, g \rangle| \leq 2 \exp( - d(E,F)^2 / 4t )\, \|f\|_{L^2} \|g\|_{L^2}$

where $d(E,F)$ is the Euclidean distance between $E$ and $F$.
Proof: By Fourier inversion one has

$e^{-t s^2} = {\bf E} \cos( \sqrt{2t}\, G\, s )$

for any real $s$, where $G$ is a standard Gaussian random variable, and thus

$e^{t\Delta} f = {\bf E} \cos( \sqrt{2t}\, G\, \sqrt{-\Delta} ) f.$

By finite speed of propagation, the inner product $\langle \cos( \sqrt{2t}\, G \sqrt{-\Delta} ) f, g \rangle$ vanishes when $\sqrt{2t} |G| < d(E,F)$; otherwise, we can use Cauchy-Schwarz and the contractive nature of $\cos( \sqrt{2t}\, G \sqrt{-\Delta} )$ to bound this inner product by $\|f\|_{L^2} \|g\|_{L^2}$. Thus

$|\langle e^{t\Delta} f, g \rangle| \leq {\bf P}( \sqrt{2t} |G| \geq d(E,F) )\, \|f\|_{L^2} \|g\|_{L^2}.$

Bounding ${\bf P}( \sqrt{2t} |G| \geq d(E,F) ) \leq 2 \exp( - d(E,F)^2 / 4t )$, we obtain the claim.
Observe that the argument is quite general and can be applied for instance to Riemannian manifolds other than ${\mathbb R}^d$.