You are currently browsing the category archive for the ‘math.NT’ category.
One of the basic general problems in analytic number theory is to understand as much as possible the fluctuations of the Möbius function , defined as
when
is the product of
distinct primes, and zero otherwise. For instance, as
takes values in
, we have the trivial bound
and the seemingly slight improvement
is equivalent to the notorious Riemann hypothesis.
There is a general Möbius pseudorandomness heuristic that suggests that the sign pattern behaves so randomly (or pseudorandomly) that one should expect a substantial amount of cancellation in sums that involve the sign fluctuation of the Möbius function in a nontrivial fashion, with the amount of cancellation present comparable to the amount that an analogous random sum would provide; cf. the probabilistic heuristic discussed in this recent blog post. There are a number of ways to make this heuristic precise. As already mentioned, the Riemann hypothesis can be considered one such manifestation of the heuristic. Another manifestation is the following old conjecture of Chowla:
Conjecture 1 (Chowla’s conjecture) For any fixed integer
and exponents
, with at least one of the
odd (so as not to completely destroy the sign cancellation), we have
Note that as for any
, we can reduce to the case when the
take values in
here. When only one of the
are odd, this is essentially the prime number theorem in arithmetic progressions (after some elementary sieving), but with two or more of the
are odd, the problem becomes completely open. For instance, the estimate
is morally very close to the conjectured asymptotic
for the von Mangoldt function , where
is the twin prime constant; this asymptotic in turn implies the twin prime conjecture. (To formally deduce estimates for von Mangoldt from estimates for Möbius, though, typically requires some better control on the error terms than
, in particular gains of some power of
are usually needed. See this previous blog post for more discussion.)
Remark 1 The Chowla conjecture resembles an assertion that, for
chosen randomly and uniformly from
to
, the random variables
become asymptotically independent of each other (in the probabilistic sense) as
. However, this is not quite accurate, because some moments (namely those with all exponents
even) have the “wrong” asymptotic value, leading to some unwanted correlation between the two variables. For instance, the events
and
have a strong correlation with each other, basically because they are both strongly correlated with the event of
being divisible by
. A more accurate interpretation of the Chowla conjecture is that the random variables
are asymptotically conditionally independent of each other, after conditioning on the zero pattern
; thus, it is the sign of the Möbius function that fluctuates like random noise, rather than the zero pattern. (The situation is a bit cleaner if one works instead with the Liouville function
instead of the Möbius function
, as this function never vanishes, but we will stick to the traditional Möbius function formalism here.)
A more recent formulation of the Möbius randomness heuristic is the following conjecture of Sarnak. Given a bounded sequence , define the topological entropy of the sequence to be the least exponent
with the property that for any fixed
, and for
going to infinity the set
of
can be covered by
balls of radius
. (If
arises from a minimal topological dynamical system
by
, the above notion is equivalent to the usual notion of the topological entropy of a dynamical system.) For instance, if the sequence is a bit sequence (i.e. it takes values in
), then there are only
-bit patterns that can appear as blocks of
consecutive bits in this sequence. As a special case, a Turing machine with bounded memory that had access to a random number generator at the rate of one random bit produced every
units of time, but otherwise evolved deterministically, would have an output sequence that had a topological entropy of at most
. A bounded sequence is said to be deterministic if its topological entropy is zero. A typical example is a polynomial sequence such as
for some fixed
; the
-blocks of such polynomials sequence have covering numbers that only grow polynomially in
, rather than exponentially, thus yielding the zero entropy. Unipotent flows, such as the horocycle flow on a compact hyperbolic surface, are another good source of deterministic sequences.
Conjecture 2 (Sarnak’s conjecture) Let
be a deterministic bounded sequence. Then
This conjecture in general is still quite far from being solved. However, special cases are known:
- For constant sequences, this is essentially the prime number theorem (1).
- For periodic sequences, this is essentially the prime number theorem in arithmetic progressions.
- For quasiperiodic sequences such as
for some continuous
, this follows from the work of Davenport.
- For nilsequences, this is a result of Ben Green and myself.
- For horocycle flows, this is a result of Bourgain, Sarnak, and Ziegler.
- For the Thue-Morse sequence, this is a result of Dartyge-Tenenbaum (with a stronger error term obtained by Maduit-Rivat). A subsequent result of Bourgain handles all bounded rank one sequences (though the Thue-Morse sequence is actually of rank two), and a related result of Green establishes asymptotic orthogonality of the Möbius function to bounded depth circuits, although such functions are not necessarily deterministic in nature.
- For the Rudin-Shapiro sequence, I sketched out an argument at this MathOverflow post.
- The Möbius function is known to itself be non-deterministic, because its square
(i.e. the indicator of the square-free functions) is known to be non-deterministic (indeed, its topological entropy is
). (The corresponding question for the Liouville function
, however, remains open, as the square
has zero entropy.)
- In the converse direction, it is easy to construct sequences of arbitrarily small positive entropy that correlate with the Möbius function (a rather silly example is
for some fixed large (squarefree)
, which has topological entropy at most
but clearly correlates with
).
See this survey of Sarnak for further discussion of this and related topics.
In this post I wanted to give a very nice argument of Sarnak that links the above two conjectures:
Proposition 3 The Chowla conjecture implies the Sarnak conjecture.
The argument does not use any number-theoretic properties of the Möbius function; one could replace in both conjectures by any other function from the natural numbers to
and obtain the same implication. The argument consists of the following ingredients:
- To show that
, it suffices to show that the expectation of the random variable
, where
is drawn uniformly at random from
to
, can be made arbitrary small by making
large (and
even larger).
- By the union bound and the zero topological entropy of
, it suffices to show that for any bounded deterministic coefficients
, the random variable
concentrates with exponentially high probability.
- Finally, this exponentially high concentration can be achieved by the moment method, using a slight variant of the moment method proof of the large deviation estimates such as the Chernoff inequality or Hoeffding inequality (as discussed in this blog post).
As is often the case, though, while the “top-down” order of steps presented above is perhaps the clearest way to think conceptually about the argument, in order to present the argument formally it is more convenient to present the arguments in the reverse (or “bottom-up”) order. This is the approach taken below the fold.
There has been a lot of recent interest in the abc conjecture, since the release a few weeks ago of the last of a series of papers by Shinichi Mochizuki which, as one of its major applications, claims to establish this conjecture. It’s still far too early to judge whether this proof is likely to be correct or not (the entire argument encompasses several hundred pages of argument, mostly in the area of anabelian geometry, which very few mathematicians are expert in, to the extent that we still do not even have a full outline of the proof strategy yet), and I don’t have anything substantial to add to the existing discussion around that conjecture. (But, for those that are interested, the Polymath wiki page on the ABC conjecture has collected most of the links to that discussion, and to various background materials.)
In the meantime, though, I thought I might give the standard probabilistic heuristic argument that explains why we expect the ABC conjecture to be true. The underlying heuristic is a common one, used throughout number theory, and it can be summarised as follows:
Heuristic 1 (Probabilistic heuristic) Even though number theory is a deterministic subject (one does not need to roll any dice to factorise a number, or figure out if a number is prime), one expects to get a good asymptotic prediction for the answers to many number-theoretic questions by pretending that various number-theoretic assertions
(e.g. that a given number
is prime) are probabilistic events (with a probability
that can vary between
and
) rather than deterministic events (that are either always true or always false). Furthermore:
- (Basic heuristic) If two or more of these heuristically probabilistic events have no obvious reason to be strongly correlated to each other, then we should expect them to behave as if they were (jointly) independent.
- (Advanced heuristic) If two or more of these heuristically probabilistic events have some obvious correlation between them, but no further correlations are suspected, then we should expect them to behave as if they were conditionally independent, relative to whatever data is causing the correlation.
This is, of course, an extremely vague and completely non-rigorous heuristic, requiring (among other things) a subjective and ad hoc determination of what an “obvious reason” is, but in practice it tends to give remarkably plausible predictions, some fraction of which can in fact be backed up by rigorous argument (although in many cases, the actual argument has almost nothing in common with the probabilistic heuristic). A famous special case of this heuristic is the Cramér random model for the primes, but this is not the only such instance for that heuristic.
To give the most precise predictions, one should use the advanced heuristic in Heuristic 1, but this can be somewhat complicated to execute, and so we shall focus instead on the predictions given by the basic heuristic (thus ignoring the presence of some number-theoretic correlations), which tends to give predictions that are quantitatively inaccurate but still reasonably good at the qualitative level.
Here is a basic “corollary” of Heuristic 1:
Heuristic 2 (Heuristic Borel-Cantelli) Suppose one has a sequence
of number-theoretic statements, which we heuristically interpet as probabilistic events with probabilities
. Suppose also that we know of no obvious reason for these events to have much of a correlation with each other. Then:
- If
, we expect only finitely many of the statements
to be true. (And if
is much smaller than
, we in fact expect none of the
to be true.)
- If
, we expect infinitely many of the statements
to be true.
This heuristic is motivated both by the Borel-Cantelli lemma, and by the standard probabilistic computation that if one is given jointly independent, and genuinely probabilistic, events with
, then one almost surely has an infinite number of the
occuring.
Before we get to the ABC conjecture, let us give two simpler (and well known) demonstrations of these heuristics in action:
Example 1 (Twin prime conjecture) One can heuristically justify the twin prime conjecture as follows. Using the prime number theorem, one can heuristically assign a probability of
to the event that any given large integer
is prime. In particular, the probability that
is prime will then be
. Making the assumption that there are no strong correlations between these events, we are led to the prediction that the probability that
and
are simultaneously prime is
. Since
, the Borel-Cantelli heuristic then predicts that there should be infinitely many twin primes.
Note that the above argument is a bit too naive, because there are some non-trivial correlations between the primality of
and the primality of
. Most obviously, if
is prime, this greatly increases the probability that
is odd, which implies that
is odd, which then elevates the probability that
is prime. A bit more subtly, if
is prime, then
is likely to avoid the residue class
, which means that
avoids the residue class
, which ends up decreasing the probability that
is prime. However, there is a standard way to correct for these local correlations; see for instance in this previous blog post. As it turns out, these local correlations ultimately alter the prediction for the asymptotic density of twin primes by a constant factor (the twin prime constant), but do not affect the qualitative prediction of there being infinitely many twin primes.
Example 2 (Fermat’s last theorem) Let us now heuristically count the number of solutions to
for various
and natural numbers
(which we can reduce to be coprime if desired). We recast this (in the spirit of the ABC conjecture) as
, where
are
powers. The number of
powers up to any given number
is about
, so heuristically any given natural number
has a probability about
of being an
power. If we make the naive assumption that (in the coprime case at least) there is no strong correlation between the events that
is an
power,
is an
power, and
being an
power, then for typical
, the probability that
are all simultaneously
powers would then be
. For fixed
, the total number of solutions to the Fermat equation would then be predicted to be
(Strictly speaking, we need to restrict to the coprime case, but given that a positive density of pairs of integers are coprime, it should not affect the qualitative conclusion significantly if we now omit this restriction.) It might not be immediately obvious as to whether this sum converges or diverges, but (as is often the case with these sorts of unsigned sums) one can clarify the situation by dyadic decomposition. Suppose for instance that we consider the portion of the sum where
lies between
and
. Then this portion of the sum can be controlled by
which simplifies to
Summing in
, one thus expects infinitely many solutions for
, only finitely many solutions for
(indeed, a refinement of this argument shows that one expects only finitely many solutions even if one considers all
at once), and a borderline prediction of there being a barely infinite number of solutions when
. Here is of course a place where a naive application of the probabilistic heuristic breaks down; there is enough arithmetic structure in the equation
that the naive probabilistic prediction ends up being an inaccurate model. Indeed, while this heuristic suggests that a typical homogeneous cubic should have a logarithmic number of integer solutions of a given height
, it turns out that some homogeneous cubics (namely, those associated to elliptic curves of positive rank) end up with the bulk of these solutions, while other homogeneous cubics (including those associated to elliptic curves of zero rank, including the Fermat curve
) only get finitely many solutions. The reasons for this are subtle, but certainly the high degree of arithmetic structure present in an elliptic curve (starting with the elliptic curve group law which allows one to generate new solutions from old ones, and which also can be used to exclude solutions to
via the method of descent) is a major contributing factor.
Below the fold, we apply similar heuristics to suggest the truth of the ABC conjecture.
Much as group theory is the study of groups, or graph theory is the study of graphs, model theory is the study of models (also known as structures) of some language (which, in this post, will always be a single-sorted, first-order language). A structure is a set
, equipped with one or more operations, constants, and relations. This is of course an extremely general type of mathematical object, but (quite remarkably) one can still say a substantial number of interesting things about very broad classes of structures.
We will observe the common abuse of notation of using the set as a metonym for the entire structure, much as we usually refer to a group
simply as
, a vector space
simply as
, and so forth. Following another common bending of the rules, we also allow some operations on structures (such as the multiplicative inverse operation on a group or field) to only be partially defined, and we allow use of the usual simplifying conventions for mathematical formulas (e.g. writing
instead of
or
, in cases where associativity is known). We will also deviate slightly from the usual practice in logic by emphasising individual structures, rather than the theory of general classes of structures; for instance, we will talk about the theory of a single field such as
or
, rather than the theory of all fields of a certain type (e.g. real closed fields or algebraically closed fields).
Once one has a structure , one can introduce the notion of a definable subset of
, or more generally of a Cartesian power
of
, defined as a set
of the form
in the language
with
free variables and any number of constants from
(that is,
is a well-formed formula built up from a finite number of constants
in
, the relations and operations on
, logical connectives such as
,
,
, and the quantifiers
). Thus, for instance, in the theory of the arithmetic of the natural numbers
, the set of primes
is a definable set, since we have
In the theory of the field of reals , the unit circle
is an example of a definable set,
but so is the the complement of the circle,
and the interval :
Due to the unlimited use of constants, any finite subset of a power of any structure
is, by our conventions, definable in that structure. (One can of course also consider definability without parameters (also known as
-definability), in which arbitrary constants are not permitted, but we will not do so here.)
We can isolate some special subclasses of definable sets:
- An atomic definable set is a set of the form (1) in which
is an atomic formula (i.e. it does not contain any logical connectives or quantifiers).
- A quantifier-free definable set is a set of the form (1) in which
is quantifier-free (i.e. it can contain logical connectives, but does not contain the quantifiers
).
Example 1 In the theory of a field such as
, an atomic definable set is the same thing as an affine algebraic set (also known as an affine algebraic variety, with the understanding that varieties are not necessarily assumed to be irreducible), and a quantifier-free definable set is known as a constructible set; thus we see that algebraic geometry can be viewed in some sense as a special case of model theory. (Conversely, it can in fact be quite profitable to think of model theory as an abstraction of algebraic geometry; for instance, the concepts of Morley rank and Morley degree in model theory (discussed in this previous blog post) directly generalises the concepts of dimension and degree in algebraic geometry.) Over
, the interval
is a definable set, but not a quantifier-free definable set (and certainly not an atomic definable set); and similarly for the primes over
.
A quantifier-free definable set in is nothing more than a finite boolean combination of atomic definable sets; in other words, the class of quantifier-free definable sets over
is the smallest class that contains the atomic definable sets and is closed under boolean operations such as complementation and union (which generate all the other boolean operations). Similarly, the class of definable sets over
is the smallest class that contains the quantifier-free definable sets, and is also closed under the operation of projection
from
to
for every natural number
, where
is the map
.
Some structures have the property of enjoying quantifier elimination, which means that every definable set is in fact a quantifier-free definable set, or equivalently that the projection of a quantifier-free definable set is again quantifier-free. For instance, an algebraically closed field (with the field operations) has quantifier elimination (i.e. the projection of a constructible set is again constructible); this fact can be proven by the classical tool of resultants, and among other things can be used to give a proof of Hilbert’s nullstellensatz. (Note though that projection does not necessary preserve the property of being atomic; for instance, the projection of the atomic set
is the non-atomic, but still quantifier-free definable, set
.) In the converse direction, it is not difficult to use the nullstellensatz to deduce quantifier elimination. For theory of the real field
, which is not algebraically closed, one does not have quantifier elimination, as one can see from the example of the unit circle (which is a quantifier-free definable set) projecting down to the interval
(which is definable, but not quantifer-free definable). However, if one adds the additional operation of order
to the reals, giving it the language of an ordered field rather than just a field, then quantifier elimination is recovered (the class of quantifier-free definable sets now enlarges to match the class of definable sets, which in this case is also the class of semi-algebraic sets); this is the famous Tarski-Seidenberg theorem.
On the other hand, many important structures do not have quantifier elimination; typically, the projection of a quantifier-free definable set is not, in general, quantifier-free definable. This failure of the projection property also shows up in many contexts outside of model theory; for instance, Lebesgue famously made the error of thinking that the projection of a Borel measurable set remained Borel measurable (it is merely an analytic set instead). Turing’s halting theorem can be viewed as an assertion that the projection of a decidable set (also known as a computable or recursive set) is not necessarily decidable (it is merely semi-decidable (or recursively enumerable) instead). The notorious P=NP problem can also be essentially viewed in this spirit; roughly speaking (and glossing over the placement of some quantifiers), it asks whether the projection of a polynomial-time decidable set is again polynomial-time decidable. And so forth. (See this blog post of Dick Lipton for further discussion of the subtleties of projections.)
Now we consider the status of quantifier elimination for the theory of a finite field . If interpreted naively, quantifier elimination is trivial for a finite field
, since every subset of
is finite and thus quantifier-free definable. However, we can recover an interesting question in one of two (essentially equivalent) ways. One is to work in the asymptotic regime in which the field
is large, but the length of the formulae used to construct one’s definable sets stays bounded uniformly in the size of
(where we view any constant in
as contributing a unit amount to the length of a formula, no matter how large
is). A simple counting argument then shows that only a small number of subsets of
become definable in the asymptotic limit
, since the number of definable sets clearly grows at most polynomially in
for any fixed bound on the formula length, while the number of all subsets of
grows exponentially in
.
Another way to proceed is to work not with a single finite field , or even with a sequence
of finite fields, but with the ultraproduct
of a sequence of finite fields, and to study the properties of definable sets over this ultraproduct. (We will be using the notation of ultraproducts and nonstandard analysis from this previous blog post.) This approach is equivalent to the more finitary approach mentioned in the previous paragraph, at least if one does not care to track of the exact bounds on the length of the formulae involved. Indeed, thanks to Los’s theorem, a definable subset
of
is nothing more than the ultraproduct
of definable subsets
of
for all
sufficiently close to
, with the length of the formulae used to define
uniformly bounded in
. In the language of nonstandard analysis, one can view
as a nonstandard finite field.
The ultraproduct of finite fields is an important example of a pseudo-finite field – a field that obeys all the sentences in the languages of fields that finite fields do, but is not necessarily itself a finite field. The model theory of pseudo-finite fields was first studied systematically by Ax (in the same paper where the Ax-Grothendieck theorem, discussed previously on this blog, was established), with important further contributions by Kiefe, by Fried-Sacerdote, by two papers of Chatzidakis-van den Dries-Macintyre, and many other authors.
As mentioned before, quantifier elimination trivially holds for finite fields. But for infinite pseudo-finite fields, such as the ultraproduct of finite fields with
going to infinity, quantifier elimination fails. For instance, in a finite field
, the set
of quadratic residues is a definable set, with a bounded formula length, and so in the ultraproduct
, the set
of nonstandard quadratic residues is also a definable set. However, in one dimension, we see from the factor theorem that the only atomic definable sets are either finite or the whole field
, and so the only constructible sets (i.e. the only quantifier-free definable sets) are either finite or cofinite in
. Since the quadratic residues have asymptotic density
in a large finite field, they cannot form a quantifier-free definable set, despite being definable.
Nevertheless, there is a very nice almost quantifier elimination result for these fields, in characteristic zero at least, which we phrase here as follows:
Theorem 1 (Almost quantifier elimination) Let
be a nonstandard finite field of characteristic zero, and let
be a definable set over
. Then
is the union of finitely many sets of the form
where
is an atomic definable subset of
(i.e. the
-points of an algebraic variety
defined over
in
) and
is a polynomial.
Results of this type were first obtained essentially due to Catarina Kiefe, although the formulation here is closer to that of Cherlin-van den Dries-Macintyre.
Informally, this theorem says that while we cannot quite eliminate all quantifiers from a definable set over a nonstandard finite field, we can eliminate all but one existential quantifier. Note that negation has also been eliminated in this theorem; for instance, the definable set uses a negation, but can also be described using a single existential quantifier as
.) I believe that there are more complicated analogues of this result in positive characteristic, but I have not studied this case in detail (Kiefe’s result does not assume characteristic zero, but her conclusion is slightly different from the one given here). In the one-dimensional case
, the only varieties
are the affine line and finite sets, and we can simplify the above statement, namely that any definable subset of
takes the form
for some polynomial
(i.e. definable sets in
are nothing more than the projections of the
-points of a plane curve).
There is an equivalent formulation of this theorem for standard finite fields, namely that if is a finite field and
is definable using a formula of length at most
, then
can be expressed in the form (2) with the degree of
bounded by some quantity
depending on
and
, assuming that the characteristic of
is sufficiently large depending on
.
The theorem gives quite a satisfactory description of definable sets in either standard or nonstandard finite fields (at least if one does not care about effective bounds in some of the constants, and if one is willing to exclude the small characteristic case); for instance, in conjunction with the Lang-Weil bound discussed in this recent blog post, it shows that any non-empty definable subset of a nonstandard finite field has a nonstandard cardinality of for some positive standard rational
and integer
. Equivalently, any non-empty definable subset of
for some standard finite field
using a formula of length at most
has a standard cardinality of
for some positive rational of height
and some natural number
between
and
. (For instance, in the example of the quadratic residues given above,
is equal to
and
equal to
.) There is a more precise statement to this effect, namely that the Poincaré series of a definable set is rational; see Kiefe’s paper for details.
Below the fold I give a proof of Theorem 1, which relies primarily on the Lang-Weil bound mentioned above.
Let be a finite field, with algebraic closure
, and let
be an (affine) algebraic variety defined over
, by which I mean a set of the form
for some ambient dimension , and some finite number of polynomials
. In order to reduce the number of subscripts later on, let us say that
has complexity at most
if
,
, and the degrees of the
are all less than or equal to
. Note that we do not require at this stage that
be irreducible (i.e. not the union of two strictly smaller varieties), or defined over
, though we will often specialise to these cases later in this post. (Also, everything said here can also be applied with almost no changes to projective varieties, but we will stick with affine varieties for sake of concreteness.)
One can consider two crude measures of how “big” the variety is. The first measure, which is algebraic geometric in nature, is the dimension
of the variety
, which is an integer between
and
(or, depending on convention,
,
, or undefined, if
is empty) that can be defined in a large number of ways (e.g. it is the largest
for which the generic linear projection from
to
is dominant, or the smallest
for which the intersection with a generic codimension
subspace is non-empty). The second measure, which is number-theoretic in nature, is the number
of
-points of
, i.e. points
in
all of whose coefficients lie in the finite field, or equivalently the number of solutions to the system of equations
for
with variables
in
.
These two measures are linked together in a number of ways. For instance, we have the basic Schwarz-Zippel type bound (which, in this qualitative form, goes back at least to Lemma 1 of the work of Lang and Weil in 1954).
Lemma 1 (Schwarz-Zippel type bound) Let
be a variety of complexity at most
. Then we have
.
Proof: (Sketch) For the purposes of exposition, we will not carefully track the dependencies of implied constants on the complexity , instead simply assuming that all of these quantities remain controlled throughout the argument. (If one wished, one could obtain ineffective bounds on these quantities by an ultralimit argument, as discussed in this previous post, or equivalently by moving everything over to a nonstandard analysis framework; one could also obtain such uniformity using the machinery of schemes.)
We argue by induction on the ambient dimension of the variety
. The
case is trivial, so suppose
and that the claim has already been proven for
. By breaking up
into irreducible components we may assume that
is irreducible (this requires some control on the number and complexity of these components, but this is available, as discussed in this previous post). For each
, the fibre
is either one-dimensional (and thus all of
) or zero-dimensional. In the latter case, one has
points in the fibre from the fundamental theorem of algebra (indeed one has a bound of
in this case), and
lives in the projection of
to
, which is a variety of dimension at most
and controlled complexity, so the contribution of this case is acceptable from the induction hypothesis. In the former case, the fibre contributes
-points, but
lies in a variety in
of dimension at most
(since otherwise
would contain a subvariety of dimension at least
, which is absurd) and controlled complexity, and so the contribution of this case is also acceptable from the induction hypothesis.
One can improve the bound on the implied constant to be linear in the degree of (see e.g. Claim 7.2 of this paper of Dvir, Kollar, and Lovett, or Lemma A.3 of this paper of Ellenberg, Oberlin, and myself), but we will not be concerned with these improvements here.
Without further hypotheses on , the above upper bound is sharp (except for improvements in the implied constants). For instance, the variety
where are distict, is the union of
distinct hyperplanes of dimension
, with
and complexity
; similar examples can easily be concocted for other choices of
. In the other direction, there is also no non-trivial lower bound for
without further hypotheses on
. For a trivial example, if
is an element of
that does not lie in
, then the hyperplane
clearly has no -points whatsoever, despite being a
-dimensional variety in
of complexity
. For a slightly less non-trivial example, if
is an element of
that is not a quadratic residue, then the variety
which is the union of two hyperplanes, still has no -points, even though this time the variety is defined over
instead of
(by which we mean that the defining polynomial(s) have all of their coefficients in
). There is however the important Lang-Weil bound that allows for a much better estimate as long as
is both defined over
and irreducible:
Theorem 2 (Lang-Weil bound) Let
be a variety of complexity at most
. Assume that
is defined over
, and that
is irreducible as a variety over
(i.e.
is geometrically irreducible or absolutely irreducible). Then
Again, more explicit bounds on the implied constant here are known, but will not be the focus of this post. As the previous examples show, the hypotheses of definability over and geometric irreducibility are both necessary.
The Lang-Weil bound is already non-trivial in the model case of plane curves:
Theorem 3 (Hasse-Weil bound) Let
be an irreducible polynomial of degree
with coefficients in
. Then
Thus, for instance, if , then the elliptic curve
has
-points, a result first established by Hasse. The Hasse-Weil bound is already quite non-trivial, being the analogue of the Riemann hypothesis for plane curves. For hyper-elliptic curves, an elementary proof (due to Stepanov) is discussed in this previous post. For general plane curves, the first proof was by Weil (leading to his famous Weil conjectures); there is also a nice version of Stepanov’s argument due to Bombieri covering this case which is a little less elementary (relying crucially on the Riemann-Roch theorem for the upper bound, and a lifting trick to then get the lower bound), which I briefly summarise later in this post. The full Lang-Weil bound is deduced from the Hasse-Weil bound by an induction argument using generic hyperplane slicing, as I will also summarise later in this post.
The hypotheses of definability over and geometric irreducibility in the Lang-Weil can be removed after inserting a geometric factor:
Corollary 4 (Lang-Weil bound, alternate form) Let
be a variety of complexity at most
. Then one has
where
is the number of top-dimensional components of
(i.e. geometrically irreducible components of
of dimension
) that are definable over
, or equivalently are invariant with respect to the Frobenius endomorphism
that defines
.
Proof: By breaking up a general variety into components (and using Lemma 1 to dispose of any lower-dimensional components), it suffices to establish this claim when
is itself geometrically irreducible. If
is definable over
, the claim follows from Theorem 2. If
is not definable over
, then it is not fixed by the Frobenius endomorphism
(since otherwise one could produce a set of defining polynomials that were fixed by Frobenius and thus defined over
by using some canonical basis (such as a reduced Grobner basis) for the associated ideal), and so
has strictly smaller dimension than
. But
captures all the
-points of
, so in this case the claim follows from Lemma 1.
Note that if is reducible but is itself defined over
, then the Frobenius endomorphism preserves
itself, but may permute the components of
around. In this case,
is the number of fixed points of this permutation action of Frobenius on the components. In particular,
is always a natural number between
and
; thus we see that regardless of the geometry of
, the normalised count
is asymptotically restricted to a bounded range of natural numbers (in the regime where the complexity stays bounded and
goes to infinity).
Example 1 Consider the variety
for some non-zero parameter
. Geometrically (by which we basically mean “when viewed over the algebraically closed field
“), this is the union of two lines, with slopes corresponding to the two square roots of
. If
is a quadratic residue, then both of these lines are defined over
, and are fixed by Frobenius, and
in this case. If
is not a quadratic residue, then the lines are not defined over
, and the Frobenius automorphism permutes the two lines while preserving
as a whole, giving
in this case.
Corollary 4 effectively computes (at least to leading order) the number-theoretic size of a variety in terms of geometric information about
, namely its dimension
and the number
of top-dimensional components fixed by Frobenius. It turns out that with a little bit more effort, one can extend this connection to cover not just a single variety
, but a family of varieties indexed by points in some base space
. More precisely, suppose we now have two affine varieties
of bounded complexity, together with a regular map
of bounded complexity (the definition of complexity of a regular map is a bit technical, see e.g. this paper, but one can think for instance of a polynomial or rational map of bounded degree as a good example). It will be convenient to assume that the base space
is irreducible. If the map
is a dominant map (i.e. the image
is Zariski dense in
), then standard algebraic geometry results tell us that the fibres
are an unramified family of
-dimensional varieties outside of an exceptional subset
of
of dimension strictly smaller than
(and with
having dimension strictly smaller than
); see e.g. Section I.6.3 of Shafarevich.
Now suppose that ,
, and
are defined over
. Then, by Lang-Weil,
has
-points, and by Schwarz-Zippel, for all but
of these
-points
(the ones that lie in the subvariety
), the fibre
is an algebraic variety defined over
of dimension
. By using ultraproduct arguments (see e.g. Lemma 3.7 of this paper of mine with Emmanuel Breuillard and Ben Green), this variety can be shown to have bounded complexity, and thus by Corollary 4, has
-points. One can then ask how the quantity
is distributed. A simple but illustrative example occurs when
and
is the polynomial
. Then
equals
when
is a non-zero quadratic residue and
when
is a non-zero quadratic non-residue (and
when
is zero, but this is a negligible fraction of all
). In particular, in the asymptotic limit
,
is equal to
half of the time and
half of the time.
Now we describe the asymptotic distribution of the . We need some additional notation. Let
be an
-point in
, and let
be the connected components of the fibre
. As
is defined over
, this set of components is permuted by the Frobenius endomorphism
. But there is also an action by monodromy of the fundamental group
(this requires a certain amount of étale machinery to properly set up, as we are working over a positive characteristic field rather than over the complex numbers, but I am going to ignore this rather important detail here, as I still don’t fully understand it). This fundamental group may be infinite, but (by the étale construction) is always profinite, and in particular has a Haar probability measure, in which every finite index subgroup (and their cosets) are measurable. Thus we may meaningfully talk about elements drawn uniformly at random from this group, so long as we work only with the profinite
-algebra on
that is generated by the cosets of the finite index subgroups of this group (which will be the only relevant sets we need to measure when considering the action of this group on finite sets, such as the components of a generic fibre).
Theorem 5 (Lang-Weil with parameters) Let
be varieties of complexity at most
with
irreducible, and let
be a dominant map of complexity at most
. Let
be an
-point of
. Then, for any natural number
, one has
for
values of
, where
is the random variable that counts the number of components of a generic fibre
that are invariant under
, where
is an element chosen uniformly at random from the étale fundamental group
. In particular, in the asymptotic limit
, and with
chosen uniformly at random from
,
(or, equivalently,
) and
have the same asymptotic distribution.
This theorem generalises Corollary 4 (which is the case when is just a point, so that
is just
and
is trivial). Informally, the effect of a non-trivial parameter space
on the Lang-Weil bound is to push around the Frobenius map by monodromy for the purposes of counting invariant components, and a randomly chosen set of parameters corresponds to a randomly chosen loop on which to perform monodromy.
Example 2 Let
and
for some fixed
; to avoid some technical issues let us suppose that
is coprime to
. Then
can be taken to be
, and for a base point
we can take
. The fibre
– the
roots of unity – can be identified with the cyclic group
by using a primitive root of unity. The étale fundamental group
is (I think) isomorphic to the profinite closure
of the integers
(excluding the part of that closure coming from the characteristic of
). Not coincidentally, the integers
are the fundamental group of the complex analogue
of
. (Brian Conrad points out to me though that for more complicated varieties, such as covers of
by a power of the characteristic, the etale fundamental group is more complicated than just a profinite closure of the ordinary fundamental group, due to the presence of Artin-Schreier covers that are only ramified at infinity.) The action of this fundamental group on the fibres
can given by translation. Meanwhile, the Frobenius map
on
is given by multiplication by
. A random element
then becomes a random affine map
on
, where
chosen uniformly at random from
. The number of fixed points of this map is equal to the greatest common divisor
of
and
when
is divisible by
, and equal to
otherwise. This matches up with the elementary number fact that a randomly chosen non-zero element of
will be an
power with probability
, and when this occurs, the number of
roots in
will be
.
Example 3 (Thanks to Jordan Ellenberg for this example.) Consider a random elliptic curve
, where
are chosen uniformly at random, and let
. Let
be the
-torsion points of
(i.e. those elements
with
using the elliptic curve addition law); as a group, this is isomorphic to
(assuming that
has sufficiently large characteristic, for simplicity), and consider the number of
points of
, which is a random variable taking values in the natural numbers between
and
. In this case, the base variety
is the modular curve
, and the covering variety
is the modular curve
. The generic fibre here can be identified with
, the monodromy action projects down to the action of
, and the action of Frobenius on this fibre can be shown to be given by a
matrix with determinant
(with the exact choice of matrix depending on the choice of fibre and of the identification), so the distribution of the number of
-points of
is asymptotic to the distribution of the number of fixed points
of a random linear map of determinant
on
.
Theorem 5 seems to be well known “folklore” among arithmetic geometers, though I do not know of an explicit reference for it. I enjoyed deriving it for myself (though my derivation is somewhat incomplete due to my lack of understanding of étale cohomology) from the ordinary Lang-Weil theorem and the moment method. I’m recording this derivation later in this post, mostly for my own benefit (as I am still in the process of learning this material), though perhaps some other readers may also be interested in it.
Caveat: not all details are fully fleshed out in this writeup, particularly those involving the finer points of algebraic geometry and étale cohomology, as my understanding of these topics is not as complete as I would like it to be.
Many thanks to Brian Conrad and Jordan Ellenberg for helpful discussions on these topics.
One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations of an integer
as the sum of three primes
, as a Fourier-analytic integral over the unit circle
involving exponential sums such as
, and
. For instance, the expression
mentioned earlier can be written as
in order to obtain non-trivial bounds on quantities such as
. For instance, if one can show that
for all odd integers
greater than some given threshold
, this implies that all odd integers greater than
are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.
Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff
with a smoother cutoff
for a suitable chocie of cutoff function
, or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function
. However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.
In many cases, it turns out that one can get fairly precise evaluations on sums such as in the major arc case, when
is close to a rational number
with small denominator
, by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that
and the prime number theorem in residue classes modulo suggests more generally that
when is small and
is close to
, basically thanks to the elementary calculation that the phase
has an average value of
when
is uniformly distributed amongst the residue classes modulo
that are coprime to
. Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.
In the minor arc case when is not close to a rational
with small denominator, one no longer expects to have such precise control on the value of
, due to the “pseudorandom” fluctuations of the quantity
. Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of
“pseudorandom” phases should fluctuate randomly and be of typical magnitude
, one expects upper bounds of the shape
. Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that
are far more typical.
Because one only expects to have upper bounds on , rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as
for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of
, so that averaging in
is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude
oscillates in sympathetic resonance with the phase
, thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region
:
Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as , at least when
is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer
is the sum of three primes; my own result that all odd numbers greater than
can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis ). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of
, such as
. (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)
However, I (and many other analytic number theorists) are considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number as the sum
of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations
of
as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large
, one has to find a non-trivial lower bound for the quantity
, as this quantity
is also the number of ways to represent
as the sum
of two primes
. Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity
, as this quantity
is also the number of ways to represent
as the difference
of two primes less than or equal to
.
In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums . Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity
or
, expresses it in terms of
using (5) or (6), and then uses (5) or (6) again to convert these integrals back into a the combinatorial expression of counting solutions to
or
, and then uses the hypothetical solution to the given problem to obtain the required lower bounds on
or
.
Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on or
(or similar quantities) purely from the upper and lower bounds on
or similar quantities (and of various
type norms on such quantities, such as the
bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds in
are yet (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:
- (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
- (ii) Upper and lower bounds on the magnitude of
are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of
or better); but
- (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.
I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.
In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which lets one make very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artifically insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.
I’ve just uploaded to the arXiv my paper “Every odd number greater than 1 is the sum of at most five primes“, submitted to Mathematics of Computation. The main result of the paper is as stated in the title, and is in the spirit of (though significantly weaker than) the even Goldbach conjecture (every even natural number is the sum of at most two primes) and odd Goldbach conjecture (every odd natural number greater than 1 is the sum of at most three primes). It also improves on a result of Ramaré that every even natural number is the sum of at most six primes. This result had previously also been established by Kaniecki under the additional assumption of the Riemann hypothesis, so one can view the main result here as an unconditional version of Kaniecki’s result.
The method used is the Hardy-Littlewood circle method, which was for instance also used to prove Vinogradov’s theorem that every sufficiently large odd number is the sum of three primes. Let’s quickly recall how this argument works. It is convenient to use a proxy for the primes, such as the von Mangoldt function , which is mostly supported on the primes. To represent a large number
as the sum of three primes, it suffices to obtain a good lower bound for the sum
By Fourier analysis, one can rewrite this sum as an integral
where
and . To control this integral, one then needs good bounds on
for various values of
. To do this, one first approximates
by a rational
with controlled denominator (using a tool such as the Dirichlet approximation theorem)
. The analysis then broadly bifurcates into the major arc case when
is small, and the minor arc case when
is large. In the major arc case, the problem more or less boils down to understanding sums such as
which in turn is almost equivalent to understanding the prime number theorem in arithmetic progressions modulo . In the minor arc case, the prime number theorem is not strong enough to give good bounds (unless one is using some extremely strong hypotheses, such as the generalised Riemann hypothesis), so instead one uses a rather different method, using truncated versions of divisor sum identities such as
to split
into a collection of linear and bilinear sums that are more tractable to bound, typical examples of which (after using a particularly simple truncated divisor sum identity known as Vaughan’s identity) include the “Type I sum”
and the “Type II sum”
After using tools such as the triangle inequality or Cauchy-Schwarz inequality to eliminate arithmetic functions such as or
, one ends up controlling plain exponential sums such as
, which can be efficiently controlled in the minor arc case.
This argument works well when is extremely large, but starts running into problems for moderate sized
, e.g.
. The first issue is that of logarithmic losses in the minor arc estimates. A typical minor arc estimate takes the shape
is close to
for some
. This only improves upon the trivial estimate
from the prime number theorem when
. As a consequence, it becomes necessary to obtain an accurate prime number theorem in arithmetic progressions with modulus as large as
. However, with current technology, the error term in such theorems are quite poor (terms such as
for some small
are typical, and there is also a notorious “Siegel zero” problem), and as a consequence, the method is generally only applicable for very large
. For instance, the best explicit result of Vinogradov type known currently is due to Liu and Wang, who established that all odd numbers larger than
are the sum of three odd primes. (However, on the assumption of the GRH, the full odd Goldbach conjecture is known to be true; this is a result of Deshouillers, Effinger, te Riele, and Zinoviev.)
In this paper, we make a number of refinements to the general scheme, each one of which is individually rather modest and not all that novel, but which when added together turn out to be enough to resolve the five primes problem (though many more ideas would still be needed to tackle the three primes problem, and as is well known the circle method is very unlikely to be the route to make progress on the two primes problem). The first refinement, which is only available in the five primes case, is to take advantage of the numerical verification of the even Goldbach conjecture up to some large (we take
, using a verification of Richstein, although there are now much larger values of
– as high as
– for which the conjecture has been verified). As such, instead of trying to represent an odd number
as the sum of five primes, we can represent it as the sum of three odd primes and a natural number between
and
. This effectively brings us back to the three primes problem, but with the significant additional boost that one can essentially restrict the frequency variable
to be of size
. In practice, this eliminates all of the major arcs except for the principal arc around
. This is a significant simplification, in particular avoiding the need to deal with the prime number theorem in arithmetic progressions (and all the attendant theory of L-functions, Siegel zeroes, etc.).
In a similar spirit, by taking advantage of the numerical verification of the Riemann hypothesis up to some height , and using the explicit formula relating the von Mangoldt function with the zeroes of the zeta function, one can safely deal with the principal major arc
. For our specific application, we use the value
, arising from the verification of the Riemann hypothesis of the first
zeroes by van de Lune (unpublished) and Wedeniswki. (Such verifications have since been extended further, the latest being that the first
zeroes lie on the line.)
To make the contribution of the major arc as efficient as possible, we borrow an idea from a paper of Bourgain, and restrict one of the three primes in the three-primes problem to a somewhat shorter range than the other two (of size instead of
, where we take
to be something like
), as this largely eliminates the “Archimedean” losses coming from trying to use Fourier methods to control convolutions on
. In our paper, we set the scale parameter
to be
(basically, anything that is much larger than
but much less than
will work), but we found that an additional gain (which we ended up not using) could be obtained by averaging
over a range of scales, say between
and
. This sort of averaging could be a useful trick in future work on Goldbach-type problems.
It remains to treat the contribution of the “minor arc” . To do this, one needs good
and
type estimates on the exponential sum
. Plancherel’s theorem gives an
estimate which loses a logarithmic factor, but it turns out that on this particular minor arc one can use tools from the theory of the large sieve (such as Montgomery’s uncertainty principle) to eliminate this logarithmic loss almost completely; it turns out that the most efficient way to do this is use an effective upper bound of Siebert on the number of prime pairs
less than
to obtain an
bound that only loses a factor of
(or of
, once one cuts out the major arc).
For estimates, it turns out that existing effective versions of (1) (in particular, the bound given by Chen and Wang) are insufficient, due to the three logarithmic factors of
in the bound. By using a smoothed out version
of the sum
, for some suitable cutoff function
, one can save one factor of a logarithm, obtaining a bound of the form
with effective constants. One can improve the constants further by restricting all summations to odd integers (which barely affects , since
was mostly supported on odd numbers anyway), which in practice reduces the effective constants by a factor of two or so. One can also make further improvements in the constants by using the very sharp large sieve inequality to control the “Type II” sums that arise from Vaughan’s identity, and by using integration by parts to improve the bounds on the “Type I” sums. A final gain can then be extracted by optimising the cutoff parameters
appearing in Vaughan’s identity to minimise the contribution of the Type II sums (which, in practice, are the dominant term). Combining all these improvements, one ends up with bounds of the shape
when is small (say
) and
when is large (say
). (See the paper for more explicit versions of these estimates.) The point here is that the
factors have been partially replaced by smaller logarithmic factors such as
or
. Putting together all of these improvements, one can finally obtain a satisfactory bound on the minor arc. (There are still some terms with a
factor in them, but we use the effective Vinogradov theorem of Liu and Wang to upper bound
by
, which ends up making the remaining terms involving
manageable.)
Let be an element of the unit circle, let
, and let
. We define the (rank one) Bohr set
to be the set
where is the distance to the origin in the unit circle (or equivalently, the distance to the nearest integer, after lifting up to
). These sets play an important role in additive combinatorics and in additive number theory. For instance, they arise naturally when applying the circle method, because Bohr sets describe the oscillation of exponential phases such as
.
Observe that Bohr sets enjoy the doubling property
thus doubling the Bohr set doubles both the length parameter and the radius parameter
. As such, these Bohr sets resemble two-dimensional balls (or boxes). Indeed, one can view
as the preimage of the two-dimensional box
under the homomorphism
.
Another class of finite set with two-dimensional behaviour is the class of (rank two) generalised arithmetic progressions
with and
Indeed, we have
and so we see, as with the Bohr set, that doubling the generalised arithmetic progressions doubles the two defining parameters of that progression.
More generally, there is an analogy between rank Bohr sets
and the rank generalised arithmetic progressions
One of the aims of additive combinatorics is to formalise analogies such as the one given above. By using some arguments from the geometry of numbers, for instance, one can show that for any rank Bohr set
, there is a rank
generalised arithmetic progression
for which one has the containments
for some explicit depending only on
(in fact one can take
); this is (a slight modification of) Lemma 4.22 of my book with Van Vu.
In the special case when , one can make a significantly more detailed description of the link between rank one Bohr sets and rank two generalised arithmetic progressions, by using the classical theory of continued fractions, which among other things gives a fairly precise formula for the generators
and lengths
of the generalised arithmetic progression associated to a rank one Bohr set
. While this connection is already implicit in the continued fraction literature (for instance, in the classic text of Hardy and Wright), I thought it would be a good exercise to work it out explicitly and write it up, which I will do below the fold.
It is unfortunate that the theory of continued fractions is restricted to the rank one setting (it relies very heavily on the total ordering of one-dimensional sets such as or
). A higher rank version of the theory could potentially help with questions such as the Littlewood conjecture, which remains open despite a substantial amount of effort and partial progress on the problem. At the end of this post I discuss how one can use the rank one theory to rephrase the Littlewood conjecture as a conjecture about a doubly indexed family of rank four progressions, which can be used to heuristically justify why this conjecture should be true, but does not otherwise seem to shed much light on the problem.
One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function is restricted to a narrow region of physical space, then its Fourier transform
must be necessarily “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.
In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, which is useful in analytic number theory contexts. Specifically, suppose we are given a complex-valued function on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support
of this function is finite (in practice, we will only work with functions that are supported in an interval
for some natural numbers
). Then we can define the Fourier transform
by the formula
where . (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)
The classical uncertainty principle, in this context, asserts that if is localised in an interval of length
, then
must be “smeared out” at a scale of at least
(and essentially constant at scales less than
). For instance, if
is supported in
, then we have the Plancherel identity
for each frequency , and in particular that
for any arc in the unit circle (with
denoting the length of
). In particular, an interval of length significantly less than
can only capture a fraction of the
energy of the Fourier transform of
, which is consistent with the above informal statement of the uncertainty principle.
Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan and Selberg: if is supported in
, and
are frequencies in
that are
-separated for some
, thus
for all
(where
denotes the distance of
to the origin in
), then
is essentially constant at scales less than
. The factor
can in fact be amplified a little bit to
, which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates
to
(and replaces each frequency
by their
roots), and then sending
(cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.
In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric on the integers
(in particular, the parameter
is essentially the Archimedean diameter of the support of
). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the
-adic metrics play an equally important role; indeed, it is common to unify the Archimedean and
-adic perspectives together into a unified adelic perspective. In the
-adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of
. Intersecting these balls from different
-adic metrics together, we obtain residue classes with respect to various moduli
(which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo
“. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).
In this context, the uncertainty principle is this: the more residue classes modulo that
avoids, the more the Fourier transform
must spread out along multiples of
. To illustrate a very simple example of this principle, let us take
, and suppose that
is supported only on odd numbers (thus it completely avoids the residue class
). We write out the formulae for
and
:
If is supported on the odd numbers, then
is always equal to
on the support of
, and so we have
. Thus, whenever
has a significant presence at a frequency
, it also must have an equally significant presence at the frequency
; there is a spreading out across multiples of
. Note that one has a similar effect if
was supported instead on the even integers instead of the odd integers.
A little more generally, suppose now that avoids a single residue class modulo a prime
; for sake of argument let us say that it avoids the zero residue class
, although the situation for the other residue classes is similar. For any frequency
and any
, one has
From basic Fourier analysis, we know that the phases sum to zero as
ranges from
to
whenever
is not a multiple of
. We thus have
In particular, if is large, then one of the other
has to be somewhat large as well; using the Cauchy-Schwarz inequality, we can quantify this assertion in an
sense via the inequality
Let us continue this analysis a bit further. Now suppose that avoids
residue classes modulo a prime
, for some
. (We exclude the case
as it is clearly degenerates by forcing
to be identically zero.) Let
be the function that equals
on these residue classes and zero away from these residue classes, then
Using the periodic Fourier transform, we can write
for some coefficients , thus
Some Fourier-analytic computations reveal that
and
and so after some routine algebra and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):
Thus we see that the more residue classes mod we exclude, the more Fourier energy has to disperse along multiples of
. It is also instructive to consider the extreme case
, in which
is supported on just a single residue class
; in this case, one clearly has
, and so
spreads its energy completely evenly along multiples of
.
In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:
Proposition 1 (Montgomery’s uncertainty principle) Let
be a finitely supported function which, for each prime
, avoids
residue classes modulo
for some
. Then for each natural number
, one has
where
is the quantity
where
is the Möbius function.
We give a proof of this proposition below the fold.
Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the -adic senses. This leads to the following corollary:
Corollary 2 (Arithmetic large sieve inequality) Let
be a function supported on an interval
which, for each prime
, avoids
residue classes modulo
for some
. Let
, and let
be a finite set of natural numbers. Suppose that the frequencies
with
,
, and
are
-separated. Then one has
where
was defined in (4).
Indeed, from the large sieve inequality one has
while from Proposition 1 one has
whence the claim.
There is a great deal of flexibility in the above inequality, due to the ability to select the set , the frequencies
, the omitted classes
, and the separation parameter
. Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:
Corollary 3 (Large sieve) Let
be a set of integers contained in
which avoids
residue classes modulo
for each prime
, and let
. Then
where
Proof: We apply Corollary 2 with ,
,
,
, and
. The key point is that the Farey sequence of fractions
with
and
is
-separated, since
whenever are distinct fractions in this sequence.
If, for instance, is the set of all primes in
larger than
, then one can set
for all
, which makes
, where
is the Euler totient function. It is a classical estimate that
Using this fact and optimising in , we obtain (a special case of) the Brun-Titchmarsh inequality
where is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality
for any primitive residue class , where
is the number of primes less than or equal to
that are congruent to
. By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality
for any natural numbers with
. This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.
I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.
In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as or
, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.
The definition of quasirandomness is easy enough to state:
Definition 1 (Quasirandom groups) Let
be a finite group, and let
. We say that
is
-quasirandom if all non-trivial unitary representations
of
have dimension at least
. (Recall a representation is trivial if
is the identity for all
.)
Exercise 1 Let
be a finite group, and let
. A unitary representation
is said to be irreducible if
has no
-invariant subspaces other than
and
. Show that
is
-quasirandom if and only if every non-trivial irreducible representation of
has dimension at least
.
Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group
to obtain mixing estimates in that group.)
Quasirandomness behaves fairly well with respect to quotients and short exact sequences:
Exercise 2 Let
be a short exact sequence of finite groups
.
- (i) If
is
-quasirandom, show that
is
-quasirandom also. (Equivalently: any quotient of a
-quasirandom finite group is again a
-quasirandom finite group.)
- (ii) Conversely, if
and
are both
-quasirandom, show that
is
-quasirandom also. (In particular, the direct or semidirect product of two
-quasirandom finite groups is again a
-quasirandom finite group.)
Informally, we will call quasirandom if it is
-quasirandom for some “large”
, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “
for some constant
independent of the size of
“, but other regimes of
are certainly of interest.
The way we have set things up, the trivial group is infinitely quasirandom (i.e. it is
-quasirandom for every
). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:
Exercise 3 Let
, and let
be a finite
-quasirandom group.
- (i) Show that if
is non-trivial, then
. (Hint: use the mean zero component
of the regular representation
.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
- (ii) Show that any proper subgroup
of
has index
. (Hint: use the mean zero component of the quasiregular representation.)
The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:
Exercise 4 (Quasirandomness, abelianness, and perfection) Let
be a finite group.
- (i) If
is abelian and non-trivial, show that
is not
-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
- (ii) Show that
is
-quasirandom if and only if it is perfect, i.e. the commutator group
is equal to
. (Equivalently,
is
-quasirandom if and only if it has no non-trivial abelian quotients.)
Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.
Exercise 5 Let
be a finite
-quasirandom group. Show that for any subgroup
of
,
is
-quasirandom, where
is the index of
in
. (Hint: use induced representations.)
Now we give an example of a more quasirandom group.
Lemma 2 (Frobenius lemma) If
is a field of some prime order
, then
is
-quasirandom.
This should be compared with the cardinality of the special linear group, which is easily computed to be
.
Proof: We may of course take to be odd. Suppose for contradiction that we have a non-trivial representation
on a unitary group of some dimension
with
. Set
to be the group element
and suppose first that is non-trivial. Since
, we have
; thus all the eigenvalues of
are
roots of unity. On the other hand, by conjugating
by diagonal matrices in
, we see that
is conjugate to
(and hence
conjugate to
) whenever
is a quadratic residue mod
. As such, the eigenvalues of
must be permuted by the operation
for any quadratic residue mod
. Since
has at least one non-trivial eigenvalue, and there are
distinct quadratic residues, we conclude that
has at least
distinct eigenvalues. But
is a
matrix with
, a contradiction. Thus
lies in the kernel of
. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate
(see exercise below), and so
is trivial, a contradiction.
Exercise 6 Show that for any prime
, the unipotent matrices
for
ranging over
generate
as a group.
Exercise 7 Let
be a finite group, and let
. If
is generated by a collection
of
-quasirandom subgroups, show that
is itself
-quasirandom.
Exercise 8 Show that
is
-quasirandom for any
and any prime
. (This is not sharp; the optimal bound here is
, which follows from the results of Landazuri and Seitz.)
As a corollary of the above results and Exercise 2, we see that the projective special linear group is also
-quasirandom.
Remark 2 One can ask whether the bound
in Lemma 2 is sharp, assuming of course that
is odd. Noting that
acts linearly on the plane
, we see that it also acts projectively on the projective line
, which has
elements. Thus
acts via the quasiregular representation on the
-dimensional space
, and also on the
-dimensional subspace
; this latter representation (known as the Steinberg representation) is irreducible. This shows that the
bound cannot be improved beyond
. More generally, given any character
,
acts on the
-dimensional space
of functions
that obey the twisted dilation invariance
for all
and
; these are known as the principal series representations. When
is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when
is the quadratic representation (thus taking values in
while being non-trivial), the principal series representation splits into the direct sum of two
-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed
in a quadratic extension
and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension
, showing that the bound in Lemma 2 is in fact exactly sharp. These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Luzstig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).
Exercise 9 Let
be an odd prime. Show that for any
, the alternating group
is
-quasirandom. (Hint: show that all cycles of order
in
are conjugate to each other in
(and not just in
); in particular, a cycle is conjugate to its
power for all
. Also, as
,
is simple, and so the cycles of order
generate the entire group.)
Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that
is
-quasirandom for
(but is only
-quasirandom for
due to icosahedral symmetry, and
-quasirandom for
due to lack of perfectness). Using Exercise 3 with the index
subgroup
, we see that the bound
cannot be improved. Thus,
(for large
) is not as quasirandom as the special linear groups
(for
large and
bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.
If one replaces the alternating group
with the slightly larger symmetric group
, then quasirandomness is destroyed (since
, having the abelian quotient
, is not perfect); indeed,
is
-quasirandom and no better.
Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group
, an alternating group
, or is a finite simple group of Lie type such as
. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice to say for now that such groups are constructed from reductive algebraic groups, for instance
is constructed from
in characteristic
.) In the case of finite simple groups
of Lie type with bounded rank
, it is known from the work of Landazuri and Seitz that such groups are
-quasirandom for some
depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group
is
-quasirandom for some
and
is sufficiently large depending on
, then
is a finite simple group of Lie type with rank
. It would be of interest to see if there was an alternate way to establish this fact that did not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).
A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in
for any fixed
).

Recent Comments