
One of the basic general problems in analytic number theory is to understand as much as possible the fluctuations of the Möbius function ${\mu(n)}$, defined as ${(-1)^k}$ when ${n}$ is the product of ${k}$ distinct primes, and zero otherwise. For instance, as ${\mu}$ takes values in ${\{-1,0,1\}}$, we have the trivial bound

$\displaystyle |\sum_{n \leq x} \mu(n)| \leq x$

and the seemingly slight improvement

$\displaystyle \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)$

is already equivalent to the prime number theorem, as observed by Landau (see e.g. this previous blog post for a proof), while the much stronger (and still open) improvement

$\displaystyle \sum_{n \leq x} \mu(n) = O(x^{1/2+o(1)})$

is equivalent to the notorious Riemann hypothesis.
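As a concrete illustration of these scales (my own sketch, not part of the original discussion), one can compute ${\mu}$ with a simple sieve and watch the partial sums against the conjectured square-root barrier; the cutoffs below are arbitrary:

```python
# Sieve computation of the Moebius function mu(n) for n <= N, followed by the
# partial sums M(x) = sum_{n <= x} mu(n); the Riemann hypothesis is equivalent
# to M(x) = O(x^{1/2+o(1)}).  N is an arbitrary illustrative cutoff.

def mobius_sieve(N):
    """Return a list mu with mu[n] for 0 <= n <= N (mu[0] is unused)."""
    mu = [1] * (N + 1)
    composite = [False] * (N + 1)
    for p in range(2, N + 1):
        if not composite[p]:  # p is prime
            for k in range(p, N + 1, p):
                composite[k] = True
                mu[k] = -mu[k]  # one more prime factor: flip the sign
            for k in range(p * p, N + 1, p * p):
                mu[k] = 0       # divisible by p^2: not squarefree
    return mu

N = 10**5
mu = mobius_sieve(N)
sums, partial = {}, 0
for n in range(1, N + 1):
    partial += mu[n]
    if n in (10**3, 10**4, 10**5):
        sums[n] = partial
for x in sorted(sums):
    print(x, sums[x], abs(sums[x]) / x**0.5)  # last column stays bounded
```

The partial sums here (the Mertens function) stay far smaller than the trivial bound ${x}$, consistent with the heuristics below.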

There is a general Möbius pseudorandomness heuristic that suggests that the sign pattern ${\mu}$ behaves so randomly (or pseudorandomly) that one should expect a substantial amount of cancellation in sums that involve the sign fluctuation of the Möbius function in a nontrivial fashion, with the amount of cancellation present comparable to the amount that an analogous random sum would provide; cf. the probabilistic heuristic discussed in this recent blog post. There are a number of ways to make this heuristic precise. As already mentioned, the Riemann hypothesis can be considered one such manifestation of the heuristic. Another manifestation is the following old conjecture of Chowla:

Conjecture 1 (Chowla’s conjecture) For any fixed integer ${m}$ and exponents ${a_1,a_2,\ldots,a_m \geq 0}$, with at least one of the ${a_i}$ odd (so as not to completely destroy the sign cancellation), we have

$\displaystyle \sum_{n \leq x} \mu(n+1)^{a_1} \ldots \mu(n+m)^{a_m} = o_{x \rightarrow \infty;m}(x).$

Note that as ${\mu^a = \mu^{a+2}}$ for any ${a \geq 1}$, we can reduce to the case when the ${a_i}$ take values in ${0,1,2}$ here. When only one of the ${a_i}$ is odd, this is essentially the prime number theorem in arithmetic progressions (after some elementary sieving), but when two or more of the ${a_i}$ are odd, the problem becomes completely open. For instance, the estimate

$\displaystyle \sum_{n \leq x} \mu(n) \mu(n+2) = o(x)$

is morally very close to the conjectured asymptotic

$\displaystyle \sum_{n \leq x} \Lambda(n) \Lambda(n+2) = 2\Pi_2 x + o(x)$

for the von Mangoldt function ${\Lambda}$, where ${\Pi_2 := \prod_{p > 2} (1 - \frac{1}{(p-1)^2}) = 0.66016\ldots}$ is the twin prime constant; this asymptotic in turn implies the twin prime conjecture. (To formally deduce estimates for von Mangoldt from estimates for Möbius, though, typically requires some better control on the error terms than ${o()}$, in particular gains of some power of ${\log x}$ are usually needed. See this previous blog post for more discussion.)
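To get a feel for the size of these correlation sums, here is a small numerical experiment (an illustration of my own; the cutoff and the tolerance are arbitrary choices, not bounds coming from the text):

```python
# Empirical size of the two-point Chowla sum  sum_{n<=x} mu(n) mu(n+2),
# which is conjectured to be o(x).  mu is computed by trial division,
# which is perfectly adequate at this small scale.

def mobius(n):
    """Moebius function by trial division."""
    mu, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # square factor: mu vanishes
            mu = -mu
        d += 1
    if n > 1:
        mu = -mu  # one remaining large prime factor
    return mu

X = 10**4
S = sum(mobius(n) * mobius(n + 2) for n in range(1, X + 1))
print(S, S / X)  # normalized correlation is already much smaller than 1
```

Of course, a small numerical value at one cutoff proves nothing; the point is only that the cancellation predicted by the pseudorandomness heuristic is visible in the data.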

Remark 1 The Chowla conjecture resembles an assertion that, for ${n}$ chosen randomly and uniformly from ${1}$ to ${x}$, the random variables ${\mu(n+1),\ldots,\mu(n+m)}$ become asymptotically independent of each other (in the probabilistic sense) as ${x \rightarrow \infty}$. However, this is not quite accurate, because some moments (namely those with all exponents ${a_i}$ even) have the “wrong” asymptotic value, leading to some unwanted correlation between these variables. For instance, the events ${\mu(n)=0}$ and ${\mu(n+4)=0}$ have a strong correlation with each other, basically because they are both strongly correlated with the event of ${n}$ being divisible by ${4}$. A more accurate interpretation of the Chowla conjecture is that the random variables ${\mu(n+1),\ldots,\mu(n+m)}$ are asymptotically conditionally independent of each other, after conditioning on the zero pattern ${\mu(n+1)^2,\ldots,\mu(n+m)^2}$; thus, it is the sign of the Möbius function that fluctuates like random noise, rather than the zero pattern. (The situation is a bit cleaner if one works with the Liouville function ${\lambda}$ instead of the Möbius function ${\mu}$, as this function never vanishes, but we will stick to the traditional Möbius function formalism here.)

A more recent formulation of the Möbius randomness heuristic is the following conjecture of Sarnak. Given a bounded sequence ${f: {\bf N} \rightarrow {\bf C}}$, define the topological entropy of the sequence to be the least exponent ${\sigma}$ with the property that, for any fixed ${\epsilon > 0}$ and for ${m}$ going to infinity, the set ${\{ (f(n+1),\ldots,f(n+m)): n \in {\bf N} \} \subset {\bf C}^m}$ of ${m}$-blocks of ${f}$ can be covered by ${O( \exp( \sigma m + o(m) ) )}$ balls of radius ${\epsilon}$. (If ${f}$ arises from a minimal topological dynamical system ${(X,T)}$ by ${f(n) := F(T^n x)}$ for some continuous ${F: X \rightarrow {\bf C}}$ and some ${x \in X}$, the above notion is equivalent to the usual notion of the topological entropy of a dynamical system.) For instance, if the sequence is a bit sequence (i.e. it takes values in ${\{0,1\}}$), then there are only ${\exp(\sigma m + o(m))}$ ${m}$-bit patterns that can appear as blocks of ${m}$ consecutive bits in this sequence. As a special case, a Turing machine with bounded memory that had access to a random number generator at the rate of one random bit produced every ${T}$ units of time, but otherwise evolved deterministically, would have an output sequence that had a topological entropy of at most ${\frac{1}{T} \log 2}$. A bounded sequence is said to be deterministic if its topological entropy is zero. A typical example is a polynomial sequence such as ${f(n) := e^{2\pi i \alpha n^2}}$ for some fixed ${\alpha}$; the ${m}$-blocks of such a polynomial sequence have covering numbers that only grow polynomially in ${m}$, rather than exponentially, thus yielding the zero entropy. Unipotent flows, such as the horocycle flow on a compact hyperbolic surface, are another good source of deterministic sequences.
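The block-counting definition can be made concrete with a toy comparison (my own illustration; the Thue-Morse sequence stands in for a deterministic example, and the window sizes are arbitrary):

```python
# Distinct m-blocks of a deterministic sequence versus a random bit sequence:
# the Thue-Morse sequence t(n) = (number of binary 1s of n) mod 2 has only
# linearly many m-blocks (zero entropy), while a random bit sequence exhibits
# essentially all 2^m of them once enough of the sequence is seen.

import random

def thue_morse(N):
    return [bin(n).count("1") % 2 for n in range(N)]

def block_count(seq, m):
    """Number of distinct length-m windows occurring in seq."""
    return len({tuple(seq[i:i + m]) for i in range(len(seq) - m + 1)})

random.seed(0)  # fixed seed for reproducibility
N = 2**14
tm = thue_morse(N)
rnd = [random.randrange(2) for _ in range(N)]
for m in (4, 8, 12):
    print(m, block_count(tm, m), block_count(rnd, m))
```

The Thue-Morse column grows roughly linearly in ${m}$ while the random column tracks ${2^m}$, matching the zero-entropy versus positive-entropy dichotomy.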

Conjecture 2 (Sarnak’s conjecture) Let ${f: {\bf N} \rightarrow {\bf C}}$ be a deterministic bounded sequence. Then

$\displaystyle \sum_{n \leq x} \mu(n) f(n) = o_{x \rightarrow \infty;f}(x).$

This conjecture in general is still quite far from being solved. However, special cases are known:

• For constant sequences, this is essentially the prime number theorem (1).
• For periodic sequences, this is essentially the prime number theorem in arithmetic progressions.
• For quasiperiodic sequences such as ${f(n) = F(\alpha n \hbox{ mod } 1)}$ for some continuous ${F}$, this follows from the work of Davenport.
• For nilsequences, this is a result of Ben Green and myself.
• For horocycle flows, this is a result of Bourgain, Sarnak, and Ziegler.
• For the Thue-Morse sequence, this is a result of Dartyge-Tenenbaum (with a stronger error term obtained by Mauduit-Rivat). A subsequent result of Bourgain handles all bounded rank one sequences (though the Thue-Morse sequence is actually of rank two), and a related result of Green establishes asymptotic orthogonality of the Möbius function to bounded depth circuits, although such functions are not necessarily deterministic in nature.
• For the Rudin-Shapiro sequence, I sketched out an argument at this MathOverflow post.
• The Möbius function is known to itself be non-deterministic, because its square ${\mu^2(n)}$ (i.e. the indicator function of the squarefree numbers) is known to be non-deterministic (indeed, its topological entropy is ${\frac{6}{\pi^2}\log 2}$). (The corresponding question for the Liouville function ${\lambda(n)}$, however, remains open, as the square ${\lambda^2(n)=1}$ has zero entropy.)
• In the converse direction, it is easy to construct sequences of arbitrarily small positive entropy that correlate with the Möbius function (a rather silly example is ${\mu(n) 1_{k|n}}$ for some fixed large (squarefree) ${k}$, which has topological entropy at most ${\log 2/k}$ but clearly correlates with ${\mu}$).

See this survey of Sarnak for further discussion of this and related topics.

In this post I wanted to give a very nice argument of Sarnak that links the above two conjectures:

Proposition 3 The Chowla conjecture implies the Sarnak conjecture.

The argument does not use any number-theoretic properties of the Möbius function; one could replace ${\mu}$ in both conjectures by any other function from the natural numbers to ${\{-1,0,+1\}}$ and obtain the same implication. The argument consists of the following ingredients:

1. To show that ${\sum_{n \leq x} \mu(n) f(n) = o(x)}$, it suffices to show that the expectation of the random variable ${\frac{1}{m} (\mu(n+1)f(n+1)+\ldots+\mu(n+m)f(n+m))}$, where ${n}$ is drawn uniformly at random from ${1}$ to ${x}$, can be made arbitrarily small by making ${m}$ large (and ${n}$ even larger).
2. By the union bound and the zero topological entropy of ${f}$, it suffices to show that for any bounded deterministic coefficients ${c_1,\ldots,c_m}$, the random variable ${\frac{1}{m}(c_1 \mu(n+1) + \ldots + c_m \mu(n+m))}$ concentrates with exponentially high probability.
3. Finally, this exponentially high concentration can be achieved by the moment method, using a slight variant of the moment method proof of the large deviation estimates such as the Chernoff inequality or Hoeffding inequality (as discussed in this blog post).
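Step 3 can be illustrated by a toy simulation (my own sketch, not the actual proof), with independent random signs standing in for the conjecturally pseudorandom Möbius signs; all parameters are arbitrary:

```python
# Hoeffding-type concentration for (1/m)(c_1 eps_1 + ... + c_m eps_m) with
# bounded deterministic coefficients c_i and independent random signs eps_i:
# the tail probability is exponentially small in m, as predicted by
#   P(|S/m| > t) <= 2 exp(-(m t)^2 / (2 sum_i c_i^2)).

import math
import random

random.seed(1)  # fixed seed for reproducibility
m = 400
c = [math.cos(i) for i in range(m)]  # arbitrary bounded coefficients
trials, t = 5000, 0.1

exceed = 0
for _ in range(trials):
    s = sum(ci * random.choice((-1, 1)) for ci in c) / m
    if abs(s) > t:
        exceed += 1

empirical = exceed / trials
hoeffding = 2 * math.exp(-(m * t) ** 2 / (2 * sum(ci * ci for ci in c)))
print(empirical, hoeffding)  # empirical tail sits below the Hoeffding bound
```

The point of step 2 is then that zero entropy leaves so few choices of coefficient vectors ${(c_1,\ldots,c_m)}$ that this exponentially small failure probability survives a union bound over all of them.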

As is often the case, though, while the “top-down” order of steps presented above is perhaps the clearest way to think conceptually about the argument, in order to present the argument formally it is more convenient to present the arguments in the reverse (or “bottom-up”) order. This is the approach taken below the fold.

There has been a lot of recent interest in the abc conjecture, since the release a few weeks ago of the last of a series of papers by Shinichi Mochizuki which, as one of its major applications, claims to establish this conjecture. It’s still far too early to judge whether this proof is likely to be correct or not (the entire argument encompasses several hundred pages of argument, mostly in the area of anabelian geometry, which very few mathematicians are expert in, to the extent that we still do not even have a full outline of the proof strategy yet), and I don’t have anything substantial to add to the existing discussion around that conjecture. (But, for those that are interested, the Polymath wiki page on the ABC conjecture has collected most of the links to that discussion, and to various background materials.)

In the meantime, though, I thought I might give the standard probabilistic heuristic argument that explains why we expect the ABC conjecture to be true. The underlying heuristic is a common one, used throughout number theory, and it can be summarised as follows:

Heuristic 1 (Probabilistic heuristic) Even though number theory is a deterministic subject (one does not need to roll any dice to factorise a number, or figure out if a number is prime), one expects to get a good asymptotic prediction for the answers to many number-theoretic questions by pretending that various number-theoretic assertions ${E}$ (e.g. that a given number ${n}$ is prime) are probabilistic events (with a probability ${\mathop{\bf P}(E)}$ that can vary between ${0}$ and ${1}$) rather than deterministic events (that are either always true or always false). Furthermore:

• (Basic heuristic) If two or more of these heuristically probabilistic events have no obvious reason to be strongly correlated to each other, then we should expect them to behave as if they were (jointly) independent.
• (Advanced heuristic) If two or more of these heuristically probabilistic events have some obvious correlation between them, but no further correlations are suspected, then we should expect them to behave as if they were conditionally independent, relative to whatever data is causing the correlation.

This is, of course, an extremely vague and completely non-rigorous heuristic, requiring (among other things) a subjective and ad hoc determination of what an “obvious reason” is, but in practice it tends to give remarkably plausible predictions, some fraction of which can in fact be backed up by rigorous argument (although in many cases, the actual argument has almost nothing in common with the probabilistic heuristic). A famous special case of this heuristic is the Cramér random model for the primes, but this is not the only such instance for that heuristic.

To give the most precise predictions, one should use the advanced heuristic in Heuristic 1, but this can be somewhat complicated to execute, and so we shall focus instead on the predictions given by the basic heuristic (thus ignoring the presence of some number-theoretic correlations), which tends to give predictions that are quantitatively inaccurate but still reasonably good at the qualitative level.

Here is a basic “corollary” of Heuristic 1:

Heuristic 2 (Heuristic Borel-Cantelli) Suppose one has a sequence ${E_1, E_2, \ldots}$ of number-theoretic statements, which we heuristically interpret as probabilistic events with probabilities ${\mathop{\bf P}(E_1), \mathop{\bf P}(E_2), \ldots}$. Suppose also that we know of no obvious reason for these events to have much of a correlation with each other. Then:

• If ${\sum_{i=1}^\infty \mathop{\bf P}(E_i) < \infty}$, we expect only finitely many of the statements ${E_n}$ to be true. (And if ${\sum_{i=1}^\infty \mathop{\bf P}(E_i)}$ is much smaller than ${1}$, we in fact expect none of the ${E_n}$ to be true.)
• If ${\sum_{i=1}^\infty \mathop{\bf P}(E_i) = \infty}$, we expect infinitely many of the statements ${E_n}$ to be true.

This heuristic is motivated both by the Borel-Cantelli lemma, and by the standard probabilistic computation that if one is given jointly independent, and genuinely probabilistic, events ${E_1, E_2, \ldots}$ with ${\sum_{i=1}^\infty \mathop{\bf P}(E_i) = \infty}$, then one almost surely has an infinite number of the ${E_i}$ occurring.
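As a sanity check of this probabilistic computation, here is a toy simulation of my own, with arbitrarily chosen probability sequences:

```python
# Genuinely independent events E_i with P(E_i) = 1/i (divergent sum) keep
# occurring throughout the range, while events with P(E_i) = 1/i^2
# (convergent sum) occur only a few times, almost all of them very early --
# the Borel-Cantelli dichotomy.

import random

random.seed(2)  # fixed seed for reproducibility
N = 10**5
divergent = [i for i in range(1, N + 1) if random.random() < 1 / i]
convergent = [i for i in range(1, N + 1) if random.random() < 1 / i**2]
print(len(divergent), divergent[-5:])   # occurrences persist to large i
print(len(convergent), convergent)      # occurrences die out quickly
```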

Before we get to the ABC conjecture, let us give two simpler (and well known) demonstrations of these heuristics in action:

Example 1 (Twin prime conjecture) One can heuristically justify the twin prime conjecture as follows. Using the prime number theorem, one can heuristically assign a probability of ${1/\log n}$ to the event that any given large integer ${n}$ is prime. In particular, the probability that ${n+2}$ is prime will then be ${1/\log(n+2)}$. Making the assumption that there are no strong correlations between these events, we are led to the prediction that the probability that ${n}$ and ${n+2}$ are simultaneously prime is ${\frac{1}{(\log n)(\log(n+2))}}$. Since ${\sum_{n=2}^\infty \frac{1}{(\log n)(\log(n+2))} = \infty}$, the Borel-Cantelli heuristic then predicts that there should be infinitely many twin primes.

Note that the above argument is a bit too naive, because there are some non-trivial correlations between the primality of ${n}$ and the primality of ${n+2}$. Most obviously, if ${n}$ is prime, this greatly increases the probability that ${n}$ is odd, which implies that ${n+2}$ is odd, which then elevates the probability that ${n+2}$ is prime. A bit more subtly, if ${n}$ is prime, then ${n}$ is likely to avoid the residue class ${0 \hbox{ mod } 3}$, which means that ${n+2}$ avoids the residue class ${2 \hbox{ mod } 3}$, which ends up decreasing the probability that ${n+2}$ is prime. However, there is a standard way to correct for these local correlations; see for instance in this previous blog post. As it turns out, these local correlations ultimately alter the prediction for the asymptotic density of twin primes by a constant factor (the twin prime constant), but do not affect the qualitative prediction of there being infinitely many twin primes.
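One can compare the corrected prediction against an actual count (a quick check of my own, with an arbitrary cutoff; the constant is the twin prime constant from the text):

```python
# Twin prime pairs up to N versus the heuristic prediction
# 2 * Pi_2 * N / (log N)^2, with Pi_2 = 0.66016... the twin prime constant.
# The naive density x/(log x)^2 undercounts somewhat (integrating
# dt/(log t)^2 instead would do better), but the order of magnitude is right.

import math

def prime_sieve(N):
    sieve = [True] * (N + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            for k in range(p * p, N + 1, p):
                sieve[k] = False
    return sieve

N = 10**6
is_prime = prime_sieve(N)
twins = sum(1 for n in range(2, N - 1) if is_prime[n] and is_prime[n + 2])
Pi2 = 0.6601618158
prediction = 2 * Pi2 * N / math.log(N) ** 2
print(twins, round(prediction))
```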

Example 2 (Fermat’s last theorem) Let us now heuristically count the number of solutions to ${x^n+y^n=z^n}$ for various ${n}$ and natural numbers ${x,y,z}$ (which we can reduce to be coprime if desired). We recast this (in the spirit of the ABC conjecture) as ${a+b=c}$, where ${a,b,c}$ are ${n^{th}}$ powers. The number of ${n^{th}}$ powers up to any given number ${N}$ is about ${N^{1/n}}$, so heuristically any given natural number ${a}$ has a probability about ${a^{1/n - 1}}$ of being an ${n^{th}}$ power. If we make the naive assumption that (in the coprime case at least) there is no strong correlation between the events that ${a}$ is an ${n^{th}}$ power, that ${b}$ is an ${n^{th}}$ power, and that ${a+b}$ is an ${n^{th}}$ power, then for typical ${a,b}$, the probability that ${a,b,a+b}$ are all simultaneously ${n^{th}}$ powers would then be ${a^{1/n-1} b^{1/n-1} (a+b)^{1/n-1}}$. For fixed ${n}$, the total number of solutions to the Fermat equation would then be predicted to be

$\displaystyle \sum_{a=1}^\infty \sum_{b=1}^\infty a^{1/n-1} b^{1/n-1} (a+b)^{1/n-1}.$

(Strictly speaking, we need to restrict to the coprime case, but given that a positive density of pairs of integers are coprime, it should not affect the qualitative conclusion significantly if we now omit this restriction.) It might not be immediately obvious as to whether this sum converges or diverges, but (as is often the case with these sorts of unsigned sums) one can clarify the situation by dyadic decomposition. Suppose for instance that we consider the portion of the sum where ${c=a+b}$ lies between ${2^k}$ and ${2^{k+1}}$. Then this portion of the sum can be controlled by

$\displaystyle \sum_{a \leq 2^{k+1}} \sum_{b \leq 2^{k+1}} a^{1/n-1} b^{1/n-1} O( ( 2^k )^{1/n - 1} )$

which simplifies to

$\displaystyle O( 2^{(3/n - 1)k} ).$

Summing in ${k}$, one thus expects infinitely many solutions for ${n=2}$, only finitely many solutions for ${n>3}$ (indeed, a refinement of this argument shows that one expects only finitely many solutions even if one considers all ${n>3}$ at once), and a borderline prediction of there being a barely infinite number of solutions when ${n=3}$. Here is of course a place where a naive application of the probabilistic heuristic breaks down; there is enough arithmetic structure in the equation ${x^3+y^3=z^3}$ that the naive probabilistic prediction ends up being an inaccurate model. Indeed, while this heuristic suggests that a typical homogeneous cubic should have a logarithmic number of integer solutions of a given height ${N}$, it turns out that some homogeneous cubics (namely, those associated to elliptic curves of positive rank) end up with the bulk of these solutions, while other homogeneous cubics (including those associated to elliptic curves of zero rank, including the Fermat curve ${x^3+y^3=z^3}$) only get finitely many solutions. The reasons for this are subtle, but certainly the high degree of arithmetic structure present in an elliptic curve (starting with the elliptic curve group law which allows one to generate new solutions from old ones, and which also can be used to exclude solutions to ${x^3+y^3=z^3}$ via the method of descent) is a major contributing factor.
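The dyadic computation can be checked numerically (cutoffs here are my own arbitrary choices; the growth of the partial sums as the cutoff increases distinguishes the divergent and convergent regimes):

```python
# Partial sums of  sum_{a,b} a^{1/n-1} b^{1/n-1} (a+b)^{1/n-1}  over a,b <= A.
# Each dyadic block c ~ 2^k contributes about 2^{(3/n-1)k}, so the sum should
# grow like A^{1/2} for n=2 (quadrupling A roughly doubles it), grow slowly
# for n=3, and approach a finite limit for n>3.

def heuristic_sum(n, A):
    e = 1 / n - 1
    return sum(a**e * b**e * (a + b)**e
               for a in range(1, A + 1) for b in range(1, A + 1))

sums, growth = {}, {}
for n in (2, 3, 4):
    s1, s2 = heuristic_sum(n, 250), heuristic_sum(n, 1000)
    sums[n], growth[n] = s2, s2 / s1
    print(n, round(s1, 2), round(s2, 2), round(growth[n], 3))
```

The growth ratio in the last column is close to ${2}$ for ${n=2}$ and much nearer ${1}$ for ${n=4}$, as the dyadic bound predicts.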

Below the fold, we apply similar heuristics to suggest the truth of the ABC conjecture.

Much as group theory is the study of groups, or graph theory is the study of graphs, model theory is the study of models (also known as structures) of some language ${{\mathcal L}}$ (which, in this post, will always be a single-sorted, first-order language). A structure is a set ${X}$, equipped with one or more operations, constants, and relations. This is of course an extremely general type of mathematical object, but (quite remarkably) one can still say a substantial number of interesting things about very broad classes of structures.

We will observe the common abuse of notation of using the set ${X}$ as a metonym for the entire structure, much as we usually refer to a group ${(G,1,\cdot,()^{-1})}$ simply as ${G}$, a vector space ${(V, 0, +, \cdot)}$ simply as ${V}$, and so forth. Following another common bending of the rules, we also allow some operations on structures (such as the multiplicative inverse operation on a group or field) to only be partially defined, and we allow use of the usual simplifying conventions for mathematical formulas (e.g. writing ${a+b+c}$ instead of ${(a+b)+c}$ or ${a+(b+c)}$, in cases where associativity is known). We will also deviate slightly from the usual practice in logic by emphasising individual structures, rather than the theory of general classes of structures; for instance, we will talk about the theory of a single field such as ${{\bf R}}$ or ${{\bf C}}$, rather than the theory of all fields of a certain type (e.g. real closed fields or algebraically closed fields).

Once one has a structure ${X}$, one can introduce the notion of a definable subset of ${X}$, or more generally of a Cartesian power ${X^n}$ of ${X}$, defined as a set ${E \subset X^n}$ of the form

$\displaystyle E = \{ (x_1,\ldots,x_n): P(x_1,\ldots,x_n) \hbox{ true} \} \ \ \ \ \ (1)$

for some formula ${P}$ in the language ${{\mathcal L}}$ with ${n}$ free variables and any number of constants from ${X}$ (that is, ${P(x_1,\ldots,x_n)}$ is a well-formed formula built up from a finite number of constants ${c_1,\ldots,c_m}$ in ${X}$, the relations and operations on ${X}$, logical connectives such as ${\neg}$, ${\wedge}$, ${\implies}$, and the quantifiers ${\forall, \exists}$). Thus, for instance, in the theory of the arithmetic of the natural numbers ${{\bf N} = ({\bf N}, 0, 1, +, \times)}$, the set of primes ${{\mathcal P}}$ is a definable set, since we have

$\displaystyle {\mathcal P} = \{ x \in {\bf N}: (\exists y: x=y+2) \wedge \neg (\exists z,w: x = (z+2)(w+2)) \}.$
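As a sanity check (a small script of my own), one can verify directly that this first-order formula does define the primes:

```python
# x satisfies the formula iff (exists y: x = y+2) and not(exists z,w:
# x = (z+2)(w+2)); that is, x >= 2 and x is not a product of two factors
# both >= 2, which is exactly primality in N.

def defined_set(N):
    out = set()
    for x in range(N + 1):
        at_least_two = x >= 2  # exists y in N with x = y + 2
        # exists z, w in N with x = (z+2)(w+2): a factorization with both factors >= 2
        composite = any(x % d == 0 and x // d >= 2 for d in range(2, x))
        if at_least_two and not composite:
            out.add(x)
    return out

def primes(N):
    return {x for x in range(2, N + 1)
            if all(x % d for d in range(2, int(x**0.5) + 1))}

print(sorted(defined_set(30)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```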

In the theory of the field of reals ${{\bf R} = ({\bf R}, 0, 1, +, -, \times, ()^{-1})}$, the unit circle ${S^1}$ is an example of a definable set,

$\displaystyle S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2 = 1 \},$

but so is the complement of the circle,

$\displaystyle {\bf R}^2 \backslash S^1 = \{ (x,y) \in {\bf R}^2: \neg(x^2+y^2 = 1) \}$

and the interval ${[-1,1]}$:

$\displaystyle [-1,1] = \{ x \in {\bf R}: \exists y: x^2+y^2 = 1\}.$

Due to the unlimited use of constants, any finite subset of a power ${X^n}$ of any structure ${X}$ is, by our conventions, definable in that structure. (One can of course also consider definability without parameters (also known as ${0}$-definability), in which arbitrary constants are not permitted, but we will not do so here.)

We can isolate some special subclasses of definable sets:

• An atomic definable set is a set of the form (1) in which ${P()}$ is an atomic formula (i.e. it does not contain any logical connectives or quantifiers).
• A quantifier-free definable set is a set of the form (1) in which ${P()}$ is quantifier-free (i.e. it can contain logical connectives, but does not contain the quantifiers ${\forall, \exists}$).

Example 1 In the theory of a field such as ${{\bf R}}$, an atomic definable set is the same thing as an affine algebraic set (also known as an affine algebraic variety, with the understanding that varieties are not necessarily assumed to be irreducible), and a quantifier-free definable set is known as a constructible set; thus we see that algebraic geometry can be viewed in some sense as a special case of model theory. (Conversely, it can in fact be quite profitable to think of model theory as an abstraction of algebraic geometry; for instance, the concepts of Morley rank and Morley degree in model theory (discussed in this previous blog post) directly generalises the concepts of dimension and degree in algebraic geometry.) Over ${{\bf R}}$, the interval ${[-1,1]}$ is a definable set, but not a quantifier-free definable set (and certainly not an atomic definable set); and similarly for the primes over ${{\bf N}}$.

A quantifier-free definable set in ${X^n}$ is nothing more than a finite boolean combination of atomic definable sets; in other words, the class of quantifier-free definable sets over ${X}$ is the smallest class that contains the atomic definable sets and is closed under boolean operations such as complementation and union (which generate all the other boolean operations). Similarly, the class of definable sets over ${X}$ is the smallest class that contains the quantifier-free definable sets, and is also closed under the operation of projection ${\pi_n: E \mapsto \pi_n(E)}$ from ${X^{n+1}}$ to ${X^n}$ for every natural number ${n}$, where ${\pi_n: X^{n+1} \rightarrow X^n}$ is the map ${\pi_n(x_1,\ldots,x_n,x_{n+1}) := (x_1,\ldots,x_n)}$.

Some structures have the property of enjoying quantifier elimination, which means that every definable set is in fact a quantifier-free definable set, or equivalently that the projection of a quantifier-free definable set is again quantifier-free. For instance, an algebraically closed field ${k}$ (with the field operations) has quantifier elimination (i.e. the projection of a constructible set is again constructible); this fact can be proven by the classical tool of resultants, and among other things can be used to give a proof of Hilbert’s nullstellensatz. (Note though that projection does not necessarily preserve the property of being atomic; for instance, the projection of the atomic set ${\{ (x,y) \in k^2: xy=1 \}}$ is the non-atomic, but still quantifier-free definable, set ${\{ x \in k: \neg (x=0) \}}$.) In the converse direction, it is not difficult to use the nullstellensatz to deduce quantifier elimination. For the theory of the real field ${{\bf R}}$, which is not algebraically closed, one does not have quantifier elimination, as one can see from the example of the unit circle (which is a quantifier-free definable set) projecting down to the interval ${[-1,1]}$ (which is definable, but not quantifier-free definable). However, if one adds the additional operation of order ${<}$ to the reals, giving it the language of an ordered field rather than just a field, then quantifier elimination is recovered (the class of quantifier-free definable sets now enlarges to match the class of definable sets, which in this case is also the class of semi-algebraic sets); this is the famous Tarski-Seidenberg theorem.

On the other hand, many important structures do not have quantifier elimination; typically, the projection of a quantifier-free definable set is not, in general, quantifier-free definable. This failure of the projection property also shows up in many contexts outside of model theory; for instance, Lebesgue famously made the error of thinking that the projection of a Borel measurable set remained Borel measurable (it is merely an analytic set instead). Turing’s halting theorem can be viewed as an assertion that the projection of a decidable set (also known as a computable or recursive set) is not necessarily decidable (it is merely semi-decidable (or recursively enumerable) instead). The notorious P=NP problem can also be essentially viewed in this spirit; roughly speaking (and glossing over the placement of some quantifiers), it asks whether the projection of a polynomial-time decidable set is again polynomial-time decidable. And so forth. (See this blog post of Dick Lipton for further discussion of the subtleties of projections.)

Now we consider the status of quantifier elimination for the theory of a finite field ${F}$. If interpreted naively, quantifier elimination is trivial for a finite field ${F}$, since every subset of ${F^n}$ is finite and thus quantifier-free definable. However, we can recover an interesting question in one of two (essentially equivalent) ways. One is to work in the asymptotic regime in which the field ${F}$ is large, but the length of the formulae used to construct one’s definable sets stays bounded uniformly in the size of ${F}$ (where we view any constant in ${F}$ as contributing a unit amount to the length of a formula, no matter how large ${F}$ is). A simple counting argument then shows that only a small number of subsets of ${F^n}$ become definable in the asymptotic limit ${|F| \rightarrow \infty}$, since the number of definable sets grows at most polynomially in ${|F|}$ for any fixed bound on the formula length, while the number of all subsets of ${F^n}$ grows exponentially in ${|F|^n}$.

Another way to proceed is to work not with a single finite field ${F}$, or even with a sequence ${F_m}$ of finite fields, but with the ultraproduct ${F = \prod_{m \rightarrow p} F_m}$ of a sequence of finite fields, and to study the properties of definable sets over this ultraproduct. (We will be using the notation of ultraproducts and nonstandard analysis from this previous blog post.) This approach is equivalent to the more finitary approach mentioned in the previous paragraph, at least if one does not care to track the exact bounds on the length of the formulae involved. Indeed, thanks to Los’s theorem, a definable subset ${E}$ of ${F^n}$ is nothing more than the ultraproduct ${E = \prod_{m \rightarrow p} E_m}$ of definable subsets ${E_m}$ of ${F_m^n}$ for all ${m}$ sufficiently close to ${p}$, with the length of the formulae used to define ${E_m}$ uniformly bounded in ${m}$. In the language of nonstandard analysis, one can view ${F}$ as a nonstandard finite field.

The ultraproduct ${F}$ of finite fields is an important example of a pseudo-finite field – a field that obeys all the sentences in the languages of fields that finite fields do, but is not necessarily itself a finite field. The model theory of pseudo-finite fields was first studied systematically by Ax (in the same paper where the Ax-Grothendieck theorem, discussed previously on this blog, was established), with important further contributions by Kiefe, by Fried-Sacerdote, by two papers of Chatzidakis-van den Dries-Macintyre, and many other authors.

As mentioned before, quantifier elimination trivially holds for finite fields. But for infinite pseudo-finite fields, such as the ultraproduct ${F = \prod_{m \rightarrow p} F_m}$ of finite fields with ${|F_m|}$ going to infinity, quantifier elimination fails. For instance, in a finite field ${F_m}$, the set ${E_m := \{ x \in F_m: \exists y \in F_m: x=y^2 \}}$ of quadratic residues is a definable set, with a bounded formula length, and so in the ultraproduct ${F =\prod_{m \rightarrow p} F_m}$, the set ${E := \prod_{m\rightarrow p} E_m}$ of nonstandard quadratic residues is also a definable set. However, in one dimension, we see from the factor theorem that the only atomic definable sets are either finite or the whole field ${F}$, and so the only constructible sets (i.e. the only quantifier-free definable sets) are either finite or cofinite in ${F}$. Since the quadratic residues have asymptotic density ${1/2}$ in a large finite field, they cannot form a quantifier-free definable set, despite being definable.
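The quadratic residue example is easy to check in small cases (a quick script of my own; the primes chosen are arbitrary):

```python
# The quadratic residues {x in F_p: exists y: x = y^2}: definable with a
# single existential quantifier, but of density roughly 1/2 in F_p, hence
# neither finite nor cofinite uniformly in p -- so not quantifier-free
# definable in the large-p limit.  (Counting 0 as a residue, there are
# exactly (p+1)/2 residues for an odd prime p.)

def quadratic_residues(p):
    return {(y * y) % p for y in range(p)}

for p in (101, 1009, 10007):  # arbitrary odd primes
    Q = quadratic_residues(p)
    print(p, len(Q), len(Q) / p)  # density tends to 1/2
```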

Nevertheless, there is a very nice almost quantifier elimination result for these fields, in characteristic zero at least, which we phrase here as follows:

Theorem 1 (Almost quantifier elimination) Let ${F}$ be a nonstandard finite field of characteristic zero, and let ${E \subset F^n}$ be a definable set over ${F}$. Then ${E}$ is the union of finitely many sets of the form

$\displaystyle \{ x \in V(F): \exists t \in F: P(x,t) = 0 \} \ \ \ \ \ (2)$

where ${V(F)}$ is an atomic definable subset of ${F^n}$ (i.e. the ${F}$-points of an algebraic variety ${V}$ defined over ${F}$ in ${F^n}$) and ${P: F^{n+1} \rightarrow F}$ is a polynomial.

Results of this type were first obtained by Catarina Kiefe, although the formulation here is closer to that of Chatzidakis-van den Dries-Macintyre.

Informally, this theorem says that while we cannot quite eliminate all quantifiers from a definable set over a nonstandard finite field, we can eliminate all but one existential quantifier. (Note that negation has also been eliminated in this theorem; for instance, the definable set ${F \backslash \{0\} = \{ x \in F: \neg(x=0) \}}$ uses a negation, but can also be described using a single existential quantifier as ${\{ x \in F: \exists t: xt = 1 \}}$.) I believe that there are more complicated analogues of this result in positive characteristic, but I have not studied this case in detail (Kiefe’s result does not assume characteristic zero, but her conclusion is slightly different from the one given here). In the one-dimensional case ${n=1}$, the only varieties ${V}$ are the affine line and finite sets, and the above statement simplifies: any definable subset of ${F}$ takes the form ${\{ x\in F: \exists t \in F: P(x,t) = 0 \}}$ for some polynomial ${P}$ (i.e. definable sets in ${F}$ are nothing more than the projections of the ${F}$-points of plane curves).

There is an equivalent formulation of this theorem for standard finite fields, namely that if ${F}$ is a finite field and ${E \subset F^n}$ is definable using a formula of length at most ${M}$, then ${E}$ can be expressed in the form (2) with the degree of ${P}$ bounded by some quantity ${C_{M,n}}$ depending on ${M}$ and ${n}$, assuming that the characteristic of ${F}$ is sufficiently large depending on ${M, n}$.

The theorem gives quite a satisfactory description of definable sets in either standard or nonstandard finite fields (at least if one does not care about effective bounds in some of the constants, and if one is willing to exclude the small characteristic case); for instance, in conjunction with the Lang-Weil bound discussed in this recent blog post, it shows that any non-empty definable subset of a nonstandard finite field has a nonstandard cardinality of ${(\alpha + O(|F|^{-1/2})) |F|^d}$ for some positive standard rational ${\alpha}$ and integer ${d}$. Equivalently, any non-empty definable subset of ${F^n}$ for some standard finite field ${F}$ using a formula of length at most ${M}$ has a standard cardinality of ${(\alpha + O_{M,n}(|F|^{-1/2})) |F|^d}$ for some positive rational ${\alpha}$ of height ${O_{M,n}(1)}$ and some natural number ${d}$ between ${0}$ and ${n}$. (For instance, in the example of the quadratic residues given above, ${d}$ is equal to ${1}$ and ${\alpha}$ to ${1/2}$.) There is a more precise statement to this effect, namely that the Poincaré series of a definable set is rational; see Kiefe’s paper for details.

Below the fold I give a proof of Theorem 1, which relies primarily on the Lang-Weil bound mentioned above.

Let ${F}$ be a finite field, with algebraic closure ${\overline{F}}$, and let ${V}$ be an (affine) algebraic variety defined over ${\overline{F}}$, by which I mean a set of the form

$\displaystyle V = \{ x \in \overline{F}^d: P_1(x) = \ldots = P_m(x) = 0 \}$

for some ambient dimension ${d \geq 0}$, and some finite number of polynomials ${P_1,\ldots,P_m: \overline{F}^d \rightarrow \overline{F}}$. In order to reduce the number of subscripts later on, let us say that ${V}$ has complexity at most ${M}$ if ${d}$, ${m}$, and the degrees of the ${P_1,\ldots,P_m}$ are all less than or equal to ${M}$. Note that we do not require at this stage that ${V}$ be irreducible (i.e. not the union of two strictly smaller varieties), or defined over ${F}$, though we will often specialise to these cases later in this post. (Also, everything said here can also be applied with almost no changes to projective varieties, but we will stick with affine varieties for sake of concreteness.)

One can consider two crude measures of how “big” the variety ${V}$ is. The first measure, which is algebraic geometric in nature, is the dimension ${\hbox{dim}(V)}$ of the variety ${V}$, which is an integer between ${0}$ and ${d}$ (or, depending on convention, ${-\infty}$, ${-1}$, or undefined, if ${V}$ is empty) that can be defined in a large number of ways (e.g. it is the largest ${r}$ for which the generic linear projection from ${V}$ to ${\overline{F}^r}$ is dominant, or the smallest ${r}$ for which the intersection with a generic codimension ${r}$ subspace is non-empty). The second measure, which is number-theoretic in nature, is the number ${|V(F)| = |V \cap F^d|}$ of ${F}$-points of ${V}$, i.e. points ${x = (x_1,\ldots,x_d)}$ in ${V}$ all of whose coefficients lie in the finite field, or equivalently the number of solutions to the system of equations ${P_i(x_1,\ldots,x_d) = 0}$ for ${i=1,\ldots,m}$ with variables ${x_1,\ldots,x_d}$ in ${F}$.

These two measures are linked together in a number of ways. For instance, we have the basic Schwartz-Zippel type bound (which, in this qualitative form, goes back at least to Lemma 1 of the work of Lang and Weil in 1954).

Lemma 1 (Schwartz-Zippel type bound) Let ${V}$ be a variety of complexity at most ${M}$. Then we have ${|V(F)| \ll_M |F|^{\hbox{dim}(V)}}$.

Proof: (Sketch) For the purposes of exposition, we will not carefully track the dependencies of implied constants on the complexity ${M}$, instead simply assuming that all of these quantities remain controlled throughout the argument. (If one wished, one could obtain ineffective bounds on these quantities by an ultralimit argument, as discussed in this previous post, or equivalently by moving everything over to a nonstandard analysis framework; one could also obtain such uniformity using the machinery of schemes.)

We argue by induction on the ambient dimension ${d}$ of the variety ${V}$. The ${d=0}$ case is trivial, so suppose ${d \geq 1}$ and that the claim has already been proven for ${d-1}$. By breaking up ${V}$ into irreducible components we may assume that ${V}$ is irreducible (this requires some control on the number and complexity of these components, but this is available, as discussed in this previous post). For each ${x_1,\ldots,x_{d-1} \in \overline{F}}$, the fibre ${\{ x_d \in \overline{F}: (x_1,\ldots,x_{d-1},x_d) \in V \}}$ is either one-dimensional (and thus all of ${\overline{F}}$) or zero-dimensional. In the latter case, one has ${O_M(1)}$ points in the fibre from the fundamental theorem of algebra (indeed the number of points in the fibre is bounded by the degree of a defining polynomial), and ${(x_1,\ldots,x_{d-1})}$ lives in the projection of ${V}$ to ${\overline{F}^{d-1}}$, which is a variety of dimension at most ${\hbox{dim}(V)}$ and controlled complexity, so the contribution of this case is acceptable from the induction hypothesis. In the former case, the fibre contributes ${|F|}$ ${F}$-points, but ${(x_1,\ldots,x_{d-1})}$ lies in a variety in ${\overline{F}^{d-1}}$ of dimension at most ${\hbox{dim}(V)-1}$ (since otherwise ${V}$ would contain a subvariety of dimension at least ${\hbox{dim}(V)+1}$, which is absurd) and controlled complexity, and so the contribution of this case is also acceptable from the induction hypothesis. $\Box$

One can improve the bound on the implied constant to be linear in the degree of ${V}$ (see e.g. Claim 7.2 of this paper of Dvir, Kollar, and Lovett, or Lemma A.3 of this paper of Ellenberg, Oberlin, and myself), but we will not be concerned with these improvements here.

Without further hypotheses on ${V}$, the above upper bound is sharp (except for improvements in the implied constants). For instance, the variety

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: \prod_{j=1}^D (x_d - a_j) = 0\},$

where ${a_1,\ldots,a_D \in F}$ are distinct, is the union of ${D}$ distinct hyperplanes of dimension ${d-1}$, with ${|V(F)| = D |F|^{d-1}}$ and complexity ${\max(D,d)}$; similar examples can easily be concocted for other choices of ${\hbox{dim}(V)}$. In the other direction, there is also no non-trivial lower bound for ${|V(F)|}$ without further hypotheses on ${V}$. For a trivial example, if ${a}$ is an element of ${\overline{F}}$ that does not lie in ${F}$, then the hyperplane

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d - a = 0 \}$

clearly has no ${F}$-points whatsoever, despite being a ${d-1}$-dimensional variety in ${\overline{F}^d}$ of complexity ${d}$. For a slightly less non-trivial example, if ${a}$ is an element of ${F}$ that is not a quadratic residue, then the variety

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d^2 - a = 0 \},$

which is the union of two hyperplanes, still has no ${F}$-points, even though this time the variety is defined over ${F}$ instead of ${\overline{F}}$ (by which we mean that the defining polynomial(s) have all of their coefficients in ${F}$). There is however the important Lang-Weil bound that allows for a much better estimate as long as ${V}$ is both defined over ${F}$ and irreducible:

Theorem 2 (Lang-Weil bound) Let ${V}$ be a variety of complexity at most ${M}$. Assume that ${V}$ is defined over ${F}$, and that ${V}$ is irreducible as a variety over ${\overline{F}}$ (i.e. ${V}$ is geometrically irreducible or absolutely irreducible). Then

$\displaystyle |V(F)| = (1 + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}.$

Again, more explicit bounds on the implied constant here are known, but will not be the focus of this post. As the previous examples show, the hypotheses of definability over ${F}$ and geometric irreducibility are both necessary.

The Lang-Weil bound is already non-trivial in the model case ${d=2, \hbox{dim}(V)=1}$ of plane curves:

Theorem 3 (Hasse-Weil bound) Let ${P: \overline{F}^2 \rightarrow \overline{F}}$ be an irreducible polynomial of degree ${D}$ with coefficients in ${F}$. Then

$\displaystyle |\{ (x,y) \in F^2: P(x,y) = 0 \}| = |F| + O_D( |F|^{1/2} ).$

Thus, for instance, if ${a,b \in F}$, then the elliptic curve ${\{ (x,y) \in F^2: y^2 = x^3 + ax + b \}}$ has ${|F| + O(|F|^{1/2})}$ ${F}$-points, a result first established by Hasse. The Hasse-Weil bound is already quite non-trivial, being the analogue of the Riemann hypothesis for plane curves. For hyper-elliptic curves, an elementary proof (due to Stepanov) is discussed in this previous post. For general plane curves, the first proof was by Weil (leading to his famous Weil conjectures); there is also a nice version of Stepanov’s argument due to Bombieri covering this case which is a little less elementary (relying crucially on the Riemann-Roch theorem for the upper bound, and a lifting trick to then get the lower bound), which I briefly summarise later in this post. The full Lang-Weil bound is deduced from the Hasse-Weil bound by an induction argument using generic hyperplane slicing, as I will also summarise later in this post.
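The Hasse bound can be tested numerically on a sample curve. The sketch below (the curve ${y^2 = x^3 + 2x + 3}$ and the primes are arbitrary illustrative choices; the curve is nonsingular modulo each prime used) counts affine points by brute force:

```python
# Count affine F_p-points of the elliptic curve y^2 = x^3 + a*x + b and
# check the Hasse bound: the projective count is p + 1 - a_p with
# |a_p| <= 2*sqrt(p), so the affine count N (which omits the single point
# at infinity) satisfies |N - p| <= 2*sqrt(p).
import math

def affine_points(p, a, b):
    roots = {}  # value mod p -> number of square roots y
    for y in range(p):
        roots[pow(y, 2, p)] = roots.get(pow(y, 2, p), 0) + 1
    return sum(roots.get((x**3 + a*x + b) % p, 0) for x in range(p))

for p in [101, 211, 401]:
    N = affine_points(p, 2, 3)
    assert abs(N - p) <= 2 * math.sqrt(p)
    print(p, N)
```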

The hypotheses of definability over ${F}$ and geometric irreducibility in the Lang-Weil bound can be removed after inserting a geometric factor:

Corollary 4 (Lang-Weil bound, alternate form) Let ${V}$ be a variety of complexity at most ${M}$. Then one has

$\displaystyle |V(F)| = (c(V) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}$

where ${c(V)}$ is the number of top-dimensional components of ${V}$ (i.e. geometrically irreducible components of ${V}$ of dimension ${\hbox{dim}(V)}$) that are definable over ${F}$, or equivalently are invariant with respect to the Frobenius endomorphism ${x \mapsto x^{|F|}}$ that defines ${F}$.

Proof: By breaking up a general variety ${V}$ into components (and using Lemma 1 to dispose of any lower-dimensional components), it suffices to establish this claim when ${V}$ is itself geometrically irreducible. If ${V}$ is definable over ${F}$, the claim follows from Theorem 2. If ${V}$ is not definable over ${F}$, then it is not fixed by the Frobenius endomorphism ${Frob}$ (since otherwise one could produce a set of defining polynomials that were fixed by Frobenius and thus defined over ${F}$ by using some canonical basis (such as a reduced Gröbner basis) for the associated ideal), and so ${V \cap Frob(V)}$ has strictly smaller dimension than ${V}$. But ${V \cap Frob(V)}$ captures all the ${F}$-points of ${V}$, so in this case the claim follows from Lemma 1. $\Box$

Note that if ${V}$ is reducible but is itself defined over ${F}$, then the Frobenius endomorphism preserves ${V}$ itself, but may permute the components of ${V}$ around. In this case, ${c(V)}$ is the number of fixed points of this permutation action of Frobenius on the components. In particular, ${c(V)}$ is always a natural number between ${0}$ and ${O_M(1)}$; thus we see that regardless of the geometry of ${V}$, the normalised count ${|V(F)|/|F|^{\hbox{dim}(V)}}$ is asymptotically restricted to a bounded range of natural numbers (in the regime where the complexity stays bounded and ${|F|}$ goes to infinity).

Example 1 Consider the variety

$\displaystyle V := \{ (x,y) \in \overline{F}^2: x^2 - ay^2 = 0 \}$

for some non-zero parameter ${a \in F}$. Geometrically (by which we basically mean “when viewed over the algebraically closed field ${\overline{F}}$“), this is the union of two lines, with slopes corresponding to the two square roots of ${a}$. If ${a}$ is a quadratic residue, then both of these lines are defined over ${F}$, and are fixed by Frobenius, and ${c(V) = 2}$ in this case. If ${a}$ is not a quadratic residue, then the lines are not defined over ${F}$, and the Frobenius automorphism permutes the two lines while preserving ${V}$ as a whole, giving ${c(V)=0}$ in this case.
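Example 1 can be verified by brute force in a small field; in this toy computation (the prime ${p = 103}$ and the residue ${a = 4}$ are arbitrary choices) the two cases give point counts of roughly ${2|F|}$ and ${O(1)}$ respectively:

```python
# Count F_p-points of V = { (x, y) : x^2 - a*y^2 = 0 }.  For a quadratic
# residue a, V is two F_p-rational lines meeting at the origin, giving
# 2p - 1 points (c(V) = 2); for a non-residue a, the only F_p-point is
# the origin (c(V) = 0).
p = 103

def count_points(a):
    return sum(1 for x in range(p) for y in range(p)
               if (x * x - a * y * y) % p == 0)

# Find a quadratic non-residue via Euler's criterion a^((p-1)/2) = -1.
nqr = next(a for a in range(2, p) if pow(a, (p - 1) // 2, p) == p - 1)
print(count_points(4), count_points(nqr))  # 205 1  (i.e. 2p - 1 and 1)
```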

Corollary 4 effectively computes (at least to leading order) the number-theoretic size ${|V(F)|}$ of a variety in terms of geometric information about ${V}$, namely its dimension ${\hbox{dim}(V)}$ and the number ${c(V)}$ of top-dimensional components fixed by Frobenius. It turns out that with a little bit more effort, one can extend this connection to cover not just a single variety ${V}$, but a family of varieties indexed by points in some base space ${W}$. More precisely, suppose we now have two affine varieties ${V,W}$ of bounded complexity, together with a regular map ${\phi: V \rightarrow W}$ of bounded complexity (the definition of complexity of a regular map is a bit technical, see e.g. this paper, but one can think for instance of a polynomial or rational map of bounded degree as a good example). It will be convenient to assume that the base space ${W}$ is irreducible. If the map ${\phi}$ is a dominant map (i.e. the image ${\phi(V)}$ is Zariski dense in ${W}$), then standard algebraic geometry results tell us that the fibres ${\phi^{-1}(\{w\})}$ are an unramified family of ${\hbox{dim}(V)-\hbox{dim}(W)}$-dimensional varieties outside of an exceptional subset ${W'}$ of ${W}$ of dimension strictly smaller than ${\hbox{dim}(W)}$ (and with ${\phi^{-1}(W')}$ having dimension strictly smaller than ${\hbox{dim}(V)}$); see e.g. Section I.6.3 of Shafarevich.

Now suppose that ${V}$, ${W}$, and ${\phi}$ are defined over ${F}$. Then, by Lang-Weil, ${W(F)}$ has ${(1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(W)}}$ ${F}$-points, and by Schwartz-Zippel, for all but ${O( |F|^{\hbox{dim}(W)-1})}$ of these ${F}$-points ${w}$ (the ones that lie in the subvariety ${W'}$), the fibre ${\phi^{-1}(\{w\})}$ is an algebraic variety defined over ${F}$ of dimension ${\hbox{dim}(V)-\hbox{dim}(W)}$. By using ultraproduct arguments (see e.g. Lemma 3.7 of this paper of mine with Emmanuel Breuillard and Ben Green), this variety can be shown to have bounded complexity, and thus by Corollary 4, has ${(c(\phi^{-1}(\{w\})) + O(|F|^{-1/2})) |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}$ ${F}$-points. One can then ask how the quantity ${c(\phi^{-1}(\{w\}))}$ is distributed. A simple but illustrative example occurs when ${V=W=F}$ and ${\phi: F \rightarrow F}$ is the polynomial ${\phi(x) := x^2}$. Then ${c(\phi^{-1}(\{w\}))}$ equals ${2}$ when ${w}$ is a non-zero quadratic residue and ${0}$ when ${w}$ is a non-zero quadratic non-residue (and ${1}$ when ${w}$ is zero, but this is a negligible fraction of all ${w}$). In particular, in the asymptotic limit ${|F| \rightarrow \infty}$, ${c(\phi^{-1}(\{w\}))}$ is equal to ${2}$ half of the time and ${0}$ half of the time.
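The fibre statistics of the squaring map are easy to tabulate in a toy example (the prime ${p = 97}$ is an arbitrary choice):

```python
# For phi(x) = x^2 on F_p, tabulate the fibre sizes |phi^{-1}({w})| over
# nonzero w: the values 2 (w a quadratic residue) and 0 (w a non-residue)
# each occur for exactly half of the nonzero w.
from collections import Counter

p = 97
sizes = Counter(sum(1 for x in range(p) if (x * x - w) % p == 0)
                for w in range(1, p))
print(sizes[2], sizes[0])  # 48 48
```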

Now we describe the asymptotic distribution of the ${c(\phi^{-1}(\{w\}))}$. We need some additional notation. Let ${w_0}$ be an ${F}$-point in ${W \backslash W'}$, and let ${\pi_0( \phi^{-1}(\{w_0\}) )}$ be the connected components of the fibre ${\phi^{-1}(\{w_0\})}$. As ${\phi^{-1}(\{w_0\})}$ is defined over ${F}$, this set of components is permuted by the Frobenius endomorphism ${Frob}$. But there is also an action by monodromy of the fundamental group ${\pi_1(W \backslash W')}$ (this requires a certain amount of étale machinery to properly set up, as we are working over a positive characteristic field rather than over the complex numbers, but I am going to ignore this rather important detail here, as I still don’t fully understand it). This fundamental group may be infinite, but (by the étale construction) is always profinite, and in particular has a Haar probability measure, for which every finite index subgroup (and each of its cosets) is measurable. Thus we may meaningfully talk about elements drawn uniformly at random from this group, so long as we work only with the profinite ${\sigma}$-algebra on ${\pi_1(W \backslash W')}$ that is generated by the cosets of the finite index subgroups of this group (which will be the only relevant sets we need to measure when considering the action of this group on finite sets, such as the components of a generic fibre).

Theorem 5 (Lang-Weil with parameters) Let ${V, W}$ be varieties of complexity at most ${M}$ with ${W}$ irreducible, and let ${\phi: V \rightarrow W}$ be a dominant map of complexity at most ${M}$. Let ${w_0}$ be an ${F}$-point of ${W \backslash W'}$. Then, for any natural number ${a}$, one has ${c(\phi^{-1}(\{w\})) = a}$ for ${(\mathop{\bf P}( X = a ) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(W)}}$ values of ${w \in W(F)}$, where ${X}$ is the random variable that counts the number of components of a generic fibre ${\phi^{-1}(w_0)}$ that are invariant under ${g \circ Frob}$, where ${g}$ is an element chosen uniformly at random from the étale fundamental group ${\pi_1(W \backslash W')}$. In particular, in the asymptotic limit ${|F| \rightarrow \infty}$, and with ${w}$ chosen uniformly at random from ${W(F)}$, ${c(\phi^{-1}(\{w\}))}$ (or, equivalently, ${|\phi^{-1}(\{w\})(F)| / |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}$) and ${X}$ have the same asymptotic distribution.

This theorem generalises Corollary 4 (which is the case when ${W}$ is just a point, so that ${\phi^{-1}(\{w\})}$ is just ${V}$ and ${g}$ is trivial). Informally, the effect of a non-trivial parameter space ${W}$ on the Lang-Weil bound is to push around the Frobenius map by monodromy for the purposes of counting invariant components, and a randomly chosen set of parameters corresponds to a randomly chosen loop on which to perform monodromy.

Example 2 Let ${V=W=F}$ and ${\phi(x) = x^m}$ for some fixed ${m \geq 1}$; to avoid some technical issues let us suppose that ${m}$ is coprime to ${|F|}$. Then ${W'}$ can be taken to be ${\{0\}}$, and for a base point ${w_0 \in W \backslash W'}$ we can take ${w_0=1}$. The fibre ${\phi^{-1}(\{1\})}$ – the ${m^{th}}$ roots of unity – can be identified with the cyclic group ${{\bf Z}/m{\bf Z}}$ by using a primitive root of unity. The étale fundamental group ${\pi_1(W \backslash W') = \pi_1(\overline{F} \backslash \{0\})}$ is (I think) isomorphic to the profinite completion ${\hat {\bf Z}}$ of the integers ${{\bf Z}}$ (excluding the part of that completion coming from the characteristic of ${F}$). Not coincidentally, the integers ${{\bf Z}}$ are the fundamental group of the complex analogue ${{\bf C} \backslash \{0\}}$ of ${W \backslash W'}$. (Brian Conrad points out to me though that for more complicated varieties, such as covers of ${\overline{F} \backslash \{0\}}$ by a power of the characteristic, the étale fundamental group is more complicated than just a profinite completion of the ordinary fundamental group, due to the presence of Artin-Schreier covers that are only ramified at infinity.) The action of this fundamental group on the fibre ${{\bf Z}/m{\bf Z}}$ is given by translation. Meanwhile, the Frobenius map ${Frob}$ on ${{\bf Z}/m{\bf Z}}$ is given by multiplication by ${|F|}$. A random element ${g \circ Frob}$ then becomes a random affine map ${x \mapsto |F|x+b}$ on ${{\bf Z}/m{\bf Z}}$, where ${b}$ is chosen uniformly at random from ${{\bf Z}/m{\bf Z}}$. The number of fixed points of this map is equal to the greatest common divisor ${(|F|-1,m)}$ of ${|F|-1}$ and ${m}$ when ${b}$ is divisible by ${(|F|-1,m)}$, and equal to ${0}$ otherwise. This matches up with the elementary fact that a randomly chosen non-zero element of ${F}$ will be an ${m^{th}}$ power with probability ${1/(|F|-1,m)}$, and when this occurs, the number of ${m^{th}}$ roots in ${F}$ will be ${(|F|-1,m)}$.
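The elementary fact quoted at the end of Example 2 can be confirmed directly; in this sketch the parameters ${p = 103}$, ${m = 10}$ are arbitrary illustrative choices:

```python
# In F_p^*, the m-th powers form the subgroup of index g = gcd(p - 1, m),
# so a random nonzero element is an m-th power with probability 1/g, and
# each m-th power has exactly g distinct m-th roots.
from math import gcd

p, m = 103, 10
g = gcd(p - 1, m)                        # gcd(102, 10) = 2
powers = {pow(x, m, p) for x in range(1, p)}
assert len(powers) == (p - 1) // g       # fraction of m-th powers: 1/g
assert all(sum(1 for x in range(1, p) if pow(x, m, p) == w) == g
           for w in powers)
print(g, len(powers))  # 2 51
```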

Example 3 (Thanks to Jordan Ellenberg for this example.) Consider a random elliptic curve ${E = \{ y^2 = x^3 + ax + b \}}$, where ${a,b}$ are chosen uniformly at random, and let ${m \geq 1}$. Let ${E[m]}$ be the ${m}$-torsion points of ${E}$ (i.e. those elements ${g \in E}$ with ${mg = 0}$ using the elliptic curve addition law); as a group, this is isomorphic to ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$ (assuming that ${F}$ has sufficiently large characteristic, for simplicity), and consider the number of ${F}$-points of ${E[m]}$, which is a random variable taking values in the natural numbers between ${0}$ and ${m^2}$. In this case, the base variety ${W}$ is the modular curve ${X(1)}$, and the covering variety ${V}$ is the modular curve ${X_1(m)}$. The generic fibre here can be identified with ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$, the monodromy action projects down to the action of ${SL_2({\bf Z}/m{\bf Z})}$, and the action of Frobenius on this fibre can be shown to be given by a ${2 \times 2}$ matrix with determinant ${|F|}$ (with the exact choice of matrix depending on the choice of fibre and of the identification), so the distribution of the number of ${F}$-points of ${E[m]}$ is asymptotic to the distribution of the number of fixed points ${X}$ of a random linear map of determinant ${|F|}$ on ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$.

Theorem 5 seems to be well known “folklore” among arithmetic geometers, though I do not know of an explicit reference for it. I enjoyed deriving it for myself (though my derivation is somewhat incomplete due to my lack of understanding of étale cohomology) from the ordinary Lang-Weil theorem and the moment method. I’m recording this derivation later in this post, mostly for my own benefit (as I am still in the process of learning this material), though perhaps some other readers may also be interested in it.

Caveat: not all details are fully fleshed out in this writeup, particularly those involving the finer points of algebraic geometry and étale cohomology, as my understanding of these topics is not as complete as I would like it to be.

Many thanks to Brian Conrad and Jordan Ellenberg for helpful discussions on these topics.

One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations ${f_3(x)}$ of an integer ${x}$ as the sum of three primes ${x = p_1+p_2+p_3}$, as a Fourier-analytic integral over the unit circle ${{\bf R}/{\bf Z}}$ involving exponential sums such as

$\displaystyle S(x,\alpha) := \sum_{p \leq x} e( \alpha p) \ \ \ \ \ (1)$

where the sum here ranges over all primes up to ${x}$, and ${e(x) := e^{2\pi i x}}$. For instance, the expression ${f_3(x)}$ mentioned earlier can be written as

$\displaystyle f_3(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha. \ \ \ \ \ (2)$

The strategy is then to obtain sufficiently accurate bounds on exponential sums such as ${S(x,\alpha)}$ in order to obtain non-trivial bounds on quantities such as ${f_3(x)}$. For instance, if one can show that ${f_3(x)>0}$ for all odd integers ${x}$ greater than some given threshold ${x_0}$, this implies that all odd integers greater than ${x_0}$ are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.
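For concreteness, the quantity ${f_3(x)}$ that the integral (2) computes can be evaluated by brute force for small ${x}$; this is a purely illustrative count (the value ${x = 21}$ is an arbitrary choice):

```python
# Direct count of f_3(x): the number of ordered triples of primes
# (p1, p2, p3) with p1 + p2 + p3 = x, the quantity computed by (2).
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, ok in enumerate(sieve) if ok]

def f3(x):
    ps = set(primes_up_to(x))
    return sum(1 for p1 in ps for p2 in ps if x - p1 - p2 in ps)

# The unordered representations of 21 are {2,2,17}, {3,5,13}, {3,7,11},
# {5,5,11}, {7,7,7}, giving 3 + 6 + 6 + 3 + 1 = 19 ordered triples.
print(f3(21))  # 19
```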

Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff ${p \leq x}$ with a smoother cutoff ${\chi(p/x)}$ for a suitable choice of cutoff function ${\chi}$, or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function ${\Lambda(n)}$. However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as the primes are historically among the most studied sets in additive number theory.

In many cases, it turns out that one can get fairly precise evaluations on sums such as ${S(x,\alpha)}$ in the major arc case, when ${\alpha}$ is close to a rational number ${a/q}$ with small denominator ${q}$, by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that

$\displaystyle S(x,0) \approx \frac{x}{\log x}$

and the prime number theorem in residue classes modulo ${q}$ suggests more generally that

$\displaystyle S(x,\frac{a}{q}) \approx \frac{\mu(q)}{\phi(q)} \frac{x}{\log x}$

when ${q}$ is small and ${a}$ is coprime to ${q}$, basically thanks to the elementary calculation that the phase ${e(an/q)}$ has an average value of ${\mu(q)/\phi(q)}$ when ${n}$ is uniformly distributed amongst the residue classes modulo ${q}$ that are coprime to ${q}$. Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.
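The major arc approximation can be seen numerically for a small modulus. The following rough check (the choices ${x = 10^5}$, ${q = 3}$, ${a = 1}$ are arbitrary; the discrepancy is dominated by the error in the crude approximation ${\pi(x) \approx x/\log x}$, so only a few percent accuracy should be expected):

```python
# Compare S(x, a/q) = sum_{p <= x} e(a*p/q) with the major-arc prediction
# mu(q)/phi(q) * x/log(x), for q = 3, a = 1, where mu(3)/phi(3) = -1/2.
import cmath, math

def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, ok in enumerate(sieve) if ok]

x, q, a = 10 ** 5, 3, 1
S = sum(cmath.exp(2j * math.pi * a * p / q) for p in primes_up_to(x))
prediction = (-1 / 2) * (x / math.log(x))
print(abs(S - prediction) / (x / math.log(x)))  # relative error of a few percent
```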

In the minor arc case when ${\alpha}$ is not close to a rational ${a/q}$ with small denominator, one no longer expects to have such precise control on the value of ${S(x,\alpha)}$, due to the “pseudorandom” fluctuations of the quantity ${e(\alpha p)}$. Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of ${k}$ “pseudorandom” phases should fluctuate randomly and be of typical magnitude ${\sim \sqrt{k}}$, one expects upper bounds of the shape

$\displaystyle |S(x,\alpha)| \lessapprox \sqrt{\frac{x}{\log x}} \ \ \ \ \ (3)$

for “typical” minor arc ${\alpha}$. Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that

$\displaystyle \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2\ d\alpha \sim \frac{x}{\log x} \ \ \ \ \ (4)$

which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds such as ${x^{4/5+o(1)}}$ are far more typical.
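The identity behind (4) is exact: the integral of ${|S(x,\alpha)|^2}$ counts the diagonal pairs ${p_1 = p_2}$, i.e. it equals ${\pi(x)}$. Since ${S(x,\cdot)}$ is a trigonometric polynomial with frequencies at most ${x}$, the integral can be evaluated exactly by averaging over enough equally spaced sample points; here is a toy check at the (arbitrary) value ${x = 200}$:

```python
# Plancherel check: averaging |S(x, k/N)|^2 over N > 2x equally spaced
# points of the circle computes the integral in (4) exactly, since |S|^2
# only contains frequencies p1 - p2 in [-x, x].  The answer is pi(x).
import cmath, math

x = 200
ps = [p for p in range(2, x + 1)
      if all(p % d for d in range(2, int(p ** 0.5) + 1))]
N = 2 * x + 1
avg = sum(abs(sum(cmath.exp(2j * math.pi * p * k / N) for p in ps)) ** 2
          for k in range(N)) / N
print(round(avg), len(ps))  # 46 46, i.e. pi(200) on both sides
```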

Because one only expects to have upper bounds on ${|S(x,\alpha)|}$, rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as ${e(-x\alpha)}$ for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of ${x}$, so that averaging in ${x}$ is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude ${|S(x,\alpha)|^3}$ oscillates in sympathetic resonance with the phase ${e(-x\alpha)}$, thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region ${\Omega_{minor}}$:

$\displaystyle |\int_{\Omega_{minor}} |S(x,\alpha)|^3 e(-x\alpha)\ d\alpha| \leq \int_{\Omega_{minor}} |S(x,\alpha)|^3\ d\alpha.$

Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as ${f_3(x)}$, at least when ${x}$ is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer ${x}$ is the sum of three primes; my own result that all odd numbers greater than ${1}$ can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of ${x}$, such as ${x \sim 10^{50}}$. (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)

However, I (and many other analytic number theorists) are considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number ${x}$ as the sum ${x = p_1 + p_2}$ of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations ${2 = p_1 - p_2}$ of ${2}$ as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large ${x}$, one has to find a non-trivial lower bound for the quantity

$\displaystyle f_2(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^2 e(-x\alpha)\ d\alpha \ \ \ \ \ (5)$

for sufficiently large ${x}$, as this quantity ${f_2(x)}$ is also the number of ways to represent ${x}$ as the sum ${x=p_1+p_2}$ of two primes ${p_1,p_2}$. Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity

$\displaystyle \tilde f_2(x) = \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2 e(-2\alpha)\ d\alpha \ \ \ \ \ (6)$

that goes to infinity as ${x \rightarrow \infty}$, as this quantity ${\tilde f_2(x)}$ is also the number of ways to represent ${2}$ as the difference ${2 = p_1-p_2}$ of two primes less than or equal to ${x}$.
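Again for concreteness, ${\tilde f_2(x)}$ is simply the twin-prime count up to ${x}$, which can be computed directly for small ${x}$ (the value ${x = 100}$ is an arbitrary illustrative choice):

```python
# Direct count of tilde f_2(x): the number of representations 2 = p1 - p2
# with both primes at most x, i.e. the number of twin prime pairs up to x.
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, ok in enumerate(sieve) if ok]

def twin_f2(x):
    ps = set(primes_up_to(x))
    return sum(1 for p in ps if p + 2 in ps)

# Pairs: (3,5), (5,7), (11,13), (17,19), (29,31), (41,43), (59,61), (71,73).
print(twin_f2(100))  # 8
```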

In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums ${S(x,\alpha)}$. Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity ${f_2(x)}$ or ${\tilde f_2(x)}$, expresses it in terms of ${S(x,\alpha)}$ using (5) or (6), and then uses (5) or (6) again to convert these integrals back into the combinatorial expression of counting solutions to ${x=p_1+p_2}$ or ${2=p_1-p_2}$, and then uses the hypothetical solution to the given problem to obtain the required lower bounds on ${f_2(x)}$ or ${\tilde f_2(x)}$.

Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on ${f_2(x)}$ or ${\tilde f_2(x)}$ (or similar quantities) purely from the upper and lower bounds on ${S(x,\alpha)}$ or similar quantities (and of various ${L^p}$ type norms on such quantities, such as the ${L^2}$ bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds on ${S(x,\alpha)}$ are (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:

• (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
• (ii) Upper and lower bounds on the magnitude of ${S(x,\alpha)}$ are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of ${O(1/\log x)}$ or better); but
• (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.

I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.

In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which gives very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artificially insert the circle method into any viable approach to these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.

In this final set of course notes, we discuss how (a generalisation of) the expansion results obtained in the preceding notes can be used for some number-theoretic applications, and in particular to locate almost primes inside orbits of thin groups, following the work of Bourgain, Gamburd, and Sarnak. We will not attempt here to obtain the sharpest or most general results in this direction, but instead focus on the simplest instances of these results which are still illustrative of the ideas involved.

One of the basic general problems in analytic number theory is to locate tuples of primes of a certain form; for instance, the famous (and still unsolved) twin prime conjecture asserts that there are infinitely many pairs ${(n_1,n_2)}$ in the line ${\{ (n_1,n_2) \in {\bf Z}^2: n_2-n_1=2\}}$ in which both entries are prime. In a similar spirit, one of the Landau conjectures (also still unsolved) asserts that there are infinitely many primes in the set ${\{ n^2+1: n \in {\bf Z} \}}$. The Mersenne prime conjecture (also unsolved) asserts that there are infinitely many primes in the set ${\{ 2^n - 1: n \in {\bf N} \}}$, and so forth.

More generally, given some explicit subset ${V}$ in ${{\bf R}^d}$ (or ${{\bf C}^d}$, if one wishes), such as an algebraic variety, one can ask the question of whether there are infinitely many integer lattice points ${(n_1,\ldots,n_d)}$ in ${V \cap {\bf Z}^d}$ in which all the coefficients ${n_1,\ldots,n_d}$ are simultaneously prime; let us refer to such points as prime points.

At this level of generality, this problem is impossibly difficult. Indeed, even the much simpler problem of deciding whether the set ${V \cap {\bf Z}^d}$ is non-empty (let alone containing prime points) when ${V}$ is a hypersurface ${\{ x \in {\bf R}^d: P(x) = 0 \}}$ cut out by a polynomial ${P}$ is essentially Hilbert’s tenth problem, which is known to be undecidable in general by Matiyasevich’s theorem. So one needs to restrict attention to a more special class of sets ${V}$, in which the question of finding integer points is not so difficult. One model case is to consider orbits ${V = \Gamma b}$, where ${b \in {\bf Z}^d}$ is a fixed lattice vector and ${\Gamma}$ is some discrete group that acts on ${{\bf Z}^d}$ somehow (e.g. ${\Gamma}$ might be embedded as a subgroup of the special linear group ${SL_d({\bf Z})}$, or in the affine group ${SL_d({\bf Z}) \ltimes {\bf Z}^d}$). In such a situation it is then quite easy to show that ${V = V \cap {\bf Z}^d}$ is large; for instance, ${V}$ will be infinite precisely when the stabiliser of ${b}$ in ${\Gamma}$ has infinite index in ${\Gamma}$.

Even in this simpler setting, the question of determining whether an orbit ${V = \Gamma b}$ contains infinitely many prime points is still extremely difficult; indeed the three examples given above of the twin prime conjecture, Landau conjecture, and Mersenne prime conjecture are essentially of this form (possibly after some slight modification of the underlying ring ${{\bf Z}}$, see this paper of Bourgain-Gamburd-Sarnak for details), and are all unsolved (and generally considered well out of reach of current technology). Indeed, the list of non-trivial orbits ${V = \Gamma b}$ which are known to contain infinitely many prime points is quite slim; Euclid’s theorem on the infinitude of primes handles the case ${V = {\bf Z}}$, Dirichlet’s theorem handles infinite arithmetic progressions ${V = a{\bf Z} + r}$, and a somewhat complicated result of Green, Tao, and Ziegler handles “non-degenerate” affine lattices in ${{\bf Z}^d}$ of rank two or more (such as the lattice of length ${d}$ arithmetic progressions), but there are few other positive results known that are not based on the above cases (though we will note the remarkable theorem of Friedlander and Iwaniec that there are infinitely many primes of the form ${a^2+b^4}$, and the related result of Heath-Brown that there are infinitely many primes of the form ${a^3+2b^3}$, as being in a kindred spirit to the above results, though they are not explicitly associated to an orbit of a reasonable action as far as I know).

On the other hand, much more is known if one is willing to replace the primes by the larger set of almost primes – integers with a small number of prime factors (counting multiplicity). Specifically, for any ${r \geq 1}$, let us call an ${r}$-almost prime an integer which is the product of at most ${r}$ primes, possibly multiplied by the unit ${-1}$. Many of the above sorts of questions which are open for primes are known for ${r}$-almost primes for ${r}$ sufficiently large. For instance, with regards to the twin prime conjecture, it is a result of Chen that there are infinitely many pairs ${p,p+2}$ where ${p}$ is a prime and ${p+2}$ is a ${2}$-almost prime; in a similar vein, it is a result of Iwaniec that there are infinitely many ${2}$-almost primes of the form ${n^2+1}$. On the other hand, it is still open for any fixed ${r}$ whether there are infinitely many Mersenne numbers ${2^n-1}$ which are ${r}$-almost primes. (For the superficially similar situation with the numbers ${2^n+1}$, it is in fact believed (but again unproven) that there are only finitely many ${r}$-almost primes for any fixed ${r}$ (the Fermat prime conjecture).)
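The notion of an ${r}$-almost prime is easy to experiment with numerically. The following minimal Python sketch (my own illustration, not from the notes; all function names are ad hoc) counts by brute force the Chen-type pairs ${p, p+2}$ with ${p}$ prime and ${p+2}$ a ${2}$-almost prime below a small threshold:

```python
def big_omega(n):
    """Count prime factors of n with multiplicity (the Omega function)."""
    n = abs(n)
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

def is_almost_prime(n, r):
    """True if n > 1 is a product of at most r primes (an r-almost prime)."""
    return n > 1 and big_omega(n) <= r

# Chen-type pairs below 10^4: p prime, p+2 a 2-almost prime
# (twin prime pairs are included as a special case)
chen_pairs = [p for p in range(2, 10000)
              if is_almost_prime(p, 1) and is_almost_prime(p + 2, 2)]
print(len(chen_pairs), chen_pairs[:6])
```

Chen's theorem asserts that such pairs never run out; the brute-force count here is of course only a finite sanity check of the definitions.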

The main tool that allows one to count almost primes in orbits is sieve theory. The reason for this lies in the simple observation that in order to ensure that an integer ${n}$ of magnitude at most ${x}$ is an ${r}$-almost prime, it suffices to guarantee that ${n}$ is not divisible by any prime less than ${x^{1/(r+1)}}$. Thus, to create ${r}$-almost primes, one can start with the integers up to some large threshold ${x}$ and remove (or “sieve out”) all the integers that are multiples of any prime ${p}$ less than ${x^{1/(r+1)}}$. The difficulty is then to ensure that a sufficiently non-trivial quantity of integers remain after this process, for the purposes of finding points in the given set ${V}$.

The most basic sieve of this form is the sieve of Eratosthenes, which when combined with the inclusion-exclusion principle gives the Legendre sieve (or exact sieve), which gives an exact formula for quantities such as the number ${\pi(x,z)}$ of natural numbers less than or equal to ${x}$ that are not divisible by any prime less than or equal to a given threshold ${z}$. Unfortunately, when one tries to evaluate this formula, one encounters error terms which grow exponentially in ${z}$, rendering this sieve useful only for very small thresholds ${z}$ (of logarithmic size in ${x}$). To improve the sieve level up to a small power of ${x}$ such as ${x^{1/(r+1)}}$, one has to replace the exact sieve by upper bound sieves and lower bound sieves which only seek to obtain upper or lower bounds on quantities such as ${\pi(x,z)}$, but contain a polynomial number of terms rather than an exponential number. There are a variety of such sieves, with the two most common such sieves being combinatorial sieves (such as the beta sieve), based on various combinatorial truncations of the inclusion-exclusion formula, and the Selberg upper bound sieve, based on upper bounds that are the square of a divisor sum. (There is also the large sieve, which is somewhat different in nature and based on ${L^2}$ almost orthogonality considerations, rather than on any actual sieving, to obtain upper bounds.) We will primarily work with a specific sieve in these notes, namely the beta sieve, and we will not attempt to optimise all the parameters of this sieve (which ultimately means that the almost primality parameter ${r}$ in our results will be somewhat large). For a more detailed study of sieve theory, see the classic text of Halberstam and Richert, or the more recent texts of Iwaniec-Kowalski or of Friedlander-Iwaniec.
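As a concrete illustration of the Legendre sieve (a sketch of my own, with ad hoc function names), one can compare the inclusion-exclusion formula for ${\pi(x,z)}$ against a direct count; note that the formula has ${2^{\pi(z)}}$ terms, which is the exponential blow-up alluded to above:

```python
from itertools import combinations

def primes_up_to(z):
    """Sieve of Eratosthenes: the primes p <= z."""
    sieve = [True] * (z + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(z ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p in range(2, z + 1) if sieve[p]]

def legendre_pi(x, z):
    """pi(x, z) via inclusion-exclusion: sum over squarefree d | P(z) of mu(d) * floor(x/d).
    The number of terms is 2^pi(z), which is why this is only usable for tiny z."""
    ps = primes_up_to(z)
    total = 0
    for k in range(len(ps) + 1):
        for combo in combinations(ps, k):
            d = 1
            for p in combo:
                d *= p
            total += (-1) ** k * (x // d)
    return total

def direct_pi(x, z):
    """pi(x, z) by direct enumeration, for comparison."""
    ps = primes_up_to(z)
    return sum(1 for n in range(1, x + 1) if all(n % p for p in ps))

print(legendre_pi(1000, 10), direct_pi(1000, 10))
```

The two counts agree exactly, as they must; the exact sieve only becomes problematic once one tries to estimate (rather than evaluate) the formula, since each of the exponentially many floor terms carries an ${O(1)}$ error.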

Very roughly speaking, the end result of sieve theory is that excepting some degenerate and “exponentially thin” settings (such as those associated with the Mersenne primes), all the orbits which are expected to have a large number of primes, can be proven to at least have a large number of ${r}$-almost primes for some finite ${r}$. (Unfortunately, there is a major obstruction, known as the parity problem, which prevents sieve theory from lowering ${r}$ all the way to ${1}$; see this blog post for more discussion.) One formulation of this principle was established by Bourgain, Gamburd, and Sarnak:

Theorem 1 (Bourgain-Gamburd-Sarnak) Let ${\Gamma}$ be a subgroup of ${SL_2({\bf Z})}$ which is not virtually solvable. Let ${f: {\bf Z}^4 \rightarrow {\bf Z}}$ be a polynomial with integer coefficients obeying the following primitivity condition: for any positive integer ${q}$, there exists ${A \in \Gamma \subset {\bf Z}^4}$ such that ${f(A)}$ is coprime to ${q}$. Then there exists an ${r \geq 1}$ such that there are infinitely many ${A \in \Gamma}$ with ${f(A)}$ non-zero and ${r}$-almost prime.

This is not the strongest version of the Bourgain-Gamburd-Sarnak theorem, but it captures the general flavour of their results. Note that the theorem immediately implies an analogous result for orbits ${\Gamma b \subset {\bf Z}^2}$, in which ${f}$ is now a polynomial from ${{\bf Z}^2}$ to ${{\bf Z}}$, and one uses ${f(Ab)}$ instead of ${f(A)}$. It is in fact conjectured that one can set ${r=1}$ here, but this is well beyond current technology. For the purpose of reaching ${r=1}$, it is very natural to impose the primitivity condition, but as long as one is content with larger values of ${r}$, it is possible to relax the primitivity condition somewhat; see the paper of Bourgain, Gamburd, and Sarnak for more discussion.

By specialising to the polynomial ${f: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \rightarrow abcd}$, we conclude as a corollary that as long as ${\Gamma}$ is primitive in the sense that it contains matrices with all coefficients coprime to ${q}$ for any given ${q}$, then ${\Gamma}$ contains infinitely many matrices whose elements are all ${r}$-almost primes for some ${r}$ depending only on ${\Gamma}$. For further applications of these sorts of results, for instance to Apollonian packings, see the paper of Bourgain, Gamburd, and Sarnak.
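To get a hands-on feel for the quantity ${f(A) = abcd}$ on an orbit, one can experiment with the subgroup of ${SL_2({\bf Z})}$ generated by ${\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}}$ and ${\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}}$ (a free group, hence not virtually solvable; this specific choice is my own, for illustration). Note that every element of this group is congruent to the identity mod ${2}$, so ${abcd}$ is always divisible by ${4}$ and the primitivity condition actually fails at ${q=2}$; the experiment thus also illustrates why congruence obstructions must be excluded by that condition. Here is a quick sketch:

```python
def mat_mul(A, B):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    return ((A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]),
            (A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]))

def big_omega(n):
    """Count prime factors of n with multiplicity."""
    n = abs(n)
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    return count + (1 if n > 1 else 0)

L = ((1, 2), (0, 1))
R = ((1, 0), (2, 1))
frontier = [((1, 0), (0, 1))]
best = []
for _ in range(6):                       # positive words of length up to 6 in L, R
    frontier = [mat_mul(A, G) for A in frontier for G in (L, R)]
    for (a, b), (c, d) in frontier:
        v = a * b * c * d
        if v != 0:
            best.append((big_omega(v), v))
print(min(best))                         # smallest Omega(abcd) seen, and where
```

For instance the word ${LR = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}}$ already gives ${abcd = 20 = 2^2 \cdot 5}$, a ${3}$-almost prime, and (thanks to the mod ${2}$ obstruction) no element of this particular group can do better.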

It turns out that to prove Theorem 1, the Cayley expansion results in ${SL_2(F_p)}$ from the previous set of notes are not quite enough; one needs a more general Cayley expansion result in ${SL_2({\bf Z}/q{\bf Z})}$ where ${q}$ is square-free but not necessarily prime. The proof of this expansion result uses the same basic methods as in the ${SL_2(F_p)}$ case, but is significantly more complicated technically, and we will only discuss it briefly here. As such, we do not give a complete proof of Theorem 1, but hopefully the portion of the argument presented here is still sufficient to give an impression of the ideas involved.

I’ve just uploaded to the arXiv my paper “Every odd number greater than 1 is the sum of at most five primes“, submitted to Mathematics of Computation. The main result of the paper is as stated in the title, and is in the spirit of (though significantly weaker than) the even Goldbach conjecture (every even natural number is the sum of at most two primes) and odd Goldbach conjecture (every odd natural number greater than 1 is the sum of at most three primes). It also improves on a result of Ramaré that every even natural number is the sum of at most six primes. This result had previously also been established by Kaniecki under the additional assumption of the Riemann hypothesis, so one can view the main result here as an unconditional version of Kaniecki’s result.

The method used is the Hardy-Littlewood circle method, which was for instance also used to prove Vinogradov’s theorem that every sufficiently large odd number is the sum of three primes. Let’s quickly recall how this argument works. It is convenient to use a proxy for the primes, such as the von Mangoldt function ${\Lambda}$, which is mostly supported on the primes. To represent a large number ${x}$ as the sum of three primes, it suffices to obtain a good lower bound for the sum

$\displaystyle \sum_{n_1,n_2,n_3: n_1+n_2+n_3=x} \Lambda(n_1) \Lambda(n_2) \Lambda(n_3).$

By Fourier analysis, one can rewrite this sum as an integral

$\displaystyle \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha$

where

$\displaystyle S(x,\alpha) := \sum_{n \leq x} \Lambda(n) e(n\alpha)$

and ${e(\theta) :=e^{2\pi i \theta}}$. To control this integral, one then needs good bounds on ${S(x,\alpha)}$ for various values of ${\alpha}$. To do this, one first approximates ${\alpha}$ by a rational ${a/q}$ with controlled denominator ${q}$ (using a tool such as the Dirichlet approximation theorem). The analysis then broadly bifurcates into the major arc case when ${q}$ is small, and the minor arc case when ${q}$ is large. In the major arc case, the problem more or less boils down to understanding sums such as

$\displaystyle \sum_{n\leq x} \Lambda(n) e(an/q),$

which in turn is almost equivalent to understanding the prime number theorem in arithmetic progressions modulo ${q}$. In the minor arc case, the prime number theorem is not strong enough to give good bounds (unless one is using some extremely strong hypotheses, such as the generalised Riemann hypothesis), so instead one uses a rather different method, using truncated versions of divisor sum identities such as ${\Lambda(n) =\sum_{d|n} \mu(d) \log\frac{n}{d}}$ to split ${S(x,\alpha)}$ into a collection of linear and bilinear sums that are more tractable to bound, typical examples of which (after using a particularly simple truncated divisor sum identity known as Vaughan’s identity) include the “Type I sum”

$\displaystyle \sum_{d \leq U} \mu(d) \sum_{n \leq x/d} \log(n) e(\alpha dn)$

and the “Type II sum”

$\displaystyle \sum_{d > U} \sum_{w > V} \mu(d) (\sum_{b|w: b > V} \Lambda(b)) e(\alpha dw) 1_{dw \leq x}.$

After using tools such as the triangle inequality or Cauchy-Schwarz inequality to eliminate arithmetic functions such as ${\mu(d)}$ or ${\sum_{b|w: b>V}\Lambda(b)}$, one ends up controlling plain exponential sums such as ${\sum_{V < w < x/d} e(\alpha dw)}$, which can be efficiently controlled in the minor arc case.
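The contrast between major and minor arc behaviour of ${S(x,\alpha)}$ is easy to observe numerically. The following small computation (an illustration of my own, not taken from the paper) evaluates the sum at a rational frequency with tiny denominator, where the primes conspire to prevent cancellation, and at a "generic" irrational frequency, where substantial cancellation occurs:

```python
import cmath
import math

def mangoldt(n):
    """Von Mangoldt function: log p if n is a power of a prime p, else 0."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def S(x, alpha):
    """The exponential sum S(x, alpha) = sum_{n <= x} Lambda(n) e(n alpha)."""
    return sum(mangoldt(n) * cmath.exp(2j * math.pi * n * alpha)
               for n in range(1, x + 1))

x = 2000
trivial = S(x, 0).real                 # = psi(x), the trivial bound on |S(x, alpha)|
major = abs(S(x, 1 / 3))               # major arc: denominator 3, little cancellation
minor = abs(S(x, math.sqrt(2) - 1))    # generic irrational: strong cancellation
print(round(trivial), round(major), round(minor))
```

At ${\alpha = 1/3}$ the primes split evenly between the residues ${1, 2 \bmod 3}$ and ${e(1/3)+e(2/3) = -1}$, so ${|S(x,1/3)|}$ stays comparable to ${\psi(x)/2}$; at the irrational frequency the sum is far smaller, consistent with the minor arc estimates discussed above.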

This argument works well when ${x}$ is extremely large, but starts running into problems for moderate sized ${x}$, e.g. ${x \sim 10^{30}}$. The first issue is that of logarithmic losses in the minor arc estimates. A typical minor arc estimate takes the shape

$\displaystyle |S(x,\alpha)| \ll (\frac{x}{\sqrt{q}}+\frac{x}{\sqrt{x/q}} + x^{4/5}) \log^3 x \ \ \ \ \ (1)$

when ${\alpha}$ is close to ${a/q}$ for some ${1\leq q\leq x}$. This only improves upon the trivial estimate ${|S(x,\alpha)| \ll x}$ from the prime number theorem when ${\log^6 x \ll q \ll x/\log^6 x}$. As a consequence, it becomes necessary to obtain an accurate prime number theorem in arithmetic progressions with modulus as large as ${\log^6 x}$. However, with current technology, the error terms in such theorems are quite poor (terms such as ${O(\exp(-c\sqrt{\log x}) x)}$ for some small ${c>0}$ are typical, and there is also a notorious “Siegel zero” problem), and as a consequence, the method is generally only applicable for very large ${x}$. For instance, the best explicit result of Vinogradov type known currently is due to Liu and Wang, who established that all odd numbers larger than ${10^{1340}}$ are the sum of three odd primes. (However, on the assumption of the GRH, the full odd Goldbach conjecture is known to be true; this is a result of Deshouillers, Effinger, te Riele, and Zinoviev.)

In this paper, we make a number of refinements to the general scheme, each one of which is individually rather modest and not all that novel, but which when added together turn out to be enough to resolve the five primes problem (though many more ideas would still be needed to tackle the three primes problem, and as is well known the circle method is very unlikely to be the route to make progress on the two primes problem). The first refinement, which is only available in the five primes case, is to take advantage of the numerical verification of the even Goldbach conjecture up to some large ${N_0}$ (we take ${N_0=4\times 10^{14}}$, using a verification of Richstein, although there are now much larger values of ${N_0}$ – as high as ${2.6 \times 10^{18}}$ – for which the conjecture has been verified). As such, instead of trying to represent an odd number ${x}$ as the sum of five primes, we can represent it as the sum of three odd primes and a natural number between ${2}$ and ${N_0}$. This effectively brings us back to the three primes problem, but with the significant additional boost that one can essentially restrict the frequency variable ${\alpha}$ to be of size ${O(1/N_0)}$. In practice, this eliminates all of the major arcs except for the principal arc around ${0}$. This is a significant simplification, in particular avoiding the need to deal with the prime number theorem in arithmetic progressions (and all the attendant theory of L-functions, Siegel zeroes, etc.).

In a similar spirit, by taking advantage of the numerical verification of the Riemann hypothesis up to some height ${T_0}$, and using the explicit formula relating the von Mangoldt function with the zeroes of the zeta function, one can safely deal with the principal major arc ${\{ \alpha = O( T_0 / x ) \}}$. For our specific application, we use the value ${T_0= 3.29 \times 10^9}$, arising from the verification of the Riemann hypothesis of the first ${10^{10}}$ zeroes by van de Lune (unpublished) and Wedeniwski. (Such verifications have since been extended further, the latest being that the first ${10^{13}}$ zeroes lie on the line.)

To make the contribution of the major arc as efficient as possible, we borrow an idea from a paper of Bourgain, and restrict one of the three primes in the three-primes problem to a somewhat shorter range than the other two (of size ${O(x/K)}$ instead of ${O(x)}$, where we take ${K}$ to be something like ${10^3}$), as this largely eliminates the “Archimedean” losses coming from trying to use Fourier methods to control convolutions on ${{\bf R}}$. In our paper, we set the scale parameter ${K}$ to be ${10^3}$ (basically, anything that is much larger than ${1}$ but much less than ${T_0}$ will work), but we found that an additional gain (which we ended up not using) could be obtained by averaging ${K}$ over a range of scales, say between ${10^3}$ and ${10^6}$. This sort of averaging could be a useful trick in future work on Goldbach-type problems.

It remains to treat the contribution of the “minor arc” ${T_0/x \ll |\alpha| \ll 1/N_0}$. To do this, one needs good ${L^2}$ and ${L^\infty}$ type estimates on the exponential sum ${S(x,\alpha)}$. Plancherel’s theorem gives an ${L^2}$ estimate which loses a logarithmic factor, but it turns out that on this particular minor arc one can use tools from the theory of the large sieve (such as Montgomery’s uncertainty principle) to eliminate this logarithmic loss almost completely; it turns out that the most efficient way to do this is to use an effective upper bound of Siebert on the number of prime pairs ${(p,p+h)}$ less than ${x}$ to obtain an ${L^2}$ bound that only loses a factor of ${8}$ (or of ${7}$, once one cuts out the major arc).

For ${L^\infty}$ estimates, it turns out that existing effective versions of (1) (in particular, the bound given by Chen and Wang) are insufficient, due to the three logarithmic factors of ${\log x}$ in the bound. By using a smoothed out version ${S_\eta(x,\alpha) :=\sum_{n}\Lambda(n) e(n\alpha) \eta(n/x)}$ of the sum ${S(x,\alpha)}$, for some suitable cutoff function ${\eta}$, one can save one factor of a logarithm, obtaining a bound of the form

$\displaystyle |S_\eta(x,\alpha)| \ll (\frac{x}{\sqrt{q}}+\frac{x}{\sqrt{x/q}} + x^{4/5}) \log^2 x$

with effective constants. One can improve the constants further by restricting all summations to odd integers (which barely affects ${S_\eta(x,\alpha)}$, since ${\Lambda}$ is mostly supported on odd numbers anyway), which in practice reduces the effective constants by a factor of two or so. One can also make further improvements in the constants by using the very sharp large sieve inequality to control the “Type II” sums that arise from Vaughan’s identity, and by using integration by parts to improve the bounds on the “Type I” sums. A final gain can then be extracted by optimising the cutoff parameters ${U, V}$ appearing in Vaughan’s identity to minimise the contribution of the Type II sums (which, in practice, are the dominant term). Combining all these improvements, one ends up with bounds of the shape

$\displaystyle |S_\eta(x,\alpha)| \ll \frac{x}{q} \log^2 x + \frac{x}{\sqrt{q}} \log^2 q$

when ${q}$ is small (say ${1 < q < x^{1/3}}$) and

$\displaystyle |S_\eta(x,\alpha)| \ll \frac{x}{(x/q)^2} \log^2 x + \frac{x}{\sqrt{x/q}} \log^2(x/q)$

when ${q}$ is large (say ${x^{2/3} < q < x}$). (See the paper for more explicit versions of these estimates.) The point here is that the ${\log x}$ factors have been partially replaced by smaller logarithmic factors such as ${\log q}$ or ${\log(x/q)}$. Putting together all of these improvements, one can finally obtain a satisfactory bound on the minor arc. (There are still some terms with a ${\log x}$ factor in them, but we use the effective Vinogradov theorem of Liu and Wang to upper bound ${\log x}$ by ${3100}$, which ends up making the remaining terms involving ${\log x}$ manageable.)

Let ${\alpha \in {\bf R}/{\bf Z}}$ be an element of the unit circle, let ${N \geq 1}$, and let ${\rho > 0}$. We define the (rank one) Bohr set ${B_N(\alpha;\rho)}$ to be the set

$\displaystyle B_N(\alpha;\rho) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha\|_{{\bf R}/{\bf Z}} \leq \rho \}$

where ${\|x\|_{{\bf R}/{\bf Z}}}$ is the distance to the origin in the unit circle (or equivalently, the distance to the nearest integer, after lifting up to ${{\bf R}}$). These sets play an important role in additive combinatorics and in additive number theory. For instance, they arise naturally when applying the circle method, because Bohr sets describe the oscillation of exponential phases such as ${n \mapsto e^{2\pi i n \alpha}}$.

Observe that Bohr sets enjoy the doubling property

$\displaystyle B_N(\alpha;\rho) + B_N(\alpha;\rho) \subset B_{2N}(\alpha;2\rho),$

thus doubling the Bohr set doubles both the length parameter ${N}$ and the radius parameter ${\rho}$. As such, these Bohr sets resemble two-dimensional balls (or boxes). Indeed, one can view ${B_N(\alpha;\rho)}$ as the preimage of the two-dimensional box ${[-1,1] \times [-\rho,\rho] \subset {\bf R} \times {\bf R}/{\bf Z}}$ under the homomorphism ${n \mapsto (n/N, \alpha n \hbox{ mod } 1)}$.
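These definitions are simple enough to verify directly on a computer; the following sketch (an illustration of my own) builds a rank one Bohr set and checks the doubling containment displayed above:

```python
import math

def dist_to_Z(t):
    """The norm ||t||_{R/Z}: distance from t to the nearest integer."""
    return abs(t - round(t))

def bohr_set(alpha, N, rho):
    """The rank one Bohr set B_N(alpha; rho), as a set of integers."""
    return {n for n in range(-N, N + 1) if dist_to_Z(n * alpha) <= rho}

alpha, N, rho = math.sqrt(2), 50, 0.1
B = bohr_set(alpha, N, rho)
sumset = {a + b for a in B for b in B}       # B + B
B2 = bohr_set(alpha, 2 * N, 2 * rho)         # B_{2N}(alpha; 2 rho)
print(sorted(B), sumset <= B2)
```

The containment ${B + B \subset B_{2N}(\alpha; 2\rho)}$ holds by the triangle inequality for ${\| \cdot \|_{{\bf R}/{\bf Z}}}$, and one can also see the two-dimensional structure emerging: the elements of ${B}$ cluster along an arithmetic-progression-like pattern (here multiples of ${5}$ and ${12}$, the continued fraction denominators of ${\sqrt{2}}$).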

Another class of finite set with two-dimensional behaviour is the class of (rank two) generalised arithmetic progressions

$\displaystyle P( a_1,a_2; N_1,N_2 ) := \{ n_1 a_1 + n_2 a_2: n_1,n_2 \in {\bf Z}; |n_1| \leq N_1, |n_2| \leq N_2 \}$

with ${a_1,a_2 \in {\bf Z}}$ and ${N_1,N_2 > 0}$. Indeed, we have

$\displaystyle P( a_1,a_2; N_1,N_2 ) + P( a_1,a_2; N_1,N_2 ) \subset P( a_1,a_2; 2N_1, 2N_2 )$

and so we see, as with the Bohr set, that doubling the generalised arithmetic progressions doubles the two defining parameters of that progression.

More generally, there is an analogy between rank ${r}$ Bohr sets

$\displaystyle B_N(\alpha_1,\ldots,\alpha_r; \rho_1,\ldots,\rho_r) := \{ n \in {\bf Z}: -N \leq n \leq N; \|n\alpha_i\|_{{\bf R}/{\bf Z}} \leq \rho_i$

$\displaystyle \hbox{ for all } 1 \leq i \leq r \}$

and the rank ${r+1}$ generalised arithmetic progressions

$\displaystyle P( a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1} ) := \{ n_1 a_1 + \ldots + n_{r+1} a_{r+1}:$

$\displaystyle n_1,\ldots,n_{r+1} \in {\bf Z}; |n_i| \leq N_i \hbox{ for all } 1 \leq i \leq r+1 \}.$

One of the aims of additive combinatorics is to formalise analogies such as the one given above. By using some arguments from the geometry of numbers, for instance, one can show that for any rank ${r}$ Bohr set ${B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)}$, there is a rank ${r+1}$ generalised arithmetic progression ${P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})}$ for which one has the containments

$\displaystyle B_N(\alpha_1,\ldots,\alpha_r;\epsilon \rho_1,\ldots,\epsilon \rho_r) \subset P(a_1,\ldots,a_{r+1}; N_1,\ldots,N_{r+1})$

$\displaystyle \subset B_N(\alpha_1,\ldots,\alpha_r;\rho_1,\ldots,\rho_r)$

for some explicit ${\epsilon>0}$ depending only on ${r}$ (in fact one can take ${\epsilon = (r+1)^{-2(r+1)}}$); this is (a slight modification of) Lemma 4.22 of my book with Van Vu.

In the special case when ${r=1}$, one can make a significantly more detailed description of the link between rank one Bohr sets and rank two generalised arithmetic progressions, by using the classical theory of continued fractions, which among other things gives a fairly precise formula for the generators ${a_1,a_2}$ and lengths ${N_1,N_2}$ of the generalised arithmetic progression associated to a rank one Bohr set ${B_N(\alpha;\rho)}$. While this connection is already implicit in the continued fraction literature (for instance, in the classic text of Hardy and Wright), I thought it would be a good exercise to work it out explicitly and write it up, which I will do below the fold.
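Without spoiling the details below the fold, one can already see the basic mechanism numerically: the continued fraction denominators ${q_k}$ of ${\alpha}$ are exactly the scales at which ${\|q\alpha\|_{{\bf R}/{\bf Z}}}$ becomes unusually small, making them natural candidates for the generators of the associated rank two progression. A quick sketch (illustrative code of my own, for irrational ${\alpha > 0}$):

```python
import math

def convergents(alpha, k):
    """First k continued fraction convergents p/q of an irrational alpha > 0,
    via the standard recurrence p_j = a_j p_{j-1} + p_{j-2} (and same for q)."""
    a0 = math.floor(alpha)
    p0, q0, p1, q1 = 1, 0, a0, 1          # seed values p_{-1}/q_{-1}, p_0/q_0
    frac = alpha - a0
    out = [(p1, q1)]
    for _ in range(k - 1):
        frac = 1 / frac                   # shift to the next partial quotient
        d = math.floor(frac)
        frac -= d
        p0, q0, p1, q1 = p1, q1, d * p1 + p0, d * q1 + q0
        out.append((p1, q1))
    return out

alpha = math.sqrt(2)
for p, q in convergents(alpha, 8):
    # each denominator q obeys |q alpha - p| < 1/q, so ||q alpha|| is tiny
    print(q, abs(q * alpha - p))
```

For ${\alpha = \sqrt{2}}$ the denominators ${1, 2, 5, 12, 29, 70, \ldots}$ are precisely the integers that showed up in the Bohr set example earlier in this archive, which is no coincidence.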

It is unfortunate that the theory of continued fractions is restricted to the rank one setting (it relies very heavily on the total ordering of one-dimensional sets such as ${{\bf Z}}$ or ${{\bf R}}$). A higher rank version of the theory could potentially help with questions such as the Littlewood conjecture, which remains open despite a substantial amount of effort and partial progress on the problem. At the end of this post I discuss how one can use the rank one theory to rephrase the Littlewood conjecture as a conjecture about a doubly indexed family of rank four progressions, which can be used to heuristically justify why this conjecture should be true, but does not otherwise seem to shed much light on the problem.

One of the most fundamental principles in Fourier analysis is the uncertainty principle. It does not have a single canonical formulation, but one typical informal description of the principle is that if a function ${f}$ is restricted to a narrow region of physical space, then its Fourier transform ${\hat f}$ must be necessarily “smeared out” over a broad region of frequency space. Some versions of the uncertainty principle are discussed in this previous blog post.

In this post I would like to highlight a useful instance of the uncertainty principle, due to Hugh Montgomery, which is useful in analytic number theory contexts. Specifically, suppose we are given a complex-valued function ${f: {\bf Z} \rightarrow {\bf C}}$ on the integers. To avoid irrelevant issues at spatial infinity, we will assume that the support ${\hbox{supp}(f) := \{ n \in {\bf Z}: f(n) \neq 0 \}}$ of this function is finite (in practice, we will only work with functions that are supported in an interval ${[M+1,M+N]}$ for some natural numbers ${M,N}$). Then we can define the Fourier transform ${\hat f: {\bf R}/{\bf Z} \rightarrow {\bf C}}$ by the formula

$\displaystyle \hat f(\xi) := \sum_{n \in {\bf Z}} f(n) e(-n\xi)$

where ${e(x) := e^{2\pi i x}}$. (In some literature, the sign in the exponential phase is reversed, but this will make no substantial difference to the arguments below.)

The classical uncertainty principle, in this context, asserts that if ${f}$ is localised in an interval of length ${N}$, then ${\hat f}$ must be “smeared out” at a scale of at least ${1/N}$ (and essentially constant at scales less than ${1/N}$). For instance, if ${f}$ is supported in ${[M+1,M+N]}$, then we have the Plancherel identity

$\displaystyle \int_{{\bf R}/{\bf Z}} |\hat f(\xi)|^2\ d\xi = \| f \|_{\ell^2({\bf Z})}^2 \ \ \ \ \ (1)$

while from the Cauchy-Schwarz inequality we have

$\displaystyle |\hat f(\xi)| \leq N^{1/2} \| f \|_{\ell^2({\bf Z})}$

for each frequency ${\xi}$, and in particular that

$\displaystyle \int_I |\hat f(\xi)|^2\ d\xi \leq N |I| \| f \|_{\ell^2({\bf Z})}^2$

for any arc ${I}$ in the unit circle (with ${|I|}$ denoting the length of ${I}$). In particular, an interval of length significantly less than ${1/N}$ can only capture a fraction of the ${L^2}$ energy of the Fourier transform ${\hat f}$, which is consistent with the above informal statement of the uncertainty principle.

Another manifestation of the classical uncertainty principle is the large sieve inequality. A particularly nice formulation of this inequality is due independently to Montgomery and Vaughan, and to Selberg: if ${f}$ is supported in ${[M+1,M+N]}$, and ${\xi_1,\ldots,\xi_J}$ are frequencies in ${{\bf R}/{\bf Z}}$ that are ${\delta}$-separated for some ${\delta>0}$, thus ${\| \xi_i-\xi_j\|_{{\bf R}/{\bf Z}} \geq \delta}$ for all ${1 \leq i < j \leq J}$ (where ${\|\xi\|_{{\bf R}/{\bf Z}}}$ denotes the distance of ${\xi}$ to the origin in ${{\bf R}/{\bf Z}}$), then

$\displaystyle \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2. \ \ \ \ \ (2)$

The reader is encouraged to see how this inequality is consistent with the Plancherel identity (1) and the intuition that ${\hat f}$ is essentially constant at scales less than ${1/N}$. The factor ${N + \frac{1}{\delta}}$ can in fact be amplified a little bit to ${N + \frac{1}{\delta} - 1}$, which is essentially optimal, by using a neat dilation trick of Paul Cohen, in which one dilates ${[M+1,M+N]}$ to ${[MK+K, MK+NK]}$ (and replaces each frequency ${\xi_j}$ by their ${K^{th}}$ roots), and then sending ${K \rightarrow \infty}$ (cf. the tensor product trick); see this survey of Montgomery for details. But we will not need this refinement here.
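The inequality (2) is also easy to test numerically; the following sketch (my own, with arbitrarily chosen frequencies) checks it for random complex-valued ${f}$ supported on an interval of length ${N}$:

```python
import cmath
import random

def fhat(f, M, xi):
    """Fourier transform of f supported on [M+1, M+N], with f a list of coefficients."""
    return sum(c * cmath.exp(-2j * cmath.pi * (M + 1 + n) * xi)
               for n, c in enumerate(f))

def circle_dist(x, y):
    """Distance between x and y in R/Z."""
    return abs((x - y + 0.5) % 1.0 - 0.5)

M, N = 100, 50
xis = [0.05, 0.21, 0.40, 0.63, 0.88]    # arbitrarily chosen, delta-separated
delta = min(circle_dist(a, b) for i, a in enumerate(xis) for b in xis[i + 1:])
random.seed(0)
for _ in range(100):
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
    lhs = sum(abs(fhat(f, M, xi)) ** 2 for xi in xis)
    rhs = (N + 1 / delta) * sum(abs(c) ** 2 for c in f)
    assert lhs <= rhs                   # the large sieve inequality (2)
print("large sieve inequality verified; delta =", round(delta, 2))
```

Of course a finite random test proves nothing, but it is a useful check that the normalisations in (2) (the factor ${N + \frac{1}{\delta}}$, and the definition of ${\delta}$-separation on the circle) have been set up consistently.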

In the above instances of the uncertainty principle, the concept of narrow support in physical space was formalised in the Archimedean sense, using the standard Archimedean metric ${d_\infty(n,m) := |n-m|}$ on the integers ${{\bf Z}}$ (in particular, the parameter ${N}$ is essentially the Archimedean diameter of the support of ${f}$). However, in number theory, the Archimedean metric is not the only metric of importance on the integers; the ${p}$-adic metrics play an equally important role; indeed, it is common to unify the Archimedean and ${p}$-adic perspectives together into a unified adelic perspective. In the ${p}$-adic world, the metric balls are no longer intervals, but are instead residue classes modulo some power of ${p}$. Intersecting these balls from different ${p}$-adic metrics together, we obtain residue classes with respect to various moduli ${q}$ (which may be either prime or composite). As such, another natural manifestation of the concept of “narrow support in physical space” is “vanishes on many residue classes modulo ${q}$“. This notion of narrowness is particularly common in sieve theory, when one deals with functions supported on thin sets such as the primes, which naturally tend to avoid many residue classes (particularly if one throws away the first few primes).

In this context, the uncertainty principle is this: the more residue classes modulo ${q}$ that ${f}$ avoids, the more the Fourier transform ${\hat f}$ must spread out along multiples of ${1/q}$. To illustrate a very simple example of this principle, let us take ${q=2}$, and suppose that ${f}$ is supported only on odd numbers (thus it completely avoids the residue class ${0 \mod 2}$). We write out the formulae for ${\hat f(\xi)}$ and ${\hat f(\xi+1/2)}$:

$\displaystyle \hat f(\xi) = \sum_n f(n) e(-n\xi)$

$\displaystyle \hat f(\xi+1/2) = \sum_n f(n) e(-n\xi) e(-n/2).$

If ${f}$ is supported on the odd numbers, then ${e(-n/2)}$ is always equal to ${-1}$ on the support of ${f}$, and so we have ${\hat f(\xi+1/2)=-\hat f(\xi)}$. Thus, whenever ${\hat f}$ has a significant presence at a frequency ${\xi}$, it also must have an equally significant presence at the frequency ${\xi+1/2}$; there is a spreading out across multiples of ${1/2}$. Note that one has a similar effect if ${f}$ were supported on the even integers instead of the odd integers.
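This sign flip can be confirmed directly; the following Python sketch (with an arbitrary odd-supported test function) checks it at a few frequencies:

```python
import cmath

def fhat(f, xi):
    # \hat f(xi) = sum_n f(n) e(-n xi), with e(x) = exp(2*pi*i*x)
    return sum(c * cmath.exp(-2j * cmath.pi * n * xi) for n, c in f.items())

f = {1: 2.0, 3: -1.0, 7: 0.5, 11: 3.0}  # supported on odd integers only
for xi in [0.0, 0.123, 0.4]:
    # shifting the frequency by 1/2 flips the sign: \hat f(xi + 1/2) = -\hat f(xi)
    assert abs(fhat(f, xi + 0.5) + fhat(f, xi)) < 1e-9
```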

A little more generally, suppose now that ${f}$ avoids a single residue class modulo a prime ${p}$; for sake of argument let us say that it avoids the zero residue class ${0 \mod p}$, although the situation for the other residue classes is similar. For any frequency ${\xi}$ and any ${j=0,\ldots,p-1}$, one has

$\displaystyle \hat f(\xi+j/p) = \sum_n f(n) e(-n\xi) e(-jn/p).$

From basic Fourier analysis, we know that the phases ${e(-jn/p)}$ sum to zero as ${j}$ ranges from ${0}$ to ${p-1}$ whenever ${n}$ is not a multiple of ${p}$. We thus have

$\displaystyle \sum_{j=0}^{p-1} \hat f(\xi+j/p) = 0.$

In particular, if ${\hat f(\xi)}$ is large, then one of the other ${\hat f(\xi+j/p)}$ has to be somewhat large as well; using the Cauchy-Schwarz inequality, we can quantify this assertion in an ${\ell^2}$ sense via the inequality

$\displaystyle \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{1}{p-1} |\hat f(\xi)|^2. \ \ \ \ \ (3)$
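Both the cancellation identity and the resulting inequality (3) can be sanity-checked numerically; here is a Python sketch with ${p=5}$ and an arbitrary function avoiding the class ${0 \mod 5}$:

```python
import cmath

def fhat(f, xi):
    # \hat f(xi) = sum_n f(n) e(-n xi), with e(x) = exp(2*pi*i*x)
    return sum(c * cmath.exp(-2j * cmath.pi * n * xi) for n, c in f.items())

p = 5
f = {1: 1.0, 2: -2.0, 8: 1.5, 13: 0.7, 19: -1.0}  # avoids the class 0 mod 5
assert all(n % p != 0 for n in f)

xi = 0.211
vals = [fhat(f, xi + j / p) for j in range(p)]
assert abs(sum(vals)) < 1e-9  # the shifted transforms sum to zero
# inequality (3): energy at xi is dominated by the energy at the other shifts
assert sum(abs(v) ** 2 for v in vals[1:]) >= abs(vals[0]) ** 2 / (p - 1) - 1e-9
```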

Let us continue this analysis a bit further. Now suppose that ${f}$ avoids ${\omega(p)}$ residue classes modulo a prime ${p}$, for some ${0 \leq \omega(p) < p}$. (We exclude the case ${\omega(p)=p}$, as it clearly degenerates by forcing ${f}$ to be identically zero.) Let ${\nu(n)}$ be the function that equals ${1}$ on these residue classes and vanishes elsewhere; then

$\displaystyle \sum_n f(n) e(-n\xi) \nu(n) = 0.$

Using the periodic Fourier transform, we can write

$\displaystyle \nu(n) = \sum_{j=0}^{p-1} c(j) e(-jn/p)$

for some coefficients ${c(j)}$, thus

$\displaystyle \sum_{j=0}^{p-1} \hat f(\xi+j/p) c(j) = 0.$

Some Fourier-analytic computations reveal that

$\displaystyle c(0) = \frac{\omega(p)}{p}$

and

$\displaystyle \sum_{j=0}^{p-1} |c(j)|^2 = \frac{\omega(p)}{p}$

and so after some routine algebra and the Cauchy-Schwarz inequality, we obtain a generalisation of (3):

$\displaystyle \sum_{j=1}^{p-1} |\hat f(\xi+j/p)|^2 \geq \frac{\omega(p)}{p-\omega(p)} |\hat f(\xi)|^2.$

Thus we see that the more residue classes mod ${p}$ we exclude, the more Fourier energy has to disperse along multiples of ${1/p}$. It is also instructive to consider the extreme case ${\omega(p)=p-1}$, in which ${f}$ is supported on just a single residue class ${a \mod p}$; in this case, one clearly has ${\hat f(\xi+j/p) = e(-aj/p) \hat f(\xi)}$, and so ${\hat f}$ spreads its energy completely evenly along multiples of ${1/p}$.
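The general inequality above is also easy to verify numerically. In the Python sketch below, the choice ${p=7}$, the three avoided classes, and the coefficients of ${f}$ are all arbitrary illustrations:

```python
import cmath

def fhat(f, xi):
    # \hat f(xi) = sum_n f(n) e(-n xi), with e(x) = exp(2*pi*i*x)
    return sum(c * cmath.exp(-2j * cmath.pi * n * xi) for n, c in f.items())

p, avoided = 7, {0, 3, 5}  # omega(p) = 3 avoided classes mod 7
omega = len(avoided)
f = {n: float((37 * n) % 11 - 5) for n in range(1, 40) if n % p not in avoided}

for xi in [0.0, 0.377]:
    vals = [fhat(f, xi + j / p) for j in range(p)]
    spread = sum(abs(v) ** 2 for v in vals[1:])
    # the more classes are excluded, the more energy disperses along multiples of 1/p
    assert spread >= omega / (p - omega) * abs(vals[0]) ** 2 - 1e-9
```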

In 1968, Montgomery observed the following useful generalisation of the above calculation to arbitrary modulus:

Proposition 1 (Montgomery’s uncertainty principle) Let ${f: {\bf Z} \rightarrow{\bf C}}$ be a finitely supported function which, for each prime ${p}$, avoids ${\omega(p)}$ residue classes modulo ${p}$ for some ${0 \leq \omega(p) < p}$. Then for each natural number ${q}$, one has

$\displaystyle \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi+\frac{a}{q})|^2 \geq h(q) |\hat f(\xi)|^2$

where ${h(q)}$ is the quantity

$\displaystyle h(q) := \mu(q)^2 \prod_{p|q} \frac{\omega(p)}{p-\omega(p)} \ \ \ \ \ (4)$

where ${\mu}$ is the Möbius function.

We give a proof of this proposition below the fold.
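Since the quantity ${h(q)}$ from (4) recurs below, it may help to see it computed concretely. The following Python sketch (the helper `h` and its `omega` argument are my own naming, not from the post) implements (4) by trial-division factorisation:

```python
def h(q, omega):
    # h(q) = mu(q)^2 * prod_{p | q} omega(p) / (p - omega(p)), as in (4)
    fac, n, p = {}, q, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    if any(e > 1 for e in fac.values()):
        return 0.0  # mu(q)^2 = 0 unless q is squarefree
    out = 1.0
    for p in fac:
        out *= omega(p) / (p - omega(p))
    return out

one = lambda p: 1  # sieve out a single residue class modulo every prime
assert h(12, one) == 0.0                # 12 is not squarefree
assert abs(h(15, one) - 1 / 8) < 1e-12  # mu(15)^2 / phi(15) = 1/8
```

With ${\omega(p)=1}$ for all ${p}$ this recovers ${h(q) = \mu(q)^2/\phi(q)}$, the case used in the Brun-Titchmarsh application later in the post.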

Following the “adelic” philosophy, it is natural to combine this uncertainty principle with the large sieve inequality to take simultaneous advantage of localisation both in the Archimedean sense and in the ${p}$-adic senses. This leads to the following corollary:

Corollary 2 (Arithmetic large sieve inequality) Let ${f: {\bf Z} \rightarrow {\bf C}}$ be a function supported on an interval ${[M+1,M+N]}$ which, for each prime ${p}$, avoids ${\omega(p)}$ residue classes modulo ${p}$ for some ${0 \leq \omega(p) < p}$. Let ${\xi_1,\ldots,\xi_J \in {\bf R}/{\bf Z}}$, and let ${{\mathcal Q}}$ be a finite set of natural numbers. Suppose that the frequencies ${\xi_j + \frac{a}{q}}$ with ${1 \leq j \leq J}$, ${q \in {\mathcal Q}}$, and ${(a,q)=1}$ are ${\delta}$-separated. Then one has

$\displaystyle \sum_{j=1}^J |\hat f(\xi_j)|^2 \leq \frac{N + \frac{1}{\delta}}{\sum_{q \in {\mathcal Q}} h(q)} \| f \|_{\ell^2({\bf Z})}^2$

where ${h(q)}$ was defined in (4).

Indeed, from the large sieve inequality one has

$\displaystyle \sum_{j=1}^J \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \leq (N + \frac{1}{\delta}) \| f \|_{\ell^2({\bf Z})}^2$

while from Proposition 1 one has

$\displaystyle \sum_{q \in {\mathcal Q}} \sum_{1 \leq a \leq q: (a,q)=1} |\hat f(\xi_j+\frac{a}{q})|^2 \geq (\sum_{q \in {\mathcal Q}} h(q)) |\hat f(\xi_j)|^2$

whence the claim.

There is a great deal of flexibility in the above inequality, due to the ability to select the set ${{\mathcal Q}}$, the frequencies ${\xi_1,\ldots,\xi_J}$, the omitted classes ${\omega(p)}$, and the separation parameter ${\delta}$. Here is a typical application concerning the original motivation for the large sieve inequality, namely in bounding the size of sets which avoid many residue classes:

Corollary 3 (Large sieve) Let ${A}$ be a set of integers contained in ${[M+1,M+N]}$ which avoids ${\omega(p)}$ residue classes modulo ${p}$ for each prime ${p}$, and let ${R>0}$. Then

$\displaystyle |A| \leq \frac{N+R^2}{G(R)}$

where

$\displaystyle G(R) := \sum_{q \leq R} h(q).$

Proof: We apply Corollary 2 with ${f = 1_A}$, ${J=1}$, ${\delta=1/R^2}$, ${\xi_1=0}$, and ${{\mathcal Q} := \{ q: q \leq R\}}$. The key point is that the Farey sequence of fractions ${a/q}$ with ${q \leq R}$ and ${(a,q)=1}$ is ${1/R^2}$-separated, since

$\displaystyle \| \frac{a}{q}-\frac{a'}{q'} \|_{{\bf R}/{\bf Z}} \geq \frac{1}{qq'} \geq \frac{1}{R^2}$

whenever ${a/q, a'/q'}$ are distinct fractions in this sequence. $\Box$
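As a concrete (and deliberately non-optimised) instance of Corollary 3, one can take ${A}$ to be the primes exceeding ${R}$ in an interval, so that ${\omega(p)=1}$ for ${p \leq R}$ and ${h(q) = \mu(q)^2/\phi(q)}$; the particular values of ${M,N,R}$ in this Python sketch are arbitrary:

```python
def primes_upto(n):
    sieve = [True] * (n + 1)
    sieve[0], sieve[1] = False, False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, b in enumerate(sieve) if b]

def phi(n):
    # Euler totient by trial division
    out, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            out -= out // p
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        out -= out // m
    return out

def squarefree(n):
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False
        p += 1
    return True

M, N, R = 10000, 1000, 31  # R is roughly sqrt(N)
A = [p for p in primes_upto(M + N) if M < p <= M + N and p > R]
G = sum(1 / phi(q) for q in range(1, R + 1) if squarefree(q))  # G(R) with omega = 1
assert len(A) <= (N + R * R) / G  # the large sieve bound of Corollary 3
```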

If, for instance, ${A}$ is the set of all primes in ${[M+1,M+N]}$ larger than ${R}$, then one can set ${\omega(p)=1}$ for all ${p \leq R}$, which makes ${h(q) = \frac{\mu^2(q)}{\phi(q)}}$, where ${\phi}$ is the Euler totient function. It is a classical estimate that

$\displaystyle G(R) = \log R + O(1).$
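This classical asymptotic is easy to observe numerically; in the Python sketch below, the window of width ${2}$ used for the ${O(1)}$ term is an empirical choice for these ranges of ${R}$, not a proven constant:

```python
from math import log

def phi(n):
    # Euler totient by trial division
    out, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            out -= out // p
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        out -= out // m
    return out

def squarefree(n):
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False
        p += 1
    return True

def G(R):
    # G(R) = sum_{q <= R} mu(q)^2 / phi(q)
    return sum(1 / phi(q) for q in range(1, R + 1) if squarefree(q))

for R in (10, 100, 1000):
    assert log(R) < G(R) < log(R) + 2  # consistent with G(R) = log R + O(1)
```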

Using this fact and optimising in ${R}$, we obtain (a special case of) the Brun-Titchmarsh inequality

$\displaystyle \pi(M+N)-\pi(M) \leq (2+o_{N \rightarrow \infty}(1)) \frac{N}{\log N}$

where ${\pi(x)}$ is the prime counting function; a variant of the same argument gives the more general Brun-Titchmarsh inequality

$\displaystyle \pi(M+N;a,q)-\pi(M;a,q) \leq (2+o_{N \rightarrow \infty;q}(1)) \frac{N}{\phi(q) \log(N/q)}$

for any primitive residue class ${a \mod q}$, where ${\pi(M;a,q)}$ is the number of primes less than or equal to ${M}$ that are congruent to ${a \mod q}$. By performing a more careful optimisation using a slightly sharper version of the large sieve inequality (2) that exploits the irregular spacing of the Farey sequence, Montgomery and Vaughan were able to delete the error term in the Brun-Titchmarsh inequality, thus establishing the very nice inequality

$\displaystyle \pi(M+N;a,q)-\pi(M;a,q) \leq \frac{2N}{\phi(q) \log(N/q)}$

for any natural numbers ${M,N,a,q}$ with ${N>q}$. This is a particularly useful inequality in non-asymptotic analytic number theory (when one wishes to study number theory at explicit orders of magnitude, rather than the number theory of sufficiently large numbers), due to the absence of asymptotic notation.

I recently realised that Corollary 2 also establishes a stronger version of the “restriction theorem for the Selberg sieve” that Ben Green and I proved some years ago (indeed, one can view Corollary 2 as a “restriction theorem for the large sieve”). I’m placing the details below the fold.

In the previous set of notes we saw how a representation-theoretic property of groups, namely Kazhdan’s property (T), could be used to demonstrate expansion in Cayley graphs. In this set of notes we discuss a different representation-theoretic property of groups, namely quasirandomness, which is also useful for demonstrating expansion in Cayley graphs, though in a somewhat different way to property (T). For instance, whereas property (T), being qualitative in nature, is only interesting for infinite groups such as ${SL_d({\bf Z})}$ or ${SL_d({\bf R})}$, and only creates Cayley graphs after passing to a finite quotient, quasirandomness is a quantitative property which is directly applicable to finite groups, and is able to deduce expansion in a Cayley graph, provided that random walks in that graph are known to become sufficiently “flat” in a certain sense.

The definition of quasirandomness is easy enough to state:

Definition 1 (Quasirandom groups) Let ${G}$ be a finite group, and let ${D \geq 1}$. We say that ${G}$ is ${D}$-quasirandom if all non-trivial unitary representations ${\rho: G \rightarrow U(H)}$ of ${G}$ have dimension at least ${D}$. (Recall a representation is trivial if ${\rho(g)}$ is the identity for all ${g \in G}$.)

Exercise 1 Let ${G}$ be a finite group, and let ${D \geq 1}$. A unitary representation ${\rho: G \rightarrow U(H)}$ is said to be irreducible if ${H}$ has no ${G}$-invariant subspaces other than ${\{0\}}$ and ${H}$. Show that ${G}$ is ${D}$-quasirandom if and only if every non-trivial irreducible representation of ${G}$ has dimension at least ${D}$.

Remark 1 The terminology “quasirandom group” was introduced explicitly (though with slightly different notational conventions) by Gowers in 2008 in his detailed study of the concept; the name arises because dense Cayley graphs in quasirandom groups are quasirandom graphs in the sense of Chung, Graham, and Wilson, as we shall see below. This property had already been used implicitly to construct expander graphs by Sarnak and Xue in 1991, and more recently by Gamburd in 2002 and by Bourgain and Gamburd in 2008. One can of course define quasirandomness for more general locally compact groups than the finite ones, but we will only need this concept in the finite case. (A paper of Kunze and Stein from 1960, for instance, exploits the quasirandomness properties of the locally compact group ${SL_2({\bf R})}$ to obtain mixing estimates in that group.)

Quasirandomness behaves fairly well with respect to quotients and short exact sequences:

Exercise 2 Let ${1 \rightarrow H \rightarrow G \rightarrow K \rightarrow 1}$ be a short exact sequence of finite groups ${H,G,K}$.

• (i) If ${G}$ is ${D}$-quasirandom, show that ${K}$ is ${D}$-quasirandom also. (Equivalently: any quotient of a ${D}$-quasirandom finite group is again a ${D}$-quasirandom finite group.)
• (ii) Conversely, if ${H}$ and ${K}$ are both ${D}$-quasirandom, show that ${G}$ is ${D}$-quasirandom also. (In particular, the direct or semidirect product of two ${D}$-quasirandom finite groups is again a ${D}$-quasirandom finite group.)

Informally, we will call ${G}$ quasirandom if it is ${D}$-quasirandom for some “large” ${D}$, though the precise meaning of “large” will depend on context. For applications to expansion in Cayley graphs, “large” will mean “${D \geq |G|^c}$ for some constant ${c>0}$ independent of the size of ${G}$“, but other regimes of ${D}$ are certainly of interest.

The way we have set things up, the trivial group ${G = \{1\}}$ is infinitely quasirandom (i.e. it is ${D}$-quasirandom for every ${D}$). This is however a degenerate case and will not be discussed further here. In the non-trivial case, a finite group can only be quasirandom if it is large and has no large subgroups:

Exercise 3 Let ${D \geq 1}$, and let ${G}$ be a finite ${D}$-quasirandom group.

• (i) Show that if ${G}$ is non-trivial, then ${|G| \geq D+1}$. (Hint: use the mean zero component ${\tau\downharpoonright_{\ell^2(G)_0}}$ of the regular representation ${\tau: G \rightarrow U(\ell^2(G))}$.) In particular, non-trivial finite groups cannot be infinitely quasirandom.
• (ii) Show that any proper subgroup ${H}$ of ${G}$ has index ${[G:H] \geq D+1}$. (Hint: use the mean zero component of the quasiregular representation.)

The following exercise shows that quasirandom groups have to be quite non-abelian, and in particular perfect:

Exercise 4 (Quasirandomness, abelianness, and perfection) Let ${G}$ be a finite group.

• (i) If ${G}$ is abelian and non-trivial, show that ${G}$ is not ${2}$-quasirandom. (Hint: use Fourier analysis or the classification of finite abelian groups.)
• (ii) Show that ${G}$ is ${2}$-quasirandom if and only if it is perfect, i.e. the commutator group ${[G,G]}$ is equal to ${G}$. (Equivalently, ${G}$ is ${2}$-quasirandom if and only if it has no non-trivial abelian quotients.)
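Exercise 4(ii) can be sanity-checked by brute force for small groups. The following Python sketch (the helper names are my own) represents permutations as tuples, computes the commutator subgroup directly, and confirms that ${S_4}$ is not perfect while ${A_5}$ is:

```python
from itertools import permutations

def compose(a, b):
    # composition of permutations as tuples: (a o b)(i) = a[b[i]]
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    inv = [0] * len(a)
    for i, x in enumerate(a):
        inv[x] = i
    return tuple(inv)

def sign(a):
    # sign of a permutation via its cycle decomposition
    s, seen = 1, set()
    for i in range(len(a)):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j = a[j]
                length += 1
            if length % 2 == 0:
                s = -s
    return s

def closure(gens):
    # subgroup generated by gens (in a finite group, closure under products suffices)
    seen, frontier = set(gens), list(gens)
    while frontier:
        nxt = []
        for g in frontier:
            for x in gens:
                y = compose(g, x)
                if y not in seen:
                    seen.add(y)
                    nxt.append(y)
        frontier = nxt
    return seen

def commutator_subgroup(G):
    return closure({compose(compose(g, h), compose(inverse(g), inverse(h)))
                    for g in G for h in G})

S4 = set(permutations(range(4)))
A5 = {a for a in permutations(range(5)) if sign(a) == 1}
assert len(commutator_subgroup(S4)) == 12  # [S_4, S_4] = A_4, so S_4 is not perfect
assert commutator_subgroup(A5) == A5       # A_5 is perfect
```

By Exercise 4(ii), this confirms that ${A_5}$ is ${2}$-quasirandom while ${S_4}$ is not.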

Later on we shall see that there is a converse to the above two exercises; any non-trivial perfect finite group with no large subgroups will be quasirandom.

Exercise 5 Let ${G}$ be a finite ${D}$-quasirandom group. Show that for any subgroup ${G'}$ of ${G}$, ${G'}$ is ${D/[G:G']}$-quasirandom, where ${[G:G'] := |G|/|G'|}$ is the index of ${G'}$ in ${G}$. (Hint: use induced representations.)

Now we give an example of a more quasirandom group.

Lemma 2 (Frobenius lemma) If ${F_p}$ is a field of some prime order ${p}$, then ${SL_2(F_p)}$ is ${\frac{p-1}{2}}$-quasirandom.

This should be compared with the cardinality ${|SL_2(F_p)|}$ of the special linear group, which is easily computed to be ${(p^2-1) \times p = p^3 - p}$.
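The cardinality formula can be verified by brute force for small primes; a Python sketch:

```python
from itertools import product

def sl2_order(p):
    # count 2x2 matrices (a b; c d) over F_p with determinant ad - bc = 1
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p == 1)

for p in (2, 3, 5):
    assert sl2_order(p) == p ** 3 - p
```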

Proof: We may of course take ${p}$ to be odd. Suppose for contradiction that we have a non-trivial representation ${\rho: SL_2(F_p) \rightarrow U_d({\bf C})}$ on a unitary group of some dimension ${d}$ with ${d < \frac{p-1}{2}}$. Set ${a}$ to be the group element

$\displaystyle a := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$

and suppose first that ${\rho(a)}$ is non-trivial. Since ${a^p=1}$, we have ${\rho(a)^p=1}$; thus all the eigenvalues of ${\rho(a)}$ are ${p^{th}}$ roots of unity. On the other hand, by conjugating ${a}$ by diagonal matrices in ${SL_2(F_p)}$, we see that ${a}$ is conjugate to ${a^m}$ (and hence ${\rho(a)}$ conjugate to ${\rho(a)^m}$) whenever ${m}$ is a quadratic residue mod ${p}$. As such, the eigenvalues of ${\rho(a)}$ must be permuted by the operation ${x \mapsto x^m}$ for any quadratic residue ${m}$ mod ${p}$. Since ${\rho(a)}$ has at least one non-trivial eigenvalue, and there are ${\frac{p-1}{2}}$ distinct quadratic residues, we conclude that ${\rho(a)}$ has at least ${\frac{p-1}{2}}$ distinct eigenvalues. But ${\rho(a)}$ is a ${d \times d}$ matrix with ${d < \frac{p-1}{2}}$, a contradiction. Thus ${a}$ lies in the kernel of ${\rho}$. By conjugation, we then see that this kernel contains all unipotent matrices. But these matrices generate ${SL_2(F_p)}$ (see exercise below), and so ${\rho}$ is trivial, a contradiction. $\Box$
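The key conjugation step in the proof — that conjugating ${a}$ by ${\mathrm{diag}(t,t^{-1})}$ yields ${a^{t^2}}$ — can be checked directly; a Python sketch with ${p=11}$:

```python
def matmul(A, B, p):
    # 2x2 matrix product over F_p; matrices stored as tuples (a, b, c, d) = (a b; c d)
    return ((A[0] * B[0] + A[1] * B[2]) % p, (A[0] * B[1] + A[1] * B[3]) % p,
            (A[2] * B[0] + A[3] * B[2]) % p, (A[2] * B[1] + A[3] * B[3]) % p)

p = 11
a = (1, 1, 0, 1)
for t in range(1, p):
    tinv = pow(t, p - 2, p)  # t^{-1} mod p via Fermat's little theorem
    D, Dinv = (t, 0, 0, tinv), (tinv, 0, 0, t)
    # conjugation by diag(t, t^{-1}) sends a to a^{t^2}
    assert matmul(matmul(D, a, p), Dinv, p) == (1, (t * t) % p, 0, 1)
```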

Exercise 6 Show that for any prime ${p}$, the unipotent matrices

$\displaystyle \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}$

for ${t}$ ranging over ${F_p}$ generate ${SL_2(F_p)}$ as a group.
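For a small prime the claim of Exercise 6 can be confirmed computationally; the following Python sketch closes the set of unipotent generators under multiplication (which suffices in a finite group) and counts the result for ${p=5}$:

```python
def matmul(A, B, p):
    # 2x2 matrix product over F_p; matrices stored as tuples (a, b, c, d) = (a b; c d)
    return ((A[0] * B[0] + A[1] * B[2]) % p, (A[0] * B[1] + A[1] * B[3]) % p,
            (A[2] * B[0] + A[3] * B[2]) % p, (A[2] * B[1] + A[3] * B[3]) % p)

def generated(gens, p):
    # subgroup generated by gens (closure under products suffices in a finite group)
    seen = set(gens) | {(1, 0, 0, 1)}
    frontier = list(seen)
    while frontier:
        nxt = []
        for A in frontier:
            for B in gens:
                C = matmul(A, B, p)
                if C not in seen:
                    seen.add(C)
                    nxt.append(C)
        frontier = nxt
    return seen

p = 5
gens = ([(1, t, 0, 1) for t in range(1, p)] +
        [(1, 0, t, 1) for t in range(1, p)])
assert len(generated(gens, p)) == p ** 3 - p  # all of SL_2(F_5), order 120
```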

Exercise 7 Let ${G}$ be a finite group, and let ${D \geq 1}$. If ${G}$ is generated by a collection ${G_1,\ldots,G_k}$ of ${D}$-quasirandom subgroups, show that ${G}$ is itself ${D}$-quasirandom.

Exercise 8 Show that ${SL_d(F_p)}$ is ${\frac{p-1}{2}}$-quasirandom for any ${d \geq 2}$ and any prime ${p}$. (This is not sharp; the optimal bound here is ${\gg_d p^{d-1}}$, which follows from the results of Landazuri and Seitz.)

As a corollary of the above results and Exercise 2, we see that the projective special linear group ${PSL_d(F_p)}$ is also ${\frac{p-1}{2}}$-quasirandom.

Remark 2 One can ask whether the bound ${\frac{p-1}{2}}$ in Lemma 2 is sharp, assuming of course that ${p}$ is odd. Noting that ${SL_2(F_p)}$ acts linearly on the plane ${F_p^2}$, we see that it also acts projectively on the projective line ${PF_p^1 := (F_p^2 \backslash \{0\}) / F_p^\times}$, which has ${p+1}$ elements. Thus ${SL_2(F_p)}$ acts via the quasiregular representation on the ${p+1}$-dimensional space ${\ell^2(PF_p^1)}$, and also on the ${p}$-dimensional subspace ${\ell^2(PF_p^1)_0}$; this latter representation (known as the Steinberg representation) is irreducible. This shows that the ${\frac{p-1}{2}}$ bound cannot be improved beyond ${p}$. More generally, given any character ${\chi: F_p^\times \rightarrow S^1}$, ${SL_2(F_p)}$ acts on the ${p+1}$-dimensional space ${V_\chi}$ of functions ${f \in \ell^2( F_p^2 \backslash \{0\} )}$ that obey the twisted dilation invariance ${f(tx) = \chi(t) f(x)}$ for all ${t \in F_p^\times}$ and ${x \in F_p^2 \backslash \{0\}}$; these are known as the principal series representations. When ${\chi}$ is the trivial character, this is the quasiregular representation discussed earlier. For most other characters, this is an irreducible representation, but it turns out that when ${\chi}$ is the quadratic character (thus taking values in ${\{-1,+1\}}$ while being non-trivial), the principal series representation splits into the direct sum of two ${\frac{p+1}{2}}$-dimensional representations, which comes very close to matching the bound in Lemma 2. There is a parallel series of representations to the principal series (known as the discrete series) which is more complicated to describe (roughly speaking, one has to embed ${F_p}$ in a quadratic extension ${F_{p^2}}$ and then use a rotated version of the above construction, to change a split torus into a non-split torus), but can generate irreducible representations of dimension ${\frac{p-1}{2}}$, showing that the bound in Lemma 2 is in fact exactly sharp. 
These constructions can be generalised to arbitrary finite groups of Lie type using Deligne-Lusztig theory, but this is beyond the scope of this course (and of my own knowledge in the subject).

Exercise 9 Let ${p}$ be an odd prime. Show that for any ${n \geq p+2}$, the alternating group ${A_n}$ is ${p-1}$-quasirandom. (Hint: show that all cycles of order ${p}$ in ${A_n}$ are conjugate to each other in ${A_n}$ (and not just in ${S_n}$); in particular, a cycle is conjugate to its ${j^{th}}$ power for all ${j=1,\ldots,p-1}$. Also, as ${n \geq 5}$, ${A_n}$ is simple, and so the cycles of order ${p}$ generate the entire group.)

Remark 3 By using more precise information on the representations of the alternating group (using the theory of Specht modules and Young tableaux), one can show the slightly sharper statement that ${A_n}$ is ${n-1}$-quasirandom for ${n \geq 6}$ (but is only ${3}$-quasirandom for ${n=5}$ due to icosahedral symmetry, and ${1}$-quasirandom for ${n \leq 4}$ due to lack of perfectness). Using Exercise 3 with the index ${n}$ subgroup ${A_{n-1}}$, we see that the bound ${n-1}$ cannot be improved. Thus, ${A_n}$ (for large ${n}$) is not as quasirandom as the special linear groups ${SL_d(F_p)}$ (for ${p}$ large and ${d}$ bounded), because in the latter case the quasirandomness is as strong as a power of the size of the group, whereas in the former case it is only logarithmic in size.

If one replaces the alternating group ${A_n}$ with the slightly larger symmetric group ${S_n}$, then quasirandomness is destroyed (since ${S_n}$, having the abelian quotient ${S_n/A_n}$, is not perfect); indeed, ${S_n}$ is ${1}$-quasirandom and no better.

Remark 4 Thanks to the monumental achievement of the classification of finite simple groups, we know that apart from a finite number (26, to be precise) of sporadic exceptions, all finite simple groups (up to isomorphism) are either a cyclic group ${{\bf Z}/p{\bf Z}}$, an alternating group ${A_n}$, or a finite simple group of Lie type such as ${PSL_d(F_p)}$. (We will define the concept of a finite simple group of Lie type more precisely in later notes, but suffice it to say for now that such groups are constructed from reductive algebraic groups, for instance ${PSL_d(F_p)}$ is constructed from ${SL_d}$ in characteristic ${p}$.) In the case of finite simple groups ${G}$ of Lie type with bounded rank ${r=O(1)}$, it is known from the work of Landazuri and Seitz that such groups are ${\gg |G|^c}$-quasirandom for some ${c>0}$ depending only on the rank. On the other hand, by the previous remark, the large alternating groups do not have this property, and one can show that the finite simple groups of Lie type with large rank also do not have this property. Thus, we see using the classification that if a finite simple group ${G}$ is ${|G|^c}$-quasirandom for some ${c>0}$ and ${|G|}$ is sufficiently large depending on ${c}$, then ${G}$ is a finite simple group of Lie type with rank ${O_c(1)}$. It would be of interest to see if there is an alternate way to establish this fact that does not rely on the classification, as it may lead to an alternate approach to proving the classification (or perhaps a weakened version thereof).

A key reason why quasirandomness is desirable for the purposes of demonstrating expansion is that quasirandom groups happen to be rapidly mixing at large scales, as we shall see below the fold. As such, quasirandomness is an important tool for demonstrating expansion in Cayley graphs, though because expansion is a phenomenon that must hold at all scales, one needs to supplement quasirandomness with some additional input that creates mixing at small or medium scales also before one can deduce expansion. As an example of this technique of combining quasirandomness with mixing at small and medium scales, we present a proof (due to Sarnak-Xue, and simplified by Gamburd) of a weak version of the famous “3/16 theorem” of Selberg on the least non-trivial eigenvalue of the Laplacian on a modular curve, which among other things can be used to construct a family of expander Cayley graphs in ${SL_2({\bf Z}/N{\bf Z})}$ (compare this with the property (T)-based methods in the previous notes, which could construct expander Cayley graphs in ${SL_d({\bf Z}/N{\bf Z})}$ for any fixed ${d \geq 3}$).