
The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function ${\mu}$ is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

• (Twin prime conjecture) There are infinitely many primes ${p}$ such that ${p+2}$ is also prime.
• (No Siegel zeroes) There exists a constant ${c>0}$ such that for every real Dirichlet character ${\chi}$ of conductor ${q > 1}$, the associated Dirichlet ${L}$-function ${s \mapsto L(s,\chi)}$ has no zeroes in the interval ${[1-\frac{c}{\log q}, 1]}$.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory, which should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

$\displaystyle \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)$

for some large value of ${x}$, where ${\Lambda}$ is the von Mangoldt function. Actually, in this post we will work with the slight variant

$\displaystyle \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))$

where

$\displaystyle \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}$

is the second von Mangoldt function, and ${*}$ denotes Dirichlet convolution, and ${\nu}$ is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval ${x \leq n \leq 2x}$ and remove very small primes from ${n}$, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve ${\nu}$ is essentially replaced by the more combinatorial restriction ${1_{(n(n+2),q^{1/C}\#)=1}}$ for some large ${C}$, where ${q^{1/C}\#}$ is the primorial of ${q^{1/C}}$, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
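As an incidental numerical check (brute-force Python with helper names of my own choosing; this plays no role in Heath-Brown's argument), one can verify from the convolution formula that ${\Lambda_2}$ is supported on numbers with at most two distinct prime factors, which is what makes it a natural weight for twin-type problems:

```python
from math import log

def mobius(n):
    # Moebius function via trial division
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # n is not squarefree
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def lambda2(n):
    # second von Mangoldt function (mu * L^2)(n) = sum_{d|n} mu(d) log^2(n/d)
    return sum(mobius(d) * log(n / d) ** 2
               for d in range(1, n + 1) if n % d == 0)
```

For instance, ${\Lambda_2(30) = 0}$ since ${30}$ has three distinct prime factors, while ${\Lambda_2(15) = 2 \log 3 \log 5 > 0}$ and ${\Lambda_2(p) = \log^2 p}$ for prime ${p}$.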

If there is a Siegel zero ${L(\beta,\chi)=0}$ with ${\beta}$ close to ${1}$ and ${\chi}$ a Dirichlet character of conductor ${q}$, then multiplicative number theory methods can be used to show that the Möbius function ${\mu}$ “pretends” to be like the character ${\chi}$ in the sense that ${\mu(p) \approx \chi(p)}$ for “most” primes ${p}$ near ${q}$ (e.g. in the range ${q^\varepsilon \leq p \leq q^C}$ for some small ${\varepsilon>0}$ and large ${C>0}$). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that ${\mu}$ pretends to be like ${\chi}$ can be used to construct a tractable approximation (after inserting the sieve weight ${\nu}$) in the range ${[x,2x]}$ (where ${x = q^C}$ for some large ${C}$) for the second von Mangoldt function ${\Lambda_2}$, namely the function

$\displaystyle \tilde \Lambda_2(n) := (\chi * L^2)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.$

Roughly speaking, we think of the periodic function ${\chi}$ and the slowly varying function ${\log^2}$ as being of about the same “complexity” as the constant function ${1}$, so that ${\tilde \Lambda_2}$ is roughly of the same “complexity” as the divisor function

$\displaystyle \tau(n) := (1*1)(n) = \sum_{d|n} 1,$

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate ${\sum_{x \leq n \leq 2x} \tau(n)}$ to accuracy ${O(\sqrt{x})}$ with little difficulty, whereas to obtain a comparable level of accuracy for ${\sum_{x \leq n \leq 2x} \Lambda(n)}$ or ${\sum_{x \leq n \leq 2x} \Lambda_2(n)}$ is essentially the Riemann hypothesis.)
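To illustrate the claimed accuracy, the hyperbola-method computation can be checked numerically in a few lines of Python (a sketch; the tolerance in the final comment is chosen loosely):

```python
from math import log, isqrt

def divisor_sum(x):
    # sum_{n <= x} tau(n) by the Dirichlet hyperbola method:
    # 2 * sum_{d <= sqrt(x)} floor(x/d) - floor(sqrt(x))^2
    s = isqrt(x)
    return 2 * sum(x // d for d in range(1, s + 1)) - s * s

EULER_GAMMA = 0.5772156649015329

x = 10 ** 6
exact = divisor_sum(x)
main_term = x * log(x) + (2 * EULER_GAMMA - 1) * x
# the discrepancy exact - main_term should be of size O(sqrt(x)) = O(1000)
```

Note that `divisor_sum` takes only ${O(\sqrt{x})}$ operations, which is itself a manifestation of the hyperbola method.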

One expects ${\tilde \Lambda_2(n)}$ to be a good approximant to ${\Lambda_2(n)}$ if ${n}$ is of size ${O(x)}$ and has no prime factors less than ${q^{1/C}}$ for some large constant ${C}$. The Selberg sieve ${\nu}$ will be mostly supported on numbers with no prime factor less than ${q^{1/C}}$. As such, one can hope to approximate (1) by the expression

$\displaystyle \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)$

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

$\displaystyle \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).$

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

$\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)$

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation ${ab+2=cd}$ with ${a,b,c,d}$ in various ranges; this is clearly related to understanding the equidistribution of the hyperbola ${\{ (a,b) \in {\bf Z}/d{\bf Z}: ab + 2 = 0 \hbox{ mod } d \}}$ in ${({\bf Z}/d{\bf Z})^2}$. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

$\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )$

where ${\overline{m}}$ denotes the inverse of ${m}$ in ${({\bf Z}/r{\bf Z})^\times}$. One can then use the Weil bound

$\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)$

where ${(a,b,r)}$ is the greatest common divisor of ${a,b,r}$ (with the convention that this is equal to ${r}$ if ${a,b}$ vanish), and the ${o(1)}$ decays to zero as ${r \rightarrow \infty}$. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that ${x=q^C}$ with ${C}$ large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of ${r}$ will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

$\displaystyle \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)$

whenever ${r \geq 1}$ and ${a,b}$ are coprime to ${r}$, where the ${o(1)}$ is with respect to the limit ${r \rightarrow \infty}$ (and is uniform in ${a,b}$).

Proof: Observe from change of variables that the Kloosterman sum ${\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )}$ is unchanged if one replaces ${(a,b)}$ with ${(\lambda a, \lambda^{-1} b)}$ for ${\lambda \in ({\bf Z}/r{\bf Z})^\times}$. For fixed ${a,b}$, the number of such pairs ${(\lambda a, \lambda^{-1} b)}$ is at least ${r^{1-o(1)} / (a,b,r)}$, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

$\displaystyle \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll r^{4+o(1)}.$

The left-hand side can be rearranged as

$\displaystyle \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/r{\bf Z}}$

$\displaystyle e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)$

which by Fourier summation is equal to

$\displaystyle r^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:$

$\displaystyle m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.$

Observe from the quadratic formula and the divisor bound that each pair ${(x,y)\in ({\bf Z}/r{\bf Z})^2}$ has at most ${O(r^{o(1)})}$ solutions ${(m_1,m_2)}$ to the system of equations ${m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}$. Hence the number of quadruples ${(m_1,m_2,m_3,m_4)}$ of the desired form is ${r^{2+o(1)}}$, and the claim follows. $\Box$
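Both the Kloosterman bound (5) and the Weil bound (4) are easy to probe numerically for small prime moduli, simply by computing the sums by brute force. Here is a Python sketch (quadratic time in ${r}$, purely for illustration; for prime ${r=p}$ and ${(ab,p)=1}$ the Weil bound takes the sharp form ${|S(a,b;p)| \leq 2\sqrt{p}}$):

```python
from cmath import exp, pi
from math import gcd

def kloosterman(a, b, r):
    # Kloosterman sum S(a,b;r) = sum_{m in (Z/rZ)^x} e((a*m + b*m^{-1})/r)
    total = 0j
    for m in range(1, r):
        if gcd(m, r) == 1:
            minv = pow(m, -1, r)  # modular inverse (Python 3.8+)
            total += exp(2j * pi * (a * m + b * minv) / r)
    return total

r = 101  # a prime modulus
# largest |S(a,1;r)| over a; the substitution m -> -m shows S is real
worst = max(abs(kloosterman(a, 1, r)) for a in range(1, r))
```

For ${r = 101}$ one finds `worst` well below ${2\sqrt{101} \approx 20.1}$, comfortably inside the ${r^{3/4+o(1)}}$ range of Lemma 2.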

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let ${\chi}$ be a primitive real Dirichlet character of conductor ${q}$, and let ${a,b,c,d \in{\bf Z}/q{\bf Z}}$. Then

$\displaystyle \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).$

Proof: As ${q}$ is the conductor of a primitive real Dirichlet character, ${q}$ is equal to ${2^j}$ times a squarefree odd number for some ${j \leq 3}$. By the Chinese remainder theorem, it thus suffices to establish the claim when ${q}$ is an odd prime ${p}$. We may assume that ${ad-bc}$ is not divisible by ${p}$, as the claim is trivial otherwise. If ${a}$ vanishes then ${c}$ does not vanish, and the claim follows from the mean zero nature of ${\chi}$; similarly if ${c}$ vanishes. Hence we may assume that ${a,c}$ do not vanish, and then we can normalise them to equal ${1}$. By completing the square it now suffices to show that

$\displaystyle \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1$

whenever ${b \neq 0 \hbox{ mod } p}$. As ${\chi}$ is ${+1}$ on the quadratic residues and ${-1}$ on the non-residues, it now suffices to show that

$\displaystyle \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).$

But by making the change of variables ${(x,y) = (n+m,n-m)}$, the left-hand side becomes ${\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}$, and the claim follows. $\Box$
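For the Legendre symbol one can check numerically that the sum appearing in the above proof is in fact exactly ${-1}$ whenever ${b \neq 0 \hbox{ mod } p}$ (consistent with, and slightly stronger than, the ${O(1)}$ bound). A Python sketch:

```python
def legendre(a, p):
    # Legendre symbol (a|p) for an odd prime p, via Euler's criterion
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def quadratic_char_sum(b, p):
    # sum over n mod p of chi(n^2 - b), chi the quadratic character mod p
    return sum(legendre(n * n - b, p) for n in range(p))
```

The value ${-1}$ also drops out of the change of variables in the proof: the hyperbola ${xy = b}$ has exactly ${p-1}$ points, one fewer than ${p}$.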

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which did not need to be non-negative, so I dispensed with the squaring step of the Selberg sieve to simplify the calculations a little further; also, for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Some modest reductions in complexity can also be obtained by using the second von Mangoldt function ${\Lambda_2(n(n+2))}$ in place of ${\Lambda(n) \Lambda(n+2)}$. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

We now move away from the world of multiplicative prime number theory covered in Notes 1 and Notes 2, and enter the wider, and complementary, world of non-multiplicative prime number theory, in which one studies statistics related to non-multiplicative patterns, such as twins ${n,n+2}$. This creates a major jump in difficulty; for instance, even the most basic multiplicative result about the primes, namely Euclid’s theorem that there are infinitely many of them, remains unproven for twin primes. Of course, the situation is even worse for stronger results, such as Euler’s theorem, Dirichlet’s theorem, or the prime number theorem. Finally, even many multiplicative questions about the primes remain open. The most famous of these is the Riemann hypothesis, which gives the asymptotic ${\sum_{n \leq x} \Lambda(n) = x + O( \sqrt{x} \log^2 x )}$ (see Proposition 24 from Notes 2). But even if one assumes the Riemann hypothesis, the precise distribution of the error term ${O( \sqrt{x} \log^2 x )}$ in the above asymptotic (or in related asymptotics, such as for the sum ${\sum_{x \leq n < x+y} \Lambda(n)}$ that measures the distribution of primes in short intervals) is not entirely clear.

Despite this, we do have a number of extremely convincing and well supported models for the primes (and related objects) that let us predict what the answer to many prime number theory questions (both multiplicative and non-multiplicative) should be, particularly in asymptotic regimes where one can work with aggregate statistics about the primes, rather than with a small number of individual primes. These models are based on taking some statistical distribution related to the primes (e.g. the primality properties of a randomly selected ${k}$-tuple), and replacing that distribution by a model distribution that is easy to compute with (e.g. a distribution with strong joint independence properties). One can then predict the asymptotic value of various (normalised) statistics about the primes by replacing the relevant statistical distributions of the primes with their simplified models. In this non-rigorous setting, many difficult conjectures on the primes reduce to relatively simple calculations; for instance, all four of the (still unsolved) Landau problems may now be justified in the affirmative by one or more of these models. Indeed, the models are so effective at this task that analytic number theory is in the curious position of being able to confidently predict the answer to a large proportion of the open problems in the subject, whilst not possessing a clear way forward to rigorously confirm these answers!

As it turns out, the models for primes that have turned out to be the most accurate in practice are random models, which involve (either explicitly or implicitly) one or more random variables. This is despite the prime numbers being obviously deterministic in nature; no coins are flipped or dice rolled to create the set of primes. The point is that while the primes have a lot of obvious multiplicative structure (for instance, the product of two primes is never another prime), they do not appear to exhibit much discernible non-multiplicative structure asymptotically, in the sense that they rarely exhibit statistical anomalies in the asymptotic limit that cannot be easily explained in terms of the multiplicative properties of the primes. As such, when considering non-multiplicative statistics of the primes, the primes appear to behave pseudorandomly, and can thus be modeled with reasonable accuracy by a random model. And even for multiplicative problems, which are in principle controlled by the zeroes of the Riemann zeta function, one can obtain good predictions by positing various pseudorandomness properties of these zeroes, so that the distribution of these zeroes can be modeled by a random model.

Of course, one cannot expect perfect accuracy when replicating a deterministic set such as the primes by a probabilistic model of that set, and each of the heuristic models we discuss below have some limitations to the range of statistics about the primes that they can expect to track with reasonable accuracy. For instance, many of the models about the primes do not fully take into account the multiplicative structure of primes, such as the connection with a zeta function with a meromorphic continuation to the entire complex plane; at the opposite extreme, we have the GUE hypothesis which appears to accurately model the zeta function, but does not capture such basic properties of the primes as the fact that the primes are all natural numbers. Nevertheless, each of the models described below, when deployed within their sphere of reasonable application, has (possibly after some fine-tuning) given predictions that are in remarkable agreement with numerical computation and with known rigorous theoretical results, as well as with other models in overlapping spheres of application; they are also broadly compatible with the general heuristic (discussed in this previous post) that in the absence of any exploitable structure, asymptotic statistics should default to the most “uniform”, “pseudorandom”, or “independent” distribution allowable.

As hinted at above, we do not have a single unified model for the prime numbers (other than the primes themselves, of course), but instead have an overlapping family of useful models that each appear to accurately describe some, but not all, aspects of the prime numbers. In this set of notes, we will discuss four such models:

1. The Cramér random model and its refinements, which model the set ${{\mathcal P}}$ of prime numbers by a random set.
2. The Möbius pseudorandomness principle, which predicts that the Möbius function ${\mu}$ does not correlate with any genuinely different arithmetic sequence of reasonable “complexity”.
3. The equidistribution of residues principle, which predicts that the residue classes of a large number ${n}$ modulo a small or medium-sized prime ${p}$ behave as if they are independently and uniformly distributed as ${p}$ varies.
4. The GUE hypothesis, which asserts that the zeroes of the Riemann zeta function are distributed (at microscopic and mesoscopic scales) like the zeroes of a GUE random matrix, and which generalises the pair correlation conjecture regarding pairs of such zeroes.

This is not an exhaustive list of models for the primes and related objects; for instance, there is also the model in which the major arc contribution in the Hardy-Littlewood circle method is predicted to always dominate, and with regards to various finite groups of number-theoretic importance, such as the class groups discussed in Supplement 1, there are also heuristics of Cohen-Lenstra type. Historically, the first heuristic discussion of the primes along these lines was by Sylvester, who worked informally with a model somewhat related to the equidistribution of residues principle. However, we will not discuss any of these models here.

A word of warning: the discussion of the above four models will inevitably be largely informal, and “fuzzy” in nature. While one can certainly make precise formalisations of at least some aspects of these models, one should not be inflexibly wedded to a specific such formalisation as being “the” correct way to pin down the model rigorously. (To quote the statistician George Box: “all models are wrong, but some are useful”.) Indeed, we will see some examples below the fold in which some finer structure in the prime numbers leads to a correction term being added to a “naive” implementation of one of the above models to make it more accurate, and it is perfectly conceivable that some further such fine-tuning will be applied to one or more of these models in the future. These sorts of mathematical models are in some ways closer in nature to the scientific theories used to model the physical world, than they are to the axiomatic theories one is accustomed to in rigorous mathematics, and one should approach the discussion below accordingly. In particular, and in contrast to the other notes in this course, the material here is not directly used for proving further theorems, which is why we have marked it as “optional” material. Nevertheless, the heuristics and models here are still used indirectly for such purposes, for instance by

• giving a clearer indication of what results one expects to be true, thus guiding one to fruitful conjectures;
• providing a quick way to scan for possible errors in a mathematical claim (e.g. by finding that the main term is off from what a model predicts, or an error term is too small);
• gauging the relative strength of various assertions (e.g. classifying some results as “unsurprising”, others as “potential breakthroughs” or “powerful new estimates”, others as “unexpected new phenomena”, and yet others as “way too good to be true”); or
• setting up heuristic barriers (such as the parity barrier) that one has to resolve before resolving certain key problems (e.g. the twin prime conjecture).

See also my previous essay on the distinction between “rigorous” and “post-rigorous” mathematics, or Thurston’s essay discussing, among other things, the “definition-theorem-proof” model of mathematics and its limitations.

Remark 1 The material in this set of notes presumes some prior exposure to probability theory. See for instance this previous post for a quick review of the relevant concepts.

Define a partition of ${1}$ to be a finite or infinite multiset ${\Sigma}$ of real numbers in the interval ${I := (0,1]}$ (that is, an unordered set of real numbers in ${I}$, possibly with multiplicity) whose total sum is ${1}$: ${\sum_{t \in \Sigma}t = 1}$. For instance, ${\{1/2,1/4,1/8,1/16,\ldots\}}$ is a partition of ${1}$. Such partitions arise naturally when trying to decompose a large object into smaller ones, for instance:

1. (Prime factorisation) Given a natural number ${n}$, one can decompose it into prime factors ${n = p_1 \ldots p_k}$ (counting multiplicity), and then the multiset

$\displaystyle \Sigma_{PF}(n) := \{ \frac{\log p_1}{\log n}, \ldots,\frac{\log p_k}{\log n} \}$

is a partition of ${1}$.

2. (Cycle decomposition) Given a permutation ${\sigma \in S_n}$ on ${n}$ labels ${\{1,\ldots,n\}}$, one can decompose ${\sigma}$ into cycles ${C_1,\ldots,C_k}$, and then the multiset

$\displaystyle \Sigma_{CD}(\sigma) := \{ \frac{|C_1|}{n}, \ldots, \frac{|C_k|}{n} \}$

is a partition of ${1}$.

3. (Normalisation) Given a multiset ${\Gamma}$ of positive real numbers whose sum ${S := \sum_{x\in \Gamma}x}$ is finite and non-zero, the multiset

$\displaystyle \Sigma_N( \Gamma) := \frac{1}{S} \cdot \Gamma = \{ \frac{x}{S}: x \in \Gamma \}$

is a partition of ${1}$.
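Each of these constructions is elementary to implement; as an illustration, here is a brute-force Python sketch of the prime factorisation example (the helper names are mine):

```python
from math import log

def prime_factors(n):
    # prime factorisation of n with multiplicity, by trial division
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def sigma_pf(n):
    # the multiset { log p_i / log n }: a partition of 1
    return [log(p) / log(n) for p in prime_factors(n)]
```

For example, `sigma_pf(12)` returns the three parts ${\frac{\log 2}{\log 12}, \frac{\log 2}{\log 12}, \frac{\log 3}{\log 12}}$, which sum to ${1}$.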

In the spirit of the universality phenomenon, one can ask what a “typical” partition should look like; thus one seeks a natural probability distribution on the space of all partitions, analogous to (say) the gaussian distributions on the real line, or GUE distributions on point processes on the line, and so forth. It turns out that there is one natural such distribution which is related to all three examples above, known as the Poisson-Dirichlet distribution. To describe this distribution, we first have to deal with the problem that it is not immediately obvious how to cleanly parameterise the space of partitions, given that the cardinality of the partition can be finite or infinite, that multiplicity is allowed, and that we would like to identify two partitions that are permutations of each other.

One way to proceed is to view a random partition ${\Sigma}$ as a type of point process on the interval ${I}$, with the constraint that ${\sum_{x \in \Sigma} x = 1}$, in which case one can study statistics such as the counting functions

$\displaystyle N_{[a,b]} := |\Sigma \cap [a,b]| = \sum_{x \in\Sigma} 1_{[a,b]}(x)$

(where the cardinality here counts multiplicity). This can certainly be done, although in the case of the Poisson-Dirichlet process, the formulae for the joint distribution of such counting functions are moderately complicated. Another way to proceed is to order the elements of ${\Sigma}$ in decreasing order

$\displaystyle t_1 \geq t_2 \geq t_3 \geq \ldots \geq 0,$

with the convention that one pads the sequence ${t_n}$ by an infinite number of zeroes if ${\Sigma}$ is finite; this identifies the space of partitions with an infinite dimensional simplex

$\displaystyle \{ (t_1,t_2,\ldots) \in [0,1]^{\bf N}: t_1 \geq t_2 \geq \ldots; \sum_{n=1}^\infty t_n = 1 \}.$

However, it turns out that the process of ordering the elements is not “smooth” (basically because functions such as ${(x,y) \mapsto \max(x,y)}$ and ${(x,y) \mapsto \min(x,y)}$ are not smooth) and the formulae for the joint distribution in the case of the Poisson-Dirichlet process are again complicated.

It turns out that there is a better (or at least “smoother”) way to enumerate the elements of a partition ${\Sigma}$ than the ordered method, although it is random rather than deterministic. This procedure (which I learned from this paper of Donnelly and Grimmett) works as follows.

1. Given a partition ${\Sigma}$, let ${u_1}$ be an element of ${\Sigma}$ chosen at random, with each element ${t\in \Sigma}$ having a probability ${t}$ of being chosen as ${u_1}$ (so if ${t \in \Sigma}$ occurs with multiplicity ${m}$, the net probability that ${t}$ is chosen as ${u_1}$ is actually ${mt}$). Note that this is well-defined since the elements of ${\Sigma}$ sum to ${1}$.
2. Now suppose ${u_1}$ is chosen. If ${\Sigma \backslash \{u_1\}}$ is empty, we set ${u_2,u_3,\ldots}$ all equal to zero and stop. Otherwise, let ${u_2}$ be an element of ${\frac{1}{1-u_1} \cdot (\Sigma \backslash \{u_1\})}$ chosen at random, with each element ${t \in \frac{1}{1-u_1} \cdot (\Sigma \backslash \{u_1\})}$ having a probability ${t}$ of being chosen as ${u_2}$. (For instance, if ${u_1}$ occurred with some multiplicity ${m>1}$ in ${\Sigma}$, then ${u_2}$ can equal ${\frac{u_1}{1-u_1}}$ with probability ${(m-1)u_1/(1-u_1)}$.)
3. Now suppose ${u_1,u_2}$ are both chosen. If ${\Sigma \backslash \{u_1,u_2\}}$ is empty, we set ${u_3, u_4, \ldots}$ all equal to zero and stop. Otherwise, let ${u_3}$ be an element of ${\frac{1}{1-u_1-u_2} \cdot (\Sigma\backslash \{u_1,u_2\})}$ chosen at random, with each element ${t \in \frac{1}{1-u_1-u_2} \cdot (\Sigma\backslash \{u_1,u_2\})}$ having a probability ${t}$ of being chosen as ${u_3}$.
4. We continue this process indefinitely to create elements ${u_1,u_2,u_3,\ldots \in [0,1]}$.

We denote the random sequence ${Enum(\Sigma) := (u_1,u_2,\ldots) \in [0,1]^{\bf N}}$ formed from a partition ${\Sigma}$ in the above manner as the random normalised enumeration of ${\Sigma}$; this is a random variable in the infinite unit cube ${[0,1]^{\bf N}}$, and can be defined recursively by the formula

$\displaystyle Enum(\Sigma) = (u_1, Enum(\frac{1}{1-u_1} \cdot (\Sigma\backslash \{u_1\})))$

with ${u_1}$ drawn randomly from ${\Sigma}$, with each element ${t \in \Sigma}$ chosen with probability ${t}$, except when ${\Sigma =\{1\}}$ in which case we instead have

$\displaystyle Enum(\{1\}) = (1, 0,0,\ldots).$

Note that one can recover ${\Sigma}$ from any of its random normalised enumerations ${Enum(\Sigma) := (u_1,u_2,\ldots)}$ by the formula

$\displaystyle \Sigma = \{ u_1, (1-u_1) u_2,(1-u_1)(1-u_2)u_3,\ldots\} \ \ \ \ \ (1)$

with the convention that one discards any zero elements on the right-hand side. Thus ${Enum}$ can be viewed as a (stochastic) parameterisation of the space of partitions by the unit cube ${[0,1]^{\bf N}}$, which is a simpler domain to work with than the infinite-dimensional simplex mentioned earlier.
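For finite partitions, the above enumeration procedure and the inversion formula (1) fit in a few lines of Python. This is an illustrative sketch (the function names are mine, and floating-point arithmetic stands in for exact reals):

```python
import random

def enum(sigma, rng=random):
    # random normalised enumeration of a finite partition sigma
    # (a list of positive parts summing to 1): repeatedly make a
    # size-biased pick, record its rescaled size, and remove it
    sigma = list(sigma)
    u, remaining = [], 1.0
    while sigma:
        t = rng.choices(sigma, weights=sigma)[0]  # P(pick t) = t / remaining
        u.append(t / remaining)
        remaining -= t
        sigma.remove(t)
    return u

def recover(u):
    # invert via Sigma = { u1, (1-u1) u2, (1-u1)(1-u2) u3, ... }
    parts, scale = [], 1.0
    for x in u:
        if x > 0:
            parts.append(scale * x)
            scale *= 1 - x
    return parts
```

Running `recover(enum(sigma))` on any finite partition `sigma` returns its parts (in a random order), illustrating that the enumeration loses no information.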

Note that this random enumeration procedure can also be adapted to the three models described earlier:

1. Given a natural number ${n}$, one can randomly enumerate its prime factors ${n =p'_1 p'_2 \ldots p'_k}$ by letting each prime factor ${p}$ of ${n}$ be equal to ${p'_1}$ with probability ${\frac{\log p}{\log n}}$, then once ${p'_1}$ is chosen, let each remaining prime factor ${p}$ of ${n/p'_1}$ be equal to ${p'_2}$ with probability ${\frac{\log p}{\log n/p'_1}}$, and so forth.
2. Given a permutation ${\sigma\in S_n}$, one can randomly enumerate its cycles ${C'_1,\ldots,C'_k}$ by letting each cycle ${C}$ in ${\sigma}$ be equal to ${C'_1}$ with probability ${\frac{|C|}{n}}$, and once ${C'_1}$ is chosen, let each remaining cycle ${C}$ be equal to ${C'_2}$ with probability ${\frac{|C|}{n-|C'_1|}}$, and so forth. Alternatively, one can traverse the elements of ${\{1,\ldots,n\}}$ in random order, then let ${C'_1}$ be the first cycle one encounters when performing this traversal, let ${C'_2}$ be the next cycle (not equal to ${C'_1}$) one encounters when performing this traversal, and so forth.
3. Given a multiset ${\Gamma}$ of positive real numbers whose sum ${S := \sum_{x\in\Gamma} x}$ is finite and non-zero, we can randomly enumerate the elements ${x'_1,x'_2,\ldots}$ of this multiset by letting each ${x \in \Gamma}$ have a ${\frac{x}{S}}$ probability of being set equal to ${x'_1}$, and then once ${x'_1}$ is chosen, let each remaining ${x \in \Gamma\backslash \{x'_1\}}$ have a ${\frac{x}{S-x'_1}}$ probability of being set equal to ${x'_2}$, and so forth.

We then have the following result:

Proposition 1 (Existence of the Poisson-Dirichlet process) There exists a random partition ${\Sigma}$ whose random enumeration ${Enum(\Sigma) = (u_1,u_2,\ldots)}$ has the uniform distribution on ${[0,1]^{\bf N}}$, thus ${u_1,u_2,\ldots}$ are independently and identically distributed copies of the uniform distribution on ${[0,1]}$.

A random partition ${\Sigma}$ with this property will be called the Poisson-Dirichlet process. This process, first introduced by Kingman, can be described explicitly using (1) as

$\displaystyle \Sigma = \{ u_1, (1-u_1) u_2,(1-u_1)(1-u_2)u_3,\ldots\},$

where ${u_1,u_2,\ldots}$ are iid copies of the uniform distribution on ${[0,1]}$, although it is not immediately obvious from this definition that ${Enum(\Sigma)}$ is indeed uniformly distributed on ${[0,1]^{\bf N}}$. We prove this proposition below the fold.
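The explicit description above makes the process trivial to sample: it is a stick-breaking loop with iid uniform fractions. Here is a Python sketch (the truncation tolerance is an artefact of working with finite lists, purely for illustration):

```python
import random

def poisson_dirichlet_sample(rng, tol=1e-12):
    # stick-breaking with iid uniform fractions:
    # parts u1, (1-u1)u2, (1-u1)(1-u2)u3, ... (truncated below tol)
    parts, remaining = [], 1.0
    while remaining > tol:
        u = rng.random()
        parts.append(remaining * u)
        remaining *= 1.0 - u
    return parts

rng = random.Random(1)
sigma = poisson_dirichlet_sample(rng)  # one (truncated) sample partition
```

Note that the first part is a size-biased pick from the partition, and is uniformly distributed with mean ${1/2}$, in accordance with Proposition 1.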

An equivalent definition of a Poisson-Dirichlet process is a random partition ${\Sigma}$ with the property that

$\displaystyle (u_1, \frac{1}{1-u_1} \cdot (\Sigma \backslash \{u_1\})) \equiv (U, \Sigma) \ \ \ \ \ (2)$

where ${u_1}$ is a random element of ${\Sigma}$ with each ${t \in\Sigma}$ having a probability ${t}$ of being equal to ${u_1}$, ${U}$ is a uniform variable on ${[0,1]}$ that is independent of ${\Sigma}$, and ${\equiv}$ denotes equality of distribution. This can be viewed as a sort of stochastic self-similarity property of ${\Sigma}$: if one randomly removes one element from ${\Sigma}$ and rescales, one gets a new copy of ${\Sigma}$.

It turns out that each of the three ways to generate partitions listed above can lead to the Poisson-Dirichlet process, either directly or in a suitable limit. We begin with the third way, namely by normalising a Poisson process to have sum ${1}$:

Proposition 2 (Poisson-Dirichlet processes via Poisson processes) Let ${a>0}$, and let ${\Gamma_a}$ be a Poisson process on ${(0,+\infty)}$ with intensity function ${t \mapsto \frac{1}{t} e^{-at}}$. Then the sum ${S :=\sum_{x \in \Gamma_a} x}$ is almost surely finite, and the normalisation ${\Sigma_N(\Gamma_a) = \frac{1}{S} \cdot \Gamma_a}$ is a Poisson-Dirichlet process.

Again, we prove this proposition below the fold. Now we turn to the second way (a topic, incidentally, that was briefly touched upon in this previous blog post):

Proposition 3 (Large cycles of a typical permutation) For each natural number ${n}$, let ${\sigma}$ be a permutation drawn uniformly at random from ${S_n}$. Then the random partition ${\Sigma_{CD}(\sigma)}$ converges in the limit ${n \rightarrow\infty}$ to a Poisson-Dirichlet process ${\Sigma}$ in the following sense: given any fixed sequence of intervals ${[a_1,b_1],\ldots,[a_k,b_k] \subset I}$ (independent of ${n}$), the joint discrete random variable ${(N_{[a_1,b_1]}(\Sigma_{CD}(\sigma)),\ldots,N_{[a_k,b_k]}(\Sigma_{CD}(\sigma)))}$ converges in distribution to ${(N_{[a_1,b_1]}(\Sigma),\ldots,N_{[a_k,b_k]}(\Sigma))}$.
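Proposition 3 is easy to test by Monte Carlo. The following Python sketch compares the mean largest-cycle fraction of a random permutation against the Poisson-Dirichlet prediction, namely the Golomb-Dickman constant ${\lambda \approx 0.6243}$ (this constant is not stated in the proposition; I am quoting the known value of the expected largest part):

```python
import random

def cycle_lengths(perm):
    # cycle type of a permutation given as a list with perm[i] = image of i
    seen, lengths = [False] * len(perm), []
    for i in range(len(perm)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

rng = random.Random(2)
n, trials = 1000, 1000
fracs = []
for _ in range(trials):
    perm = list(range(n))
    rng.shuffle(perm)
    fracs.append(max(cycle_lengths(perm)) / n)
mean_largest = sum(fracs) / trials
# Poisson-Dirichlet prediction: E[largest part] ~ 0.6243 (Golomb-Dickman)
```

With these parameters the empirical mean lands within a couple of standard errors of ${0.6243}$.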

Finally, we turn to the first way:

Proposition 4 (Large prime factors of a typical number) Let ${x > 0}$, and let ${N_x}$ be a random natural number chosen according to one of the following three rules:

1. (Uniform distribution) ${N_x}$ is drawn uniformly at random from the natural numbers in ${[1,x]}$.
2. (Shifted uniform distribution) ${N_x}$ is drawn uniformly at random from the natural numbers in ${[x,2x]}$.
3. (Zeta distribution) Each natural number ${n}$ has a probability ${\frac{1}{\zeta(s)}\frac{1}{n^s}}$ of being equal to ${N_x}$, where ${s := 1 +\frac{1}{\log x}}$ and ${\zeta(s):=\sum_{n=1}^\infty \frac{1}{n^s}}$.

Then ${\Sigma_{PF}(N_x)}$ converges as ${x \rightarrow \infty}$ to a Poisson-Dirichlet process ${\Sigma}$ in the same fashion as in Proposition 3.

The process ${\Sigma_{PF}(N_x)}$ was first studied by Billingsley (and also later by Knuth-Trabb Pardo and by Vershik), but the formulae were initially rather complicated; the proposition above is due to Donnelly and Grimmett, although the third case of the proposition is substantially easier and appears in the earlier work of Lloyd. We prove the proposition below the fold.
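The analogous numerical check on the prime factor side (again, a sketch not taken from the post): the fraction of ${n \leq N}$ whose largest prime factor exceeds ${\sqrt{N}}$ should approach ${\log 2}$ as ${N \rightarrow \infty}$, though convergence is slow.

```python
def largest_prime_factor_table(N):
    # lpf[n] = largest prime factor of n (0 for n < 2), via a simple sieve:
    # each prime p overwrites lpf[m] for all its multiples m, so the last
    # (largest) prime divisor of m wins.
    lpf = [0] * (N + 1)
    for p in range(2, N + 1):
        if lpf[p] == 0:  # p is prime
            for m in range(p, N + 1, p):
                lpf[m] = p
    return lpf

N = 10**5
lpf = largest_prime_factor_table(N)
threshold = int(N**0.5)
# Proposition 4 predicts this fraction tends to 1 - rho(2) = log 2 ~ 0.693
# (at N = 10^5 the observed value is somewhat lower, reflecting the slow
# O(1/log N) convergence).
frac = sum(1 for n in range(2, N + 1) if lpf[n] > threshold) / N
```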

The previous two propositions suggest an interesting analogy between large random integers and large random permutations; see this ICM article of Vershik and this non-technical article of Granville (which, incidentally, was once adapted into a play) for further discussion.

As a sample application, consider the problem of estimating the number ${\pi(x,x^{1/u})}$ of integers up to ${x}$ which are not divisible by any prime larger than ${x^{1/u}}$ (i.e. they are ${x^{1/u}}$-smooth), where ${u>0}$ is a fixed real number. This is essentially (modulo some inessential technicalities concerning the distinction between the intervals ${[x,2x]}$ and ${[1,x]}$) the probability that ${\Sigma_{PF}(N_x)}$ avoids ${[1/u,1]}$, which by the above proposition converges to the probability ${\rho(u)}$ that ${\Sigma}$ avoids ${[1/u,1]}$. Below the fold we will show that this function is given by the Dickman function, defined by setting ${\rho(u)=1}$ for ${u \leq 1}$ and ${u\rho'(u) = -\rho(u-1)}$ for ${u > 1}$, thus recovering the classical result of Dickman that ${\pi(x,x^{1/u}) = (\rho(u)+o(1))x}$.
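The Dickman function is easy to tabulate numerically from the standard form of its delay equation, ${u\rho'(u) = -\rho(u-1)}$ (note the minus sign), together with the initial condition ${\rho(u)=1}$ for ${u \leq 1}$; a minimal sketch using the trapezoidal rule:

```python
from math import log

def dickman_table(u_max, h=1e-4):
    # Tabulate rho on a grid of spacing h, using rho(u) = 1 for u <= 1 and
    # integrating u rho'(u) = -rho(u-1), i.e. rho'(u) = -rho(u-1)/u,
    # by the trapezoidal rule; rho(u-1) is read off earlier table entries.
    k = round(1 / h)            # grid points per unit interval
    n = round(u_max / h)
    rho = [1.0] * (n + 1)
    for i in range(k + 1, n + 1):
        f_prev = rho[i - 1 - k] / ((i - 1) * h)
        f_curr = rho[i - k] / (i * h)
        rho[i] = rho[i - 1] - (h / 2) * (f_prev + f_curr)
    return rho

rho = dickman_table(3.0)
# On [1,2] the equation integrates in closed form: rho(u) = 1 - log u,
# so rho(2) = 1 - log 2 ~ 0.3069; the known value of rho(3) is ~0.04861.
rho2, rho3 = rho[20000], rho[30000]
```

In particular ${\rho(2) = 1 - \log 2}$, so the density of ${\sqrt{x}}$-smooth numbers is asymptotically ${1 - \log 2 \approx 0.307}$, consistent with the permutation and prime factor computations above.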

I thank Andrew Granville and Anatoly Vershik for showing me the nice link between prime factors and the Poisson-Dirichlet process. The material here is standard, and (like many of the other notes on this blog) was primarily written for my own benefit, but it may be of interest to some readers. In preparing this article I found this exposition by Kingman to be helpful.

Note: this article will emphasise the computations rather than rigour, and in particular will rely on informal use of infinitesimals to avoid dealing with stochastic calculus or other technicalities. We adopt the convention that we will neglect higher order terms in infinitesimal calculations, e.g. if ${dt}$ is infinitesimal then we will abbreviate ${dt + o(dt)}$ simply as ${dt}$.

Tamar Ziegler and I have just uploaded to the arXiv our joint paper “A multi-dimensional Szemerédi theorem for the primes via a correspondence principle“. This paper is related to an earlier result of Ben Green and mine in which we established that the primes contain arbitrarily long arithmetic progressions. Actually, in that paper we proved a more general result:

Theorem 1 (Szemerédi’s theorem in the primes) Let ${A}$ be a subset of the primes ${{\mathcal P}}$ of positive relative density, thus ${\limsup_{N \rightarrow \infty} \frac{|A \cap [N]|}{|{\mathcal P} \cap [N]|} > 0}$. Then ${A}$ contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of ${{\bf Z}^d}$ necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes) Let ${d \geq 1}$, and let ${A}$ be a subset of the ${d^{th}}$ Cartesian power ${{\mathcal P}^d}$ of the primes of positive relative density, thus

$\displaystyle \limsup_{N \rightarrow \infty} \frac{|A \cap [N]^d|}{|{\mathcal P}^d \cap [N]^d|} > 0.$

Then for any ${v_1,\ldots,v_k \in {\bf Z}^d}$, ${A}$ contains infinitely many “constellations” of the form ${a+r v_1, \ldots, a + rv_k}$ with ${a \in {\bf Z}^d}$ and ${r}$ a positive integer.

In the case when ${A}$ is itself a Cartesian product of one-dimensional sets (in particular, if ${A}$ is all of ${{\mathcal P}^d}$), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) by a slightly larger set (or more precisely, a weight function ${\nu}$) of almost primes or almost Gaussian primes, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the linear forms condition and the correlation condition. Very roughly speaking, these conditions assert statements of the following form: if ${n}$ is a randomly selected integer, then the events of ${n+h_1,\ldots,n+h_k}$ simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of ${h_1,\ldots,h_k}$. Once these conditions are satisfied, one can then run a transference argument (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as ${{\mathcal P}^2}$, a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square ${{\mathcal A}^2}$ of the almost primes – pairs ${(n,m)}$ whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as ${\nu(n) \nu(m)}$ that is concentrated on a set such as ${{\mathcal A}^2}$, but let me ignore this distinction for now.) However, this set ${{\mathcal A}^2}$ does not enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed ${h, k}$, and random ${(n,m)}$, the four events

$\displaystyle (n,m) \in {\mathcal A}^2$

$\displaystyle (n+h,m) \in {\mathcal A}^2$

$\displaystyle (n,m+k) \in {\mathcal A}^2$

$\displaystyle (n+h,m+k) \in {\mathcal A}^2$

do not behave independently (as they would if ${{\mathcal A}^2}$ were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain right angles (e.g. constellations of the form ${(n,m), (n+r,m), (n,m+r)}$) as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for ${{\mathcal A}^2}$ (or for weights concentrated on ${{\mathcal A}^2}$) when applied to rectangular patterns.

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attach pseudorandom weights directly within the proof itself, rather than trying to add the weights to the result of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an infinite number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire ${\sigma}$-algebra; for this, it is not enough to specify the measure ${\mu(A)}$ of a single set such as ${A}$, but one also has to specify the measure ${\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)}$ of “cylinder sets” such as ${T^{n_1} A \cap \ldots \cap T^{n_m} A}$ where ${m}$ could be arbitrarily large. The larger ${m}$ gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights ${\nu}$ we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Möbius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function ${\Lambda}$) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure ${\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)}$ of cylinder sets, with each ${m}$ requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)

One of the most basic methods in additive number theory is the Hardy-Littlewood circle method. This method is based on expressing a quantity of interest to additive number theory, such as the number of representations ${f_3(x)}$ of an integer ${x}$ as the sum of three primes ${x = p_1+p_2+p_3}$, as a Fourier-analytic integral over the unit circle ${{\bf R}/{\bf Z}}$ involving exponential sums such as

$\displaystyle S(x,\alpha) := \sum_{p \leq x} e( \alpha p) \ \ \ \ \ (1)$

where the sum here ranges over all primes up to ${x}$, and ${e(x) := e^{2\pi i x}}$. For instance, the expression ${f_3(x)}$ mentioned earlier can be written as

$\displaystyle f_3(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha. \ \ \ \ \ (2)$

The strategy is then to obtain sufficiently accurate bounds on exponential sums such as ${S(x,\alpha)}$ in order to obtain non-trivial bounds on quantities such as ${f_3(x)}$. For instance, if one can show that ${f_3(x)>0}$ for all odd integers ${x}$ greater than some given threshold ${x_0}$, this implies that all odd integers greater than ${x_0}$ are expressible as the sum of three primes, thus establishing all but finitely many instances of the odd Goldbach conjecture.
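The identity (2) has an exact finite analogue in which the unit circle is replaced by ${{\bf Z}/N{\bf Z}}$ for any modulus ${N}$ exceeding the largest possible sum of three primes under consideration; by orthogonality of characters, the discretised integral then counts representations exactly. A small sketch (illustrative only, with toy parameters) verifying this, together with the discrete version of the Plancherel identity (4):

```python
from cmath import exp, pi

x, N = 21, 64                    # N exceeds any sum of three primes <= 20
primes = [2, 3, 5, 7, 11, 13, 17, 19]

def S(k):
    # Discrete exponential sum over the primes at frequency k/N.
    return sum(exp(2j * pi * k * p / N) for p in primes)

# Discrete analogue of (2): averaging S^3 against e(-xk/N) over all N
# frequencies counts ordered representations x = p1 + p2 + p3 exactly,
# by orthogonality of characters on Z/NZ (no wraparound since N is large).
f3_circle = sum(S(k) ** 3 * exp(-2j * pi * k * x / N) for k in range(N)) / N

# Brute-force count of the same quantity.
f3_direct = sum(1 for p1 in primes for p2 in primes for p3 in primes
                if p1 + p2 + p3 == x)

# Discrete analogue of the Plancherel identity (4): the average of |S|^2
# recovers the number of primes entering the sum.
plancherel = sum(abs(S(k)) ** 2 for k in range(N)) / N
```

Of course, the analytic content of the circle method lies not in this exact bookkeeping but in estimating ${S}$ on major and minor arcs, as discussed next.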

Remark 1 In practice, it can be more efficient to work with smoother sums than the partial sum (1), for instance by replacing the cutoff ${p \leq x}$ with a smoother cutoff ${\chi(p/x)}$ for a suitable choice of cutoff function ${\chi}$, or by replacing the restriction of the summation to primes by a more analytically tractable weight, such as the von Mangoldt function ${\Lambda(n)}$. However, these improvements to the circle method are primarily technical in nature and do not have much impact on the heuristic discussion in this post, so we will not emphasise them here. One can also certainly use the circle method to study additive combinations of numbers from other sets than the set of primes, but we will restrict attention to additive combinations of primes for sake of discussion, as it is historically one of the most studied sets in additive number theory.

In many cases, it turns out that one can get fairly precise evaluations on sums such as ${S(x,\alpha)}$ in the major arc case, when ${\alpha}$ is close to a rational number ${a/q}$ with small denominator ${q}$, by using tools such as the prime number theorem in arithmetic progressions. For instance, the prime number theorem itself tells us that

$\displaystyle S(x,0) \approx \frac{x}{\log x}$

and the prime number theorem in residue classes modulo ${q}$ suggests more generally that

$\displaystyle S(x,\frac{a}{q}) \approx \frac{\mu(q)}{\phi(q)} \frac{x}{\log x}$

when ${q}$ is small and ${a}$ is coprime to ${q}$, basically thanks to the elementary calculation that the phase ${e(an/q)}$ has an average value of ${\mu(q)/\phi(q)}$ when ${n}$ is uniformly distributed amongst the residue classes modulo ${q}$ that are coprime to ${q}$. Quantifying the precise error in these approximations can be quite challenging, though, unless one assumes powerful hypotheses such as the Generalised Riemann Hypothesis.
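The elementary calculation just mentioned can be checked directly: the full sum of ${e(an/q)}$ over residues ${n}$ coprime to ${q}$ is the Ramanujan sum, which equals ${\mu(q)}$ when ${a}$ is coprime to ${q}$; dividing by ${\phi(q)}$ gives the stated average ${\mu(q)/\phi(q)}$. A quick verification sketch:

```python
from cmath import exp, pi
from math import gcd

def mobius(n):
    # Möbius function by trial division (adequate for small n).
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # n is divisible by p^2
            result = -result
        p += 1
    return -result if n > 1 else result

# Check that sum_{n <= q, gcd(n,q)=1} e(an/q) = mu(q) for a coprime to q.
errs = []
for q in range(2, 31):
    for a in (1, q - 1):        # both choices are coprime to q
        s = sum(exp(2j * pi * a * n / q)
                for n in range(1, q + 1) if gcd(n, q) == 1)
        errs.append(abs(s - mobius(q)))
```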

In the minor arc case when ${\alpha}$ is not close to a rational ${a/q}$ with small denominator, one no longer expects to have such precise control on the value of ${S(x,\alpha)}$, due to the “pseudorandom” fluctuations of the quantity ${e(\alpha p)}$. Using the standard probabilistic heuristic (supported by results such as the central limit theorem or Chernoff’s inequality) that the sum of ${k}$ “pseudorandom” phases should fluctuate randomly and be of typical magnitude ${\sim \sqrt{k}}$, one expects upper bounds of the shape

$\displaystyle |S(x,\alpha)| \lessapprox \sqrt{\frac{x}{\log x}} \ \ \ \ \ (3)$

for “typical” minor arc ${\alpha}$. Indeed, a simple application of the Plancherel identity, followed by the prime number theorem, reveals that

$\displaystyle \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2\ d\alpha \sim \frac{x}{\log x} \ \ \ \ \ (4)$

which is consistent with (though weaker than) the above heuristic. In practice, though, we are unable to rigorously establish bounds anywhere near as strong as (3); upper bounds such as ${x^{4/5+o(1)}}$ are far more typical.

Because one only expects to have upper bounds on ${|S(x,\alpha)|}$, rather than asymptotics, in the minor arc case, one cannot realistically hope to make much use of phases such as ${e(-x\alpha)}$ for the minor arc contribution to integrals such as (2) (at least if one is working with a single, deterministic, value of ${x}$, so that averaging in ${x}$ is unavailable). In particular, from upper bound information alone, it is difficult to avoid the “conspiracy” that the magnitude ${|S(x,\alpha)|^3}$ oscillates in sympathetic resonance with the phase ${e(-x\alpha)}$, thus essentially eliminating almost all of the possible gain in the bounds that could arise from exploiting cancellation from that phase. Thus, one basically has little option except to use the triangle inequality to control the portion of the integral on the minor arc region ${\Omega_{minor}}$:

$\displaystyle |\int_{\Omega_{minor}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha| \leq \int_{\Omega_{minor}} |S(x,\alpha)|^3\ d\alpha.$

Despite this handicap, though, it is still possible to get enough bounds on both the major and minor arc contributions of integrals such as (2) to obtain non-trivial lower bounds on quantities such as ${f_3(x)}$, at least when ${x}$ is large. In particular, this sort of method can be developed to give a proof of Vinogradov’s famous theorem that every sufficiently large odd integer ${x}$ is the sum of three primes; my own result that all odd numbers greater than ${1}$ can be expressed as the sum of at most five primes is also proven by essentially the same method (modulo a number of minor refinements, and taking advantage of some numerical work on both the Goldbach problems and on the Riemann hypothesis). It is certainly conceivable that some further variant of the circle method (again combined with a suitable amount of numerical work, such as that of numerically establishing zero-free regions for the Generalised Riemann Hypothesis) can be used to settle the full odd Goldbach conjecture; indeed, under the assumption of the Generalised Riemann Hypothesis, this was already achieved by Deshouillers, Effinger, te Riele, and Zinoviev back in 1997. I am optimistic that an unconditional version of this result will be possible within a few years or so, though I should say that there are still significant technical challenges to doing so, and some clever new ideas will probably be needed to get either the Vinogradov-style argument or numerical verification to work unconditionally for the three-primes problem at medium-sized ranges of ${x}$, such as ${x \sim 10^{50}}$. (But the intermediate problem of representing all even natural numbers as the sum of at most four primes looks somewhat closer to being feasible, though even this would require some substantially new and non-trivial ideas beyond what is in my five-primes paper.)

However, I (and many other analytic number theorists) am considerably more skeptical that the circle method can be applied to the even Goldbach problem of representing a large even number ${x}$ as the sum ${x = p_1 + p_2}$ of two primes, or the similar (and marginally simpler) twin prime conjecture of finding infinitely many pairs of twin primes, i.e. finding infinitely many representations ${2 = p_1 - p_2}$ of ${2}$ as the difference of two primes. At first glance, the situation looks tantalisingly similar to that of the Vinogradov theorem: to settle the even Goldbach problem for large ${x}$, one has to find a non-trivial lower bound for the quantity

$\displaystyle f_2(x) = \int_{{\bf R}/{\bf Z}} S(x,\alpha)^2 e(-x\alpha)\ d\alpha \ \ \ \ \ (5)$

for sufficiently large ${x}$, as this quantity ${f_2(x)}$ is also the number of ways to represent ${x}$ as the sum ${x=p_1+p_2}$ of two primes ${p_1,p_2}$. Similarly, to settle the twin prime problem, it would suffice to obtain a lower bound for the quantity

$\displaystyle \tilde f_2(x) = \int_{{\bf R}/{\bf Z}} |S(x,\alpha)|^2 e(-2\alpha)\ d\alpha \ \ \ \ \ (6)$

that goes to infinity as ${x \rightarrow \infty}$, as this quantity ${\tilde f_2(x)}$ is also the number of ways to represent ${2}$ as the difference ${2 = p_1-p_2}$ of two primes less than or equal to ${x}$.

In principle, one can achieve either of these two objectives by a sufficiently fine level of control on the exponential sums ${S(x,\alpha)}$. Indeed, there is a trivial (and uninteresting) way to take any (hypothetical) solution of either the asymptotic even Goldbach problem or the twin prime problem and (artificially) convert it to a proof that “uses the circle method”; one simply begins with the quantity ${f_2(x)}$ or ${\tilde f_2(x)}$, expresses it in terms of ${S(x,\alpha)}$ using (5) or (6), and then uses (5) or (6) again to convert these integrals back into the combinatorial expression of counting solutions to ${x=p_1+p_2}$ or ${2=p_1-p_2}$, and then uses the hypothetical solution to the given problem to obtain the required lower bounds on ${f_2(x)}$ or ${\tilde f_2(x)}$.

Of course, this would not qualify as a genuine application of the circle method by any reasonable measure. One can then ask the more refined question of whether one could hope to get non-trivial lower bounds on ${f_2(x)}$ or ${\tilde f_2(x)}$ (or similar quantities) purely from the upper and lower bounds on ${S(x,\alpha)}$ or similar quantities (and of various ${L^p}$ type norms on such quantities, such as the ${L^2}$ bound (4)). Of course, we do not yet know what the strongest possible upper and lower bounds on ${S(x,\alpha)}$ are (otherwise we would already have made progress on major conjectures such as the Riemann hypothesis); but we can make plausible heuristic conjectures on such bounds. And this is enough to make the following heuristic conclusions:

• (i) For “binary” problems such as computing (5), (6), the contribution of the minor arcs potentially dominates that of the major arcs (if all one is given about the minor arc sums is magnitude information), in contrast to “ternary” problems such as computing (2), in which it is the major arc contribution which is absolutely dominant.
• (ii) Upper and lower bounds on the magnitude of ${S(x,\alpha)}$ are not sufficient, by themselves, to obtain non-trivial bounds on (5), (6) unless these bounds are extremely tight (within a relative error of ${O(1/\log x)}$ or better); but
• (iii) obtaining such tight bounds is a problem of comparable difficulty to the original binary problems.

I will provide some justification for these conclusions below the fold; they are reasonably well known “folklore” to many researchers in the field, but it seems that they are rarely made explicit in the literature (in part because these arguments are, by their nature, heuristic instead of rigorous) and I have been asked about them from time to time, so I decided to try to write them down here.

In view of the above conclusions, it seems that the best one can hope to do by using the circle method for the twin prime or even Goldbach problems is to reformulate such problems into a statement of roughly comparable difficulty to the original problem, even if one assumes powerful conjectures such as the Generalised Riemann Hypothesis (which lets one make very precise control on major arc exponential sums, but not on minor arc ones). These are not rigorous conclusions – after all, we have already seen that one can always artificially insert the circle method into any viable approach on these problems – but they do strongly suggest that one needs a method other than the circle method in order to fully solve either of these two problems. I do not know what such a method would be, though I can give some heuristic objections to some of the other popular methods used in additive number theory (such as sieve methods, or more recently the use of inverse theorems); this will be done at the end of this post.

I’ve uploaded to the arXiv the polymath research paper “Deterministic methods to find primes“, which is the outcome of the Polymath4 collaborative mathematics project, and has been submitted to Mathematics of Computation.

The objective of this paper was to find fast deterministic algorithms to solve the following problem:

Given a (large) integer x, find a prime p larger than x.

Thanks to the AKS algorithm, a number of size O(x) can be deterministically tested for primality in time $O( \log^{O(1)} x ) = O( x^{o(1)} )$.   By Bertrand’s postulate, there is always at least one prime between x and 2x; by testing each one of these integers in turn for primality, one can thus obtain a deterministic algorithm to find primes in time $O( x^{1+o(1)})$.
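The Bertrand-based baseline algorithm is easy to illustrate concretely (this sketch is not from the paper): it is the AKS test that supplies the stated polynomial-time guarantee, but for word-sized inputs one can stand in a Miller-Rabin test with a fixed witness set, which is known to be deterministic for all ${n < 3.3 \times 10^{24}}$.

```python
def is_prime(n):
    # Miller-Rabin with a fixed witness set; this particular set is known
    # to be deterministic for all n < 3.3 * 10**24 (a stand-in for AKS).
    if n < 2:
        return False
    witnesses = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    for p in witnesses:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in witnesses:
        y = pow(a, d, n)
        if y in (1, n - 1):
            continue
        for _ in range(r - 1):
            y = y * y % n
            if y == n - 1:
                break
        else:
            return False        # a witnesses the compositeness of n
    return True

def find_prime_above(x):
    # Bertrand's postulate guarantees a prime in (x, 2x], so this scan
    # terminates; this is the O(x^{1+o(1)}) baseline algorithm.
    for n in range(x + 1, 2 * x + 1):
        if is_prime(n):
            return n
```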

But one should be able to do much better.  For comparison, if probabilistic algorithms are allowed, then by randomly selecting integers between x and 2x to test for primality, it is easy to see from the prime number theorem that one will succeed in obtaining a prime with high probability in time $O( \log^{O(1)} x ) = O( x^{o(1)} )$.  However, after some effort we were not able to “derandomise” this algorithm to create any reasonable deterministic counterpart.  Nevertheless, we conjecture that a deterministic algorithm with run time $O( x^{o(1)})$ exists.  Such algorithms can be easily obtained if one assumes some standard conjectures regarding the primes (e.g. Cramer’s conjecture on prime gaps), but we do not know of any deterministic algorithms which can be unconditionally proved to run in time $O( x^{o(1)})$.

Currently, the best known deterministic algorithm is due to Lagarias and Odlyzko, and has a run time of $O( x^{1/2+o(1)})$.  Roughly speaking, it is based on the ability to compute the prime counting function $\pi(x)$ in time $O( x^{1/2+o(1)} )$; once one has this function, one can detect which intervals contain primes or not, and then starting from Bertrand’s postulate and performing a binary search one can then locate a prime.  The Lagarias-Odlyzko argument is based on approximating $\pi(x)$ by a certain integral of the Riemann zeta function, which one then approximates in turn by quadrature.

We conjecture that one should be able to compute $\pi(x)$ in faster time, and in particular in time $O(x^{1/2-c+o(1)})$ for some $c>0$.  Unfortunately, we were not able to achieve this; however, we do have a non-trivial method to compute the parity $\pi(x) \hbox{ mod } 2$ of $\pi(x)$ in such a time; a bit more generally (and oversimplifying a little bit), we can compute various projections $\sum_{p \leq x} t^p \hbox{ mod } (2, g(t))$ of the prime polynomial $\sum_{p \leq x} t^p$, taken modulo $2$ and modulo some small polynomials g.  This seems close to being able to achieve the goal of detecting whether primes exist in a given interval, but unfortunately some additional work seems to be needed in that area.  Still, the methods used to get the parity seem to be interesting (involving the Dirichlet hyperbola method, a piecewise approximation of the hyperbola, and the Strassen fast multiplication algorithm), and these techniques might be useful for some other problems.

Roughly speaking, the idea to compute the parity of $\pi(x) \hbox{ mod } 2$ is as follows.  The first observation is that, for square-free n, the number $\tau(n)$ of divisors of n is equal to 2 when n is a prime, and a multiple of 4 otherwise.  So to compute the parity of $\pi(x)$, it suffices to compute $\sum_{n \leq x} \tau(n)$ modulo 4 (or more precisely, the restriction of this sum to squarefree n).
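This observation can be verified directly for small x (a brute-force sketch, not the paper's fast algorithm): summing $\tau(n)$ over squarefree $n \leq x$ gives 1 (from n = 1), plus 2 for each prime, plus a multiple of 4 from every other squarefree n, so the parity of $\pi(x)$ can be read off.

```python
def tau(n):
    # Number of divisors of n (naive O(n) count; fine for small n).
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def is_squarefree(n):
    return all(n % (p * p) != 0 for p in range(2, int(n**0.5) + 1))

def pi_parity(x):
    # Sum of tau(n) over squarefree n <= x equals 1 + 2*pi(x) plus a
    # multiple of 4, so ((sum - 1)/2) mod 2 is the parity of pi(x).
    s = sum(tau(n) for n in range(1, x + 1) if is_squarefree(n))
    return ((s - 1) // 2) % 2
```

For example, $\pi(100) = 25$ is odd while $\pi(30) = 10$ is even, and the function above recovers both parities.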

The restriction to squarefree numbers turns out to be relatively easy to handle, so we ignore it.  Since $\tau(n) = \sum_{a,b: ab=n} 1$, we can rewrite

$\sum_{n \leq x} \tau(n) = \sum_{a,b: ab \leq x} 1$.

So it suffices to find an efficient way to count the number of lattice points below the hyperbola $\{ (a,b): ab = x \}$.

The classic way to do this is via the Dirichlet hyperbola method.  This lets one rewrite the above expression as

$2 \sum_{a \leq \sqrt{x}} \lfloor \frac{x}{a} \rfloor - \lfloor \sqrt{x}\rfloor^2$

(assuming $x$ is not a perfect square, where $\lfloor x \rfloor$ is the greatest integer function).  One can then compute this quantity in time $O(x^{1/2+o(1)})$ without difficulty.
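A quick check of the hyperbola identity (a sketch; the test values of $x$ below are not perfect squares, as in the text):

```python
from math import isqrt

def divisor_sum_direct(x):
    # sum_{n <= x} tau(n) = number of pairs (a, b) with ab <= x
    #                     = sum_{a <= x} floor(x / a); O(x) time.
    return sum(x // a for a in range(1, x + 1))

def divisor_sum_hyperbola(x):
    # Dirichlet hyperbola method: count pairs with a <= sqrt(x), double
    # by the symmetry a <-> b, and subtract the doubly counted square
    # block a, b <= sqrt(x); only O(sqrt(x)) terms are needed.
    r = isqrt(x)
    return 2 * sum(x // a for a in range(1, r + 1)) - r * r
```

For instance, both functions give $\sum_{n \leq 12} \tau(n) = 35$.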

To improve this to $O(x^{1/2-c+o(1)})$, we used some ideas of Vinogradov, based on using Diophantine approximation to find moderately long arithmetic progressions on which the map $a \mapsto \lfloor \frac{x}{a} \rfloor$ is linear, and thus easily summable.

The same method extends to the polynomial setting, though now, instead of summing an arithmetic progression, one has to find an efficient way to sum quadratic sums such as $\sum_{n=1}^N t^{an^2+bn+c}$.  This turns out to be possible by expressing this sum as a matrix product and then using fast matrix multiplication algorithms.

There is still some scope to improve the results and methods further; Ernie Croot is pursuing this with two undergraduate students, David Hollis and David Lowry.

In this, the final lecture notes of this course, we discuss one of the motivating applications of the theory developed thus far, namely to count solutions to linear equations in primes ${{\mathcal P} = \{2,3,5,7,\ldots\}}$ (or in dense subsets ${A}$ of primes ${{\mathcal P}}$). Unfortunately, the most famous linear equations in primes, namely the twin prime equation ${p_2 - p_1 = 2}$ and the even Goldbach equation ${p_1+p_2=N}$, remain out of reach of this technology (because the relevant affine linear forms involved are commensurate, and thus have infinite complexity with respect to the Gowers norms), but most other systems of equations, in particular that of arithmetic progressions ${p_i = n+ir}$ for ${i=0,\ldots,k-1}$ (or equivalently, ${p_i + p_{i+2} = 2p_{i+1}}$ for ${i=0,\ldots,k-2}$), as well as the odd Goldbach equation ${p_1+p_2+p_3=N}$, are tractable.

To illustrate the main ideas, we will focus on the following result of Green:

Theorem 1 (Roth’s theorem in the primes) Let ${A \subset {\mathcal P}}$ be a subset of primes whose upper density ${\limsup_{N \rightarrow \infty} |A \cap [N]|/|{\mathcal P} \cap [N]|}$ is positive. Then ${A}$ contains infinitely many arithmetic progressions of length three.

This should be compared with Roth’s theorem in the integers (Notes 2), which is the same statement but with the primes ${{\mathcal P}}$ replaced by the integers ${{\bf Z}}$ (or natural numbers ${{\bf N}}$). Indeed, Roth’s theorem for the primes is proven by transferring Roth’s theorem for the integers to the prime setting; the latter theorem is used as a “black box”. The key difficulty here in performing this transference is that the primes have zero density inside the integers; indeed, from the prime number theorem we have ${|{\mathcal P} \cap [N]| = (1+o(1)) \frac{N}{\log N} = o(N)}$.

There are a number of generalisations of this transference technique. In a paper of Green and myself, we extended the above theorem to progressions of longer length (thus transferring Szemerédi’s theorem to the primes). In a series of papers (culminating in a paper to appear shortly) of Green, myself, and also Ziegler, related methods are also used to obtain an asymptotic for the number of solutions in the primes to any system of linear equations of bounded complexity. This latter result uses the full power of higher order Fourier analysis, in particular relying heavily on the inverse conjecture for the Gowers norms; in contrast, Roth’s theorem and Szemerédi’s theorem in the primes are “softer” results that do not need this conjecture.

To transfer results from the integers to the primes, there are three basic steps:

• A general transference principle, which transfers certain types of additive combinatorial results from dense subsets of the integers to dense subsets of a suitably “pseudorandom” set of integers (or more precisely, to the integers weighted by a suitably “pseudorandom” measure);
• An application of sieve theory to show that the primes (or more precisely, an affine modification of the primes) lie inside a suitably pseudorandom set of integers (or more precisely, have significant mass with respect to a suitably pseudorandom measure);
• If one is seeking asymptotics for patterns in the primes, and not simply lower bounds, one also needs to control correlations of the primes (or proxies for the primes, such as the Möbius function) with various objects that arise from higher order Fourier analysis, such as nilsequences.
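The “affine modification of the primes” in the second step can be illustrated in a simplified form (often called the “W-trick”): one restricts to primes in a residue class ${b \pmod W}$ coprime to a product ${W}$ of small primes, and rescales via ${n = (p-b)/W}$, so that the rescaled set no longer exhibits the biases of the primes modulo small moduli. The specific choices of ${W}$, ${b}$, and ${N}$ below are illustrative only.

```python
def primes_up_to(n):
    """Return the list of primes up to n via the sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, n + 1, p):
                is_prime[q] = False
    return [p for p in range(n + 1) if is_prime[p]]

W = 2 * 3 * 5  # product of the primes up to a small threshold
b = 1          # a residue class coprime to W
N = 10**4

# Primes p ≡ b (mod W) up to N, rescaled by n = (p - b) / W.
rescaled = [(p - b) // W for p in primes_up_to(N) if p % W == b]
print(rescaled[:5])  # -> [1, 2, 5, 6, 7], coming from the primes 31, 61, 151, 181, 211
```

The point of the rescaling is that the set `rescaled` has no preferred residue class modulo ${2}$, ${3}$, or ${5}$, unlike the primes themselves (which, for instance, are almost all odd).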

The first of these steps can be accomplished in a number of ways. For progressions of length three (and more generally, for controlling linear patterns of complexity at most one), transference can be accomplished by Fourier-analytic methods. For more complicated patterns, one can use techniques inspired by ergodic theory; more recently, simplified and more efficient methods based on duality (the Hahn-Banach theorem) have also been used. No number theory is used in this step. (In the case of transference to genuinely random sets, rather than pseudorandom sets, similar ideas appeared earlier in the graph theory setting; see this paper of Kohayakawa, Luczak, and Rodl.)

The second step is accomplished by fairly standard sieve theory methods (e.g. the Selberg sieve, or the slight variants of this sieve used by Goldston and Yildirim). Remarkably, very little of the formidable apparatus of modern analytic number theory is needed for this step; for instance, the only fact about the Riemann zeta function that is truly needed is that it has a simple pole at ${s=1}$, and no knowledge of L-functions is needed.

The third step does draw more significantly on analytic number theory techniques and results (most notably, the method of Vinogradov to compute oscillatory sums over the primes, and also the Siegel-Walfisz theorem that gives a good error term on the prime number theorem in arithmetic progressions). As these techniques are somewhat orthogonal to the main topic of this course, we shall only touch briefly on this aspect of the transference strategy.

This week I am in Bremen, where the 50th International Mathematical Olympiad is being held.  A number of former Olympians (Béla Bollobás, Tim Gowers, Laci Lovasz, Stas Smirnov, Jean-Christophe Yoccoz, and myself) were invited to give a short talk (20 minutes in length) at the celebratory event for this anniversary.  I chose to talk on a topic I have spoken about several times before, on “Structure and randomness in the prime numbers“.  Given the time constraints, there was a limit as to how much substance I could put into the talk; but I tried to describe, in very general terms, what we know about the primes, and what we suspect to be true but cannot yet establish.  As I have mentioned in previous talks, the key problem is that we suspect the distribution of the primes to obey no significant patterns (other than “local” structure, such as having a strong tendency to be odd (which is local information at the 2 place), or obeying the prime number theorem (which is local information at the infinity place)), but we still do not have fully satisfactory tools for establishing the absence of a pattern. (This is in contrast with many types of Olympiad problems, where the key to solving a problem often lies in discovering the right pattern or structure in the problem to exploit.)

The PDF of the talk is here; I decided to try out the Beamer LaTeX package for a change.

This week I am at Penn State University, giving this year’s Marker lectures.  My chosen theme for my four lectures here is “recent developments in additive prime number theory”.  My first lecture, “Long arithmetic progressions in primes”, is similar to my AMS lecture on the same topic and so I am not reposting it here.  The second lecture, the notes for which begin after the fold, is on “Linear equations in primes”.  These two lectures focus primarily on work of myself and Ben Green.  The third and fourth lectures, entitled “Small gaps between primes” and “Sieving for almost primes and expander graphs”, will instead be focused on the work of Goldston-Yildirim-Pintz and Bourgain-Gamburd-Sarnak respectively.
Read the rest of this entry »

This week I am in San Diego for the annual joint mathematics meeting of the American Mathematical Society and the Mathematical Association of America. I am giving two talks here. One is a lecture (for the AMS “Current Events” Bulletin) on recent developments (by Martel-Merle, Merle-Raphael, and others) on stability of solitons; I will post on that lecture at some point in the near future, once the survey paper associated to that lecture is finalised.
The other, which I am presenting here, is an address on “structure and randomness in the prime numbers“. Of course, I’ve talked about this general topic many times before (e.g. at my Simons lecture at MIT, my Milliman lecture at U. Washington, and my Science Research Colloquium at UCLA), and I have given similar talks to the one here – which focuses on my original 2004 paper with Ben Green on long arithmetic progressions in the primes – about a dozen or so times. As such, this particular talk has probably run its course, and so I am “retiring” it by posting it here.

p.s. At this meeting, Endre Szemerédi was awarded the 2008 Steele prize for a seminal contribution to research, for his landmark paper establishing what is now known as Szemerédi’s theorem, which underlies the result I discuss in this talk. This prize is richly deserved – congratulations Endre! [The AMS and MAA also awarded prizes to several dozen other mathematicians, including many mentioned previously on this blog; rather than list them all here, let me just point you to their prize booklet.]