
I’ve just uploaded to the arXiv the D.H.J. Polymath paper “Variants of the Selberg sieve, and bounded intervals containing many primes”, which is the second paper to be produced from the Polymath8 project (the first one being discussed here). We’ll refer to the new paper here as the Polymath8b paper, and to the first one as the Polymath8a paper. As with Polymath8a, the Polymath8b paper is concerned with the smallest asymptotic prime gap

$\displaystyle H_1 := \liminf_{n \rightarrow \infty}(p_{n+1}-p_n),$

where ${p_n}$ denotes the ${n^{th}}$ prime, as well as the more general quantities

$\displaystyle H_m := \liminf_{n \rightarrow \infty}(p_{n+m}-p_n).$
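To get a feel for these quantities, one can tabulate the smallest value of ${p_{n+m}-p_n}$ over a finite window of primes. The sketch below is only a numerical illustration: the helper names `primes_up_to` and `smallest_gap` and the cutoffs `100_000` and `1_000` are choices made for this example, and a finite search of this kind says nothing rigorous about the liminf itself.

```python
def primes_up_to(limit):
    """Return all primes <= limit using a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p, starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def smallest_gap(primes, m, start):
    """Smallest value of p_{n+m} - p_n over the listed primes p_n >= start."""
    return min(
        primes[i + m] - primes[i]
        for i in range(len(primes) - m)
        if primes[i] >= start
    )


primes = primes_up_to(100_000)
for m in (1, 2, 3):
    # Ignore the small primes below 1000, which are irrelevant to a liminf.
    print(m, smallest_gap(primes, m, start=1_000))
```

Discarding the primes below `start` mimics (very crudely) the fact that ${H_m}$ only depends on the behaviour of the primes at infinity.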

In the breakthrough paper of Goldston, Pintz, and Yildirim, the bound ${H_1 \leq 16}$ was obtained under the strong hypothesis of the Elliott-Halberstam conjecture. An unconditional bound on ${H_1}$, however, remained elusive until the celebrated work of Zhang last year, who showed that

$\displaystyle H_1 \leq 70{,}000{,}000.$

The Polymath8a paper then improved this to ${H_1 \leq 4{,}680}$. After that, Maynard introduced a new multidimensional Selberg sieve argument that gave the substantial improvement

$\displaystyle H_1 \leq 600$

unconditionally, and ${H_1 \leq 12}$ on the Elliott-Halberstam conjecture; furthermore, bounds on ${H_m}$ for higher ${m}$ were obtained for the first time, and specifically that ${H_m \ll m^3 e^{4m}}$ for all ${m \geq 1}$, with the improvements ${H_2 \leq 600}$ and ${H_m \ll m^3 e^{2m}}$ on the Elliott-Halberstam conjecture. (I had independently discovered the multidimensional sieve idea, although I did not obtain Maynard’s specific numerical results, and my asymptotic bounds were a bit weaker.)

In Polymath8b, we obtain some further improvements. Unconditionally, we have ${H_1 \leq 246}$ and ${H_m \ll m e^{(4 - \frac{28}{157}) m}}$, together with some explicit bounds on ${H_2,H_3,H_4,H_5}$; on the Elliott-Halberstam conjecture we have ${H_m \ll m e^{2m}}$ and some numerical improvements to the ${H_2,H_3,H_4,H_5}$ bounds; and assuming the generalised Elliott-Halberstam conjecture we have the bound ${H_1 \leq 6}$, which is best possible from sieve-theoretic methods thanks to the parity problem obstruction.

There were a variety of methods used to establish these results. Maynard’s paper obtained a criterion for bounding ${H_m}$ which reduced to finding a good solution to a certain multidimensional variational problem. When the dimension parameter ${k}$ was relatively small (e.g. ${k \leq 100}$), we were able to obtain good numerical solutions either by continuing the method of Maynard (using a basis of symmetric polynomials), or by using a Krylov iteration scheme. For large ${k}$, we refined the asymptotics and obtained near-optimal solutions of the variational problem. For the ${H_1}$ bounds, we extended the reach of the multidimensional Selberg sieve (particularly under the assumption of the generalised Elliott-Halberstam conjecture) by allowing the function ${F}$ in the multidimensional variational problem to extend to a larger region of space than was previously admissible, albeit with some tricky new constraints on ${F}$ (and penalties in the variational problem). This required some unusual sieve-theoretic manipulations, notably an “epsilon trick”, ultimately relying on the elementary inequality ${(a+b)^2 \geq a^2 + 2ab}$, that allowed one to get non-trivial lower bounds for sums such as ${\sum_n (a(n)+b(n))^2}$ even if the sum ${\sum_n b(n)^2}$ had no non-trivial estimates available; and a way to estimate divisor sums such as ${\sum_{n\leq x} \sum_{d|n} \lambda_d}$ even if ${d}$ was permitted to be comparable to or even exceed ${x}$, by using the fundamental theorem of arithmetic to factorise ${n}$ (after restricting to the case when ${n}$ is almost prime). I hope that these sieve-theoretic tricks will be useful in future work in the subject.

With this paper, the Polymath8 project is almost complete; there is still a little bit of scope to push our methods further and get some modest improvement for instance to the ${H_1 \leq 246}$ bound, but this would require a substantial amount of effort, and it is probably best to instead wait for some new breakthrough in the subject to come along. One final task we are performing is to write up a retrospective article on both the 8a and 8b experiences, an incomplete writeup of which can be found here. If anyone wishes to contribute some commentary on these projects (whether you were an active contributor, an occasional contributor, or a silent “lurker” in the online discussion), please feel free to do so in the comments to this post.

Two of the most famous open problems in additive prime number theory are the twin prime conjecture and the binary Goldbach conjecture. They have quite similar forms:

• Twin prime conjecture The equation ${p_1 - p_2 = 2}$ has infinitely many solutions with ${p_1,p_2}$ prime.
• Binary Goldbach conjecture The equation ${p_1 + p_2 = N}$ has at least one solution with ${p_1,p_2}$ prime for any given even ${N \geq 4}$.

In view of this similarity, it is not surprising that partial results on these two conjectures have tracked each other fairly closely; the twin prime conjecture is generally considered slightly easier than the binary Goldbach conjecture, but broadly speaking any progress made on one of the conjectures has also led to a comparable amount of progress on the other. (For instance, Chen’s theorem has a version for the twin prime conjecture, and a version for the binary Goldbach conjecture.) Also, the notorious parity obstruction is present in both problems, preventing a solution to either conjecture by almost all known methods (see this previous blog post for more discussion).

In this post, I would like to note a divergence from this general principle, with regards to bounded error versions of these two conjectures:

• Twin prime with bounded error The inequalities ${0 < p_1 - p_2 < H}$ have infinitely many solutions with ${p_1,p_2}$ prime for some absolute constant ${H}$.
• Binary Goldbach with bounded error The inequalities ${N \leq p_1+p_2 \leq N+H}$ have at least one solution with ${p_1,p_2}$ prime for any sufficiently large ${N}$ and some absolute constant ${H}$.

The first of these statements is now a well-known theorem of Zhang, and the Polymath8b project hosted on this blog has managed to lower ${H}$ to ${H=246}$ unconditionally, and to ${H=6}$ assuming the generalised Elliott-Halberstam conjecture. However, the second statement remains open; the best result that the Polymath8b project could manage in this direction is that (assuming GEH) at least one of the binary Goldbach conjecture with bounded error, or the twin prime conjecture with no error, had to be true.

All the known proofs of Zhang’s theorem proceed through sieve-theoretic means. Basically, they take as input equidistribution results that control the size of discrepancies such as

$\displaystyle \Delta(f; a\ (q)) := \sum_{x \leq n \leq 2x; n=a\ (q)} f(n) - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x} f(n) \ \ \ \ \ (1)$

for various congruence classes ${a\ (q)}$ and various arithmetic functions ${f}$, e.g. ${f(n) = \Lambda(n+h_i)}$ (or more generally ${f(n) = \alpha * \beta(n+h_i)}$ for various ${\alpha,\beta}$). After taking some carefully chosen linear combinations of these discrepancies, and using the trivial positivity lower bound

$\displaystyle a_n \geq 0 \hbox{ for all } n \implies \sum_n a_n \geq 0 \ \ \ \ \ (2)$

one eventually obtains (for suitable ${H}$) a non-trivial lower bound of the form

$\displaystyle \sum_{x \leq n \leq 2x} \nu(n) 1_A(n) > 0$

where ${\nu}$ is some weight function, and ${A}$ is the set of ${n}$ such that there are at least two primes in the interval ${[n,n+H]}$. This implies at least one solution to the inequalities ${0 < p_1 - p_2 < H}$ with ${p_1,p_2 \sim x}$, and Zhang’s theorem follows.

In a similar vein, one could hope to use bounds on discrepancies such as (1) (for ${x}$ comparable to ${N}$), together with the trivial lower bound (2), to obtain (for sufficiently large ${N}$, and suitable ${H}$) a non-trivial lower bound of the form

$\displaystyle \sum_{n \leq N} \nu(n) 1_B(n) > 0 \ \ \ \ \ (3)$

for some weight function ${\nu}$, where ${B}$ is the set of ${n}$ such that there is at least one prime in each of the intervals ${[n,n+H]}$ and ${[N-n-H,N-n]}$. This would imply the binary Goldbach conjecture with bounded error.

However, the parity obstruction blocks such a strategy from working (for much the same reason that it blocks any bound of the form ${H \leq 4}$ in Zhang’s theorem, as discussed in the Polymath8b paper). The reason is as follows. The sieve-theoretic arguments are linear with respect to the ${n}$ summation, and as such, any such sieve-theoretic argument would automatically also work in a weighted setting in which the ${n}$ summation is weighted by some non-negative weight ${\omega(n) \geq 0}$. More precisely, if one could control the weighted discrepancies

$\displaystyle \Delta(f\omega; a\ (q)) = \sum_{x \leq n \leq 2x; n=a\ (q)} f(n) \omega(n) - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x} f(n) \omega(n)$

to essentially the same accuracy as the unweighted discrepancies (1), then thanks to the trivial weighted version

$\displaystyle a_n \geq 0 \hbox{ for all } n \implies \sum_n a_n \omega(n) \geq 0$

of (2), any sieve-theoretic argument that was capable of proving (3) would also be capable of proving the weighted estimate

$\displaystyle \sum_{n \leq N} \nu(n) 1_B(n) \omega(n) > 0. \ \ \ \ \ (4)$

However, (4) may be defeated by a suitable choice of weight ${\omega}$, namely

$\displaystyle \omega(n) := \prod_{i=1}^H (1 + \lambda(n) \lambda(n+i)) \times \prod_{j=0}^H (1 - \lambda(n) \lambda(N-n-j))$

where ${n \mapsto \lambda(n)}$ is the Liouville function, which records the parity of the number of prime factors (counted with multiplicity) of a given number ${n}$. Since ${\lambda(n)^2 = 1}$, one can expand out ${\omega(n)}$ as the sum of ${1}$ and a finite number of other terms, each of which consists of the product of two or more translates (or reflections) of ${\lambda}$. But from the Möbius randomness principle (or its analogue for the Liouville function), such products of ${\lambda}$ are widely expected to be essentially orthogonal to any arithmetic function ${f(n)}$ that is arising from a single multiplicative function such as ${\Lambda}$, even on very short arithmetic progressions. As such, replacing ${1}$ by ${\omega(n)}$ in (1) should have a negligible effect on the discrepancy. On the other hand, in order for ${\omega(n)}$ to be non-zero, ${\lambda(n+i)}$ has to have the same sign as ${\lambda(n)}$, and hence the opposite sign to ${\lambda(N-n-j)}$, for all ${0 \leq i,j \leq H}$. Since ${\lambda(p) = -1}$ for every prime ${p}$, this means that ${n+i}$ and ${N-n-j}$ cannot simultaneously be prime for any ${0 \leq i,j \leq H}$, and so ${1_B(n) \omega(n)}$ vanishes identically, contradicting (4). This indirectly rules out any modification of the Goldston-Pintz-Yildirim/Zhang method for establishing the binary Goldbach conjecture with bounded error.
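The vanishing of ${1_B(n) \omega(n)}$ is a purely combinatorial fact and can be checked by brute force for small parameters. The following sketch does this; the function names and the test values ${N=200}$, ${H=4}$ are illustrative choices, and ${n}$ is restricted to the range where ${n}$ and ${N-n-j}$ are all at least ${1}$. It checks only the vanishing, not the (conjectural) equidistribution properties of ${\omega}$.

```python
def liouville(n):
    """Liouville function: (-1)^(number of prime factors of n, with multiplicity)."""
    sign, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            sign = -sign
        d += 1
    if n > 1:  # one remaining prime factor
        sign = -sign
    return sign


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def omega(n, N, H):
    """The weight omega(n) built from products of Liouville factors."""
    lam_n = liouville(n)
    w = 1
    for i in range(1, H + 1):
        w *= 1 + lam_n * liouville(n + i)
    for j in range(0, H + 1):
        w *= 1 - lam_n * liouville(N - n - j)
    return w


def in_B(n, N, H):
    """True if both [n, n+H] and [N-n-H, N-n] contain a prime."""
    left = any(is_prime(n + i) for i in range(H + 1))
    right = any(is_prime(N - n - j) for j in range(H + 1))
    return left and right


def weight_kills_B(N, H):
    """Check that 1_B(n) * omega(n) = 0 for all n with 1 <= n < N - H."""
    return all(omega(n, N, H) == 0 or not in_B(n, N, H) for n in range(1, N - H))


print(weight_kills_B(200, 4))
```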

The above argument is not watertight, and one could envisage some ways around this problem. One of them is that the Möbius randomness principle could simply be false, in which case the parity obstruction vanishes. A good example of this is the result of Heath-Brown that shows that if there are infinitely many Siegel zeroes (which is a strong violation of the Möbius randomness principle), then the twin prime conjecture holds. Another way around the obstruction is to start controlling the discrepancy (1) for functions ${f}$ that are combinations of more than one multiplicative function, e.g. ${f(n) = \Lambda(n) \Lambda(n+2)}$. However, controlling such functions looks to be at least as difficult as the twin prime conjecture (which is morally equivalent to obtaining non-trivial lower-bounds for ${\sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2)}$). A third option is not to use a sieve-theoretic argument, but to try a different method (e.g. the circle method). However, most other known methods also exhibit linearity in the “${n}$” variable and I would suspect they would be vulnerable to a similar obstruction. (In any case, the circle method specifically has some other difficulties in tackling binary problems, as discussed in this previous post.)

Let ${\bar{{\bf Q}}}$ be the algebraic closure of ${{\bf Q}}$, that is to say the field of algebraic numbers. We fix an embedding of ${\bar{{\bf Q}}}$ into ${{\bf C}}$, giving rise to a complex absolute value ${z \mapsto |z|}$ for algebraic numbers ${z \in \bar{{\bf Q}}}$.

Let ${\alpha \in \bar{{\bf Q}}}$ be of degree ${D > 1}$, so that ${\alpha}$ is irrational. A classical theorem of Liouville gives the quantitative bound

$\displaystyle |\alpha - \frac{p}{q}| \geq c \frac{1}{|q|^D} \ \ \ \ \ (1)$

on the extent to which ${\alpha}$ can be approximated by rational numbers ${p/q}$, where ${c>0}$ depends on ${\alpha,D}$ but not on ${p,q}$. Indeed, if one lets ${\alpha = \alpha_1, \alpha_2, \dots, \alpha_D}$ be the Galois conjugates of ${\alpha}$, then the quantity ${\prod_{i=1}^D |q \alpha_i - p|}$ is a non-zero natural number divided by a constant, and so we have the trivial lower bound

$\displaystyle \prod_{i=1}^D |q \alpha_i - p| \geq c$

from which the bound (1) easily follows. A well known corollary of the bound (1) is that Liouville numbers are automatically transcendental.
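As a concrete instance of this argument, take ${\alpha = \sqrt{2}}$, so that ${D=2}$ and the conjugates are ${\pm \sqrt{2}}$. Then ${|q\sqrt{2}-p| \cdot |q\sqrt{2}+p| = |2q^2-p^2| \geq 1}$, and a short calculation gives (1) with the explicit constant ${c = 1/(2\sqrt{2}+1)}$. The sketch below checks this numerically in floating point; the constant `C`, the helper `worst_ratio` and the cutoff `500` are choices made for this illustration only.

```python
import math

ALPHA = math.sqrt(2)
C = 1 / (2 * ALPHA + 1)  # explicit Liouville constant for sqrt(2), D = 2


def worst_ratio(max_q):
    """Minimum of q^2 * |sqrt(2) - p/q| over 1 <= q <= max_q.

    For each q only the nearest integer p to q*sqrt(2) needs to be tested,
    since every other p gives a larger distance.
    """
    best = float("inf")
    for q in range(1, max_q + 1):
        p = round(ALPHA * q)
        best = min(best, q * q * abs(ALPHA - p / q))
    return best


print(worst_ratio(500), ">=", C)
```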

The famous theorem of Thue, Siegel and Roth improves the bound (1) to

$\displaystyle |\alpha - \frac{p}{q}| \geq c \frac{1}{|q|^{2+\epsilon}} \ \ \ \ \ (2)$

for any ${\epsilon>0}$ and rationals ${\frac{p}{q}}$, where ${c>0}$ depends on ${\alpha,\epsilon}$ but not on ${p,q}$. Apart from the ${\epsilon}$ in the exponent and the value of the constant ${c}$, this bound is optimal, as can be seen from Dirichlet’s theorem. This theorem is a good example of the ineffectivity phenomenon that affects a large portion of modern number theory: the constant ${c}$ is known to be positive, but there is no explicit lower bound for it in terms of the coefficients of the polynomial defining ${\alpha}$ (in contrast to (1), for which an effective bound may be easily established). This is ultimately due to the reliance on the “dueling conspiracy” (or “repulsion phenomenon”) strategy. We do not as yet have a good way to rule out one counterexample to (2), in which ${\frac{p}{q}}$ is far closer to ${\alpha}$ than ${\frac{1}{|q|^{2+\epsilon}}}$; however we can rule out two such counterexamples, by playing them off of each other.

A powerful strengthening of the Thue-Siegel-Roth theorem is given by the subspace theorem, first proven by Schmidt and then generalised further by several authors. To motivate the theorem, first observe that the Thue-Siegel-Roth theorem may be rephrased as a bound of the form

$\displaystyle | \alpha p - \beta q | \times | \alpha' p - \beta' q | \geq c (1 + |p| + |q|)^{-\epsilon} \ \ \ \ \ (3)$

for any algebraic numbers ${\alpha,\beta,\alpha',\beta'}$ with ${(\alpha,\beta)}$ and ${(\alpha',\beta')}$ linearly independent (over the algebraic numbers), and any ${(p,q) \in {\bf Z}^2}$ and ${\epsilon>0}$, with the exception when ${\alpha,\beta}$ or ${\alpha',\beta'}$ are rationally dependent (i.e. one is a rational multiple of the other), in which case one has to remove some lines (i.e. subspaces in ${{\bf Q}^2}$) of rational slope from the space ${{\bf Z}^2}$ of pairs ${(p,q)}$ to which the bound (3) does not apply (namely, those lines for which the left-hand side vanishes). Here ${c>0}$ can depend on ${\alpha,\beta,\alpha',\beta',\epsilon}$ but not on ${p,q}$. More generally, we have

Theorem 1 (Schmidt subspace theorem) Let ${d}$ be a natural number. Let ${L_1,\dots,L_d: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}}$ be linearly independent linear forms. Then for any ${\epsilon>0}$, one has the bound

$\displaystyle \prod_{i=1}^d |L_i(x)| \geq c (1 + \|x\| )^{-\epsilon}$

for all ${x \in {\bf Z}^d}$, outside of a finite number of proper subspaces of ${{\bf Q}^d}$, where

$\displaystyle \| (x_1,\dots,x_d) \| := \max( |x_1|, \dots, |x_d| )$

and ${c>0}$ depends on ${\epsilon, d}$ and the coefficients of the linear forms ${L_1,\dots,L_d}$, but is independent of ${x}$.

Being a generalisation of the Thue-Siegel-Roth theorem, it is unsurprising that the known proofs of the subspace theorem are also ineffective with regards to the constant ${c}$. (However, the number of exceptional subspaces may be bounded effectively; cf. the situation with the Skolem-Mahler-Lech theorem, discussed in this previous blog post.) Once again, the lower bound here is basically sharp except for the ${\epsilon}$ factor and the implied constant: given any ${\delta_1,\dots,\delta_d > 0}$ with ${\delta_1 \dots \delta_d = 1}$, a simple volume packing argument (the same one used to prove the Dirichlet approximation theorem) shows that for any sufficiently large ${N \geq 1}$, one can find integers ${x_1,\dots,x_d \in [-N,N]}$, not all zero, such that

$\displaystyle |L_i(x)| \ll \delta_i$

for all ${i=1,\dots,d}$. Thus one can get ${\prod_{i=1}^d |L_i(x)|}$ comparable to ${1}$ in many different ways.

There are important generalisations of the subspace theorem to other number fields than the rationals (and to other valuations than the Archimedean valuation ${z \mapsto |z|}$); we will develop one such generalisation below.

The subspace theorem is one of many finiteness theorems in Diophantine geometry; in this case, it is the number of exceptional subspaces which is finite. It turns out that finiteness theorems are very compatible with the language of nonstandard analysis. (See this previous blog post for a review of the basics of nonstandard analysis, and in particular for the nonstandard interpretation of asymptotic notation such as ${\ll}$ and ${o()}$.) The reason for this is that a standard set ${X}$ is finite if and only if it contains no strictly nonstandard elements (that is to say, elements of ${{}^* X \backslash X}$). This makes for a clean formulation of finiteness theorems in the nonstandard setting. For instance, the standard form of Bezout’s theorem asserts that if ${P(x,y), Q(x,y)}$ are coprime polynomials over some field, then the curves ${\{ (x,y): P(x,y) = 0\}}$ and ${\{ (x,y): Q(x,y)=0\}}$ intersect in only finitely many points. The nonstandard version of this is then

Theorem 2 (Bezout’s theorem, nonstandard form) Let ${P(x,y), Q(x,y)}$ be standard coprime polynomials. Then there are no strictly nonstandard solutions to ${P(x,y)=Q(x,y)=0}$.

Now we reformulate Theorem 1 in nonstandard language. We need a definition:

Definition 3 (General position) Let ${K \subset L}$ be nested fields. A point ${x = (x_1,\dots,x_d)}$ in ${L^d}$ is said to be in ${K}$-general position if it is not contained in any hyperplane of ${L^d}$ definable over ${K}$, or equivalently if one has

$\displaystyle a_1 x_1 + \dots + a_d x_d = 0 \iff a_1=\dots = a_d = 0$

for any ${a_1,\dots,a_d \in K}$.

Theorem 4 (Schmidt subspace theorem, nonstandard version) Let ${d}$ be a standard natural number. Let ${L_1,\dots,L_d: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}}$ be linearly independent standard linear forms. Let ${x \in {}^* {\bf Z}^d}$ be a tuple of nonstandard integers which is in ${{\bf Q}}$-general position (in particular, this forces ${x}$ to be strictly nonstandard). Then one has

$\displaystyle \prod_{i=1}^d |L_i(x)| \gg \|x\|^{-o(1)},$

where we extend ${L_i}$ from ${\bar{{\bf Q}}}$ to ${{}^* \bar{{\bf Q}}}$ (and also similarly extend ${\| \|}$ from ${{\bf Z}^d}$ to ${{}^* {\bf Z}^d}$) in the usual fashion.

Observe that (as is usual when translating to nonstandard analysis) some of the epsilons and quantifiers that are present in the standard version become hidden in the nonstandard framework, being moved inside concepts such as “strictly nonstandard” or “general position”. We remark that as ${x}$ is in ${{\bf Q}}$-general position, it is also in ${\bar{{\bf Q}}}$-general position (as an easy Galois-theoretic argument shows), and the requirement that the ${L_1,\dots,L_d}$ are linearly independent is thus equivalent to ${L_1(x),\dots,L_d(x)}$ being ${\bar{{\bf Q}}}$-linearly independent.

Exercise 1 Verify that Theorem 1 and Theorem 4 are equivalent. (Hint: there are only countably many proper subspaces of ${{\bf Q}^d}$.)

We will not prove the subspace theorem here, but instead focus on a particular application of the subspace theorem, namely to counting integer points on curves. In this paper of Corvaja and Zannier, the subspace theorem was used to give a new proof of the following basic result of Siegel:

Theorem 5 (Siegel’s theorem on integer points) Let ${P \in {\bf Q}[x,y]}$ be an irreducible polynomial of two variables, such that the affine plane curve ${C := \{ (x,y): P(x,y)=0\}}$ either has genus at least one, or has at least three points on the line at infinity, or both. Then ${C}$ has only finitely many integer points ${(x,y) \in {\bf Z}^2}$.

This is a finiteness theorem, and as such may be easily converted to a nonstandard form:

Theorem 6 (Siegel’s theorem, nonstandard form) Let ${P \in {\bf Q}[x,y]}$ be a standard irreducible polynomial of two variables, such that the affine plane curve ${C := \{ (x,y): P(x,y)=0\}}$ either has genus at least one, or has at least three points on the line at infinity, or both. Then ${C}$ does not contain any strictly nonstandard integer points ${(x_*,y_*) \in {}^* {\bf Z}^2 \backslash {\bf Z}^2}$.

Note that Siegel’s theorem can fail for genus zero curves that meet the line at infinity at only one or two points; the key examples here are the graphs ${\{ (x,y): y - f(x) = 0\}}$ for a polynomial ${f \in {\bf Z}[x]}$, and the Pell equation curves ${\{ (x,y): x^2 - dy^2 = 1 \}}$. Siegel’s theorem can be compared with the more difficult theorem of Faltings, which establishes finiteness of rational points (not just integer points), but now needs the stricter requirement that the curve ${C}$ has genus at least two (to avoid the additional counterexample of elliptic curves of positive rank, which have infinitely many rational points).

The standard proofs of Siegel’s theorem rely on a combination of the Thue-Siegel-Roth theorem and a number of results on abelian varieties (notably the Mordell-Weil theorem). The Corvaja-Zannier argument rebalances the difficulty of the argument by replacing the Thue-Siegel-Roth theorem by the more powerful subspace theorem (in fact, they need one of the stronger versions of this theorem alluded to earlier), while greatly reducing the reliance on results on abelian varieties. Indeed, for curves with three or more points at infinity, no theory from abelian varieties is needed at all, while for the remaining cases, one mainly needs the existence of the Abel-Jacobi embedding, together with a relatively elementary theorem of Chevalley-Weil which is used in the proof of the Mordell-Weil theorem, but is significantly easier to prove.

The Corvaja-Zannier argument (together with several further applications of the subspace theorem) is presented nicely in this Bourbaki expose of Bilu. To establish the theorem in full generality requires a certain amount of algebraic number theory machinery, such as the theory of valuations on number fields, or of relative discriminants between such number fields. However, the basic ideas can be presented without much of this machinery by focusing on simple special cases of Siegel’s theorem. For instance, we can handle irreducible cubics that meet the line at infinity at exactly three points ${[1,\alpha_1,0], [1,\alpha_2,0], [1,\alpha_3,0]}$:

Theorem 7 (Siegel’s theorem with three points at infinity) Siegel’s theorem holds when the irreducible polynomial ${P(x,y)}$ takes the form

$\displaystyle P(x,y) = (y - \alpha_1 x) (y - \alpha_2 x) (y - \alpha_3 x) + Q(x,y)$

for some quadratic polynomial ${Q \in {\bf Q}[x,y]}$ and some distinct algebraic numbers ${\alpha_1,\alpha_2,\alpha_3}$.

Proof: We use the nonstandard formalism. Suppose for sake of contradiction that we can find a strictly nonstandard integer point ${(x_*,y_*) \in {}^* {\bf Z}^2 \backslash {\bf Z}^2}$ on a curve ${C := \{ (x,y): P(x,y)=0\}}$ of the indicated form. As this point is infinitesimally close to the line at infinity, ${y_*/x_*}$ must be infinitesimally close to one of ${\alpha_1,\alpha_2,\alpha_3}$; without loss of generality we may assume that ${y_*/x_*}$ is infinitesimally close to ${\alpha_1}$.

We now use a version of the polynomial method, to find some polynomials of controlled degree that vanish to high order on the “arm” of the cubic curve ${C}$ that asymptotes to ${[1,\alpha_1,0]}$. More precisely, let ${D \geq 3}$ be a large integer (actually ${D=3}$ will already suffice here), and consider the ${\bar{{\bf Q}}}$-vector space ${V}$ of polynomials ${R(x,y) \in \bar{{\bf Q}}[x,y]}$ of degree at most ${D}$, and of degree at most ${2}$ in the ${y}$ variable; this space has dimension ${3D}$. Also, as one traverses the arm ${y/x \rightarrow \alpha_1}$ of ${C}$, any polynomial ${R}$ in ${V}$ grows at a rate of at most ${D}$, that is to say ${R}$ has a pole of order at most ${D}$ at the point at infinity ${[1,\alpha_1,0]}$. By performing Laurent expansions around this point (which is a non-singular point of ${C}$, as the ${\alpha_i}$ are assumed to be distinct), we may thus find a basis ${R_1, \dots, R_{3D}}$ of ${V}$, with the property that ${R_j}$ has a pole of order at most ${D+1-j}$ at ${[1,\alpha_1,0]}$ for each ${j=1,\dots,3D}$.

From the control of the pole at ${[1,\alpha_1,0]}$, we have

$\displaystyle |R_j(x_*,y_*)| \ll (|x_*|+|y_*|)^{D+1-j}$

for all ${j=1,\dots,3D}$. The exponents here become negative for ${j > D+1}$, and on multiplying them all together we see that

$\displaystyle \prod_{j=1}^{3D} |R_j(x_*,y_*)| \ll (|x_*|+|y_*|)^{3D(D+1) - \frac{3D(3D+1)}{2}}.$

This exponent is negative for ${D}$ large enough (or just take ${D=3}$). If we expand

$\displaystyle R_j(x_*,y_*) = \sum_{a+b \leq D; b \leq 2} \alpha_{j,a,b} x_*^a y_*^b$

for some algebraic numbers ${\alpha_{j,a,b}}$, then we thus have

$\displaystyle \prod_{j=1}^{3D} |\sum_{a+b \leq D; b \leq 2} \alpha_{j,a,b} x_*^a y_*^b| \ll (|x_*|+|y_*|)^{-\epsilon}$

for some standard ${\epsilon>0}$. Note that the ${3D}$-dimensional vectors ${(\alpha_{j,a,b})_{a+b \leq D; b \leq 2}}$ are linearly independent in ${{\bf C}^{3D}}$, because the ${R_j}$ are linearly independent in ${V}$. Applying the Schmidt subspace theorem in the contrapositive, we conclude that the ${3D}$-tuple ${( x_*^a y_*^b )_{a+b \leq D; b \leq 2} \in {}^* {\bf Z}^{3D}}$ is not in ${{\bf Q}}$-general position. That is to say, one has a non-trivial constraint of the form

$\displaystyle \sum_{a+b \leq D; b \leq 2} c_{a,b} x_*^a y_*^b = 0 \ \ \ \ \ (4)$

for some standard rational coefficients ${c_{a,b}}$, not all zero. But, as ${P}$ is irreducible and cubic in ${y}$, it has no common factor with the standard polynomial ${\sum_{a+b \leq D; b \leq 2} c_{a,b} x^a y^b}$, so by Bezout’s theorem (Theorem 2) the constraint (4) only has standard solutions, contradicting the strictly nonstandard nature of ${(x_*,y_*)}$. $\Box$
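The two pieces of bookkeeping in this proof, namely that ${V}$ has dimension ${3D}$ and that the total exponent ${3D(D+1) - \frac{3D(3D+1)}{2}}$ is negative, are easy to confirm mechanically. The sketch below does so for a few values of ${D}$; the function names are chosen for this illustration only.

```python
def monomials(D):
    """Exponent pairs (a, b) with a + b <= D and b <= 2, indexing a basis of V."""
    return [(a, b) for b in range(3) for a in range(D - b + 1)]


def total_exponent(D):
    """Sum of the pole-order bounds D + 1 - j over j = 1, ..., 3D."""
    return sum(D + 1 - j for j in range(1, 3 * D + 1))


for D in range(3, 8):
    # dim V = (D+1) + D + (D-1) = 3D
    assert len(monomials(D)) == 3 * D
    # closed form from the proof; 3D(3D+1) is always even
    assert total_exponent(D) == 3 * D * (D + 1) - 3 * D * (3 * D + 1) // 2
    print(D, len(monomials(D)), total_exponent(D))
```

The closed form simplifies to ${-3D(D-1)/2}$, which makes the sign of the exponent evident.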

Exercise 2 Rewrite the above argument so that it makes no reference to nonstandard analysis. (In this case, the rewriting is quite straightforward; however, there will be a subsequent argument in which the standard version is significantly messier than the nonstandard counterpart, which is the reason why I am working with the nonstandard formalism in this blog post.)

A similar argument works for higher degree curves that meet the line at infinity in three or more points, though if the curve has singularities at infinity then it becomes convenient to rely on the Riemann-Roch theorem to control the dimension of the analogue of the space ${V}$. Note that when there are only two or fewer points at infinity, though, one cannot get the negative exponent of ${-\epsilon}$ needed to usefully apply the subspace theorem. To deal with this case we require some additional tricks. For simplicity we focus on the case of Mordell curves, although it will be convenient to work with more general number fields ${{\bf Q} \subset K \subset \bar{{\bf Q}}}$ than the rationals:

Theorem 8 (Siegel’s theorem for Mordell curves) Let ${k}$ be a non-zero integer. Then there are only finitely many integer solutions ${(x,y) \in {\bf Z}^2}$ to ${y^2 - x^3 = k}$. More generally, for any number field ${K}$, and any nonzero ${k \in K}$, there are only finitely many algebraic integer solutions ${(x,y) \in {\mathcal O}_K^2}$ to ${y^2-x^3=k}$, where ${{\mathcal O}_K}$ is the ring of algebraic integers in ${K}$.

Again, we will establish the nonstandard version. We need some additional notation:

Definition 9

• We define an almost rational integer to be a nonstandard ${x \in {}^* {\bf Q}}$ such that ${Mx \in {}^* {\bf Z}}$ for some standard positive integer ${M}$, and write ${{\bf Q} {}^* {\bf Z}}$ for the ${{\bf Q}}$-algebra of almost rational integers.
• If ${K}$ is a standard number field, we define an almost ${K}$-integer to be a nonstandard ${x \in {}^* K}$ such that ${Mx \in {}^* {\mathcal O}_K}$ for some standard positive integer ${M}$, and write ${K {}^* {\bf Z} = K {}^* {\mathcal O}_K}$ for the ${K}$-algebra of almost ${K}$-integers.
• We define an almost algebraic integer to be a nonstandard ${x \in {}^* \bar{{\bf Q}}}$ such that ${Mx}$ is a nonstandard algebraic integer for some standard positive integer ${M}$, and write ${\bar{{\bf Q}} {}^* {\bf Z}}$ for the ${\bar{{\bf Q}}}$-algebra of almost algebraic integers.

Theorem 10 (Siegel for Mordell, nonstandard version) Let ${k}$ be a non-zero standard algebraic number. Then the curve ${\{ (x,y): y^2 - x^3 = k \}}$ does not contain any strictly nonstandard almost algebraic integer point.

Another way of phrasing this theorem is that if ${x,y}$ are strictly nonstandard almost algebraic integers, then ${y^2-x^3}$ is either strictly nonstandard or zero.

Exercise 3 Verify that Theorem 8 and Theorem 10 are equivalent.

Due to all the ineffectivity, our proof does not supply any bound on the solutions ${x,y}$ in terms of ${k}$, even if one removes all references to nonstandard analysis. It is a conjecture of Hall (a special case of the notorious ABC conjecture) that one has the bound ${|x| \ll_\epsilon |k|^{2+\epsilon}}$ for all ${\epsilon>0}$ (or equivalently ${|y| \ll_\epsilon |k|^{3+\epsilon}}$), but even the weaker conjecture that ${x,y}$ are of polynomial size in ${k}$ is open. (The best known bounds are of exponential nature, and are proven using a version of Baker’s method: see for instance this text of Sprindzuk.)
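For small ${k}$ one can at least list the integer points in a bounded range by brute force. The sketch below does this for the rational integer case; the function name `mordell_points`, the sample value ${k=17}$ and the search bound are choices made for this illustration. Such a search cannot certify that the list is complete: that is exactly the effectivity issue discussed above.

```python
import math


def mordell_points(k, x_max):
    """Integer points (x, y) with y >= 0 on y^2 - x^3 = k and x <= x_max."""
    points = []
    # For x <= -(|k| + 1) one has x^3 + k < 0, so no smaller x can occur.
    for x in range(-(abs(k) + 1), x_max + 1):
        rhs = x ** 3 + k
        if rhs < 0:
            continue
        y = math.isqrt(rhs)  # exact integer square root
        if y * y == rhs:
            points.append((x, y))
    return points


for x, y in mordell_points(17, 6000):
    print(x, y, y * y - x ** 3)
```

Each point with ${y > 0}$ also gives the reflected point ${(x,-y)}$.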

A direct repetition of the arguments used to prove Theorem 7 will not work here, because the Mordell curve ${\{ (x,y): y^2 - x^3 = k \}}$ only hits the line at infinity at one point, ${[0,1,0]}$. To get around this we will exploit the fact that the Mordell curve is an elliptic curve and thus has a group law on it. We will then divide all the integer points on this curve by two; as elliptic curves have four 2-torsion points, this will end up placing us in a situation like Theorem 7, with four points at infinity. There is an obstruction, though: it is not obvious that dividing an integer point on the Mordell curve by two will produce another integer point. However, this is essentially true (after enlarging the ring of integers slightly) thanks to a general principle of Chevalley and Weil, which can be worked out explicitly in the case of division by two on Mordell curves by relatively elementary means (relying mostly on unique factorisation of ideals of algebraic integers). We give the details below the fold.

Let ${V}$ be a quasiprojective variety defined over a finite field ${{\bf F}_q}$, thus for instance ${V}$ could be an affine variety

$\displaystyle V = \{ x \in {\bf A}^d: P_1(x) = \dots = P_m(x) = 0\} \ \ \ \ \ (1)$

where ${{\bf A}^d}$ is ${d}$-dimensional affine space and ${P_1,\dots,P_m: {\bf A}^d \rightarrow {\bf A}}$ are a finite collection of polynomials with coefficients in ${{\bf F}_q}$. Then one can define the set ${V[{\bf F}_q]}$ of ${{\bf F}_q}$-rational points, and more generally the set ${V[{\bf F}_{q^n}]}$ of ${{\bf F}_{q^n}}$-rational points for any ${n \geq 1}$, since ${{\bf F}_{q^n}}$ can be viewed as a field extension of ${{\bf F}_q}$. Thus for instance in the affine case (1) we have

$\displaystyle V[{\bf F}_{q^n}] := \{ x \in {\bf F}_{q^n}^d: P_1(x) = \dots = P_m(x) = 0\}.$

The Weil conjectures are concerned with understanding the number

$\displaystyle S_n := |V[{\bf F}_{q^n}]| \ \ \ \ \ (2)$

of ${{\bf F}_{q^n}}$-rational points over a variety ${V}$. The first of these conjectures was proven by Dwork, and can be phrased as follows.

Theorem 1 (Rationality of the zeta function) Let ${V}$ be a quasiprojective variety defined over a finite field ${{\bf F}_q}$, and let ${S_n}$ be given by (2). Then there exist a finite number of algebraic integers ${\alpha_1,\dots,\alpha_k, \beta_1,\dots,\beta_{k'} \in O_{\overline{{\bf Q}}}}$ (known as characteristic values of ${V}$), such that

$\displaystyle S_n = \alpha_1^n + \dots + \alpha_k^n - \beta_1^n - \dots - \beta_{k'}^n$

for all ${n \geq 1}$.

After cancelling, we may of course assume that ${\alpha_i \neq \beta_j}$ for any ${i=1,\dots,k}$ and ${j=1,\dots,k'}$, and then it is not hard to show (as we will see below) that the ${\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'}}$ become uniquely determined up to permutations of the ${\alpha_1,\dots,\alpha_k}$ and ${\beta_1,\dots,\beta_{k'}}$. These values are known as the characteristic values of ${V}$. Since ${S_n}$ is a rational integer (i.e. an element of ${{\bf Z}}$) rather than merely an algebraic integer (i.e. an element of the ring of integers ${O_{\overline{{\bf Q}}}}$ of the algebraic closure ${\overline{{\bf Q}}}$ of ${{\bf Q}}$), we conclude from the above-mentioned uniqueness that the set of characteristic values is invariant with respect to the Galois group ${Gal(\overline{{\bf Q}} / {\bf Q} )}$. To emphasise this Galois invariance, we will not fix a specific embedding ${\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}$ of the algebraic numbers into the complex field ${{\bf C} = {\bf C}_\infty}$, but work with all such embeddings simultaneously. (Thus, for instance, ${\overline{{\bf Q}}}$ contains three cube roots of ${2}$, but which of these is assigned to the complex numbers ${2^{1/3}}$, ${e^{2\pi i/3} 2^{1/3}}$, ${e^{4\pi i/3} 2^{1/3}}$ will depend on the choice of embedding ${\iota_\infty}$.)

An equivalent way of phrasing Dwork’s theorem is that the (${T}$-form of the) zeta function

$\displaystyle \zeta_V(T) := \exp( \sum_{n=1}^\infty \frac{S_n}{n} T^n )$

associated to ${V}$ (which is well defined as a formal power series in ${T}$, at least) is equal to a rational function of ${T}$ (with the ${\alpha_1,\dots,\alpha_k}$ and ${\beta_1,\dots,\beta_{k'}}$ being the reciprocals of the poles and zeroes of ${\zeta_V}$ respectively). Here, we use the formal exponential

$\displaystyle \exp(X) := 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \dots.$

Equivalently, the (${s}$-form of the) zeta-function ${s \mapsto \zeta_V(q^{-s})}$ is a meromorphic function on the complex numbers ${{\bf C}}$ which is also periodic with period ${2\pi i/\log q}$, and which has only finitely many poles and zeroes up to this periodicity.
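As a sanity check on the equivalence between the formula for ${S_n}$ and the rationality of ${\zeta_V}$, here is a small Python sketch using exact rational arithmetic. The test case is my own choice rather than anything from the discussion above: the punctured affine line ${\{ (x,y): xy = 1\}}$, for which ${S_n = q^n - 1}$, so that one expects ${\zeta_V(T) = (1-T)/(1-qT)}$ with ${\alpha_1 = q}$ and ${\beta_1 = 1}$. The helper names `zeta_coeffs` and `rational_coeffs` are hypothetical.

```python
from fractions import Fraction

def zeta_coeffs(S, N):
    """Coefficients c_0..c_N of exp(sum_{n>=1} S(n)/n T^n) as exact fractions.

    Uses Z' = L' Z for Z = exp(L), i.e. m c_m = sum_{n=1}^m S(n) c_{m-n}.
    """
    c = [Fraction(1)]
    for m in range(1, N + 1):
        c.append(sum(S(n) * c[m - n] for n in range(1, m + 1)) / m)
    return c

def rational_coeffs(alphas, betas, N):
    """Power series of prod(1 - beta T) / prod(1 - alpha T) up to T^N."""
    series = [Fraction(1)] + [Fraction(0)] * N
    for b in betas:    # multiply by (1 - b T)
        series = [series[i] - (b * series[i - 1] if i else 0) for i in range(N + 1)]
    for a in alphas:   # divide by (1 - a T): new[i] = old[i] + a * new[i-1]
        for i in range(1, N + 1):
            series[i] += a * series[i - 1]
    return series

q = 5  # arbitrary illustrative field size
lhs = zeta_coeffs(lambda n: q ** n - 1, 6)
rhs = rational_coeffs([q], [1], 6)
print(lhs == rhs)
```

The same two functions can be pointed at any sequence ${S_n}$ of the shape in Theorem 1 to compare the exponential series against the corresponding rational function.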

Dwork’s argument relies primarily on ${p}$-adic analysis – an analogue of complex analysis, but over an algebraically complete (and metrically complete) extension ${{\bf C}_p}$ of the ${p}$-adic field ${{\bf Q}_p}$, rather than over the Archimedean complex numbers ${{\bf C}}$. The argument is quite effective, and in particular gives explicit upper bounds for the number ${k+k'}$ of characteristic values in terms of the complexity of the variety ${V}$; for instance, in the affine case (1) with ${V}$ of degree ${D}$, Bombieri used Dwork’s methods (in combination with Deligne’s theorem below) to obtain the bound ${k+k' \leq (4D+9)^{2d+1}}$, and a subsequent paper of Hooley established the slightly weaker bound ${k+k' \leq (11D+11)^{d+m+2}}$ purely from Dwork’s methods (a similar bound had also been pointed out in unpublished work of Dwork). In particular, one has bounds that are uniform in the field ${{\bf F}_q}$, which is an important fact for many analytic number theory applications.

These ${p}$-adic arguments stand in contrast with Deligne’s resolution of the last (and deepest) of the Weil conjectures:

Theorem 2 (Riemann hypothesis) Let ${V}$ be a quasiprojective variety defined over a finite field ${{\bf F}_q}$, and let ${\lambda \in \overline{{\bf Q}}}$ be a characteristic value of ${V}$. Then there exists a natural number ${w}$ such that ${|\iota_\infty(\lambda)|_\infty = q^{w/2}}$ for every embedding ${\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}$, where ${| |_\infty}$ denotes the usual absolute value on the complex numbers ${{\bf C} = {\bf C}_\infty}$. (Informally: ${\lambda}$ and all of its Galois conjugates have complex magnitude ${q^{w/2}}$.)

To put it another way that closely resembles the classical Riemann hypothesis, all the zeroes and poles of the ${s}$-form ${s \mapsto \zeta_V(q^{-s})}$ lie on the critical lines ${\{ s \in {\bf C}: \hbox{Re}(s) = \frac{w}{2} \}}$ for ${w=0,1,2,\dots}$. (See this previous blog post for further comparison of various instantiations of the Riemann hypothesis.) Whereas Dwork uses ${p}$-adic analysis, Deligne uses the essentially orthogonal technique of ell-adic cohomology to establish his theorem. However, ell-adic methods can be used (via the Grothendieck-Lefschetz trace formula) to establish rationality, and conversely, in this paper of Kedlaya p-adic methods are used to establish the Riemann hypothesis. As pointed out by Kedlaya, the ell-adic methods are tied to the intrinsic geometry of ${V}$ (such as the structure of sheaves and covers over ${V}$), while the ${p}$-adic methods are more tied to the extrinsic geometry of ${V}$ (how ${V}$ sits inside its ambient affine or projective space).

In this post, I would like to record my notes on Dwork’s proof of Theorem 1, drawing heavily on the expositions of Serre, Hooley, Koblitz, and others.

The basic strategy is to control the rational integers ${S_n}$ both in an “Archimedean” sense (embedding the rational integers inside the complex numbers ${{\bf C}_\infty}$ with the usual norm ${||_\infty}$) as well as in the “${p}$-adic” sense, with ${p}$ the characteristic of ${{\bf F}_q}$ (embedding the integers now in the “complexification” ${{\bf C}_p}$ of the ${p}$-adic numbers ${{\bf Q}_p}$, which is equipped with a norm ${||_p}$ that we will recall later). (This is in contrast to the methods of ell-adic cohomology, in which one primarily works over an ${\ell}$-adic field ${{\bf Q}_\ell}$ with ${\ell \neq p,\infty}$.) The Archimedean control is trivial:

Proposition 3 (Archimedean control of ${S_n}$) With ${S_n}$ as above, and any embedding ${\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}$, we have

$\displaystyle |\iota_\infty(S_n)|_\infty \leq C q^{A n}$

for all ${n}$ and some ${C, A >0}$ independent of ${n}$.

Proof: Since ${S_n}$ is a rational integer, ${|\iota_\infty(S_n)|_\infty}$ is just ${|S_n|_\infty}$. By decomposing ${V}$ into affine pieces, we may assume that ${V}$ is of the affine form (1); we then trivially have ${|S_n|_\infty \leq q^{nd}}$, and the claim follows. $\Box$

Another way of thinking about this Archimedean control is that it guarantees that the zeta function ${T \mapsto \zeta_V(T)}$ can be defined holomorphically on the open disk in ${{\bf C}_\infty}$ of radius ${q^{-A}}$ centred at the origin.

The ${p}$-adic control is significantly more difficult, and is the main component of Dwork’s argument:

Proposition 4 (${p}$-adic control of ${S_n}$) With ${S_n}$ as above, and using an embedding ${\iota_p: \overline{{\bf Q}} \rightarrow {\bf C}_p}$ (defined later) with ${p}$ the characteristic of ${{\bf F}_q}$, we can find for any real ${A > 0}$ a finite number of elements ${\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'} \in {\bf C}_p}$ such that

$\displaystyle |\iota_p(S_n) - (\alpha_1^n + \dots + \alpha_k^n - \beta_1^n - \dots - \beta_{k'}^n)|_p \leq C q^{-An}$

for all ${n}$ and some ${C>0}$.

Another way of thinking about this ${p}$-adic control is that it guarantees that the zeta function ${T \mapsto \zeta_V(T)}$ can be defined meromorphically on the entire ${p}$-adic complex field ${{\bf C}_p}$.

Proposition 4 is ostensibly much weaker than Theorem 1 because of (a) the error term of ${p}$-adic magnitude at most ${Cq^{-An}}$; (b) the fact that the number ${k+k'}$ of potential characteristic values here may go to infinity as ${A \rightarrow \infty}$; and (c) the potential characteristic values ${\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'}}$ only exist inside the complexified ${p}$-adics ${{\bf C}_p}$, rather than in the algebraic integers ${O_{\overline{{\bf Q}}}}$. However, it turns out that by combining ${p}$-adic control on ${S_n}$ in Proposition 4 with the trivial control on ${S_n}$ in Proposition 3, one can obtain Theorem 1 by an elementary argument that does not use any further properties of ${S_n}$ (other than the obvious fact that the ${S_n}$ are rational integers), with the ${A}$ in Proposition 4 chosen to exceed the ${A}$ in Proposition 3. We give this argument (essentially due to Borel) below the fold.

The proof of Proposition 4 can be split into two pieces. The first piece, which can be viewed as the number-theoretic component of the proof, uses external descriptions of ${V}$ such as (1) to obtain the following decomposition of ${S_n}$:

Proposition 5 (Decomposition of ${S_n}$) With ${\iota_p}$ and ${S_n}$ as above, we can decompose ${\iota_p(S_n)}$ as a finite linear combination (over the integers) of sequences ${S'_n \in {\bf C}_p}$, such that for each such sequence ${n \mapsto S'_n}$, the zeta function

$\displaystyle \zeta'(T) := \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n ) = \sum_{n=0}^\infty c_n T^n$

is entire in ${{\bf C}_p}$, by which we mean that

$\displaystyle |c_n|_p^{1/n} \rightarrow 0$

as ${n \rightarrow \infty}$.

This proposition will ultimately be a consequence of the properties of the Teichmüller lifting ${\tau: \overline{{\bf F}_p}^\times \rightarrow {\bf C}_p^\times}$.

The second piece, which can be viewed as the “${p}$-adic complex analytic” component of the proof, relates the ${p}$-adic entire nature of a zeta function with control on the associated sequence ${S'_n}$, and can be interpreted (after some manipulation) as a ${p}$-adic version of the Weierstrass preparation theorem:

Proposition 6 (${p}$-adic Weierstrass preparation theorem) Let ${S'_n}$ be a sequence in ${{\bf C}_p}$, such that the zeta function

$\displaystyle \zeta'(T) := \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n )$

is entire in ${{\bf C}_p}$. Then for any real ${A > 0}$, there exist a finite number of elements ${\beta_1,\dots,\beta_{k'} \in {\bf C}_p}$ such that

$\displaystyle |S'_n + \beta_1^n + \dots + \beta_{k'}^n|_p \leq C q^{-An}$

for all ${n}$ and some ${C>0}$.

Clearly, the combination of Proposition 5 and Proposition 6 (and the non-Archimedean nature of the ${||_p}$ norm) implies Proposition 4.

Let ${{\bf F}_q}$ be a finite field of order ${q = p^n}$, and let ${C}$ be an absolutely irreducible smooth projective curve defined over ${{\bf F}_q}$ (and hence over the algebraic closure ${k := \overline{{\bf F}_q}}$ of that field). For instance, ${C}$ could be the projective elliptic curve

$\displaystyle C = \{ [x,y,z]: y^2 z = x^3 + ax z^2 + b z^3 \}$

in the projective plane ${{\bf P}^2 = \{ [x,y,z]: (x,y,z) \neq (0,0,0) \}}$, where ${a,b \in {\bf F}_q}$ are coefficients whose discriminant ${-16(4a^3+27b^2)}$ is non-vanishing, which is the projective version of the affine elliptic curve

$\displaystyle \{ (x,y): y^2 = x^3 + ax + b \}.$

To each such curve ${C}$ one can associate a genus ${g}$, which we will define later; for instance, elliptic curves have genus ${1}$. We can also count the cardinality ${|C({\bf F}_q)|}$ of the set ${C({\bf F}_q)}$ of ${{\bf F}_q}$-points of ${C}$. The Hasse-Weil bound relates the two:

Theorem 1 (Hasse-Weil bound) ${||C({\bf F}_q)| - q - 1| \leq 2g\sqrt{q}}$.
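For elliptic curves over a prime field the bound is easy to test by brute force. The following Python sketch is purely illustrative (the names `count_projective_points` and `satisfies_hasse_weil`, and the sample curve ${y^2 = x^3 + 1}$ over ${{\bf F}_7}$, are my own choices): it counts the affine solutions, adds the single point at infinity ${[0,1,0]}$, and checks the inequality in squared form to avoid floating point.

```python
def count_projective_points(a, b, p):
    """Number of F_p-points of y^2 z = x^3 + a x z^2 + b z^3, p an odd prime."""
    # roots[r] = number of y in F_p with y^2 = r
    roots = [0] * p
    for y in range(p):
        roots[y * y % p] += 1
    affine = sum(roots[(x ** 3 + a * x + b) % p] for x in range(p))
    return affine + 1  # the single point at infinity [0, 1, 0]

def satisfies_hasse_weil(n_points, q, g):
    # |N - q - 1| <= 2 g sqrt(q), squared to stay in integer arithmetic
    return (n_points - q - 1) ** 2 <= 4 * g * g * q

n = count_projective_points(0, 1, 7)   # y^2 = x^3 + 1 over F_7 (assumed example)
print(n, satisfies_hasse_weil(n, 7, 1))
```

This only checks the statement in small cases, of course; it says nothing about why the bound holds.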

The usual proofs of this bound proceed by first establishing a trace formula of the form

$\displaystyle |C({\bf F}_{p^n})| = p^n - \sum_{i=1}^{2g} \alpha_i^n + 1 \ \ \ \ \ (1)$

for some complex numbers ${\alpha_1,\dots,\alpha_{2g}}$ independent of ${n}$; this is in fact a special case of the Lefschetz-Grothendieck trace formula, and can be interpreted as an assertion that the zeta function associated to the curve ${C}$ is rational. The task is then to establish a bound ${|\alpha_i| \leq \sqrt{p}}$ for all ${i=1,\dots,2g}$; this (or more precisely, the slightly stronger assertion ${|\alpha_i| = \sqrt{p}}$) is the Riemann hypothesis for such curves. This can be done either by passing to the Jacobian variety of ${C}$ and using a certain duality available on the cohomology of such varieties, known as the Rosati involution; alternatively, one can pass to the product surface ${C \times C}$ and apply the Riemann-Roch theorem for that surface.

In 1969, Stepanov introduced an elementary method (a version of what is now known as the polynomial method) to count (or at least to upper bound) the quantity ${|C({\bf F}_q)|}$. The method was initially restricted to hyperelliptic curves, but was soon extended to general curves. In particular, Bombieri used this method to give a short proof of the following weaker version of the Hasse-Weil bound:

Theorem 2 (Weak Hasse-Weil bound) If ${q}$ is a perfect square, and ${q \geq (g+1)^4}$, then ${|C({\bf F}_q)| \leq q + (2g+1) \sqrt{q} + 1}$.

In fact, the bound on ${|C({\bf F}_q)|}$ can be sharpened a little bit further, as we will soon see.

Theorem 2 is only an upper bound on ${|C({\bf F}_q)|}$, but there is a Galois-theoretic trick to convert (a slight generalisation of) this upper bound to a matching lower bound, and if one then uses the trace formula (1) (and the “tensor power trick” of sending ${n}$ to infinity to control the weights ${\alpha_i}$) one can then recover the full Hasse-Weil bound. We discuss these steps below the fold.

I’ve discussed Bombieri’s proof of Theorem 2 in this previous post (in the special case of hyperelliptic curves), but now wish to present the full proof, with some minor simplifications from Bombieri’s original presentation; it is mostly elementary, with the deepest fact from algebraic geometry needed being Riemann’s inequality (a weak form of the Riemann-Roch theorem).

The first step is to reinterpret ${|C({\bf F}_q)|}$ as the number of points of intersection between two curves ${C_1,C_2}$ in the surface ${C \times C}$. Indeed, if we define the Frobenius endomorphism ${\hbox{Frob}_q}$ on any projective space by

$\displaystyle \hbox{Frob}_q( [x_0,\dots,x_n] ) := [x_0^q, \dots, x_n^q]$

then this map preserves the curve ${C}$, and the fixed points of this map are precisely the ${{\bf F}_q}$ points of ${C}$:

$\displaystyle C({\bf F}_q) = \{ z \in C: \hbox{Frob}_q(z) = z \}.$
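The fixed-point description can be seen in miniature on a single coordinate. In the Python sketch below (an illustration under my own assumptions, not part of the argument), the field ${{\bf F}_9 = {\bf F}_3[i]/(i^2+1)}$ stands in for the algebraic closure, elements are stored as pairs `(a, b)` meaning ${a + bi}$, and the helper names `mul`, `power`, `frob` are hypothetical. The fixed points of ${x \mapsto x^3}$ are exactly the elements of ${{\bf F}_3}$, and applying the map twice gives ${x \mapsto x^9}$, which fixes all of ${{\bf F}_9}$.

```python
P = 3  # characteristic; i^2 = -1 defines F_9 because -1 is a non-residue mod 3

def mul(u, v):
    """Multiply a + b i and c + d i in F_9."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def power(u, n):
    """u^n by repeated multiplication (n is tiny here)."""
    result = (1, 0)
    for _ in range(n):
        result = mul(result, u)
    return result

def frob(u, q=P):
    """The Frobenius map x -> x^q on one coordinate."""
    return power(u, q)

field = [(a, b) for a in range(P) for b in range(P)]
fixed_by_frob3 = [u for u in field if frob(u) == u]
print(fixed_by_frob3)                                   # elements with b = 0
print(all(frob(frob(u)) == frob(u, 9) == u for u in field))
```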

Thus one can interpret ${|C({\bf F}_q)|}$ as the number of points of intersection between the diagonal curve

$\displaystyle \{ (z,z): z \in C \}$

and the Frobenius graph

$\displaystyle \{ (z, \hbox{Frob}_q(z)): z \in C \}$

which are copies of ${C}$ inside ${C \times C}$. But we can use the additional hypothesis that ${q}$ is a perfect square to write this more symmetrically, by taking advantage of the fact that the Frobenius map has a square root

$\displaystyle \hbox{Frob}_q = \hbox{Frob}_{\sqrt{q}}^2$

with ${\hbox{Frob}_{\sqrt{q}}}$ also preserving ${C}$. One can then also interpret ${|C({\bf F}_q)|}$ as the number of points of intersection between the curve

$\displaystyle C_1 := \{ (z, \hbox{Frob}_{\sqrt{q}}(z)): z \in C \} \ \ \ \ \ (2)$

and its transpose

$\displaystyle C_2 := \{ (\hbox{Frob}_{\sqrt{q}}(w), w): w \in C \}.$

Let ${k(C \times C)}$ be the field of rational functions on ${C \times C}$ (with coefficients in ${k}$), and define ${k(C_1)}$, ${k(C_2)}$, and ${k(C_1 \cap C_2)}$ analogously (although ${C_1 \cap C_2}$ is likely to be disconnected, so ${k(C_1 \cap C_2)}$ will just be a ring rather than a field). We then (morally) have the commuting square

$\displaystyle \begin{array}{ccccc} && k(C \times C) && \\ & \swarrow & & \searrow & \\ k(C_1) & & & & k(C_2) \\ & \searrow & & \swarrow & \\ && k(C_1 \cap C_2) && \end{array},$

if we ignore the issue that a rational function on, say, ${C \times C}$, might blow up on all of ${C_1}$ and thus not have a well-defined restriction to ${C_1}$. We use ${\pi_1: k(C \times C) \rightarrow k(C_1)}$ and ${\pi_2: k(C \times C) \rightarrow k(C_2)}$ to denote the restriction maps. Furthermore, we have obvious isomorphisms ${\iota_1: k(C_1) \rightarrow k(C)}$, ${\iota_2: k(C_2) \rightarrow k(C)}$ coming from composing with the graphing maps ${z \mapsto (z, \hbox{Frob}_{\sqrt{q}}(z))}$ and ${w \mapsto (\hbox{Frob}_{\sqrt{q}}(w), w)}$.

The idea now is to find a rational function ${f \in k(C \times C)}$ on the surface ${C \times C}$ of controlled degree which vanishes when restricted to ${C_1}$, but is non-vanishing (and not blowing up) when restricted to ${C_2}$. On ${C_2}$, we thus get a non-zero rational function ${f \downharpoonright_{C_2}}$ of controlled degree which vanishes on ${C_1 \cap C_2}$ – which then lets us bound the cardinality of ${C_1 \cap C_2}$ in terms of the degree of ${f \downharpoonright_{C_2}}$. (In Bombieri’s original argument, one required vanishing to high order on the ${C_1}$ side, but in our presentation, we have factored out a ${\hbox{Frob}_{\sqrt{q}}}$ term which removes this high order vanishing condition.)

To find this ${f}$, we will use linear algebra. Namely, we will locate a finite-dimensional subspace ${V}$ of ${k(C \times C)}$ (consisting of certain “controlled degree” rational functions) which projects injectively to ${k(C_2)}$, but whose projection to ${k(C_1)}$ has strictly smaller dimension than ${V}$ itself. The rank-nullity theorem then forces the existence of a non-zero element ${P}$ of ${V}$ whose projection to ${k(C_1)}$ vanishes, but whose projection to ${k(C_2)}$ is non-zero.

Now we build ${V}$. Pick a ${{\bf F}_q}$ point ${P_\infty}$ of ${C}$, which we will think of as being a point at infinity. (For the purposes of proving Theorem 2, we may clearly assume that ${C({\bf F}_q)}$ is non-empty.) Thus ${P_\infty}$ is fixed by ${\hbox{Frob}_q}$. To simplify the exposition, we will also assume that ${P_\infty}$ is fixed by the square root ${\hbox{Frob}_{\sqrt{q}}}$ of ${\hbox{Frob}_q}$; in the opposite case when ${\hbox{Frob}_{\sqrt{q}}}$ has order two when acting on ${P_\infty}$, the argument is essentially the same, but all references to ${P_\infty}$ in the second factor of ${C \times C}$ need to be replaced by ${\hbox{Frob}_{\sqrt{q}} P_\infty}$ (we leave the details to the interested reader).

For any natural number ${n}$, define ${R_n}$ to be the set of rational functions ${f \in k(C)}$ which are allowed to have a pole of order up to ${n}$ at ${P_\infty}$, but have no other poles on ${C}$; note that as we are assuming ${C}$ to be smooth, it is unambiguous what a pole is (and what order it will have). (In the fancier language of divisors and Cech cohomology, we have ${R_n = H^0( C, {\mathcal O}_C(-n P_\infty) )}$.) The space ${R_n}$ is clearly a vector space over ${k}$; one can view it intuitively as the space of “polynomials” on ${C}$ of “degree” at most ${n}$. When ${n=0}$, ${R_0}$ consists just of the constant functions. Indeed, if ${f \in R_0}$, then the image ${f(C)}$ of ${f}$ avoids ${\infty}$ and so lies in the affine line ${k = {\mathbf P}^1 \backslash \{\infty\}}$; but as ${C}$ is projective, the image ${f(C)}$ needs to be compact (hence closed) in ${{\mathbf P}^1}$, and must therefore be a point, giving the claim.

For higher ${n \geq 1}$, we have the easy relations

$\displaystyle \hbox{dim}(R_{n-1}) \leq \hbox{dim}(R_n) \leq \hbox{dim}(R_{n-1})+1. \ \ \ \ \ (3)$

The former inequality just comes from the trivial inclusion ${R_{n-1} \subset R_n}$. For the latter, observe that if two functions ${f, g}$ lie in ${R_n}$, so that they each have a pole of order at most ${n}$ at ${P_\infty}$, then some linear combination of these functions must have a pole of order at most ${n-1}$ at ${P_\infty}$; thus ${R_{n-1}}$ has codimension at most one in ${R_n}$, giving the claim.

From (3) and induction we see that each of the ${R_n}$ are finite dimensional, with the trivial upper bound

$\displaystyle \hbox{dim}(R_n) \leq n+1. \ \ \ \ \ (4)$

Riemann’s inequality complements this with the lower bound

$\displaystyle \hbox{dim}(R_n) \geq n+1-g, \ \ \ \ \ (5)$

thus one has ${\hbox{dim}(R_n) = \hbox{dim}(R_{n-1})+1}$ for all but at most ${g}$ exceptions (in fact, exactly ${g}$ exceptions as it turns out). This is a consequence of the Riemann-Roch theorem; it can be proven from abstract nonsense (the snake lemma) if one defines the genus ${g}$ in a non-standard fashion (as the dimension of the first Cech cohomology ${H^1(C)}$ of the structure sheaf ${{\mathcal O}_C}$ of ${C}$), but to obtain this inequality with a standard definition of ${g}$ (e.g. as the dimension of the zeroth Cech cohomology ${H^0(C, \Omega_C^1)}$ of the line bundle of differentials) requires the more non-trivial tool of Serre duality.

At any rate, now that we have these vector spaces ${R_n}$, we will define ${V \subset k(C \times C)}$ to be a tensor product space

$\displaystyle V = R_\ell \otimes R_m$

for some natural numbers ${\ell, m \geq 0}$ which we will optimise later. That is to say, ${V}$ is spanned by functions of the form ${(z,w) \mapsto f(z) g(w)}$ with ${f \in R_\ell}$ and ${g \in R_m}$. This is clearly a linear subspace of ${k(C \times C)}$ of dimension ${\hbox{dim}(R_\ell) \hbox{dim}(R_m)}$, and hence by Riemann’s inequality we have

$\displaystyle \hbox{dim}(V) \geq (\ell+1-g) (m+1-g) \ \ \ \ \ (6)$

if

$\displaystyle \ell,m \geq g-1. \ \ \ \ \ (7)$

Observe that ${\iota_1 \circ \pi_1}$ maps a tensor product ${(z,w) \mapsto f(z) g(w)}$ to a function ${z \mapsto f(z) g(\hbox{Frob}_{\sqrt{q}} z)}$. If ${f \in R_\ell}$ and ${g \in R_m}$, then we see that the function ${z \mapsto f(z) g(\hbox{Frob}_{\sqrt{q}} z)}$ has a pole of order at most ${\ell+m\sqrt{q}}$ at ${P_\infty}$. We conclude that

$\displaystyle \iota_1 \circ \pi_1( V ) \subset R_{\ell + m\sqrt{q}} \ \ \ \ \ (8)$

and in particular by (4)

$\displaystyle \hbox{dim}(\pi_1(V)) \leq \ell + m \sqrt{q} + 1 \ \ \ \ \ (9)$

and similarly

$\displaystyle \hbox{dim}(\pi_2(V)) \leq \ell \sqrt{q} + m + 1. \ \ \ \ \ (10)$

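We will choose ${m}$ to be a bit bigger than ${\ell}$, to make the ${\pi_2}$ image of ${V}$ smaller than that of ${\pi_1}$; thus, relative to the informal sketch above, it is now ${\pi_2}$ that will fail to be injective and ${\pi_1}$ that will be injective, which is harmless since the roles of ${C_1}$ and ${C_2}$ are symmetric. From (6), (10) we see that if we have the inequality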

$\displaystyle (\ell+1-g) (m+1-g) > \ell \sqrt{q}+m + 1 \ \ \ \ \ (11)$

(together with (7)) then ${\pi_2}$ cannot be injective.

On the other hand, we have the following basic fact:

Lemma 3 (Injectivity) If

$\displaystyle \ell < \sqrt{q}, \ \ \ \ \ (12)$

then ${\pi_1: V \rightarrow \pi_1(V)}$ is injective.

Proof: From (3), we can find a linear basis ${f_1,\dots,f_a}$ of ${R_\ell}$ such that each of the ${f_i}$ has a distinct order ${d_i}$ of pole at ${P_\infty}$ (somewhere between ${0}$ and ${\ell}$ inclusive). Similarly, we may find a linear basis ${g_1,\dots,g_b}$ of ${R_m}$ such that each of the ${g_j}$ has a distinct order ${e_j}$ of pole at ${P_\infty}$ (somewhere between ${0}$ and ${m}$ inclusive). The functions ${z \mapsto f_i(z) g_j(\hbox{Frob}_{\sqrt{q}} z)}$ then span ${\iota_1(\pi_1(V))}$, and the order of pole at ${P_\infty}$ is ${d_i + \sqrt{q} e_j}$. But since ${\ell < \sqrt{q}}$, these orders are all distinct, and so these functions must be linearly independent. The claim follows. $\Box$
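The key arithmetic point in this proof is that ${d + \sqrt{q} e}$ determines the pair ${(d,e)}$ when ${0 \leq d \leq \ell < \sqrt{q}}$, much like a base-${\sqrt{q}}$ expansion. Here is a short Python check of that fact; it assumes the worst case in which every order ${0,\dots,\ell}$ and ${0,\dots,m}$ actually occurs (in general only a subset does), and the function names are hypothetical.

```python
def pole_orders(ell, m, sqrt_q):
    """All candidate pole orders d + sqrt_q * e with 0 <= d <= ell, 0 <= e <= m."""
    return [d + sqrt_q * e for d in range(ell + 1) for e in range(m + 1)]

def all_distinct(ell, m, sqrt_q):
    orders = pole_orders(ell, m, sqrt_q)
    return len(set(orders)) == len(orders)

# Illustrative values only: sqrt_q = 5, m = 7.
print(all_distinct(4, 7, 5))   # ell < sqrt_q, condition (12) holds
print(all_distinct(5, 7, 5))   # ell = sqrt_q: (d, e) = (5, 0) and (0, 1) collide
```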

This gives us the following bound:

Proposition 4 Let ${\ell,m}$ be natural numbers such that (7), (11), (12) hold. Then ${|C({\bf F}_q)| \leq \ell + m \sqrt{q}}$.

Proof: As ${\pi_2}$ is not injective, we can find a non-zero ${f \in V}$ with ${\pi_2(f)}$ vanishing. By the above lemma, the function ${\iota_1(\pi_1(f))}$ is then non-zero, but it must also vanish on ${\iota_1(C_1 \cap C_2)}$, which has cardinality ${|C({\bf F}_q)|}$. On the other hand, by (8), ${\iota_1(\pi_1(f))}$ has a pole of order at most ${\ell+m\sqrt{q}}$ at ${P_\infty}$ and no other poles. Since a non-zero rational function on a projective curve has as many zeroes as poles (counting multiplicity), the claim follows. $\Box$

If ${q \geq (g+1)^4}$, we may make the explicit choice

$\displaystyle m := \sqrt{q}+2g; \quad \ell := \lfloor \frac{g}{g+1} \sqrt{q} \rfloor + g + 1$

and a brief calculation then gives Theorem 2. In some cases one can optimise things a bit further. For instance, in the genus zero case ${g=0}$ (e.g. if ${C}$ is just the projective line ${{\mathbf P}^1}$) one may take ${\ell=1, m = \sqrt{q}}$ and conclude the absolutely sharp bound ${|C({\bf F}_q)| \leq q+1}$ in this case; in the case of the projective line ${{\mathbf P}^1}$, the function ${f}$ is in fact the very concrete function ${f(z,w) := z - w^{\sqrt{q}}}$.
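For experimenting with the parameters, the hypotheses of Proposition 4 are easy to encode. The Python sketch below (the name `proposition4_bound` is hypothetical) simply tests (7), (11), (12) for a given ${g, \sqrt{q}, \ell, m}$ and returns the resulting bound ${\ell + m\sqrt{q}}$ when all three hold; it is illustrated on the genus zero choice ${\ell = 1}$, ${m = \sqrt{q}}$ mentioned above, with ${\sqrt{q} = 3}$ picked arbitrarily.

```python
def proposition4_bound(g, sqrt_q, ell, m):
    """Return ell + m*sqrt_q if (7), (11), (12) hold, else None."""
    cond7 = ell >= g - 1 and m >= g - 1
    cond11 = (ell + 1 - g) * (m + 1 - g) > ell * sqrt_q + m + 1
    cond12 = ell < sqrt_q
    if cond7 and cond11 and cond12:
        return ell + m * sqrt_q
    return None  # Proposition 4 does not apply to this choice

sqrt_q = 3  # so q = 9; an arbitrary illustrative value
print(proposition4_bound(0, sqrt_q, 1, sqrt_q), sqrt_q ** 2 + 1)
```

One could wrap this in a loop over ${\ell, m}$ to look for the admissible pair minimising ${\ell + m\sqrt{q}}$ for a given genus.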

Remark 1 When ${q = p^{2n+1}}$ is not a perfect square, one can try to run the above argument using the factorisation ${\hbox{Frob}_q = \hbox{Frob}_{p^n} \hbox{Frob}_{p^{n+1}}}$ instead of ${\hbox{Frob}_q = \hbox{Frob}_{\sqrt{q}} \hbox{Frob}_{\sqrt{q}}}$. This gives a weaker version of the above bound, of the shape ${|C({\bf F}_q)| \leq q + O( \sqrt{p} \sqrt{q} )}$. In the hyperelliptic case at least, one can erase this loss by working with a variant of the argument in which one requires ${f}$ to vanish to high order at ${C_1}$, rather than just to first order; see this survey article of mine for details.

This is the eighth thread for the Polymath8b project to obtain new bounds for the quantity

$\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),$

either for small values of ${m}$ (in particular ${m=1,2}$) or asymptotically as ${m \rightarrow \infty}$. The previous thread may be found here. The currently best known bounds on ${H_m}$ can be found at the wiki page.
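For readers who want a feel for the quantity, the following Python sketch computes the smallest value of ${p_{n+m} - p_n}$ among primes in a finite range. The function names, the range, and the convention of starting from ${p_n \geq 5}$ (to skip the small-prime anomalies such as ${3,5,7}$) are my own choices. No finite computation of this kind says anything about the liminf itself, which concerns gaps that recur infinitely often.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # cross out multiples of i starting from i*i
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def smallest_gap(primes, m, start=5):
    """min of p_{n+m} - p_n over consecutive primes p_n >= start in the list."""
    ps = [p for p in primes if p >= start]
    return min(ps[i + m] - ps[i] for i in range(len(ps) - m))

primes = primes_up_to(1000)
print([smallest_gap(primes, m) for m in (1, 2, 3)])
```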

The big news since the last thread is that we have managed to obtain the (sieve-theoretically) optimal bound of ${H_1 \leq 6}$ assuming the generalised Elliott-Halberstam conjecture (GEH), which pretty much closes off that part of the story. Unconditionally, our bound on ${H_1}$ is still ${H_1 \leq 270}$. This bound was obtained using the “vanilla” Maynard sieve, in which the cutoff ${F}$ was supported in the original simplex ${\{ t_1+\dots+t_k \leq 1\}}$, and only Bombieri-Vinogradov was used. In principle, we can enlarge the sieve support a little bit further now; for instance, we can enlarge to ${\{ t_1+\dots+t_k \leq \frac{k}{k-1} \}}$, but then have to shrink the J integrals to ${\{t_1+\dots+t_{k-1} \leq 1-\epsilon\}}$, provided that the marginals vanish for ${\{ t_1+\dots+t_{k-1} \geq 1+\epsilon \}}$. However, we do not yet know how to numerically work with these expanded problems.

Given the substantial progress made so far, it looks like we are close to the point where we should declare victory and write up the results (though we should take one last look to see if there is any room to improve the ${H_1 \leq 270}$ bound). There is actually a fair bit to write up:

• Improvements to the Maynard sieve (pushing beyond the simplex, the epsilon trick, and pushing beyond the cube);
• Asymptotic bounds for ${M_k}$ and hence ${H_m}$;
• Explicit bounds for ${H_m, m \geq 2}$ (using the Polymath8a results);
• ${H_1 \leq 270}$;
• ${H_1 \leq 6}$ on GEH (and parity obstructions to any further improvement).

I will try to create a skeleton outline of such a paper in the Polymath8 Dropbox folder soon. It shouldn’t be nearly as big as the Polymath8a paper, but it will still be quite sizeable.

There are multiple purposes to this blog post.

The first purpose is to announce the uploading of the paper “New equidistribution estimates of Zhang type, and bounded gaps between primes” by D.H.J. Polymath, which is the main output of the Polymath8a project on bounded gaps between primes, to the arXiv, and to describe the main results of this paper below the fold.

The second purpose is to roll over the previous thread on all remaining Polymath8a-related matters (e.g. updates on the submission status of the paper) to a fresh thread. (Discussion of the ongoing Polymath8b project is however being kept on a separate thread, to try to reduce confusion.)

The final purpose of this post is to coordinate the writing of a retrospective article on the Polymath8 experience, which has been solicited for the Newsletter of the European Mathematical Society. I suppose that this could encompass both the Polymath8a and Polymath8b projects, even though the second one is still ongoing (but I think we will soon be entering the endgame there). I think there would be two main purposes of such a retrospective article. The first one would be to tell a story about the process of conducting mathematical research, rather than just describe the outcome of such research; this is an important aspect of the subject which is given almost no attention in most mathematical writing, and it would be good to be able to capture some sense of this process while memories are still relatively fresh. The other would be to draw some tentative conclusions with regards to what the strengths and weaknesses of a Polymath project are, and how appropriate such a format would be for other mathematical problems than bounded gaps between primes. In my opinion, the bounded gaps problem had some fairly unique features that made it particularly amenable to a Polymath project, such as (a) a high level of interest amongst the mathematical community in the problem; (b) a very focused objective (“improve ${H}$!”), which naturally provided an obvious metric to measure progress; (c) the modular nature of the project, which allowed for people to focus on one aspect of the problem only, and still make contributions to the final goal; and (d) a very reasonable level of ambition (for instance, we did not attempt to prove the twin prime conjecture, which in my opinion would make a terrible Polymath project at our current level of mathematical technology). This is not an exhaustive list of helpful features of the problem; I would welcome other diagnoses of the project by other participants.

With these two objectives in mind, I propose a format for the retrospective article consisting of a brief introduction to the polymath concept in general and the polymath8 project in particular, followed by a collection of essentially independent contributions by different participants on their own experiences and thoughts. Finally we could have a conclusion section in which we make some general remarks on the polymath project (such as the remarks above). I’ve started a dropbox subfolder for this article (currently in a very skeletal outline form only), and will begin writing a section on my own experiences; other participants are of course encouraged to add their own sections (it is probably best to create separate files for these, and then input them into the main file retrospective.tex, to reduce edit conflicts). If there are participants who wish to contribute but do not currently have access to the Dropbox folder, please email me and I will try to have you added (or else you can supply your thoughts by email, or in the comments to this post; we may have a section for shorter miscellaneous comments from more casual participants, for people who don’t wish to write a lengthy essay on the subject).

As for deadlines, the EMS Newsletter would like a submitted article by mid-April in order to make the June issue, but in the worst case, it will just be held over until the issue after that.

This is the seventh thread for the Polymath8b project to obtain new bounds for the quantity

$\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),$

either for small values of ${m}$ (in particular ${m=1,2}$) or asymptotically as ${m \rightarrow \infty}$. The previous thread may be found here. The currently best known bounds on ${H_m}$ can be found at the wiki page.

The current focus is on improving the upper bound on ${H_1}$ under the assumption of the generalised Elliott-Halberstam conjecture (GEH) from ${H_1 \leq 8}$ to ${H_1 \leq 6}$. Very recently, we have been able to exploit GEH more fully, leading to a promising new expansion of the sieve support region. The problem now reduces to the following:

Problem 1 Does there exist a (not necessarily convex) polytope ${R \subset [0,2]^3}$ with quantities ${0 \leq \varepsilon_1,\varepsilon_2,\varepsilon_3 \leq 1}$, and a non-trivial square-integrable function ${F: {\bf R}^3 \rightarrow {\bf R}}$ supported on ${R}$ such that

• ${R + R \subset \{ (x,y,z) \in [0,4]^3: \min(x+y,y+z,z+x) \leq 2 \},}$
• ${\int_0^\infty F(x,y,z)\ dx = 0}$ when ${y+z \geq 1+\varepsilon_1}$;
• ${\int_0^\infty F(x,y,z)\ dy = 0}$ when ${x+z \geq 1+\varepsilon_2}$;
• ${\int_0^\infty F(x,y,z)\ dz = 0}$ when ${x+y \geq 1+\varepsilon_3}$;

and such that we have the inequality

$\displaystyle \int_{y+z \leq 1-\varepsilon_1} (\int_{\bf R} F(x,y,z)\ dx)^2\ dy dz$

$\displaystyle + \int_{z+x \leq 1-\varepsilon_2} (\int_{\bf R} F(x,y,z)\ dy)^2\ dz dx$

$\displaystyle + \int_{x+y \leq 1-\varepsilon_3} (\int_{\bf R} F(x,y,z)\ dz)^2\ dx dy$

$\displaystyle > 2 \int_R F(x,y,z)^2\ dx dy dz?$

An affirmative answer to this question will imply ${H_1 \leq 6}$ on GEH. We are “within two percent” of this claim; we cannot quite reach ${2}$ yet, but have got as far as ${1.962998}$. However, we have not yet fully optimised ${F}$ in the above problem. In particular, the simplex

$\displaystyle R = \{ (x,y,z) \in [0,2]^3: x+y+z \leq 3/2 \}$

is now available, and should lead to some noticeable improvement in the numerology.
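
As a short verification (a sketch added for convenience, not quoted from the thread) that this simplex obeys the sumset condition in Problem 1: if ${(x,y,z)}$ and ${(x',y',z')}$ lie in ${R}$, write ${(X,Y,Z) := (x+x',y+y',z+z')}$. Each coordinate lies in ${[0,3] \subset [0,4]}$ and ${X+Y+Z \leq 3}$, so bounding the minimum of the three pairwise sums by their average gives

$\displaystyle \min(X+Y,Y+Z,Z+X) \leq \frac{(X+Y)+(Y+Z)+(Z+X)}{3} = \frac{2(X+Y+Z)}{3} \leq 2,$

which is the required containment of ${R+R}$.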

There is also a very slim chance that the twin prime conjecture is now provable on GEH. It would require an affirmative solution to the following problem:

Problem 2 Does there exist a (not necessarily convex) polytope ${R \subset [0,2]^2}$ with quantities ${0 \leq \varepsilon_1,\varepsilon_2 \leq 1}$, and a non-trivial square-integrable function ${F: {\bf R}^2 \rightarrow {\bf R}}$ supported on ${R}$ such that

• ${R + R \subset \{ (x,y) \in [0,4]^2: \min(x,y) \leq 2 \}}$

$\displaystyle = [0,2] \times [0,4] \cup [0,4] \times [0,2],$

• ${\int_0^\infty F(x,y)\ dx = 0}$ when ${y \geq 1+\varepsilon_1}$;
• ${\int_0^\infty F(x,y)\ dy = 0}$ when ${x \geq 1+\varepsilon_2}$;

and such that we have the inequality

$\displaystyle \int_{y \leq 1-\varepsilon_1} (\int_{\bf R} F(x,y)\ dx)^2\ dy$

$\displaystyle + \int_{x \leq 1-\varepsilon_2} (\int_{\bf R} F(x,y)\ dy)^2\ dx$

$\displaystyle > 2 \int_R F(x,y)^2\ dx dy?$

We suspect that the answer to this question is negative, but have not formally ruled it out yet.

For the rest of this post, I will justify why positive answers to these sorts of variational problems are sufficient to get bounds on ${H_1}$ (or more generally ${H_m}$).

This is the fourth thread for the Polymath8b project to obtain new bounds for the quantity

$\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),$

either for small values of ${m}$ (in particular ${m=1,2}$) or asymptotically as ${m \rightarrow \infty}$. The previous thread may be found here. The currently best known bounds on ${H_m}$ are:

• (Maynard) Assuming the Elliott-Halberstam conjecture, ${H_1 \leq 12}$.
• (Polymath8b, tentative) ${H_1 \leq 272}$. Assuming Elliott-Halberstam, ${H_2 \leq 272}$.
• (Polymath8b, tentative) ${H_2 \leq 429{,}822}$. Assuming Elliott-Halberstam, ${H_4 \leq 493{,}408}$.
• (Polymath8b, tentative) ${H_3 \leq 26{,}682{,}014}$. (Presumably a comparable bound also holds for ${H_6}$ on Elliott-Halberstam, but this has not been computed.)
• (Polymath8b) ${H_m \leq \exp( 3.817 m )}$ for sufficiently large ${m}$. Assuming Elliott-Halberstam, ${H_m \ll m e^{2m}}$ for sufficiently large ${m}$.

While the ${H_1}$ bound on the Elliott-Halberstam conjecture has not improved since the start of the Polymath8b project, there is reason to hope that it will soon fall, hopefully to ${8}$. This is because we have begun to exploit more fully the fact that when using “multidimensional Selberg-GPY” sieves of the form

$\displaystyle \nu(n) := \sigma_{f,k}(n)^2$

with

$\displaystyle \sigma_{f,k}(n) := \sum_{d_1|n+h_1,\dots,d_k|n+h_k} \mu(d_1) \dots \mu(d_k) f( \frac{\log d_1}{\log R},\dots,\frac{\log d_k}{\log R}),$

where ${R := x^{\theta/2}}$, it is not necessary for the smooth function ${f: [0,+\infty)^k \rightarrow {\bf R}}$ to be supported on the simplex

$\displaystyle {\cal R}_k := \{ (t_1,\dots,t_k)\in [0,1]^k: t_1+\dots+t_k \leq 1\},$

but can in fact be allowed to range on larger sets. First of all, ${f}$ may instead be supported on the slightly larger polytope

$\displaystyle {\cal R}'_k := \{ (t_1,\dots,t_k)\in [0,1]^k: t_1+\dots+t_{j-1}+t_{j+1}+\dots+t_k \leq 1$

$\displaystyle \hbox{ for all } j=1,\dots,k\}.$

However, it turns out that more is true: given a sufficiently general version of the Elliott-Halberstam conjecture ${EH[\theta]}$ at the given value of ${\theta}$, one may work with functions ${f}$ supported on more general domains ${R}$, so long as the sumset ${R+R := \{ t+t': t,t'\in R\}}$ is contained in the non-convex region

$\displaystyle \bigcup_{j=1}^k \{ (t_1,\dots,t_k)\in [0,\frac{2}{\theta}]^k: t_1+\dots+t_{j-1}+t_{j+1}+\dots+t_k \leq 2 \} \cup \frac{2}{\theta} \cdot {\cal R}_k, \ \ \ \ \ (1)$

and also provided that the restriction

$\displaystyle (t_1,\dots,t_{j-1},t_{j+1},\dots,t_k) \mapsto f(t_1,\dots,t_{j-1},0,t_{j+1},\dots,t_k) \ \ \ \ \ (2)$

is supported on the simplex

$\displaystyle {\cal R}_{k-1} := \{ (t_1,\dots,t_{j-1},t_{j+1},\dots,t_k)\in [0,1]^{k-1}:$

$\displaystyle t_1+\dots+t_{j-1}+t_{j+1}+\dots+t_k \leq 1\}.$

More precisely, if ${f}$ is a smooth function, not identically zero, with the above properties for some ${R}$, and the ratio

$\displaystyle \frac{\sum_{j=1}^k \int_{{\cal R}_{k-1}} f_{1,\dots,j-1,j+1,\dots,k}(t_1,\dots,t_{j-1},0,t_{j+1},\dots,t_k)^2\ dt_1 \dots dt_{j-1} dt_{j+1} \dots dt_k}{\int_R f_{1,\dots,k}(t_1,\dots,t_k)^2\ dt_1 \dots dt_k} \ \ \ \ \ (3)$

is larger than ${\frac{2m}{\theta}}$, then the claim ${DHL[k,m+1]}$ holds (assuming ${EH[\theta]}$), and in particular ${H_m \leq H(k)}$.

I’ll explain why one can do this below the fold. Taking this for granted, we can rewrite this criterion in terms of the mixed derivative ${F := f_{1,\dots,k}}$, the upshot being that if one can find a smooth function ${F}$ supported on ${R}$ that obeys the vanishing marginal conditions

$\displaystyle \int F( t_1,\dots,t_k )\ dt_j = 0$

whenever ${1 \leq j \leq k}$ and ${t_1+\dots+t_{j-1}+t_{j+1}+\dots+t_k > 1}$, and the ratio

$\displaystyle \frac{\sum_{j=1}^k J_k^{(j)}(F)}{I_k(F)} \ \ \ \ \ (4)$

is larger than ${\frac{2m}{\theta}}$, where

$\displaystyle I_k(F) := \int_R F(t_1,\dots,t_k)^2\ dt_1 \dots dt_k$

and

$\displaystyle J_k^{(j)}(F) := \int_{{\cal R}_{k-1}} (\int_0^{1/\theta} F(t_1,\dots,t_k)\ dt_j)^2 dt_1 \dots dt_{j-1} dt_{j+1} \dots dt_k$

then ${DHL[k,m+1]}$ holds. (To equate these two formulations, it is convenient to assume that ${R}$ is a downset, in the sense that whenever ${(t_1,\dots,t_k) \in R}$, the entire box ${[0,t_1] \times \dots \times [0,t_k]}$ lies in ${R}$, but one can easily enlarge ${R}$ to be a downset without destroying the containment of ${R+R}$ in the non-convex region (1).) One initially requires ${F}$ to be smooth, but a limiting argument allows one to relax to bounded measurable ${F}$. (To approximate a rough ${F}$ by a smooth ${F}$ while retaining the required marginal conditions, one can first apply a slight dilation and translation so that the marginals of ${F}$ are supported on a slightly smaller version of the simplex ${{\cal R}_{k-1}}$, and then convolve by a smooth approximation to the identity to make ${F}$ smooth, while keeping the marginals supported on ${{\cal R}_{k-1}}$.)
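
To make the quantities in (4) concrete, here is a toy computation (an illustrative choice of ${k}$, ${R}$ and ${F}$, not one used in the project). Take ${k=2}$, any ${0 < \theta \leq 1}$, ${R = {\cal R}_2}$, and ${F}$ the indicator function of ${{\cal R}_2}$, which is bounded measurable and trivially obeys the vanishing marginal conditions. Then

$\displaystyle I_2(F) = \int_{t_1+t_2 \leq 1} dt_1 dt_2 = \frac{1}{2}, \quad J_2^{(1)}(F) = J_2^{(2)}(F) = \int_0^1 (1-t)^2\ dt = \frac{1}{3},$

so the ratio (4) equals ${\frac{2/3}{1/2} = \frac{4}{3}}$. This falls short of the threshold ${\frac{2m}{\theta} \geq 2}$ for ${m=1}$, so this particular choice proves nothing; it only shows how the functionals are evaluated.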

We are now exploring various choices of ${R}$ to work with, including the prism

$\displaystyle \{ (t_1,\dots,t_k) \in [0,1/\theta]^k: t_1+\dots+t_{k-1} \leq 1 \}$

and the symmetric region

$\displaystyle \{ (t_1,\dots,t_k) \in [0,1/\theta]^k: t_1+\dots+t_k \leq \frac{k}{k-1} \}.$

By suitably subdividing these regions into polytopes, and working with piecewise polynomial functions ${F}$ that are polynomial of a specified degree on each subpolytope, one can phrase the problem of optimising (4) as a quadratic program, which we have managed to work with for ${k=3}$. If this program can be extended to ${k=4}$, there is a decent chance that we will be able to obtain ${DHL[4,2]}$ on EH.

We have also been able to numerically optimise ${M_k}$ quite accurately for medium values of ${k}$ (e.g. ${k \sim 50}$), which has led to improved values of ${H_1}$ without EH. For large ${k}$, we now also have the asymptotic ${M_k=\log k - O(1)}$ with explicit error terms (details here) which have allowed us to slightly improve the ${m=2}$ numerology, and also to get explicit ${m=3}$ numerology for the first time.

Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:

Theorem 1 (Mertens’ theorems) In the asymptotic limit ${x \rightarrow \infty}$, we have

$\displaystyle \sum_{p\leq x} \frac{\log p}{p} = \log x + O(1), \ \ \ \ \ (1)$

$\displaystyle \sum_{p\leq x} \frac{1}{p} = \log \log x + O(1), \ \ \ \ \ (2)$

and

$\displaystyle \sum_{p\leq x} \log(1-\frac{1}{p}) = -\log \log x - \gamma + o(1) \ \ \ \ \ (3)$

where ${\gamma}$ is the Euler-Mascheroni constant, defined by requiring that

$\displaystyle 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + \gamma + o(1) \ \ \ \ \ (4)$

in the limit ${n \rightarrow \infty}$.
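
As a numerical sanity check of Theorem 1 (an illustration only, not part of any proof), one can sieve the primes up to some ${x}$ and subtract the main terms from the three sums. The function names `primes_up_to` and `mertens_remainders` below are hypothetical names chosen for this sketch, and ${\gamma}$ is hard-coded to double precision.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hard-coded


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n (assumes n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... ; the zero block has matching length.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def mertens_remainders(x):
    """Return the three sums of Theorem 1 minus their main terms.

    r1 = sum log(p)/p - log x                      (bounded, by (1))
    r2 = sum 1/p - log log x                       (bounded, by (2))
    r3 = sum log(1 - 1/p) + log log x + gamma      (tends to 0, by (3))
    """
    ps = primes_up_to(x)
    loglog = math.log(math.log(x))
    r1 = sum(math.log(p) / p for p in ps) - math.log(x)
    r2 = sum(1.0 / p for p in ps) - loglog
    r3 = sum(math.log1p(-1.0 / p) for p in ps) + loglog + EULER_GAMMA
    return r1, r2, r3


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        print(x, mertens_remainders(x))
```

Running this for increasing ${x}$, the first two remainders should settle near constants, while the third should shrink towards zero, consistent with the ${o(1)}$ error in (3).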

The third theorem (3) is usually stated in exponentiated form

$\displaystyle \prod_{p \leq x} (1-\frac{1}{p}) = \frac{e^{-\gamma}+o(1)}{\log x},$

but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic ${\log(1-\frac{1}{p}) = -\frac{1}{p} + O(\frac{1}{p^2})}$.
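
To spell out this deduction (a routine verification, included for convenience): the series ${\sum_p (\frac{1}{p} + \log(1-\frac{1}{p}))}$ converges absolutely since each term is ${O(\frac{1}{p^2})}$, and its tail over ${p > x}$ is ${O(\frac{1}{x})}$. Inserting this into (3) gives

$\displaystyle \sum_{p \leq x} \frac{1}{p} = -\sum_{p \leq x} \log(1-\frac{1}{p}) + \sum_{p \leq x} (\frac{1}{p} + \log(1-\frac{1}{p})) = \log\log x + \gamma + \sum_p (\frac{1}{p} + \log(1-\frac{1}{p})) + o(1),$

which is (2) with the ${O(1)}$ error sharpened to an explicit constant plus ${o(1)}$.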

Remarkably, these theorems can be proven without the assistance of the prime number theorem

$\displaystyle \sum_{p \leq x} 1 = \frac{x}{\log x} + o( \frac{x}{\log x} ),$

which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason for this is that Mertens’ theorems only require control on the Riemann zeta function ${\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}}$ in the neighbourhood of the pole at ${s=1}$, whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line ${\{ 1+it: t \in {\bf R} \}}$. Specifically, Mertens’ theorem is ultimately deduced from the Euler product formula

$\displaystyle \zeta(s) = \prod_p (1-\frac{1}{p^s})^{-1}, \ \ \ \ \ (5)$

valid in the region ${\hbox{Re}(s) > 1}$ (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and the following crude asymptotics:

Proposition 2 (Simple pole) For ${s}$ sufficiently close to ${1}$ with ${\hbox{Re}(s) > 1}$, we have

$\displaystyle \zeta(s) = \frac{1}{s-1} + O(1) \ \ \ \ \ (6)$

and

$\displaystyle \zeta'(s) = \frac{-1}{(s-1)^2} + O(1).$

Proof: For ${s}$ as in the proposition, we have ${\frac{1}{n^s} = \frac{1}{t^s} + O(\frac{1}{n^2})}$ for any natural number ${n}$ and ${n \leq t \leq n+1}$, and hence

$\displaystyle \frac{1}{n^s} = \int_n^{n+1} \frac{1}{t^s}\ dt + O( \frac{1}{n^2} ).$

Summing in ${n}$ and using the identity ${\int_1^\infty \frac{1}{t^s}\ dt = \frac{1}{s-1}}$, we obtain the first claim. Similarly, we have

$\displaystyle \frac{-\log n}{n^s} = \int_n^{n+1} \frac{-\log t}{t^s}\ dt + O( \frac{\log n}{n^2} ),$

and by summing in ${n}$ and using the identity ${\int_1^\infty \frac{-\log t}{t^s}\ dt = \frac{-1}{(s-1)^2}}$ (the derivative of the previous identity) we obtain the second claim. $\Box$
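
The proof above can be mirrored numerically for real ${s > 1}$ (a minimal sketch; the proposition itself also covers complex ${s}$, which this code does not attempt). The hypothetical helper `zeta_minus_pole` sums the differences ${\frac{1}{n^s} - \int_n^{n+1} \frac{dt}{t^s}}$, each of size ${O(\frac{1}{n^2})}$, so the truncated sum approximates ${\zeta(s) - \frac{1}{s-1}}$ up to an error of size roughly one over the number of terms.

```python
import math


def zeta_minus_pole(s, terms=20000):
    """Approximate zeta(s) - 1/(s-1) for real s > 1.

    Sums n^{-s} - int_n^{n+1} t^{-s} dt over n = 1..terms, following the
    comparison of sum and integral used in the proof of Proposition 2.
    """
    total = 0.0
    for n in range(1, terms + 1):
        # int_n^{n+1} t^{-s} dt = (n^{1-s} - (n+1)^{1-s}) / (s - 1).
        # Written via expm1/log1p to avoid cancellation when s is near 1.
        integral = (
            -(n ** (1.0 - s))
            * math.expm1((1.0 - s) * math.log1p(1.0 / n))
            / (s - 1.0)
        )
        total += n ** (-s) - integral
    return total


if __name__ == "__main__":
    for s in (1.5, 1.1, 1.01, 1.001):
        print(s, zeta_minus_pole(s))
```

As ${s}$ decreases to ${1}$ the printed values remain bounded, as (6) asserts, and in fact approach the Euler-Mascheroni constant ${\gamma \approx 0.5772}$.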

The first two of Mertens’ theorems (1), (2) are relatively easy to prove, and imply the third theorem (3) except with ${\gamma}$ replaced by an unspecified absolute constant. To get the specific constant ${\gamma}$ requires a little bit of additional effort. From (4), one might expect that the appearance of ${\gamma}$ arises from the refinement

$\displaystyle \zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|) \ \ \ \ \ (7)$

that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function, but with the Gamma function, and specifically with the identity ${\Gamma'(1) = - \gamma}$ (which is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions). More specifically, we have the following asymptotic for the exponential integral:

Proposition 3 (Exponential integral asymptotics) For sufficiently small ${\epsilon}$, one has

$\displaystyle \int_\epsilon^\infty \frac{e^{-t}}{t}\ dt = \log \frac{1}{\epsilon} - \gamma + O(\epsilon).$

A routine integration by parts shows that this asymptotic is equivalent to the identity

$\displaystyle \int_0^\infty e^{-t} \log t\ dt = -\gamma$

which is the identity ${\Gamma'(1)=-\gamma}$ mentioned previously.

Proof: We start by using the identity ${\frac{1}{i} = \int_0^1 x^{i-1}\ dx}$ to express the harmonic series ${H_n := 1+\frac{1}{2}+\ldots+\frac{1}{n}}$ as

$\displaystyle H_n = \int_0^1 (1 + x + \ldots + x^{n-1})\ dx$

or on summing the geometric series

$\displaystyle H_n = \int_0^1 \frac{1-x^n}{1-x}\ dx.$

Since ${\int_0^{1-1/n} \frac{1}{1-x}\ dx = \log n}$, we thus have

$\displaystyle H_n - \log n = \int_0^1 \frac{1_{[1-1/n,1]}(x) - x^n}{1-x}\ dx;$

making the change of variables ${x = 1-\frac{t}{n}}$, this becomes

$\displaystyle H_n - \log n = \int_0^n \frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}\ dt.$

As ${n \rightarrow \infty}$, ${\frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}}$ converges pointwise to ${\frac{1_{[0,1]}(t) - e^{-t}}{t}}$ and is pointwise dominated by ${O( e^{-t} )}$. Taking limits as ${n \rightarrow \infty}$ using dominated convergence, we conclude that

$\displaystyle \gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\ dt,$

or equivalently

$\displaystyle \int_0^\infty \frac{e^{-t} - 1_{[0,\epsilon]}(t)}{t}\ dt = \log \frac{1}{\epsilon} - \gamma.$

The claim then follows by bounding the ${\int_0^\epsilon}$ portion of the integral on the left-hand side. $\Box$
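
Proposition 3 can also be checked numerically (a rough sketch under stated assumptions, not a rigorous computation). The hypothetical helper `exp_integral_tail` substitutes ${t = e^u}$, which turns ${\frac{e^{-t}}{t}\ dt}$ into ${e^{-e^u}\ du}$, truncates the upper limit at ${t = 50}$ (discarding a tail smaller than ${e^{-50}}$), and applies Simpson's rule; ${\gamma}$ is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hard-coded


def exp_integral_tail(eps, upper=50.0, steps=4000):
    """Simpson's rule for int_eps^upper e^{-t}/t dt via t = e^u.

    `steps` must be even; `upper` is an assumed truncation point beyond
    which the integrand is negligible.
    """
    a, b = math.log(eps), math.log(upper)
    h = (b - a) / steps

    def g(u):
        return math.exp(-math.exp(u))

    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0


def remainder(eps):
    """Integral minus the main term log(1/eps) - gamma; should be O(eps)."""
    return exp_integral_tail(eps) - (math.log(1.0 / eps) - EULER_GAMMA)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, remainder(eps))
```

The printed remainders should shrink roughly in proportion to ${\epsilon}$, in line with the ${O(\epsilon)}$ error term.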

Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).
