Theorem 1 (Roth’s theorem). Let $G$ be a compact abelian group, with Haar probability measure $\mu$, which is $2$-divisible (i.e. the map $x \mapsto 2x$ is surjective), and let $A$ be a measurable subset of $G$ with $\mu(A) \geq \delta$ for some $\delta > 0$. Then we have

$$ \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(r)\ d\mu(x) \gg_\delta 1,$$

where $\gg_\delta 1$ denotes the bound $\geq c(\delta)$ for some $c(\delta) > 0$ depending only on $\delta$.

This theorem is usually formulated in the case that is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of -divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant on , but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the -divisibility, but the proof we will discuss runs into some technical issues due to the degeneracy of the shift in that case.
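As a quick hedged illustration of the counting formulation (my own toy computation, not part of the argument), one can brute-force the normalised count of three-term progressions in the model case of a cyclic group of odd order:

```python
# Model case G = Z/NZ with N odd (so the map x -> 2x is a bijection, i.e. G is
# 2-divisible).  We compute the normalised triple count
#   T(A) = (1/N^2) * #{(x, r) : x, x+r, x+2r all lie in A},
# which Theorem 1 bounds from below by a constant depending only on the
# density of A.

def ap3_density(A, N):
    """Normalised count of progressions (x, x+r, x+2r) with all entries in A."""
    A = {a % N for a in A}
    count = sum(1 for x in range(N) for r in range(N)
                if x in A and (x + r) % N in A and (x + 2 * r) % N in A)
    return count / N ** 2

N = 101                      # odd modulus (arbitrary choice)
A = set(range(0, N, 2))      # a set of density roughly 1/2
print(ap3_density(A, N))     # positive, uniformly in N for fixed density
```

The function and parameter choices here are mine; the point is only that for fixed density the output stays bounded away from zero as $N$ grows, which is what the averaged (Varnavides-type) formulation asserts.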

We can deduce Theorem 1 from the following more general Khintchine-type statement. Let $\hat G$ denote the Pontryagin dual of a compact abelian group $G$, that is to say the set of all continuous homomorphisms $\xi: x \mapsto \xi \cdot x$ from $G$ to the (additive) unit circle $\mathbb{T} := \mathbb{R}/\mathbb{Z}$. Thus $\hat G$ is a discrete abelian group, and functions $f \in L^2(G)$ have a Fourier transform $\hat f \in \ell^2(\hat G)$ defined by

$$ \hat f(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\ d\mu(x).$$

If $G$ is $2$-divisible, then $\hat G$ is $2$-torsion-free in the sense that the map $\xi \mapsto 2\xi$ is injective. For any finite set $S \subset \hat G$ and any radius $\rho > 0$, define the *Bohr set*

$$ B(S,\rho) := \{ x \in G : \sup_{\xi \in S} \| \xi \cdot x \|_{\mathbb{T}} \leq \rho \},$$

where $\|\theta\|_{\mathbb{T}}$ denotes the distance of $\theta$ to the nearest integer. We refer to the cardinality $|S|$ of $S$ as the *rank* of the Bohr set. We record a simple volume bound on Bohr sets:

Lemma 2 (Volume packing bound). Let $G$ be a $2$-divisible compact abelian group with Haar probability measure $\mu$. For any Bohr set $B(S,\rho)$, we have

$$ \mu( B(S,\rho) ) \gg_{|S|, \rho} 1.$$

*Proof:* We can cover the torus $\mathbb{T}^S$ by $O_{|S|,\rho}(1)$ translates of the cube $\{ (\theta_\xi)_{\xi \in S} : \|\theta_\xi\|_{\mathbb{T}} \leq \rho/2 \}$. Then the preimages of these translates under the map $x \mapsto (\xi \cdot x)_{\xi \in S}$ form an $O_{|S|,\rho}(1)$ cover of $G$. But all of these sets lie in a translate of $B(S,\rho)$ (since the difference of two points in the same cube has all coordinates within $\rho$ of the origin), and the claim then follows from the translation invariance of $\mu$.
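For a concrete sanity check of this volume bound (my own illustration; the modulus, frequencies, and the crude constant 1/4 in the lower bound are arbitrary choices), one can compute a Bohr set directly in the model case G = Z/NZ, where the characters are x ↦ ξx/N mod 1:

```python
# Model case G = Z/NZ: B(S, rho) = {x : ||xi*x/N|| <= rho for all xi in S},
# where ||t|| denotes the distance from t to the nearest integer.

def dist_to_int(t):
    return abs(t - round(t))

def bohr_set(S, rho, N):
    return [x for x in range(N)
            if all(dist_to_int(xi * x / N) <= rho for xi in S)]

N = 1009                     # a prime modulus (chosen arbitrarily)
S = [1, 17]                  # frequencies of a rank-2 Bohr set
rho = 0.1
density = len(bohr_set(S, rho, N)) / N
# The packing argument gives a lower bound of the shape (c * rho)^{|S|};
# we test against the crude constant c = 1/4.
print(density >= (rho / 4) ** len(S))
```

Note that 0 always lies in the Bohr set, and for generic frequencies the density is in fact close to $(2\rho)^{|S|}$ by equidistribution, well above the packing lower bound.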

Given any Bohr set , we define a normalised “Lipschitz” cutoff function by the formula

where is the constant such that

thus

The function should be viewed as an -normalised “tent function” cutoff to . Note from Lemma 2 that

We then have the following sharper version of Theorem 1:

Theorem 3 (Roth-Khintchine theorem). Let be a compact abelian group, with Haar probability measure , and let . Then for any measurable function , there exists a Bohr set with and such that

where denotes the convolution operation

A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with and equal to a small multiple of to conclude that there is a Bohr set with and such that

But from (2) we have the pointwise bound , and Theorem 1 follows.

Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set to capture all the “large Fourier coefficients” of , but such that a certain “dilate” of does not capture much more Fourier energy of than itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the considered above to achieve a similar effect.

** — 1. Proof of theorem — **

Fix . Let be a large integer (depending only on ) to be chosen later. We will need a sequence

of small parameters, with each assumed to be sufficiently small depending on all previous and on . (One should think of the as shrinking to zero incredibly fast, e.g. and for .)

We then define an increasing sequence

of finite sets of frequencies, as follows.

- (i) Initialise .
- (ii) Once has been selected for some , introduce the function
Note that this is a square-integrable function, thanks to (2).

- (iii) Define to be the set
- (iv) If , increment to and return to step (ii).

An easy induction using Plancherel’s theorem and (2) repeatedly gives the bounds

and

for all .

From Plancherel’s theorem we also have

As the are increasing, we thus see from the pigeonhole principle (if is sufficiently large depending on ) that there exists such that

We now decompose

where

One can think of as the “low frequency” (or “structured”, or “quasiperiodic”), “medium frequency” (or “small”), and “high frequency” (or “highly pseudorandom”) components of , where the notion of what it means for a frequency to be high or low is not determined by some preconceived notion of magnitude, but is instead adapted to the function itself. We will think of as the “main” term, and as “errors”.
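The pigeonhole step used to select the scale above can be isolated as an elementary principle: a nondecreasing sequence of M nonnegative energies bounded by a fixed total must have a small consecutive increment somewhere. A toy sketch (the numerical sequence below is invented for illustration):

```python
def pigeonhole_index(energies, total):
    """Given a nondecreasing sequence of nonnegative numbers bounded by
    `total`, return an index t with
        energies[t+1] - energies[t] <= total / (M - 1),
    where M = len(energies).  Such a t must exist, since the M - 1
    increments are nonnegative and sum to at most `total`."""
    M = len(energies)
    gap = total / (M - 1)
    for t in range(M - 1):
        if energies[t + 1] - energies[t] <= gap:
            return t
    raise AssertionError("impossible for a bounded nondecreasing sequence")

E = [0.1, 0.5, 0.55, 0.9, 0.95]   # increasing "Fourier energies", bounded by 1
t = pigeonhole_index(E, 1.0)
print(t, E[t + 1] - E[t])
```

In the proof, the role of `total` is played by the Plancherel bound on the total Fourier energy, and making the number of scales large makes the guaranteed increment small.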

Since is bounded in magnitude by , and are non-negative with total mass , we have the bounds

Now we give further bounds on . We begin with :

*Proof:* Let . From the definition of and elementary Fourier analysis we have

If , then from (5) we have

and the claim follows (since has total mass one). Now suppose instead that . Then from (4) we have

for all in the support of , and thus

and the claim again follows.

Now we control :

*Proof:* From the definition of and elementary Fourier analysis we have

We wish to bound this by . From (7) we see that the contribution of those in is acceptable. Now let us look at the contribution with . From (5) we have

for such , so this contribution to (9) is certainly acceptable thanks to (6). Now suppose that . The argument in the proof of Lemma 4 shows that and , and so this contribution is again acceptable thanks to (6).

Finally, we record the key properties of :

Lemma 6 ( continuous). is non-negative, has mean , and obeys the bound

whenever and .

*Proof:* The first two properties are obvious, so we turn to the third. Since , is bounded, and has total unit mass, it suffices to show that

whenever and . But from (1) and the triangle inequality we have

for such , and the claim follows from (2).

Now we can prove Theorem 3 with . Splitting as above, we can decompose the left-hand side of (3) into terms, each of the form

for some .

Let us consider a term for which . We can bound this term by

which is thanks to Lemma 5. Similar arguments hold for terms with or , after shifting by or .

Now let us consider a term for which , that is to say

A short Fourier-analytic calculation reveals that this integral may be written as a sum

From (2) we have

and thus by Plancherel

Similarly, from (6) we have

and

and hence by Cauchy-Schwarz and the -torsion-free nature of , one has

for all . Combining these estimates with Lemma 4, we conclude that this contribution to (3) is

Similar arguments dispose of the terms for which or .

The only remaining term is when , that is to say

From several applications of Lemma 6 we see that

for in the support of this integrand. Thus this term may be written as

which on evaluating the integral simplifies to

but by the non-negativity of and Hölder’s inequality, this is at least

and the claim follows.

Filed under: expository, math.CA, math.CO Tagged: additive combinatorics, Fourier transform, Roth's theorem


the previous thread may be found here.

Numerical progress on these bounds has slowed in recent months, although we have very recently lowered the unconditional bound on from 252 to 246 (see the wiki page for more detailed results). While there may still be scope for further improvement (particularly with respect to bounds for with , which we have not focused on for a while), it looks like we have reached the point of diminishing returns, and it is time to turn to the task of writing up the results.

A draft version of the paper so far may be found here (with the directory of source files here). Currently, the introduction and the sieve-theoretic portions of the paper are written up, although the sieve-theoretic arguments are surprisingly lengthy, and some simplification (or other reorganisation) may well be possible. Other portions of the paper that have not yet been written up include the asymptotic analysis of for large k (leading in particular to results for m=2,3,4,5), and a description of the quadratic programming that is used to estimate for small and medium k. Also we will eventually need an appendix to summarise the material from Polymath8a that we would use to generate various narrow admissible tuples.

One issue here is that our current unconditional bounds on for m=2,3,4,5 rely on a distributional estimate on the primes which we believed to be true in Polymath8a, but never actually worked out (among other things, there were some delicate algebraic geometry issues concerning the vanishing of certain cohomology groups that were never resolved). This issue does not affect the m=1 calculations, which only use the Bombieri-Vinogradov theorem or else assume the generalised Elliott-Halberstam conjecture. As such, we will have to rework the computations for these , given that the task of trying to attain the conjectured distributional estimate on the primes would be a significant amount of work that is rather disjoint from the rest of the Polymath8b writeup. One could simply dust off the old maple code for this (e.g. one could tweak the code here, with the constraint 1080*varpi/13 + 330*delta/13 < 1 being replaced by 600*varpi/7 + 180*delta/7 < 1), but there is also a chance that our asymptotic bounds for (currently given in messy detail here) could be sharpened. I plan to look at this issue fairly soon.

Also, there are a number of smaller observations (e.g. the parity problem barrier that prevents us from ever getting a better bound on than 6) that should also go into the paper at some point; the current outline of the paper as given in the draft is not necessarily comprehensive.

Filed under: polymath

The surface is said to be ruled if, for a Zariski open dense set of points , there exists a line through for some non-zero which is completely contained in , thus

for all . Also, a point is said to be a flecnode if there exists a line through for some non-zero which is tangent to to third order, in the sense that

for . Clearly, if is a ruled surface, then a Zariski open dense set of points on are flecnodes. We then have the remarkable theorem of Cayley and Salmon asserting the converse:

Theorem 1 (Cayley-Salmon theorem). Let be an irreducible polynomial with non-empty. Suppose that a Zariski dense set of points in are flecnodes. Then is a ruled surface.

Among other things, this theorem was used in the celebrated result of Guth and Katz that almost solved the Erdos distance problem in two dimensions, as discussed in this previous blog post. Vanishing to third order is necessary: observe that in a surface of negative curvature, such as the saddle , every point on the surface is tangent to second order to a line (the line in the direction for which the second fundamental form vanishes).

The original proof of the Cayley-Salmon theorem, dating back to at least 1915, is not easily accessible and not written in modern language. A modern proof of this theorem (together with substantial generalisations, for instance to higher dimensions) is given by Landsberg; the proof uses the machinery of modern algebraic geometry. The purpose of this post is to record an alternate proof of the Cayley-Salmon theorem based on classical differential geometry (in particular, the notion of torsion of a curve) and basic ODE methods (in particular, Gronwall’s inequality and the Picard existence theorem). The idea is to “integrate” the lines indicated by the flecnode to produce smooth curves on the surface ; one then uses the vanishing (1) and some basic calculus to conclude that these curves have zero torsion and are thus planar curves. Some further manipulation using (1) (now just to second order instead of third) then shows that these curves are in fact straight lines, giving the ruling on the surface.
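As a concrete instance of the ruling phenomenon (a toy example of my own, not part of the general proof): the saddle surface cut out by f(x, y, z) = z - xy is ruled, with the explicit line t ↦ (a + t, b, ab + tb) through each point (a, b, ab) lying entirely in the surface. A quick numerical check:

```python
import random

# The saddle f(x, y, z) = z - x*y vanishes identically along the line
# p + t*v with p = (a, b, a*b) and direction v = (1, 0, b), since
# f(a + t, b, a*b + t*b) = a*b + t*b - (a + t)*b = 0.

def f(x, y, z):
    return z - x * y

random.seed(0)
ok = True
for _ in range(100):
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    t = random.uniform(-5, 5)
    # a point on the line through (a, b, a*b) in direction (1, 0, b)
    x, y, z = a + t, b, a * b + t * b
    ok = ok and abs(f(x, y, z)) < 1e-9
print(ok)
```

In the language of the post, each of these lines is (in particular) tangent to the surface to all orders at each of its points, so every point of the saddle is a flecnode, consistent with the Cayley-Salmon theorem.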

Update: Janos Kollar has informed me that the above theorem was essentially known to Monge in 1809; see his recent arXiv note for more details.

I thank Larry Guth and Micha Sharir for conversations leading to this post.

** — 1. Proof — **

Let denote the smooth points of ; then is a smooth surface that is a Zariski open dense subset of , and hence Zariski dense in . We consider the projective tangent bundle of ; this is a smooth three-dimensional manifold, which is a bundle of copies of the projective line over , with elements consisting of a point in and the projective class of a direction that is tangent to at and is non-zero. Since and are both irreducible varieties, it is easy to see that is also an irreducible variety.

Inside , we consider the subset of points which obey the flecnode condition (1) for . By hypothesis, the projection of to is Zariski dense. On the other hand, is clearly an algebraic set. Thus the dimension of is at least , and there is at least one component whose projection to is two-dimensional (i.e. is dominant). In particular we can find an irreducible algebraic surface in whose projection to is open dense (not just in the Zariski sense, but also in the differential geometry sense). By removing the singular points of , we may assume that is a smooth surface.

We now claim that the projection map is generically a local diffeomorphism, thus has full rank for a Zariski dense set of points in . This is a simple consequence of Sard’s theorem, but for our purposes it is also instructive to see an ODE proof: if fails to have full rank generically, then it must have rank one generically or rank zero generically. If it has rank one generically, one can use the Picard existence theorem to locally foliate an open dense subset of by curves with the property that for each , the derivative lies in the kernel of , so that if we write , then for all , and so is constant; thus the curves each lie in a single fibre of . This locally describes as a one-dimensional smooth family of curves inside the fibre of , and so the image is locally one-dimensional, contradicting the two-dimensional nature of . A similar argument works when has rank zero generically.

Since is a local diffeomorphism generically, we may apply the inverse function theorem to conclude that on an open dense subset of , we can locally invert this map, which in particular gives *smooth* local maps from open subsets of to unit tangent vectors at such that the flecnode condition (1) is satisfied for all such and .

By the Picard existence theorem, we may thus locally foliate by curves with the property that

for all ; thus has unit speed and is always tangent to a flecnode direction. Thus, by (1) we have

for . Expanding this out in coordinates by the chain rule (and using the usual summation conventions), using to denote the components of , and to denote the first partial derivatives of for , to denote the second partial derivatives, and so forth, we have

We can obtain further differential equations by differentiating the above equations in . For instance, if we differentiate (3) in we obtain

and hence by (4)

Similarly, if we differentiate (4) in we obtain

and hence by (5)

Finally, if we differentiate (6) in we obtain

and hence by (7)

The equations (3), (6), (8) have a simple geometric interpretation: the first three derivatives are all orthogonal to the gradient . Generically, this gradient is non-zero, and we are in three dimensions, so we conclude that are always coplanar. Equivalently, the torsion of the curve vanishes, and hence the curve is necessarily planar (locally, at least). Another way to see this is to start with the identity

where is the cross product, and conclude that is a scalar multiple of whenever it is non-vanishing, which by Gronwall’s inequality shows that has fixed orientation whenever it is non-vanishing.

So there is a plane in in which locally lies. If vanished on this plane, then , being irreducible, would be just and we would be done, so we may assume that is non-vanishing here, thus is at most one-dimensional. On the other hand, (3), (6) show that are both orthogonal to the gradient of restricted to , which is generically non-zero; as we now only have two dimensions, this implies that are parallel. Thus the curvature of now also vanishes, which implies that is a straight line. Hence we have locally foliated at least a small open neighbourhood in by straight lines, which ensures that is ruled as desired.

Filed under: expository, math.AG, math.DG Tagged: Cayley-Salmon theorem, flecnode, ruled surface

and difference sets

as well as iterated sumsets such as , , and so forth. Here, are finite non-empty subsets of some additive group (classically one took or , but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma). Let $A, B$ be finite non-empty subsets of $G$. Then $A$ may be covered by at most $\frac{|A+B|}{|B|}$ translates of $B-B$.

*Proof:* Consider a maximal set of disjoint translates $a_1 + B, \dots, a_k + B$ of $B$ by elements $a_1, \dots, a_k \in A$. These translates have cardinality $|B|$, are disjoint, and lie in $A+B$, so there are at most $\frac{|A+B|}{|B|}$ of them. By maximality, for any $a \in A$, $a + B$ must intersect at least one of the selected translates $a_i + B$, thus $a \in a_i + B - B$, and the claim follows.
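A small computational check of this argument in the ambient group Z (the specific sets A and B below are arbitrary choices of mine): we greedily build a maximal disjoint family of translates and verify both conclusions.

```python
# Greedily select a maximal disjoint family of translates a_i + B with
# a_i in A, then verify: (1) the counting bound k * |B| <= |A + B|, and
# (2) the translates a_i + (B - B) cover A.

def sumset(X, Y):
    return {x + y for x in X for y in Y}

A = set(range(0, 50, 3)) | {101, 113}
B = set(range(0, 10))

selected, covered = [], set()
for a in sorted(A):
    translate = {a + b for b in B}
    if translate.isdisjoint(covered):       # keep the family disjoint
        selected.append(a)
        covered |= translate

AB = sumset(A, B)
BB = {b1 - b2 for b1 in B for b2 in B}
print(len(selected) * len(B) <= len(AB))
print(all(any(x - a in BB for a in selected) for x in A))
```

Both printed values are the two conclusions of the lemma: the disjoint translates pack into the sumset, and maximality forces the difference-set translates to cover A.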

Lemma 2 (Ruzsa triangle inequality). Let $A, B, C$ be finite non-empty subsets of $G$. Then $|A| |B-C| \leq |A-B| |A-C|$.

*Proof:* Consider the addition map $(x,y) \mapsto x+y$ from $(B-A) \times (A-C)$ to $G$. Every element $b-c$ of $B-C$ has a preimage of this map of cardinality at least $|A|$, thanks to the obvious identity $(b-a) + (a-c) = b-c$ for each $a \in A$. Since $(B-A) \times (A-C)$ has cardinality $|A-B| |A-C|$, the claim follows.
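One can also verify the inequality numerically, in the form |A| |B-C| ≤ |A-B| |A-C| (one standard formulation; the random test harness below is my own):

```python
import random

# Random subsets of the integers; the Ruzsa triangle inequality
# |A| * |B - C| <= |A - B| * |A - C| should hold for every triple.

def diffset(X, Y):
    return {x - y for x in X for y in Y}

random.seed(1)
for _ in range(50):
    A = set(random.sample(range(100), 12))
    B = set(random.sample(range(100), 12))
    C = set(random.sample(range(100), 12))
    assert len(A) * len(diffset(B, C)) <= len(diffset(A, B)) * len(diffset(A, C))
print("triangle inequality holds on all 50 random triples")
```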

Such estimates (which are covered, incidentally, in Section 2 of my book with Van Vu) are particularly useful for controlling finite sets of small doubling, in the sense that for some bounded . (There are deeper theorems, most notably Freiman’s theorem, which give more control than what elementary Ruzsa calculus does, however the known bounds in the latter theorem are worse than polynomial in (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in .)

However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. or a torus ) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group of orthogonal matrices, and let be the matrices obtained by starting with all of the orthogonal matrices in and rounding each coefficient of each matrix in this set to the nearest multiple of , for some small . This forms a finite set (whose cardinality grows as like a certain negative power of ). In the limit , the set is not a set of small doubling in the discrete sense. However, is still close to in a metric sense, being contained in the -neighbourhood of . Another key example comes from graphs of maps from a subset of one additive group to another . If is “approximately additive” in the sense that for all , is close to in some metric, then might not have small doubling in the discrete sense (because could take a large number of values), but could be considered a set of small doubling in a discretised sense.

One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric on the underlying group (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:

Definition 3. Let be a metric space, let be a subset of , and let be a radius.

- The *packing number* is the largest number of points one can pack inside such that the balls are disjoint.
- The *internal covering number* is the fewest number of points such that the balls cover .
- The *external covering number* is the fewest number of points such that the balls cover .
- The *metric entropy* is the largest number of points one can find in that are -separated, thus for all .

It is an easy exercise to verify the inequalities

for any , and that is non-increasing in and non-decreasing in for the three choices (but monotonicity in can fail for !). It turns out that the external covering number is slightly more convenient than the other notions of metric entropy, so we will abbreviate . The cardinality can be viewed as the limit of the entropies as .

If we have the bounded doubling property that is covered by translates of for each , and one has a Haar measure on which assigns a positive finite mass to each ball, then any of the above entropies is comparable to , as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter by some absolute constant. Such losses can be acceptable in applications in which the underlying sets are largely “transverse” to the balls , so that the -entropy of is largely independent of ; this is a situation which arises in particular in the case of graphs discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)

Henceforth we work in an additive group equipped with a translation-invariant metric . (One can also generalise things slightly by allowing the metric to attain the values or , without changing much of the analysis below.) By the Heine-Borel theorem, any precompact set will have finite entropy for any . We now have analogues of the two basic Ruzsa lemmas above:

Lemma 4 (Ruzsa covering lemma). Let be precompact non-empty subsets of , and let . Then may be covered by at most translates of .

*Proof:* Let be a maximal set of points such that the sets are all disjoint. Then the sets are disjoint in and have entropy , and furthermore any ball of radius can intersect at most one of the . We conclude that , so . If , then must intersect one of the , so , and the claim follows.

Lemma 5 (Ruzsa triangle inequality). Let be precompact non-empty subsets of , and let . Then .

*Proof:* Consider the addition map from to . The domain may be covered by product balls . Every element of has a preimage of this map which projects to a translate of , and thus must meet at least of these product balls. However, if two elements of are separated by a distance of at least , then no product ball can intersect both preimages. We thus see that , and the claim follows.

Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)

It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.

** — 1. Approximate groups — **

In discrete sum set theory, a key concept is that of a *-approximate group* – a finite symmetric subset of containing the origin such that is covered by at most translates of . The analogous concept here will be that of a *-approximate group*: a precompact symmetric subset of containing the origin such that is covered by at most copies of . Such sets obey good iterated doubling properties; for instance, is covered by at most copies of for any . They can be generated from sets of small tripling:

Lemma 6. Let be a precompact non-empty subset of , and let . If or , then is a -approximate group.

*Proof:* From Lemma 5 we have

(for an appropriate choice of sign) and

and thus by Lemma 4, may be covered by at most copies of , giving the claim.

** — 2. From small doubling to small tripling or quadrupling — **

As we saw above, Lemma 5 and Lemma 4 are already very powerful once one has some sort of control on triple or higher sums such as , , or . But if only controls a double sum such as or , it is a bit trickier to proceed. Here is one estimate (somewhat analogous to Proposition 2.18 from my book with Van Vu, but with slightly worse numerology):

Lemma 7. Let be a precompact non-empty subset of , and let . If , then .

One can combine this lemma with Lemma 5 to obtain similar conclusions starting with a hypothesis on rather than ; we leave this to the interested reader. Of course, the conclusion can also be combined with Lemma 6; we again leave this as an exercise.

*Proof:* Write . Then , so we may find a -separated subset of with . By hypothesis, we may cover by balls with . Call a centre *popular* if contains at least differences with (counting multiplicity), and let denote the set of popular centres. Then at most of the pairs have lying in an unpopular ball , thus we have for at least pairs in . Thus, by the pigeonhole principle, there exists such that at least elements of lie in . Thus

and thus

Next, for any , we consider the set of pairs such that . We may write

for some and . By definition of , we can find distinct pairs for with such that for all . As is -separated and has diameter at most , the must be distinct in , and similarly for the . We then have

for . Each lies in and thus lies in for some , and similarly for some . Then

and so . Also, since and the are -separated, we see that the are distinct as varies. We conclude that . On the other hand, the total number of pairs is , and any two -separated points in generate disjoint sets . We conclude that there can be at most -separated points in , thus

and thus

By Lemma 5 and (1), we conclude that

and the claim follows.

** — 3. The Balog-Szemeredi-Gowers lemma — **

One of the most difficult, but powerful, components of the elementary sum set theory is the tool now known as the *Balog-Szemerédi-Gowers lemma*, which converts control on partial sumsets (or equivalently, lower bounds on “additive energy”) to control on total sumsets, after suitable refinements of the sets. Here is one metric entropy version of this lemma.

Lemma 8 (Balog-Szemerédi-Gowers). Let and , and let be precompact subsets of . Suppose that

and

where we endow with the sup norm metric. Then there exist subsets , of respectively with

and

Again, this lemma may be usefully combined with the previous sum set estimates, much as was done in my book with Van Vu; I leave the details to the interested reader.

*Proof:* Let , be maximal -separated subsets of respectively, thus , and similarly .

By hypothesis, we can find at least quadruples which are -separated in , such that

for all such quadruples. By construction of , each such quadruple can be associated to a nearby quadruple with

and thus by the triangle inequality

Also by the triangle inequality we see that each can be associated to at most one of the quadruples , and as the are -separated, the are -separated. We conclude that there is a set of at least quadruples in obeying (2) that are -separated.

Call a pair *popular* if there are at least of the above quadruples in obeying (2) with the indicated first two coefficients. The unpopular pairs absorb at most of the quadruples, so at least of the quadruples are associated to popular pairs . On the other hand, as the quadruples are -separated, we see from (2) and the triangle inequality that for each there is at most one giving rise to a quadruple . Thus each can be associated to at most quadruples, and we conclude that the set of popular pairs has size at least . In particular this shows that .

We now apply the graph-theoretic Balog-Szemerédi-Gowers lemma (see Corollary 6.19 of my book with Van Vu) to conclude that there exists a subset of and of with

such that for every and there exist pairs such that lie in . Since was already -separated, we conclude that

Now fix and , and let be one of the above pairs. As is popular, we can thus find pairs such that

furthermore, the lie in an -separated set. Similarly, we can find pairs and pairs such that

with the and also lying in an -separated set. In particular, we see that given , uniquely determine , and uniquely determine , so a single sextuple can arise from at most one pair ; in particular, we see that sextuples are associated to each pair . Taking alternating combinations of (3), (4), (5) we see that

In particular, if and are two pairs in with at least apart, then a single sextuple can be associated to at most one of these pairs. Since the number of sextuples is at most , we conclude that there are at most pairs with -separated differences , thus as required.

Filed under: expository, math.CO, math.MG Tagged: additive combinatorics, metric entropy, sum set estimates

In the previous blog post, the Euler equations for inviscid incompressible fluid flow were interpreted in a Lagrangian fashion, and then Noether’s theorem invoked to derive the known conservation laws for these equations. In a bit more detail: starting with *Lagrangian space* and *Eulerian space* , we let be the space of volume-preserving, orientation-preserving maps from Lagrangian space to Eulerian space. Given a curve , we can define the *Lagrangian velocity field* as the time derivative of , and the *Eulerian velocity field* . The volume-preserving nature of ensures that is a divergence-free vector field:

If we formally define the functional

then one can show that the critical points of this functional (with appropriate boundary conditions) obey the Euler equations

for some pressure field . As discussed in the previous post, the time translation symmetry of this functional yields conservation of the Hamiltonian

the rigid motion symmetries of Eulerian space give conservation of the total momentum

and total angular momentum

and the diffeomorphism symmetries of Lagrangian space give conservation of circulation

for any closed loop in , or equivalently pointwise conservation of the Lagrangian vorticity , where is the -form associated with the vector field using the Euclidean metric on , with denoting pullback by .

It turns out that one can generalise the above calculations. Given any self-adjoint operator on divergence-free vector fields , we can define the functional

as we shall see below the fold, critical points of this functional (with appropriate boundary conditions) obey the generalised Euler equations

for some pressure field , where in coordinates is with the usual summation conventions. (When , , and this term can be absorbed into the pressure , and we recover the usual Euler equations.) Time translation symmetry then gives conservation of the Hamiltonian

If the operator commutes with rigid motions on , then we have conservation of total momentum

and total angular momentum

and the diffeomorphism symmetries of Lagrangian space give conservation of circulation

or pointwise conservation of the Lagrangian vorticity . These applications of Noether’s theorem proceed exactly as in the previous post; we leave the details to the interested reader.

One particular special case of interest arises in two dimensions , when is the inverse derivative . The vorticity is a -form, which in the two-dimensional setting may be identified with a scalar. In coordinates, if we write , then

Since is also divergence-free, we may therefore write

where the stream function is given by the formula

If we take the curl of the generalised Euler equation (2), we obtain (after some computation) the surface quasi-geostrophic equation

This equation has strong analogies with the three-dimensional incompressible Euler equations, and can be viewed as a simplified model for that system; see this paper of Constantin, Majda, and Tabak for details.
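As a numerical aside (not from the paper of Constantin, Majda, and Tabak), the SQG velocity law — velocity as the perpendicular gradient of the stream function, with the stream function obtained from the scalar by the half-order inverse Laplacian — is easy to realise pseudo-spectrally on the torus. The grid size, sample scalar, and sign convention below are all illustrative assumptions:

```python
import numpy as np

N = 64
xs = np.linspace(0, 2*np.pi, N, endpoint=False)
X, Y = np.meshgrid(xs, xs, indexing='ij')
theta = np.sin(X) * np.cos(2*Y)            # sample active scalar on the torus

k = np.fft.fftfreq(N, d=1.0/N)             # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing='ij')
Kabs = np.sqrt(KX**2 + KY**2)
Kabs[0, 0] = 1.0                           # avoid dividing by zero at the mean mode

psi_hat = np.fft.fft2(theta) / Kabs        # stream function psi = (-Delta)^{-1/2} theta
psi_hat[0, 0] = 0.0
psi = np.real(np.fft.ifft2(psi_hat))
u = np.real(np.fft.ifft2(-1j*KY*psi_hat))  # u = -psi_y
v = np.real(np.fft.ifft2( 1j*KX*psi_hat))  # v =  psi_x  (perpendicular gradient)

# the resulting velocity field is divergence-free (to spectral accuracy)
div = np.real(np.fft.ifft2(1j*KX*np.fft.fft2(u) + 1j*KY*np.fft.fft2(v)))
```

For this particular scalar, every Fourier mode has frequency magnitude \sqrt{5}, so the stream function is just theta/\sqrt{5}, which gives a convenient check on the spectral arithmetic.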

Now we can specialise the general conservation laws derived previously to this setting. The conserved Hamiltonian is

(a law previously observed for this equation in the abovementioned paper of Constantin, Majda, and Tabak). As commutes with rigid motions, we also have (formally, at least) conservation of momentum

(which up to trivial transformations is also expressible in impulse form as , after integration by parts), and conservation of angular momentum

(which up to trivial transformations is ). Finally, diffeomorphism invariance gives pointwise conservation of Lagrangian vorticity , thus is transported by the flow (which is also evident from (3)). In particular, all integrals of the form for a fixed function are conserved by the flow.

** — 1. Euler-Lagrange calculations — **

We now justify the claim that stationary points of the functional obey (2). We consider continuous deformations of the critical point , thus now depends on both and . We already have the Eulerian velocity field , which is related to the derivative of by the formula

similarly we may introduce a deformation field by

The vector field is divergence free and has to obey appropriate vanishing conditions at infinity, but is otherwise unconstrained. If we compute using the above two equations and the chain rule, we arrive at the “zero-curvature” condition

On the other hand, as is a critical point, we have

when . Differentiating under the integral sign and using the self-adjoint nature of , the left-hand side is

Inserting (4) and integrating by parts (and using the divergence-free nature of ), this expression can be rewritten as

Since is essentially an arbitrary divergence-free vector field, the expression inside parentheses must vanish, and the equation (2) follows.

Filed under: expository, math.AP, math.MP Tagged: Euler equations, Noether's theorem, surface quasi-geostrophic equation

It is a remarkable fact in the theory of differential equations that many of the ordinary and partial differential equations that are of interest (particularly in geometric PDE, or PDE arising from mathematical physics) admit a variational formulation; thus, a collection of one or more fields on a domain taking values in a space will solve the differential equation of interest if and only if is a critical point to the functional

involving the fields and their first derivatives , where the Lagrangian is a function on the vector bundle over consisting of triples with , , and a linear transformation; we also usually keep the boundary data of fixed in case has a non-trivial boundary, although we will ignore these issues here. (We also ignore the possibility of having additional constraints imposed on and , which require the machinery of Lagrange multipliers to deal with, but which will only serve as a distraction for the current discussion.) It is common to use local coordinates to parameterise as and as , in which case can be viewed locally as a function on .

Example 1 (Geodesic flow)Take and to be a Riemannian manifold, which we will write locally in coordinates as with metric for . A geodesic is then a critical point (keeping fixed) of the energy functionalor in coordinates (ignoring coordinate patch issues, and using the usual summation conventions)

As discussed in this previous post, both the Euler equations for rigid body motion, and the Euler equations for incompressible inviscid flow, can be interpreted as geodesic flow (though in the latter case, one has to work really formally, as the manifold is now infinite dimensional).

More generally, if is itself a Riemannian manifold, which we write locally in coordinates as with metric for , then a harmonic map is a critical point of the energy functional

or in coordinates (again ignoring coordinate patch issues)

If we replace the Riemannian manifold by a Lorentzian manifold, such as Minkowski space , then the notion of a harmonic map is replaced by that of a wave map, which generalises the scalar wave equation (which corresponds to the case ).

Example 2 (-particle interactions)Take and ; then a function can be interpreted as a collection of trajectories in space, which we give a physical interpretation as the trajectories of particles. If we assign each particle a positive mass , and also introduce a potential energy function , then it turns out that Newton’s laws of motion in this context (with the force on the particle being given by the conservative force ) are equivalent to the trajectories being a critical point of the action functional

Formally, if is a critical point of a functional , this means that

whenever is a (smooth) deformation with (and with respecting whatever boundary conditions are appropriate). Interchanging the derivative and integral, we (formally, at least) arrive at

Write for the infinitesimal deformation of . By the chain rule, can be expressed in terms of . In coordinates, we have

where we parameterise by , and we use subscripts on to denote partial derivatives in the various coefficients. (One can of course work in a coordinate-free manner here if one really wants to, but the notation becomes a little cumbersome due to the need to carefully split up the tangent space of , and we will not do so here.) Thus we can view (2) as an integral identity that asserts the vanishing of a certain integral, whose integrand involves , where vanishes at the boundary but is otherwise unconstrained.

A general rule of thumb in PDE and calculus of variations is that whenever one has an integral identity of the form for some class of functions that vanishes on the boundary, then there must be an associated differential identity that justifies this integral identity through Stokes’ theorem. This rule of thumb helps explain why integration by parts is used so frequently in PDE to justify integral identities. The rule of thumb can fail when one is dealing with “global” or “cohomologically non-trivial” integral identities of a topological nature, such as the Gauss-Bonnet or Kazhdan-Warner identities, but is quite reliable for “local” or “cohomologically trivial” identities, such as those arising from calculus of variations.

In any case, if we apply this rule to (2), we expect that the integrand should be expressible as a spatial divergence. This is indeed the case:

Proposition 1(Formal) Let be a critical point of the functional defined in (1). Then for any deformation with , we havewhere is the vector field that is expressible in coordinates as

*Proof:* Comparing (4) with (3), we see that the claim is equivalent to the Euler-Lagrange equation

The same computation, together with an integration by parts, shows that (2) may be rewritten as

Since is unconstrained on the interior of , the claim (6) follows (at a formal level, at least).
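As a toy sanity check on the Euler–Lagrange formalism (using the one-dimensional harmonic oscillator Lagrangian, an illustrative choice not taken from the text), one can carry out the first-variation computation symbolically and confirm that it produces the expected equation of motion:

```python
import sympy as sp

t = sp.symbols('t')
q = sp.Function('q')(t)
qd = sp.diff(q, t)

L = qd**2 / 2 - q**2 / 2                # toy harmonic-oscillator Lagrangian

# Euler-Lagrange expression: d/dt (dL/d qdot) - dL/dq
el = sp.diff(sp.diff(L, qd), t) - sp.diff(L, q)
```

The expression `el` simplifies to q'' + q, i.e. the harmonic oscillator equation, matching the general Euler–Lagrange equation specialised to this Lagrangian.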

Many variational problems also enjoy one-parameter continuous *symmetries*: given any field (not necessarily a critical point), one can place that field in a one-parameter family with , such that

for all ; in particular,

which can be written as (2) as before. Applying the previous rule of thumb, we thus expect another divergence identity

whenever arises from a continuous one-parameter symmetry. This expectation is indeed the case in many examples. For instance, if the spatial domain is the Euclidean space , and the Lagrangian (when expressed in coordinates) has no direct dependence on the spatial variable , thus

then we obtain translation symmetries

for , where is the standard basis for . For a fixed , the left-hand side of (7) then becomes

where . Another common type of symmetry is a *pointwise* symmetry, in which

for all , in which case (7) clearly holds with .

If we subtract (4) from (7), we obtain the celebrated theorem of Noether linking symmetries with conservation laws:

Theorem 2 (Noether’s theorem)Suppose that is a critical point of the functional (1), and let be a one-parameter continuous symmetry with . Let be the vector field in (5), and let be the vector field in (7). Then we have the pointwise conservation law

In particular, for one-dimensional variational problems, in which , we have the conservation law for all (assuming of course that is connected and contains ).

Noether’s theorem gives a systematic way to locate conservation laws for solutions to variational problems. For instance, if and the Lagrangian has no explicit time dependence, thus

then by using the time translation symmetry , we have

as discussed previously, whereas we have , and hence by (5)

and so Noether’s theorem gives conservation of the *Hamiltonian*

For instance, for geodesic flow, the Hamiltonian works out to be

so we see that the speed of the geodesic is conserved over time.

For pointwise symmetries (9), vanishes, and so Noether’s theorem simplifies to ; in the one-dimensional case , we thus see from (5) that the quantity

is conserved in time. For instance, for the -particle system in Example 2, if we have the translation invariance

for all , then we have the pointwise translation symmetry

for all , and some , in which case , and the conserved quantity (11) becomes

as was arbitrary, this establishes conservation of the *total momentum*

Similarly, if we have the rotation invariance

for any and , then we have the pointwise rotation symmetry

for any skew-symmetric real matrix , in which case , and the conserved quantity (11) becomes

since is an arbitrary skew-symmetric matrix, this establishes conservation of the *total angular momentum*
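These conservation laws are easy to observe numerically. The sketch below (a two-particle planar system with a harmonic interaction, integrated by a symplectic leapfrog step — all toy choices, not anything from the post) conserves total momentum and angular momentum to rounding error, and energy up to the usual bounded symplectic drift:

```python
import numpy as np

m = np.array([1.0, 2.0])                   # particle masses
x = np.array([[1.0, 0.0], [-0.5, 0.0]])    # positions in the plane
v = np.array([[0.0, 0.5], [0.0, -0.25]])   # velocities (total momentum zero)

def accel(x):
    # harmonic spring between the two particles: V = |x1 - x2|^2 / 2
    d = x[0] - x[1]
    return np.array([-d / m[0], d / m[1]])

def energy(x, v):
    return 0.5 * np.sum(m[:, None] * v**2) + 0.5 * np.sum((x[0] - x[1])**2)

def momentum(v):
    return (m[:, None] * v).sum(axis=0)

def ang_mom(x, v):
    return np.sum(m * (x[:, 0] * v[:, 1] - x[:, 1] * v[:, 0]))

E0, P0, L0 = energy(x, v), momentum(v), ang_mom(x, v)
dt = 1e-3
for _ in range(5000):                      # leapfrog (kick-drift-kick)
    v = v + 0.5 * dt * accel(x)
    x = x + dt * v
    v = v + 0.5 * dt * accel(x)
```

Since the interaction force acts along the line joining the particles, each kick exerts zero net force and zero net torque, so momentum and angular momentum are conserved exactly by the scheme; energy is only conserved approximately, with bounded oscillatory error.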

Below the fold, I will describe how Noether’s theorem can be used to locate all of the conserved quantities for the Euler equations of inviscid fluid flow, discussed in this previous post, by interpreting that flow as geodesic flow in an infinite dimensional manifold.

** — 1. Euler’s equations — **

The geometric setup for the geodesic interpretation of Euler’s equations of fluid flow is as follows. We will need two copies and of Euclidean space , with two different structures. Firstly, we will have Lagrangian space

which is (viewed as a smooth manifold), together with the standard volume form . This space should be thought of as the space of “labels” of the particles of the fluid, and its coordinates are known as *Lagrangian coordinates*. The symmetry group of this space is the group of orientation-preserving and volume-preserving diffeomorphisms.

Secondly, we will need the *Eulerian space*

which is the smooth manifold together with the Euclidean metric and the standard volume form . This space is the physical space of “positions” of the particles of the fluid. The symmetry group of this space is the group of orientation-preserving rigid motions of Euclidean space.

Let be the space of diffeomorphisms from Lagrangian space to Eulerian space that preserve volume and orientation; this can be viewed as an infinite-dimensional manifold. A single element of describes the positions of an incompressible fluid at a snapshot in time; an incompressible fluid flow is then described by a curve . The time derivative of such a curve can be viewed in Eulerian coordinates as the *velocity field*

If one then defines the Lagrangian

then one can show that the critical points of this Lagrangian (formally) correspond to solutions to the Euler equations using the correspondence (12): see this previous blog post for details.

Applying a volume-preserving change of coordinates, the Lagrangian can also be expressed as

There are then three types of symmetries that are evident for this Lagrangian: time symmetry; symmetry on the Eulerian space; and symmetry on the Lagrangian space.

We begin with time symmetry, , which comes from the fact that the Lagrangian does not depend explicitly on the time variable . As discussed before, this gives conservation of the Hamiltonian (10), which (formally, at least) becomes

thus giving the familiar energy conservation law for the Euler equations.

Now we use the symmetry group that acts on the Eulerian space , and hence on fluid flows . This is a pointwise symmetry of the Lagrangian, and formally gives conservation of total momentum

and total angular momentum

in exact analogy with the situation with the -body system. (Indeed, one can formally view the Euler equations as an limit of a certain family of -body systems, which is indeed how these equations are physically derived.)

Finally, we consider symmetries on the Lagrangian space . Any divergence-free vector field on gives a one-parameter group of volume-preserving, orientation-preserving diffeomorphisms on , which then act on fluid flows by the formula

This is a pointwise symmetry of the Lagrangian, with infinitesimal derivative

Applying (11), we thus (formally) conclude that the quantity

is conserved. We can write this in coordinates as

We can specialise this conservation law by working with specific choices of divergence-free vector field . For instance, suppose we have a closed loop , which we parameterise by unit speed: . For an infinitesimal , we can then create a divergence-free vector field by setting when lies in the (transverse) -neighbourhood of , and zero otherwise. It is geometrically obvious that this field is divergence-free (up to errors of ). The conserved quantity (13) is then equal to

up to lower order terms, so that the quantity

is conserved. Writing for the curve , we see from the chain rule that this is equal to

giving Kelvin’s circulation theorem.

More generally, we can generate a divergence-free vector field from an alternating -vector by taking a further divergence:

Integrating by parts, we can then write (13) as

since is an arbitrary alternating -tensor, we conclude that for each and , the quantity

is conserved in time. Writing and using the chain rule, this becomes

which after interchange of the indices may be rewritten in terms of the vorticity as

giving the pointwise conservation of the pullback of the vorticity in Lagrangian coordinates. (Of course, this is just the differential form of Kelvin’s circulation theorem; it also implies conservation of the vortex stream lines in Lagrangian coordinates.)

Finally, as the vorticity is divergence-free (when viewed as a polar vector field), the pullback is also. If we then set to be the vector field associated to the conserved quantity , the quantity (13) can then be rewritten as the helicity

which is then also conserved; a similar argument gives conservation of the helicity on any set that is the union of stream lines.

Remark 1The above Lagrangian mechanics calculations can also be recast into a Hamiltonian mechanics formalism; see for instance this paper of Olver for a Hamiltonian perspective on the conservation laws for the Euler equations.

Filed under: expository, math.AP, math.CA Tagged: calculus of variations, conservation laws, Euler-Arnold equation, incompressible Euler equations, Noether's theorem, symmetry

where is the velocity field, and is the pressure field. To avoid technicalities we will assume that both fields are smooth, and that is bounded. We will take the dimension to be at least two, with the three-dimensional case being of course especially interesting.

The Euler equations are the inviscid limit of the Navier-Stokes equations; as discussed in my previous post, one potential route to establishing finite time blowup for the latter equations when is to be able to construct “computers” solving the Euler equations, which generate smaller replicas of themselves in a noise-tolerant manner (as the viscosity term in the Navier-Stokes equation is to be viewed as perturbative noise).

Perhaps the most prominent obstacles to this route are the *conservation laws* for the Euler equations, which limit the types of final states that a putative computer could reach from a given initial state. Most famously, we have the conservation of energy

(assuming sufficient decay of the velocity field at infinity); thus for instance it would not be possible for a computer to generate a replica of itself which had greater total energy than the initial computer. This by itself is not a fatal obstruction (in this paper of mine, I constructed such a “computer” for an averaged Euler equation that still obeyed energy conservation). However, there are other conservation laws also, for instance in three dimensions one also has conservation of helicity

and (formally, at least) one has conservation of momentum

and angular momentum

(although, as we shall discuss below, due to the slow decay of at infinity, these integrals have to either be interpreted in a principal value sense, or else replaced with their vorticity-based formulations, namely impulse and moment of impulse). Total vorticity

is also conserved, although it turns out in three dimensions that this quantity vanishes when one assumes sufficient decay at infinity. Then there are the pointwise conservation laws: the vorticity and the volume form are both transported by the fluid flow, while the velocity field (when viewed as a covector) is transported up to a gradient; among other things, this gives the transport of vortex lines as well as Kelvin’s circulation theorem, and can also be used to deduce the helicity conservation law mentioned above. In my opinion, none of these laws actually prohibits a self-replicating computer from existing within the laws of ideal fluid flow, but they do significantly complicate the task of actually designing such a computer, or of the basic “gates” that such a computer would consist of.

Below the fold I would like to record and derive all the conservation laws mentioned above, which to my knowledge essentially form the complete set of known conserved quantities for the Euler equations. The material here (although not the notation) is drawn from this text of Majda and Bertozzi.

For reasons which may become clearer later, I will rewrite the Euler equations in the language of Riemannian geometry (in particular, using the abstract index notation of Penrose), and using the Euclidean metric on to raise and lower indices, and to define the covariant derivative through the Levi-Civita connection (which, in Cartesian coordinates, is just the usual partial derivative evaluated componentwise). The velocity field now is written as ; contracting against the metric gives a -form , which I will call the *covelocity*, and also write as . The Euler equations then become

In particular we have

which leads to the conservation of energy (1) upon integrating in space.

In the usual treatment of the Euler equations, it is common to introduce the material derivative

Here, we shall adopt the subtly different (but closely related) approach of using the *material Lie derivative*

where is the Lie derivative along the vector field . For scalar fields , the material Lie derivative is the same as the material derivative:

However, the two notions differ when applied to vector fields or forms, with the material Lie derivative having better covariance properties than the material derivative. When applied to vector fields , we have

and so

Similarly, for -forms , we have

and similarly for -forms we have

leading to similar formulae comparing and for forms.

Since , the material Lie derivative of the velocity field is just the time derivative:

The material Lie derivative of the *covelocity* field is however more interesting:

In particular, we see that the material Lie derivative of the covelocity is a gradient:

Since the integral of a gradient along any closed loop is zero, we obtain

Theorem 1 (Kelvin’s circulation theorem)Let be a time-dependent loop in which is transported by the flow (thus for any scalar function ). Then

Now we take an exterior derivative of the covelocity to obtain the *vorticity*

In abstract index notation, is the -form

As exterior derivatives commute with diffeomorphisms, they also commute with Lie derivatives, so in particular

Since was a gradient, its exterior derivative vanishes, and we thus have *transport of vorticity*:

(This fact was also interpreted as conservation of exterior momentum in this previous blog post.) This fact also follows from Kelvin’s circulation theorem, after first applying Stokes’ theorem to rewrite as for a spanning surface that is transported by the flow.

If we let be the usual volume -form on , then the divergence-free nature of (and the time-independent nature of ) implies that is also transported by the flow:

If we thus define the *polar vorticity* to be the -vector that is the Hodge star of with respect to this volume form, thus

for all -forms , then we see from (5), (6) that the polar vorticity is also transported by the flow:

In two dimensions , the polar vorticity is just a scalar, which by abuse of notation is also denoted (in coordinates, ), and (7) becomes the well-known transport of scalar vorticity:
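As a concrete check on the two-dimensional transport law (an illustrative aside), the classical Taylor–Green-type cellular flow is divergence-free and steady, and its scalar vorticity is transported trivially; both identities can be verified symbolically:

```python
import sympy as sp

x, y = sp.symbols('x y')
u = sp.sin(x) * sp.cos(y)        # Taylor-Green-type steady velocity field (toy example)
v = -sp.cos(x) * sp.sin(y)

div = sp.diff(u, x) + sp.diff(v, y)          # divergence (vanishes identically)
omega = sp.diff(v, x) - sp.diff(u, y)        # scalar vorticity: 2 sin(x) sin(y)
transport = u * sp.diff(omega, x) + v * sp.diff(omega, y)   # u . grad omega
```

Here the advection term u . grad omega vanishes identically, consistent with this flow being steady: the material derivative of the scalar vorticity is zero.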

In three dimensions , is a vector field which by abuse of notation is also denoted (in coordinates, ), and (7) becomes the well-known vorticity equation:

From (7) we also see that the vortex lines are transported by the flow; in fact we have the stronger statement that if is transported by the flow and obeys

at the initial time , then it continues to do so at all later times .

In three dimensions, we may contract the polar vorticity against the covelocity to obtain a scalar . We may then combine (7) and (4) to obtain

Now the exterior derivative of vanishes, so that is divergence-free, and so annihilates . We therefore conclude conservation of helicity (2). In fact we conclude the stronger statement that if is any time-dependent region in which is preserved by (i.e. it is the union of vortex lines) and is transported by the flow, then is conserved in time. This is consistent with Kelvin’s circulation theorem, since one can use Fubini’s theorem to compute the integral by first computing the integral of on each of the vortex lines in , and then integrating against on the space of vortex lines in (which is a two-dimensional space on which naturally descends to become an area form). All of these quantities are transported by the flow.
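As a concrete illustration of helicity (an aside, not part of the derivation above), the classical Arnold–Beltrami–Childress flow with unit coefficients is a Beltrami field — its vorticity equals its velocity — so its helicity density is just the speed squared. A spectral computation on the torus confirms this:

```python
import numpy as np

N = 32
xs = np.linspace(0, 2*np.pi, N, endpoint=False)
X, Y, Z = np.meshgrid(xs, xs, xs, indexing='ij')

A = B = C = 1.0                      # ABC flow coefficients (unit choice)
u = np.array([A*np.sin(Z) + C*np.cos(Y),
              B*np.sin(X) + A*np.cos(Z),
              C*np.sin(Y) + B*np.cos(X)])

k = np.fft.fftfreq(N, d=1.0/N)       # integer wavenumbers
KX, KY, KZ = np.meshgrid(k, k, k, indexing='ij')

def d(f, K):
    # spectral partial derivative of a periodic field
    return np.real(np.fft.ifftn(1j * K * np.fft.fftn(f)))

omega = np.array([d(u[2], KY) - d(u[1], KZ),     # curl of u
                  d(u[0], KZ) - d(u[2], KX),
                  d(u[1], KX) - d(u[0], KY)])

hel = np.mean(np.sum(u * omega, axis=0))         # mean helicity density
```

The curl reproduces the velocity field to rounding error, and the mean helicity density comes out to 3 (the sum of the squares of the three coefficients), as expected for this Beltrami flow.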

Finally, we consider the conservation of various moments of the velocity and vorticity. Here it is best to return to material derivatives instead of material Lie derivatives , basically because the flow along does not preserve the Euclidean metric or the flat connection , making the interchange of Lie derivatives with the integration of vector-valued quantities a little tricky.

Because we will be considering linear integrals of or rather than quadratic integrals, there can be some difficulty in ensuring absolute integrability of the integrals used; for instance, in three dimensions the Biot-Savart law suggests that could decay as slowly as , even if the vorticity is compactly supported. However, the vorticity transport equation (7) tells us (in any dimension) that if the vorticity is compactly supported at time zero, then it remains compactly supported at later times (with the support being transported by the flow). In practice, this means that we will be able to justify operations such as integration by parts if there is at least one factor of the vorticity present.

We begin with the total vorticity

which is well-defined as a -form thanks to the flat connection. Formally, if we write and integrate by parts, this vorticity should vanish; however, if has slow decay, then this is not necessarily the case. For instance, if is a smooth mollification of the 2D Biot-Savart kernel, then the total vorticity is one (times the standard -form). In three dimensions, though, there is a trick that allows one to establish vanishing of the total polar vorticity

and hence also the total vorticity. Namely, if is the scaling vector field , then

and integration by parts (now involving the compactly supported vorticity ) gives the required vanishing. An application of Fubini’s theorem then shows that the total vorticity also vanishes in four and higher dimensions.

In any dimension, though, the total vorticity (and hence also total polar vorticity) is conserved. Indeed, from (5) and (3) we have

where we have used the vanishing of the exterior derivative of vorticity, as well as the divergence-free nature of . This expresses as a total derivative

giving conservation of total vorticity.

Now we look at total velocity

which (up to a scaling factor representing the density of the incompressible fluid) has the physical interpretation as the total momentum of the fluid. We have

which *formally* suggests that total velocity is conserved. However, in practice usually decays too slowly to justify this calculation, unless one works in a suitable principal value sense. We shall take a different tack, noting that

Thus, *when has enough decay*, one has

however, the right-hand side remains well defined even when decays slowly, assuming that the vorticity is compactly supported. It is thus natural to then define the impulse

in three dimensions, this would be . The above considerations suggest that the impulse should be another conserved quantity, and indeed it is. To see this, we first compute using (8):

and so it will suffice to show that is also a total derivative. But it is:

Finally, we look at the total angular momentum

Again, we have

which as before formally suggests that total angular momentum should be conserved. As with total momentum, in practice the velocity field decays too slowly to justify this calculation, unless one works carefully with principal value integrals (and uses quite precise asymptotics on the decay of at infinity). Once again, one can avoid these technicalities by recasting this quantity in terms of vorticity. Using to denote antisymmetrisation in the indices, we observe that

and so we have

when there is sufficient decay of the velocity field. Again, the right-hand side makes sense whenever the vorticity is compactly supported. If we then define the *moment of impulse*

then we expect this quantity to also be conserved by the flow. This is indeed the case, and can be verified by a rather lengthy calculation similar to that used to establish conservation of impulse; we omit the details here as they are rather tedious and unenlightening, with a key step being the establishment of the fact that is a total derivative, by manipulating the identity (9).

Filed under: expository, math.AP Tagged: conservation laws, Euler equations, helicity, vorticity

either for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on can be found at the wiki page.

The focus is now on bounding unconditionally (in particular, without resorting to the Elliott-Halberstam conjecture or its generalisations). We can bound whenever one can find a symmetric square-integrable function supported on the simplex such that

Our strategy for establishing this has been to restrict to be a linear combination of symmetrised monomials (restricted of course to ), where the degree is small; actually, it seems convenient to work with the slightly different basis where the are restricted to be even. The criterion (1) then becomes a large quadratic program with explicit but complicated rational coefficients. This approach has lowered down to , which led to the bound .
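To give a flavour of these quadratic programs in miniature (with two variables and the single cutoff F = 1 - t_1 - t_2 — toy choices far simpler than the actual monomial basis used in the project), the integrals over the simplex and the ratio that the program maximises can be computed in closed form:

```python
import sympy as sp

t1, t2 = sp.symbols('t1 t2', nonnegative=True)
F = 1 - t1 - t2                      # toy symmetric cutoff on the simplex t1 + t2 <= 1

# "denominator": integral of F^2 over the simplex
I = sp.integrate(F**2, (t2, 0, 1 - t1), (t1, 0, 1))
# "numerator": square of the t2-marginal of F, integrated in t1
# (the t1-marginal contributes the same amount by symmetry)
J1 = sp.integrate(sp.integrate(F, (t2, 0, 1 - t1))**2, (t1, 0, 1))

rho = 2 * J1 / I                     # ratio of marginal integrals to the L^2 integral
```

Here I = 1/12, J1 = 1/20, and the ratio is 6/5; the actual computations in the project optimise the analogous ratio over a large basis of symmetrised monomials in many variables, which is what makes the quadratic programs so large.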

Actually, we know that the more general criterion

will suffice, whenever and is supported now on and obeys the vanishing marginal condition whenever . The latter is in particular obeyed when is supported on . A modification of the preceding strategy has lowered slightly to , giving the bound which is currently our best record.

However, the quadratic programs here have become extremely large and slow to run, and more efficient algorithms (or possibly more computer power) may be required to advance further.

Filed under: math.NA, polymath Tagged: polymath8

either for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on can be found at the wiki page.

The big news since the last thread is that we have managed to obtain the (sieve-theoretically) optimal bound of assuming the generalised Elliott-Halberstam conjecture (GEH), which pretty much closes off that part of the story. Unconditionally, our bound on is still . This bound was obtained using the “vanilla” Maynard sieve, in which the cutoff was supported in the original simplex , and only Bombieri-Vinogradov was used. In principle, we can enlarge the sieve support a little bit further now; for instance, we can enlarge to , but then have to shrink the J integrals to , provided that the marginals vanish for . However, we do not yet know how to numerically work with these expanded problems.

Given the substantial progress made so far, it looks like we are close to the point where we should declare victory and write up the results (though we should take one last look to see if there is any room to improve the bounds). There is actually a fair bit to write up:

- Improvements to the Maynard sieve (pushing beyond the simplex, the epsilon trick, and pushing beyond the cube);
- Asymptotic bounds for and hence ;
- Explicit bounds for (using the Polymath8a results)
- ;
- on GEH (and parity obstructions to any further improvement).

I will try to create a skeleton outline of such a paper in the Polymath8 Dropbox folder soon. It shouldn’t be nearly as big as the Polymath8a paper, but it will still be quite sizeable.

Filed under: math.NT, polymath Tagged: polymath8

The first purpose is to announce the uploading of the paper “New equidistribution estimates of Zhang type, and bounded gaps between primes” by D.H.J. Polymath, which is the main output of the Polymath8a project on bounded gaps between primes, to the arXiv, and to describe the main results of this paper below the fold.

The second purpose is to roll over the previous thread on all remaining Polymath8a-related matters (e.g. updates on the submission status of the paper) to a fresh thread. (Discussion of the ongoing Polymath8b project is however being kept on a separate thread, to try to reduce confusion.)

The final purpose of this post is to coordinate the writing of a retrospective article on the Polymath8 experience, which has been solicited for the Newsletter of the European Mathematical Society. I suppose that this could encompass both the Polymath8a and Polymath8b projects, even though the second one is still ongoing (but I think we will soon be entering the endgame there). I think there would be two main purposes of such a retrospective article. The first one would be to tell a story about the *process* of conducting mathematical research, rather than just describe the *outcome* of such research; this is an important aspect of the subject which is given almost no attention in most mathematical writing, and it would be good to be able to capture some sense of this process while memories are still relatively fresh. The other would be to draw some tentative conclusions with regards to what the strengths and weaknesses of a Polymath project are, and how appropriate such a format would be for mathematical problems other than bounded gaps between primes. In my opinion, the bounded gaps problem had some fairly unique features that made it particularly amenable to a Polymath project, such as (a) a high level of interest amongst the mathematical community in the problem; (b) a very focused objective (“improve $H_1$!”), which naturally provided an obvious metric to measure progress; (c) the modular nature of the project, which allowed for people to focus on one aspect of the problem only, and still make contributions to the final goal; and (d) a very reasonable level of ambition (for instance, we did not attempt to prove the twin prime conjecture, which in my opinion would make a terrible Polymath project at our current level of mathematical technology). This is not an exhaustive list of helpful features of the problem; I would welcome other diagnoses of the project by other participants.

With these two objectives in mind, I propose a format for the retrospective article consisting of a brief introduction to the polymath concept in general and the polymath8 project in particular, followed by a collection of essentially independent contributions by different participants on their own experiences and thoughts. Finally, we could have a conclusion section in which we make some general remarks on the polymath project (such as the remarks above). I’ve started a Dropbox subfolder for this article (currently in a very skeletal outline form only), and will begin writing a section on my own experiences; other participants are of course encouraged to add their own sections (it is probably best to create separate files for these, and then input them into the main file retrospective.tex, to reduce edit conflicts). If there are participants who wish to contribute but do not currently have access to the Dropbox folder, please email me and I will try to have you added (or else you can supply your thoughts by email, or in the comments to this post; we may have a section for shorter miscellaneous comments from more casual participants, for people who don’t wish to write a lengthy essay on the subject).

As for deadlines, the EMS Newsletter would like a submitted article by mid-April in order to make the June issue, but in the worst case, it will just be held over until the issue after that.

** — 1. Description of Polymath8a results — **

Let $H_1$ denote the quantity

$\displaystyle H_1 := \liminf_{n \rightarrow \infty} (p_{n+1} - p_n),$

where $p_n$ denotes the $n^{th}$ prime. Thus for instance the notorious twin prime conjecture is equivalent to the claim that $H_1 = 2$. However, even establishing the finiteness of $H_1$ unconditionally was an open problem until the celebrated work of Zhang last year, who established the bound

$\displaystyle H_1 \le 70{,}000{,}000.$
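To see these gaps concretely, here is a quick Python sketch (the helper name `primes_up_to` is ours, not anything from the papers) that lists the starting points of the prime gaps of size 2 below 200; the twin prime conjecture asserts that this list continues forever:

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes up to n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, is_p in enumerate(sieve) if is_p]

ps = primes_up_to(200)
# Starting points of consecutive prime gaps of size exactly 2; the twin
# prime conjecture is the claim that this pattern recurs infinitely often,
# i.e. that H_1 = 2.
twin_starts = [p for p, q in zip(ps, ps[1:]) if q - p == 2]
print(twin_starts[:5])  # [3, 5, 11, 17, 29]
```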

Zhang’s argument, which built upon earlier work of Goldston, Pintz, and Yildirim, can be summarised as follows. For any natural number $k$, define an *admissible $k$-tuple* to be a tuple $(h_1,\dots,h_k)$ of increasing integers which avoids at least one residue class modulo $p$ for each prime $p$. For instance, $(0,2,6)$ is an admissible $3$-tuple, but $(0,2,4)$ is not (it meets every residue class modulo $3$). The Hardy-Littlewood prime tuples conjecture asserts that if $(h_1,\dots,h_k)$ is an admissible $k$-tuple, then there exist infinitely many $n$ such that $n+h_1,\dots,n+h_k$ are simultaneously prime. This conjecture is currently out of reach for any $k \ge 2$; for instance, the case when $k=2$ and the tuple is $(0,2)$ is the twin prime conjecture. However, Zhang was able to prove a weaker claim, which we call $DHL[k]$, for sufficiently large $k$. Specifically (following the notation of Pintz), let $DHL[k]$ denote the assertion that given any admissible $k$-tuple $(h_1,\dots,h_k)$, there exist infinitely many $n$ such that *at least two* of the $n+h_1,\dots,n+h_k$ are prime. It is easy to see that if $DHL[k]$ holds and $(h_1,\dots,h_k)$ is an admissible $k$-tuple, then $H_1 \le h_k - h_1$. So to bound $H_1$, it suffices to show that $DHL[k]$ holds for some $k$, and then find as narrow an admissible $k$-tuple as possible.
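Admissibility is easy to check mechanically. The following short sketch (the helper name `is_admissible` is ours) verifies the two examples above; note that only primes $p \le k$ need to be tested, since $k$ residues can never cover all $p$ classes when $p > k$:

```python
def is_admissible(tup):
    """Check that the tuple avoids at least one residue class mod p for
    every prime p. Only p <= len(tup) can be an obstruction, since k
    residues cannot occupy all p residue classes when p > k."""
    k = len(tup)
    for p in range(2, k + 1):
        if all(p % d for d in range(2, p)):  # p is prime (trial division)
            if len({h % p for h in tup}) == p:  # every class mod p is hit
                return False
    return True

print(is_admissible((0, 2, 6)))  # True
print(is_admissible((0, 2, 4)))  # False: hits 0, 2, 1 mod 3
```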

Zhang was able to obtain $DHL[k_0]$ for $k_0 = 3{,}500{,}000$, and then took the first $k_0$ primes larger than $k_0$ to be the admissible $k_0$-tuple, observing that this tuple had diameter at most $7 \times 10^7$. (Actually, it has a slightly smaller diameter, as observed by Trudgian.) The earliest phase of the Polymath8a project consisted of using increasingly sophisticated methods to search for narrow admissible tuples of a given cardinality; in the case of this particular $k_0$, we were able to find an admissible tuple whose diameter was smaller still. On the other hand, an application of the large sieve inequalities shows that admissible $k$-tuples must asymptotically have diameter at least $(\frac{1}{2}+o(1)) k \log k$ (and we conjecture that the narrowest $k$-tuple in fact has diameter $(1+o(1)) k \log k$), so there is a definite limit to how much one can improve the bound on $H_1$ purely from finding ever narrower admissible tuples. (As part of the Polymath8a project, a database of narrow tuples was set up here (and is still accepting submissions), building upon previous data of Engelsma.)
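Zhang's construction of an admissible tuple can be sketched in a few lines (the helper names `small_primes` and `zhang_tuple` are ours, shown here for a toy value of $k$ rather than $3{,}500{,}000$):

```python
def small_primes(n):
    """Primes up to n by trial division (adequate for a toy demo)."""
    ps = []
    for m in range(2, n + 1):
        if all(m % p for p in ps if p * p <= m):
            ps.append(m)
    return ps

def zhang_tuple(k):
    """Zhang-style admissible k-tuple: the first k primes exceeding k.
    Each element is coprime to every prime p <= k, so the residue class
    0 mod p is avoided for all such p, making the tuple admissible."""
    candidates = [p for p in small_primes(20 * k) if p > k]
    return tuple(candidates[:k])

t = zhang_tuple(10)
print(t, "diameter:", t[-1] - t[0])
```

This is far from optimal; the point of the Polymath8a tuple searches was precisely that much narrower admissible tuples exist than this simple construction provides.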

To make further progress, one has to analyse how the result $DHL[k_0]$ is proven. Here, Zhang follows the arguments of Goldston, Pintz, and Yildirim, which are based on constructing a non-negative sieve function $\nu$, supported on (say) the interval $[x, 2x]$ for a large $x$, such that the sum

$\displaystyle \sum_{x \le n \le 2x} \nu(n) \ \ \ \ \ (1)$

has good upper bounds, and the sums

$\displaystyle \sum_{x \le n \le 2x} \nu(n) 1_{n+h_i \hbox{ prime}} \ \ \ \ \ (2)$

have good lower bounds for $i = 1,\dots,k_0$. Provided that the ratio between the sum over $i$ of the lower bounds in (2) and the upper bound in (1) exceeds $1$, one can then easily deduce $DHL[k_0]$ (essentially from the pigeonhole principle: some $n$ must then have at least two of the $n+h_i$ prime).
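The pigeonhole step can be made explicit with a one-line computation (a sketch in the notation above):

```latex
% If the sums over shifted primes collectively exceed the plain sum, then
\sum_{x \le n \le 2x} \nu(n) \Big( \sum_{i=1}^{k} 1_{n + h_i \text{ prime}} - 1 \Big)
  \;=\; \sum_{i=1}^{k} \sum_{x \le n \le 2x} \nu(n)\, 1_{n + h_i \text{ prime}}
  \;-\; \sum_{x \le n \le 2x} \nu(n) \;>\; 0.
% Since \nu \ge 0, a positive total forces a positive summand, i.e. some
% n in [x, 2x] with \sum_i 1_{n + h_i \text{ prime}} \ge 2.
```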

One then needs to find a good choice of $\nu$, which on the one hand is simple enough that the sums (1), (2) can be bounded rigorously, but on the other hand is sophisticated enough that one gets a good ratio between (2) and (1). Goldston, Pintz, and Yildirim eventually settled on a choice essentially of the form

$\displaystyle \nu(n) := \Big( \sum_{d | (n+h_1) \dots (n+h_k);\ d \le R} \mu(d) \log^{k+l} \frac{R}{d} \Big)^2$

for some auxiliary parameter $R$ (a small power of $x$) and some $l \ge 0$; this is a variant of the Selberg sieve. With this choice, they were already able to establish upper bounds on $H_1$ as strong as $H_1 \le 16$ on the Elliott-Halberstam conjecture, which asserts that

$\displaystyle \sum_{q \le x^\theta} \sup_{a: (a,q)=1} \Big| \pi(x; q, a) - \frac{\pi(x)}{\phi(q)} \Big| \ll \frac{x}{\log^A x} \ \ \ \ \ (3)$

for all $\theta < 1$ and $A > 0$, and to obtain the weaker result $\liminf_{n \to \infty} (p_{n+1}-p_n)/\log p_n = 0$ without this conjecture. Furthermore, any nontrivial progress on the Elliott-Halberstam conjecture (beyond what is provided by the Bombieri-Vinogradov theorem, which covers the case $\theta < 1/2$) would give some finite bound on $H_1$.
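The shape of this weight can be seen numerically. Here is a toy Python sketch (the name `gpy_weight` and the parameters $R = 50$, $l = 1$ are ours, chosen purely for illustration); the weight is noticeably larger at $n = 11$, where $(11, 13, 17)$ are all prime, than at $n = 9$, whose shifts have many small factors:

```python
from math import log

def mobius(n):
    """Mobius function mu(n) by trial factorisation."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # square factor: mu vanishes
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result

def gpy_weight(n, H=(0, 2, 6), R=50, l=1):
    """Toy GPY/Selberg-type weight:
    nu(n) = ( sum_{d | P(n), d <= R} mu(d) * log^{k+l}(R/d) )^2,
    where P(n) = (n+h_1)...(n+h_k)."""
    k = len(H)
    P = 1
    for h in H:
        P *= n + h
    s = sum(mobius(d) * log(R / d) ** (k + l)
            for d in range(1, R + 1) if P % d == 0)
    return s * s

# The weight concentrates on n whose shifts n+h_i have few small factors.
print(gpy_weight(11) > gpy_weight(9))  # True
```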

Even after all the recent progress on bounded gaps, we still do not have any direct progress on the Elliott-Halberstam conjecture (3) for any $\theta > 1/2$. However, Zhang (and independently, Motohashi and Pintz) observed that one does not need the full strength of (3) in order to obtain the conclusions of Goldston-Pintz-Yildirim. Firstly, one does not need all residue classes $a$ here, but only those classes that are the roots of a certain polynomial. Secondly, one does not need all moduli $q$ here, but can restrict attention to *smooth* (or *friable*) moduli, that is, moduli with no large prime factors, as the error incurred by ignoring all other moduli turns out to be exponentially small. With these caveats, Zhang was able to obtain a restricted form of (3) with $\theta$ slightly larger than $1/2$, which he then used to obtain $DHL[k_0]$ with $k_0$ as small as $3{,}500{,}000$.

Actually, Zhang’s treatment of the truncation error is not optimal, and by being more careful here (and by relaxing the requirement of smooth moduli to the less stringent requirement of “densely divisible” moduli) we were able to reduce $k_0$ substantially. Furthermore, by replacing the monomial cutoff in the Selberg sieve with a more flexible cutoff $f$ and then optimising in $f$ (a computation first made in unpublished work of Conrey, and then in the paper of Farkas, Pintz, and Revesz, with the optimal $f$ turning out to come from a Bessel function), one could reduce $k_0$ further still, leading to a corresponding improvement in the bound on $H_1$.

To go beyond this, we had to unpack Zhang’s proof of (a weakened version of) the Elliott-Halberstam type bound (3). His approach follows a well-known sequence of papers by Bombieri, Fouvry, Friedlander, and Iwaniec on various restricted breakthroughs beyond the Bombieri-Vinogradov barrier, although with the key difference that Zhang did not use automorphic form techniques, which (at our current level of understanding) are almost entirely restricted to the regime where the residue class $a$ is fixed (as opposed to varying amongst the roots of a polynomial modulo the modulus $q$, which is what is needed for the current application). However, the remaining steps are familiar: first one uses the Heath-Brown identity to decompose (a variant of) the expression in (3) into some simpler bilinear and trilinear sums, which Zhang called “Type I”, “Type II”, and “Type III” (though one should caution that these are slightly different from the “Type I” and “Type II” sums arising from Vaughan-type identities). The Type I and Type II sums turn out to be treatable by a careful combination of the Cauchy-Schwarz inequality (as embodied in tools such as the dispersion method of Linnik), the Polya-Vinogradov method of completion of sums, and estimates on one-dimensional exponential sums (which are variants of Kloosterman sums) that can ultimately be handled by the Riemann hypothesis for curves over finite fields, first established by Weil (and which can in this particular context also be proven by the elementary method of Stepanov). The Type III sums can be treated by a variant of these methods, except that one-dimensional exponential sum estimates are insufficient; Zhang instead needed to turn to the three-dimensional exponential sum estimates of Birch and Bombieri to get an adequate amount of cancellation, and these estimates ultimately arise from the deep work of Deligne on the Riemann hypothesis for higher-dimensional varieties (see this previous blog post for a discussion of these hypotheses).
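The square-root cancellation that Weil's theorem provides for Kloosterman sums is easy to observe numerically. Here is a small Python sanity check (the helper name `kloosterman` is ours) of the Weil bound $|K(a,b;p)| \le 2\sqrt{p}$:

```python
import cmath
import math

def kloosterman(a, b, p):
    """Kloosterman sum K(a,b;p) = sum_{x=1}^{p-1} e((a*x + b*x^{-1})/p),
    where x^{-1} is the inverse of x modulo the prime p and e(t) = exp(2*pi*i*t)."""
    e = lambda t: cmath.exp(2j * cmath.pi * t)
    # pow(x, p-2, p) is the modular inverse of x, by Fermat's little theorem.
    return sum(e((a * x + b * pow(x, p - 2, p)) / p) for x in range(1, p))

p = 101
K = kloosterman(3, 7, p)
# Weil's bound (the Riemann hypothesis for curves over finite fields):
# the p-1 unit-vector terms cancel down to size at most 2*sqrt(p).
print(abs(K) <= 2 * math.sqrt(p))  # True
```

The sum is also real, since the terms for $x$ and $p - x$ are complex conjugates.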

In our work, we were able to improve the Cauchy-Schwarz components of these arguments in a number of ways, with the most significant gain coming from applying the “$q$-van der Corput $A$-process” of Graham and Ringrose to the Type I sums; we also have a slightly different way to handle the Type III sums (based on a recent preprint of Fouvry, Kowalski, and Michel), relying on correlations of hyper-Kloosterman sums (again coming from Deligne’s work), which gives significantly better results for these sums (so much so, in fact, that the Type III sums are no longer the dominant obstruction to further improvement of the numerology). Putting all these computations together, we can stretch Zhang’s improvement to Bombieri-Vinogradov by about an order of magnitude, in the sense that the gain $\theta - \frac{1}{2}$ over the Bombieri-Vinogradov exponent is now roughly ten times larger than in Zhang’s work. This leads to a value of $k_0$ as low as $632$, which in turn leads to the bound $H_1 \le 4680$. These latter bounds have since been improved by Maynard and by Polymath8b, mostly through significant improvements to the sieve-theoretic part of the argument (no longer using any distributional result on the primes beyond the Bombieri-Vinogradov theorem), but the distribution result of Polymath8a is still the best distribution result known on the primes, and may well have other applications beyond the bounded gaps problem.

Interestingly, the $q$-van der Corput $A$-process is strong enough, in fact, that we can still get non-trivial bounds of (weakened versions of) the form (3) even if we do not attempt to estimate the Type III sums, so in particular we can obtain a Zhang-type distribution theorem without using Deligne’s theorems, with $\theta$ still strictly larger than $1/2$ (though by a somewhat smaller margin).

Filed under: math.NT, paper Tagged: polymath8