Trivially, one has $|A| \leq 3^n$ for any capset $A \subset {\bf F}_3^n$. Using Fourier methods (and the density increment argument of Roth), the bound of $|A| \leq O(3^n/n)$ was obtained by Meshulam, and improved only as late as 2012 to $O(3^n/n^{1+c})$ for some absolute constant $c>0$ by Bateman and Katz. But in a very recent breakthrough, Ellenberg (and independently Gijswijt) obtained the exponentially superior bound $|A| \leq O(2.756^n)$, using a version of the polynomial method recently introduced by Croot, Lev, and Pach. (In the converse direction, a construction of Edel gives capsets as large as $(2.2174)^n$.) Given the success of the polynomial method in superficially similar problems such as the finite field Kakeya problem (discussed in this previous post), it was natural to wonder whether this method could be applicable to the cap set problem (see for instance this MathOverflow comment of mine on this from 2010), but it took a surprisingly long time before Croot, Lev, and Pach were able to identify the precise variant of the polynomial method that would actually work here.

The proof of the capset bound is very short (Ellenberg’s and Gijswijt’s preprints are both 3 pages long, and Croot-Lev-Pach is 6 pages), but I thought I would present a slight reformulation of the argument which treats the three points on a line in ${\bf F}_3^n$ symmetrically (as opposed to treating the third point differently from the first two, as is done in the Ellenberg and Gijswijt papers; Croot-Lev-Pach also treat the middle point of a three-term arithmetic progression differently from the two endpoints, although this is a very natural thing to do in their context of $({\bf Z}/4{\bf Z})^n$). The basic starting point is this: if $A \subset {\bf F}_3^n$ is a capset, then one has the identity

$\displaystyle \delta_{0^n}(x+y+z) = \sum_{a \in A} \delta_a(x) \delta_a(y) \delta_a(z) \ \ \ \ \ (1)$

for all $(x,y,z) \in A^3$, where $\delta_a(x) := 1_{a=x}$ is the Kronecker delta function, which we view as taking values in ${\bf F}_3$. Indeed, (1) reflects the fact that the equation $x+y+z=0$ has solutions precisely when $x,y,z$ are either all equal, or form a line, and the latter is ruled out precisely when $A$ is a capset.
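The identity (1) can be sanity-checked by brute force in small dimension. The following is a minimal Python sketch, not taken from any of the papers mentioned above; it assumes the identity reads $\delta_{0^n}(x+y+z) = \sum_{a \in A} \delta_a(x) \delta_a(y) \delta_a(z)$ for $x,y,z \in A$, and the helper names `add3`, `is_capset`, `identity_holds` as well as the two sample sets in ${\bf F}_3^2$ are hypothetical choices made purely for illustration.

```python
from itertools import product


def add3(*vecs):
    """Coordinatewise sum of vectors over F_3."""
    return tuple(sum(coords) % 3 for coords in zip(*vecs))


def is_capset(A):
    """True if A contains no three distinct points summing to zero in F_3^n."""
    points = set(A)
    for x, y in product(points, repeat=2):
        if x != y:
            # The third point on the line through x and y is -(x + y).
            z = tuple((-c) % 3 for c in add3(x, y))
            if z in points:
                return False
    return True


def identity_holds(A):
    """Check delta_0(x+y+z) == sum_a delta_a(x) delta_a(y) delta_a(z) on A^3."""
    zero = (0,) * len(A[0])
    for x, y, z in product(A, repeat=3):
        lhs = 1 if add3(x, y, z) == zero else 0
        rhs = sum(1 for a in A if a == x == y == z)
        if lhs != rhs:
            return False
    return True


square = [(0, 0), (0, 1), (1, 0), (1, 1)]  # hypothetical capset in F_3^2
line = [(0, 0), (1, 1), (2, 2)]            # a full line, hence not a capset

print(is_capset(square), identity_holds(square))
print(is_capset(line), identity_holds(line))
```

On the line, the identity fails at the triple of distinct collinear points, which is exactly the obstruction described above.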

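To exploit (1), we will show that the left-hand side of (1) is “low rank” in some sense, while the right-hand side is “high rank”. Recall that a function $F: A \times A \to {\bf F}$ taking values in a field ${\bf F}$ is of *rank one* if it is non-zero and of the form $(x,y) \mapsto f(x) g(y)$ for some $f,g: A \to {\bf F}$, and that the rank of a general function $F: A \times A \to {\bf F}$ is the least number of rank one functions needed to express $F$ as a linear combination. More generally, if $k \geq 2$, we define the *rank* of a function $F: A^k \to {\bf F}$ to be the least number of “rank one” functions of the form

$\displaystyle (x_1,\dots,x_k) \mapsto f(x_i) g(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k)$

for some $i=1,\dots,k$ and some functions $f: A \to {\bf F}$, $g: A^{k-1} \to {\bf F}$, that are needed to generate $F$ as a linear combination. For instance, when $k=3$, the rank one functions take the form $(x,y,z) \mapsto f(x) g(y,z)$, $(x,y,z) \mapsto f(y) g(x,z)$, $(x,y,z) \mapsto f(z) g(x,y)$, and linear combinations of $r$ such rank one functions will give a function of rank at most $r$.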

It is a standard fact in linear algebra that the rank of a diagonal matrix is equal to the number of non-zero entries. This phenomenon extends to higher dimensions:

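Lemma 1 (Rank of diagonal hypermatrices) Let $k \geq 2$, let $A$ be a finite set, let ${\bf F}$ be a field, and for each $a \in A$, let $c_a \in {\bf F}$ be a coefficient. Then the rank of the function

$\displaystyle (x_1,\dots,x_k) \mapsto \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) \ \ \ \ \ (2)$

is equal to the number of non-zero coefficients $c_a$.

*Proof:* We induct on $k$. As mentioned above, the case $k=2$ follows from standard linear algebra, so suppose now that $k>2$ and the claim has already been proven for $k-1$.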

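It is clear that the function (2) has rank at most equal to the number of non-zero $c_a$ (since the summands on the right-hand side are rank one functions), so it suffices to establish the lower bound. By deleting from $A$ those elements $a \in A$ with $c_a=0$ (which cannot increase the rank), we may assume without loss of generality that all the $c_a$ are non-zero. Now suppose for contradiction that (2) has rank at most $|A|-1$; then we obtain a representation

$\displaystyle \sum_{a \in A} c_a \delta_a(x_1) \dots \delta_a(x_k) = \sum_{i=1}^k \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) \ \ \ \ \ (3)$

for some sets $I_1,\dots,I_k$ of cardinalities adding up to at most $|A|-1$, and some functions $f_{i,\alpha}: A \to {\bf F}$ and $g_{i,\alpha}: A^{k-1} \to {\bf F}$.

Consider the space of functions $h: A \to {\bf F}$ that are orthogonal to all the $f_{k,\alpha}$, $\alpha \in I_k$, in the sense that

$\displaystyle \sum_{x \in A} f_{k,\alpha}(x) h(x) = 0$

for all $\alpha \in I_k$. This space is a vector space whose dimension $d$ is at least $|A| - |I_k|$. A basis of this space generates a $d \times |A|$ coordinate matrix of full rank, which implies that there is at least one non-singular $d \times d$ minor. This implies that there exists a function $h: A \to {\bf F}$ in this space which is non-zero on some subset $A'$ of $A$ of cardinality at least $|A|-|I_k|$.

If we multiply (3) by $h(x_k)$ and sum in $x_k$, we conclude that

$\displaystyle \sum_{a \in A} c_a h(a) \delta_a(x_1) \dots \delta_a(x_{k-1}) = \sum_{i=1}^{k-1} \sum_{\alpha \in I_i} f_{i,\alpha}(x_i) \tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1})$

where

$\displaystyle \tilde g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_{k-1}) := \sum_{x_k \in A} g_{i,\alpha}(x_1,\dots,x_{i-1},x_{i+1},\dots,x_k) h(x_k).$

The right-hand side has rank at most $|A|-1-|I_k|$, since the summands are rank one functions. On the other hand, from the induction hypothesis the left-hand side has rank at least $|A|-|I_k|$ (as $c_a h(a)$ is non-zero for every $a \in A'$), giving the required contradiction.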

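On the other hand, we have the following (symmetrised version of a) beautifully simple observation of Croot, Lev, and Pach:

Lemma 2 On ${\bf F}_3^n$, the rank of the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ is at most $3N$, where

$\displaystyle N := \sum_{a,b,c \geq 0: a+b+c=n, b+2c \leq 2n/3} \frac{n!}{a!b!c!}.$

*Proof:* Using the identity $\delta_0(x) = 1 - x^2$ for $x \in {\bf F}_3$, we have

$\displaystyle \delta_{0^n}(x+y+z) = \prod_{i=1}^n (1 - (x_i+y_i+z_i)^2).$

The right-hand side is clearly a polynomial of degree $2n$ in $x,y,z$, which is then a linear combination of monomials

$\displaystyle x_1^{i_1} \dots x_n^{i_n} y_1^{j_1} \dots y_n^{j_n} z_1^{k_1} \dots z_n^{k_n}$

with $i_1,\dots,i_n,j_1,\dots,j_n,k_1,\dots,k_n \in \{0,1,2\}$ and

$\displaystyle i_1 + \dots + i_n + j_1 + \dots + j_n + k_1 + \dots + k_n \leq 2n.$

In particular, from the pigeonhole principle, at least one of $i_1+\dots+i_n$, $j_1+\dots+j_n$, $k_1+\dots+k_n$ is at most $2n/3$.

Consider the contribution of the monomials for which $i_1+\dots+i_n \leq 2n/3$. We can regroup this contribution as

$\displaystyle \sum_\alpha f_\alpha(x) g_\alpha(y,z)$

where $\alpha$ ranges over those $(i_1,\dots,i_n) \in \{0,1,2\}^n$ with $i_1+\dots+i_n \leq 2n/3$, $f_\alpha$ is the monomial

$\displaystyle f_\alpha(x_1,\dots,x_n) := x_1^{i_1} \dots x_n^{i_n}$

and $g_\alpha: {\bf F}_3^n \times {\bf F}_3^n \to {\bf F}_3$ is some explicitly computable function whose exact form will not be of relevance to our argument. The number of such $\alpha$ is equal to $N$, so this contribution has rank at most $N$. The remaining contributions arising from the cases $j_1+\dots+j_n \leq 2n/3$ and $k_1+\dots+k_n \leq 2n/3$ similarly have rank at most $N$ (grouping the monomials so that each monomial is only counted once), so the claim follows.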
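To get a feel for the size of the monomial count in the proof above, here is a short Python sketch (an illustration only, not part of the original argument). It assumes that the quantity being counted is the number of exponent tuples $(i_1,\dots,i_n) \in \{0,1,2\}^n$ with $i_1+\dots+i_n \leq 2n/3$, computes it both via the multinomial sum over $a+b+c=n$, $b+2c \leq 2n/3$ and by direct enumeration, and also checks the identity $\delta_0(x) = 1 - x^2$ on ${\bf F}_3$. The function names are hypothetical.

```python
from itertools import product
from math import factorial


def count_multinomial(n):
    """Sum of n!/(a! b! c!) over a + b + c = n with b + 2c <= 2n/3."""
    total = 0
    for b in range(n + 1):
        for c in range(n + 1 - b):
            if 3 * (b + 2 * c) <= 2 * n:  # integer form of b + 2c <= 2n/3
                a = n - b - c
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total


def count_bruteforce(n):
    """Number of exponent tuples in {0,1,2}^n of total degree <= 2n/3."""
    return sum(1 for e in product(range(3), repeat=n) if 3 * sum(e) <= 2 * n)


def delta_identity_holds():
    """Check that 1 - x^2 equals the Kronecker delta at 0 on F_3."""
    return all(((1 - x * x) % 3) == (1 if x == 0 else 0) for x in range(3))


for n in (3, 6, 30, 60):
    N = count_multinomial(n)
    print(n, 3 * N, 3 ** n)
```

For very small $n$ the quantity $3N$ exceeds the trivial bound $3^n$, so the rank bound only becomes useful once $n$ is moderately large.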

Upon restricting from $({\bf F}_3^n)^3$ to $A^3$, the rank of $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ is still at most $3N$. The two lemmas then combine to give the Ellenberg-Gijswijt bound

$\displaystyle |A| \leq 3N.$

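All that remains is to compute the asymptotic behaviour of $N$. This can be done using the general tool of Cramer’s theorem, but can also be derived from Stirling’s formula (discussed in this previous post). Indeed, if $a = (\alpha+o(1)) n$, $b = (\beta+o(1)) n$, $c = (\gamma+o(1)) n$ for some $\alpha,\beta,\gamma \geq 0$ summing to $1$, Stirling’s formula gives

$\displaystyle \frac{n!}{a!b!c!} = \exp( n (h(\alpha,\beta,\gamma) + o(1)) )$

where $h$ is the entropy function

$\displaystyle h(\alpha,\beta,\gamma) = \alpha \log \frac{1}{\alpha} + \beta \log \frac{1}{\beta} + \gamma \log \frac{1}{\gamma}.$

We then have

$\displaystyle N = \exp( n (X + o(1)) )$

where $X$ is the maximum entropy $h(\alpha,\beta,\gamma)$ subject to the constraints

$\displaystyle \alpha,\beta,\gamma \geq 0; \quad \alpha+\beta+\gamma=1; \quad \beta+2\gamma \leq 2/3.$

A routine Lagrange multiplier computation shows that the maximum occurs when

$\displaystyle \alpha = \frac{32}{3(15+\sqrt{33})}, \quad \beta = \frac{4(\sqrt{33}-1)}{3(15+\sqrt{33})}, \quad \gamma = \frac{(\sqrt{33}-1)^2}{6(15+\sqrt{33})}$

and $h(\alpha,\beta,\gamma)$ is approximately $1.013455$, giving rise to the claimed bound of $O(2.756^n)$.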
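The Lagrange multiplier computation can be reproduced numerically. The sketch below is only an illustration under stated assumptions: I take the problem to be the maximisation of the entropy $h(\alpha,\beta,\gamma)$ subject to $\alpha+\beta+\gamma=1$ and $\beta+2\gamma \leq 2/3$, use the standard fact that the constrained maximiser has the form $\alpha:\beta:\gamma = 1:t:t^2$, and solve the active constraint $\beta+2\gamma=2/3$, which reduces to $4t^2+t-2=0$. A coarse grid search (the hypothetical helper `grid_max`) serves as an independent check.

```python
from math import exp, log, sqrt


def entropy(p):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(q * log(q) for q in p if q > 0)


# Lagrange multipliers give alpha : beta : gamma = 1 : t : t^2, and the
# active constraint beta + 2*gamma = 2/3 becomes 4 t^2 + t - 2 = 0.
t = (-1 + sqrt(33)) / 8
Z = 1 + t + t * t
alpha, beta, gamma = 1 / Z, t / Z, t * t / Z
max_entropy = entropy((alpha, beta, gamma))
growth_rate = exp(max_entropy)


def grid_max(steps=300):
    """Maximum entropy over a grid of feasible (alpha, beta, gamma)."""
    best = 0.0
    for i in range(steps + 1):          # beta = i / steps
        for j in range(steps + 1 - i):  # gamma = j / steps
            if 3 * (i + 2 * j) <= 2 * steps:
                b, c = i / steps, j / steps
                best = max(best, entropy((1 - b - c, b, c)))
    return best


print(alpha, beta, gamma)
print(max_entropy, growth_rate, grid_max())
```

The grid value stays just below the closed-form maximum, consistent with the constraint $\beta+2\gamma \leq 2/3$ being active at the optimum.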

Remark 3 As noted in the Ellenberg and Gijswijt papers, the above argument extends readily to other fields ${\bf F}$ than ${\bf F}_3$ to control the maximal size of a subset of ${\bf F}^n$ that has no non-trivial solutions to the equation $ax+by+cz=0$, where $a,b,c \in {\bf F}$ are non-zero constants that sum to zero. Of course one replaces the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ in Lemma 2 by $(x,y,z) \mapsto \delta_{0^n}(ax+by+cz)$ in this case.

Remark 4 This symmetrised formulation suggests that one possible way to improve slightly on the numerical quantity $2.756$ is to find a more efficient way to decompose $\delta_{0^n}(x+y+z)$ into rank one functions; however, I was not able to do so (though such improvements are reminiscent of the Strassen type algorithms for fast matrix multiplication).

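Remark 5 It is tempting to see if this method can get non-trivial upper bounds for sets $A$ with no length $4$ progressions, in (say) ${\bf F}_5^n$. One can run the above arguments, replacing the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ with

$\displaystyle (x,y,z,w) \mapsto \delta_{0^n}(x-2y+z) \delta_{0^n}(y-2z+w);$

this leads to the bound $|A| \leq 4N$ where

$\displaystyle N := \sum_{a,b,c,d,e \geq 0: a+b+c+d+e=n, b+2c+3d+4e \leq 2n} \frac{n!}{a!b!c!d!e!}.$

Unfortunately, $N$ is asymptotic to $\frac{1}{2} 5^n$ and so this bound is in fact slightly worse than the trivial bound $|A| \leq 5^n$! However, there is a slim chance that there is a more efficient way to decompose $\delta_{0^n}(x-2y+z) \delta_{0^n}(y-2z+w)$ into rank one functions that would give a non-trivial bound on $|A|$. I experimented with a few possible such decompositions but unfortunately without success.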

Remark 6 Return now to the capset problem. Since Lemma 1 is valid for any field ${\bf F}$, one could perhaps hope to get better bounds by viewing the Kronecker delta function $\delta$ as taking values in another field than ${\bf F}_3$, such as the complex numbers ${\bf C}$. However, as soon as one works in a field of characteristic other than $3$, one can adjoin a cube root $\omega$ of unity, and one now has the Fourier decomposition

$\displaystyle \delta_{0^n}(x+y+z) = \frac{1}{3^n} \sum_{\xi \in {\bf F}_3^n} \omega^{\xi \cdot x} \omega^{\xi \cdot y} \omega^{\xi \cdot z}.$

Moving to the Fourier basis, we conclude from Lemma 1 that the function $(x,y,z) \mapsto \delta_{0^n}(x+y+z)$ on ${\bf F}_3^n$ now has rank exactly $3^n$, and so one cannot improve upon the trivial bound of $|A| \leq 3^n$ by this method using fields of characteristic other than three as the range field. So it seems one has to stick with ${\bf F}_3$ (or the algebraic closure thereof).

Thanks to Jordan Ellenberg and Ben Green for helpful discussions.

Filed under: expository, math.CO, math.RA Tagged: cap sets, Dion Gijswijt, Ernie Croot, Jordan Ellenberg, Peter Pach, polynomial method, Seva Lev

as was demonstrated, where was any positive integer and denoted the Liouville function. The proof proceeded using a method I call the “entropy decrement argument”, which ultimately reduced matters to establishing a bound of the form

whenever was a slowly growing function of . This was in turn established in a previous paper of Matomaki, Radziwill, and myself, using the recent breakthrough of Matomaki and Radziwill.
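For readers who want to experiment, here is a small Python sketch that computes a logarithmically averaged correlation of the Liouville function numerically. It is only an illustration: I am assuming the quantity of interest is $\frac{1}{\log x} \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n}$ for non-negative shifts $h_1,\dots,h_k$, and the function names `liouville_table` and `log_averaged_correlation` are hypothetical. A finite computation of this kind of course proves nothing about the limit.

```python
from math import log


def liouville_table(limit):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= limit."""
    spf = list(range(limit + 1))  # smallest prime factor of each n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    if limit >= 1:
        lam[1] = 1
    for n in range(2, limit + 1):
        # Removing one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam


def log_averaged_correlation(x, shifts):
    """(1/log x) * sum_{n <= x} lam(n+h_1)...lam(n+h_k) / n, shifts >= 0."""
    lam = liouville_table(x + max(shifts))
    total = 0.0
    for n in range(1, x + 1):
        term = 1
        for h in shifts:
            term *= lam[n + h]
        total += term / n
    return total / log(x)


for x in (10 ** 3, 10 ** 4, 10 ** 5):
    print(x, log_averaged_correlation(x, (0, 1)))
```

The trivial bound on this normalised sum is $1 + 1/\log x$, attained up to lower order terms when all shifts coincide in pairs, so any decay observed for distinct shifts reflects genuine cancellation.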

It is natural to see to what extent the arguments can be adapted to attack the higher-point cases of the logarithmically averaged Chowla conjecture (ignoring for this post the more general Elliott conjecture for other bounded multiplicative functions than the Liouville function). That is to say, one would like to prove that

$\displaystyle \sum_{n \leq x} \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = o(\log x)$

as $x \to \infty$ for any fixed distinct integers $h_1,\dots,h_k$. As it turns out (and as is detailed in the current paper), the entropy decrement argument extends to this setting (after using some known facts about linear equations in primes), and allows one to reduce the above estimate to an estimate of the form

for a slowly growing function of and some fixed (in fact we can take for ), where is the (normalised) local Gowers uniformity norm. (In the case , , this becomes the Fourier-uniformity conjecture discussed in this previous post.) If one then applied the (now proven) inverse conjecture for the Gowers norms, this estimate is in turn equivalent to the more complicated looking assertion

where the supremum is over all possible choices of *nilsequences* of controlled step and complexity (see the paper for definitions of these terms).

The main novelty in the paper (elaborating upon a previous comment I had made on this blog) is to observe that this latter estimate in turn follows from the logarithmically averaged form of Sarnak’s conjecture (discussed in this previous post), namely that

whenever is a zero entropy (i.e. deterministic) sequence. Morally speaking, this follows from the well-known fact that nilsequences have zero entropy, but the presence of the supremum in (1) means that we need a little bit more; roughly speaking, we need the *class* of nilsequences of a given step and complexity to have “uniformly zero entropy” in some sense.

On the other hand, it was already known (see previous post) that the Chowla conjecture implied the Sarnak conjecture, and similarly for the logarithmically averaged form of the two conjectures. Putting all these implications together, we obtain the pleasant fact that the logarithmically averaged Sarnak and Chowla conjectures are equivalent, which is the main result of the current paper. There have been a large number of special cases of the Sarnak conjecture worked out (when the deterministic sequence involved came from a special dynamical system), so these results can now also be viewed as partial progress towards the Chowla conjecture (at least with logarithmic averaging). However, my feeling is that the full resolution of these conjectures will not come from these sorts of special cases; instead, conjectures like the Fourier-uniformity conjecture in this previous post look more promising to attack.

It would also be nice to get rid of the pesky logarithmic averaging, but this seems to be an inherent requirement of the entropy decrement argument method, so one would probably have to find a way to avoid that argument if one were to remove the log averaging.

Filed under: math.NT, paper Tagged: Chowla conjecture, entropy decrement argument, Sarnak conjecture

Suppose we are in the classical (Kolmogorov) framework of probability theory, in which one has a probability space $(\Omega, {\mathcal F}, {\bf P})$ representing all possible states $\omega \in \Omega$. One can make a distinction between *deterministic* quantities $x$ that do not depend on the state $\omega$, and *random* variables (or stochastic variables) $X = X(\omega)$ that do depend (in some measurable fashion) on the state $\omega$. (As discussed in this previous post, it is often helpful to adopt a perspective that suppresses the sample space $\Omega$ as much as possible, but we will not do so for the current discussion.)

One can visualise the distinction as follows. If I pick a deterministic integer between and , say , then this fixes the value of for the rest of the discussion:

.

However, if I pick a *random* integer uniformly from (e.g. by rolling a fair die), one can think of as a quantity that keeps changing as one flips from one state to the next:

.

Here, I have “faked” the randomness by looping together a finite number of images, each of which is depicting one of the possible values could take. As such, one may notice that the above image eventually repeats in an endless loop. One could presumably write some more advanced code to render a more random-looking sequence of ‘s, but the above imperfect rendering should hopefully suffice for the sake of illustration.

Here is a (“faked” rendering of) a random variable that also takes values in , but is non-uniformly distributed, being more biased towards smaller values than larger values:

.

For continuous random variables, taking values for instance in with some distribution (e.g. uniform in a square, multivariate gaussian, etc.) one could display these random variables as a rapidly changing dot wandering over ; if one lets some “afterimages” of previous dots linger for some time on the screen, one can begin to see the probability density function emerge in the animation. This is unfortunately beyond my ability to quickly whip up as an image; but if someone with a bit more programming skill is willing to do so, I would be very happy to see the result :).

The operation of conditioning to an event corresponds to ignoring all states in the sample space outside of the event. For instance, if one takes the previous random variable , and conditions to the event , one gets the conditioned random variable

.

One can use the animation to help illustrate concepts such as independence or correlation. If we revert to the unconditioned random variable

and let $Y$ be an independently sampled uniform random variable from $\{1,\dots,6\}$, one can sum the variables together to create a new random variable $X+Y$, ranging in $\{2,\dots,12\}$:

(In principle, the above images should be synchronised, so that the value of stays the same from line to line at any given point in time. Unfortunately, due to internet lag, caching, and other web artefacts, you may experience an unpleasant delay between the two. Closing the page, clearing your cache and returning to the page may help.)

If on the other hand one defines the random variable $Y$ to be $Y = 7-X$, then $Y$ has the same distribution as $X$ (they are both uniformly distributed on $\{1,\dots,6\}$), but now there is a very strong correlation between $X$ and $Y$, leading to completely different behaviour for $X+Y$:

.
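The contrast between the independent and the correlated case can also be computed exactly rather than animated. The following Python sketch enumerates the sample space directly; it is an illustration under assumptions, namely that $X$ is uniform on $\{1,\dots,6\}$, that the independent $Y$ is also uniform on $\{1,\dots,6\}$, and that the correlated variable is $Y = 7-X$ (one natural choice with the same distribution as $X$). The helper name `distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = list(range(1, 7))  # outcomes of a fair die


def distribution(outcomes):
    """Exact law of a variable given its values on equally likely states."""
    counts = Counter(outcomes)
    return {v: Fraction(c, len(outcomes)) for v, c in sorted(counts.items())}


# Independent case: the sample space consists of all 36 pairs (x, y).
independent_sum = distribution([x + y for x, y in product(FACES, FACES)])

# Correlated case (assumed form Y = 7 - X): the state is determined by x.
correlated_y = distribution([7 - x for x in FACES])
correlated_sum = distribution([x + (7 - x) for x in FACES])

print(independent_sum)
print(correlated_y)
print(correlated_sum)
```

In the independent case the sum is spread over $\{2,\dots,12\}$ with a peak at $7$, whereas in the correlated case the sum is deterministic even though each summand individually is uniformly distributed.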

Filed under: expository, math.PR Tagged: random variables

A (real) manifold $M$ can be defined in at least two ways. On one hand, one can define the manifold *extrinsically*, as a subset of some standard space such as a Euclidean space ${\bf R}^d$. On the other hand, one can define the manifold *intrinsically*, as a topological space equipped with an atlas of coordinate charts. The fundamental *embedding theorems* show that, under reasonable assumptions, the intrinsic and extrinsic approaches give the same classes of manifolds (up to isomorphism in various categories). For instance, we have the following (special case of) the Whitney embedding theorem:

Theorem 1 (Whitney embedding theorem) Let $M$ be a compact manifold. Then there exists an embedding $u: M \to {\bf R}^d$ from $M$ to a Euclidean space ${\bf R}^d$.

In fact, if $M$ is $n$-dimensional, one can take $d$ to equal $2n$, which is often best possible (easy examples include the circle ${\bf R}/{\bf Z}$ which embeds into ${\bf R}^2$ but not ${\bf R}^1$, or the Klein bottle that embeds into ${\bf R}^4$ but not ${\bf R}^3$). One can also relax the compactness hypothesis on $M$ to second countability, but we will not pursue this extension here. We give a “cheap” proof of this theorem below the fold which allows one to take $d$ equal to $2n+1$.

A significant strengthening of the Whitney embedding theorem is (a special case of) the Nash embedding theorem:

Theorem 2 (Nash embedding theorem) Let $(M,g)$ be a compact Riemannian manifold. Then there exists an isometric embedding $u: M \to {\bf R}^d$ from $M$ to a Euclidean space ${\bf R}^d$.

In order to obtain the isometric embedding, the dimension $d$ has to be a bit larger than what is needed for the Whitney embedding theorem; in this article of Gunther the bound

$\displaystyle d = \max( n(n+5)/2, n(n+3)/2 + 5 ) \ \ \ \ \ (1)$

is attained, which I believe is still the record for large $n$. (In the converse direction, one cannot do better than $d = \frac{n(n+1)}{2}$, basically because this is the number of degrees of freedom in the Riemannian metric $g$.) Nash’s original proof of the theorem used what is now known as the Nash-Moser inverse function theorem, but a subsequent simplification of Gunther allowed one to proceed using just the ordinary inverse function theorem (in Banach spaces).

I recently had the need to invoke the Nash embedding theorem to establish a blowup result for a nonlinear wave equation, which motivated me to go through the proof of the theorem more carefully. Below the fold I give a proof of the theorem that does not attempt to give an optimal value of , but which hopefully isolates the main ideas of the argument (as simplified by Gunther). One advantage of not optimising in is that it allows one to freely exploit the very useful tool of *pairing* together two maps , to form a combined map that can be closer to an embedding or an isometric embedding than the original maps . This lets one perform a “divide and conquer” strategy in which one first starts with the simpler problem of constructing some “partial” embeddings of and then pairs them together to form a “better” embedding.

In preparing these notes, I found the articles of Deane Yang and of Siyuan Lu to be helpful.

** — 1. The Whitney embedding theorem — **

To prove the Whitney embedding theorem, we first prove a weaker version in which the embedding is replaced by an immersion:

Theorem 3 (Weak Whitney embedding theorem) Let $M$ be a compact manifold. Then there exists an immersion $u: M \to {\bf R}^d$ from $M$ to a Euclidean space ${\bf R}^d$.

*Proof:* Our objective is to construct a map such that the derivatives are linearly independent in for each . For any given point , we have a coordinate chart from some neighbourhood of to . If we set to be multiplied by a suitable cutoff function supported near , we see that is an immersion in a neighbourhood of . Pairing together finitely many of the and using compactness, we obtain the claim.

Now we upgrade the immersion from the above theorem to an embedding by further use of pairing. First observe that as is smooth and compact, an embedding is nothing more than an immersion that is injective. Let be an immersion. Let be the set of pairs of distinct points such that ; note that this set is compact since is an immersion (and so there is no failure of injectivity when is near the diagonal). If is empty then is injective and we are done. If contains a point , then by pairing with some scalar function that separates and , we can replace by another immersion (in one higher dimension ) such that a neighbourhood of and a neighbourhood of get mapped to disjoint sets, thus effectively removing an open neighbourhood of from . Repeating these procedures finitely many times, using the compactness of , we end up with an immersion which is injective, giving the Whitney embedding theorem.

At present, the embedding of an -dimensional compact manifold could be extremely high dimensional. However, if , then it is possible to project from to by the random projection trick (discussed in this previous post). Indeed, if one picks a random element of the unit sphere, and then lets be the (random) orthogonal projection to the hyperplane orthogonal to , then it is geometrically obvious that will remain an embedding unless either is of the form for some distinct , or lies in the tangent plane to at for some . But the set of all such excluded is of dimension at most (using, for instance, the Hausdorff notion of dimension), and so for almost every in will avoid this set. Thus one can use these projections to cut the dimension down by one for ; iterating this observation we can end up with the final value of for the Whitney embedding theorem.

Remark 4 The Whitney embedding theorem for $d=2n$ is more difficult to prove. Using the random projection trick, one can arrive at an immersion $u: M \to {\bf R}^{2n}$ which is injective except at a finite number of “double points” where $u(M)$ meets itself transversally (think of projecting a knot in ${\bf R}^3$ randomly down to ${\bf R}^2$). One then needs to “push” the double points out of existence using a device known as the “Whitney trick”.

** — 2. Reduction to a local isometric embedding theorem — **

We now begin the proof of the Nash embedding theorem. In this section we make a series of reductions that reduce the “global” problem of isometric embedding a compact manifold to a “local” problem of turning a near-isometric embedding of a torus into a true isometric embedding.

We first make a convenient (though not absolutely necessary) reduction: in order to prove Theorem 2, it suffices to do so in the case when is a torus (equipped with some metric which is not necessarily flat). Indeed, if is not a torus, we can use the Whitney embedding theorem to embed (non-isometrically) into some Euclidean space , which by rescaling and then quotienting out by lets one assume without loss of generality that is some submanifold of a torus equipped with some metric . One can then use a smooth version of the Tietze extension theorem to extend the metric smoothly from to all of ; this extended metric will remain positive definite in some neighbourhood of , so by using a suitable (smooth) partition of unity and taking a convex combination of with the flat metric on , one can find another extension of to that remains positive definite (and symmetric) on all of , giving rise to a Riemannian torus . Any isometric embedding of this torus into will induce an isometric embedding of the original manifold , completing the reduction.

The main advantage of this reduction to the torus case is that it gives us a global system of (periodic) coordinates on , so that we no longer need to work with local coordinate charts. Also, one can easily use Fourier analysis on the torus to verify the ellipticity properties of the Laplacian that we will need later in the proof. These are however fairly minor conveniences, and it would not be difficult to continue the argument below without having first reduced to the torus case.

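Henceforth our manifold $M$ is assumed to be the torus $({\bf R}/{\bf Z})^n$ equipped with a Riemannian metric $g = g_{ij}$, where the indices $i,j$ run from $1$ to $n$. Our task is to find an injective map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^d$ which is isometric in the sense that it obeys the system of partial differential equations

$\displaystyle \partial_i u \cdot \partial_j u = g_{ij}$

for $i,j=1,\dots,n$, where $\cdot$ denotes the usual dot product on ${\bf R}^d$. Let us write this equation as

$\displaystyle Q(u) = g$

where $Q(u)$ is the symmetric tensor

$\displaystyle Q(u)_{ij} := \partial_i u \cdot \partial_j u.$

The operator $Q$ is a nonlinear differential operator, but it behaves very well with respect to pairing:

$\displaystyle Q( (u_1,u_2) ) = Q(u_1) + Q(u_2). \ \ \ \ \ (2)$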

We can use (2) to obtain a number of very useful reductions (at the cost of worsening the eventual value of , which as stated in the introduction we will not be attempting to optimise). First we claim that we can drop the injectivity requirement on , that is to say it suffices to show that every Riemannian metric on is of the form for some map into some Euclidean space . Indeed, suppose that this were the case. Let be any (not necessarily isometric) embedding (the existence of which is guaranteed by the Whitney embedding theorem; alternatively, one can use the usual exponential map to embed into ). For small enough, the map is short in the sense that pointwise in the sense of symmetric tensors (or equivalently, the map is a contraction from to ). For such an , we can write for some Riemannian metric . If we then write for some (not necessarily injective) map , then from (2) we see that ; since inherits its injectivity from the component map , this gives the desired isometric embedding.

Call a metric on *good* if it is of the form for some map into a Euclidean space . Our task is now to show that every metric is good; the relation (2) tells us that the sum of any two good metrics is good.

In order to make the local theory work later, it will be convenient to introduce the following notion: a map is said to be *free* if, for every point , the vectors , and the vectors , are all linearly independent; equivalently, given a further map , there are no dependencies whatsoever between the scalar functions , and , . Clearly, a free map into is only possible for , and this explains the bulk of the formula (1) of the best known value of .

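For any natural number $n$, the “Veronese embedding” $\iota: {\bf R}^n \to {\bf R}^{n + \frac{n(n+1)}{2}}$ defined by

$\displaystyle \iota(x_1,\dots,x_n) := ( (x_i)_{1 \leq i \leq n}, (x_i x_j)_{1 \leq i \leq j \leq n} )$

can easily be verified to be free. From this, one can construct a free map $u: ({\bf R}/{\bf Z})^n \to {\bf R}^{m + \frac{m(m+1)}{2}}$ by starting with an arbitrary immersion $\phi: ({\bf R}/{\bf Z})^n \to {\bf R}^m$ and composing it with the Veronese embedding (the fact that the composition is free will follow after several applications of the chain rule).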
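The freeness of the Veronese map can be checked mechanically at any given point. The sketch below is a hedged illustration: it assumes the Veronese map is $x \mapsto ((x_i)_{1 \leq i \leq n}, (x_i x_j)_{1 \leq i \leq j \leq n})$, writes down its first and second partial derivatives by hand, and verifies with exact rational Gaussian elimination that these $n + n(n+1)/2$ vectors are linearly independent. The function names (`veronese_derivatives`, `rank`, `is_free_at`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def veronese_derivatives(x):
    """First and second partials of x -> ((x_i), (x_a x_b)_{a<=b}) at x."""
    n = len(x)
    pairs = list(combinations_with_replacement(range(n), 2))
    first = []
    for i in range(n):
        linear = [1 if a == i else 0 for a in range(n)]
        # d/dx_i (x_a x_b) = [a == i] x_b + [b == i] x_a
        quad = [(x[b] if a == i else 0) + (x[a] if b == i else 0)
                for a, b in pairs]
        first.append(linear + quad)
    second = []
    for i, j in pairs:
        # d^2/dx_i dx_j (x_a x_b) = [a==i][b==j] + [b==i][a==j]
        quad = [(1 if (a, b) == (i, j) else 0) + (1 if (b, a) == (i, j) else 0)
                for a, b in pairs]
        second.append([0] * n + quad)
    return first, second


def rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            if factor != 0:
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_free_at(x):
    """True if all first and second partials at x are linearly independent."""
    first, second = veronese_derivatives(x)
    vectors = first + second
    return rank(vectors) == len(vectors)


print(is_free_at((0, 0, 0)), is_free_at((1, -2, 3)))
```

The reason the check succeeds at every point is visible in the structure of the vectors: the second derivatives are supported on the quadratic block, where they form a diagonal system, while the first derivatives carry an identity matrix in the linear block.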

Given a Riemannian metric , one can find a free map which is *short* in the sense that , by taking an arbitrary free map and scaling it down by some small scaling factor . This gives us a decomposition

for some Riemannian metric .

The metric is clearly good, so by (2) it would suffice to show that is good. What is easy to show is that is *approximately good*:

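Proposition 5 Let $g$ be a Riemannian metric on $({\bf R}/{\bf Z})^n$. Then there exists a smooth symmetric tensor $h$ on $({\bf R}/{\bf Z})^n$ with the property that $g + \varepsilon^2 h$ is good for every $\varepsilon > 0$.

*Proof:* Roughly speaking, the idea here is to use “tightly wound spirals” to capture various “rank one” components of the metric $g$, the point being that if a map $u$ “oscillates” at some high frequency $\xi \in {\bf R}^n$ with some “amplitude” $A$, then $Q(u)$ is approximately equal to the rank one tensor $A^2 \xi_i \xi_j$. The argument here is related to the technique of *convex integration*, which among other things leads to one way to establish the $h$-principle of Gromov.

By the spectral theorem, every positive definite tensor $g_{ij}$ can be written as a positive linear combination of symmetric rank one tensors $v_i v_j$ for some vector $v \in {\bf R}^n$. By adding some additional rank one tensors if necessary, one can make this decomposition stable, in the sense that any nearby tensor is also a positive linear combination of the $v_i v_j$. One can think of $v_i$ as the gradient $\partial_i \psi$ of some linear function $\psi$. Using compactness and a smooth partition of unity, one can then arrive at a decomposition

$\displaystyle g_{ij} = \sum_{m=1}^M (\eta^{(m)})^2 \partial_i \psi^{(m)} \partial_j \psi^{(m)}$

for some finite $M$, some smooth scalar functions $\psi^{(m)}, \eta^{(m)}$ (one can take $\psi^{(m)}$ to be linear functions on small coordinate charts, and $\eta^{(m)}$ to basically be cutoffs to these charts).

For any $\varepsilon > 0$ and $1 \leq m \leq M$, consider the “spiral” map $u^{(m)}_\varepsilon: ({\bf R}/{\bf Z})^n \to {\bf R}^2$ defined by

$\displaystyle u^{(m)}_\varepsilon(x) := \varepsilon \eta^{(m)}(x) ( \cos( \psi^{(m)}(x)/\varepsilon ), \sin( \psi^{(m)}(x)/\varepsilon ) ).$

Direct computation shows that

$\displaystyle Q(u^{(m)}_\varepsilon)_{ij} = (\eta^{(m)})^2 \partial_i \psi^{(m)} \partial_j \psi^{(m)} + \varepsilon^2 \partial_i \eta^{(m)} \partial_j \eta^{(m)}$

and the claim follows by summing in $m$ (using (2)) and taking $h_{ij} := \sum_{m=1}^M \partial_i \eta^{(m)} \partial_j \eta^{(m)}$.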
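The “direct computation” for the spiral map is easy to test numerically in one variable. The sketch below is an illustration under assumptions: I take the spiral to be $u_\varepsilon(x) = \varepsilon \eta(x) (\cos(\psi(x)/\varepsilon), \sin(\psi(x)/\varepsilon))$, so that in one dimension $Q(u_\varepsilon) = |u_\varepsilon'|^2$ should equal $\eta^2 (\psi')^2 + \varepsilon^2 (\eta')^2$. The particular amplitude `eta`, phase `psi` and value of `EPS` are hypothetical test choices, and periodicity is ignored since the check is purely local.

```python
from math import cos, sin

EPS = 0.05  # hypothetical small parameter


def eta(x):
    """Hypothetical smooth amplitude."""
    return 1.0 + 0.5 * sin(x)


def d_eta(x):
    return 0.5 * cos(x)


def psi(x):
    """Hypothetical smooth phase."""
    return 2.0 * x + 0.3 * cos(x)


def d_psi(x):
    return 2.0 - 0.3 * sin(x)


def spiral(x, eps=EPS):
    """The spiral map u_eps(x) = eps * eta(x) * (cos(psi/eps), sin(psi/eps))."""
    radius = eps * eta(x)
    angle = psi(x) / eps
    return (radius * cos(angle), radius * sin(angle))


def q_numeric(x, eps=EPS, step=1e-6):
    """|u'(x)|^2 estimated by central differences."""
    a0, b0 = spiral(x - step, eps)
    a1, b1 = spiral(x + step, eps)
    du = ((a1 - a0) / (2 * step), (b1 - b0) / (2 * step))
    return du[0] ** 2 + du[1] ** 2


def q_predicted(x, eps=EPS):
    """eta^2 psi'^2 + eps^2 eta'^2, the claimed closed form."""
    return (eta(x) * d_psi(x)) ** 2 + (eps * d_eta(x)) ** 2


for x in (0.0, 0.7, 2.0, 4.5):
    print(x, q_numeric(x), q_predicted(x))
```

The cross terms cancel because the radial direction $(\cos, \sin)$ and the angular direction $(-\sin, \cos)$ are orthogonal, which is why the error term is exactly of size $\varepsilon^2$ rather than merely small.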

The claim then reduces to the following local (perturbative) statement, that shows that the property of being good is stable around a free map:

Theorem 6 (Local embedding) Let be a free map. Then is good for all symmetric tensors sufficiently close to zero in the topology.

Indeed, assuming Theorem 6, and with as in Proposition 5, we have good for small enough. By (2) and Proposition 5, we then have good, as required.

The remaining task is to prove Theorem 6. This is a problem in perturbative PDE, to which we now turn.

** — 3. Proof of local embedding — **

We are given a free map and a small tensor . It will suffice to find a perturbation of that solves the PDE

We can expand the left-hand side and cancel off to write this as

where the symmetric tensor-valued first-order linear operator is defined (in terms of the fixed free map ) as

To exploit the free nature of , we would like to write the operator in terms of the inner products and . After some rearranging using the product rule, we arrive at the representation

Among other things, this allows for a way to right-invert the underdetermined linear operator . As is free, we can use Cramer’s rule to find smooth maps for (with ) that are dual to in the sense that

where denotes the Kronecker delta. If one then defines the linear zeroth-order operator from symmetric tensors to maps by the formula

then direct computation shows that for any sufficiently regular . As a consequence of this, one could try to use the ansatz and transform the equation (3) to the fixed point equation

One can hope to solve this equation by standard perturbative techniques, such as the inverse function theorem or the contraction mapping theorem, hopefully exploiting the smallness of to obtain the required contraction. Unfortunately we run into a fundamental *loss of derivatives problem*, in that the quadratic differential operator loses a degree of regularity, and this loss is not recovered by the operator (which has no smoothing properties).

We know of two ways around this difficulty. The original argument of Nash used what is now known as the Nash-Moser iteration scheme to overcome the loss of derivatives by replacing the simple iterative scheme used in the contraction mapping theorem with a much more rapidly convergent scheme that generalises Newton’s method; see this previous blog post for a similar idea. The other way out, due to Gunther, is to observe that can be factored as

where is a *zeroth order* quadratic operator , so that (3) can be written instead as

and using the right-inverse , it now suffices to solve the equation

(compare with (4)), which can be done perturbatively if is indeed zeroth order (e.g. if it is bounded on Hölder spaces such as ).

It remains to achieve the desired factoring (5). We can bilinearise as , where

The basic point is that when is much higher frequency than , then

which can be approximated by applied to some quantity relating to the vector field ; similarly if is much higher frequency than . One can formalise these notions of “much higher frequency” using the machinery of paraproducts, but one can proceed in a slightly more elementary fashion by using the Laplacian operator and its (modified) inverse operator (which is easily defined on the torus using the Fourier transform, and has good smoothing properties) as a substitute for the paraproduct calculus. We begin by writing

The dangerous term here is . Using the product rule and symmetry, we can write

The second term will be “lower order” in that it only involves second derivatives of , rather than third derivatives. As for the higher order term , the main contribution will come from the terms where is higher frequency than (since the Laplacian accentuates high frequencies and dampens low frequencies, as can be seen by inspecting the Fourier symbol of the Laplacian). As such, we can profitably use the approximation (7) here. Indeed, from the product rule we have

Putting all this together, we obtain the decomposition

where

and

If we then use Cramer’s rule to create smooth functions dual to the in the sense that

then we obtain the desired factorisation (5) with

Note that is the smoothing operator applied to quadratic expressions of up to two derivatives of . As such, one can use elliptic (Schauder) estimates to show that is Lipschitz continuous in the Hölder spaces for (with the Lipschitz constant being small when has small norm); this together with the contraction mapping theorem in the Banach space is already enough to solve the equation (6) in this space if is small enough. This is not quite enough because we also need to be smooth; but it is possible (using Schauder estimates and product Hölder estimates) to establish bounds of the form

for any (with implied constants depending on but independent of ), which can be used (for small enough) to show that the solution constructed by the contraction mapping principle lies in for any (by showing that the iterates used in the construction remain bounded in these norms), and is thus smooth.

Filed under: expository, math.AP, math.DG, math.MG Tagged: Nash embedding theorem, Riemannian geometry, Whitney embedding theorem

As of today, the “L-functions and modular forms database” is out of beta and open to the public; at present the database is mostly geared towards specialists in computational number theory, but will hopefully develop into a more broadly useful resource over time. An article by John Cremona summarising the purpose of the database can be found here.

(Thanks to Andrew Sutherland and Kiran Kedlaya for the information.)

Filed under: advertising, math.NT Tagged: L-function, Langlands program, modular forms

Some of these topologies are much stronger than others (in that they contain many more open sets, or equivalently that they have many fewer convergent sequences and nets). However, even the weakest topologies used in analysis (e.g. convergence in distributions) tend to be Hausdorff, since this at least ensures the uniqueness of limits of sequences and nets, which is a fundamentally useful feature for analysis. On the other hand, some Hausdorff topologies used are “better” than others in that many more analysis tools are available for those topologies. In particular, topologies that come from Banach space norms are particularly valued, as such topologies (and their attendant norm and metric structures) grant access to many convenient additional results such as the Baire category theorem, the uniform boundedness principle, the open mapping theorem, and the closed graph theorem.

Of course, most topologies placed on a vector space will not come from Banach space norms. For instance, if one takes the space of continuous functions on that converge to zero at infinity, the topology of uniform convergence comes from a Banach space norm on this space (namely, the uniform norm ), but the topology of pointwise convergence does not; and indeed all the other usual modes of convergence one could use here (e.g. convergence, locally uniform convergence, convergence in measure, etc.) do not arise from Banach space norms.

I recently realised (while teaching a graduate class in real analysis) that the closed graph theorem provides a quick explanation for why Banach space topologies are so rare:

Proposition 1 Let $(V, {\mathcal F})$ be a Hausdorff topological vector space. Then, up to equivalence of norms, there is at most one norm $\|\cdot\|$ one can place on $V$ so that $(V, \|\cdot\|)$ is a Banach space whose topology is at least as strong as ${\mathcal F}$. In particular, there is at most one topology stronger than ${\mathcal F}$ that comes from a Banach space norm.

*Proof:* Suppose one had two norms $\|\cdot\|_1, \|\cdot\|_2$ on $V$ such that $(V, \|\cdot\|_1)$ and $(V, \|\cdot\|_2)$ were both Banach spaces with topologies stronger than ${\mathcal F}$. Now consider the graph of the identity function $\mathrm{id}: V \to V$ from the Banach space $(V, \|\cdot\|_1)$ to the Banach space $(V, \|\cdot\|_2)$. This graph is closed; indeed, if $(x_n, x_n)$ is a sequence in this graph that converged in the product topology to $(x,y)$, then $x_n$ converges to $x$ in $\|\cdot\|_1$ norm and hence in ${\mathcal F}$, and similarly $x_n$ converges to $y$ in $\|\cdot\|_2$ norm and hence in ${\mathcal F}$. But limits are unique in the Hausdorff topology ${\mathcal F}$, so $x = y$. Applying the closed graph theorem (see also previous discussions on this theorem), we see that the identity map is continuous from $(V, \|\cdot\|_1)$ to $(V, \|\cdot\|_2)$; similarly for the inverse. Thus the norms $\|\cdot\|_1, \|\cdot\|_2$ are equivalent as claimed.

By using various generalisations of the closed graph theorem, one can generalise the above proposition to Fréchet spaces, or even to F-spaces. The proposition can fail if one drops the requirement that the norms be stronger than a specified Hausdorff topology; indeed, if $V$ is infinite dimensional, one can use a Hamel basis of $V$ to construct a linear bijection on $V$ that is unbounded with respect to a given Banach space norm $\|\cdot\|$, and which can then be used to give an inequivalent Banach space structure on $V$.

One can interpret Proposition 1 as follows: once one equips a vector space with some “weak” (but still Hausdorff) topology, there is a *canonical* choice of “strong” topology one can place on that space that is stronger than the “weak” topology but arises from a Banach space structure (or at least a Fréchet or F-space structure), provided that at least one such structure exists. In the case of function spaces, one can usually use the topology of convergence in distribution as the “weak” Hausdorff topology for this purpose, since this topology is weaker than almost all of the other topologies used in analysis. This helps justify the common practice of describing a Banach or Fréchet function space just by giving the set of functions that belong to that space (e.g. ${\mathcal S}({\bf R}^d)$ is the space of Schwartz functions on ${\bf R}^d$) without bothering to specify the precise topology to serve as the “strong” topology, since it is usually understood that one is using the canonical such topology (e.g. the Fréchet space structure on ${\mathcal S}({\bf R}^d)$ given by the usual Schwartz space seminorms).

Of course, there are still some topological vector spaces which have no “strong topology” arising from a Banach space at all. Consider for instance the space $c_c({\bf N})$ of finitely supported sequences. A weak, but still Hausdorff, topology to place on this space is the topology of pointwise convergence. But there is no norm $\|\cdot\|$ stronger than this topology that makes this space a Banach space. For, if there were, then letting $e_1, e_2, \dots$ be the standard basis of $c_c({\bf N})$, the series $\sum_{n=1}^\infty 2^{-n} e_n / \|e_n\|$ would have to converge in $\|\cdot\|$, and hence pointwise, to an element of $c_c({\bf N})$, but the only available pointwise limit for this series lies outside of $c_c({\bf N})$. But I do not know if there is an easily checkable criterion to test whether a given vector space (equipped with a Hausdorff “weak” topology) can be equipped with a stronger Banach space (or Fréchet space or $F$-space) topology.
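To spell out the obstruction in this last example, here is a sketch of the computation. The notation is assumed for illustration: $c_c({\bf N})$ denotes the space of finitely supported sequences, $e_n$ the standard basis vectors, and $\|\cdot\|$ a hypothetical Banach space norm whose topology is stronger than pointwise convergence. The weights $2^{-n}/\|e_n\|$ are just one convenient choice; any absolutely summable choice with all coefficients non-zero should work equally well.

```latex
% Absolute convergence in the hypothetical norm:
\[
  \sum_{n=1}^\infty \left\| \frac{2^{-n}}{\|e_n\|}\, e_n \right\|
  = \sum_{n=1}^\infty 2^{-n} = 1 < \infty ,
\]
% so by completeness the partial sums converge in norm to some
% s \in c_c(\mathbf{N}), and hence also pointwise. But the pointwise limit is
\[
  s(m) = \frac{2^{-m}}{\|e_m\|} \neq 0 \quad \text{for every } m \geq 1 ,
\]
% which has infinitely many non-zero entries, so s \notin c_c(\mathbf{N}),
% a contradiction.
```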

Filed under: 245B - Real analysis, expository, math.FA Tagged: Banach spaces, closed graph theorem, strong topology, weak topology

We use the term “concatenation theorem” to denote results in which structural control of a function in two or more “directions” can be “concatenated” into structural control in a *joint* direction. A trivial example of such a concatenation theorem is the following: if a function $f: {\bf Z} \times {\bf Z} \to {\bf R}$ is constant in the first variable (thus $x \mapsto f(x,y)$ is constant for each $y$), and also constant in the second variable (thus $y \mapsto f(x,y)$ is constant for each $x$), then it is constant in the joint variable $(x,y)$. A slightly less trivial example: if a function $f: {\bf Z} \times {\bf Z} \to {\bf R}$ is affine-linear in the first variable (thus, for each $y$, there exist $\alpha(y), \beta(y)$ such that $f(x,y) = \alpha(y) x + \beta(y)$ for all $x$) and affine-linear in the second variable (thus, for each $x$, there exist $\gamma(x), \delta(x)$ such that $f(x,y) = \gamma(x) y + \delta(x)$ for all $y$) then $f$ is a quadratic polynomial in $x,y$; in fact it must take the form

$$f(x,y) = \epsilon xy + \zeta x + \eta y + \theta \qquad (1)$$

for some real numbers $\epsilon, \zeta, \eta, \theta$. (This can be seen for instance by using the affine linearity in $y$ to show that the coefficients $\alpha(y), \beta(y)$ are also affine linear.)
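The parenthetical argument can be written out in a few lines. This is a sketch in assumed notation: write the affine-linear dependence on the first variable as $f(x,y) = \alpha(y) x + \beta(y)$, and name the four resulting constants $\epsilon, \zeta, \eta, \theta$.

```latex
% Recover the coefficients from two values of x:
\begin{align*}
  \beta(y)  &= f(0,y), \\
  \alpha(y) &= f(1,y) - f(0,y).
\end{align*}
% For each fixed x the map y \mapsto f(x,y) is affine-linear, so both
% right-hand sides are affine-linear in y:
\[
  \beta(y) = \eta y + \theta, \qquad \alpha(y) = \epsilon y + \zeta .
\]
% Substituting back gives the bilinear-plus-affine form:
\[
  f(x,y) = (\epsilon y + \zeta)\, x + \eta y + \theta
         = \epsilon xy + \zeta x + \eta y + \theta .
\]
```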

The same phenomenon extends to higher degree polynomials. Given a function $P: G \to K$ from one additive group $G$ to another $K$, we say that $P$ is of *degree less than $d$* along a subgroup $H$ of $G$ if all the $d$-fold iterated differences of $P$ along directions in $H$ vanish, that is to say

$$\partial_{h_1} \dots \partial_{h_d} P(x) = 0$$

for all $x \in G$ and $h_1, \dots, h_d \in H$, where $\partial_h$ is the difference operator

$$\partial_h P(x) := P(x+h) - P(x).$$

(We adopt the convention that the only $P$ of degree less than $0$ is the zero function.)

We then have the following simple proposition:

Proposition 1 (Concatenation of polynomiality) Let $P: G \to K$ be of degree less than $d_1$ along one subgroup $H_1$ of $G$, and of degree less than $d_2$ along another subgroup $H_2$ of $G$, for some $d_1, d_2 \geq 1$. Then $P$ is of degree less than $d_1 + d_2 - 1$ along the subgroup $H_1 + H_2$ of $G$.

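Note the previous example was basically the case when $G = {\bf Z} \times {\bf Z}$, $H_1 = {\bf Z} \times \{0\}$, $H_2 = \{0\} \times {\bf Z}$, $K = {\bf R}$, and $d_1 = d_2 = 2$.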

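*Proof:* The claim is trivial for $d_1 = 1$ or $d_2 = 1$ (in which case $P$ is constant along $H_1$ or $H_2$ respectively), so suppose inductively that $d_1, d_2 \geq 2$ and the claim has already been proven for smaller values of $d_1 + d_2$.

We take a derivative in a direction $h_1$ along $H_1$ to obtain

$$T^{-h_1} P = P + \partial_{h_1} P$$

where $T^{-h_1} P(x) := P(x + h_1)$ is the shift of $P$ by $-h_1$. Then we take a further shift by a direction $h_2 \in H_2$ to obtain

$$T^{-h_1-h_2} P = T^{-h_2} P + T^{-h_2} \partial_{h_1} P = P + \partial_{h_2} P + T^{-h_2} \partial_{h_1} P$$

leading to the *cocycle equation*

$$\partial_{h_1+h_2} P = \partial_{h_2} P + T^{-h_2} \partial_{h_1} P.$$

Since $P$ has degree less than $d_1$ along $H_1$ and degree less than $d_2$ along $H_2$, $\partial_{h_1} P$ has degree less than $d_1 - 1$ along $H_1$ and less than $d_2$ along $H_2$, so is degree less than $d_1 + d_2 - 2$ along $H_1 + H_2$ by induction hypothesis. Similarly $\partial_{h_2} P$ is also of degree less than $d_1 + d_2 - 2$ along $H_1 + H_2$. Combining this with the cocycle equation we see that $\partial_{h_1+h_2} P$ is of degree less than $d_1 + d_2 - 2$ along $H_1 + H_2$ for any $h_1 + h_2 \in H_1 + H_2$, and hence $P$ is of degree less than $d_1 + d_2 - 1$ along $H_1 + H_2$, as required.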
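As a sanity check on the proposition, one can test the affine-linear example numerically. The sketch below is only a finite check, not a proof: it takes a sample function on ${\bf Z}^2$ of the bilinear-plus-affine form, and verifies on a small box that its second differences vanish along each coordinate axis, while along the joint directions it is the third differences (and not the second) that vanish. The helper names `diff`, `has_degree_less_than`, and `sample_directions` are hypothetical, as are the sample coefficients.

```python
from itertools import product

def diff(f, h):
    """Difference operator on Z^2: (d_h f)(x) = f(x + h) - f(x)."""
    return lambda x: f((x[0] + h[0], x[1] + h[1])) - f(x)

def has_degree_less_than(f, d, directions, box=range(-2, 3)):
    """Finite check that every d-fold iterated difference of f along the
    given directions vanishes at every point of a small box."""
    points = list(product(box, repeat=2))
    for hs in product(directions, repeat=d):
        g = f
        for h in hs:
            g = diff(g, h)
        if any(g(x) != 0 for x in points):
            return False
    return True

def sample_directions(generators, coeffs=range(-1, 2)):
    """Non-zero small integer combinations of the given generators."""
    dirs = set()
    for cs in product(coeffs, repeat=len(generators)):
        h = (sum(c * g[0] for c, g in zip(cs, generators)),
             sum(c * g[1] for c, g in zip(cs, generators)))
        if h != (0, 0):
            dirs.add(h)
    return sorted(dirs)

# Sample function: affine-linear in x for fixed y, and in y for fixed x.
f = lambda x: 3 * x[0] * x[1] + 2 * x[0] - 5 * x[1] + 7

H1 = sample_directions([(1, 0)])            # directions in Z x {0}
H2 = sample_directions([(0, 1)])            # directions in {0} x Z
H12 = sample_directions([(1, 0), (0, 1)])   # directions in the joint group

affine_along_H1 = has_degree_less_than(f, 2, H1)
affine_along_H2 = has_degree_less_than(f, 2, H2)
quadratic_jointly = has_degree_less_than(f, 3, H12)
not_affine_jointly = not has_degree_less_than(f, 2, H12)
print(affine_along_H1, affine_along_H2, quadratic_jointly, not_affine_jointly)
```

The last flag illustrates that the degree bound in the joint direction cannot be lowered in general: the mixed term survives two differences in a diagonal direction.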

While this proposition is simple, it already illustrates some basic principles regarding how one would go about proving a concatenation theorem:

- (i) One should perform induction on the degrees involved, and take advantage of the recursive nature of degree (in this case, the fact that a function is of degree less than $d$ along some subgroup $H$ of directions iff all of its first derivatives along $H$ are of degree less than $d-1$).
- (ii) Structure is preserved by operations such as addition, shifting, and taking derivatives. In particular, if a function $P$ is of degree less than $d$ along some subgroup $H$, then any derivative $\partial_k P$ of $P$ is also of degree less than $d$ along $H$, *even if $k$ does not belong to $H$*.

Here is another simple example of a concatenation theorem. Suppose an at most countable additive group acts by measure-preserving shifts on some probability space ; we call the pair (or more precisely ) a *-system*. We say that a function is a *generalised eigenfunction of degree less than * along some subgroup of and some if one has

almost everywhere for all , and some functions of degree less than along , with the convention that a function has degree less than if and only if it is equal to . Thus for instance, a function is a generalised eigenfunction of degree less than along if it is constant on almost every -ergodic component of , and is a generalised eigenfunction of degree less than along if it is an eigenfunction of the shift action on almost every -ergodic component of . A basic example of a higher order eigenfunction is the function on the *skew shift* with action given by the generator for some irrational . One can check that for every integer , where is a generalised eigenfunction of degree less than along , so is of degree less than along .
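The skew shift example can be checked numerically. The sketch below rests on assumptions, since the formulas are not visible above: the skew shift is taken to be $T(x,y) := (x+\alpha, y+x)$ on the torus $({\bf R}/{\bf Z})^2$, the function is $f(x,y) := e(y)$ with $e(t) := e^{2\pi i t}$, and "shifting" is interpreted as composing $f$ with $T^n$. Under these conventions one expects $f(T^n(x,y)) = e(nx + \binom{n}{2}\alpha) f(x,y)$, and the multiplier is in turn an ordinary eigenfunction of the shift.

```python
import cmath
import math

def e(t):
    """The standard character e(t) = exp(2 pi i t)."""
    return cmath.exp(2j * math.pi * t)

def skew_shift(point, alpha):
    """Assumed form of the skew shift on the 2-torus: (x, y) -> (x + alpha, y + x)."""
    x, y = point
    return ((x + alpha) % 1.0, (y + x) % 1.0)

def f(point):
    """The candidate higher order eigenfunction f(x, y) = e(y)."""
    return e(point[1])

def multiplier(n, x, alpha):
    """Predicted multiplier e(n x + C(n, 2) alpha) for the n-th iterate."""
    return e(n * x + n * (n - 1) / 2 * alpha)

alpha = math.sqrt(2) - 1          # an irrational rotation number
start = (0.3, 0.7)                # arbitrary starting point

point = start
max_error = 0.0
for n in range(50):
    # At this stage point == T^n(start), up to rounding.
    predicted = multiplier(n, start[0], alpha) * f(start)
    max_error = max(max_error, abs(f(point) - predicted))
    point = skew_shift(point, alpha)

# The multiplier only depends on x, and shifting x by alpha multiplies it by
# the constant e(n alpha), so it is an eigenfunction in the usual sense.
n = 7
ratio = multiplier(n, start[0] + alpha, alpha) / multiplier(n, start[0], alpha)
ratio_error = abs(ratio - e(n * alpha))
print(max_error, ratio_error)
```

Both errors should be at the level of floating point rounding, consistent with the two-step hierarchy (constant multiplier for the multiplier, non-constant multiplier for $f$) described above.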

We then have

Proposition 2 (Concatenation of higher order eigenfunctions) Let be a -system, and let be a generalised eigenfunction of degree less than along one subgroup of , and a generalised eigenfunction of degree less than along another subgroup of , for some . Then is a generalised eigenfunction of degree less than along the subgroup of .

The argument is almost identical to that of the previous proposition and is left as an exercise to the reader. The key point is the point (ii) identified earlier: the space of generalised eigenfunctions of degree less than along is preserved by multiplication and shifts, as well as the operation of “taking derivatives” even along directions that do not lie in . (To prove this latter claim, one should restrict to the region where is non-zero, and then divide by to locate .)

A typical example of this proposition in action is as follows: consider the -system given by the -torus with generating shifts

for some irrational , which can be checked to give a action

The function can then be checked to be a generalised eigenfunction of degree less than along , and also less than along , and less than along . One can view this example as the dynamical systems translation of the example (1) (see this previous post for some more discussion of this sort of correspondence).

The main results of our concatenation paper are analogues of these propositions concerning a more complicated notion of “polynomial-like” structure that are of importance in additive combinatorics and in ergodic theory. On the ergodic theory side, the notion of structure is captured by the *Host-Kra characteristic factors* of a -system along a subgroup . These factors can be defined in a number of ways. One is by duality, using the *Gowers-Host-Kra uniformity seminorms* (defined for instance here) . Namely, is the factor of defined up to equivalence by the requirement that

An equivalent definition is in terms of the *dual functions* of along , which can be defined recursively by setting and

where denotes the ergodic average along a Følner sequence in (in fact one can also define these concepts in non-amenable abelian settings as per this previous post). The factor can then be alternately defined as the factor generated by the dual functions for .

In the case when and is -ergodic, a deep theorem of Host and Kra shows that the factor is equivalent to the inverse limit of nilsystems of step less than . A similar statement holds with replaced by any finitely generated group by Griesmer, while the case of an infinite vector space over a finite field was treated in this paper of Bergelson, Ziegler, and myself. The situation is more subtle when is not -ergodic, or when is -ergodic but is a proper subgroup of acting non-ergodically, when one has to start considering measurable families of directional nilsystems; see for instance this paper of Austin for some of the subtleties involved (for instance, higher order group cohomology begins to become relevant!).

One of our main theorems is then

Proposition 3 (Concatenation of characteristic factors) Let be a -system, and let be measurable with respect to the factor and with respect to the factor for some and some subgroups of . Then is also measurable with respect to the factor .

We give two proofs of this proposition in the paper; an ergodic-theoretic proof using the Host-Kra theory of “cocycles of type (along a subgroup )”, which can be used to inductively describe the factors , and a combinatorial proof based on a combinatorial analogue of this proposition which is harder to state (but which roughly speaking asserts that a function which is nearly orthogonal to all bounded functions of small norm, and also to all bounded functions of small norm, is also nearly orthogonal to all bounded functions of small norm). The combinatorial proof parallels the proof of Proposition 2. A key point is that dual functions obey a property analogous to being a generalised eigenfunction, namely that

where and is a “structured function of order ” along . (In the language of this previous paper of mine, this is an assertion that dual functions are uniformly almost periodic of order .) Again, the point (ii) above is crucial, and in particular it is key that any structure that has is inherited by the associated functions and . This sort of inheritance is quite easy to accomplish in the ergodic setting, as there is a ready-made language of factors to encapsulate the concept of structure, and the shift-invariance and -algebra properties of factors make it easy to show that just about any “natural” operation one performs on a function measurable with respect to a given factor, returns a function that is still measurable in that factor. In the finitary combinatorial setting, though, encoding the fact (ii) becomes a remarkably complicated notational nightmare, requiring a huge amount of “epsilon management” and “second-order epsilon management” (in which one manages not only scalar epsilons, but also function-valued epsilons that depend on other parameters). In order to avoid all this we were forced to utilise a nonstandard analysis framework for the combinatorial theorems, which made the arguments greatly resemble the ergodic arguments in many respects (though the two settings are still not equivalent, see this previous blog post for some comparisons between the two settings). Unfortunately the arguments are still rather complicated.

For combinatorial applications, dual formulations of the concatenation theorem are more useful. A direct dualisation of the theorem yields the following decomposition theorem: a bounded function which is small in norm can be split into a component that is small in norm, and a component that is small in norm. (One may wish to understand this type of result by first proving the following baby version: any function that has mean zero on every coset of , can be decomposed as the sum of a function that has mean zero on every coset, and a function that has mean zero on every coset. This is dual to the assertion that a function that is constant on every coset and constant on every coset, is constant on every coset.) Combining this with some standard “almost orthogonality” arguments (i.e. Cauchy-Schwarz) gives the following Bessel-type inequality: if one has a lot of subgroups and a bounded function is small in norm for most , then it is also small in norm for most . (Here is a baby version one may wish to warm up on: if a function has small mean on for some large prime , then it has small mean on most of the cosets of most of the one-dimensional subgroups of .)
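The first baby version can be illustrated in the simplest possible setting. The sketch below assumes a concrete model that is not taken from the text: the group is ${\bf Z}/n \times {\bf Z}/n$, the two subgroups are the coordinate axes (so that their sum is the whole group and "mean zero on every coset of the sum" just means mean zero overall), and the decomposition subtracts off the averages along cosets of the first subgroup. The variable names are hypothetical.

```python
import random

n = 5
random.seed(0)

# A function on G = Z/n x Z/n, stored as f[x][y].
# Assumed subgroups: H = Z/n x {0} and K = {0} x Z/n, so that H + K = G.
f = [[random.uniform(-1, 1) for y in range(n)] for x in range(n)]
mean = sum(map(sum, f)) / n ** 2
f = [[f[x][y] - mean for y in range(n)] for x in range(n)]  # mean zero on G

# Cosets of H are the sets {(x, y) : x in Z/n} for fixed y.
h_coset_avg = [sum(f[x][y] for x in range(n)) / n for y in range(n)]

# f1 = f minus its average over H-cosets; f2 = the average over H-cosets.
f1 = [[f[x][y] - h_coset_avg[y] for y in range(n)] for x in range(n)]
f2 = [[h_coset_avg[y] for y in range(n)] for x in range(n)]

# f1 should have mean zero on every coset of H (fixed y, varying x) ...
max_h_coset_mean_f1 = max(
    abs(sum(f1[x][y] for x in range(n)) / n) for y in range(n))
# ... and f2 should have mean zero on every coset of K (fixed x, varying y),
# because its mean over such a coset equals the global mean of f, which is 0.
max_k_coset_mean_f2 = max(
    abs(sum(f2[x][y] for y in range(n)) / n) for x in range(n))
reconstruction_error = max(
    abs(f1[x][y] + f2[x][y] - f[x][y]) for x in range(n) for y in range(n))
print(max_h_coset_mean_f1, max_k_coset_mean_f2, reconstruction_error)
```

The same projection idea (average along one subgroup, keep the remainder) is what one would expect to generalise to arbitrary pairs of subgroups, though this toy model does not test that.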

There is also a generalisation of the above Bessel inequality (as well as several of the other results mentioned above) in which the subgroups are replaced by more general *coset progressions* (of bounded rank), so that one has a Bessel inequality controlling “local” Gowers uniformity norms such as by “global” Gowers uniformity norms such as . This turns out to be particularly useful when attempting to compute polynomial averages such as

for various functions . After repeated use of the van der Corput lemma, one can control such averages by expressions such as

(actually one ends up with more complicated expressions than this, but let’s use this example for sake of discussion). This can be viewed as an average of various Gowers uniformity norms of along arithmetic progressions of the form for various . Using the above Bessel inequality, this can be controlled in turn by an average of various Gowers uniformity norms along rank two generalised arithmetic progressions of the form for various . But for generic , this rank two progression is close in a certain technical sense to the “global” interval (this is ultimately due to the basic fact that two randomly chosen large integers are likely to be coprime, or at least have a small gcd). As a consequence, one can use the concatenation theorems from our first paper to control expressions such as (2) in terms of *global* Gowers uniformity norms. This is important in number theoretic applications, when one is interested in computing sums such as

or

where and are the Möbius and von Mangoldt functions respectively. This is because we are able to control global Gowers uniformity norms of such functions (thanks to results such as the proof of the inverse conjecture for the Gowers norms, the orthogonality of the Möbius function with nilsequences, and asymptotics for linear equations in primes), but much less control is currently available for local Gowers uniformity norms, even with the assistance of the generalised Riemann hypothesis (see this previous blog post for some further discussion).

By combining these tools and strategies with the “transference principle” approach from our previous paper (as improved using the recent “densification” technique of Conlon, Fox, and Zhao, discussed in this previous post), we are able in particular to establish the following result:

Theorem 4 (Polynomial patterns in the primes) Let be polynomials of degree at most , whose degree coefficients are all distinct, for some . Suppose that is admissible in the sense that for every prime , there are such that are all coprime to . Then there exist infinitely many pairs of natural numbers such that are prime.

Furthermore, we obtain an asymptotic for the number of such pairs in the range , (actually for minor technical reasons we reduce the range of to be very slightly less than ). In fact one could in principle obtain asymptotics for smaller values of , and relax the requirement that the degree coefficients be distinct with the requirement that no two of the differ by a constant, provided one had good enough local uniformity results for the Möbius or von Mangoldt functions. For instance, we can obtain an asymptotic for triplets of the form unconditionally for , and conditionally on GRH for all , using known results on primes in short intervals on average.

The case of this theorem was obtained in a previous paper of myself and Ben Green (using the aforementioned conjectures on the Gowers uniformity norm and the orthogonality of the Möbius function with nilsequences, both of which are now proven). For higher , an older result of Tamar and myself was able to tackle the case when (though our results there only give lower bounds on the number of pairs , and no asymptotics). Both of these results generalise my older theorem with Ben Green on the primes containing arbitrarily long arithmetic progressions. The theorem also extends to multidimensional polynomials, in which case there are some additional previous results; see the paper for more details. We also get a technical refinement of our previous result on narrow polynomial progressions in (dense subsets of) the primes by making the progressions just a little bit narrower in the case when the density of the set one is using is small.

This latter Bessel type inequality is particularly useful in combinatorial and number-theoretic applications, as it allows one to convert “global” Gowers uniformity norm control (basically, bounds on norms such as ) to “local” Gowers uniformity norm control.

Filed under: math.CO, math.DS, math.NT, paper Tagged: characteristic factor, concatenation theorems, Gowers uniformity norms, polynomial recurrence, Tamar Ziegler

This phenomenon is superficially similar to the more well known Chebyshev bias concerning the reduction of a single prime to a small modulus , but is in fact a rather different (and much stronger) bias than the Chebyshev bias, and seems to arise from a completely different source. The Chebyshev bias asserts, roughly speaking, that a randomly selected prime of a large magnitude will typically (though not always) be slightly more likely to be a quadratic non-residue modulo than a quadratic residue, but the bias is small (the difference in probabilities is only about for typical choices of ), and certainly consistent with known or conjectured positive results such as Dirichlet’s theorem or the generalised Riemann hypothesis. The reason for the Chebyshev bias can be traced back to the von Mangoldt explicit formula which relates the distribution of the von Mangoldt function modulo with the zeroes of the -functions with period . This formula predicts (assuming some standard conjectures like GRH) that the von Mangoldt function is quite unbiased modulo . The von Mangoldt function is *mostly* concentrated in the primes, but it also has a medium-sized contribution coming from *squares* of primes, which are of course all located in the quadratic residues modulo . (Cubes and higher powers of primes also make a small contribution, but these are quite negligible asymptotically.) To balance everything out, the contribution of the primes must then exhibit a small preference towards quadratic non-residues, and this is the Chebyshev bias. (See this article of Rubinstein and Sarnak for a more technical discussion of the Chebyshev bias, and this survey of Granville and Martin for an accessible introduction. The story of the Chebyshev bias is also related to Skewes’ number, once considered the largest explicit constant to naturally appear in a mathematical argument.)

The paper of Lemke Oliver and Soundararajan considers instead the distribution of the pairs for small and for large consecutive primes , say drawn at random from the primes comparable to some large . For sake of discussion let us just take . Then all primes larger than are either or ; Chebyshev’s bias gives a very slight preference to the latter (of order , as discussed above), but apart from this, we expect the primes to be more or less equally distributed in both classes. For instance, assuming GRH, the probability that lands in would be , and similarly for .

In view of this, one would expect that up to errors of or so, the pair should be equally distributed amongst the four options , , , , thus for instance the probability that this pair is would naively be expected to be , and similarly for the other three tuples. These assertions are not yet proven (although some non-trivial upper and lower bounds for such probabilities can be obtained from recent work of Maynard).

However, Lemke Oliver and Soundararajan argue (backed by both plausible heuristic arguments (based ultimately on the Hardy-Littlewood prime tuples conjecture), as well as substantial numerical evidence) that there is a significant bias away from the tuples and – informally, adjacent primes don’t like being in the same residue class! For instance, they predict that the probability of attaining is in fact

with similar predictions for the other three pairs (in fact they give a somewhat more precise prediction than this). The magnitude of this bias, being comparable to , is significantly stronger than the Chebyshev bias of .

One consequence of this prediction is that the prime gaps are slightly less likely to be divisible by than naive random models of the primes would predict. Indeed, if the four options , , , all occurred with equal probability , then should equal with probability , and and with probability each (as would be the case when taking the difference of two random numbers drawn from those integers not divisible by ); but the Lemke Oliver-Soundararajan bias predicts that the probability of being divisible by three should be slightly lower, being approximately .
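The bias is easy to observe empirically. The following sketch tabulates the residues mod 3 of consecutive primes up to a bound; the bound of $10^5$ and the function names are arbitrary choices made here, and the script only counts, so it should be read as an illustration of the phenomenon rather than a test of the precise asymptotic.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p * p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, is_p in enumerate(sieve) if is_p]

def consecutive_residue_counts(limit, q=3):
    """Count pairs (p mod q, p' mod q) over consecutive primes q < p < p' <= limit."""
    ps = [p for p in primes_up_to(limit) if p > q]
    counts = {}
    for a, b in zip(ps, ps[1:]):
        key = (a % q, b % q)
        counts[key] = counts.get(key, 0) + 1
    return counts

counts = consecutive_residue_counts(100_000)
total = sum(counts.values())
same = counts[(1, 1)] + counts[(2, 2)]   # prime gap divisible by 3

for key in sorted(counts):
    print(key, counts[key], round(counts[key] / total, 4))
print("fraction of gaps divisible by 3:", round(same / total, 4))
```

In this range the two "same class" tuples each occur noticeably less often than a quarter of the time, so the fraction of gaps divisible by 3 falls visibly below one half.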

Below the fold we will give a somewhat informal justification of (a simplified version of) this phenomenon, based on the Lemke Oliver-Soundararajan calculation using the prime tuples conjecture.

To explain the Lemke Oliver-Soundararajan bias, it is convenient to relax the requirement that the primes are consecutive, and just look at small prime differences between primes that are somewhat close (in the sense that is of size , which corresponds by the prime number theorem to the mean spacing between primes), but not necessarily consecutive. (This relaxation changes some of the constants in the Lemke Oliver-Soundararajan analysis, basically by eliminating the need to invoke the inclusion-exclusion principle, but does not affect the qualitative nature of the bias.) The naive Cramér random model for the primes (discussed for instance in this post) suggests, as a first approximation, that for any , the number of prime differences that are equal to with should be on the order of . Of course, this naive model is well known to require some adjustment: most obviously, prime differences are almost always even, so the number of solutions to is close to zero when is odd. A little less obviously, values of , such as , which are multiples of three should (all other things being equal) be twice as likely to be prime differences as values of (such as ) which are not; that is to say, one expects about twice as many “sexy primes” as “twin primes”. This is ultimately because the lower prime in a sexy prime pair can lie in either of the two residue classes , , but the lower prime in a twin prime pair can only lie in the residue class (after excluding the first twin prime pair ).
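The "twice as many sexy primes as twin primes" heuristic can be checked directly. The sketch below counts pairs of primes differing by 2 and by 6 (not necessarily consecutive) with the lower prime up to $10^5$; the bound and the function names are arbitrary choices made for illustration.

```python
def prime_flags(n):
    """Sieve of Eratosthenes: flags[i] is 1 exactly when i is prime, for 0 <= i <= n."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return flags

def count_prime_pairs(limit, h):
    """Number of primes p <= limit such that p + h is also prime."""
    flags = prime_flags(limit + h)
    return sum(1 for p in range(2, limit + 1) if flags[p] and flags[p + h])

limit = 100_000
twin = count_prime_pairs(limit, 2)
sexy = count_prime_pairs(limit, 6)
ratio = sexy / twin
print(twin, sexy, round(ratio, 3))
```

Note that this counts all prime pairs with the given difference, which is the relaxed (non-consecutive) setting used in the discussion here.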

The Lemke Oliver-Soundararajan bias pushes back against this phenomenon slightly; roughly speaking, it says that a typical number that is a multiple of is only about times as likely to be a prime difference as a typical number that is a non-multiple of , for some absolute constant (which can be computed explicitly from their work, but I will not do so here).

This bias can be established assuming the Hardy-Littlewood prime tuples conjecture (with a sufficiently good error term). This conjecture asserts, roughly speaking, that the number of solutions to with and some given even is proportional to , where is the quantity

The proportionality constant depends on the implicit constants in the relation , and also involves the twin prime constant

it will not play an important role though in our analysis, so we omit it. The reason for the factor can be explained from the following simple calculation: if is an odd prime, and we select two numbers independently at random that are coprime to (and equally likely to be in each of the primitive residue classes mod ), then the probability that can be calculated to be times as large as the probability that for any given not divisible by . For instance, if , then is twice as likely to equal as it is equal , as we have already observed before.
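The simple calculation just described can be carried out by exhaustive enumeration. The sketch below draws both numbers from the primitive residue classes mod $p$, and computes exactly how much more likely the difference is to vanish mod $p$ than to equal a given non-zero class; the function name is hypothetical. The enumeration gives the ratio $(p-1)/(p-2)$, which equals $2$ when $p=3$, in agreement with the observation about multiples of three.

```python
from fractions import Fraction

def difference_ratio(p, h):
    """Ratio P(a - b = 0 mod p) / P(a - b = h mod p) for a, b drawn
    independently and uniformly from the non-zero residues mod p."""
    units = range(1, p)
    zero_hits = sum(1 for a in units for b in units if (a - b) % p == 0)
    h_hits = sum(1 for a in units for b in units if (a - b) % p == h % p)
    return Fraction(zero_hits, h_hits)

# The ratio does not depend on which non-zero class h is used; take h = 1.
ratios = {p: difference_ratio(p, 1) for p in (3, 5, 7, 11, 13)}
for p, r in ratios.items():
    print(p, r)
```

The count behind this is elementary: there are $p-1$ pairs with $a = b$, but only $p-2$ pairs with $a - b = h$, since $a$ must avoid both $0$ and $h$.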

Naively, one would expect the quantity to be about twice as large when is a multiple of three as when is not a multiple of , due to the factor of in (1). However, it turns out that when restricting to the range , the average value of for a multiple of is only about as large as the average value of for not a multiple of .

If we strip out the term in (1), creating a new function

then the previous bias turns out to be a consequence of an asymptotic roughly of the form

for some absolute constants and all . (Actually, to avoid some artificial boundary issues one should replace the restriction with a smoother weight such as , but we will ignore this technicality for sake of discussion.) Indeed, assuming the asymptotic (2), we have

whereas

so we see that has a slightly lower mean (by a factor of about ) on the multiples of than in general, which implies the corresponding claim about . We thus see that the Lemke Oliver-Soundararajan bias can be traced to the lower order term in (2).

In the paper of Lemke Oliver and Soundararajan, the asymptotic (2) (smoothed out as discussed above) is obtained from standard complex methods, based on an analysis of the Dirichlet series

As it turns out, this Dirichlet series has poles at both and (it contains a factor of ), contributing to the and terms respectively. One can also establish (2) (with smoothing) using elementary number theory methods (as in this previous post); we sketch the argument as follows. We can factor as a Dirichlet convolution

where vanishes unless is the product of distinct primes greater than or equal to , in which case

(with the convention ). Then we have

Morally speaking, behaves like ; we can use pseudorandomness heuristics to argue that the fluctuation around this main term give a lower order contribution (and one can argue this rigorously when using smoother weights like ). The term can be interpreted as , as per this previous post. Assuming this approximation, we obtain the approximation

The sequence behaves somewhat like . As such, one expects (and can calculate) to have an asymptotic of the form , while has an asymptotic of the form for some explicit constants , which gives (2).

Remark 1 One way of thinking about (2) is that the function behaves on the average like . The bulk of the bias effect is then coming from small values of , that is from prime gaps that are significantly smaller than average; one should not see the bias effect if one restricts to prime gaps of the typical size of . This is consistent with the general philosophy that one does not expect to see “long-range” correlations between the primes.

Filed under: expository, math.NT Tagged: Kannan Soundararajan, prime gaps, pseudorandomness, Robert Lemke Oliver

If is the integers, then there are no non-trivial subgroups, and one can thus expect to start growing with . For instance, one has the following easy result:

*Proof:* We use an argument of Ruzsa, which is based in turn on an older argument of Choi. Let be the largest element of , and then recursively, once has been selected, let be the largest element of not equal to any of the , such that for all , terminating this construction when no such can be located. This gives a sequence of elements in which are sum-free in , and with the property that for any , either is equal to one of the , or else for some with . Iterating this, we see that any is of the form for some and . The number of such expressions is at most , thus which implies . Since , the claim follows.
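The greedy selection in this proof is straightforward to simulate. The sketch below assumes the reading that the selected elements must have all pairwise sums (of distinct elements) outside the set; it runs the selection as a single pass in decreasing order, which gives the same result as the recursive description because an element rejected at one stage stays rejected at all later stages. The function name and the sample set are hypothetical.

```python
import random

def greedy_sum_free(A):
    """Scan A in decreasing order, keeping an element whenever its sum with
    every previously kept element lies outside A."""
    A = set(A)
    chosen = []
    for a in sorted(A, reverse=True):
        if all(a + x not in A for x in chosen):
            chosen.append(a)
    return chosen

random.seed(1)
A = set(random.sample(range(1, 200), 40))
chosen = greedy_sum_free(A)

# Property 1: distinct chosen elements never sum to an element of A.
sum_free = all(x + y not in A
               for i, x in enumerate(chosen) for y in chosen[i + 1:])

# Property 2: every rejected element a has a + x in A for some chosen x,
# which is the relation that the proof iterates.
rejected_are_blocked = all(any(a + x in A for x in chosen)
                           for a in A if a not in chosen)
print(len(A), len(chosen), sum_free, rejected_are_blocked)
```

The second property is what drives the counting step: starting from any element and repeatedly adding a blocking chosen element climbs through the set, and the climb must terminate at a chosen element.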

In particular, we have for subsets of the integers. It has been possible to improve upon this easy bound, but only with remarkable effort. The best lower bound currently is

a result of Shao (building upon earlier work of Sudakov, Szemerédi, and Vu and of Dousse). In the opposite direction, a construction of Ruzsa gives examples of large sets with .

Using the standard tool of Freiman homomorphisms, the above results for the integers extend to other torsion-free abelian groups . In our paper we study the opposite case where is finite (but still abelian). In this paper of Erdős (in which the quantity was first introduced), the following question was posed: if is sufficiently large depending on , does this imply the existence of two elements with ? As it turns out, we were able to find some simple counterexamples to this statement. For instance, if is any finite additive group, then the set has but with no summing to zero; this type of example in fact works with replaced by any larger Mersenne prime, and we also have a counterexample in for arbitrarily large. However, in the positive direction, we can show that the answer to Erdős’s question is positive if is assumed to have no small prime factors. That is to say,

**Theorem 2** For every $k \geq 1$ there exists $C \geq 1$ such that if $G$ is a finite abelian group whose order is not divisible by any prime less than or equal to $C$, and $A$ is a subset of $G$ of cardinality at least $C$ with $\phi(A) \leq k$, then there exist $x, y \in A$ with $x+y=0$.
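The counterexample preceding Theorem 2 can be checked by brute force for small cases. The sketch below is an illustration rather than anything taken from the paper: it assumes the set in question is $\{1, 2, 4, \dots, 2^{q-1}\} \times H$ inside $\mathbf{Z}/(2^q-1)\mathbf{Z} \times H$, takes $H = \mathbf{Z}/n\mathbf{Z}$ for concreteness, and uses the hypothetical helper names `make_example`, `phi` and `has_zero_sum_pair`.

```python
from itertools import combinations


def make_example(q, n):
    """Return (A, add, zero) for A = {1, 2, ..., 2^(q-1)} x Z/n
    inside Z/(2^q - 1) x Z/n. Here H = Z/n is an illustrative choice."""
    m = 2**q - 1
    A = [(pow(2, i, m), h) for i in range(q) for h in range(n)]

    def add(x, y):
        return ((x[0] + y[0]) % m, (x[1] + y[1]) % n)

    return A, add, (0, 0)


def phi(A, add):
    """Brute-force size of the largest B inside A with b1 + b2 outside A
    for all distinct b1, b2 in B. Exponential time: tiny inputs only."""
    A = list(A)
    A_set = set(A)
    best = 0
    for r in range(1, len(A) + 1):
        found = any(
            all(add(b1, b2) not in A_set for b1, b2 in combinations(B, 2))
            for B in combinations(A, r)
        )
        if not found:
            break  # subsets of sum-free sets are sum-free, so we can stop
        best = r
    return best


def has_zero_sum_pair(A, add, zero):
    """True if x + y = 0 for some x, y in A (x = y is allowed here)."""
    return any(add(x, y) == zero for x in A for y in A)


if __name__ == "__main__":
    A, add, zero = make_example(3, 3)  # {1, 2, 4} mod 7, times Z/3
    print(len(A), phi(A, add), has_zero_sum_pair(A, add, zero))
```

The reason the value of `phi` stays bounded as $H$ grows is that $\{1,2,4\}$ is closed under doubling mod $7$: two elements of $A$ with the same first coordinate always sum to another element of $A$, so a set that is sum-free in $A$ can use each first coordinate at most once.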

There are two main tools used to prove this result. One is an “arithmetic removal lemma” proven by Král, Serra, and Vena. Note that the condition $\phi(A) \leq k$ means that for any *distinct* $x_1,\dots,x_{k+1} \in A$, at least one of the $x_i + x_j$, $1 \leq i < j \leq k+1$, must also lie in $A$. Roughly speaking, the arithmetic removal lemma allows one to “almost” remove the requirement that $x_1,\dots,x_{k+1}$ be distinct, which basically now means that $x \in A \implies 2x \in A$ for almost all $x \in A$. This near-dilation symmetry, when combined with the hypothesis that $|G|$ has no small prime factors, gives a lot of “dispersion” in the Fourier coefficients of $1_A$ which can now be exploited to prove the theorem.

The second tool is the following structure theorem, which is the main result of our paper, and goes a fair ways towards classifying sets $A$ for which $\phi(A)$ is small:

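**Theorem 3** Let $A$ be a finite subset of an arbitrary additive group $G$, with $\phi(A) \leq k$. Then one can find finite subgroups $H_1,\dots,H_m$ with $m \leq k$ such that $|A \cap H_i| \gg_k |H_i|$ and $|A \backslash (H_1 \cup \dots \cup H_m)| \ll_k 1$. Furthermore, if $m = k$, then the exceptional set $A \backslash (H_1 \cup \dots \cup H_m)$ is empty.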

Roughly speaking, this theorem shows that the example of the union of $k$ subgroups mentioned earlier is more or less the “only” example of sets $A$ with $\phi(A) \leq k$, modulo the addition of some small exceptional sets and some refinement of the subgroups to dense subsets.

This theorem has the flavour of other inverse theorems in additive combinatorics, such as Freiman’s theorem, and indeed one can use Freiman’s theorem (and related tools, such as the Balog-Szemerédi theorem) to easily get a weaker version of this theorem. Indeed, if there are no sum-free subsets of $A$ of order $k+1$, then a fraction $\gg_k 1$ of all pairs $a, b$ in $A$ must have their sum also in $A$ (otherwise one could take $k+1$ random elements of $A$ and they would be sum-free in $A$ with positive probability). From this and the Balog-Szemerédi theorem and Freiman’s theorem (in arbitrary abelian groups, as established by Green and Ruzsa), we see that $A$ must be “commensurate” with a “coset progression” $H+P$ of bounded rank. One can then eliminate the torsion-free component $P$ of this coset progression by a number of methods (e.g. by using variants of the argument in Proposition 1), with the upshot being that one can locate a finite group $H_1$ that has large intersection with $A$.
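The random sampling step in the parenthetical above can be made explicit with a union bound. The following is one possible way to fill in the details; the quantity $\delta$ and the specific constants are notation introduced here for illustration, and no attempt is made to optimise them.

```latex
\begin{align*}
% delta: proportion of pairs in A x A whose sum lands in A (assumed notation)
\delta &:= \frac{|\{(a,b) \in A \times A : a+b \in A\}|}{|A|^2}. \\
% Draw a_1, ..., a_{k+1} independently and uniformly from A; for each i < j:
\mathbf{P}(a_i + a_j \in A) &= \delta, \qquad \mathbf{P}(a_i = a_j) = \frac{1}{|A|}. \\
% Union bound over the binom(k+1, 2) pairs i < j:
\mathbf{P}(a_i = a_j \text{ or } a_i + a_j \in A \text{ for some } i<j)
  &\leq \binom{k+1}{2} \left( \delta + \frac{1}{|A|} \right). \\
% If phi(A) <= k, every (k+1)-tuple is bad, so the left-hand side equals 1:
\delta &\geq \binom{k+1}{2}^{-1} - \frac{1}{|A|} \geq \frac{1}{k(k+1)}
  \quad \text{whenever } |A| \geq k(k+1).
\end{align*}
```

In other words, if the union bound were less than one, some outcome would produce $k+1$ distinct elements of $A$ with all pairwise sums outside $A$, contradicting the assumption that no sum-free subset of order $k+1$ exists.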

At this point it is tempting to simply remove $H_1$ from $A$ and iterate. But one runs into a technical difficulty that removing a set such as $H_1$ from $A$ can alter the quantity $\phi(A)$ in unpredictable ways, so one has to still keep $H_1$ around when analysing the residual set $A \backslash H_1$. A second difficulty is that the latter set $A \backslash H_1$ could be considerably smaller than $A$ or $H_1$, but still large in absolute terms, so in particular any error term whose size is only bounded by $\varepsilon |A|$ for a small $\varepsilon$ could be massive compared with the residual set $A \backslash H_1$, and so such error terms would be unacceptable. One can get around these difficulties if one first performs some preliminary “normalisation” of the group $H_1$, so that the residual set $A \backslash H_1$ does not intersect any coset of $H_1$ too strongly. The arguments become even more complicated when one starts removing more than one group $H_1,\dots,H_i$ from $A$ and analyses the residual set $A \backslash (H_1 \cup \dots \cup H_i)$; indeed the “epsilon management” involved became so fearsomely intricate that we were forced to use a nonstandard analysis formulation of the problem in order to keep the complexity of the argument at a reasonable level (cf. my previous blog post on this topic). One drawback of doing so is that we have no effective bounds for the implied constants in our main theorem; it would be of interest to obtain a more direct proof of our main theorem that would lead to effective bounds.

Filed under: math.CO, paper Tagged: additive combinatorics, sum-free sets, Van Vu