Ben Green and I have updated our paper “An arithmetic regularity lemma, an associated counting lemma, and applications” to account for a somewhat serious issue with the paper that was pointed out to us recently by Daniel Altman. This paper contains two core theorems:
- An “arithmetic regularity lemma” that, roughly speaking, decomposes an arbitrary bounded sequence $f(n)$ on an interval $\{1,\dots,N\}$ as an “irrational nilsequence” $F(g(n)\Gamma)$ of controlled complexity, plus some “negligible” errors (where one uses the Gowers uniformity norm as the main norm to control the negligibility of the error); and
- An “arithmetic counting lemma” that gives an asymptotic formula for counting various averages $\mathbb{E}_{\vec{n} \in \mathbb{Z}^D \cap P} f_1(\psi_1(\vec{n})) \cdots f_t(\psi_t(\vec{n}))$ for various affine-linear forms $\psi_1,\dots,\psi_t$ when the functions $f_1,\dots,f_t$ are given by irrational nilsequences.
The combination of the two theorems is then used to address various questions in additive combinatorics.
There are no direct issues with the arithmetic regularity lemma. However, it turns out that the arithmetic counting lemma is only true if one imposes an additional property (which we call the “flag property”) on the affine-linear forms $\psi_1,\dots,\psi_t$. Without this property, there does not appear to be a clean asymptotic formula for these averages if the only hypothesis one places on the underlying nilsequences is irrationality. Thus when trying to understand the asymptotics of averages involving linear forms that do not obey the flag property, the paradigm of understanding these averages via a combination of the regularity lemma and a counting lemma seems to require some significant revision (in particular, one would probably have to replace the existing regularity lemma with some variant, despite the fact that the lemma is still technically true in this setting). Fortunately, for most applications studied to date (including the important subclass of translation-invariant affine forms), the flag property holds; however our claim in the paper to have resolved a conjecture of Gowers and Wolf on the true complexity of systems of affine forms must now be narrowed, as our methods only verify this conjecture under the assumption of the flag property.
In a bit more detail: the asymptotic formula for our counting lemma involved some finite-dimensional vector spaces $\Psi^{[i]}$ for various natural numbers $i$, defined as the linear span of the vectors $(\psi_1(\vec{n})^i, \dots, \psi_t(\vec{n})^i)$ as $\vec{n}$ ranges over the parameter space $\mathbb{Z}^D$. Roughly speaking, these spaces encode the constraints one would expect to see amongst the values $\psi_1(\vec{n}),\dots,\psi_t(\vec{n})$ of the forms. For instance, in the case of length four arithmetic progressions, when $D = 2$, $t = 4$, and $\psi_i(n,r) = n + (i-1)r$ for $i=1,\dots,4$, the space $\Psi^{[1]}$ is two-dimensional (spanned by $(1,1,1,1)$ and $(0,1,2,3)$) and $\Psi^{[2]}$ is three-dimensional (picking up the additional vector $(0,1,4,9)$).
The arguments in our paper turn out to be perfectly correct under the assumption of the “flag property” that $\Psi^{[i]} \subset \Psi^{[i+1]}$ for all $i$. The problem is that the flag property turns out to not always hold. A counterexample, provided by Daniel Altman, involves a certain system of four linear forms in two variables (see the updated paper for the precise example). Fortunately, the flag property does hold in several key cases, most notably the translation invariant case when $\Psi^{[1]}$ contains the vector $(1,\dots,1)$, as well as in “complexity one” cases. Nevertheless non-flag property systems of affine forms do exist, thus limiting the range of applicability of the techniques in this paper. In particular, the conjecture of Gowers and Wolf (Theorem 1.13 in the paper) is now open again in the non-flag property case.
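As a quick illustration of these spaces in the progression case, here is a small computation (my own sketch, not from the paper) that builds $\Psi^{[1]}$ and $\Psi^{[2]}$ from the coefficient vectors of the powers of the forms and verifies the nesting $\Psi^{[1]} \subseteq \Psi^{[2]}$; encoding the spans via coefficient matrices is just one convenient way to compute them.

```python
# Sanity check (my own illustration): for the length-four progression forms
# psi_i(n, r) = n + (i-1) r, compute the spaces Psi^[1] and Psi^[2] as spans
# of coefficient vectors and verify the flag-type nesting Psi^[1] <= Psi^[2].
from itertools import combinations_with_replacement
import sympy as sp

n, r = sp.symbols('n r')
forms = [n + k * r for k in range(4)]          # psi_1, ..., psi_4

def power_span_basis(i):
    """Rows spanning Psi^[i] = span{(psi_1(v)^i, ..., psi_4(v)^i) : v in Z^2}."""
    rows = []
    for mono in combinations_with_replacement((n, r), i):
        m = sp.Mul(*mono)                      # a monomial of degree i in n, r
        rows.append([sp.Poly(sp.expand(f ** i), n, r).coeff_monomial(m)
                     for f in forms])
    return sp.Matrix(rows)

Psi1, Psi2 = power_span_basis(1), power_span_basis(2)
print("dim Psi^[1] =", Psi1.rank())            # expect 2
print("dim Psi^[2] =", Psi2.rank())            # expect 3
stacked = Psi1.col_join(Psi2)
print("Psi^[1] contained in Psi^[2]:", stacked.rank() == Psi2.rank())
```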
Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for $r_4(N)$“, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).
For any natural number $N$, define $r_4(N)$ to be the largest cardinality of a subset $A$ of $[N] = \{1,\dots,N\}$ which does not contain any non-trivial arithmetic progressions $a, a+r, a+2r, a+3r$ of length four (where “non-trivial” means that $r$ is non-zero). Trivially we have $r_4(N) \le N$. In 1969, Szemerédi showed that $r_4(N) = o(N)$. However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) was quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that $r_4(N) \ll N (\log\log N)^{-c}$ for some absolute constant $c > 0$. In the second paper in the above-mentioned series, we managed to improve this bound to $r_4(N) \ll N e^{-c\sqrt{\log\log N}}$. In this paper, we improve the bound further to
$$r_4(N) \ll N (\log N)^{-c},$$
which seems to be the limit of the methods. (We remark that if we could take $c$ to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for progressions of length three is nearly settled, as it is known that for the analogous bound on $r_3(N)$ one can take any $c$ less than one.)
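To make the definition concrete, here is a tiny brute-force computation of $r_4(N)$ for very small $N$ (my own illustration; the exhaustive search is exponential in $N$ and only feasible for toy sizes).

```python
# Brute-force r_4(N): largest subset of {1,...,N} with no non-trivial 4-term AP.
from itertools import combinations

def has_4ap(s):
    """Check whether the set s contains a non-trivial 4-term arithmetic progression."""
    s = set(s)
    for a in s:
        for b in s:
            r = b - a
            if r > 0 and a + 2 * r in s and a + 3 * r in s:
                return True
    return False

def r4(N):
    """Exhaustive search from the largest candidate size downwards."""
    for size in range(N, 0, -1):
        for cand in combinations(range(1, N + 1), size):
            if not has_4ap(cand):
                return size
    return 0

for N in range(1, 11):
    print(N, r4(N))
```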
Most of the previous work on bounding $r_4(N)$ relied in some form or another on the density increment argument introduced by Roth back in 1953; roughly speaking, the idea is that if a dense subset $A$ of $[N]$ fails to contain arithmetic progressions of length four, one seeks to locate a long subprogression of $[N]$ on which $A$ has increased density. This was the basic method underlying, for instance, our previous bound $r_4(N) \ll N e^{-c\sqrt{\log\log N}}$, as well as a finite field analogue of the polylogarithmic bound; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing a quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.
One way to phrase the latter recurrence theorem is as follows. Suppose that $A \subset [N]$ has density $\delta$. Then one would expect a “randomly” selected arithmetic progression $\mathbf{a}, \mathbf{a}+\mathbf{r}, \mathbf{a}+2\mathbf{r}, \mathbf{a}+3\mathbf{r}$ in $[N]$ (using the convention that random variables will be in boldface) to be contained in $A$ with probability about $\delta^4$. This is not true in general; however, it was shown by Ben and myself that for any $\varepsilon > 0$, there was a set of shifts $r$ of cardinality $\gg_{\delta,\varepsilon} N$, such that for any such $r$ one had
$$\mathbb{P}( \mathbf{a}, \mathbf{a}+r, \mathbf{a}+2r, \mathbf{a}+3r \in A ) \ge \delta^4 - \varepsilon$$
if $\mathbf{a}$ was chosen uniformly at random from $[N]$. This easily implies that $r_4(N) = o(N)$, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound $\gg_{\delta,\varepsilon} N$ is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take $N$ to be extremely large compared to $\delta$ and $\varepsilon$ to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift $r = 0$.
We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to couple together the two parameters $\mathbf{a}$ and $\mathbf{r}$ of the length four progression. Namely, with $A$, $N$, $\delta$ as above, we are able to show that there exist random variables $\mathbf{a}, \mathbf{r}$, not necessarily independent, such that
$$\mathbb{P}( \mathbf{a}, \mathbf{a}+\mathbf{r}, \mathbf{a}+2\mathbf{r}, \mathbf{a}+3\mathbf{r} \in A ) \gg \delta^4 \ \ \ \ (1)$$
and such that we have the non-degeneracy bound
$$\mathbb{P}( \mathbf{r} = 0 ) = o( \delta^4 ).$$
This then easily implies the main theorem.
The energy increment method is then deployed to locate a good pair of random variables $\mathbf{a}, \mathbf{r}$ that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function $1_A$ “behaves like” a globally quadratic function such as $n \mapsto F(\alpha n^2)$, for some irrational $\alpha$ and some smooth periodic function $F: \mathbb{R}/\mathbb{Z} \to [0,1]$ of mean $\delta$. If one then takes $\mathbf{a}$, $\mathbf{r}$ to be uniformly distributed in $[N]$ and in $[\varepsilon N]$ respectively for some small $\varepsilon > 0$, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form
$$\int_{\mathbb{T}^3} F(x)\, F(x+y)\, F(x+2y+z)\, F(x+3y+3z)\ dx\, dy\, dz \ \ \ \ (2)$$
where the integral is with respect to the probability Haar measure, and the constraint between the four arguments of $F$ ultimately arises from the algebraic constraint
$$\alpha a^2 - 3\alpha(a+r)^2 + 3\alpha(a+2r)^2 - \alpha(a+3r)^2 = 0.$$
However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least $\delta^4$, which (morally at least) gives (1) in this case.
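Here is a small numerical check of the two ingredients just mentioned (my own sketch; the parametrisation of the constrained average below is one concrete choice consistent with the algebraic constraint, not necessarily the one used in the paper): the vanishing of the third difference of a quadratic phase, and the fact that the constrained average of a nonnegative function of mean $\delta$ is at least $\delta^4$.

```python
# Toy check of the "globally quadratic" model discussed above.
import sympy as sp
import numpy as np

# Symbolic check: the third difference of a quadratic phase along a 4-term AP vanishes.
a, r, alpha = sp.symbols('a r alpha')
third_diff = alpha*a**2 - 3*alpha*(a+r)**2 + 3*alpha*(a+2*r)**2 - alpha*(a+3*r)**2
assert sp.expand(third_diff) == 0

# Numerical check of the Khintchine-type lower bound on a discretised circle Z_M.
M = 60
x = np.arange(M)
rng = np.random.default_rng(0)
F = 0.3 + 0.2 * np.cos(2 * np.pi * x / M) + 0.1 * rng.random(M)  # nonnegative
delta = F.mean()

total = 0.0
for xx in range(M):
    for y in range(M):
        for z in range(M):
            total += F[xx] * F[(xx+y) % M] * F[(xx+2*y+z) % M] * F[(xx+3*y+3*z) % M]
avg = total / M ** 3
print(f"constrained average = {avg:.4f}, delta^4 = {delta**4:.4f}")
assert avg >= delta ** 4 - 1e-9        # guaranteed by Cauchy-Schwarz / Fourier
```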
Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which $[N]$ is partitioned into some number of structured pieces $B_1,\dots,B_M$ (think of these as arithmetic progressions, or as “Bohr sets”), and on each piece $B_j$, $1_A$ behaves like a locally quadratic function such as $F_j(\alpha_j n^2)$, where the frequency $\alpha_j$ now varies with $j$, and the mean of $F_j$ will be approximately $\delta$ on the average after averaging in $j$ (weighted by the size of the pieces $B_j$). Now one should select $\mathbf{a}$ and $\mathbf{r}$ in the following coupled manner: first one chooses $\mathbf{a}$ uniformly from $[N]$, then one defines $\mathbf{j}$ to be the label $j$ such that $\mathbf{a} \in B_j$, and then selects $\mathbf{r}$ uniformly from a set $S_{\mathbf{j}}$ which is related to $B_{\mathbf{j}}$ in much the same way that $[\varepsilon N]$ is related to $[N]$. If one does this correctly, the analogue of (2) becomes a weighted average in $j$ of the corresponding constrained integrals of the $F_j$, and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.
The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of $1_A$ which involves a decomposition of $[N]$ into structured pieces $B_j$, and a quadratic approximation to $1_A$ on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm of the error is small enough) to model the count in (1) (for random variables $\mathbf{a}, \mathbf{r}$ determined by the above partition of $[N]$ into pieces $B_j$), and if the frequencies (such as the $\alpha_j$) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm of $U^3$ type. A significant fraction of the paper is then devoted to establishing a quantitative inverse theorem for that norm, which concludes (with good bounds) that the error must then locally correlate with locally quadratic phases; these can be used to refine the quadratic approximation to $1_A$ in a manner that significantly increases its “energy” (basically an $L^2$ norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.
There are existing inverse theorems for $U^3$-type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “$1\%$-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers, in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “$99\%$-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse pseudorandom subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this $99\%$-structured homomorphism on pseudorandom subsets of Bohr sets to a $100\%$-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a $1$-form on $\mathbb{R}^n$ that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the $1$-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.
This week I have been at a Banff workshop “Combinatorics meets Ergodic theory“, focused on the combinatorics surrounding Szemerédi’s theorem and the Gowers uniformity norms on one hand, and the ergodic theory surrounding Furstenberg’s multiple recurrence theorem and the Host-Kra structure theory on the other. This was quite a fruitful workshop, and directly inspired the various posts this week on this blog. Incidentally, BIRS being as efficient as it is, videos for this week’s talks are already online.
As mentioned in the previous two posts, Ben Green, Tamar Ziegler, and myself proved the following inverse theorem for the Gowers norms:
Theorem 1 (Inverse theorem for Gowers norms) Let $N \ge 1$ and $s \ge 1$ be integers, and let $\delta > 0$. Suppose that $f: \mathbb{Z} \to [-1,1]$ is a function supported on $[N] = \{1,\dots,N\}$ such that
$$\| f \|_{U^{s+1}[N]} \ge \delta.$$
Then there exists a filtered nilmanifold $G/\Gamma$ of degree $\le s$ and complexity $O_{s,\delta}(1)$, a polynomial sequence $g: \mathbb{Z} \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta}(1)$ such that
$$\left| \mathbb{E}_{n \in [N]} f(n) F(g(n)\Gamma) \right| \gg_{s,\delta} 1.$$
There is a higher dimensional generalisation, which first appeared explicitly (in a more general form) in this preprint of Szegedy (which used a slightly different argument than the one of Ben, Tammy, and myself; see also this previous preprint of Szegedy with related results):
Theorem 2 (Inverse theorem for multidimensional Gowers norms) Let $N \ge 1$ and $s, d \ge 1$ be integers, and let $\delta > 0$. Suppose that $f: \mathbb{Z}^d \to [-1,1]$ is a function supported on $[N]^d$ such that
$$\| f \|_{U^{s+1}([N]^d)} \ge \delta. \ \ \ \ (1)$$
Then there exists a filtered nilmanifold $G/\Gamma$ of degree $\le s$ and complexity $O_{s,\delta,d}(1)$, a polynomial sequence $g: \mathbb{Z}^d \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta,d}(1)$ such that
$$\left| \mathbb{E}_{n \in [N]^d} f(n) F(g(n)\Gamma) \right| \gg_{s,\delta,d} 1.$$
A special case of this theorem was recently used by Wenbo Sun. One can replace the polynomial sequence with a linear sequence if desired by using a lifting trick (essentially due to Furstenberg, but which appears explicitly in Appendix C of my paper with Ben and Tammy).
In this post I would like to record a very neat and simple observation of Ben Green and Nikos Frantzikinakis, that uses the tool of Freiman isomorphisms to derive Theorem 2 as a corollary of the one-dimensional theorem. Namely, consider the linear map $\phi: \mathbb{Z}^d \to \mathbb{Z}$ defined by
$$\phi(n_1,\dots,n_d) := \sum_{i=1}^d (10N)^{i-1} n_i,$$
that is to say $\phi(n_1,\dots,n_d)$ is the digit string base $10N$ that has digits $n_d \dots n_1$. This map is a linear map from $[N]^d$ to a subset of $[(10N)^d]$ of density $1/10^d$. Furthermore it has the following “Freiman isomorphism” property: if $n, m_1, \dots, m_{s+1}$ are integers such that the sums
$$n + \omega_1 m_1 + \dots + \omega_{s+1} m_{s+1}$$
lie in the image set $\phi([N]^d)$ of $\phi$ for all $\omega_1,\dots,\omega_{s+1} \in \{0,1\}$, then there exist (unique) lifts $\tilde n \in [N]^d$ and $\tilde m_1,\dots,\tilde m_{s+1} \in \mathbb{Z}^d$ such that $\phi(\tilde n) = n$ and $\phi(\tilde m_i) = m_i$ for each $i$, and
$$\tilde n + \omega_1 \tilde m_1 + \dots + \omega_{s+1} \tilde m_{s+1} \in [N]^d$$
for all $\omega_1,\dots,\omega_{s+1} \in \{0,1\}$. Indeed, the injectivity of $\phi$ on $[N]^d$ uniquely determines the sum $\tilde n + \omega_1 \tilde m_1 + \dots + \omega_{s+1} \tilde m_{s+1}$ for each choice of $\omega_1,\dots,\omega_{s+1}$, and one can use base $10N$ arithmetic to verify that the alternating sum of these sums on any two-dimensional facet of the cube $\{0,1\}^{s+1}$ vanishes, which gives the claim. (In the language of additive combinatorics, the point is that $\phi$ is a Freiman isomorphism of sufficiently high order on $[N]^d$.)
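As a sanity check, here is a small script (my own illustration) verifying the order-two version of this Freiman isomorphism property by brute force: on $[N]^d$, the encoding $\phi$ identifies vector sums with sums of encodings, with no collisions caused by digit carries. The parameters $N$ and $d$ are tiny just to keep the exhaustive check fast.

```python
# Check that the base-(10N) digit map is a Freiman isomorphism of order 2 on [N]^d.
from itertools import product

N, d = 5, 3
BASE = 10 * N

def phi(n):
    """Digit-string encoding: phi(n_1,...,n_d) = sum_i n_i * (10N)^(i-1)."""
    return sum(ni * BASE ** i for i, ni in enumerate(n))

box = list(product(range(1, N + 1), repeat=d))

sums = {}
ok = True
for a, b in product(box, repeat=2):
    s = tuple(x + y for x, y in zip(a, b))
    v = phi(a) + phi(b)
    if sums.setdefault(s, v) != v:      # phi(a)+phi(b) must depend only on a+b
        ok = False
# distinct vector sums must get distinct encodings (no carries between digits)
ok = ok and len(set(sums.values())) == len(sums)
print("phi is a Freiman isomorphism of order 2 on [N]^d:", ok)
```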
Now let $\tilde f: \mathbb{Z} \to [-1,1]$ be the function defined by setting $\tilde f(\phi(n)) := f(n)$ whenever $n \in [N]^d$, with $\tilde f$ vanishing outside of $\phi([N]^d)$. If $f$ obeys (1), then from the above Freiman isomorphism property we have
$$\| \tilde f \|_{U^{s+1}[(10N)^d]} \gg_{s,d} \delta.$$
Applying the one-dimensional inverse theorem (Theorem 1), with $\delta$ reduced by a constant factor depending on $s$ and $d$, and $N$ replaced by $(10N)^d$, this implies the existence of a filtered nilmanifold $G/\Gamma$ of degree $\le s$ and complexity $O_{s,\delta,d}(1)$, a polynomial sequence $g: \mathbb{Z} \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta,d}(1)$ such that
$$\left| \mathbb{E}_{m \in [(10N)^d]} \tilde f(m) F(g(m)\Gamma) \right| \gg_{s,\delta,d} 1,$$
which by the Freiman isomorphism property again implies that
$$\left| \mathbb{E}_{n \in [N]^d} f(n) F(g(\phi(n))\Gamma) \right| \gg_{s,\delta,d} 1.$$
But the map $n \mapsto g(\phi(n))$ is clearly a polynomial map from $\mathbb{Z}^d$ to $G$ (the composition of two polynomial maps is polynomial, see e.g. Appendix B of my paper with Ben and Tammy), and the claim follows.
Remark 3 This trick appears to be largely restricted to the case of boundedly generated groups such as $\mathbb{Z}^d$; I do not see any easy way to deduce an inverse theorem for, say, vector spaces $\mathbb{F}_p^n$ of unbounded dimension from the $\mathbb{Z}$-inverse theorem by this method.
Remark 4 By combining this argument with the one in the previous post, one can obtain a weak ergodic inverse theorem for $\mathbb{Z}^d$-actions. Interestingly, the Freiman isomorphism argument appears to be difficult to implement directly in the ergodic category; in particular, there does not appear to be an obvious direct way to derive the Host-Kra inverse theorem for $\mathbb{Z}^d$ actions (a result first obtained in the PhD thesis of Griesmer) from the counterpart for $\mathbb{Z}$ actions.
A few years ago, Ben Green, Tamar Ziegler, and myself proved the following (rather technical-looking) inverse theorem for the Gowers norms:
Theorem 1 (Discrete inverse theorem for Gowers norms) Let $N \ge 1$ and $s \ge 1$ be integers, and let $\delta > 0$. Suppose that $f: \mathbb{Z} \to [-1,1]$ is a function supported on $[N] = \{1,\dots,N\}$ such that
$$\| f \|_{U^{s+1}[N]} \ge \delta.$$
Then there exists a filtered nilmanifold $G/\Gamma$ of degree $\le s$ and complexity $O_{s,\delta}(1)$, a polynomial sequence $g: \mathbb{Z} \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta}(1)$ such that
$$\left| \mathbb{E}_{n \in [N]} f(n) F(g(n)\Gamma) \right| \gg_{s,\delta} 1.$$
For the definitions of “filtered nilmanifold”, “degree”, “complexity”, and “polynomial sequence”, see the paper of Ben, Tammy, and myself. (I should caution the reader that this blog post will presume a fair amount of familiarity with this subfield of additive combinatorics.) This result has a number of applications, for instance to establishing asymptotics for linear equations in the primes, but this will not be the focus of discussion here.
The purpose of this post is to record the observation that this “discrete” inverse theorem, together with an equidistribution theorem for nilsequences that Ben and I worked out in a separate paper, implies a continuous version:
Theorem 2 (Continuous inverse theorem for Gowers norms) Let $s \ge 1$ be an integer, and let $\delta > 0$. Suppose that $f: \mathbb{R} \to [-1,1]$ is a measurable function supported on $[0,1]$ such that
$$\| f \|_{U^{s+1}[0,1]} \ge \delta.$$
Then there exists a filtered nilmanifold $G/\Gamma$ of degree $\le s$ and complexity $O_{s,\delta}(1)$, a (smooth) polynomial sequence $g: \mathbb{R} \to G$, and a Lipschitz function $F: G/\Gamma \to \mathbb{R}$ of Lipschitz constant $O_{s,\delta}(1)$ such that
$$\left| \int_{[0,1]} f(t) F(g(t)\Gamma)\ dt \right| \gg_{s,\delta} 1.$$
The interval $[0,1]$ can be easily replaced with any other fixed interval by a change of variables. A key point here is that the bounds are completely uniform in $f$ (they depend only on $s$ and $\delta$). Note though that the coefficients of $g$ can be arbitrarily large (and this is necessary, as can be seen just by considering functions of the form $f(t) = \cos(2\pi \xi t)$ on $[0,1]$ for some arbitrarily large frequency $\xi$).
It is likely that one could prove Theorem 2 by carefully going through the proof of Theorem 1 and replacing all instances of the discrete interval $[N]$ with the continuous interval $[0,1]$ (and making appropriate modifications to the argument to accommodate this). However, the proof of Theorem 1 is quite lengthy. Here, we shall proceed by the usual limiting process of viewing the continuous interval $[0,1]$ as a limit of the discrete set $\frac{1}{N}[N] = \{ \frac{1}{N}, \frac{2}{N}, \dots, 1 \}$ as $N \to \infty$. However there will be some problems taking the limit due to a failure of compactness, and specifically with regards to the coefficients of the polynomial sequence $g_N$ produced by Theorem 1, after normalising these coefficients by $N$. Fortunately, a factorisation theorem from a paper of Ben Green and myself resolves this problem by splitting $g_N$ into a “smooth” part which does enjoy good compactness properties, as well as “totally equidistributed” and “periodic” parts which can be eliminated using the measurability (and thus approximate smoothness) of $f$.
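To illustrate the limiting viewpoint, here is a small numerical sketch (my own, not from the post): sample a bounded function on $[0,1]$ at the points $n/M$ and compute a discrete Gowers norm of the samples, which stabilises as $M \to \infty$. For simplicity this uses the cyclic $U^2$ norm on $\mathbb{Z}_M$ via its Fourier identity, rather than the interval norm of the theorem.

```python
# Discretising [0,1] and watching a Gowers-type norm of the samples converge.
import numpy as np

def u2_norm(samples):
    """Cyclic U^2 norm via ||f||_{U^2}^4 = sum_xi |fhat(xi)|^4."""
    fhat = np.fft.fft(samples) / len(samples)
    return (np.abs(fhat) ** 4).sum() ** 0.25

f = lambda t: np.cos(2 * np.pi * 3 * t) * (t > 0.2)   # a bounded function on [0,1]
for M in (100, 1000, 10000, 100000):
    t = np.arange(M) / M
    print(M, round(u2_norm(f(t)), 6))                 # values stabilise as M grows
```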
Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes“. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap $G(X)$ between primes up to $X$ exhibited a lower bound of the shape
$$G(X) \gg f(X) \frac{\log X \log\log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ (1)$$
for some function $f(X)$ that went to infinity as $X \to \infty$; this improved upon previous work of Rankin and other authors, who established the same bound but with $f$ replaced by a constant. (Again, see the previous post for a more detailed discussion.)
In our previous papers, we did not specify a particular growth rate for $f(X)$. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results have ineffective constants, rendering our growth function $f(X)$ similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking $k$-tuples of fairly large size (about $\log^c X$ for some small constant $c > 0$). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate $f(X)$
of shape
$$f(X) \asymp \frac{\log\log\log X}{\log\log\log\log X}, \ \ \ \ (2)$$
thus leading to a bound
$$G(X) \gg \frac{\log X \log\log X}{\log\log\log X}$$
on the gaps between primes for large $X$; this is an unpublished calculation of James’.
In this paper we make a further refinement of this calculation to obtain a growth rate
$$f(X) \asymp \log\log\log X, \ \ \ \ (3)$$
leading to a bound of the form
$$G(X) \ge c \frac{\log X \log\log X \log\log\log\log X}{\log\log\log X} \ \ \ \ (4)$$
for large $X$ and some small constant $c > 0$. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramér’s conjecture that $G(X)$ is comparable to $\log^2 X$); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant $c$ here can be replaced by an arbitrarily large constant $C$.
The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in an interval $[y]$ by removing one residue class modulo $p$ for each prime $p$ in (say) $[x/4, x/2]$? Very roughly speaking, the larger one can take $y$ here in terms of $x$, the larger the growth rate on $f(X)$ that one can obtain. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in $[x/4,x/2]$, but never mind this detail for now.)
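To get a feel for this sieving problem, here is a toy experiment (my own illustration, with arbitrary small parameters): for each prime $p$ in $[x/4, x/2]$, greedily remove the residue class mod $p$ that currently contains the most surviving primes in $(x, y]$, and see how much of the target set gets sifted out.

```python
# Toy version of the residue-class sieving problem described above.
from sympy import primerange

x = 5000
y = 12 * x
small_primes = list(primerange(x // 4, x // 2 + 1))
targets = set(primerange(x, y + 1))
initial = len(targets)

for p in small_primes:
    counts = {}
    for q in targets:
        counts[q % p] = counts.get(q % p, 0) + 1
    best = max(counts, key=counts.get)        # most populated residue class mod p
    targets = {q for q in targets if q % p != best}

print(f"{len(targets)} of {initial} primes in ({x}, {y}] survive the greedy sieve")
```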
Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of $k$-tuples in $[y]$ which contain at least $\gg \log k$ primes, for $k$ as large as a small power of $\log x$ or so (so that $\log k$ is comparable to $\log\log x$). By considering $k$-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime $p$ in $[x/4, x/2]$ that capture about $\log\log x$ primes. In principle, this means that the union of all these residue classes can cover about $\frac{x}{\log x} \log\log x$ primes, allowing one to take $y$ large enough to give the growth rate (3). However, there is a catch: the residue classes for different primes $p$ may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in the expected number of times each prime would be shifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the loss in (2).
The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of $n$ vertices using as few “edges” from a given hypergraph $H$ as possible. If each edge has $m$ vertices, then one certainly needs at least $n/m$ edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph $H$. As before, random methods tend to require something like $\frac{n}{m} \log n$ edges before one expects to cover all of the vertices.
However, it turns out to be possible (under reasonable hypotheses on $H$) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from $H$ that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph $H$, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.
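Here is a toy sketch of the nibbling idea (my own illustration, with made-up parameters; it is not the Pippenger-Spencer analysis, just the basic mechanism): cover most vertices of a random uniform hypergraph by taking small random batches of edges and discarding, between batches, any edge that clashes with what has already been covered.

```python
# A crude Rodl-nibble style covering of a random 3-uniform hypergraph.
import random
random.seed(1)

n, m, num_edges = 3000, 3, 100000
edges = [tuple(random.sample(range(n), m)) for _ in range(num_edges)]

covered, used = set(), 0
batch_size = max(1, int(0.05 * n / m))        # a small "nibble" of edges per round
while len(covered) < 0.9 * n:
    edges = [e for e in edges if covered.isdisjoint(e)]   # prune clashing edges
    if not edges:
        break
    for e in random.sample(edges, min(batch_size, len(edges))):
        if covered.isdisjoint(e):             # edges within one nibble may still clash
            covered.update(e)
            used += 1
print(f"covered {len(covered)} of {n} vertices with {used} edges "
      f"(trivial lower bound n/m = {n // m})")
```

The point of the demonstration is that the number of edges used stays within a constant factor of the trivial lower bound $n/m$, rather than picking up a $\log n$ factor.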
In our setup, the vertices are the primes in $[y]$ that we wish to sift out, and the edges are the intersections of these primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime $p$, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly more useful for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).
Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).
To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than $\log\log x$ primes of size up to $y$ (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.
It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)
Ben Green and I have just uploaded to the arXiv our new paper “On sets defining few ordinary lines“, submitted to Discrete and Computational Geometry. This paper asymptotically solves two old questions concerning finite configurations of points in the plane $\mathbb{R}^2$. Given a set $P$ of $n$ points in the plane, define an ordinary line to be a line containing exactly two points of $P$. The classical Sylvester-Gallai theorem, first posed as a problem by Sylvester in 1893, asserts that as long as the points of $P$ are not all collinear, $P$ defines at least one ordinary line:
It is then natural to pose the question of what is the minimal number of ordinary lines that a set of $n$ non-collinear points can generate. In 1940, Melchior gave an elegant proof of the Sylvester-Gallai theorem based on projective duality and Euler’s formula $V - E + F = 2$, showing that at least three ordinary lines must be created; in 1951, Motzkin showed that there must be $\gg n^{1/2}$ ordinary lines. Previously to this paper, the best lower bound was by Csima and Sawyer, who in 1993 showed that there are at least $6n/13$ ordinary lines. In the converse direction, if $n$ is even, then by considering $n/2$ equally spaced points on a circle, and $n/2$ points on the line at infinity in equally spaced directions, one can find a configuration of $n$ points that define just $n/2$ ordinary lines.
As first observed by Böröczky, variants of this example also give few ordinary lines for odd $n$, though not quite as few as $n/2$; more precisely, when $n \equiv 1 \pmod 4$ one can find a configuration with $3\lfloor n/4 \rfloor$ ordinary lines, and when $n \equiv 3 \pmod 4$ one can find a configuration with only slightly more than $3\lfloor n/4 \rfloor$ ordinary lines. Our first main result is that these configurations are best possible for sufficiently large $n$:
Theorem 1 (Dirac-Motzkin conjecture) If $n$ is sufficiently large, then any set of $n$ non-collinear points in the plane will define at least $\lfloor n/2 \rfloor$ ordinary lines. Furthermore, if $n$ is odd, at least $3\lfloor n/4 \rfloor$ ordinary lines must be created.
The Dirac-Motzkin conjecture asserts that the first part of this theorem in fact holds for all $n \ge 14$, not just for sufficiently large $n$; in principle, our theorem reduces that conjecture to a finite verification, although our bound for “sufficiently large” is far too poor to actually make this feasible (it is of double exponential type). (There are two known configurations with fewer than $\lfloor n/2 \rfloor$ ordinary lines, one with $n = 7$ (discovered by Kelly and Moser), and one with $n = 13$ (discovered by Crowe and McKee).)
Our second main result concerns not the ordinary lines, but rather the $3$-rich lines of an $n$-point set – a line that meets exactly three points of that set. A simple double counting argument (counting pairs of distinct points in the set in two different ways) shows that there are at most $\frac{n(n-1)}{6}$ $3$-rich lines. On the other hand, on an elliptic curve, three distinct points P,Q,R on that curve are collinear precisely when they sum to zero with respect to the group law on that curve. Thus (as observed first by Sylvester in 1868), any finite subgroup of an elliptic curve (of which one can produce numerous examples, as elliptic curves in $\mathbb{R}^2$ have the group structure of either $\mathbb{R}/\mathbb{Z}$ or $\mathbb{R}/\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$) can provide examples of $n$-point sets with a large number of $3$-rich lines ($\lfloor \frac{n(n-3)}{6} \rfloor + 1$, to be precise). One can also shift such a finite subgroup by a third root of unity and obtain a similar example with only one fewer $3$-rich line. Sylvester then formally posed the question of determining whether this was best possible.
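To see concretely how group structure on a cubic produces many $3$-rich lines, here is a small script (my own illustration, using the singular cubic $y = x^3$ rather than an elliptic curve): on this curve, three distinct points $(a,a^3), (b,b^3), (c,c^3)$ are collinear exactly when $a+b+c=0$, so zero-sum triples of $x$-coordinates correspond precisely to $3$-rich lines.

```python
# 3-rich lines on the cubic y = x^3 come from zero-sum triples of x-coordinates.
from fractions import Fraction
from itertools import combinations

def collinear(p, q, r):
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return (x2 - x1) * (y3 - y1) == (x3 - x1) * (y2 - y1)

xs = [Fraction(a) for a in range(-6, 7) if a != 0]      # symmetric set of x-coords
pts = [(a, a ** 3) for a in xs]

rich = 0
for p, q, r in combinations(pts, 3):
    if collinear(p, q, r):
        assert p[0] + q[0] + r[0] == 0                  # the group-law constraint
        rich += 1
zero_sum_triples = sum(1 for t in combinations(xs, 3) if sum(t) == 0)
print("3-rich lines:", rich, "= zero-sum triples:", zero_sum_triples)
```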
This problem was known as the Orchard planting problem, and was given a more poetic formulation as such by Jackson in 1821 (nearly fifty years prior to Sylvester!):
Our second main result answers this problem affirmatively in the large $n$ case:
Theorem 2 (Orchard planting problem) If $n$ is sufficiently large, then any set of $n$ points in the plane will determine at most $\lfloor \frac{n(n-3)}{6} \rfloor + 1$ $3$-rich lines.
Again, our threshold for “sufficiently large” for this is extremely large (though slightly less large than in the previous theorem), and so a full solution of the problem, while in principle reduced to a finitary computation, remains infeasible at present.
Our results also classify the extremisers (and near extremisers) for both of these problems; basically, the known examples mentioned previously are (up to projective transformation) the only extremisers when $n$ is sufficiently large.
Our proof strategy follows the “inverse theorem method” from additive combinatorics. Namely, rather than try to prove direct theorems such as lower bounds on the number of ordinary lines, or upper bounds on the number of $3$-rich lines, we instead try to prove inverse theorems (also known as structure theorems), in which one attempts a structural classification of all configurations with very few ordinary lines (or very many $3$-rich lines). In principle, once one has a sufficiently explicit structural description of these sets, one simply has to compute the precise number of ordinary lines or $3$-rich lines in each configuration in the list provided by that structural description in order to obtain results such as the two theorems above.
Note from double counting that sets with many $3$-rich lines will necessarily have few ordinary lines. Indeed, if we let $N_k$ denote the set of lines that meet exactly $k$ points of an $n$-point configuration, so that $|N_3|$ is the number of $3$-rich lines and $|N_2|$ is the number of ordinary lines, then we have the double counting identity
$$\sum_{k \ge 2} \binom{k}{2} |N_k| = \binom{n}{2},$$
which among other things implies that any counterexample to the orchard problem can have at most $n + O(1)$ ordinary lines. In particular, any structural theorem that lets us understand configurations with $O(n)$ ordinary lines will, in principle, allow us to obtain results such as the above two theorems.
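Here is a quick computational illustration of the double counting identity (my own, on an arbitrary small configuration): every pair of points determines exactly one line, and a line through exactly $k$ of the points accounts for $\binom{k}{2}$ such pairs.

```python
# Verify sum_k C(k,2)|N_k| = C(n,2) for a small point configuration.
from fractions import Fraction
from itertools import combinations
from collections import Counter
from math import comb

pts = [(x, y) for x in range(4) for y in range(4)]   # a 4x4 grid, n = 16

def line_through(p, q):
    """Canonical coefficients (a, b, c) of the line a*x + b*y + c = 0 through p, q."""
    (x1, y1), (x2, y2) = p, q
    a, b = Fraction(y2 - y1), Fraction(x1 - x2)
    c = -(a * x1 + b * y1)
    lead = next(v for v in (a, b, c) if v != 0)
    return tuple(v / lead for v in (a, b, c))

points_on_line = {}
for p, q in combinations(pts, 2):
    points_on_line.setdefault(line_through(p, q), set()).update([p, q])

N = Counter(len(s) for s in points_on_line.values())   # N[k] = number of k-point lines
n = len(pts)
print(dict(N))
assert sum(comb(k, 2) * cnt for k, cnt in N.items()) == comb(n, 2)
print("ordinary lines:", N[2], " 3-rich lines:", N[3])
```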
As it turns out, we do eventually obtain a structure theorem that is strong enough to achieve these aims, but it is difficult to prove this theorem directly. Instead we proceed more iteratively, beginning with a “cheap” structure theorem that is relatively easy to prove but provides only a partial amount of control on the configurations with $O(n)$ ordinary lines. One then builds upon that theorem with additional arguments to obtain an “intermediate” structure theorem that gives better control, then a “weak” structure theorem that gives even more control, a “strong” structure theorem that gives yet more control, and then finally a “very strong” structure theorem that gives an almost complete description of the configurations (but only in the asymptotic regime when $n$ is very large). It turns out that the “weak” theorem is enough for the orchard planting problem, and the “strong” version is enough for the Dirac-Motzkin conjecture. (So the “very strong” structure theorem ends up being unnecessary for the two applications given, but may be of interest for other applications.) Note that the stronger theorems do not completely supersede the weaker ones, because the quantitative bounds in the theorems get progressively worse as the control gets stronger.
Before we state these structure theorems, note that all the examples mentioned previously of sets with few ordinary lines involved cubic curves: either irreducible examples such as elliptic curves, or reducible examples such as the union of a circle (or more generally, a conic section) and a line. (We also allow singular cubic curves, such as the union of a conic section and a tangent line, or a singular irreducible curve such as $y^2 = x^3$.) This turns out to be no coincidence; cubic curves happen to be very good at providing many $3$-rich lines (and thus, few ordinary lines), and conversely it turns out that they are essentially the only way to produce such lines. This can already be evidenced by our cheap structure theorem:
Theorem 3 (Cheap structure theorem) Let $P$ be a configuration of $n$ points with at most $Kn$ ordinary lines for some $K \ge 1$. Then $P$ can be covered by at most $O(K)$ cubic curves.
This theorem is already a non-trivial amount of control on sets with few ordinary lines, but because the result does not specify the nature of these curves, and how they interact with each other, it does not seem to be directly useful for applications. The intermediate structure theorem given below gives a more precise amount of control on these curves (essentially guaranteeing that all but at most one of the curve components are lines):
Theorem 4 (Intermediate structure theorem) Let $P$ be a configuration of $n$ points with at most $Kn$ ordinary lines for some $K \ge 1$. Then one of the following is true:
- $P$ lies on the union of an irreducible cubic curve and an additional $O(K^{O(1)})$ points.
- $P$ lies on the union of an irreducible conic section and an additional $O(K^{O(1)})$ lines, with $n/2 + O(K^{O(1)})$ of the points of $P$ in either of the two components.
- $P$ lies on the union of $O(K)$ lines and an additional $O(K^{O(1)})$ points.
By some additional arguments (including a very nice argument supplied to us by Luke Alexander Betts, an undergraduate at Cambridge, which replaces a much more complicated (and weaker) argument we originally had for this paper), one can cut down the number of lines in the above theorem to just one, giving a more useful structure theorem, at least when $n$ is large:
Theorem 5 (Weak structure theorem) Let $P$ be a configuration of $n$ points with at most $Kn$ ordinary lines for some $K \ge 1$. Assume that $n \ge \exp(\exp(CK^C))$ for some sufficiently large absolute constant $C$. Then one of the following is true:
- $P$ lies on the union of an irreducible cubic curve and an additional $O(K^{O(1)})$ points.
- $P$ lies on the union of an irreducible conic section, a line, and an additional $O(K^{O(1)})$ points, with $n/2 + O(K^{O(1)})$ of the points of $P$ in either of the first two components.
- $P$ lies on the union of a single line and an additional $O(K^{O(1)})$ points.
As mentioned earlier, this theorem is already strong enough to resolve the orchard planting problem for large $n$. The presence of the double exponential here is extremely annoying, and is the main reason why the final thresholds for “sufficiently large” in our results are excessively large, but our methods seem to be unable to eliminate these exponentials from our bounds (though they can fortunately be confined to a lower bound for $n$, keeping the other bounds in the theorem polynomial in $K$).
For the Dirac-Motzkin conjecture one needs more precise control on the portion of $P$ on the various low-degree curves indicated. This is given by the following result:
Theorem 6 (Strong structure theorem) Let $P$ be a configuration of $n$ points with at most $Kn$ ordinary lines for some $K \ge 1$. Assume that $n \ge \exp(\exp(CK^C))$ for some sufficiently large absolute constant $C$. Then, after adding or deleting $O(K^{O(1)})$ points from $P$ if necessary (modifying $n$ appropriately), and then applying a projective transformation, one of the following is true:
- $P$ is a finite subgroup of an elliptic curve (EDIT: as pointed out in comments, one also needs to allow for finite subgroups of acnodal singular cubic curves), possibly shifted by a third root of unity.
- $P$ is the Böröczky example mentioned previously (the union of $n/2$ equally spaced points on the circle, and $n/2$ points on the line at infinity).
- $P$ lies on a single line.
By applying a final “cleanup” we can replace the $O(K^{O(1)})$ in the above theorem with the optimal $O(K)$, which is our “very strong” structure theorem. But the strong structure theorem is already sufficient to establish the Dirac-Motzkin conjecture for large $n$.
There are many tools that go into proving these theorems, some of which are extremely classical (with at least one going back to the ancient Greeks), and others being more recent. I will discuss some (not all) of these tools below the fold, and more specifically:
- Melchior’s argument, based on projective duality and Euler’s formula, initially used to prove the Sylvester-Gallai theorem;
- Chasles’ version of the Cayley-Bacharach theorem, which can convert dual triangular grids (produced by Melchior’s argument) into cubic curves that meet many points of the original configuration $P$;
- Menelaus’s theorem, which is useful for producing ordinary lines when the point configuration lies on a few non-concurrent lines, particularly when combined with a sum-product estimate of Elekes, Nathanson, and Ruzsa;
- Betts’ argument, that produces ordinary lines when the point configuration lies on a few concurrent lines;
- A result of Poonen and Rubinstein that any point not on the origin or unit circle can lie on at most seven chords connecting roots of unity; this, together with a variant for elliptic curves, gives the very strong structure theorem, and is also (a strong version of) what is needed to finish off the Dirac-Motzkin and orchard planting problems from the structure theorems given above.
There are also a number of more standard tools from arithmetic combinatorics (e.g. a version of the Balog-Szemeredi-Gowers lemma) which are needed to tie things together at various junctures, but I won’t say more about these methods here as they are (by now) relatively routine.
Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity $r_4(F^n)$ for a vector space $F^n$ over a finite field $F$ of characteristic greater than $3$, where $r_4(F^n)$ is defined as the cardinality of the largest subset of $F^n$ that does not contain an arithmetic progression of length $4$. In our earlier paper, we gave two arguments that bounded $r_4(F^n)$ in the regime when the field $F$ was fixed and $n$ was large. The first “cheap” argument gave a bound of the shape
$$r_4(F^n) \ll |F|^n n^{-c_0}$$
for some small absolute constant $c_0 > 0$, and the more complicated “expensive” argument gave the improvement
$$r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ (1)$$
for some larger constant $c > 0$ depending only on $F$.
Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset $A$ of $F^n$ that has no arithmetic progressions of length $4$, and seeks to locate a subspace on which $A$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse $U^3$ theorem of Ben and myself (and also independently of Samorodnitsky), one approximates $1_A$ by a “quadratically structured” function $f$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because $A$ has no progressions of length $4$, the count of progressions of length $4$ weighted by $f$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which $f$ has increased density.
The error in the paper was to conclude from this that the original function $1_A$ also had increased density on the same subspace; it turns out that the manner in which $f$ approximates $1_A$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on $r_4(F^n)$ one gets at the end of the day to be worse than the cheap argument.)
After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives an explicit value for the exponent $c$ in (1). In fact, it gives the following stronger result:
Theorem 1 Let $A$ be a subset of $F^n$ of density at least $\delta$, and let $\varepsilon > 0$. Then there is a subspace $W$ of $F^n$ of codimension $O_{\delta,\varepsilon}(1)$ such that the number of (possibly degenerate) arithmetic progressions $a, a+r, a+2r, a+3r$ of length four in $A \cap W$ is at least $(\delta^4 - \varepsilon)|W|^2$.
The bound (1) is an easy consequence of this theorem after choosing $\varepsilon$ suitably small in terms of $\delta$ and removing the degenerate progressions from the conclusion of the theorem.
The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to $1_A$ for a significantly stronger local approximation to $1_A$ on a subspace $W$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse $U^3$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace $W$ on which $A$ is quite dense (of density $\gg \delta$) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.
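Here is a toy numerical illustration of this positivity phenomenon (my own, not from the paper, and with arbitrary small parameters): a set cut out by a quadratic condition in $\mathbb{F}_5^4$ has at least roughly the “random” count $\delta^4 N^2$ of (possibly degenerate) four-term progressions.

```python
# Count (possibly degenerate) 4-APs in subsets of F_5^4 and compare to delta^4 * N^2.
import numpy as np
from itertools import product

p, n = 5, 4
pts = np.array(list(product(range(p), repeat=n)))        # all N = 625 points
weights = p ** np.arange(n - 1, -1, -1)
idx = lambda v: v @ weights                              # coordinates -> index

def ap4_count(member):
    """Number of pairs (x, r) with x, x+r, x+2r, x+3r all in the set."""
    total = 0
    for r in pts:
        i0 = idx(pts)
        i1 = idx((pts + r) % p)
        i2 = idx((pts + 2 * r) % p)
        i3 = idx((pts + 3 * r) % p)
        total += int(np.sum(member[i0] * member[i1] * member[i2] * member[i3]))
    return total

N = p ** n
rng = np.random.default_rng(0)
quad = (pts ** 2).sum(axis=1) % p                        # the quadratic form x.x
A_struct = np.isin(quad, [0, 1]).astype(int)             # a "quadratic level set"
A_rand = (rng.random(N) < A_struct.mean()).astype(int)   # random set, same density

for name, A in [("quadratic level set", A_struct), ("random set", A_rand)]:
    delta = A.mean()
    print(f"{name}: density {delta:.3f}, AP4 count {ap4_count(A)}, "
          f"delta^4 * N^2 = {delta ** 4 * N * N:.0f}")
```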
Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune. This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:
Freiman’s lemma. Let A be a finite subset of a Euclidean space with $|A+A| \le K|A|$. Then A is contained in an affine subspace of dimension at most $K-1$.
This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression. The advantage here is that the bound on the dimension is extremely explicit.
Our main result is
Theorem. Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and the product set $A \cdot A$ can be covered by up to K left translates of A). Then A can be covered by at most $K^{O(1)}$ left-translates of a closed connected Lie subgroup of dimension at most $K^{O(1)}$.
We remark that our previous paper established a similar result, in which the dimension bound was improved to $O(\log K)$, but at the cost of worsening the covering number to an unspecified quantity $O_K(1)$, and with a much more complicated proof (91 pages instead of 8). Furthermore, the bound on this covering number is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything). Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite. A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.
To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:
Gleason Lemma (special case). Let $A$ be a finite symmetric subset of a Euclidean space, and let $V_0 \subset V_1 \subset \dots \subset V_k$ be a sequence of subspaces in this space, such that the sets $A_i := 2A \cap V_i$ are strictly increasing in $i$ for $0 \le i \le k$. Then $|5A| \ge k|A|$, where $mA = A + \dots + A$ denotes the $m$-fold sumset of $A$.
Proof. By hypothesis, for each $1 \le i \le k$, the projection $\pi_i(A_i)$ of $A_i$ to the quotient space $V_i/V_{i-1}$ is non-trivial, finite, and symmetric. In particular, since the vector space $V_i/V_{i-1}$ is torsion-free, $\pi_i(A_i + A_i)$ is strictly larger than $\pi_i(A_i)$. Equivalently, one can find $a_i$ in $A_i + A_i$ that does not lie in $A_i + V_{i-1}$; in particular, $a_i$ lies in $4A \cap V_i$, and $a_i + A$ is disjoint from $a_j + A$ for any $j < i$ (as otherwise $a_i$ would lie in $A_i + V_{i-1}$). As a consequence, the $k$ sets $a_i + A$ are disjoint and lie in 5A, whence the claim.
Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most $K^{O(1)}$, which is a weak version of Freiman’s lemma.
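As a quick numerical illustration of the abelian toy case (my own sketch, with an arbitrary choice of set): greedily build an increasing chain of subspaces from elements of $2A$, and check the lemma's conclusion that $|5A|$ is at least $k|A|$ where $k$ is the length of the chain.

```python
# Greedy chain of subspaces from 2A, and the bound |5A| >= k|A|.
import numpy as np
from itertools import product

A = np.array(list(product([-1, 0, 1], repeat=3)))        # finite symmetric set in Z^3

def sumset(X, Y):
    return np.unique((X[:, None, :] + Y[None, :, :]).reshape(-1, X.shape[1]), axis=0)

A2 = sumset(A, A)
A5 = sumset(sumset(A2, A2), A)                            # 5A = 2A + 2A + A

# Greedy chain: keep picking an element of 2A outside the span of what we have.
basis = np.zeros((0, A.shape[1]))
for v in A2:
    if np.linalg.matrix_rank(np.vstack([basis, v])) > len(basis):
        basis = np.vstack([basis, v])
k = len(basis)

print(f"k = {k}, k*|A| = {k * len(A)}, |5A| = {len(A5)}")
assert len(A5) >= k * len(A)
```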
To extend the argument to the nilpotent setting we use the following idea. Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series $G = G_1 \supset G_2 \supset G_3 \supset \dots$ of G, and considering the last intersection $H \cap G_i$ which is non-trivial, one obtains the claim. It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A. Passing to this large fraction and quotienting out the central element, we obtain a new approximate group. If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done. If, however, the process continues, then by using some Lie group theory, one can find a long sequence $H_1 \subset H_2 \subset \dots \subset H_k$ of connected Lie subgroups of G, such that the sets $A^2 \cap H_i$ are strictly increasing in i. Using some Lie group theory and the hypotheses on G, one can deduce that the group generated by $A^2 \cap H_{i+1}$ is much larger than the group generated by $A^2 \cap H_i$, in the sense that the latter group has infinite index in the former. It then turns out that the Gleason argument mentioned above can be adapted to this setting.
Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “The structure of approximate groups“, submitted to Pub. IHES. We had announced the main results of this paper in various forums (including this blog) for a few months now, but it had taken some time to fully write up the paper and put in various refinements and applications.
As announced previously, the main result of this paper is a (virtually, qualitatively) complete description of finite approximate groups in an arbitrary (local or global) group $G$. For simplicity let us work in the much more familiar setting of global groups, although our results also apply (but are a bit more technical to state) in the local group setting.
Recall that in a global group $G$, a $K$-approximate group is a symmetric subset $A$ of $G$ containing the origin, with the property that the product set $A \cdot A$ is covered by $K$ left-translates of $A$. Examples of $O(1)$-approximate groups include genuine groups, convex bodies in a bounded dimensional vector space, small balls in a bounded dimensional Lie group, large balls in a discrete nilpotent group of bounded rank or step, or generalised arithmetic progressions (or more generally, coset progressions) of bounded rank in an abelian group. Specialising now to finite approximate groups, a key example of such a group is what we call a coset nilprogression: a set of the form $\pi^{-1}(P)$, where $\pi: G_0 \to N$ is a homomorphism with finite kernel from a subgroup $G_0$ of $G$ to a nilpotent group $N$ of bounded step, and $P = P(u_1,\dots,u_r; N_1,\dots,N_r)$ is a nilprogression with a bounded number of generators $u_1,\dots,u_r$ in $N$ and some lengths $N_1,\dots,N_r \ge 1$, where $P(u_1,\dots,u_r; N_1,\dots,N_r)$ consists of all the words involving at most $N_1$ copies of $u_1^{\pm 1}$, $N_2$ copies of $u_2^{\pm 1}$, and so forth up to $N_r$ copies of $u_r^{\pm 1}$. One can show (by some nilpotent algebra) that all such coset nilprogressions are $O(1)$-approximate groups so long as the step and the rank $r$ are bounded (and if the lengths $N_1,\dots,N_r$ are sufficiently large).
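To make the approximate-group axioms concrete in the simplest (abelian, step one) case, here is a small script (my own illustration, with arbitrary generators) verifying that a rank-two generalised arithmetic progression in $\mathbb{Z}^2$ is an $O(1)$-approximate group: its sumset is covered by nine explicit translates.

```python
# A rank-2 generalised arithmetic progression is a 9-approximate group.
from itertools import product

N1, N2 = 10, 7
u, w = (1, 0), (2, 3)                                   # the two generators
def point(a, b):
    return (a * u[0] + b * w[0], a * u[1] + b * w[1])

A = {point(a, b) for a in range(-N1, N1 + 1) for b in range(-N2, N2 + 1)}
AA = {(x[0] + y[0], x[1] + y[1]) for x in A for y in A}

shifts = [point(a, b) for a, b in product((-N1, 0, N1), (-N2, 0, N2))]
covered = {(s[0] + x[0], s[1] + x[1]) for s in shifts for x in A}

print("|A| =", len(A), " |A+A| =", len(AA), " number of translates =", len(shifts))
print("A+A covered by the 9 translates:", AA <= covered)
```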
Our main theorem (which was essentially conjectured independently by Helfgott and by Lindenstrauss) asserts, roughly speaking, that coset nilprogressions are essentially the only examples of approximate groups.
Theorem 1 Let $A$ be a $K$-approximate group. Then $A^4$ contains a coset nilprogression $P$ of rank and step $O_K(1)$, such that $A$ can be covered by $O_K(1)$ left-translates of $P$.
In the torsion-free abelian case, this result is essentially Freiman’s theorem (with an alternate proof by Ruzsa); for general abelian case, it is due to Green and Ruzsa. Various partial results in this direction for some other groups (e.g. free groups, nilpotent groups, solvable groups, or simple groups of Lie type) are also known; see these previous blog posts for a summary of several of these results.
This result has a number of applications to geometric growth theory, and in particular to variants of Gromov’s theorem on groups of polynomial growth, which asserts that a finitely generated group is of polynomial growth if and only if it is virtually nilpotent. The connection lies in the fact that if the balls $B_S(R)$ associated to a finite set of generators $S$ have polynomial growth, then some simple volume-packing arguments combined with the pigeonhole principle will show that $B_S(R)$ will end up being an $O(1)$-approximate group for many radii $R$. In fact, since our theorem only needs a single approximate group to obtain virtually nilpotent structure, we are able to obtain some new strengthenings of Gromov’s theorem. For instance, if $A$ is any $K$-approximate group in a finitely generated group $G$ that contains a ball $B_S(R)$ for some set of generators $S$ and some $R$ that is sufficiently large depending on $K$, our theorem implies that $G$ is virtually nilpotent, answering a question of Petrunin. Among other things, this gives an alternate proof of a recent result of Kapovitch and Wilking (see also this previous paper of Cheeger and Colding) that a compact manifold of bounded diameter and Ricci curvature at least $-\varepsilon$ necessarily has a virtually nilpotent fundamental group if $\varepsilon$ is sufficiently small (depending only on dimension). The main point here is that no lower bound on the injectivity radius is required. Another application is a “Margulis-type lemma”, which asserts that if a metric space $X$ has “bounded packing” (in the sense that any ball of radius (say) $4$ is covered by a bounded number of balls of radius $1$), and $\Gamma$ is a group of isometries on $X$ that acts discretely (i.e. every orbit has only finitely many elements (counting multiplicity) in each bounded set), then the near-stabiliser $\Gamma_\varepsilon(x) := \langle \gamma \in \Gamma: d(\gamma x, x) \le \varepsilon \rangle$ of a point $x \in X$ is virtually nilpotent if $\varepsilon$ is small enough depending on the packing constant.
There are also some variants and refinements to the main theorem proved in the paper, such as an extension to local groups, and also an improvement on the bound on the rank and step from $O_K(1)$ to $O(\log K)$ (but at the cost of replacing $A^4$ in the theorem with $A^{O_K(1)}$).
I’ll be discussing the proof of the main theorem in detail in the next few lecture notes of my current graduate course. The full proof is somewhat lengthy (occupying about 50 pages of the 90-page paper), but can be summarised in the following steps:
- (Hrushovski) Take an arbitrary sequence $A_n$ of finite $K$-approximate groups, and show that an appropriate limit $A$ of such groups can be “modeled” in some sense by an open bounded subset of a locally compact group. (The precise definition of “model” is technical, but “macroscopically faithful representation” is a good first approximation.) As discussed in the previous lecture notes, we use an ultralimit for this purpose; the paper of Hrushovski where this strategy was first employed also considered more sophisticated model-theoretic limits. To build a locally compact topology, Hrushovski used some tools from definability theory; in our paper, we instead use a combinatorial lemma of Sanders (closely related to a similar result of Croot and Sisask.)
- (Gleason-Yamabe) The locally compact group can in turn be “modeled” by a Lie group (possibly after shrinking the group, and thus the ultralimit $A$, slightly). (This result arose from the solution to Hilbert’s fifth problem, as discussed here. For our extension to local groups, we use a recent local version of the Gleason-Yamabe theorem, due to Goldbring.)
- (Gleason) Using the escape properties of the Lie model, construct a norm $\| \cdot \|$ (and thus a left-invariant metric $d$) on the ultralimit approximate group $A$ (and also on the finitary groups $A_n$) that obeys a number of good properties, such as a commutator estimate $\| [g,h] \| \ll \|g\| \|h\|$. (This is modeled on an analogous construction used in the theory of Hilbert’s fifth problem, as discussed in this previous set of lecture notes.) This norm is essentially an escape norm associated to (a slight modification of) $A$ or $A_n$.
- (Jordan-Bieberbach-Frobenius) We now take advantage of the finite nature of the $A_n$ by locating the non-trivial element $g$ of $A_n$ with minimal escape norm (but one first has to quotient out the elements of zero escape norm). The commutator estimate mentioned previously ensures that this element is essentially “central” in $A_n$. One can then quotient out a progression $\{ g^{-N}, \dots, g^N \}$ generated by this central element (reducing the dimension of the Lie model by one in the process) and iterate the process until the dimension of the model drops to zero. Reversing the process, this constructs a coset nilprogression inside $A_n^4$. This argument is based on the classic proof of Jordan’s theorem due to Bieberbach and Frobenius, as discussed in this blog post.
One quirk of the argument is that it requires one to work in the category of local groups rather than global groups. (This is somewhat analogous to how, in the standard proofs of Freiman’s theorem, one needs to work with the category of Freiman homomorphisms, rather than group homomorphisms.) The reason for this arises when performing the quotienting step in the Jordan-Bieberbach-Frobenius leg of the argument. The obvious way to perform this step (and the thing that we tried first) would be to quotient out by the entire cyclic group $\langle g \rangle$ generated by the element $g$ of minimal escape norm. However, it turns out that this doesn’t work too well, because the group quotiented out is so “large” that it can create a lot of torsion in the quotient. In particular, elements which used to have positive escape norm can now become trapped in the quotient of $A_n$, thus sending their escape norm to zero. This leads to an inferior conclusion (in which a coset nilprogression is replaced by a more complicated tower of alternating extensions between central progressions and finite groups, similar to the towers encountered in my previous paper on this topic). To prevent this unwanted creation of torsion, one has to truncate the cyclic group $\langle g \rangle$ before it escapes $A_n$, so that one quotients out by a geometric progression $\{ g^{-N}, \dots, g^N \}$ rather than the cyclic group. But the operation of quotienting out by such a progression, which is a local group rather than a global one, cannot be formalised in the category of global groups, but only in the category of local groups. Because of this, we were forced to carry out the entire argument using the language of local groups. As it turns out, the arguments are ultimately more natural in this setting, although there is an initial investment of notation required, given that global group theory is much more familiar and well-developed than local group theory.
One interesting feature of the argument is that it does not use much of the existing theory of Freiman-type theorems, instead building the coset nilprogression directly from the geometric properties of the approximate group. In particular, our argument gives a new proof of Freiman’s theorem in the abelian case, which largely avoids Fourier analysis (except through the use of the theory of Hilbert’s fifth problem, which uses the Peter-Weyl theorem (or, in the abelian case, Pontryagin duality), which is basically a version of Fourier analysis).