You are currently browsing the tag archive for the ‘Ben Green’ tag.

Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for ${r_4(N)}$“, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).

For any natural number ${N}$, define ${r_4(N)}$ to be the largest cardinality of a subset ${A}$ of ${[N] = \{1,\dots,N\}}$ which does not contain any non-trivial arithmetic progressions ${a, a+r, a+2r, a+3r}$ of length four (where “non-trivial” means that ${r}$ is non-zero). Trivially we have ${r_4(N) \leq N}$. In 1969, Szemerédi showed that ${r_4(N) = o(N)}$. However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) were quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that ${r_4(N) \ll N (\log \log N)^{-c}}$ for some absolute constant ${c>0}$. In the second paper in the above-mentioned series, we managed to improve this bound to ${r_4(N) \ll N \exp( - c \sqrt{\log \log N})}$. In this paper, we improve the bound further to ${r_4(N) \ll N (\log N)^{-c}}$, which seems to be the limit of the methods. (We remark that if we could take ${c}$ to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for length three conjectures is nearly settled, as it is known that for the analogous bound on ${r_3(N)}$ one can take any ${c}$ less than one.)

Most of the previous work on bounding ${r_4(N)}$ relied in some form or another on the density increment argument introduced by Roth back in 1953; roughly speaking, the idea is to show that if a dense subset ${A}$ of ${[N]}$ fails to contain arithmetic progressions of length four, one seeks to then locate a long subprogression of ${[N]}$ in which ${A}$ has increased density. This was the basic method for instance underlying our previous bound ${r_4(N) \ll N \exp( - c \sqrt{\log \log N})}$, as well as a finite field analogue of the bound ${r_4(N) \ll N (\log N)^{-c}}$; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that ${A \subset [N]}$ has density ${\delta}$. Then one would expect a “randomly” selected arithmetic progression ${{\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r}}$ in ${[N]}$ (using the convention that random variables will be in boldface) to be contained in ${A}$ with probability about ${\delta^4}$. This is not true in general, however it was shown by Ben and myself that for any ${\eta>0}$, there was a set of shifts ${r \in [-N,N]}$ of cardinality ${\gg_{\delta,\eta} N}$, such that for any such ${r}$ one had

$\displaystyle {\bf P}( {\bf a}, {\bf a}+r, {\bf a}+2r, {\bf a}+3r \in A ) \geq \delta^4 - \eta$

if ${{\bf a}}$ was chosen uniformly at random from ${[N]}$. This easily implies that ${r_4(N) = o(N)}$, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound ${\gg_{\delta,\eta} N}$ is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take ${N}$ to be extremely large compared to ${\delta,\eta}$ to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift ${r=0}$.

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to couple together the two parameters ${{\bf a}}$ and ${{\bf r}}$ of the length four progression. Namely, with ${A}$, ${\delta}$, ${\eta}$ as above, we are able to show that there exist random variables ${{\bf a}, {\bf r}}$, not necessarily independent, such that

$\displaystyle {\bf P}( {\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r} \in A ) \geq \delta^4 - \eta \ \ \ \ \ (1)$

and such that we have the non-degeneracy bound

$\displaystyle {\bf P}( {\bf r} = 0 ) \ll \exp( - \eta^{-O(1)} ) / N.$

This then easily implies the main theorem.

The energy increment method is then deployed to locate a good pair ${({\bf a}, {\bf r})}$ of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function ${1_A}$ “behaves like” a globally quadratic function such as ${F( \alpha n^2 )}$, for some irrational ${\alpha}$ and some smooth periodic function ${F: {\bf R}/{\bf Z} \rightarrow {\bf R}}$ of mean ${\delta}$. If one then takes ${{\bf a}, {\bf r}}$ to be uniformly distributed in ${[N]}$ and ${[-\varepsilon N, \varepsilon N]}$ respectively for some small ${\varepsilon>0}$, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

$\displaystyle \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F(x) F(y) F(z) F(w) \ \ \ \ \ (2)$

where the integral is with respect to the probability Haar measure, and the constraint ${x-3y+3z-w=0}$ ultimately arises from the algebraic constraint

$\displaystyle \alpha {\bf a}^2 - 3 \alpha ({\bf a}+{\bf r})^2 + 3 \alpha ({\bf a}+2{\bf r})^2 - \alpha ({\bf a}+3{\bf r})^2 = 0.$

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least ${(\int_{{\bf R}/{\bf Z}} F)^4}$, which (morally at least) gives (1) in this case.

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which ${[N]}$ is partitioned into some number of structured pieces ${B_c}$ (think of these as arithmetic progressions, or as “Bohr sets), and on each piece ${B_c}$, ${1_A}$ behaves like a locally quadratic function such as ${F_c( \alpha_c n^2 )}$, where ${\alpha_c}$ now varies with ${c}$, and the mean of ${F_c}$ will be approximately ${\delta}$ on the average after averaging in ${c}$ (weighted by the size of the pieces ${B_c}$). Now one should select ${{\bf a}}$ and ${{\bf r}}$ in the following coupled manner: first one chooses ${{\bf a}}$ uniformly from ${[N]}$, then one defines ${{\bf c}}$ to be the label ${c}$ such that ${{\bf a} \in B_c}$, and then selects ${{\bf r}}$ uniformly from a set ${B_{c,\varepsilon}}$ which is related to ${B_c}$ in much the same way that ${[-\varepsilon N, \varepsilon N]}$ is related to ${[N]}$. If one does this correctly, the analogue of (2) becomes

$\displaystyle {\bf E} \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F_{\mathbf c}(x) F_{\mathbf c}(y) F_{\mathbf c}(z) F_{\mathbf c}(w),$

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of ${1_A}$ which involves a decomposition of ${[N]}$ into structured pieces ${B_c}$, and a quadratic approximation to ${1_A}$ on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm ${U^3}$ of the error is small enough) to model the count in (1) (for random variables ${{\bf a}, {\bf r}}$ determined by the above partition of ${[N]}$ into pieces ${B_c}$), and if the frequencies (such as ${\alpha_c}$) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm ${U^3}$. A significant fraction of the paper is then devoted to establishing a quantitative inverse theorem for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to ${1_A}$ in a manner that significantly increases its “energy” (basically an ${L^2}$ norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for ${U^3}$ type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “${1\%}$-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “${99\%}$-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse pseudorandom subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this ${99\%}$-structured homomorphism on pseudorandom subsets of Bohr sets to a ${100\%}$-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a ${1}$-form on ${{\bf R}^n}$ that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the ${1}$-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.

This week I have been at a Banff workshop “Combinatorics meets Ergodic theory“, focused on the combinatorics surrounding Szemerédi’s theorem and the Gowers uniformity norms on one hand, and the ergodic theory surrounding Furstenberg’s multiple recurrence theorem and the Host-Kra structure theory on the other. This was quite a fruitful workshop, and directly inspired the various posts this week on this blog. Incidentally, BIRS being as efficient as it is, videos for this week’s talks are already online.

As mentioned in the previous two posts, Ben Green, Tamar Ziegler, and myself proved the following inverse theorem for the Gowers norms:

Theorem 1 (Inverse theorem for Gowers norms) Let ${N \geq 1}$ and ${s \geq 1}$ be integers, and let ${\delta > 0}$. Suppose that ${f: {\bf Z} \rightarrow [-1,1]}$ is a function supported on ${[N] := \{1,\dots,N\}}$ such that

$\displaystyle \frac{1}{N^{s+2}} \sum_{n,h_1,\dots,h_{s+1} \in {\bf Z}} \prod_{\omega \in \{0,1\}^{s+1}} f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta.$

Then there exists a filtered nilmanifold ${G/\Gamma}$ of degree ${\leq s}$ and complexity ${O_{s,\delta}(1)}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and a Lipschitz function ${F: G/\Gamma \rightarrow {\bf R}}$ of Lipschitz constant ${O_{s,\delta}(1)}$ such that

$\displaystyle \frac{1}{N} \sum_n f(n) F(g(n) \Gamma) \gg_{s,\delta} 1.$

There is a higher dimensional generalisation, which first appeared explicitly (in a more general form) in this preprint of Szegedy (which used a slightly different argument than the one of Ben, Tammy, and myself; see also this previous preprint of Szegedy with related results):

Theorem 2 (Inverse theorem for multidimensional Gowers norms) Let ${N \geq 1}$ and ${s,d \geq 1}$ be integers, and let ${\delta > 0}$. Suppose that ${f: {\bf Z}^d \rightarrow [-1,1]}$ is a function supported on ${[N]^d}$ such that

$\displaystyle \frac{1}{N^{d(s+2)}} \sum_{n,h_1,\dots,h_{s+1} \in {\bf Z}^d} \prod_{\omega \in \{0,1\}^{s+1}} f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta. \ \ \ \ \ (1)$

Then there exists a filtered nilmanifold ${G/\Gamma}$ of degree ${\leq s}$ and complexity ${O_{s,\delta,d}(1)}$, a polynomial sequence ${g: {\bf Z}^d \rightarrow G}$, and a Lipschitz function ${F: G/\Gamma \rightarrow {\bf R}}$ of Lipschitz constant ${O_{s,\delta,d}(1)}$ such that

$\displaystyle \frac{1}{N^d} \sum_{n \in {\bf Z}^d} f(n) F(g(n) \Gamma) \gg_{s,\delta,d} 1.$

The ${d=2}$ case of this theorem was recently used by Wenbo Sun. One can replace the polynomial sequence with a linear sequence if desired by using a lifting trick (essentially due to Furstenberg, but which appears explicitly in Appendix C of my paper with Ben and Tammy).

In this post I would like to record a very neat and simple observation of Ben Green and Nikos Frantzikinakis, that uses the tool of Freiman isomorphisms to derive Theorem 2 as a corollary of the one-dimensional theorem. Namely, consider the linear map ${\phi: {\bf Z}^d \rightarrow {\bf Z}}$ defined by

$\displaystyle \phi( n_1,\dots,n_d ) := \sum_{i=1}^d (10 N)^{i-1} n_i,$

that is to say ${\phi}$ is the digit string base ${10N}$ that has digits ${n_d \dots n_1}$. This map is a linear map from ${[N]^d}$ to a subset of ${[d 10^d N^d]}$ of density ${1/(d10^d)}$. Furthermore it has the following “Freiman isomorphism” property: if ${n, h_1,\dots,h_{s+1}}$ lie in ${{\bf Z}}$ with ${n + \omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}}$ in the image set ${\phi( [N]^d )}$ of ${[N]^d}$ for all ${\omega}$, then there exist (unique) lifts ${\tilde n \in {\bf Z}^d, \tilde h_1,\dots,\tilde h_{s+1} \in {\bf Z}}$ such that

$\displaystyle \tilde n + \omega_1 \tilde h_1 + \dots + \omega_{s+1} \tilde h_{s+1} \in [N]^d$

and

$\displaystyle \phi( \tilde n + \omega_1 \tilde h_1 + \dots + \omega_{s+1} \tilde h_{s+1} ) = n + \omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}$

for all ${\omega}$. Indeed, the injectivity of ${\phi}$ on ${[N]^d}$ uniquely determines the sum ${\tilde n + \omega_1 \tilde h_1 + \dots + \omega_{s+1} \tilde h_{s+1}}$ for each ${\omega}$, and one can use base ${10N}$ arithmetic to verify that the alternating sum of these sums on any ${2}$-facet of the cube ${\{0,1\}^{s+1}}$ vanishes, which gives the claim. (In the language of additive combinatorics, the point is that ${\phi}$ is a Freiman isomorphism of order (say) ${8}$ on ${[N]^d}$.)

Now let ${\tilde f: {\bf Z} \rightarrow [-1,1]}$ be the function defined by setting ${\tilde f( \phi(n) ) := f(n)}$ whenever ${n \in [N]^d}$, with ${\tilde f}$ vanishing outside of ${\phi([N]^d)}$. If ${f}$ obeys (1), then from the above Freiman isomorphism property we have

$\displaystyle \frac{1}{N^{d(s+2)}} \sum_{n, h_1,\dots,h_{s+1} \in {\bf Z}} \prod_{\omega \in \{0,1\}^{s+1}} \tilde f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta.$

Applying the one-dimensional inverse theorem (Theorem 1), with ${\delta}$ reduced by a factor of ${d 10^d}$ and ${N}$ replaced by ${d 10^d N^d}$, this implies the existence of a filtered nilmanifold ${G/\Gamma}$ of degree ${\leq s}$ and complexity ${O_{s,\delta,d}(1)}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and a Lipschitz function ${F: G/\Gamma \rightarrow {\bf R}}$ of Lipschitz constant ${O_{s,\delta,d}(1)}$ such that

$\displaystyle \frac{1}{N^{d(s+2)}} \sum_{n \in {\bf Z}} \tilde f(n) F(g(n) \Gamma) \gg_{s,\delta,d} 1$

which by the Freiman isomorphism property again implies that

$\displaystyle \frac{1}{N^{d(s+2)}} \sum_{n \in {\bf Z}^d} f(n) F(g(\phi(n)) \Gamma) \gg_{s,\delta,d} 1.$

But the map ${n \mapsto g(\phi(n))}$ is clearly a polynomial map from ${{\bf Z}^d}$ to ${G}$ (the composition of two polynomial maps is polynomial, see e.g. Appendix B of my paper with Ben and Tammy), and the claim follows.

Remark 3 This trick appears to be largely restricted to the case of boundedly generated groups such as ${{\bf Z}^d}$; I do not see any easy way to deduce an inverse theorem for, say, ${\bigcup_{n=1}^\infty {\mathbb F}_p^n}$ from the ${{\bf Z}}$-inverse theorem by this method.

Remark 4 By combining this argument with the one in the previous post, one can obtain a weak ergodic inverse theorem for ${{\bf Z}^d}$-actions. Interestingly, the Freiman isomorphism argument appears to be difficult to implement directly in the ergodic category; in particular, there does not appear to be an obvious direct way to derive the Host-Kra inverse theorem for ${{\bf Z}^d}$ actions (a result first obtained in the PhD thesis of Griesmer) from the counterpart for ${{\bf Z}}$ actions.

A few years ago, Ben Green, Tamar Ziegler, and myself proved the following (rather technical-looking) inverse theorem for the Gowers norms:

Theorem 1 (Discrete inverse theorem for Gowers norms) Let ${N \geq 1}$ and ${s \geq 1}$ be integers, and let ${\delta > 0}$. Suppose that ${f: {\bf Z} \rightarrow [-1,1]}$ is a function supported on ${[N] := \{1,\dots,N\}}$ such that

$\displaystyle \frac{1}{N^{s+2}} \sum_{n,h_1,\dots,h_{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} f(n+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1}) \geq \delta.$

Then there exists a filtered nilmanifold ${G/\Gamma}$ of degree ${\leq s}$ and complexity ${O_{s,\delta}(1)}$, a polynomial sequence ${g: {\bf Z} \rightarrow G}$, and a Lipschitz function ${F: G/\Gamma \rightarrow {\bf R}}$ of Lipschitz constant ${O_{s,\delta}(1)}$ such that

$\displaystyle \frac{1}{N} \sum_n f(n) F(g(n) \Gamma) \gg_{s,\delta} 1.$

For the definitions of “filtered nilmanifold”, “degree”, “complexity”, and “polynomial sequence”, see the paper of Ben, Tammy, and myself. (I should caution the reader that this blog post will presume a fair amount of familiarity with this subfield of additive combinatorics.) This result has a number of applications, for instance to establishing asymptotics for linear equations in the primes, but this will not be the focus of discussion here.

The purpose of this post is to record the observation that this “discrete” inverse theorem, together with an equidistribution theorem for nilsequences that Ben and I worked out in a separate paper, implies a continuous version:

Theorem 2 (Continuous inverse theorem for Gowers norms) Let ${s \geq 1}$ be an integer, and let ${\delta>0}$. Suppose that ${f: {\bf R} \rightarrow [-1,1]}$ is a measurable function supported on ${[0,1]}$ such that

$\displaystyle \int_{{\bf R}^{s+1}} \prod_{\omega \in \{0,1\}^{s+1}} f(t+\omega_1 h_1 + \dots + \omega_{s+1} h_{s+1})\ dt dh_1 \dots dh_{s+1} \geq \delta. \ \ \ \ \ (1)$

Then there exists a filtered nilmanifold ${G/\Gamma}$ of degree ${\leq s}$ and complexity ${O_{s,\delta}(1)}$, a (smooth) polynomial sequence ${g: {\bf R} \rightarrow G}$, and a Lipschitz function ${F: G/\Gamma \rightarrow {\bf R}}$ of Lipschitz constant ${O_{s,\delta}(1)}$ such that

$\displaystyle \int_{\bf R} f(t) F(g(t) \Gamma)\ dt \gg_{s,\delta} 1.$

The interval ${[0,1]}$ can be easily replaced with any other fixed interval by a change of variables. A key point here is that the bounds are completely uniform in the choice of ${f}$. Note though that the coefficients of ${g}$ can be arbitrarily large (and this is necessary, as can be seen just by considering functions of the form ${f(t) = \cos( \xi t)}$ for some arbitrarily large frequency ${\xi}$).

It is likely that one could prove Theorem 2 by carefully going through the proof of Theorem 1 and replacing all instances of ${{\bf Z}}$ with ${{\bf R}}$ (and making appropriate modifications to the argument to accommodate this). However, the proof of Theorem 1 is quite lengthy. Here, we shall proceed by the usual limiting process of viewing the continuous interval ${[0,1]}$ as a limit of the discrete interval ${\frac{1}{N} \cdot [N]}$ as ${N \rightarrow \infty}$. However there will be some problems taking the limit due to a failure of compactness, and specifically with regards to the coefficients of the polynomial sequence ${g: {\bf N} \rightarrow G}$ produced by Theorem 1, after normalising these coefficients by ${N}$. Fortunately, a factorisation theorem from a paper of Ben Green and myself resolves this problem by splitting ${g}$ into a “smooth” part which does enjoy good compactness properties, as well as “totally equidistributed” and “periodic” parts which can be eliminated using the measurability (and thus, approximate smoothness), of ${f}$.

Kevin Ford, Ben Green, Sergei Konyagin, James Maynard, and I have just uploaded to the arXiv our paper “Long gaps between primes“. This is a followup work to our two previous papers (discussed in this previous post), in which we had simultaneously shown that the maximal gap

$\displaystyle G(X) := \sup_{p_n, p_{n+1} \leq X} p_{n+1}-p_n$

between primes up to ${X}$ exhibited a lower bound of the shape

$\displaystyle G(X) \geq f(X) \log X \frac{\log \log X \log\log\log\log X}{(\log\log\log X)^2} \ \ \ \ \ (1)$

for some function ${f(X)}$ that went to infinity as ${X \rightarrow \infty}$; this improved upon previous work of Rankin and other authors, who established the same bound but with ${f(X)}$ replaced by a constant. (Again, see the previous post for a more detailed discussion.)

In our previous papers, we did not specify a particular growth rate for ${f(X)}$. In my paper with Kevin, Ben, and Sergei, there was a good reason for this: our argument relied (amongst other things) on the inverse conjecture on the Gowers norms, as well as the Siegel-Walfisz theorem, and the known proofs of both results both have ineffective constants, rendering our growth function ${f(X)}$ similarly ineffective. Maynard’s approach ostensibly also relies on the Siegel-Walfisz theorem, but (as shown in another recent paper of his) can be made quite effective, even when tracking ${k}$-tuples of fairly large size (about ${\log^c x}$ for some small ${c}$). If one carefully makes all the bounds in Maynard’s argument quantitative, one eventually ends up with a growth rate ${f(X)}$ of shape

$\displaystyle f(X) \asymp \frac{\log \log \log X}{\log\log\log\log X}, \ \ \ \ \ (2)$

thus leading to a bound

$\displaystyle G(X) \gg \log X \frac{\log \log X}{\log\log\log X}$

on the gaps between primes for large ${X}$; this is an unpublished calculation of James’.

In this paper we make a further refinement of this calculation to obtain a growth rate

$\displaystyle f(X) \asymp \log \log \log X \ \ \ \ \ (3)$

leading to a bound of the form

$\displaystyle G(X) \geq c \log X \frac{\log \log X \log\log\log\log X}{\log\log\log X} \ \ \ \ \ (4)$

for large ${X}$ and some small constant ${c}$. Furthermore, this appears to be the limit of current technology (in particular, falling short of Cramer’s conjecture that ${G(X)}$ is comparable to ${\log^2 X}$); in the spirit of Erdös’ original prize on this problem, I would like to offer 10,000 USD for anyone who can show (in a refereed publication, of course) that the constant ${c}$ here can be replaced by an arbitrarily large constant ${C}$.

The reason for the growth rate (3) is as follows. After following the sieving process discussed in the previous post, the problem comes down to something like the following: can one sieve out all (or almost all) of the primes in ${[x,y]}$ by removing one residue class modulo ${p}$ for all primes ${p}$ in (say) ${[x/4,x/2]}$? Very roughly speaking, if one can solve this problem with ${y = g(x) x}$, then one can obtain a growth rate on ${f(X)}$ of the shape ${f(X) \sim g(\log X)}$. (This is an oversimplification, as one actually has to sieve out a random subset of the primes, rather than all the primes in ${[x,y]}$, but never mind this detail for now.)

Using the quantitative “dense clusters of primes” machinery of Maynard, one can find lots of ${k}$-tuples in ${[x,y]}$ which contain at least ${\gg \log k}$ primes, for ${k}$ as large as ${\log^c x}$ or so (so that ${\log k}$ is about ${\log\log x}$). By considering ${k}$-tuples in arithmetic progression, this means that one can find lots of residue classes modulo a given prime ${p}$ in ${[x/4,x/2]}$ that capture about ${\log\log x}$ primes. In principle, this means that union of all these residue classes can cover about ${\frac{x}{\log x} \log\log x}$ primes, allowing one to take ${g(x)}$ as large as ${\log\log x}$, which corresponds to (3). However, there is a catch: the residue classes for different primes ${p}$ may collide with each other, reducing the efficiency of the covering. In our previous papers on the subject, we selected the residue classes randomly, which meant that we had to insert an additional logarithmic safety margin in expected number of times each prime would be shifted out by one of the residue classes, in order to guarantee that we would (with high probability) sift out most of the primes. This additional safety margin is ultimately responsible for the ${\log\log\log\log X}$ loss in (2).

The main innovation of this paper, beyond detailing James’ unpublished calculations, is to use ideas from the literature on efficient hypergraph covering, to avoid the need for a logarithmic safety margin. The hypergraph covering problem, roughly speaking, is to try to cover a set of ${n}$ vertices using as few “edges” from a given hypergraph ${H}$ as possible. If each edge has ${m}$ vertices, then one certainly needs at least ${n/m}$ edges to cover all the vertices, and the question is to see if one can come close to attaining this bound given some reasonable uniform distribution hypotheses on the hypergraph ${H}$. As before, random methods tend to require something like ${\frac{n}{m} \log r}$ edges before one expects to cover, say ${1-1/r}$ of the vertices.

However, it turns out (under reasonable hypotheses on ${H}$) to eliminate this logarithmic loss, by using what is now known as the “semi-random method” or the “Rödl nibble”. The idea is to randomly select a small number of edges (a first “nibble”) – small enough that the edges are unlikely to overlap much with each other, thus obtaining maximal efficiency. Then, one pauses to remove all the edges from ${H}$ that intersect edges from this first nibble, so that all remaining edges will not overlap with the existing edges. One then randomly selects another small number of edges (a second “nibble”), and repeats this process until enough nibbles are taken to cover most of the vertices. Remarkably, it turns out that under some reasonable assumptions on the hypergraph ${H}$, one can maintain control on the uniform distribution of the edges throughout the nibbling process, and obtain an efficient hypergraph covering. This strategy was carried out in detail in an influential paper of Pippenger and Spencer.

In our setup, the vertices are the primes in ${[x,y]}$, and the edges are the intersection of the primes with various residue classes. (Technically, we have to work with a family of hypergraphs indexed by a prime ${p}$, rather than a single hypergraph, but let me ignore this minor technical detail.) The semi-random method would in principle eliminate the logarithmic loss and recover the bound (3). However, there is a catch: the analysis of Pippenger and Spencer relies heavily on the assumption that the hypergraph is uniform, that is to say all edges have the same size. In our context, this requirement would mean that each residue class captures exactly the same number of primes, which is not the case; we only control the number of primes in an average sense, but we were unable to obtain any concentration of measure to come close to verifying this hypothesis. And indeed, the semi-random method, when applied naively, does not work well with edges of variable size – the problem is that edges of large size are much more likely to be eliminated after each nibble than edges of small size, since they have many more vertices that could overlap with the previous nibbles. Since the large edges are clearly the more useful ones for the covering problem than small ones, this bias towards eliminating large edges significantly reduces the efficiency of the semi-random method (and also greatly complicates the analysis of that method).

Our solution to this is to iteratively reweight the probability distribution on edges after each nibble to compensate for this bias effect, giving larger edges a greater weight than smaller edges. It turns out that there is a natural way to do this reweighting that allows one to repeat the Pippenger-Spencer analysis in the presence of edges of variable size, and this ultimately allows us to recover the full growth rate (3).

To go beyond (3), one either has to find a lot of residue classes that can capture significantly more than ${\log\log x}$ primes of size ${x}$ (which is the limit of the multidimensional Selberg sieve of Maynard and myself), or else one has to find a very different method to produce large gaps between primes than the Erdös-Rankin method, which is the method used in all previous work on the subject.

It turns out that the arguments in this paper can be combined with the Maier matrix method to also produce chains of consecutive large prime gaps whose size is of the order of (4); three of us (Kevin, James, and myself) will detail this in a future paper. (A similar combination was also recently observed in connection with our earlier result (1) by Pintz, but there are some additional technical wrinkles required to recover the full gain of (3) for the chains of large gaps problem.)

Emmanuel Breuillard, Ben Green, Bob Guralnick, and I have just uploaded to the arXiv our joint paper “Expansion in finite simple groups of Lie type“. This long-delayed paper (announced way back in 2010!) is a followup to our previous paper in which we showed that, with one possible exception, generic pairs of elements of a simple algebraic group (over an uncountable field) generated a free group which was strongly dense in the sense that any nonabelian subgroup of this group was Zariski dense. The main result of this paper is to establish the analogous result for finite simple groups of Lie type (as defined in the previous blog post) and bounded rank, namely that almost all pairs ${a,b}$ of elements of such a group generate a Cayley graph which is a (two-sided) expander, with expansion constant bounded below by a quantity depending on the rank of the group. (Informally, this means that the random walk generated by ${a,b}$ spreads out in logarithmic time to be essentially uniformly distributed across the group, as opposed for instance to being largely trapped in an algebraic subgroup. Thus if generic elements did not generate a strongly dense group, one would probably expect expansion to fail.)

There are also some related results established in the paper. Firstly, as we discovered after writing our first paper, there was one class of algebraic groups for which our demonstration of strongly dense subgroups broke down, namely the ${Sp_4}$ groups in characteristic three. In the current paper we provide in a pair of appendices a new argument that covers this case (or more generally, ${Sp_4}$ in odd characteristic), by first reducing to the case of affine groups ${k^2 \rtimes SL_2(k)}$ (which can be found inside ${Sp_4}$ as a subgroup) and then using a ping-pong argument (in a p-adic metric) in the latter context.

Secondly, we show that the distinction between one-sided expansion and two-sided expansion (see this set of lecture notes of mine for definitions) is erased in the context of Cayley graphs of bounded degree, in the sense that such graphs are one-sided expanders if and only if they are two-sided expanders (perhaps with slightly different expansion constants). The argument turns out to be an elementary combinatorial one, based on the “pivot” argument discussed in these lecture notes of mine.

Now to the main result of the paper, namely the expansion of random Cayley graphs. This result had previously been established for ${SL_2}$ by Bourgain and Gamburd, and Ben, Emmanuel and I had used the Bourgain-Gamburd method to achieve the same result for Suzuki groups. For the other finite simple groups of Lie type, expander graphs had been constructed by Kassabov, Lubotzky, and Nikolov, but they required more than two generators, which were placed deterministically rather than randomly. (Here, I am skipping over a large number of other results on expanding Cayley graphs; see this survey of Lubotzsky for a fairly recent summary of developments.) The current paper also uses the “Bourgain-Gamburd machine”, as discussed in these lecture notes of mine, to demonstrate expansion. This machine shows how expansion of a Cayley graph follows from three basic ingredients, which we state informally as follows:

• Non-concentration (A random walk in this graph does not concentrate in a proper subgroup);
• Product theorem (A medium-sized subset of this group which is not trapped in a proper subgroup will expand under multiplication); and
• Quasirandomness (The group has no small non-trivial linear representations).

Quasirandomness of arbitrary finite simple groups of Lie type was established many years ago (predating, in fact, the introduction of the term “quasirandomness” by Gowers for this property) by Landazuri-Seitz and Seitz-Zalesskii, and the product theorem was already established by Pyber-Szabo and independently by Breuillard, Green, and myself. So the main problem is to establish non-concentration: that for a random Cayley graph on a finite simple group ${G}$ of Lie type, random walks did not concentrate in proper subgroups.

The first step was to classify the proper subgroups of ${G}$. Fortunately, these are all known; in particular, such groups are either contained in proper algebraic subgroups of the algebraic group containing ${G}$ (or a bounded cover thereof) with bounded complexity, or are else arising (up to conjugacy) from a version ${G(F')}$ of the same group ${G =G(F)}$ associated to a proper subfield ${F'}$ of the field ${F}$ respectively; this follows for instance from the work of Larsen and Pink, but also can be deduced using the classification of finite simple groups, together with some work of Aschbacher, Liebeck-Seitz, and Nori. We refer to the two types of subgroups here as “structural subgroups” and “subfield subgroups”.

To preclude concentration in a structural subgroup, we use our previous result that generic elements of an algebraic group generate a strongly dense subgroup, and so do not concentrate in any algebraic subgroup. To translate this result from the algebraic group setting to the finite group setting, we need a Schwarz-Zippel lemma for finite simple groups of Lie type. This is straightforward for Chevalley groups, but turns out to be a bit trickier for the Steinberg and Suzuki-Ree groups, and we have to go back to the Chevalley-type parameterisation of such groups in terms of (twisted) one-parameter subgroups, that can be found for instance in the text of Carter; this “twisted Schwartz-Zippel lemma” may possibly have further application to analysis on twisted simple groups of Lie type. Unfortunately, the Schwartz-Zippel estimate becomes weaker in twisted settings, and particularly in the case of triality groups ${{}^3 D_4(q)}$, which require a somewhat ad hoc additional treatment that relies on passing to a simpler subgroup present in a triality group, namely a central product of two different ${SL_2}$‘s.

To rule out concentration in a conjugate of a subfield group, we repeat an argument we introduced in our Suzuki paper and pass to a matrix model and analyse the coefficients of the characteristic polynomial of words in this Cayley graph, to prevent them from concentrating in a subfield. (Note that these coefficients are conjugation-invariant.)

Ben Green and I have just uploaded to the arXiv our new paper “On sets defining few ordinary lines“, submitted to Discrete and Computational Geometry. This paper asymptotically solves two old questions concerning finite configurations of points ${P}$ in the plane ${{\mathbb R}^2}$. Given a set ${P}$ of ${n}$ points in the plane, define an ordinary line to be a line containing exactly two points of ${P}$. The classical Sylvester-Gallai theorem, first posed as a problem by Sylvester in 1893, asserts that as long as the points of ${P}$ are not all collinear, ${P}$ defines at least one ordinary line:

It is then natural to pose the question of what is the minimal number of ordinary lines that a set of ${n}$ non-collinear points can generate. In 1940, Melchior gave an elegant proof of the Sylvester-Gallai theorem based on projective duality and Euler’s formula ${V-E+F=2}$, showing that at least three ordinary lines must be created; in 1951, Motzkin showed that there must be ${\gg n^{1/2}}$ ordinary lines. Previously to this paper, the best lower bound was by Csima and Sawyer, who in 1993 showed that there are at least ${6n/13}$ ordinary lines. In the converse direction, if ${n}$ is even, then by considering ${n/2}$ equally spaced points on a circle, and ${n/2}$ points on the line at infinity in equally spaced directions, one can find a configuration of ${n}$ points that define just ${n/2}$ ordinary lines.

As first observed by Böröczky, variants of this example also give few ordinary lines for odd ${n}$, though not quite as few as ${n/2}$; more precisely, when ${n=1 \hbox{ mod } 4}$ one can find a configuration with ${3(n-1)/4}$ ordinary lines, and when ${n = 3 \hbox{ mod } 4}$ one can find a configuration with ${3(n-3)/4}$ ordinary lines. Our first main result is that these configurations are best possible for sufficiently large ${n}$:

Theorem 1 (Dirac-Motzkin conjecture) If ${n}$ is sufficiently large, then any set of ${n}$ non-collinear points in the plane will define at least ${\lfloor n/2\rfloor}$ ordinary lines. Furthermore, if ${n}$ is odd, at least ${3\lfloor n/4\rfloor}$ ordinary lines must be created.

The Dirac-Motzkin conjecture asserts that the first part of this theorem in fact holds for all ${n}$, not just for sufficiently large ${n}$; in principle, our theorem reduces that conjecture to a finite verification, although our bound for “sufficiently large” is far too poor to actually make this feasible (it is of double exponential type). (There are two known configurations for which one has ${(n-1)/2}$ ordinary lines, one with ${n=7}$ (discovered by Kelly and Moser), and one with ${n=13}$ (discovered by Crowe and McKee).)

Our second main result concerns not the ordinary lines, but rather the ${3}$-rich lines of an ${n}$-point set – a line that meets exactly three points of that set. A simple double counting argument (counting pairs of distinct points in the set in two different ways) shows that there are at most

$\displaystyle \binom{n}{2} / \binom{3}{2} = \frac{1}{6} n^2 - \frac{1}{6} n$

${3}$-rich lines. On the other hand, on an elliptic curve, three distinct points P,Q,R on that curve are collinear precisely when they sum to zero with respect to the group law on that curve. Thus (as observed first by Sylvester in 1868), any finite subgroup of an elliptic curve (of which one can produce numerous examples, as elliptic curves in ${{\mathbb R}^2}$ have the group structure of either ${{\mathbb R}/{\mathbb Z}}$ or ${{\mathbb R}/{\mathbb Z} \times ({\mathbb Z}/2{\mathbb Z})}$) can provide examples of ${n}$-point sets with a large number of ${3}$-rich lines (${\lfloor \frac{1}{6} n^2 - \frac{1}{2} n + 1\rfloor}$, to be precise). One can also shift such a finite subgroup by a third root of unity and obtain a similar example with only one fewer ${3}$-rich line. Sylvester then formally posed the question of determining whether this was best possible.

This problem was known as the Orchard planting problem, and was given a more poetic formulation as such by Jackson in 1821 (nearly fifty years prior to Sylvester!):

Our second main result answers this problem affirmatively in the large ${n}$ case:

Theorem 2 (Orchard planting problem) If ${n}$ is sufficiently large, then any set of ${n}$ points in the plane will determine at most ${\lfloor \frac{1}{6} n^2 - \frac{1}{2} n + 1\rfloor}$ ${3}$-rich lines.

Again, our threshold for “sufficiently large” for this ${n}$ is extremely large (though slightly less large than in the previous theorem), and so a full solution of the problem, while in principle reduced to a finitary computation, remains infeasible at present.

Our results also classify the extremisers (and near extremisers) for both of these problems; basically, the known examples mentioned previously are (up to projective transformation) the only extremisers when ${n}$ is sufficiently large.

Our proof strategy follows the “inverse theorem method” from additive combinatorics. Namely, rather than try to prove direct theorems such as lower bounds on the number of ordinary lines, or upper bounds on the number of ${3}$-rich lines, we instead try to prove inverse theorems (also known as structure theorems), in which one attempts a structural classification of all configurations with very few ordinary lines (or very many ${3}$-rich lines). In principle, once one has a sufficiently explicit structural description of these sets, one simply has to compute the precise number of ordinary lines or ${3}$-rich lines in each configuration in the list provided by that structural description in order to obtain results such as the two theorems above.

Note from double counting that sets with many ${3}$-rich lines will necessarily have few ordinary lines. Indeed, if we let ${N_k}$ denote the set of lines that meet exactly ${k}$ points of an ${n}$-point configuration, so that ${N_3}$ is the number of ${3}$-rich lines and ${N_2}$ is the number of ordinary lines, then we have the double counting identity

$\displaystyle \sum_{k=2}^n \binom{k}{2} N_k = \binom{n}{2}$

which among other things implies that any counterexample to the orchard problem can have at most ${n+O(1)}$ ordinary lines. In particular, any structural theorem that lets us understand configurations with ${O(n)}$ ordinary lines will, in principle, allow us to obtain results such as the above two theorems.

As it turns out, we do eventually obtain a structure theorem that is strong enough to achieve these aims, but it is difficult to prove this theorem directly. Instead we proceed more iteratively, beginning with a “cheap” structure theorem that is relatively easy to prove but provides only a partial amount of control on the configurations with ${O(n)}$ ordinary lines. One then builds upon that theorem with additional arguments to obtain an “intermediate” structure theorem that gives better control, then a “weak” structure theorem that gives even more control, a “strong” structure theorem that gives yet more control, and then finally a “very strong” structure theorem that gives an almost complete description of the configurations (but only in the asymptotic regime when ${n}$ is very large). It turns out that the “weak” theorem is enough for the orchard planting problem, and the “strong” version is enough for the Dirac-Motzkin conjecture. (So the “very strong” structure theorem ends up being unnecessary for the two applications given, but may be of interest for other applications.) Note that the stronger theorems do not completely supercede the weaker ones, because the quantitative bounds in the theorems get progressively worse as the control gets stronger.

Before we state these structure theorems, note that all the examples mentioned previously of sets with few ordinary lines involved cubic curves: either irreducible examples such as elliptic curves, or reducible examples such as the union of a circle (or more generally, a conic section) and a line. (We also allow singular cubic curves, such as the union of a conic section and a tangent line, or a singular irreducible curve such as ${\{ (x,y): y^2 = x^3 \}}$.) This turns out to be no coincidence; cubic curves happen to be very good at providing many ${3}$-rich lines (and thus, few ordinary lines), and conversely it turns out that they are essentially the only way to produce such lines. This can already be evidenced by our cheap structure theorem:

Theorem 3 (Cheap structure theorem) Let ${P}$ be a configuration of ${n}$ points with at most ${{}Kn}$ ordinary lines for some ${K \geq 1}$. Then ${P}$ can be covered by at most ${500K}$ cubic curves.

This theorem is already a non-trivial amount of control on sets with few ordinary lines, but because the result does not specify the nature of these curves, and how they interact with each other, it does not seem to be directly useful for applications. The intermediate structure theorem given below gives a more precise amount of control on these curves (essentially guaranteeing that all but at most one of the curve components are lines):

Theorem 4 (Intermediate structure theorem) Let ${P}$ be a configuration of ${n}$ points with at most ${{}Kn}$ ordinary lines for some ${K \geq 1}$. Then one of the following is true:

1. ${P}$ lies on the union of an irreducible cubic curve and an additional ${O(K^{O(1)})}$ points.
2. ${P}$ lies on the union of an irreducible conic section and an additional ${O(K^{O(1)})}$ lines, with ${n/2 + O(K^{O(1)})}$ of the points on ${P}$ in either of the two components.
3. ${P}$ lies on the union of ${O(K)}$ lines and an additional ${O(K^{O(1)})}$ points.

By some additional arguments (including a very nice argument supplied to us by Luke Alexander Betts, an undergraduate at Cambridge, which replaces a much more complicated (and weaker) argument we originally had for this paper), one can cut down the number of lines in the above theorem to just one, giving a more useful structure theorem, at least when ${n}$ is large:

Theorem 5 (Weak structure theorem) Let ${P}$ be a configuration of ${n}$ points with at most ${{}Kn}$ ordinary lines for some ${K \geq 1}$. Assume that ${n \geq \exp(\exp(CK^C))}$ for some sufficiently large absolute constant ${C}$. Then one of the following is true:

1. ${P}$ lies on the union of an irreducible cubic curve and an additional ${O(K^{O(1)})}$ points.
2. ${P}$ lies on the union of an irreducible conic section, a line, and an additional ${O(K^{O(1)})}$ points, with ${n/2 + O(K^{O(1)})}$ of the points on ${P}$ in either of the first two components.
3. ${P}$ lies on the union of a single line and an additional ${O(K^{O(1)})}$ points.

As mentioned earlier, this theorem is already strong enough to resolve the orchard planting problem for large ${n}$. The presence of the double exponential here is extremely annoying, and is the main reason why the final thresholds for “sufficiently large” in our results are excessively large, but our methods seem to be unable to eliminate these exponentials from our bounds (though they can fortunately be confined to a lower bound for ${n}$, keeping the other bounds in the theorem polynomial in ${K}$).

For the Dirac-Motzkin conjecture one needs more precise control on the portion of ${P}$ on the various low-degree curves indicated. This is given by the following result:

Theorem 6 (Strong structure theorem) Let ${P}$ be a configuration of ${n}$ points with at most ${{}Kn}$ ordinary lines for some ${K \geq 1}$. Assume that ${n \geq \exp(\exp(CK^C))}$ for some sufficiently large absolute constant ${C}$. Then, after adding or deleting ${O(K^{O(1)})}$ points from ${P}$ if necessary (modifying ${n}$ appropriately), and then applying a projective transformation, one of the following is true:

1. ${P}$ is a finite subgroup of an elliptic curve (EDIT: as pointed out in comments, one also needs to allow for finite subgroups of acnodal singular cubic curves), possibly shifted by a third root of unity.
2. ${P}$ is the Borozcky example mentioned previously (the union of ${n/2}$ equally spaced points on the circle, and ${n/2}$ points on the line at infinity).
3. ${P}$ lies on a single line.

By applying a final “cleanup” we can replace the ${O(K^{O(1)})}$ in the above theorem with the optimal ${O(K)}$, which is our “very strong” structure theorem. But the strong structure theorem is already sufficient to establish the Dirac-Motzkin conjecture for large ${n}$.

There are many tools that go into proving these theorems, some of which are extremely classical (with at least one going back to the ancient Greeks), and others being more recent. I will discuss some (not all) of these tools below the fold, and more specifically:

1. Melchior’s argument, based on projective duality and Euler’s formula, initially used to prove the Sylvester-Gallai theorem;
2. Chasles’ version of the Cayley-Bacharach theorem, which can convert dual triangular grids (produced by Melchior’s argument) into cubic curves that meet many points of the original configuration ${P}$);
3. Menelaus’s theorem, which is useful for producing ordinary lines when the point configuration lies on a few non-concurrent lines, particularly when combined with a sum-product estimate of Elekes, Nathanson, and Ruzsa;
4. Betts’ argument, that produces ordinary lines when the point configuration lies on a few concurrent lines;
5. A result of Poonen and Rubinstein that any point not on the origin or unit circle can lie on at most seven chords connecting roots of unity; this, together with a variant for elliptic curves, gives the very strong structure theorem, and is also (a strong version of) what is needed to finish off the Dirac-Motzkin and orchard planting problems from the structure theorems given above.

There are also a number of more standard tools from arithmetic combinatorics (e.g. a version of the Balog-Szemeredi-Gowers lemma) which are needed to tie things together at various junctures, but I won’t say more about these methods here as they are (by now) relatively routine.

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity ${r_4(F^n)}$ for a vector space ${F^n}$ over a finite field ${F}$ of characteristic greater than ${4}$, where ${r_4(F^n)}$ is defined as the cardinality of the largest subset of ${F^n}$ that does not contain an arithmetic progression of length ${4}$. In our earlier paper, we gave two arguments that bounded ${r_4(F^n)}$ in the regime when the field ${F}$ was fixed and ${n}$ was large. The first “cheap” argument gave the bound

$\displaystyle r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )$

and the more complicated “expensive” argument gave the improvement

$\displaystyle r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)$

for some constant ${c>0}$ depending only on ${F}$.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset ${A}$ of ${F^n}$ that has no arithmetic progressions of length ${4}$, and seeks to locate a subspace on which ${A}$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse ${U^3}$ theorem of Ben and myself (and also independently by Samorodnitsky), one approximates ${A}$ by a “quadratically structured” function ${f}$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because ${A}$ has no progressions of length ${4}$, the count of progressions of length ${4}$ weighted by ${f}$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which ${f}$ has increased density.

The error in the paper was to conclude from this that the original function ${1_A}$ also had increased density on the same subspace; it turns out that the manner in which ${f}$ approximates ${1_A}$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on ${r_4(F^n)}$ one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of ${c = 2^{-22}}$ for the exponent ${c}$ in (1). In fact, it gives the following stronger result:

Theorem 1 Let ${A}$ be a subset of ${F^n}$ of density at least ${\alpha}$, and let ${\epsilon>0}$. Then there is a subspace ${W}$ of ${F^n}$ of codimension ${O( \epsilon^{-2^{20}})}$ such that the number of (possibly degenerate) progressions ${a, a+r, a+2r, a+3r}$ in ${A \cap W}$ is at least ${(\alpha^4-\epsilon)|W|^2}$.

The bound (1) is an easy consequence of this theorem after choosing ${\epsilon := \alpha^4/2}$ and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to ${1_A}$ with a significantly stronger local approximation to ${1_A}$ on a subspace ${W}$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse ${U^3}$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace ${W}$ on which ${A}$ is quite dense (of density ${\alpha-O(\epsilon)}$) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune.  This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:

Freiman’s lemma.  Let A be a finite subset of a Euclidean space with $|A+A| \leq K|A|$.  Then A is contained in an affine subspace of dimension at most ${}\lfloor K-1 \rfloor$.

This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression.  The advantage here is that the bound on the dimension is extremely explicit.

Our main result is

Theorem.  Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and $A \cdot A$ can be covered by up to K left translates of A.  Then A can be covered by at most $K^{2+29K^9}$ left-translates of a closed connected Lie subgroup of dimension at most $K^9$.

We remark that our previous paper established a similar result, in which the dimension bound was improved to $6\log_2 K$, but at the cost of worsening the covering number to $O_K(1)$, and with a much more complicated proof  (91 pages instead of 8). Furthermore, the bound on $O_K(1)$ is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything).  Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite.  A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.

To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:

Gleason Lemma (special case).  Let $A$ be a finite symmetric subset of a Euclidean space, and let $0 = H_0 \subset H_1 \subset \ldots \subset H_k$ be a sequence of subspaces in this space, such that the sets $2A \cap H_i$ are strictly increasing in i for $i=0,\ldots,k$.  Then $|5A| \geq k|A|$, where $5A = A+A+A+A+A$.

Proof.    By hypothesis, for each $i=1,\ldots,k$, the projection $B_i$ of $2A \cap H_i$ to $H_i / H_{i-1}$ is non-trivial, finite, and symmetric.   In particular, since the vector space $H_i/H_{i-1}$ is torsion-free, $B_i+B_i$ is strictly larger than $B_i$.  Equivalently, one can find $a_i$ in $(2A \cap H_i) + (2A \cap H_i)$ that does not lie in $(2A \cap H_i) + H_{i-1}$; in particular, $a_i \in 4A \cap H_i$ and $a_i+A$ is disjoint from $H_{i-1}+A$.  As a consequence, the $a_i+A$ are disjoint and lie in 5A, whence the claim. $\Box$

Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most $K^4$, which is a weak version of Freiman’s lemma.

To extend the argument to the nilpotent setting we use the following idea.  Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series $G = G_1 \geq G_2 \geq G_3 \geq \ldots$ of G, and considering the last intersection $H \cap G_k$ which is non-trivial, one obtains the claim.  It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A.  Passing to this large fraction and quotienting out the central element, we obtain a new approximate group.    If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done.  If, however, the process continues, then by using some Lie group theory, one can find a long sequence $H_0 \leq H_1 \leq \ldots \leq H_k$ of connected Lie subgroups of G, such that the sets $A^2 \cap H_i$ are strictly increasing in i.   Using some Lie group theory and the hypotheses on G, one can deduce that the group $\langle A^2 \cap H_{i+1}\rangle$ generated by $A^2 \cap H_{i+1}$ is much larger than $\langle A^2 \cap H_i \rangle$, in the sense that the latter group has infinite index in the former.   It then turns out that the Gleason argument mentioned above can be adapted to this setting.

Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “The structure of approximate groups“, submitted to Pub. IHES. We had announced the main results of this paper in various forums (including this blog) for a few months now, but it had taken some time to fully write up the paper and put in various refinements and applications.

As announced previously, the main result of this paper is what is a (virtually, qualitatively) complete description of finite approximate groups in an arbitrary (local or global) group ${G}$. For simplicity let us work in the much more familiar setting of global groups, although our results also apply (but are a bit more technical to state) in the local group setting.

Recall that in a global group ${G = (G,\cdot)}$, a ${K}$-approximate group is a symmetric subset ${A}$ of ${G}$ containing the origin, with the property that the product set ${A \cdot A}$ is covered by ${K}$ left-translates of ${A}$. Examples of ${O(1)}$-approximate groups include genuine groups, convex bodies in a bounded dimensional vector space, small balls in a bounded dimensional Lie group, large balls in a discrete nilpotent group of bounded rank or step, or generalised arithmetic progressions (or more generally, coset progressions) of bounded rank in an abelian group. Specialising now to finite approximate groups, a key example of such a group is what we call a coset nilprogression: a set of the form ${\pi^{-1}(P)}$, where ${\pi: G' \rightarrow N}$ is a homomorphism with finite kernel from a subgroup ${G'}$ of ${G}$ to a nilpotent group ${N}$ of bounded step, and ${P = P(u_1,\ldots,u_r;N_1,\ldots,N_r)}$ is a nilprogression with a bounded number of generators ${u_1,\ldots,u_r}$ in ${N}$ and some lengths ${N_1,\ldots,N_r \gg 1}$, where ${P(u_1,\ldots,u_r;N_1,\ldots,N_r)}$ consists of all the words involving at most ${N_1}$ copies of ${u_1^{\pm 1}}$, ${N_2}$ copies of ${u_2^{\pm 1}}$, and so forth up to ${N_r}$ copies of ${u_r^{\pm 1}}$. One can show (by some nilpotent algebra) that all such coset nilprogressions are ${O(1)}$-approximate groups so long as the step and the rank ${r}$ are bounded (and if ${N_1,\ldots,N_r}$ are sufficiently large).

Our main theorem (which was essentially conjectured independently by Helfgott and by Lindenstrauss) asserts, roughly speaking, that coset nilprogressions are essentially the only examples of approximate groups.

Theorem 1 Let ${A}$ be a ${K}$-approximate group. Then ${A^4}$ contains a coset nilprogression ${P}$ of rank and step ${O_K(1)}$, such that ${A}$ can be covered by ${O_K(1)}$ left-translates of ${P}$.

In the torsion-free abelian case, this result is essentially Freiman’s theorem (with an alternate proof by Ruzsa); for general abelian case, it is due to Green and Ruzsa. Various partial results in this direction for some other groups (e.g. free groups, nilpotent groups, solvable groups, or simple groups of Lie type) are also known; see these previous blog posts for a summary of several of these results.

This result has a number of applications to geometric growth theory, and in particular to variants of Gromov’s theorem of groups of polynomial growth, which asserts that a finitely generated group is of polynomial growth if and only if it is virtually nilpotent. The connection lies in the fact that if the balls ${B_S(R)}$ associated to a finite set of generators ${S}$ has polynomial growth, then some simple volume-packing arguments combined with the pigeonhole principle will show that ${B_S(R)}$ will end up being a ${O(1)}$-approximate group for many radii ${R}$. In fact, since our theorem only needs a single approximate group to obtain virtually nilpotent structure, we are able to obtain some new strengthenings of Gromov’s theorem. For instance, if ${A}$ is any ${K}$-approximate group in a finitely generated group ${G}$ that contains ${B_S(R_0)}$ for some set of generators ${S}$ and some ${R_0}$ that is sufficiently large depending on ${K}$, our theorem implies that ${G}$ is virtually nilpotent, answering a question of Petrunin. Among other things, this gives an alternate proof of a recent result of Kapovitch and Wilking (see also this previous paper of Cheeger and Colding) that a compact manifold of bounded diameter and Ricci curvature at least ${-\epsilon}$ necessarily has a virtually nilpotent fundamental group if ${\epsilon}$ is sufficiently small (depending only on dimension). The main point here is that no lower bound on the injectivity radius is required. Another application is a “Margulis-type lemma”, which asserts that if a metric space ${X}$ has “bounded packing” (in the sense that any ball of radius (say) ${4}$ is covered by a bounded number of balls of radius ${1}$), and ${\Gamma}$ is a group of isometries on ${X}$ that acts discretely (i.e. every orbit has only finitely many elements (counting multiplicity) in each bounded set), then the near-stabiliser ${\{ \gamma \in \Gamma: d(\gamma x, x) \leq \epsilon \}}$ of a point ${x}$ is virtually nilpotent if ${\epsilon}$ is small enough depending on the packing constant.

There are also some variants and refinements to the main theorem proved in the paper, such as an extension to local groups, and also an improvement on the bound on the rank and step from ${O_K(1)}$ to ${O(\log K)}$ (but at the cost of replacing ${A^4}$ in the theorem with ${A^{O(1)}}$).

I’ll be discussing the proof of the main theorem in detail in the next few lecture notes of my current graduate course. The full proof is somewhat lengthy (occupying about 50 pages of the 90-page paper), but can be summarised in the following steps:

1. (Hrushovski) Take an arbitrary sequence ${A_n}$ of finite ${K}$-approximate groups, and show that an appropriate limit ${A}$ of such groups can be “modeled” in some sense by an open bounded subset of a locally compact group. (The precise definition of “model” is technical, but “macroscopically faithful representation” is a good first approximation.) As discussed in the previous lecture notes, we use an ultralimit for this purpose; the paper of Hrushovski where this strategy was first employed also considered more sophisticated model-theoretic limits. To build a locally compact topology, Hrushovski used some tools from definability theory; in our paper, we instead use a combinatorial lemma of Sanders (closely related to a similar result of Croot and Sisask.)
2. (Gleason-Yamabe) The locally compact group can in turn be “modeled” by a Lie group (possibly after shrinking the group, and thus the ultralimit ${A}$, slightly). (This result arose from the solution to Hilbert’s fifth problem, as discussed here. For our extension to local groups, we use a recent local version of the Gleason-Yamabe theorem, due to Goldbring.)
3. (Gleason) Using the escape properties of the Lie model, construct a norm ${\| \|}$ (and thus a left-invariant metric ${d}$) on the ultralimit approximate group ${A}$ (and also on the finitary groups ${A_n}$) that obeys a number of good properties, such as a commutator estimate ${\| [g,h]\| \ll \|g\| \|h\|}$. (This is modeled on an analogous construction used in the theory of Hilbert’s fifth problem, as discussed in this previous set of lecture notes.) This norm is essentially an escape norm associated to (a slight modification) of ${A}$ or ${A_n}$.
4. (Jordan-Bieberbach-Frobenius) We now take advantage of the finite nature of the ${A_n}$ by locating the non-trivial element ${e}$ of ${A_n}$ with minimal escape norm (but one has to first quotient out the elements of zero escape norm first). The commutator estimate mentioned previously ensures that this element is essentially “central” in ${A_n}$. One can then quotient out a progression ${P(e;N)}$ generated by this central element (reducing the dimension of the Lie model by one in the process) and iterates the process until the dimension of the model drops to zero. Reversing the process, this constructs a coset nilprogression inside ${A_n^4}$. This argument is based on the classic proof of Jordan’s theorem due to Bieberbach and Frobenius, as discussed in this blog post.

One quirk of the argument is that it requires one to work in the category of local groups rather than global groups. (This is somewhat analogous to how, in the standard proofs of Freiman’s theorem, one needs to work with the category of Freiman homomorphisms, rather than group homomorphisms.) The reason for this arises when performing the quotienting step in the Jordan-Bieberbach-Frobenius leg of the argument. The obvious way to perform this step (and the thing that we tried first) would be to quotient out by the entire cyclic group ${\langle e \rangle}$ generated by the element ${e}$ of minimal escape norm. However, it turns out that this doesn’t work too well, because the group quotiented out is so “large” that it can create a lot of torsion in the quotient. In particular, elements which used to have positive escape norm, can now become trapped in the quotient of ${A_n}$, thus sending their escape norm to zero. This leads to an inferior conclusion (in which a coset nilprogression is replaced by a more complicated tower of alternating extensions between central progressions and finite groups, similar to the towers encountered in my previous paper on this topic). To prevent this unwanted creation of torsion, one has to truncate the cyclic group ${\langle e \rangle}$ before it escapes ${A_n}$, so that one quotients out by a geometric progression ${P(e;N)}$ rather than the cyclic group. But the operation of quotienting out by a ${P(e;N)}$, which is a local group rather than a global one, cannot be formalised in the category of global groups, but only in the category of local groups. Because of this, we were forced to carry out the entire argument using the language of local groups. As it turns out, the arguments are ultimately more natural in this setting, although there is an initial investment of notation required, given that global group theory is much more familiar and well-developed than local group theory.

One interesting feature of the argument is that it does not use much of the existing theory of Freiman-type theorems, instead building the coset nilprogression directly from the geometric properties of the approximate group. In particular, our argument gives a new proof of Freiman’s theorem in the abelian case, which largely avoids Fourier analysis (except through the use of the theory of Hilbert’s fifth problem, which uses the Peter-Weyl theorem (or, in the abelian case, Pontryagin duality), which is basically a version of Fourier analysis).

Last year, Emmanuel Breuillard, Ben Green, Bob Guralnick, and I wrote a paper entitled “Strongly dense free subgroups of semisimple Lie groups“. The main theorem in that paper asserted that given any semisimple algebraic group ${G(k)}$ over an uncountable algebraically closed field ${k}$, there existed a free subgroup ${\Gamma}$ which was strongly dense in the sense that any non-abelian subgroup of ${\Gamma}$ was Zariski dense in ${G(k)}$. This type of result is useful for establishing expansion in finite simple groups of Lie type, as we will discuss in a subsequent paper.

An essentially equivalent formulation of the main result is that if ${w_1, w_2 \in F_2}$ are two non-commuting elements of the free group ${F_2}$ on two generators, and ${(a, b)}$ is a generic pair of elements in ${G(k) \times G(k)}$, then ${w_1(a,b)}$ and ${w_2(a,b)}$ are not contained in any proper closed algebraic subgroup ${H}$ of ${G(k)}$. Here, “generic” means “outside of at most countably many proper subvarieties”. In most cases, one expects that if ${(a, b)}$ are generically drawn from ${G(k) \times G(k)}$, then ${(w_1(a,b), w_2(a,b))}$ will also be generically drawn from ${G(k) \times G(k)}$, but this is not always the case, which is a key source of difficulty in the paper. For instance, if ${w_2}$ is conjugate to ${w_1}$ in ${F_2}$, then ${w_1(a,b)}$ and ${w_2(a,b)}$ must be conjugate in ${G(k)}$ and so the pair ${(w_1(a,b), w_2(a,b))}$ lie in a proper subvariety of ${G(k) \times G(k)}$. It is currently an open question to determine all the pairs ${w_1, w_2}$ of words for which ${(w_1(a,b), w_2(a,b))}$ is not generic for generic ${a,b}$ (or equivalently, the double word map ${(a,b) \mapsto (w_1(a,b),w_2(a,b))}$ is not dominant).

The main strategy of proof was as follows. It is not difficult to reduce to the case when ${G}$ is simple. Suppose for contradiction that we could find two non-commuting words ${w_1, w_2}$ such that ${w_1(a,b), w_2(a,b)}$ were generically trapped in a proper closed algebraic subgroup. As it turns out, there are only finitely many conjugacy classes of such groups, and so one can assume that ${w_1(a,b), w_2(a,b)}$ were generically trapped in a conjugate ${H^g}$ of a fixed proper closed algebraic subgroup ${H}$. One can show that ${w_1(a,b)}$, ${w_2(a,b)}$, and ${[w_1(a,b),w_2(a,b)]}$ are generically regular semisimple, which implies that ${H}$ is a maximal rank semisimple subgroup. The key step was then to find another proper semisimple subgroup ${H'}$ of ${G}$ which was not a degeneration of ${H}$, by which we mean that there did not exist a pair ${(x,y)}$ in the Zariski closure ${\overline{\bigcup_{g \in G} H^g \times H^g}}$ of the products of conjugates of ${H}$, such that ${x, y}$ generated a Zariski-dense subgroup of ${H'}$. This is enough to establish the theorem, because we could use an induction hypothesis to find ${a,b}$ in ${H'}$ (and hence in ${G(k)}$ such that ${w_1(a,b), w_2(a,b)}$ generated a Zariski-dense subgroup of ${H'}$, which contradicts the hypothesis that ${(w_1(a,b),w_2(a,b))}$ was trapped in ${\bigcup_{g \in G} H^g \times H^g}$ for generic ${(a,b)}$ (and hence in ${\overline{\bigcup_{g \in G} H^g \times H^g}}$ for all ${(a,b)}$.

To illustrate the concept of a degeneration, take ${G(k) = SO(5)}$ and let ${H = SO(3) \times SO(2)}$ be the stabiliser of a non-degenerate ${2}$-space in ${k^5}$. All other stabilisers of non-degenerate ${2}$-spaces are conjugate to ${H}$. However, stabilisers of degenerate ${2}$-spaces are not conjugate to ${H}$, but are still degenerations of ${H}$. For instance, the stabiliser of a totally singular ${2}$-space (which is isomorphic to the affine group on ${k^2}$, extended by ${k}$) is a degeneration of ${H}$.

A significant portion of the paper was then devoted to verifying that for each simple algebraic group ${G}$, and each maximal rank proper semisimple subgroup ${H}$ of ${G}$, one could find another proper semisimple subgroup ${H'}$ which was not a degeneration of ${H}$; roughly speaking, this means that ${H'}$ is so “different” from ${H}$ that no conjugate of ${H}$ can come close to covering ${H'}$. This required using the standard classification of algebraic groups via Dynkin diagrams, and knowledge of the various semisimple subgroups of these groups and their representations (as we used the latter as obstructions to degeneration, for instance one can show that a reducible representation cannot degenerate to an irreducible one).

During the refereeing process for this paper, we discovered that there was precisely one family of simple algebraic groups for which this strategy did not actually work, namely the group ${G = Sp(4) = Spin(5)}$ (or the group ${SO(5)}$ that is double-covered by this group) in characteristic ${3}$. This group (which has Dynkin diagram ${B_2=C_2}$, as discussed in this previous post) has one maximal rank proper semisimple subgroup up to conjugacy, namely ${SO(4)}$, which is the stabiliser of a line in ${k^5}$. To find a proper semisimple group ${H'}$ that is not a degeneration of this group, we basically need to find a subgroup ${H'}$ that does not stabilise any line in ${k^5}$. In characteristic larger than three (or characteristic zero), one can proceed by using the action of ${SL_2(k)}$ on the five-dimensional space ${\hbox{Sym}^4(k^2)}$ of homogeneous degree four polynomials on ${k^2}$, which preserves a non-degenerate symmetric form (the four-fold tensor power of the area form on ${k^2}$) and thus embeds into ${SO(5)}$; as no polynomial is fixed by all of ${SL_2(k)}$, we see that this copy of ${SL_2(k)}$ is not a degeneration of ${H}$.

Unfortunately, in characteristics two and three, the symmetric form on ${\hbox{Sym}^4(k^2)}$ degenerates, and this embedding is lost. In the characteristic two case, one can proceed by using the characteristic ${2}$ fact that ${SO(5)}$ is isomorphic to ${Sp(4)}$ (because in characteristic two, the space of null vectors is a hyperplane, and the symmetric form becomes symplectic on this hyperplane), and thus has an additional maximal rank proper semisimple subgroup ${Sp(2) \times Sp(2)}$ which is not conjugate to the ${SO(4)}$ subgroup. But in characteristic three, it turns out that there are no further semisimple subgroups of ${SO(5)}$ that are not already contained in a conjugate of the ${SO(4)}$. (This is not a difficulty for larger groups such as ${SO(6)}$ or ${SO(7)}$, where there are plenty of other semisimple groups to utilise; it is only this smallish group ${SO(5)}$ that has the misfortune of having exactly one maximal rank proper semisimple group to play with, and not enough other semisimples lying around in characteristic three.)

As a consequence of this issue, our argument does not actually work in the case when the characteristic is three and the semisimple group ${G}$ contains a copy of ${SO(5)}$ (or ${Sp(4)}$), and we have had to modify our paper to delete this case from our results. We believe that such groups still do contain strongly dense free subgroups, but this appears to be just out of reach of our current method.

One thing that this experience has taught me is that algebraic groups behave somewhat pathologically in low characteristic; in particular, intuition coming from the characteristic zero case can become unreliable in characteristic two or three.