You are currently browsing the tag archive for the ‘Gowers uniformity norm’ tag.

Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function ${\mu}$, the von Mangoldt function ${\Lambda}$, and the generalised divisor functions ${d_k}$) in short intervals ${(X,X+H]}$, where ${X}$ is large and ${H}$ lies in the range ${X^{\theta+\varepsilon} \leq H \leq X^{1-\varepsilon}}$ for a fixed constant ${0 < \theta < 1}$ (that one would like to be as small as possible). If we let ${f}$ denote one of the functions ${\mu, \Lambda, d_k}$, then there is extensive literature on the estimation of short sums

$\displaystyle \sum_{X < n \leq X+H} f(n)$

and some literature also on the estimation of exponential sums such as

$\displaystyle \sum_{X < n \leq X+H} f(n) e(-\alpha n)$

for a real frequency ${\alpha}$, where ${e(\theta) := e^{2\pi i \theta}}$. For applications in the additive combinatorics of such functions ${f}$, it is also necessary to consider more general correlations, such as polynomial correlations

$\displaystyle \sum_{X < n \leq X+H} f(n) e(-P(n))$

where ${P: {\bf Z} \rightarrow {\bf R}}$ is a polynomial of some fixed degree, or more generally

$\displaystyle \sum_{X < n \leq X+H} f(n) \overline{F}(g(n) \Gamma)$

where ${G/\Gamma}$ is a nilmanifold of fixed degree and dimension (and with some control on structure constants), ${g: {\bf Z} \rightarrow G}$ is a polynomial map, and ${F: G/\Gamma \rightarrow {\bf C}}$ is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of ${f}$ (possibly after subtracting off some renormalising factor) on such short intervals ${(X,X+H]}$, which can in turn be used to control other multilinear correlations involving such functions.

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as ${\sum_{X < n \leq X+H} \Lambda(n) e(-\alpha n)}$ would be approximated in terms of a main term that vanished (or is negligible) if ${\alpha}$ is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if ${\alpha}$ was “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant ${f^\sharp}$ from each of the arithmetic functions ${f}$ and then getting upper bounds on remainder correlations such as

$\displaystyle |\sum_{X < n \leq X+H} (f(n)-f^\sharp(n)) \overline{F}(g(n) \Gamma)| \ \ \ \ \ (1)$

(actually for technical reasons we also allow the ${n}$ variable to be restricted further to a subprogression of ${(X,X+H]}$, but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

• For the Möbius function ${\mu}$, we simply set ${\mu^\sharp = 0}$, as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
• For the von Mangoldt function ${\Lambda}$, we eventually went with the Cramér-Granville approximant ${\Lambda^\sharp(n) = \frac{W}{\phi(W)} 1_{(n,W)=1}}$, where ${W = \prod_{p < R} p}$ and ${R = \exp(\log^{1/10} X)}$.
• For the divisor functions ${d_k}$, we used a somewhat complicated-looking approximant ${d_k^\sharp(n) = \sum_{m \leq X^{\frac{k-1}{5k}}} P_m(\log n)}$ for some explicit polynomials ${P_m}$, chosen so that ${d_k^\sharp}$ and ${d_k}$ have almost exactly the same sums along arithmetic progressions (see the paper for details).

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For ${\mu}$ and ${\Lambda}$, the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of ${O_A(\log^{-A} X)}$ over the trivial bound for any ${A>0}$; for ${d_k}$, the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of ${X^{-c_k}}$ over the trivial bound for some ${c_k>0}$. In the case of the Möbius function ${\mu}$, there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent ${\theta}$ somewhat at the cost of only obtaining “weakly logarithmic savings” of shape ${\log^{-c} X}$ for some small ${c>0}$.

Our main estimates on sums of the form (1) work in the following ranges:

• For ${\theta=5/8}$, one can obtain strongly logarithmic savings on (1) for ${f=\mu,\Lambda}$, and power savings for ${f=d_k}$.
• For ${\theta=3/5}$, one can obtain weakly logarithmic savings for ${f = \mu, d_k}$.
• For ${\theta=5/9}$, one can obtain power savings for ${f=d_3}$.
• For ${\theta=1/3}$, one can obtain power savings for ${f=d_2}$.

Conjecturally, one should be able to obtain power savings in all cases, and lower ${\theta}$ down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The ${\theta=5/8}$ result for correlation against Fourier phases ${e(\alpha n)}$ was established previously by Zhan, and the ${\theta=3/5}$ result for such phases and ${f=\mu}$ was established previously by by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

• Direct insertion of our bounds in the recent work of Kanigowski, Lemanczyk, and Radziwill on the prime number theorem on dynamical systems that are analytic skew products gives some improvements in the exponents there.
• We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form ${(X,X+H]}$ rather than ${[1,X]}$.
• We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals ${(X,X+H]}$ rather than long intervals such as ${[1,X]}$.

We now briefly discuss some of the ingredients of proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the ${\theta=3/5}$ result) the Ramaré identity) to decompose ${\mu(n), \Lambda(n), d_k(n)}$ into more tractable sums of the following types:

• Type ${I}$ sums, which are basically of the form ${\sum_{m \leq A:m|n} \alpha(m)}$ for some weights ${\alpha(m)}$ of controlled size and some cutoff ${A}$ that is not too large;
• Type ${II}$ sums, which are basically of the form ${\sum_{A_- \leq m \leq A_+:m|n} \alpha(m)\beta(n/m)}$ for some weights ${\alpha(m)}$, ${\beta(n)}$ of controlled size and some cutoffs ${A_-, A_+}$ that are not too close to ${1}$ or to ${X}$;
• Type ${I_2}$ sums, which are basically of the form ${\sum_{m \leq A:m|n} \alpha(m) d_2(n/m)}$ for some weights ${\alpha(m)}$ of controlled size and some cutoff ${A}$ that is not too large.

The precise ranges of the cutoffs ${A, A_-, A_+}$ depend on the choice of ${\theta}$; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents ${\theta}$ being what they are in our main results.

The Type ${I}$ sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type ${II}$ and Type ${I_2}$ sums.

For the Type ${II}$ sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence ${F(g(n)\Gamma)}$ is basically of the form ${e(P(n))}$, and the “non-abelian” case in which ${G}$ is non-abelian and ${F}$ exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomaki and Shao, which uses Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless ${e(P(n))}$ is “major arc” in the sense that it resembles (or “pretends to be”) ${\chi(n) n^{it}}$ for some Dirichlet character ${\chi}$ and some frequency ${t}$, but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

$\displaystyle (n,m,n',m') \mapsto (g(nm)\Gamma, g(n'm)\Gamma, g(nm') \Gamma, g(n'm'\Gamma))$

as ${n,m,n',m'}$ range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the type ${I_2}$ sum, a model sum to study is

$\displaystyle \sum_{X < n \leq X+H} d_2(n) e(\alpha n)$

which one can expand as

$\displaystyle \sum_{n,m: X < nm \leq X+H} e(\alpha nm).$

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region ${\{ (n,m) \in {\bf N}^2: X < nm \leq X+H \}}$ into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as ${e(\alpha nm)}$ on the majority of these progressions. As it turns out, this strategy works well in the regime ${H > X^{1/3+\varepsilon}}$ unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the ${\theta}$ exponent for our ${d_2}$ result can be as low as ${1/3}$.

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals ${(x,x+H]}$ with ${x}$ in the range ${[X,2X]}$, in which we will be able to lower ${\theta}$ all the way to ${0}$.

Asgar Jamneshan and myself have just uploaded to the arXiv our preprint “The inverse theorem for the ${U^3}$ Gowers uniformity norm on arbitrary finite abelian groups: Fourier-analytic and ergodic approaches“. This paper, which is a companion to another recent paper of ourselves and Or Shalom, studies the inverse theory for the third Gowers uniformity norm

$\displaystyle \| f \|_{U^3(G)}^8 = {\bf E}_{h_1,h_2,h_3,x \in G} \Delta_{h_1} \Delta_{h_2} \Delta_{h_3} f(x)$

on an arbitrary finite abelian group ${G}$, where ${\Delta_h f(x) := f(x+h) \overline{f(x)}}$ is the multiplicative derivative. Our main result is as follows:

Theorem 1 (Inverse theorem for ${U^3(G)}$) Let ${G}$ be a finite abelian group, and let ${f: G \rightarrow {\bf C}}$ be a ${1}$-bounded function with ${\|f\|_{U^3(G)} \geq \eta}$ for some ${0 < \eta \leq 1/2}$. Then:
• (i) (Correlation with locally quadratic phase) There exists a regular Bohr set ${B(S,\rho) \subset G}$ with ${|S| \ll \eta^{-O(1)}}$ and ${\exp(-\eta^{-O(1)}) \ll \rho \leq 1/2}$, a locally quadratic function ${\phi: B(S,\rho) \rightarrow {\bf R}/{\bf Z}}$, and a function ${\xi: G \rightarrow \hat G}$ such that

$\displaystyle {\bf E}_{x \in G} |{\bf E}_{h \in B(S,\rho)} f(x+h) e(-\phi(h)-\xi(x) \cdot h)| \gg \eta^{O(1)}.$

• (ii) (Correlation with nilsequence) There exists an explicit degree two filtered nilmanifold ${H/\Lambda}$ of dimension ${O(\eta^{-O(1)})}$, a polynomial map ${g: G \rightarrow H/\Lambda}$, and a Lipschitz function ${F: H/\Lambda \rightarrow {\bf C}}$ of constant ${O(\exp(\eta^{-O(1)}))}$ such that

$\displaystyle |{\bf E}_{x \in G} f(x) \overline{F}(g(x))| \gg \exp(-\eta^{-O(1)}).$

Such a theorem was proven by Ben Green and myself in the case when ${|G|}$ was odd, and by Samorodnitsky in the ${2}$-torsion case ${G = {\bf F}_2^n}$. In all cases one uses the “higher order Fourier analysis” techniques introduced by Gowers. After some now-standard manipulations (using for instance what is now known as the Balog-Szemerédi-Gowers lemma), one arrives (for arbitrary ${G}$) at an estimate that is roughly of the form

$\displaystyle |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) b(x,k) b(x,h) e(-B(h,k))| \gg \eta^{O(1)}$

where ${b}$ denotes various ${1}$-bounded functions whose exact values are not too important, and ${B: B(S,\rho) \times B(S,\rho) \rightarrow {\bf R}/{\bf Z}}$ is a symmetric locally bilinear form. The idea is then to “integrate” this form by expressing it in the form

$\displaystyle B(h,k) = \phi(h+k) - \phi(h) - \phi(k) \ \ \ \ \ (1)$

for some locally quadratic ${\phi: B(S,\rho) \rightarrow {\bf C}}$; this then allows us to write the above correlation as

$\displaystyle |{\bf E}_{x \in G} {\bf E}_{h,k \in B(S,\rho)} f(x+h+k) e(-\phi(h+k)) b(x,k) b(x,h)| \gg \eta^{O(1)}$

(after adjusting the ${b}$ functions suitably), and one can now conclude part (i) of the above theorem using some linear Fourier analysis. Part (ii) follows by encoding locally quadratic phase functions as nilsequences; for this we adapt an algebraic construction of Manners.

So the key step is to obtain a representation of the form (1), possibly after shrinking the Bohr set ${B(S,\rho)}$ a little if needed. This has been done in the literature in two ways:

• When ${|G|}$ is odd, one has the ability to divide by ${2}$, and on the set ${2 \cdot B(S,\frac{\rho}{10}) = \{ 2x: x \in B(S,\frac{\rho}{10})\}}$ one can establish (1) with ${\phi(h) := B(\frac{1}{2} h, h)}$. (This is similar to how in single variable calculus the function ${x \mapsto \frac{1}{2} x^2}$ is a function whose second derivative is equal to ${1}$.)
• When ${G = {\bf F}_2^n}$, then after a change of basis one can take the Bohr set ${B(S,\rho)}$ to be ${{\bf F}_2^m}$ for some ${m}$, and the bilinear form can be written in coordinates as

$\displaystyle B(h,k) = \sum_{1 \leq i,j \leq m} a_{ij} h_i k_j / 2 \hbox{ mod } 1$

for some ${a_{ij} \in {\bf F}_2}$ with ${a_{ij}=a_{ji}}$. The diagonal terms ${a_{ii}}$ cause a problem, but by subtracting off the rank one form ${(\sum_{i=1}^m a_{ii} h_i) ((\sum_{i=1}^m a_{ii} k_i) / 2}$ one can write

$\displaystyle B(h,k) = \sum_{1 \leq i,j \leq m} b_{ij} h_i k_j / 2 \hbox{ mod } 1$

on the orthogonal complement of ${(a_{11},\dots,a_{mm})}$ for some coefficients ${b_{ij}=b_{ji}}$ which now vanish on the diagonal: ${b_{ii}=0}$. One can now obtain (1) on this complement by taking

$\displaystyle \phi(h) := \sum_{1 \leq i < j \leq m} b_{ij} h_i h_k / 2 \hbox{ mod } 1.$

In our paper we can now treat the case of arbitrary finite abelian groups ${G}$, by means of the following two new ingredients:

• (i) Using some geometry of numbers, we can lift the group ${G}$ to a larger (possibly infinite, but still finitely generated) abelian group ${G_S}$ with a projection map ${\pi: G_S \rightarrow G}$, and find a globally bilinear map ${\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}}$ on the latter group, such that one has a representation

$\displaystyle B(\pi(x), \pi(y)) = \tilde B(x,y) \ \ \ \ \ (2)$

of the locally bilinear form ${B}$ by the globally bilinear form ${\tilde B}$ when ${x,y}$ are close enough to the origin.
• (ii) Using an explicit construction, one can show that every globally bilinear map ${\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}}$ has a representation of the form (1) for some globally quadratic function ${\tilde \phi: G_S \rightarrow {\bf R}/{\bf Z}}$.

To illustrate (i), consider the Bohr set ${B(S,1/10) = \{ x \in {\bf Z}/N{\bf Z}: \|x/N\|_{{\bf R}/{\bf Z}} < 1/10\}}$ in ${G = {\bf Z}/N{\bf Z}}$ (where ${\|\|_{{\bf R}/{\bf Z}}}$ denotes the distance to the nearest integer), and consider a locally bilinear form ${B: B(S,1/10) \times B(S,1/10) \rightarrow {\bf R}/{\bf Z}}$ of the form ${B(x,y) = \alpha x y \hbox{ mod } 1}$ for some real number ${\alpha}$ and all integers ${x,y \in (-N/10,N/10)}$ (which we identify with elements of ${G}$. For generic ${\alpha}$, this form cannot be extended to a globally bilinear form on ${G}$; however if one lifts ${G}$ to the finitely generated abelian group

$\displaystyle G_S := \{ (x,\theta) \in {\bf Z}/N{\bf Z} \times {\bf R}: \theta = x/N \hbox{ mod } 1 \}$

(with projection map ${\pi: (x,\theta) \mapsto x}$) and introduces the globally bilinear form ${\tilde B: G_S \times G_S \rightarrow {\bf R}/{\bf Z}}$ by the formula

$\displaystyle \tilde B((x,\theta),(y,\sigma)) = N^2 \alpha \theta \sigma \hbox{ mod } 1$

then one has (2) when ${\theta,\sigma}$ lie in the interval ${(-1/10,1/10)}$. A similar construction works for higher rank Bohr sets.

To illustrate (ii), the key case turns out to be when ${G_S}$ is a cyclic group ${{\bf Z}/N{\bf Z}}$, in which case ${\tilde B}$ will take the form

$\displaystyle \tilde B(x,y) = \frac{axy}{N} \hbox{ mod } 1$

for some integer ${a}$. One can then check by direct construction that (1) will be obeyed with

$\displaystyle \tilde \phi(x) = \frac{a \binom{x}{2}}{N} - \frac{a x \binom{N}{2}}{N^2} \hbox{ mod } 1$

regardless of whether ${N}$ is even or odd. A variant of this construction also works for ${{\bf Z}}$, and the general case follows from a short calculation verifying that the claim (ii) for any two groups ${G_S, G'_S}$ implies the corresponding claim (ii) for the product ${G_S \times G'_S}$.

This concludes the Fourier-analytic proof of Theorem 1. In this paper we also give an ergodic theory proof of (a qualitative version of) Theorem 1(ii), using a correspondence principle argument adapted from this previous paper of Ziegler, and myself. Basically, the idea is to randomly generate a dynamical system on the group ${G}$, by selecting an infinite number of random shifts ${g_1, g_2, \dots \in G}$, which induces an action of the infinitely generated free abelian group ${{\bf Z}^\omega = \bigcup_{n=1}^\infty {\bf Z}^n}$ on ${G}$ by the formula

$\displaystyle T^h x := x + \sum_{i=1}^\infty h_i g_i.$

Much as the law of large numbers ensures the almost sure convergence of Monte Carlo integration, one can show that this action is almost surely ergodic (after passing to a suitable Furstenberg-type limit ${X}$ where the size of ${G}$ goes to infinity), and that the dynamical Host-Kra-Gowers seminorms of that system coincide with the combinatorial Gowers norms of the original functions. One is then well placed to apply an inverse theorem for the third Host-Kra-Gowers seminorm ${U^3(X)}$ for ${{\bf Z}^\omega}$-actions, which was accomplished in the companion paper to this one. After doing so, one almost gets the desired conclusion of Theorem 1(ii), except that after undoing the application of the Furstenberg correspondence principle, the map ${g: G \rightarrow H/\Lambda}$ is merely an almost polynomial rather than a polynomial, which roughly speaking means that instead of certain derivatives of ${g}$ vanishing, they instead are merely very small outside of a small exceptional set. To conclude we need to invoke a “stability of polynomials” result, which at this level of generality was first established by Candela and Szegedy (though we also provide an independent proof here in an appendix), which roughly speaking asserts that every approximate polynomial is close in measure to an actual polynomial. (This general strategy is also employed in the Candela-Szegedy paper, though in the absence of the ergodic inverse theorem input that we rely upon here, the conclusion is weaker in that the filtered nilmanifold ${H/\Lambda}$ is replaced with a general space known as a “CFR nilspace”.)

This transference principle approach seems to work well for the higher step cases (for instance, the stability of polynomials result is known in arbitrary degree); the main difficulty is to establish a suitable higher step inverse theorem in the ergodic theory setting, which we hope to do in future research.

Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for ${r_4(N)}$“, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).

For any natural number ${N}$, define ${r_4(N)}$ to be the largest cardinality of a subset ${A}$ of ${[N] = \{1,\dots,N\}}$ which does not contain any non-trivial arithmetic progressions ${a, a+r, a+2r, a+3r}$ of length four (where “non-trivial” means that ${r}$ is non-zero). Trivially we have ${r_4(N) \leq N}$. In 1969, Szemerédi showed that ${r_4(N) = o(N)}$. However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) were quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that ${r_4(N) \ll N (\log \log N)^{-c}}$ for some absolute constant ${c>0}$. In the second paper in the above-mentioned series, we managed to improve this bound to ${r_4(N) \ll N \exp( - c \sqrt{\log \log N})}$. In this paper, we improve the bound further to ${r_4(N) \ll N (\log N)^{-c}}$, which seems to be the limit of the methods. (We remark that if we could take ${c}$ to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for length three conjectures is nearly settled, as it is known that for the analogous bound on ${r_3(N)}$ one can take any ${c}$ less than one.)

Most of the previous work on bounding ${r_4(N)}$ relied in some form or another on the density increment argument introduced by Roth back in 1953; roughly speaking, the idea is to show that if a dense subset ${A}$ of ${[N]}$ fails to contain arithmetic progressions of length four, one seeks to then locate a long subprogression of ${[N]}$ in which ${A}$ has increased density. This was the basic method for instance underlying our previous bound ${r_4(N) \ll N \exp( - c \sqrt{\log \log N})}$, as well as a finite field analogue of the bound ${r_4(N) \ll N (\log N)^{-c}}$; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that ${A \subset [N]}$ has density ${\delta}$. Then one would expect a “randomly” selected arithmetic progression ${{\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r}}$ in ${[N]}$ (using the convention that random variables will be in boldface) to be contained in ${A}$ with probability about ${\delta^4}$. This is not true in general, however it was shown by Ben and myself that for any ${\eta>0}$, there was a set of shifts ${r \in [-N,N]}$ of cardinality ${\gg_{\delta,\eta} N}$, such that for any such ${r}$ one had

$\displaystyle {\bf P}( {\bf a}, {\bf a}+r, {\bf a}+2r, {\bf a}+3r \in A ) \geq \delta^4 - \eta$

if ${{\bf a}}$ was chosen uniformly at random from ${[N]}$. This easily implies that ${r_4(N) = o(N)}$, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound ${\gg_{\delta,\eta} N}$ is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take ${N}$ to be extremely large compared to ${\delta,\eta}$ to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift ${r=0}$.

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to couple together the two parameters ${{\bf a}}$ and ${{\bf r}}$ of the length four progression. Namely, with ${A}$, ${\delta}$, ${\eta}$ as above, we are able to show that there exist random variables ${{\bf a}, {\bf r}}$, not necessarily independent, such that

$\displaystyle {\bf P}( {\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r} \in A ) \geq \delta^4 - \eta \ \ \ \ \ (1)$

and such that we have the non-degeneracy bound

$\displaystyle {\bf P}( {\bf r} = 0 ) \ll \exp( \eta^{-O(1)} ) / N.$

This then easily implies the main theorem.

The energy increment method is then deployed to locate a good pair ${({\bf a}, {\bf r})}$ of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function ${1_A}$ “behaves like” a globally quadratic function such as ${F( \alpha n^2 )}$, for some irrational ${\alpha}$ and some smooth periodic function ${F: {\bf R}/{\bf Z} \rightarrow {\bf R}}$ of mean ${\delta}$. If one then takes ${{\bf a}, {\bf r}}$ to be uniformly distributed in ${[N]}$ and ${[-\varepsilon N, \varepsilon N]}$ respectively for some small ${\varepsilon>0}$, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

$\displaystyle \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F(x) F(y) F(z) F(w) \ \ \ \ \ (2)$

where the integral is with respect to the probability Haar measure, and the constraint ${x-3y+3z-w=0}$ ultimately arises from the algebraic constraint

$\displaystyle \alpha {\bf a}^2 - 3 \alpha ({\bf a}+{\bf r})^2 + 3 \alpha ({\bf a}+2{\bf r})^2 - \alpha ({\bf a}+3{\bf r})^2 = 0.$

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least ${(\int_{{\bf R}/{\bf Z}} F)^4}$, which (morally at least) gives (1) in this case.

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which ${[N]}$ is partitioned into some number of structured pieces ${B_c}$ (think of these as arithmetic progressions, or as “Bohr sets), and on each piece ${B_c}$, ${1_A}$ behaves like a locally quadratic function such as ${F_c( \alpha_c n^2 )}$, where ${\alpha_c}$ now varies with ${c}$, and the mean of ${F_c}$ will be approximately ${\delta}$ on the average after averaging in ${c}$ (weighted by the size of the pieces ${B_c}$). Now one should select ${{\bf a}}$ and ${{\bf r}}$ in the following coupled manner: first one chooses ${{\bf a}}$ uniformly from ${[N]}$, then one defines ${{\bf c}}$ to be the label ${c}$ such that ${{\bf a} \in B_c}$, and then selects ${{\bf r}}$ uniformly from a set ${B_{c,\varepsilon}}$ which is related to ${B_c}$ in much the same way that ${[-\varepsilon N, \varepsilon N]}$ is related to ${[N]}$. If one does this correctly, the analogue of (2) becomes

$\displaystyle {\bf E} \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F_{\mathbf c}(x) F_{\mathbf c}(y) F_{\mathbf c}(z) F_{\mathbf c}(w),$

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of ${1_A}$ which involves a decomposition of ${[N]}$ into structured pieces ${B_c}$, and a quadratic approximation to ${1_A}$ on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm ${U^3}$ of the error is small enough) to model the count in (1) (for random variables ${{\bf a}, {\bf r}}$ determined by the above partition of ${[N]}$ into pieces ${B_c}$), and if the frequencies (such as ${\alpha_c}$) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm ${U^3}$. A significant fraction of the paper is then devoted to establishing a quantitative inverse theorem for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to ${1_A}$ in a manner that significantly increases its “energy” (basically an ${L^2}$ norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for ${U^3}$ type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “${1\%}$-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “${99\%}$-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse pseudorandom subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this ${99\%}$-structured homomorphism on pseudorandom subsets of Bohr sets to a ${100\%}$-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a ${1}$-form on ${{\bf R}^n}$ that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the ${1}$-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.

Let ${\lambda}$ denote the Liouville function. The prime number theorem is equivalent to the estimate

$\displaystyle \sum_{n \leq x} \lambda(n) = o(x)$

as ${x \rightarrow \infty}$, that is to say that ${\lambda}$ exhibits cancellation on large intervals such as ${[1,x]}$. This result can be improved to give cancellation on shorter intervals. For instance, using the known zero density estimates for the Riemann zeta function, one can establish that

$\displaystyle \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n)|\ dx = o( HX ) \ \ \ \ \ (1)$

as ${X \rightarrow \infty}$ if ${X^{1/6+\varepsilon} \leq H \leq X}$ for some fixed ${\varepsilon>0}$; I believe this result is due to Ramachandra (see also Exercise 21 of this previous blog post), and in fact one could obtain a better error term on the right-hand side that for instance gained an arbitrary power of ${\log X}$. On the Riemann hypothesis (or the weaker density hypothesis), it was known that the ${X^{1/6+\varepsilon}}$ could be lowered to ${X^\varepsilon}$.

Early this year, there was a major breakthrough by Matomaki and Radziwill, who (among other things) showed that the asymptotic (1) was in fact valid for any ${H = H(X)}$ with ${H \leq X}$ that went to infinity as ${X \rightarrow \infty}$, thus yielding cancellation on extremely short intervals. This has many further applications; for instance, this estimate, or more precisely its extension to other “non-pretentious” bounded multiplicative functions, was a key ingredient in my recent solution of the Erdös discrepancy problem, as well as in obtaining logarithmically averaged cases of Chowla’s conjecture, such as

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1)}{n} = o(\log x). \ \ \ \ \ (2)$

It is of interest to twist the above estimates by phases such as the linear phase ${n \mapsto e(\alpha n) := e^{2\pi i \alpha n}}$. In 1937, Davenport showed that

$\displaystyle \sup_\alpha |\sum_{n \leq x} \lambda(n) e(\alpha n)| \ll_A x \log^{-A} x$

which of course improves the prime number theorem. Recently with Matomaki and Radziwill, we obtained a common generalisation of this estimate with (1), showing that

$\displaystyle \sup_\alpha \int_X^{2X} |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(HX) \ \ \ \ \ (3)$

as ${X \rightarrow \infty}$, for any ${H = H(X) \leq X}$ that went to infinity as ${X \rightarrow \infty}$. We were able to use this estimate to obtain an averaged form of Chowla’s conjecture.

In that paper, we asked whether one could improve this estimate further by moving the supremum inside the integral, that is to say to establish the bound

$\displaystyle \int_X^{2X} \sup_\alpha |\sum_{x \leq n \leq x+H} \lambda(n) e(\alpha n)|\ dx = o(HX) \ \ \ \ \ (4)$

as ${X \rightarrow \infty}$, for any ${H = H(X) \leq X}$ that went to infinity as ${X \rightarrow \infty}$. This bound is asserting that ${\lambda}$ is locally Fourier-uniform on most short intervals; it can be written equivalently in terms of the “local Gowers ${U^2}$ norm” as

$\displaystyle \int_X^{2X} \sum_{1 \leq a \leq H} |\sum_{x \leq n \leq x+H} \lambda(n) \lambda(n+a)|^2\ dx = o( H^3 X )$

from which one can see that this is another averaged form of Chowla’s conjecture (stronger than the one I was able to prove with Matomaki and Radziwill, but a consequence of the unaveraged Chowla conjecture). If one inserted such a bound into the machinery I used to solve the Erdös discrepancy problem, it should lead to further averaged cases of Chowla’s conjecture, such as

$\displaystyle \sum_{n \leq x} \frac{\lambda(n) \lambda(n+1) \lambda(n+2)}{n} = o(\log x), \ \ \ \ \ (5)$

though I have not fully checked the details of this implication. It should also have a number of new implications for sign patterns of the Liouville function, though we have not explored these in detail yet.

One can write (4) equivalently in the form

$\displaystyle \int_X^{2X} \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )\ dx = o(HX) \ \ \ \ \ (6)$

uniformly for all ${x}$-dependent phases ${\alpha(x), \beta(x)}$. In contrast, (3) is equivalent to the subcase of (6) when the linear phase coefficient ${\alpha(x)}$ is independent of ${x}$. This dependency of ${\alpha(x)}$ on ${x}$ seems to necessitate some highly nontrivial additive combinatorial analysis of the function ${x \mapsto \alpha(x)}$ in order to establish (4) when ${H}$ is small. To date, this analysis has proven to be elusive, but I would like to record what one can do with more classical methods like Vaughan’s identity, namely:

Proposition 1 The estimate (4) (or equivalently (6)) holds in the range ${X^{2/3+\varepsilon} \leq H \leq X}$ for any fixed ${\varepsilon>0}$. (In fact one can improve the right-hand side by an arbitrary power of ${\log X}$ in this case.)

The values of ${H}$ in this range are far too large to yield implications such as new cases of the Chowla conjecture, but it appears that the ${2/3}$ exponent is the limit of “classical” methods (at least as far as I was able to apply them), in the sense that one does not do any combinatorial analysis on the function ${x \mapsto \alpha(x)}$, nor does one use modern equidistribution results on “Type III sums” that require deep estimates on Kloosterman-type sums. The latter may shave a little bit off of the ${2/3}$ exponent, but I don’t see how one would ever hope to go below ${1/2}$ without doing some non-trivial combinatorics on the function ${x \mapsto \alpha(x)}$. UPDATE: I have come across this paper of Zhan which uses mean-value theorems for L-functions to lower the ${2/3}$ exponent to ${5/8}$.

Let me now sketch the proof of the proposition, omitting many of the technical details. We first remark that known estimates on sums of the Liouville function (or similar functions such as the von Mangoldt function) in short arithmetic progressions, based on zero-density estimates for Dirichlet ${L}$-functions, can handle the “major arc” case of (4) (or (6)) where ${\alpha}$ is restricted to be of the form ${\alpha = \frac{a}{q} + O( X^{-1/6-\varepsilon} )}$ for ${q = O(\log^{O(1)} X)}$ (the exponent here being of the same numerology as the ${X^{1/6+\varepsilon}}$ exponent in the classical result of Ramachandra, tied to the best zero density estimates currently available); for instance a modification of the arguments in this recent paper of Koukoulopoulos would suffice. Thus we can restrict attention to “minor arc” values of ${\alpha}$ (or ${\alpha(x)}$, using the interpretation of (6)).

Next, one breaks up ${\lambda}$ (or the closely related Möbius function) into Dirichlet convolutions using one of the standard identities (e.g. Vaughan’s identity or Heath-Brown’s identity), as discussed for instance in this previous post (which is focused more on the von Mangoldt function, but analogous identities exist for the Liouville and Möbius functions). The exact choice of identity is not terribly important, but the upshot is that ${\lambda(n)}$ can be decomposed into ${\log^{O(1)} X}$ terms, each of which is either of the “Type I” form

$\displaystyle \sum_{d \sim D; m \sim M: dm=n} a_d$

for some coefficients ${a_d}$ that are roughly of logarithmic size on the average, and scales ${D, M}$ with ${D \ll X^{2/3}}$ and ${DM \sim X}$, or else of the “Type II” form

$\displaystyle \sum_{d \sim D; m \sim M: dm=n} a_d b_m$

for some coefficients ${a_d, b_m}$ that are roughly of logarithmic size on the average, and scales ${D,M}$ with ${X^{1/3} \ll D,M \ll X^{2/3}}$ and ${DM \sim X}$. As discussed in the previous post, the ${2/3}$ exponent is a natural barrier in these identities if one is unwilling to also consider “Type III” type terms which are roughly of the shape of the third divisor function ${\tau_3(n) := \sum_{d_1d_2d_3=1} 1}$.

A Type I sum makes a contribution to ${ \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )}$ that can be bounded (via Cauchy-Schwarz) in terms of an expression such as

$\displaystyle \sum_{d \sim D} | \sum_{x/d \leq m \leq x/d+H/d} e(\alpha(x) dm )|^2.$

The inner sum exhibits a lot of cancellation unless ${\alpha(x) d}$ is within ${O(D/H)}$ of an integer. (Here, “a lot” should be loosely interpreted as “gaining many powers of ${\log X}$ over the trivial bound”.) Since ${H}$ is significantly larger than ${D}$, standard Vinogradov-type manipulations (see e.g. Lemma 13 of these previous notes) show that this bad case occurs for many ${d}$ only when ${\alpha}$ is “major arc”, which is the case we have specifically excluded. This lets us dispose of the Type I contributions.

A Type II sum makes a contribution to ${ \sum_{x \leq n \leq x+H} \lambda(n) e( \alpha(x) n + \beta(x) )}$ roughly of the form

$\displaystyle \sum_{d \sim D} | \sum_{x/d \leq m \leq x/d+H/d} b_m e(\alpha(x) dm)|.$

We can break this up into a number of sums roughly of the form

$\displaystyle \sum_{d = d_0 + O( H / M )} | \sum_{x/d_0 \leq m \leq x/d_0 + H/D} b_m e(\alpha(x) dm)|$

for ${d_0 \sim D}$; note that the ${d}$ range is non-trivial because ${H}$ is much larger than ${M}$. Applying the usual bilinear sum Cauchy-Schwarz methods (e.g. Theorem 14 of these notes) we conclude that there is a lot of cancellation unless one has ${\alpha(x) = a/q + O( \frac{X \log^{O(1)} X}{H^2} )}$ for some ${q = O(\log^{O(1)} X)}$. But with ${H \geq X^{2/3+\varepsilon}}$, ${X \log^{O(1)} X/H^2}$ is well below the threshold ${X^{-1/6-\varepsilon}}$ for the definition of major arc, so we can exclude this case and obtain the required cancellation.

Ben Green, Tamar Ziegler and I have just uploaded to the arXiv our paper “An inverse theorem for the Gowers $U^4$ norm“.  This paper establishes the next case of the inverse conjecture for the Gowers norm for the integers (after the $U^3$ case, which was done by Ben and myself a few years ago).  This conjecture has a number of combinatorial and number-theoretic consequences, for instance by combining this new inverse theorem with previous results, one can now get the correct asymptotic for the number of arithmetic progressions of primes of length five in any large interval $[N] = \{1,\ldots,N\}$.

To state the inverse conjecture properly requires a certain amount of notation.  Given a function $f: {\Bbb Z} \to {\Bbb C}$ and a shift $h \in {\Bbb Z}$, define the multiplicative derivative

$\Delta_h f(x) := f(x+h) \overline{f(x)}$

and then define the Gowers $U^{s+1}[N]$ norm of a function $f: [N] \to {\Bbb C}$ to (essentially) be the quantity

$\| f\|_{U^{s+1}[N]} := ({\Bbb E}_{h_1,\ldots,h_{s+1} \in [-N,N]} {\Bbb E}_{x \in [N]} |\Delta_{h_1} \ldots \Delta_{h_{s+1}} f(x)|)^{1/2^{s+1}},$

where we extend f by zero outside of $[N]$.  (Actually, we use a slightly different normalisation to ensure that the function 1 has a $U^{s+1}$ norm of 1, but never mind this for now.)

Informally, the Gowers norm $\|f\|_{U^{s+1}[N]}$ measures the amount of bias present in the $(s+1)^{st}$ multiplicative derivatives of $f$.  In particular, if $f = e(P) := e^{2\pi i P}$ for some polynomial $P: {\Bbb Z} \to {\Bbb C}$, then the $(s+1)^{th}$ derivative of $f$ is identically 1, and so is the Gowers norm.

However, polynomial phases are not the only functions with large Gowers norm.  For instance, consider the function $f(n) := e( \lfloor \sqrt{2} n \rfloor \sqrt{3} n )$, which is what we call a quadratic bracket polynomial phase.  This function isn’t quite quadratic, but it is close enough to being quadratic (because one has the approximate linearity relationship $\lfloor x+y \rfloor = \lfloor x \rfloor + \lfloor y \rfloor$ holding a good fraction of the time) that it turns out that third derivative is trivial fairly often, and the Gowers norm $\|f\|_{U^3[N]}$ is comparable to 1.  This bracket polynomial phase can be modeled as a nilsequence $n \mapsto F( g(n) \Gamma )$, where $n \mapsto g(n) \Gamma$ is a polynomial orbit on a nilmanifold $G/\Gamma$, which in this case has step 2.  (The function F is only piecewise smooth, due to the discontinuity in the floor function $\lfloor \rfloor$, so strictly speaking we would classify this as an almost nilsequence rather than a nilsequence, but let us ignore this technical issue here.)  In fact, there is a very close relationship between nilsequences and bracket polynomial phases, but I will detail this in a later post.

The inverse conjecture for the Gowers norm, GI(s), asserts that such nilsequences are the only obstruction to the Gowers norm being small.  Roughly speaking, it goes like this:

Inverse conjecture, GI(s). (Informal statement)  Suppose that $f: [N] \to {\Bbb C}$ is bounded but has large $U^{s+1}[N]$ norm.  Then there is an s-step nilsequence $n \mapsto F( g(n) \Gamma )$ of “bounded complexity” that correlates with f.

This conjecture is trivial for s=0, is a short consequence of Fourier analysis when s=1, and was proven for s=2 by Ben and myself.  In this paper we establish the s=3 case.  An equivalent formulation in this case is that any bounded function $f$ of large $U^4$ norm must correlate with a “bracket cubic phase”, which is the product of a bounded number of phases from the following list

$e( \alpha n^3 + \beta n^2 + \gamma n), e( \lfloor \alpha n \rfloor \beta n^2 ), e( \lfloor \alpha n \rfloor \lfloor \beta n \rfloor \gamma n ), e( \lfloor \alpha n \rfloor \beta n )$ (*)

for various real numbers $\alpha,\beta,\gamma$.

It appears that our methods also work in higher step, though for technical reasons it is convenient to make a number of adjustments to our arguments to do so, most notably a switch from standard analysis to non-standard analysis, about which I hope to say more later.  But there are a number of simplifications available on the s=3 case which make the argument significantly shorter, and so we will be writing the higher s argument in a separate paper.

The arguments largely follow those for the s=2 case (which in turn are based on this paper of Gowers).  Two major new ingredients are a deployment of a normal form and equidistribution theory for bracket quadratic phases, and a combinatorial decomposition of frequency space which we call the sunflower decomposition.  I will sketch these ideas below the fold.

For a number of reasons, including the start of the summer break for me and my coauthors, a number of papers that we have been working on are being released this week.  For instance, Ben Green and I have just uploaded to the arXiv our paper “An equivalence between inverse sumset theorems and inverse conjectures for the $U^3$ norm“, submitted to Math. Proc. Camb. Phil. Soc..  The main result of this short paper (which was briefly announced in this earlier post) is a connection between two types of inverse theorems in additive combinatorics, namely the inverse sumset theorems of Freiman type, and inverse theorems for the Gowers uniformity norm, and more specifically, for the $U^3$ norm

$\|f\|_{U^3(G)}^8 := {\Bbb E}_{x,a,b,c \in G} f(x) \overline{f(x+a)} \overline{f(x+b)} \overline{f(x+c)} f(x+a+b) f(x+a+c) f(x+b+c) \overline{f(x+a+b+c)}$

on finite additive group G, where $f: G \to {\Bbb C}$ is a complex-valued function.

As usual, the connection is easiest to state in a finite field model such as $G = {\Bbb F}_2^n$.  In this case, we have the following inverse sumset theorem of Ruzsa:

Theorem 1. If $A \subset {\Bbb F}_2^n$ is such that $|A+A| \leq K|A|$, then A can be covered by a translate of a subspace of ${\Bbb F}_2^n$ of cardinality at most $K^2 2^{K^4} |A|$.

The constant $K^2 2^{K^4}$ has been improved for large $K$ in a sequence of papers, from $K 2^{\lfloor K^3 \rfloor-1}$ by Ruzsa, $K^2 2^{K^2-2}$ by Green-Ruzsa, $2^{O(K^{3/2} \log(1+K)}$ by Sanders, $2^{2K+O(\sqrt{K} \log K})$ by Green and myself, and finally $2^{2K+O(\log K)}$ by Konyagin (private communication) which is sharp except for the precise value of the O() implied constant (as can be seen by considering the example when A consists of about 2K independent elements).  However, it is conjectured that the polynomial loss can be removed entirely if one modifies the conclusion slightly:

Conjecture 1. (Polynomial Freiman-Ruzsa conjecture for ${\Bbb F}_2^n$.) If $A \subset {\Bbb F}_2^n$ is such that $|A+A| \leq K|A|$, then A can be covered by $O(K^{O(1)})$ translates of subspaces of ${\Bbb F}_2^n$ of cardinality at most |A|.

This conjecture was verified for downsets by Green and myself, but is open in general.   This conjecture has a number of equivalent formulations; see this paper of Green for more discussion.  In this previous post we show that a stronger version of this conjecture fails.

Meanwhile, for the Gowers norm, we have the following inverse theorem, due to Samorodnitsky:

Theorem 2. Let $f: {\Bbb F}_2^n \to [-1,1]$ be a function whose $U^3$ norm is at least 1/K.  Then there exists a quadratic polynomial $Q: {\Bbb F}_2^n \to {\Bbb F}_2$ such that $|{\Bbb E}_{x \in {\Bbb F}_2^n} f(x) (-1)^{Q(x)}| \geq \exp( - O(K)^{O(1)} )$.

Note that the quadratic phases $(-1)^{Q(x)}$ are the only functions taking values in [-1,1] whose $U^3$ norm attains its maximal value of 1.

It is conjectured that the exponentially weak correlation here can be strengthened to a polynomial one:

Conjecture 2. (Polynomial inverse conjecture for the $U^3({\Bbb F}_2^n)$ norm). Let $f: {\Bbb F}_2^n \to [-1,1]$ be a function whose $U^3$ norm is at least 1/K.  Then there exists a quadratic polynomial $Q: {\Bbb F}_2^n \to {\Bbb F}_2$ such that $|{\Bbb E}_{x \in {\Bbb F}_2^n} f(x) (-1)^{Q(x)}| \geq K^{-O(1)}$.

The first main result of this paper is

Theorem 3. Conjecture 1 and Conjecture 2 are equivalent.

This result was also independently observed by Shachar Lovett (private communication).  We also establish an analogous result for the cyclic group ${\Bbb Z}/N{\Bbb Z}$, in which the notion of polynomial is replaced by that of a subexponential $\exp(K^{o(1)})$, and in which the notion of a quadratic polynomial is replaced by a 2-step nilsequence; the precise statement is a bit technical and will not be given here.  We also observe a partial partial analogue of the correpsondence between inverse sumset theorems and Gowers norms in the higher order case, in particular observing that $U^4$ inverse theorems imply a certain rigidity result for “Freiman-quadratic polynomials” (a quadratic version of Conjecture 3 below).

Below the fold, we sketch the proof of Theorem 3.

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the uniformity seminorms associated with the action of $F^\infty_p$“. This paper establishes the ergodic inverse theorems that are needed in our other recent paper to establish the inverse conjecture for the Gowers norms over finite fields in high characteristic (and to establish a partial result in low characteristic), as follows:

Theorem. Let ${\Bbb F}$ be a finite field of characteristic p.  Suppose that $X = (X,{\mathcal B},\mu)$ is a probability space with an ergodic measure-preserving action $(T_g)_{g \in {\Bbb F}^\omega}$ of ${\Bbb F}^\omega$.  Let $f \in L^\infty(X)$ be such that the Gowers-Host-Kra seminorm $\|f\|_{U^k(X)}$ (defined in a previous post) is non-zero.

1. In the high-characteristic case $p \geq k$, there exists a phase polynomial g of degree <k (as defined in the previous post) such that $|\int_X f \overline{g}\ d\mu| > 0$.
2. In general characteristic, there exists a phase polynomial of degree <C(k) for some C(k) depending only on k such that $|\int_X f \overline{g}\ d\mu| > 0$.

This theorem is closely analogous to a similar theorem of Host and Kra on ergodic actions of ${\Bbb Z}$, in which the role of phase polynomials is played by functions that arise from nilsystem factors of X.  Indeed, our arguments rely heavily on the machinery of Host and Kra.

The paper is rather technical (60+ pages!) and difficult to describe in detail here, but I will try to sketch out (in very broad brush strokes) what the key steps in the proof of part 2 of the theorem are.  (Part 1 is similar but requires a more delicate analysis at various stages, keeping more careful track of the degrees of various polynomials.)

Tamar Ziegler and I have just uploaded to the arXiv our paper, “The inverse conjecture for the Gowers norm over finite fields via the correspondence principle“, submitted to Analysis & PDE.  As announced a few months ago in this blog post, this paper establishes (most of) the inverse conjecture for the Gowers norm from an ergodic theory analogue of this conjecture (in a forthcoming paper by Vitaly Bergelson, Tamar Ziegler, and myself, which should be ready shortly), using a variant of the Furstenberg correspondence principle.  Our papers were held up for a while due to some unexpected technical difficulties arising in the low characteristic case; as a consequence, our paper only establishes the full inverse conjecture in the high characteristic case $p \geq k$, and gives a partial result in the low characteristic case $p < k$.

In the rest of this post, I would like to describe the inverse conjecture (in both combinatorial and ergodic forms), and sketch how one deduces one from the other via the correspondence principle (together with two additional ingredients, namely a statistical sampling lemma and a local testability result for polynomials).