In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces ${V}$ over a finite field ${{\bf F}}$.

In this case, the Gowers norms ${U^{d+1}(V)}$ are closely tied to the space ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ of polynomials of degree at most ${d}$. Indeed, as noted in Exercise 20 of Notes 4, a function ${f: V \rightarrow {\bf C}}$ of ${L^\infty(V)}$ norm ${1}$ has ${U^{d+1}(V)}$ norm equal to ${1}$ if and only if ${f = e(\phi)}$ for some ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$; thus polynomials solve the “${100\%}$ inverse problem” for the trivial inequality ${\|f\|_{U^{d+1}(V)} \leq \|f\|_{L^\infty(V)}}$. They are also a crucial component of the solution to the “${99\%}$ inverse problem” and “${1\%}$ inverse problem”. For the former, we will soon show:

Proposition 1 (${99\%}$ inverse theorem for ${U^{d+1}(V)}$) Let ${f: V \rightarrow {\bf C}}$ be such that ${\|f\|_{L^\infty(V)}}$ and ${\|f\|_{U^{d+1}(V)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$. Then there exists ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that ${\| f - e(\phi)\|_{L^1(V)} = O_{d, {\bf F}}( \epsilon^c )}$, where ${c = c_d > 0}$ is a constant depending only on ${d}$.

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to ${99\%}$ inverse theorem for ${U^{d+1}(V)}$) If ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f-e(\phi)\|_{L^1(V)} \leq \epsilon}$ for some ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$, then ${\|F\|_{U^{d+1}(V)} \geq 1 - O_{d,{\bf F}}( \epsilon^c )}$, where ${c = c_d > 0}$ is a constant depending only on ${d}$.

In the ${1\%}$ world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has

Lemma 2 (Converse to the ${1\%}$ inverse theorem for ${U^{d+1}(V)}$) If ${f: V \rightarrow {\bf C}}$ and ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ are such that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \geq \epsilon}$, where ${\langle f, g \rangle_{L^2(V)} := {\bf E}_{x \in G} f(x) \overline{g(x)}}$, then ${\|f\|_{U^{d+1}(V)} \geq \epsilon}$.

Proof: From the definition of the ${U^1}$ norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

$\displaystyle |\langle f, e(\phi) \rangle| = \| f e(-\phi) \|_{U^1(V)}$

$\displaystyle \leq \|f e(-\phi) \|_{U^{d+1}(V)}$

$\displaystyle = \|f\|_{U^{d+1}(V)}$

and the claim follows. $\Box$

In the high characteristic case ${\hbox{char}({\bf F}) > d}$ at least, this can be reversed:

Theorem 3 (${1\%}$ inverse theorem for ${U^{d+1}(V)}$) Suppose that ${\hbox{char}({\bf F}) > d \geq 0}$. If ${f: V \rightarrow {\bf C}}$ is such that ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f\|_{U^{d+1}(V)} \geq \epsilon}$, then there exists ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \gg_{\epsilon,d,{\bf F}} 1}$.

This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small ${d}$, the claim is easy:

Exercise 2 Verify the cases ${d=0,1}$ of this theorem. (Hint: to verify the ${d=1}$ case, use the Fourier-analytic identities ${\|f\|_{U^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^4)^{1/4}}$ and ${\|f\|_{L^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^2)^{1/2}}$, where ${\hat V}$ is the space of all homomorphisms ${\xi: x \mapsto \xi \cdot x}$ from ${V}$ to ${{\bf R}/{\bf Z}}$, and ${\hat f(\xi) := \mathop{\bf E}_{x \in V} f(x) e(-\xi \cdot x)}$ are the Fourier coefficients of ${f}$.)

This conjecture for larger values of ${d}$ are more difficult to establish. The ${d=2}$ case of the theorem was established by Ben Green and myself in the high characteristic case ${\hbox{char}({\bf F}) > 2}$; the low characteristic case ${\hbox{char}({\bf F}) = d = 2}$ was independently and simultaneously established by Samorodnitsky. The cases ${d>2}$ in the high characteristic case was established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself. to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ in the above conjecture with the essentially equivalent space of classical polynomials ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})}$. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if ${\hbox{char}({\bf F}) = 2}$ and ${d \geq 3}$; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields) Let ${{\bf F} = {\bf F}_p}$ be a finite field, let ${\delta > 0}$, and let ${A \subset {\bf F}^n}$ be such that ${|A| \geq \delta |{\bf F}^n|}$. If ${n}$ is sufficiently large depending on ${p,\delta}$, then ${A}$ contains an (affine) line ${\{ x, x+r, \ldots, x+(p-1)r\}}$ for some ${x,r \in {\bf F}^n}$ with ${ r\neq 0}$.

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if ${k \geq 1}$ and ${n}$ is sufficiently large depending on ${p,\delta}$, then ${A}$ contains an affine ${k}$-dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

— 1. The ${99\%}$ inverse theorem —

We now prove Proposition 1. Results of this type for general ${d}$ appear in this paper of Alon, Kaufman, Krivelevich, Litsyn, and Ron (see also this paper of Sudan, Trevisan, and Vadhan for a precursor result), the ${d=1}$ case was treated previously by by Blum, Luby, and Rubinfeld. The argument here is due to Tamar Ziegler and myself. The argument has a certain “cohomological” flavour (comparing cocycles with coboundaries, determining when a closed form is exact, etc.). Indeed, the inverse theory can be viewed as a sort of “additive combinatorics cohomology”.

Let ${{\bf F}, V, d, f, \epsilon}$ be as in the theorem. We let all implied constants depend on ${d, {\bf F}}$. We use the symbol ${c}$ to denote various positive constants depending only on ${d}$. We may assume ${\epsilon}$ is sufficiently small depending on ${d, {\bf F}}$, as the claim is trivial otherwise.

The case ${d=0}$ is easy, so we assume inductively that ${d \geq 1}$ and that the claim has been already proven for ${d-1}$.

The first thing to do is to make ${f}$ unit magnitude. One easily verifies the crude bound

$\displaystyle \|f\|_{U^{d+1}(V)}^{2^{d+1}} \leq \|f\|_{L^1(V)}$

and thus

$\displaystyle \|f\|_{L^1(V)} \geq 1-O( \epsilon ).$

Since ${|f| \leq 1}$ pointwise, we conclude that

$\displaystyle \mathop{\bf E}_{x \in V} 1 - |f(x)| = O(\epsilon).$

As such, ${f}$ differs from a function ${\tilde f}$ of unit magnitude by ${O(\epsilon)}$ in ${L^1}$ norm. By replacing ${f}$ with ${\tilde f}$ and using the triangle inequality for the Gowers norm (changing ${\epsilon}$ and worsening the constant ${c}$ in Proposition 1 if necessary), we may assume without loss of generality that ${|f|=1}$ throughout, thus ${f = e(\psi)}$ for some ${\psi: V \rightarrow {\bf R}/{\bf Z}}$.

Since

$\displaystyle \|f\|_{U^{d+1}(V)}^{2^{d+1}} = \mathop{\bf E}_{h \in V} \| e( \partial_h \psi ) \|_{U^d(V)}^{2^d}$

we see from Markov’s inequality that

$\displaystyle \| e(\partial_h \psi ) \|_{U^d(V)} \geq 1 - O(\epsilon^c)$

for all ${h}$ in a subset ${H}$ of ${V}$ of density ${1-O(\epsilon^c)}$. Applying the inductive hypothesis, we see that for each such ${h}$, we can find a polynomial ${\phi_h \in \hbox{Poly}_{\leq d-1}(V \rightarrow {\bf R}/{\bf Z})}$ such that

$\displaystyle \| e( \partial_h \psi ) - e(\phi_h ) \|_{L^1(V)} = O(\epsilon^c).$

Now let ${h,k \in H}$. Using the cocycle identity

$\displaystyle e( \partial_{h+k} \psi ) = e(\partial_h \phi) T^h e(\partial_k \phi)$

where ${T^h}$ is the shift operator ${T^h f(x) := f(x+h)}$, we see using Hölder’s inequality that

$\displaystyle \| e( \partial_{h+k} \psi ) - e(\phi_h T^h \phi_k) \|_{L^1(V)} = O(\epsilon^c).$

On the other hand, ${\phi_h T^h \phi_k}$ is a polynomial of order ${d}$. Also, since ${H}$ is so dense, every element ${l}$ of ${V}$ has at least one representation of the form ${l=h+k}$ for some ${h,k \in H}$ (indeed, out of all ${|V|}$ possible representations ${l=h+k}$, ${h}$ or ${k}$ can fall outside of ${H}$ for at most ${O(\epsilon^c |V|)}$ of these representations). We conclude that for every ${l \in V}$ there exists a polynomial ${\phi'_l \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that

$\displaystyle \| e( \partial_l \psi ) - e(\phi'_l ) \|_{L^1(V)} = O(\epsilon^c). \ \ \ \ \ (1)$

The new polynomial ${\phi'_l}$ supercedes the old one ${\phi_l}$; to reflect this, we abuse notation and write ${\phi_l}$ for ${\phi'_l}$. Applying the cocycle equation again, we see that

$\displaystyle \| e( \phi_{h+k} ) - e( \phi_h T^h \phi_k ) \|_{L^1(V)} = O(\epsilon^c) \ \ \ \ \ (2)$

for all ${h,k \in V}$. Applying the rigidity of polynomials (Exercise 14 from Notes 4), we conclude that

$\displaystyle \phi_{h+k} = \phi_h T^h \phi_k + c_{h,k}$

for some constant ${c_{h,k} \in {\bf R}/{\bf Z}}$. From (2) we in fact have ${c_{h,k}=O(\epsilon^c)}$ for all ${h,k \in V}$.

The expression ${c_{h,k}}$ is known as a ${2}$-coboundary (see this blog post for more discussion). To eliminate it, we use the finite characteristic to discretise the problem as follows. First, we use the cocycle identity

$\displaystyle \prod_{j=0}^{p-1} e( T^{jh} \partial_h \psi ) = 1$

where ${p}$ is the characteristic of the field. Using (1), we conclude that

$\displaystyle \| \prod_{j=0}^{p-1} e( T^{jh} \phi_h ) - 1 \|_{L^1(V)} = O(\epsilon^c).$

On the other hand, ${T^{jh} \phi_h}$ takes values in some coset of a finite subgroup ${C}$ of ${{\bf R}/{\bf Z}}$ (depending only on ${p,d}$), by Lemma 1 of Notes 4. We conclude that this coset must be a shift of ${C}$ by ${O(\epsilon^c)}$. Since ${\phi_h}$ itself takes values in some coset of a finite subgroup, we conclude that there is a finite subgroup ${C'}$ (depending only on ${p,d}$) such that each ${\phi_h}$ takes values in a shift of ${C'}$ by ${O(\epsilon^c)}$.

Next, we note that we have the freedom to shift each ${\phi_h}$ by ${O(\epsilon^c)}$ (adjusting ${c_{h,k}}$ accordingly) without significantly affecting any of the properties already established. Doing so, we can thus ensure that all the ${\phi_h}$ take values in ${C'}$ itself, which forces ${c_{h,k}}$ to do so also. But since ${c_{h,k} = O(\epsilon^c)}$, we conclude that ${c_{h,k} = 0}$ for all ${h,k}$, thus ${\phi_h}$ is a perfect cocycle:

$\displaystyle \phi_{h+k} = \phi_h T^h \phi_k.$

We may thus integrate ${\phi_h}$ and write ${\phi_h = \partial_h \Phi}$, where ${\Phi(x) := \phi_x(0)}$. Thus ${\partial_h \Phi}$ is a polynomial of degree ${d-1}$ for each ${h}$, thus ${\Phi}$ itself is a polynomial of degree ${d}$. From (1) one has

$\displaystyle \mathop{\bf E}_{x \in V} e( \partial_h (\psi - \Phi) ) = 1 + O( \epsilon^c )$

for all ${h \in V}$; averaging in ${V}$ we conclude that

$\displaystyle |\mathop{\bf E}_{x \in V} e( \psi - \Phi )|^2 = 1 + O( \epsilon^c )$

and thus

$\displaystyle \| e(\psi) - e(\Phi) \|_{L^1(V)} = O(\epsilon^c)$

and Proposition 1 follows.

One consequence of Proposition 1 is that the property of being a classical polynomial of a fixed degree ${d}$ is locally testable, which is a notion of interest in theoretical computer science. More precisely, suppose one is given a large finite vector space ${V}$ and two functions ${\phi_1, \phi_2: V \rightarrow {\bf F}}$. One is told that one of the functions ${\phi_1, \phi_2}$ is a classical polynomial of degree at most ${d}$, while the other is quite far from being such a classical polynomial, in the sense that every polynomial of degree at most ${d}$ will differ with that polynomial on at least ${\epsilon}$ of the values in ${V}$. The task is then to decide with a high degree of confidence which of the functions is a polynomial and which one is not, without inspecting too many of the values of ${\phi_1}$ or ${\phi_2}$.

This can be done as follows. Pick ${x,h_1,\ldots,h_{d+1} \in V}$ at random, and test whether the identities

$\displaystyle \partial_{h_1} \ldots \partial_{h_{d+1}} \phi_1(x) = 0$

and

$\displaystyle \partial_{h_1} \ldots \partial_{h_{d+1}} \phi_2(x) = 0$

hold; note that one only has to inspect ${\phi_1, \phi_2}$ at ${2^{d+1}}$ values in ${V}$ for this. If one of these identities fails, then that function must not be polynomial, and so one has successfully decided which of the functions is polynomials. We claim that the probability that the identity fails for the non-polynomial function is at least ${\delta}$ for some ${\delta \gg_{d,{\bf F}} \epsilon^{O_{d,{\bf F}}(1)}}$, and so if one iterates this test ${O_\delta(1)}$ times, one will be able to successfully solve the problem with probability arbitrarily close to ${1}$. To verify the claim, suppose for contradiction that the identity only failed at most ${\delta}$ of the time for the non-polynomial (say it is ${\phi_2}$); then ${\| e(\phi_2)\|_{U^{d+1}(V)} \geq 1-O(\delta)}$, and thus by Proposition 1, ${\phi_2}$ is very close in ${L^1}$ norm to a polynomial; rounding that polynomial to a root of unity we thus see that ${\phi_2}$ agrees with high accuracy to a classical polynomial, which leads to a contradiction if ${\delta}$ is chosen suitably.

— 2. A partial counterexample in low characteristic —

We now show a distinction between classical polynomials and non-classical polynomials that causes the inverse conjecture to fail in low characteristic if one insists on using classical polynomials. For simplicity we restrict attention to the characteristic two case ${{\bf F} = {\bf F}_2}$. We will use an argument of Alon and Beigel, reproduced in this paper of Green and myself. A different argument (with stronger bounds) appears in this paper of Lovett, Meshulam, and Samorodnitsky.

We work in a standard vector space ${V = {\bf F}^n}$, with standard basis ${e_1,\ldots,e_n}$ and coordinates ${x_1,\ldots,x_n}$. Among all the classical polynomials on this space are the symmetric polynomials

$\displaystyle S_m := \sum_{1 \leq i_1 < \ldots < i_m \leq n} x_{i_1} \ldots x_{i_m},$

which play a special role.

Exercise 4 Let ${L: V \rightarrow {\bf N}}$ be the digit summation function ${L := \# \{ 1 \leq i \leq n: x_i = 1 \}}$. Show that

$\displaystyle S_m = \binom{L}{m} \hbox{ mod } 2.$

Establish Lucas’ theorem

$\displaystyle S_m = S_{2^{j_1}} \ldots S_{2^{j_r}}$

where ${m = 2^{j_1} + \ldots + 2^{j_r}}$, ${j_1 > \ldots>j_r}$ is the binary expansion of ${m}$. Show that ${S_{2^j}}$ is the ${2^j}$ binary coefficient of ${L}$, and conclude that ${S_m}$ is a function of ${L \hbox{ mod } 2^{j_1}}$. (Note: These results are closely related to the well-known fact that Pascal’s triangle modulo ${2}$ takes the form of an infinite Sierpinski gasket.)

We define an an affine coordinate subspace to be a translate of a subspace of ${V}$ generated by some subset of the standard basis vectors ${e_1,\ldots,e_n}$. To put it another way, an affine coordinate subspace is created by freezing some of the coordinates, but letting some other coordinates be arbitrary.

Of course, not all classical polynomials come from symmetric polynomials. However, thanks to an application of Ramsey’s theorem observed by Alon and Biegel, this is true on coordinate subspaces:

Lemma 5 (Ramsey’s theorem for polynomials) Let ${P: {\bf F}^n \rightarrow {\bf F}}$ be a polynomial of degree at most ${d}$. Then one can partition ${{\bf F}^n}$ into affine coordinate subspaces of dimension ${W}$ at least ${\omega_d(n)}$, where ${\omega_d(n) \rightarrow \infty}$ as ${n \rightarrow \infty}$ for fixed ${d}$, such that on each such subspace ${W}$, ${P}$ is equal to a linear combination of the symmetric polynomials ${S_0, S_1, \ldots, S_d}$.

Proof: We induct on ${d}$. The claim is trivial for ${d=0}$, so suppose that ${d \geq 1}$ and the claim has already been proven for smaller ${d}$. The degree ${d}$ term ${P_d}$ of ${P}$ can be written as

$\displaystyle P_d = \sum_{\{i_1,\ldots,i_d\} \in E} x_{i_1} \ldots x_{i_d}$

where ${E}$ is a ${d}$-uniform hypergraph on ${\{1,\ldots,n\}}$, i.e. a collection of ${d}$-element subsets of ${\{1,\ldots,n\}}$. Applying Ramsey’s theorem (for hypergraphs), one can find a subcollection ${j_1,\ldots,j_m}$ of indices with ${m \geq \omega_d(n)}$ such that ${E}$ either has no edges in ${\{j_1,\ldots,j_m\}}$, or else contains all the edges in ${\{j_1,\ldots,j_m\}}$. We then foliate ${{\bf F}^n}$ into the affine subspaces formed by translating the coordinate subspace generated by ${e_{j_1},\ldots,e_{j_m}}$. By construction, we see that on each such subspace, ${P}$ is equal to either ${0}$ or ${S_d}$ plus a polynomial of degree ${d-1}$. The claim then follows by applying the induction hypothesis (and noting that the linear span of ${S_0,\ldots,S_{d-1}}$ on an affine coordinate subspace is equivariant with respect to translation of that subspace). $\Box$

Because of this, if one wants to concoct a function which is almost orthogonal to all polynomials of degree at most ${d}$, it will suffice to build a function which is almost orthogonal to the symmetric polynomials ${S_0,\ldots,S_d}$ on all affine coordinate subspaces of moderately large size. Pursuing this idea, we are led to

Proposition 6 (Counterexample to classical inverse conjecture) Let ${d \geq 1}$, and let ${f: {\bf F}_2^n \rightarrow S^1}$ be the function ${f := e(L/2^d)}$, where ${L}$ is as in Exercise 4. Then ${L/2^d \hbox{ mod } 1}$ is a non-classical polynomial of degree at most ${d}$, and so ${\|f\|_{U^{d+1}({\bf F}_2^n)} = 1}$; but one has

$\displaystyle \langle f, e(\phi) \rangle_{L^2({\bf F}_2^n)} = o_{n \rightarrow \infty; d}(1)$

uniformly for all classical polynomials ${\phi}$ of degree less than ${2^{d-1}}$, where ${o_{n \rightarrow \infty; d}(1)}$ is bounded in magnitude by a quantity that goes to zero as ${n \rightarrow \infty}$ for each fixed ${d}$.

Proof: We first prove the polynomiality of ${L/2^d \hbox{ mod } 1}$. Let ${x \mapsto |x|}$ be the obvious map from ${{\bf F}_2}$ to ${\{0,1\}}$, thus

$\displaystyle L = \sum_{i=1}^n |x_i|.$

By linearity, it will suffice to show that each function ${|x_i| \hbox{ mod } 2^d}$ is a polynomial of degree at most ${d}$. But one easily verifies that for any ${h \in {\bf F}_2^n}$, ${\partial_h |x_i|}$ is equal to zero when ${h_i=0}$ and equal to ${1-2|x_i|}$ when ${h_i=1}$. Iterating this observation ${d}$ times, we obtain the claim.

Now let ${\phi}$ be a classical polynomial of degree less than ${2^{d-1}}$. By Lemma 5, we can partition ${{\bf F}_2^n}$ into affine coordinate subspaces ${W}$ of dimension at least ${\omega_d(n)}$ such that ${\phi}$ is a linear combination of ${S_0,\ldots,S_{2^{d-1}-1}}$ on each such subspace. By the pigeonhole principle, we thus can find such a ${W}$ such that

$\displaystyle |\langle f, e(\phi) \rangle_{L^2({\bf F}_2^n)}| \leq |\langle f, e(\phi) \rangle_{L^2(W)}|.$

On the other hand, from Exercise 4, the function ${\phi}$ on ${W}$ depends only on ${L \hbox{ mod } 2^{d-1}}$. Now, as ${\hbox{dim}(W) \rightarrow \infty}$, the function ${L \hbox{ mod } 2^d}$ (which is essentially the distribution function of a simple random walk of length ${\hbox{dim}(V)}$ on ${{\bf Z}/2^d{\bf Z}}$) becomes equidistributed; in particular, for any ${a \in {\bf Z}/2^d{\bf Z}}$, the function ${f}$ will take the values ${e(a/2^d)}$ and ${-e(a/2^d)}$ with asymptotically equal frequency on ${W}$, whilst ${\phi}$ remains unchanged. As such we see that ${|\langle f, e(\phi) \rangle_{L^2(W)}| \rightarrow 0}$ as ${\hbox{dim}(W) \rightarrow \infty}$, and thus as ${n \rightarrow \infty}$, and the claim follows. $\Box$

Exercise 5 With the same setup as the previous proposition, show that ${\|e(S_{2^{d-1}}/2)\|_{U^{d+1}({\bf F}_2^n)} \gg 1}$, but that ${\langle e( S^{2^{d-1}}/2 ), e(\phi) \rangle_{L^2({\bf F}_2^n)} = o_{n \rightarrow \infty; d}(1)}$ for all classical polynomials ${\phi}$ of degree less than ${2^{d-1}}$.

— 3. The ${1\%}$ inverse theorem: sketches of a proof —

The proof of Theorem 3 is rather difficult once ${d \geq 2}$; even the ${d=2}$ case is not particularly easy. However, the arguments still have the same cohomological flavour encountered in the ${99\%}$ theory. We will not give full proofs of this theorem here, but indicate some of the main ideas.

We begin by discussing (quite non-rigorously) the significantly simpler (but still non-trivial) ${d=2}$ case, established by Ben Green and myself. Unsurprisingly, we will take advantage of the ${d=1}$ case of the theorem as an induction hypothesis.

Let ${V = {\bf F}^n}$ for some field ${{\bf F}}$ of characteristic greater than ${2}$, and ${f}$ be a function with ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f\|_{U^3(V)} \gg 1}$. We would like to show that ${f}$ correlates with a quadratic phase function ${e(\phi)}$ (due to the characteristic hypothesis, we may take ${\phi}$ to be classical), in the sense that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \gg 1}$.

We expand ${\|f\|_{U^3(V)}^8}$ as ${\mathop{\bf E}_{h \in V} \| \Delta_h f \|_{U^2(V)}^4}$. By the pigeonhole principle, we conclude that

$\displaystyle \| \Delta_h f \|_{U^2(V)} \gg 1$

for “many” ${h\in V}$, where by “many” we mean “a proportion of ${\gg 1}$“. Applying the ${U^2}$ inverse theorem, we conclude that for many ${h}$, that there exists a linear polynomial ${\phi_h: V \rightarrow {\bf F}}$ (which we may as well take to be classical) such that

$\displaystyle |\langle \Delta_h f, e(\phi_h) \rangle_{L^2(V)}| \gg 1.$

This should be compared with the ${99\%}$ theory. There, we were able to force ${\Delta_h f}$ close to ${e(\phi_h)}$ for most ${h}$; here, we only have the weaker statement that ${\Delta_h f}$ correlates with ${e(\phi_h)}$ for many (not most) ${h}$. Still, we will keep going. In the ${99\%}$ theory, we were able to assume ${f}$ had magnitude ${1}$, which made the cocycle equation ${\Delta_{h+k} f = (\Delta_h f) T^h \Delta_k f}$ available; this then forced an approximate cocycle equation ${\phi_{h+k} \approx \phi_h + T^h \phi_k}$ for most ${h,k}$ (indeed, we were able to use this trick to upgrade “most” to “all”).

This doesn’t quite work in the ${1\%}$ case. Firstly, ${f}$ need not have magnitude exactly equal to ${1}$. This is not a terribly serious problem, but the more important difficulty is that correlation, unlike the property of being close, is not transitive or multiplicative: just because ${\Delta_h f}$ correlates with ${e(\phi_h)}$, and ${T^h \Delta_k f}$ correlates with ${T^h e(\phi_k)}$, one cannot then conclude that ${\Delta_{h+k} f = (\Delta_h f) T^h \Delta_k f}$ correlates with ${e(\phi_h) T^h e(\phi_k)}$; and even if one had this, and if ${\Delta_{h+k} f}$ correlated with ${e(\phi_{h+k})}$, one could not conclude that ${e(\phi_{h+k})}$ correlated with ${e(\phi_h) T^h e(\phi_k)}$.

Despite all these obstacles, it is still possible to extract something resembling a cocycle equation for the ${\phi_h}$, by means of the Cauchy-Schwarz inequality. Indeed, we have the following remarkable observation of Gowers:

Lemma 7 (Gowers’ Cauchy-Schwarz argument) Let ${V}$ be a finite additive group, and let ${f: V \rightarrow {\bf C}}$ be a function, bounded by ${1}$. Let ${H \subset V}$ be a subset with ${|H| \gg |V|}$, and suppose that for each ${h \in H}$, suppose that we have a function ${\chi_h: V \rightarrow {\bf C}}$ bounded by ${1}$, such that

$\displaystyle |\langle \Delta_h f, \chi_h \rangle_{L^2(V)}| \gg 1$

uniformly in ${h}$. Then there exist ${\gg |V|^3}$ quadruples ${h_1,h_2,h_3,h_4 \in H}$ with ${h_1+h_2=h_3+h_4}$ such that

$\displaystyle |\mathop{\bf E}_{x \in V} \chi_{h_1}(x) \chi_{h_2}(x+h_1-h_4) \overline{\chi_{h_3}}(x) \overline{\chi_{h_4}}(x+h_1-h_4)| \gg 1$

We shall refer to quadruples ${(h_1,h_2,h_3,h_4)}$ obeying the relation ${h_1+h_2=h_3+h_4}$ as additive quadruples.

Proof: We extend ${\chi_h}$ to be zero when ${h}$ lies outside of ${H}$. Then we have

$\displaystyle \mathop{\bf E}_{h \in V} |\langle \Delta_h f, \chi_h \rangle_{L^2(V)}|^2 \gg 1.$

We expand the left-hand side as

$\displaystyle \mathop{\bf E}_{h,x,k \in V} f(x+h) \overline{f(x)} f(x+k) \overline{f(x+h+k)} \overline{\chi_h(x)} \chi_h(x+k);$

setting ${y := x+h}$, this becomes

$\displaystyle \mathop{\bf E}_{k,x,y \in V} \Delta_k f(x) \overline{\Delta_k f(y)} \overline{\Delta_k \chi_{y-x}(x)}.$

From the pigeonhole principle, we conclude that for many values of ${k}$, one has

$\displaystyle |\mathop{\bf E}_{x,y \in V} \Delta_k f(x) \overline{\Delta_k f(y)} \overline{\Delta_k \chi_{y-x}(x)}| \gg 1.$

Performing Cauchy-Schwarz once in ${x}$ and once in ${y}$ to eliminate the ${f}$ factors, and then re-averaging in ${k}$, we conclude that

$\displaystyle \mathop{\bf E}_{k \in V} \mathop{\bf E}_{x,x',y,y' \in V} \overline{\Delta_k \chi_{y-x}(x)} \Delta_k \chi_{y'-x}(x) \Delta_k \chi_{y-x'}(x') \overline{\Delta_k \chi_{y'-x'}(x')} \gg 1.$

Setting ${(h_1,h_2,h_3,h_4)}$ to be the additive quadruple ${(y'-x, y-x', y-x, y'-x')}$ we obtain

$\displaystyle \mathop{\bf E}_{h_1+h_2=h_3+h_4} \mathop{\bf E}_{k,x \in V} \Delta_k \chi_{h_1}(x) \Delta_k \chi_{h_2}(x+h_1-h_4) \Delta_k \overline{\chi_{h_3}}(x) \Delta_k \overline{\chi_{h_4}}(x+h_1-h_4)$

$\displaystyle \gg 1.$

Performing the ${k, x}$ averages we obtain

$\displaystyle \mathop{\bf E}_{h_1+h_2=h_3+h_4} |\mathop{\bf E}_{x \in V} \chi_{h_1}(x) \chi_{h_2}(x+h_1-h_4) \overline{\chi_{h_3}}(x) \overline{\chi_{h_4}}(x+h_1-h_4)|^2 \gg 1,$

and the claim follows (note that for the quadruples obeying the stated lower bound, ${h_1,h_2,h_3,h_4}$ must lie in ${H}$). $\Box$

Applying this lemma to our current situation, we find many additive quadruples ${(h_1,h_2,h_3,h_4)}$ for which

$\displaystyle |\mathop{\bf E}_{x \in V} e( \phi_{h_1}(x) + \phi_{h_2}(x+h_1-h_4) - \phi_{h_3}(x) - \phi_{h_4}(x+h_1-h_4) )| \gg 1.$

In particular, by the equidistribution theory in Notes 4, the polynomial ${\phi_{h_1} + \phi_{h_2} - \phi_{h_3} - \phi_{h_4}}$ is low rank.

The above discussion is valid in any value of ${d \geq 2}$, but is particularly simple when ${d=2}$, as the ${\phi_h}$ are now linear, and so ${\phi_{h_1} + \phi_{h_2} - \phi_{h_3} - \phi_{h_4}}$ is now constant. Writing ${\phi_h(x) = \xi_h \cdot x + \theta_h}$ for some ${\xi_h \in V}$ using the standard dot product on ${V}$, and some (irrelevant) constant term ${\theta_h \in {\bf F}}$, we conclude that

$\displaystyle \xi_{h_1} + \xi_{h_2} = \xi_{h_3} + \xi_{h_4} \ \ \ \ \ (3)$

for many additive quadruples ${h_1,h_2,h_3,h_4}$.

We now have to solve an additive combinatorics problem, namely to classify the functions ${h \mapsto \xi_h}$ from ${V}$ to ${V}$ which are “${1\%}$ affine linear” in the sense that the property (3) holds for many additive quadruples; equivalently, the graph ${\{ (h, \xi_h): h \in H \}}$ in ${V \times V}$ has high “additive energy”, defined as the number of additive quadruples that it contains. An obvious example of a function with this property is an affine-linear function ${\xi_h = Mh + \xi_0}$, where ${M: V \rightarrow V}$ is a linear transformation and ${\xi_0 \in V}$. As it turns out, this is essentially the only example:

Proposition 8 (Balog-Szemerédi-Gowers-Freiman theorem for vector spaces) Let ${H \subset V}$, and let ${h \mapsto \xi_h}$ be a map from ${H}$ to ${V}$ such that (3) holds for ${\gg |V|^3}$ additive quadruples in ${H}$. Then there exists an affine function ${h \mapsto Mh + \xi_0}$ such that ${\xi_h = Mh + \xi_0}$ for ${\gg |V|}$ values of ${h}$ in ${H}$.

This proposition is a consequence of standard results in additive combinatorics, in particular the Balog-Szemerédi-Gowers lemma and Freiman’s theorem for vector spaces; see Section 11.3 of my book with Van for further discussion. The proof is elementary but a little lengthy and would take us too far afield, so we simply assume this proposition for now and keep going. We conclude that

$\displaystyle |\mathop{\bf E}_{x \in V} \Delta_h f(x) e( Mh \cdot x ) e( \xi_0 \cdot x )| \gg 1 \ \ \ \ \ (4)$

for many ${h \in V}$.

The most difficult term to deal with here is the quadratic term ${Mh \cdot x}$. To deal with this term, suppose temporarily that ${M}$ is symmetric, thus ${Mh \cdot x = Mx \cdot h}$. Then (since we are in odd characteristic) we can integrate ${Mh \cdot x}$ as

$\displaystyle Mh \cdot x = \partial_h ( \frac{1}{2} Mx \cdot x ) - \frac{1}{2} M h \cdot h$

and thus

$\displaystyle |\mathop{\bf E}_{x \in V} f(x+h) e( \frac{1}{2} M(x+h) \cdot (x+h) ) \overline{f(x)} e( - \frac{1}{2} Mx \cdot x ) e( \xi_0 \cdot x )| \gg 1$

for many ${h \in H}$. Taking ${L^2}$ norms in ${h}$, we conclude that the ${U^2}$ inner product between two copies of ${f(x) e( \frac{1}{2} Mx \cdot x)}$ and two copies of ${f(x) e( \frac{1}{2} Mx \cdot x) e(-\xi_0 \cdot x)}$. Applying the ${U^2}$ Cauchy-Schwarz-Gowers inequality, followed by the ${U^2}$ inverse theorem, we conclude that ${f(x) e(\frac{1}{2} Mx \cdot x)}$ correlates with ${e(\phi)}$ for some linear phase, and thus ${f}$ itself correlates with ${e(\psi)}$ for some quadratic phase.

This argument also works (with minor modification) when ${M}$ is virtually symmetric, in the sense that there exist a bounded index subspace of ${V}$ such that the restriction of the form ${Mh \cdot x}$ to ${V}$ is symmetric, by foliating into cosets of that subspace; we omit the details. On the other hand, if ${M}$ is not virtually symmetric, there is no obvious way to “integrate” the phase ${e(Mh \cdot x)}$ to eliminate it as above. (Indeed, in order for ${Mh \cdot x}$ to be “exact” in the sense that it is the “derivative” of something (modulo lower order terms), e.g. ${Mh \cdot x \approx \partial_h \Phi}$ for some ${\Phi}$, it must first be “closed” in the sense that ${\partial_k(Mh \cdot x) \approx \partial_h(Mk \cdot x)}$ in some sense, since we have ${\partial_h \partial_k = \partial_k \partial_h}$; thus we again see the emergence of cohomological concepts in the background.)

To establish the required symmetry on ${M}$, we return to Gowers’ Cauchy-Schwarz argument from Lemma 7, and tweak it slightly. We start with (4) and rewrite it as

$\displaystyle |\mathop{\bf E}_{x \in V} f(x+h) f'(x) e(Mh \cdot x)| \gg 1$

where ${f'(x) := \overline{f(x)} e(\xi_0 \cdot x)}$. We square-average this in ${h}$ to obtain

$\displaystyle |\mathop{\bf E}_{x,y,h \in V} f(x+h) f'(x) \overline{f(y+h)} \overline{f'(y)} e(Mh \cdot (x-y))| \gg 1.$

Now we make the somewhat unusual substitution ${z=x+y+h}$ to obtain

$\displaystyle |\mathop{\bf E}_{x,y,z \in V} f(z-y) f'(x) \overline{f(z-x)} \overline{f'(y)} e(M(z-x-y) \cdot (x-y))| \gg 1.$

Thus there exists ${z}$ such that

$\displaystyle |\mathop{\bf E}_{x,y \in V} f(z-y) f'(x) \overline{f(z-x)} \overline{f'(y)} e(M(z-x-y) \cdot (x-y))| \gg 1.$

We collect all terms that depend only on ${x}$ (and ${z}$) or only on ${y}$ (and ${z}$) to obtain

$\displaystyle |\mathop{\bf E}_{x,y \in V} f_{z,1}(x) f_{z,2}(y) e(M x \cdot y - M y \cdot x)| \gg 1$

for some bounded functions ${f_{z,1}, f_{z,2}}$. Eliminating these functions by two applications of Cauchy-Schwarz, we obtain

$\displaystyle |\mathop{\bf E}_{x,y,x',y' \in V} e(M (x-x') \cdot (y-y') - M (y-y') \cdot (x-x'))| \gg 1$

or, on making the change of variables ${a := x-x', b := y-y'}$,

$\displaystyle |\mathop{\bf E}_{a,b \in V} e(M a \cdot b - Mb \cdot a )| \gg 1.$

Using equidistribution theory, this means that the quadratic form ${(a,b) \mapsto M a \cdot b - M b \cdot a}$ is low rank, which easily implies that ${M}$ is virtually symmetric.

Now we turn to the general ${d}$ case. In principle, the above argument should still work, say for ${d=3}$. The main sticking point is that instead of dealing with a vector-valued function ${h \mapsto \xi_h}$ that is approximately linear in the sense that (3) holds for many additive quadruples, in the ${d=3}$ case one is now faced with a \xi_{h_1}

$\displaystyle M_{h_1} + M_{h_2} = M_{h_3} + M_{h_4} + LR_{h_1,h_2,h_3,h_4}$

for many additive quadruples ${h_1,h_2,h_3,h_4}$, where the matrix ${LR_{h_1,h_2,h_3,h_4}}$ has bounded rank. With our current level of additive combinatorics technology, we are not able to deal properly with this bounded rank error (the main difficulty being that the set of low rank matrices has no good “doubling” properties). Because of this obstruction, no generalisation of the above arguments to higher ${d}$ has been found.

There is however another approach, based ultimately on the ergodic theory work of Host-Kra and of Ziegler, that can handle the general ${d}$ case, which was worked out in two papers, one by myself and Ziegler, and one by Bergelson, Ziegler, and myself. It turns out that it is convenient to phrase these arguments in the language of ergodic theory. However, in order not to have to introduce too much additional material, I will try to describe the arguments here in the case ${d=3}$ without explicitly using ergodic theory notation. To do this, though, I will have to sacrifice a lot of rigour and only work with some illustrative special cases rather than the general case, and also use somewhat vague terminology (e.g. “general position” or “low rank”).

To simplify things further, we will establish the ${U^3}$ inverse theorem only for a special type of function, namely a quartic phase ${e( \phi )}$, where ${\phi: V \rightarrow {\bf F}}$ is a classical polynomial of degree ${4}$. (A good example to keep in mind is the symmetric polynomial phase ${e(S_2/2)}$, though one has to take some care with this example due to the low characteristic.) The claim to show then is that if ${\|e(\phi)\|_{U^3(V)} \gg 1}$, then ${e(\phi)}$ correlates with a cubic phase. In the high characteristic case ${p > 4}$, this result can be handled by equidistribution theory. Indeed, since

$\displaystyle \|e(\phi)\|_{U^3(V)}^8 = \mathop{\bf E}_{x,h_1,h_2,h_3,h_4} e( \partial_{h_1} \partial_{h_2} \partial_{h_3} \partial_{h_4} \phi(x) ),$

that theory tells us that the quartic polynomial ${(x,h_1,h_2,h_3,h_4) \mapsto \partial_{h_1} \partial_{h_2} \partial_{h_3} \partial_{h_4} \phi(x)}$ is low rank. On the other hand, in high characteristic one has the Taylor expansion

$\displaystyle \phi(x) = \frac{1}{4!} \partial_x \partial_x \partial_x \partial_x \phi(0) + Q(x)$

for some cubic function ${Q}$ (as can be seen for instance by decomposing into monomials). From this we easily conclude that ${\phi}$ itself has low rank (i.e. it is a function of boundedly many cubic (or lower degree) polynomials), at which point it is easy to see from Fourier analysis that ${e(\phi)}$ will correlate with the exponential of a polynomial of degree at most ${3}$.

Now we present a different argument that relies slightly less on the quartic nature of ${\phi}$; it is a substantially more difficult argument, and we will skip some steps here to simplify the exposition, but the argument happens to extend to more general situations. As ${\|e(\phi)\|_{U^3} \gg 1}$, we have ${\| \Delta_h e(\phi) \|_{U^2} \gg 1}$ for many ${h}$, thus by the inverse ${U^2}$ theorem, ${\Delta_h e(\phi) = e(\partial_h \phi)}$ correlates with a quadratic phase. Using equidistribution theory, we conclude that the cubic polynomial ${\partial_h \phi}$ is low rank.

At present, the low rank property for ${\partial_h \phi}$ is only true for many ${h}$. But from the cocycle identity

$\displaystyle \partial_{h+k} \phi = \partial_h \phi + T^h \partial_k \phi, \ \ \ \ \ (5)$

we see that if ${\partial_h \phi}$ and ${\partial_k \phi}$ are both low rank, then so is ${\partial_{h+k} \phi}$; thus the property of ${\partial_h \phi}$ being low rank is in some sense preserved by addition. Using this and a bit of additive combinatorics, one can conclude that ${\partial_h \phi}$ is low rank for all ${h}$ in a bounded index subspace of ${V}$; restricting to that subspace, we will now assume that ${\partial_h \phi}$ is low rank for all ${h \in V}$. Thus we have

$\displaystyle \partial_h \phi = F_h( \vec Q_h )$

where ${\vec Q_h}$ is some bounded collection of quadratic polynomials for each ${h}$, and ${F_h}$ is some function. To simplify the discussion, let us pretend that ${\vec Q_h}$ in fact consists of just a single quadratic ${Q_h}$, plus some linear polynomials ${\vec L_h}$, thus

$\displaystyle \partial_h \phi = F_h( Q_h, \vec L_h ) \ \ \ \ \ (6)$

There are two extreme cases to consider, depending on how ${Q_h}$ depends on ${h}$. Consider first a “core” case when ${Q_h = Q}$ is independent of ${h}$. Thus

$\displaystyle \partial_h \phi = F_h( Q, \vec L_h ) \ \ \ \ \ (7)$

If ${Q}$ is low rank, then we can absorb it into the ${L_h}$ factors, so suppose instead thaat ${Q}$ is high rank, and thus equidistributed even after fixing the values of ${L_h}$.

The function ${\partial_h \phi}$ is cubic, and ${Q}$ is a high rank quadratic. Because of this, the function ${F'(Q,L_h)}$ must be at most linear in the ${Q}$ variable; this can be established by another application of equidistribution theory, see Section 8 of this paper of Ben and myself; thus one can factorise

$\displaystyle \partial_h \phi = Q F'_h( L_h ) + F''_h(L_h)$

for some functions ${F'_h, F''_h}$. In fact, as ${\partial_h \phi}$ is cubic, ${F'_h}$ must be linear, while ${F''_h}$ is cubic.

By comparing the ${Q}$ coefficients ${F''_h(L_h)}$ in the cocycle equation (5), we see that the function ${\rho_h := F''_h(L_h)}$ is itself a cocycle:

$\displaystyle \rho_{h+k} = \rho_h + T^h \rho_k.$

As a consequence, we have ${\rho_h = \partial_h R}$ for some function ${R: V \rightarrow {\bf R}/{\bf Z}}$. Since ${\rho_h}$ is linear, ${R}$ is quadratic; thus we have

$\displaystyle \partial_h \phi = Q \partial_h R + F''_h(L_h). \ \ \ \ \ (8)$

With a high characteristic assumption ${p > 2}$, one can ensure ${R}$ is classical. We will assume that ${R}$ is high rank, as this is the most difficult case.

Suppose first that ${Q=R}$. In high characteristic, one can then integrate ${Q \partial_h Q}$ by expressing this as ${\partial_h (\frac{1}{2} Q^2 )}$ plus lower order terms, thus ${\partial_h (\phi - \frac{1}{2} Q^2 )}$ is an order ${1}$ function in the sense that it is a function of a bounded number of linear functions. In particular, ${e( \partial_h (\phi - \frac{1}{2} Q^2 ) )}$ has a large ${U^2}$ norm for all ${h}$, which implies that ${e( \phi - \frac{1}{2} Q^2 )}$ has a large ${U^3}$ norm, and thus correlates with a quadratic phase. Since ${e( \frac{1}{2} Q^2 )}$ can be decomposed by Fourier analysis into a linear combination of quadratic phases, we conclude that ${e(\phi)}$ correlates with a quadratic phase and one is thus done in this case.

Now consider the other extreme, in which ${Q}$ and ${R}$ lie in general position. Then, if we differentiate (8) in ${k}$, we obtain one has

$\displaystyle \partial_k \partial_h \phi = \partial_k Q \partial_h R + Q \partial_k \partial_h R + \partial_k Q (\partial_k \partial_h R) + \partial_k F''_h(L_h),$

and then anti-symmetrising in ${k,h}$ one has

$\displaystyle 0 = \partial_k Q \partial_h R - \partial_h Q \partial_k R + (\partial_k Q - \partial_h Q) \partial_k \partial_h R + \partial_k F''_h(L_h) - \partial_h F''_k(L_h).$

If ${Q}$ and ${R}$ are unrelated, then the linear forms ${\partial_k Q, \partial_k R}$ will typically be in general position with respect to each other and with ${L_h}$, and similarly ${\partial_h Q, \partial_h R}$ will be in general position with respect to each other and with ${L_k}$. From this, one can show that the above equation is not satisfiable generically, because the mixed terms ${\partial_k Q \partial_h R - \partial_h Q \partial_k R}$ cannot be cancelled by the simpler terms in the above expression.

An interpolation of the above two arguments can handle the case in which ${Q_h}$ does not depend on ${h}$. Now we consider the other extreme, in which ${Q_h}$ varies in ${h}$, so that ${Q_h}$ and ${Q_k}$ are in general position for generic ${h,k}$, and similarly for ${Q_h}$ and ${Q_{h+k}}$, or for ${Q_k}$ and ${Q_{h+k}}$. (Note though that we cannot simultaneously assume that ${Q_h, Q_k, Q_{h+k}}$ are in general position; indeed, ${Q_h}$ might vary linearly in ${h}$, and indeed we expect this to be the basic behaviour of ${Q_h}$ here, as was observed in the preceding argument.)

$\displaystyle F_{h+k}( Q_{h+k}, \vec L_{h+k} ) = F_{h}( Q_{h}, \vec L_{h} ) + T^h F_{k}( Q_{k}, \vec L_{k} ). \ \ \ \ \ (9)$

Because any two of ${Q_{h+k}, Q_h, Q_k}$ can be assumed to be in general position, one can show using equidistribution theory that the above equation can only be satisfied when the ${F_h}$ are linear in the ${Q_h}$ variable, thus

$\displaystyle \partial_h \phi = Q_h F'_h(\vec L_h) + F''_h(\vec L_h)$

much as before. Furthermore, the coefficients ${F'_h(\vec L_h)}$ must now be (essentially) constant in ${h}$ in order to obtain (9). Absorbing this constant into the definition of ${Q_h}$, we now have

$\displaystyle \partial_h \phi = Q_h + F''_h(\vec L_h).$

We will once again pretend that ${\vec L_h}$ is just a single linear form ${L_h}$. Again we consider two extremes. If ${L_h = L}$ is independent of ${h}$, then by passing to a bounded index subspace (the level set of ${L}$) we now see that ${\partial_h \phi}$ is quadratic, hence ${\phi}$ is cubic, and we are done. Now suppose instead that ${L_h}$ varies in ${h}$, so that ${L_h, L_k}$ are in general position for generic ${h, k}$. We look at the cocycle equation again, which now tells us that ${F''_h(\vec L_h)}$ obeys the quasicocycle condition

$\displaystyle Q_{h,k} + F''_{h+k}(\vec L_{h+k}) = F''_{h}(\vec L_{h}) + T^h F''_{k}(\vec L_{k})$

where ${Q_{h,k} := Q_{h+k} - Q_h - T^h Q_k}$ is a quadratic polynomial. With any two of ${L_h, L_k, L_{h+k}}$ in general position, one can then conclude (using equidistribution theory) that ${F''_h, F''_k, F''_{h+k}}$ are quadratic polynomials. Thus ${\partial_h \phi}$ is quadratic, and ${\phi}$ is cubic as before. This completes the heuristic discussion of various extreme model cases; the general case is handled by a rather complicated combination of all of these special case methods, and is best performed in the framework of ergodic theory (in particular, the idea of extracting out the coefficient of a key polynomial, such as the coerfficient ${F'_h(L_h)}$ of ${Q}$, is best captured by the ergodic theory concept of vertical differentiation); see this paper of Bergelson, Ziegler, and myself.

— 4. Consequences of the Gowers inverse conjecture —

We now discuss briefly some of the consequences of the Gowers inverse conjecture, beginning with Szemerédi’s theorem in vector fields (Theorem 4). We will use the density increment method (an energy increment argument is also possible, but is more complicated; see this paper). Let ${A \subset V = {\bf F}^n}$ be a set of density at least ${\delta}$ containing no lines. This implies that the ${p}$-linear form

$\displaystyle \Lambda(1_A,\ldots,1_A) := \mathop{\bf E}_{x, r \in {\bf F}^n} 1_A(x) \ldots 1_A(x+(p-1)r)$

has size ${o(1)}$. On the other hand, as this pattern has complexity ${p-2}$, one has from Notes 3 the bound

$\displaystyle |\Lambda(f_0, \ldots,f_{p-1})| \leq \sup_{0 \leq j \leq p-1} \|f_j\|_{U^{p-1}(V)}$

whenever ${f_0,\ldots,f_{p-1}}$ are bounded in magnitude by ${1}$. Splitting ${1_A = \delta + (1_A - \delta)}$, we conclude that

$\displaystyle \Lambda(1_A,\ldots,1_A) = \delta^p + O_p( \|1_A-\delta\|_{U^{p-1}(V)} )$

and thus (for ${n}$ large enough)

$\displaystyle \|1_A-\delta\|_{U^{p-1}(V)} \gg_{p,\delta} 1.$

Applying Theorem 3, we find that there exists a polynomial ${\phi}$ of degree at most ${p-2}$ such that

$\displaystyle |\langle 1_A-\delta, e(\phi) \rangle| \gg_{p,\delta} 1.$

To proceed we need the following analogue of Proposition 5 of Notes 2:

Exercise 6 (Fragmenting a polynomial into subspaces) Let ${\phi: {\bf F}^n \rightarrow {\bf F}}$ be a classical polynomial of degree ${d < p}$. Show that one can partition ${V}$ into affine subspaces ${W}$ of dimension at least ${n'(n,d,p)}$, where ${n' \rightarrow \infty}$ as ${n \rightarrow \infty}$ for fixed ${d,p}$, such that ${\phi}$ is constant on each ${W}$. (Hint: Induct on ${d}$, and use Exercise 6 of Notes 4 repeatedly to find a good initial partition into subspaces on which ${\phi}$ has degree at most ${d-1}$.)

Exercise 7 Use the previous exercise to complete the proof of Theorem 4. (Hint: mimic the density increment argument from Notes 2.)

By using the inverse theorem in place of the Fourier-analytic analogue in Lemma 7 of Notes 2, one obtains the following regularity lemma, analogous to Theorem 10 of Notes 2:

Theorem 9 (Strong arithmetic regularity lemma) Suppose that ${\hbox{char}({\bf F}) = p > d \geq 0}$. Let ${f: V \rightarrow [0,1]}$, let ${\epsilon > 0}$, and let ${F: {\bf R}^+ \rightarrow {\bf R}^+}$ be an arbitrary function. Then we can decompose ${f = f_{str} + f_{sml} + f_{psd}}$ and find ${1 \leq M = O_{\epsilon,F,d,p}(1)}$ such that

• (Nonnegativity) ${f_{str}, f_{str}+f_{sml}}$ take values in ${[0,1]}$, and ${f_{sml}, f_{psd}}$ have mean zero;
• (Structure) ${f_{str}}$ is a function of ${M}$ classical polynomials of degree at most ${d}$;
• (Smallness) ${f_{sml}}$ has an ${L^2(V)}$ norm of at most ${\epsilon}$; and
• (Pseudorandomness) One has ${\| f_{psd} \|_{U^{d+1}(V)} \leq 1/F(M)}$ for all ${\alpha \in {\bf R}}$.

For a proof, see this paper of mine. The argument is similar to that appearing in Theorem 10 of Notes 2, but the discrete nature of polynomials in bounded characteristic allows one to avoid a number of technical issues regarding measurability.

This theorem can then be used for a variety of applications in additive combinatorics. For instance, it gives the following variant of a result of Bergelson, Host, and Kra:

Proposition 10 (Bergelson-Host-Kra type result) Let ${p > 4 \geq k}$, let ${{\bf F} = {\bf F}_p}$, and let ${A \subset {\bf F}^n}$ with ${|A| \geq \delta |{\bf F}^n|}$, and let ${\epsilon > 0}$. Then for ${\gg_{\delta,\epsilon,p} |{\bf F}^n|}$ values of ${h \in {\bf F}^n}$, one has

$\displaystyle |\{ x \in {\bf F}^n: x, x+h, \ldots, x+(k-1)h \in A \}| \geq (\delta^k - \epsilon) |{\bf F}^n|.$

Roughly speaking, the idea is to apply the regularity lemma to ${f := 1_A}$, discard the contribution of the ${f_{sml}}$ and ${f_{psd}}$ errors, and then control the structured component using the equidistribution theory from Notes 4. A proof of this result can be found in this paper of Ben Green; see also this paper of Ben and myself for an analogous result in ${{\bf Z}/N{\bf Z}}$. Curiously, the claim fails when ${4}$ is replaced by any larger number; this is essentially an observation of Ruzsa that appears in the appendix of the paper of Bergelson, Host, and Kra.

The above regularity lemma (or more precisely, a close relative of this lemma) was also used by Gowers and Wolf to determine the true complexity of a linear system:

Theorem 11 (Gowers-Wolf theorem) Let ${\Psi = (\psi_1,\ldots,\psi_t)}$ be a collection of linear forms with integer coefficients, with no two forms being linearly dependent. Let ${{\bf F}}$ have sufficiently large characteristic, and suppose that ${f_1,\ldots,f_t: {\bf F}^n \rightarrow {\bf C}}$ are functions bounded in magnitude by ${1}$ such that

$\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| \geq \delta$

where ${\Lambda_\Psi}$ was the form defined in Notes 3. Then for each ${1 \leq i \leq t}$ there exists a classical polynomial ${\phi_i}$ of degree at most ${d}$ such that

$\displaystyle |\langle f_i, e(\phi_i) \rangle_{L^2({\bf F}^n)}| \gg_{d,\Psi,\delta} 1,$

where ${d}$ is the true complexity of the system ${\Psi}$ defined in Notes 3. This ${d}$ is best possible.