You are currently browsing the tag archive for the ‘Gowers uniformity norms’ tag.

The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if ${U: H \rightarrow H}$ is a unitary operator on a Hilbert space ${H}$, and ${v \in H}$ is a vector in that Hilbert space, then one has

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v$

in the strong topology, where ${H^U := \{ w \in H: Uw = w \}}$ is the ${U}$-invariant subspace of ${H}$, and ${\pi_{H^U}}$ is the orthogonal projection to ${H^U}$. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if ${G}$ is a countable amenable group acting on a Hilbert space ${H}$ by unitary transformations ${T^g: H \rightarrow H}$ for ${g \in G}$, and ${v \in H}$ is a vector in that Hilbert space, then one has

$\displaystyle \lim_{N \rightarrow \infty} \mathop{\bf E}_{g \in \Phi_N} T^g v = \pi_{H^G} v \ \ \ \ \ (1)$

for any Folner sequence ${\Phi_N}$ of ${G}$, where ${H^G := \{ w \in H: T^g w = w \hbox{ for all }g \in G \}}$ is the ${G}$-invariant subspace, and ${\mathop{\bf E}_{a \in A} f(a) := \frac{1}{|A|} \sum_{a \in A} f(a)}$ is the average of ${f}$ on ${A}$. Thus one can interpret ${\pi_{H^G} v}$ as a certain average of elements of the orbit ${Gv := \{ T^g v: g \in G \}}$ of ${v}$.

In a previous blog post, I noted a variant of this ergodic theorem (due to Alaoglu and Birkhoff) that holds even when the group ${G}$ is not amenable (or not discrete), using a more abstract notion of averaging:

Theorem 1 (Abstract ergodic theorem) Let ${G}$ be an arbitrary group acting unitarily on a Hilbert space ${H}$, and let ${v}$ be a vector in ${H}$. Then ${\pi_{H^G} v}$ is the element in the closed convex hull of ${Gv := \{ T^g v: g \in G \}}$ of minimal norm, and is also the unique element of ${H^G}$ in this closed convex hull.

I recently stumbled upon a different way to think about this theorem, in the additive case ${G = (G,+)}$ when ${G}$ is abelian, which has a closer resemblance to the classical mean ergodic theorem. Given an arbitrary additive group ${G = (G,+)}$ (not necessarily discrete, or countable), let ${{\mathcal F}}$ denote the collection of finite non-empty multisets in ${G}$ – that is to say, unordered collections ${\{a_1,\dots,a_n\}}$ of elements ${a_1,\dots,a_n}$ of ${G}$, not necessarily distinct, for some positive integer ${n}$. Given two multisets ${A = \{a_1,\dots,a_n\}}$, ${B = \{b_1,\dots,b_m\}}$ in ${{\mathcal F}}$, we can form the sum set ${A + B := \{ a_i + b_j: 1 \leq i \leq n, 1 \leq j \leq m \}}$. Note that the sum set ${A+B}$ can contain multiplicity even when ${A, B}$ do not; for instance, ${\{ 1,2\} + \{1,2\} = \{2,3,3,4\}}$. Given a multiset ${A = \{a_1,\dots,a_n\}}$ in ${{\mathcal F}}$, and a function ${f: G \rightarrow H}$ from ${G}$ to a vector space ${H}$, we define the average ${\mathop{\bf E}_{a \in A} f(a)}$ as

$\displaystyle \mathop{\bf E}_{a \in A} f(a) = \frac{1}{n} \sum_{j=1}^n f(a_j).$

Note that the multiplicity function of the set ${A}$ affects the average; for instance, we have ${\mathop{\bf E}_{a \in \{1,2\}} a = \frac{3}{2}}$, but ${\mathop{\bf E}_{a \in \{1,2,2\}} a = \frac{5}{3}}$.

We can define a directed set on ${{\mathcal F}}$ as follows: given two multisets ${A,B \in {\mathcal F}}$, we write ${A \geq B}$ if we have ${A = B+C}$ for some ${C \in {\mathcal F}}$. Thus for instance we have ${\{ 1, 2, 2, 3\} \geq \{1,2\}}$. It is easy to verify that this operation is transitive and reflexive, and is directed because any two elements ${A,B}$ of ${{\mathcal F}}$ have a common upper bound, namely ${A+B}$. (This is where we need ${G}$ to be abelian.) The notion of convergence along a net, now allows us to define the notion of convergence along ${{\mathcal F}}$; given a family ${x_A}$ of points in a topological space ${X}$ indexed by elements ${A}$ of ${{\mathcal F}}$, and a point ${x}$ in ${X}$, we say that ${x_A}$ converges to ${x}$ along ${{\mathcal F}}$ if, for every open neighbourhood ${U}$ of ${x}$ in ${X}$, one has ${x_A \in U}$ for sufficiently large ${A}$, that is to say there exists ${B \in {\mathcal F}}$ such that ${x_A \in U}$ for all ${A \geq B}$. If the topological space ${V}$ is Hausdorff, then the limit ${x}$ is unique (if it exists), and we then write

$\displaystyle x = \lim_{A \rightarrow G} x_A.$

When ${x_A}$ takes values in the reals, one can also define the limit superior or limit inferior along such nets in the obvious fashion.

We can then give an alternate formulation of the abstract ergodic theorem in the abelian case:

Theorem 2 (Abelian abstract ergodic theorem) Let ${G = (G,+)}$ be an arbitrary additive group acting unitarily on a Hilbert space ${H}$, and let ${v}$ be a vector in ${H}$. Then we have

$\displaystyle \pi_{H^G} v = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v$

in the strong topology of ${H}$.

Proof: Suppose that ${A \geq B}$, so that ${A=B+C}$ for some ${C \in {\mathcal F}}$, then

$\displaystyle \mathop{\bf E}_{a \in A} T^a v = \mathop{\bf E}_{c \in C} T^c ( \mathop{\bf E}_{b \in B} T^b v )$

so by unitarity and the triangle inequality we have

$\displaystyle \| \mathop{\bf E}_{a \in A} T^a v \|_H \leq \| \mathop{\bf E}_{b \in B} T^b v \|_H,$

thus ${\| \mathop{\bf E}_{a \in A} T^a v \|_H^2}$ is monotone non-increasing in ${A}$. Since this quantity is bounded between ${0}$ and ${\|v\|_H}$, we conclude that the limit ${\lim_{A \rightarrow G} \| \mathop{\bf E}_{a \in A} T^a v \|_H^2}$ exists. Thus, for any ${\varepsilon > 0}$, we have for sufficiently large ${A}$ that

$\displaystyle \| \mathop{\bf E}_{b \in B} T^b v \|_H^2 \geq \| \mathop{\bf E}_{a \in A} T^a v \|_H^2 - \varepsilon$

for all ${B \geq A}$. In particular, for any ${g \in G}$, we have

$\displaystyle \| \mathop{\bf E}_{b \in A + \{0,g\}} T^b v \|_H^2 \geq \| \mathop{\bf E}_{a \in A} T^a v \|_H^2 - \varepsilon.$

We can write

$\displaystyle \mathop{\bf E}_{b \in A + \{0,g\}} T^b v = \frac{1}{2} \mathop{\bf E}_{a \in A} T^a v + \frac{1}{2} T^g \mathop{\bf E}_{a \in A} T^a v$

and so from the parallelogram law and unitarity we have

$\displaystyle \| \mathop{\bf E}_{a \in A} T^a v - T^g \mathop{\bf E}_{a \in A} T^a v \|_H^2 \leq 4 \varepsilon$

for all ${g \in G}$, and hence by the triangle inequality (averaging ${g}$ over a finite multiset ${C}$)

$\displaystyle \| \mathop{\bf E}_{a \in A} T^a v - \mathop{\bf E}_{b \in A+C} T^b v \|_H^2 \leq 4 \varepsilon$

for any ${C \in {\mathcal F}}$. This shows that ${\mathop{\bf E}_{a \in A} T^a v}$ is a Cauchy sequence in ${H}$ (in the strong topology), and hence (by the completeness of ${H}$) tends to a limit. Shifting ${A}$ by a group element ${g}$, we have

$\displaystyle \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A + \{g\}} T^a v = T^g \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v$

and hence ${\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v}$ is invariant under shifts, and thus lies in ${H^G}$. On the other hand, for any ${w \in H^G}$ and ${A \in {\mathcal F}}$, we have

$\displaystyle \langle \mathop{\bf E}_{a \in A} T^a v, w \rangle_H = \mathop{\bf E}_{a \in A} \langle v, T^{-a} w \rangle_H = \langle v, w \rangle_H$

and thus on taking strong limits

$\displaystyle \langle \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v, w \rangle_H = \langle v, w \rangle_H$

and so ${v - \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v}$ is orthogonal to ${H^G}$. Combining these two facts we see that ${\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v}$ is equal to ${\pi_{H^G} v}$ as claimed. $\Box$

To relate this result to the classical ergodic theorem, we observe

Lemma 3 Let ${G}$ be a countable additive group, with a F{\o}lner sequence ${\Phi_n}$, and let ${f_g}$ be a bounded sequence in a normed vector space indexed by ${G}$. If ${\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} f_a}$ exists, then ${\lim_{n \rightarrow \infty} \mathop{\bf E}_{a \in \Phi_n} f_a}$ exists, and the two limits are equal.

Proof: From the F{\o}lner property, we see that for any ${A}$ and any ${\varepsilon>0}$, the averages ${\mathop{\bf E}_{a \in \Phi_n} f_a}$ and ${\mathop{\bf E}_{a \in A+\Phi_n} f_a}$ differ by at most ${\varepsilon}$ in norm if ${n}$ is sufficiently large depending on ${A}$, ${\varepsilon}$ (and the ${f_a}$). On the other hand, by the existence of the limit ${\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} f_a}$, the averages ${\mathop{\bf E}_{a \in A} f_a}$ and ${\mathop{\bf E}_{a \in A + \Phi_n} f_a}$ differ by at most ${\varepsilon}$ in norm if ${A}$ is sufficiently large depending on ${\varepsilon}$ (regardless of how large ${n}$ is). The claim follows. $\Box$

It turns out that this approach can also be used as an alternate way to construct the GowersHost-Kra seminorms in ergodic theory, which has the feature that it does not explicitly require any amenability on the group ${G}$ (or separability on the underlying measure space), though, as pointed out to me in comments, even uncountable abelian groups are amenable in the sense of possessing an invariant mean, even if they do not have a F{\o}lner sequence.

Given an arbitrary additive group ${G}$, define a ${G}$-system ${({\mathrm X}, T)}$ to be a probability space ${{\mathrm X} = (X, {\mathcal X}, \mu)}$ (not necessarily separable or standard Borel), together with a collection ${T^g: X \rightarrow X}$ of invertible, measure-preserving maps, such that ${T^0}$ is the identity and ${T^g T^h = T^{g+h}}$ (modulo null sets) for all ${g,h \in G}$. This then gives isomorphisms ${T^g: L^p({\mathrm X}) \rightarrow L^p({\mathrm X})}$ for ${1 \leq p \leq \infty}$ by setting ${T^g f(x) := f(T^{-g} x)}$. From the above abstract ergodic theorem, we see that

$\displaystyle {\mathbf E}( f | {\mathcal X}^G ) = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^g f$

in the strong topology of ${L^2({\mathrm X})}$ for any ${f \in L^2({\mathrm X})}$, where ${{\mathcal X}^G}$ is the collection of measurable sets ${E}$ that are essentially ${G}$-invariant in the sense that ${T^g E = E}$ modulo null sets for all ${g \in G}$, and ${{\mathbf E}(f|{\mathcal X}^G)}$ is the conditional expectation of ${f}$ with respect to ${{\mathcal X}^G}$.

In a similar spirit, we have

Theorem 4 (Convergence of Gowers-Host-Kra seminorms) Let ${({\mathrm X},T)}$ be a ${G}$-system for some additive group ${G}$. Let ${d}$ be a natural number, and for every ${\omega \in\{0,1\}^d}$, let ${f_\omega \in L^{2^d}({\mathrm X})}$, which for simplicity we take to be real-valued. Then the expression

$\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d({\mathrm X})} := \lim_{A_1,\dots,A_d \rightarrow G}$

$\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f_\omega\ d\mu$

converges, where we write ${\omega = (\omega_1,\dots,\omega_d)}$, and we are using the product direct set on ${{\mathcal F}^d}$ to define the convergence ${A_1,\dots,A_d \rightarrow G}$. In particular, for ${f \in L^{2^d}({\mathrm X})}$, the limit

$\displaystyle \| f \|_{U^d({\mathrm X})}^{2^d} = \lim_{A_1,\dots,A_d \rightarrow G}$

$\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f\ d\mu$

converges.

We prove this theorem below the fold. It implies a number of other known descriptions of the Gowers-Host-Kra seminorms ${\|f\|_{U^d({\mathrm X})}}$, for instance that

$\displaystyle \| f \|_{U^d({\mathrm X})}^{2^d} = \lim_{A \rightarrow G} \mathop{\bf E}_{h \in A-A} \| f T^h f \|_{U^{d-1}({\mathrm X})}^{2^{d-1}}$

for ${d > 1}$, while from the ergodic theorem we have

$\displaystyle \| f \|_{U^1({\mathrm X})} = \| {\mathbf E}( f | {\mathcal X}^G ) \|_{L^2({\mathrm X})}.$

This definition also manifestly demonstrates the cube symmetries of the Host-Kra measures ${\mu^{[d]}}$ on ${X^{\{0,1\}^d}}$, defined via duality by requiring that

$\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d({\mathrm X})} = \int_{X^{\{0,1\}^d}} \bigotimes_{\omega \in \{0,1\}^d} f_\omega\ d\mu^{[d]}.$

In a subsequent blog post I hope to present a more detailed study of the ${U^2}$ norm and its relationship with eigenfunctions and the Kronecker factor, without assuming any amenability on ${G}$ or any separability or topological structure on ${{\mathrm X}}$.

I’ve just finished writing the first draft of my third book coming out of the 2010 blog posts, namely “Higher order Fourier analysis“, which was based primarily on my graduate course in the topic, though it also contains material from some additional posts related to linear and higher order Fourier analysis on the blog.  It is available online here.  As usual, comments and corrections are welcome.  There is also a stub page for the book, which at present does not contain much more than the above link.

﻿

Tanja Eisner and I have just uploaded to the arXiv our paper “Large values of the Gowers-Host-Kra seminorms“, submitted to Journal d’Analyse Mathematique. This paper is concerned with the properties of three closely related families of (semi)norms, indexed by a positive integer ${k}$:

• The Gowers uniformity norms ${\|f\|_{U^k(G)}}$ of a (bounded, measurable, compactly supported) function ${f: G \rightarrow {\bf C}}$ taking values on a locally compact abelian group ${G}$, equipped with a Haar measure ${\mu}$;
• The Gowers uniformity norms ${\|f\|_{U^k([N])}}$ of a function ${f: [N] \rightarrow {\bf C}}$ on a discrete interval ${\{1,\ldots,N\}}$; and
• The Gowers-Host-Kra seminorms ${\|f\|_{U^k(X)}}$ of a function ${f \in L^\infty(X)}$ on an ergodic measure-preserving system ${X = (X,{\mathcal X},\mu,T)}$.

These norms have been discussed in depth in previous blog posts, so I will just quickly review the definition of the first norm here (the other two (semi)norms are defined similarly). The ${U^k(G)}$ norm is defined recursively by setting

$\displaystyle \| f \|_{U^1(G)} := |\int_G f\ d\mu|$

and

$\displaystyle \|f\|_{U^k(G)}^{2^k} := \int_G \| \Delta_h f \|_{U^{k-1}(G)}^{2^{k-1}}\ d\mu(h)$

where ${\Delta_h f(x) := f(x+h) \overline{f(x)}}$. Equivalently, one has

$\displaystyle \|f\|_{U^k(G)} := (\int_G \ldots \int_G \Delta_{h_1} \ldots \Delta_{h_k} f(x)\ d\mu(x) d\mu(h_1) \ldots d\mu(h_k))^{1/2^k}.$

Informally, the Gowers uniformity norm ${\|f\|_{U^k(G)}}$ measures the extent to which (the phase of ${f}$) behaves like a polynomial of degree less than ${k}$. Indeed, if ${\|f\|_{L^\infty(G)} \leq 1}$ and ${G}$ is compact with normalised Haar measure ${\mu(G)=1}$, it is not difficult to show that ${\|f\|_{U^k(G)}}$ is at most ${1}$, with equality if and only if ${f}$ takes the form ${f = e(P) := e^{2\pi iP}}$ almost everywhere, where ${P: G \rightarrow {\bf R}/{\bf Z}}$ is a polynomial of degree less than ${k}$ (which means that ${\partial_{h_1} \ldots \partial_{h_k} P(x) = 0}$ for all ${x,h_1,\ldots,h_k \in G}$).

Our first result is to show that this result is robust, uniformly over all choices of group ${G}$:

Theorem 1 (${L^\infty}$-near extremisers) Let ${G}$ be a compact abelian group with normalised Haar measure ${\mu(G)=1}$, and let ${f \in L^\infty(G)}$ be such that ${\|f\|_{L^\infty(G)} \leq 1}$ and ${\|f\|_{U^k(G)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$ and ${k \geq 1}$. Then there exists a polynomial ${P: G \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${k-1}$ such that ${\|f-e(P)\|_{L^1(G)} = o(1)}$, where ${o(1)}$ is bounded by a quantity ${c_k(\epsilon)}$ that goes to zero as ${\epsilon \rightarrow 0}$ for fixed ${k}$.

The quantity ${o(1)}$ can be described effectively (it is of polynomial size in ${\epsilon}$), but we did not seek to optimise it here. This result was already known in the case of vector spaces ${G = {\bf F}_p^n}$ over a fixed finite field ${{\bf F}_p}$ (where it is essentially equivalent to the assertion that the property of being a polynomial of degree at most ${k-1}$ is locally testable); the extension to general groups ${G}$ turns out to fairly routine. The basic idea is to use the recursive structure of the Gowers norms, which tells us in particular that if ${\|f\|_{U^k(G)}}$ is close to one, then ${\|\Delta_h f\|_{U^{k-1}(G)}}$ is close to one for most ${h}$, which by induction implies that ${\Delta_h f}$ is close to ${e(Q_h)}$ for some polynomials ${Q_h}$ of degree at most ${k-2}$ and for most ${h}$. (Actually, it is not difficult to use cocycle equations such as ${\Delta_{h+k} f = \Delta_h f \times T^h \Delta_k f}$ (when ${|f|=1}$) to upgrade “for most ${h}$” to “for all ${h}$“.) To finish the job, one would like to express the ${Q_h}$ as derivatives ${Q_h = \partial_h P}$ of a polynomial ${P}$ of degree at most ${k-1}$. This turns out to be equivalent to requiring that the ${Q_h}$ obey the cocycle equation

$\displaystyle Q_{h+k} = Q_h + T^h Q_k$

where ${T^h F(x) := F(x+h)}$ is the translate of ${F}$ by ${h}$. (In the paper, the sign conventions are reversed, so that ${T^h F(x) := F(x-h)}$, in order to be compatible with ergodic theory notation, but this makes no substantial difference to the arguments or results.) However, one does not quite get this right away; instead, by using some separation properties of polynomials, one can show the weaker statement that

$\displaystyle Q_{h+k} = Q_h + T^h Q_k + c_{h,k} \ \ \ \ \ (1)$

where the ${c_{h,k}}$ are small real constants. To eliminate these constants, one exploits the trivial cohomology of the real line. From (1) one soon concludes that the ${c_{h,k}}$ obey the ${2}$-cocycle equation

$\displaystyle c_{h,k} + c_{h+k,l} = c_{h,k+l} + c_{k,l}$

and an averaging argument then shows that ${c_{h,k}}$ is a ${2}$-coboundary in the sense that

$\displaystyle c_{h,k} = b_{h+k} - b_h - b_k$

for some small scalar ${b_h}$ depending on ${h}$. Subtracting ${b_h}$ from ${Q_h}$ then gives the claim.

Similar results and arguments also hold for the ${U^k([N])}$ and ${U^k(X)}$ norms, which we will not detail here.

Dimensional analysis reveals that the ${L^\infty}$ norm is not actually the most natural norm with which to compare the ${U^k}$ norms against. An application of Young’s convolution inequality in fact reveals that one has the inequality

$\displaystyle \|f\|_{U^k(G)} \leq \|f\|_{L^{p_k}(G)} \ \ \ \ \ (2)$

where ${p_k}$ is the critical exponent ${p_k := 2^k/(k+1)}$, without any compactness or normalisation hypothesis on the group ${G}$ and the Haar measure ${\mu}$. This allows us to extend the ${U^k(G)}$ norm to all of ${L^{p_k}(G)}$. There is then a stronger inverse theorem available:

Theorem 2 (${L^{p_k}}$-near extremisers) Let ${G}$ be a locally compact abelian group, and let ${f \in L^{p_k}(G)}$ be such that ${\|f\|_{L^{p_k}(G)} \leq 1}$ and ${\|f\|_{U^k(G)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$ and ${k \geq 1}$. Then there exists a coset ${H}$ of a compact open subgroup ${H}$ of ${G}$, and a polynomial ${P: H to {\bf R}/{\bf Z}}$ of degree at most ${k-1}$ such that ${\|f-e(P) 1_H\|_{L^{p_k}(G)} = o(1)}$.

Conversely, it is not difficult to show that equality in (2) is attained when ${f}$ takes the form ${e(P) 1_H}$ as above. The main idea of proof is to use an inverse theorem for Young’s inequality due to Fournier to reduce matters to the ${L^\infty}$ case that was already established. An analogous result is also obtained for the ${U^k(X)}$ norm on an ergodic system; but for technical reasons, the methods do not seem to apply easily to the ${U^k([N])}$ norm. (This norm is essentially equivalent to the ${U^k({\bf Z}/\tilde N{\bf Z})}$ norm up to constants, with ${\tilde N}$ comparable to ${N}$, but when working with near-extremisers, norms that are only equivalent up to constants can have quite different near-extremal behaviour.)

In the case when ${G}$ is a Euclidean group ${{\bf R}^d}$, it is possible to use the sharp Young inequality of Beckner and of Brascamp-Lieb to improve (2) somewhat. For instance, when ${k=3}$, one has

$\displaystyle \|f\|_{U^3({\bf R}^d)} \leq 2^{-d/8} \|f\|_{L^2({\bf R}^d)}$

with equality attained if and only if ${f}$ is a gaussian modulated by a quadratic polynomial phase. This additional gain of ${2^{-d/8}}$ allows one to pinpoint the threshold ${1-\epsilon}$ for the previous near-extremiser results in the case of ${U^3}$ norms. For instance, by using the Host-Kra machinery of characteristic factors for the ${U^3(X)}$ norm, combined with an explicit and concrete analysis of the ${2}$-step nilsystems generated by that machinery, we can show that

$\displaystyle \|f\|_{U^3(X)} \leq 2^{-1/8} \|f\|_{L^2(X)}$

whenever ${X}$ is a totally ergodic system and ${f}$ is orthogonal to all linear and quadratic eigenfunctions (which would otherwise form immediate counterexamples to the above inequality), with the factor ${2^{-1/8}}$ being best possible. We can also establish analogous results for the ${U^3([N])}$ and ${U^3({\bf Z}/N{\bf Z})}$ norms (using the inverse ${U^3}$ theorem of Ben Green and myself, in place of the Host-Kra machinery), although it is not clear to us whether the ${2^{-1/8}}$ threshold remains best possible in this case.

In Notes 5, we saw that the Gowers uniformity norms on vector spaces ${{\bf F}^n}$ in high characteristic were controlled by classical polynomial phases ${e(\phi)}$.

Now we study the analogous situation on cyclic groups ${{\bf Z}/N{\bf Z}}$. Here, there is an unexpected surprise: the polynomial phases (classical or otherwise) are no longer sufficient to control the Gowers norms ${U^{s+1}({\bf Z}/N{\bf Z})}$ once ${s}$ exceeds ${1}$. To resolve this problem, one must enlarge the space of polynomials to a larger class. It turns out that there are at least three closely related options for this class: the local polynomials, the bracket polynomials, and the nilsequences. Each of the three classes has its own strengths and weaknesses, but in my opinion the nilsequences seem to be the most natural class, due to the rich algebraic and dynamical structure coming from the nilpotent Lie group undergirding such sequences. For reasons of space we shall focus primarily on the nilsequence viewpoint here.

Traditionally, nilsequences have been defined in terms of linear orbits ${n \mapsto g^n x}$ on nilmanifolds ${G/\Gamma}$; however, in recent years it has been realised that it is convenient for technical reasons (particularly for the quantitative “single-scale” theory) to generalise this setup to that of polynomial orbits ${n \mapsto g(n) \Gamma}$, and this is the perspective we will take here.

A polynomial phase ${n \mapsto e(\phi(n))}$ on a finite abelian group ${H}$ is formed by starting with a polynomial ${\phi: H \rightarrow {\bf R}/{\bf Z}}$ to the unit circle, and then composing it with the exponential function ${e: {\bf R}/{\bf Z} \rightarrow {\bf C}}$. To create a nilsequence ${n \mapsto F(g(n) \Gamma)}$, we generalise this construction by starting with a polynomial ${g \Gamma: H \rightarrow G/\Gamma}$ into a nilmanifold ${G/\Gamma}$, and then composing this with a Lipschitz function ${F: G/\Gamma \rightarrow {\bf C}}$. (The Lipschitz regularity class is convenient for minor technical reasons, but one could also use other regularity classes here if desired.) These classes of sequences certainly include the polynomial phases, but are somewhat more general; for instance, they almost include bracket polynomial phases such as ${n \mapsto e( \lfloor \alpha n \rfloor \beta n )}$. (The “almost” here is because the relevant functions ${F: G/\Gamma \rightarrow {\bf C}}$ involved are only piecewise Lipschitz rather than Lipschitz, but this is primarily a technical issue and one should view bracket polynomial phases as “morally” being nilsequences.)

In these notes we set out the basic theory for these nilsequences, including their equidistribution theory (which generalises the equidistribution theory of polynomial flows on tori from Notes 1) and show that they are indeed obstructions to the Gowers norm being small. This leads to the inverse conjecture for the Gowers norms that shows that the Gowers norms on cyclic groups are indeed controlled by these sequences.

In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces ${V}$ over a finite field ${{\bf F}}$.

In this case, the Gowers norms ${U^{d+1}(V)}$ are closely tied to the space ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ of polynomials of degree at most ${d}$. Indeed, as noted in Exercise 20 of Notes 4, a function ${f: V \rightarrow {\bf C}}$ of ${L^\infty(V)}$ norm ${1}$ has ${U^{d+1}(V)}$ norm equal to ${1}$ if and only if ${f = e(\phi)}$ for some ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$; thus polynomials solve the “${100\%}$ inverse problem” for the trivial inequality ${\|f\|_{U^{d+1}(V)} \leq \|f\|_{L^\infty(V)}}$. They are also a crucial component of the solution to the “${99\%}$ inverse problem” and “${1\%}$ inverse problem”. For the former, we will soon show:

Proposition 1 (${99\%}$ inverse theorem for ${U^{d+1}(V)}$) Let ${f: V \rightarrow {\bf C}}$ be such that ${\|f\|_{L^\infty(V)}}$ and ${\|f\|_{U^{d+1}(V)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$. Then there exists ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that ${\| f - e(\phi)\|_{L^1(V)} = O_{d, {\bf F}}( \epsilon^c )}$, where ${c = c_d > 0}$ is a constant depending only on ${d}$.

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to ${99\%}$ inverse theorem for ${U^{d+1}(V)}$) If ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f-e(\phi)\|_{L^1(V)} \leq \epsilon}$ for some ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$, then ${\|F\|_{U^{d+1}(V)} \geq 1 - O_{d,{\bf F}}( \epsilon^c )}$, where ${c = c_d > 0}$ is a constant depending only on ${d}$.

In the ${1\%}$ world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has

Lemma 2 (Converse to the ${1\%}$ inverse theorem for ${U^{d+1}(V)}$) If ${f: V \rightarrow {\bf C}}$ and ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ are such that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \geq \epsilon}$, where ${\langle f, g \rangle_{L^2(V)} := {\bf E}_{x \in G} f(x) \overline{g(x)}}$, then ${\|f\|_{U^{d+1}(V)} \geq \epsilon}$.

Proof: From the definition of the ${U^1}$ norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

$\displaystyle |\langle f, e(\phi) \rangle| = \| f e(-\phi) \|_{U^1(V)}$

$\displaystyle \leq \|f e(-\phi) \|_{U^{d+1}(V)}$

$\displaystyle = \|f\|_{U^{d+1}(V)}$

and the claim follows. $\Box$

In the high characteristic case ${\hbox{char}({\bf F}) > d}$ at least, this can be reversed:

Theorem 3 (${1\%}$ inverse theorem for ${U^{d+1}(V)}$) Suppose that ${\hbox{char}({\bf F}) > d \geq 0}$. If ${f: V \rightarrow {\bf C}}$ is such that ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f\|_{U^{d+1}(V)} \geq \epsilon}$, then there exists ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \gg_{\epsilon,d,{\bf F}} 1}$.

This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small ${d}$, the claim is easy:

Exercise 2 Verify the cases ${d=0,1}$ of this theorem. (Hint: to verify the ${d=1}$ case, use the Fourier-analytic identities ${\|f\|_{U^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^4)^{1/4}}$ and ${\|f\|_{L^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^2)^{1/2}}$, where ${\hat V}$ is the space of all homomorphisms ${\xi: x \mapsto \xi \cdot x}$ from ${V}$ to ${{\bf R}/{\bf Z}}$, and ${\hat f(\xi) := \mathop{\bf E}_{x \in V} f(x) e(-\xi \cdot x)}$ are the Fourier coefficients of ${f}$.)

This conjecture for larger values of ${d}$ are more difficult to establish. The ${d=2}$ case of the theorem was established by Ben Green and myself in the high characteristic case ${\hbox{char}({\bf F}) > 2}$; the low characteristic case ${\hbox{char}({\bf F}) = d = 2}$ was independently and simultaneously established by Samorodnitsky. The cases ${d>2}$ in the high characteristic case was established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself. to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ in the above conjecture with the essentially equivalent space of classical polynomials ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})}$. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if ${\hbox{char}({\bf F}) = 2}$ and ${d \geq 3}$; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields) Let ${{\bf F} = {\bf F}_p}$ be a finite field, let ${\delta > 0}$, and let ${A \subset {\bf F}^n}$ be such that ${|A| \geq \delta |{\bf F}^n|}$. If ${n}$ is sufficiently large depending on ${p,\delta}$, then ${A}$ contains an (affine) line ${\{ x, x+r, \ldots, x+(p-1)r\}}$ for some ${x,r \in {\bf F}^n}$ with ${ r\neq 0}$.

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if ${k \geq 1}$ and ${n}$ is sufficiently large depending on ${p,\delta}$, then ${A}$ contains an affine ${k}$-dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

A (complex, semi-definite) inner product space is a complex vector space ${V}$ equipped with a sesquilinear form ${\langle, \rangle: V \times V \rightarrow {\bf C}}$ which is conjugate symmetric, in the sense that ${\langle w, v \rangle = \overline{\langle v, w \rangle}}$ for all ${v,w \in V}$, and non-negative in the sense that ${\langle v, v \rangle \geq 0}$ for all ${v \in V}$. By inspecting the non-negativity of ${\langle v+\lambda w, v+\lambda w\rangle}$ for complex numbers ${\lambda \in {\bf C}}$, one obtains the Cauchy-Schwarz inequality

$\displaystyle |\langle v, w \rangle| \leq |\langle v, v \rangle|^{1/2} |\langle w, w \rangle|^{1/2};$

if one then defines ${\|v\| := |\langle v, v \rangle|^{1/2}}$, one then quickly concludes the triangle inequality

$\displaystyle \|v + w \| \leq \|v\| + \|w\|$

which then soon implies that ${\| \|}$ is a semi-norm on ${V}$. If we make the additional assumption that the inner product ${\langle,\rangle}$ is positive definite, i.e. that ${\langle v, v \rangle > 0}$ whenever ${v}$ is non-zero, then this semi-norm becomes a norm. If ${V}$ is complete with respect to the metric ${d(v,w) := \|v-w\|}$ induced by this norm, then ${V}$ is called a Hilbert space.

The above material is extremely standard, and can be found in any graduate real analysis course; I myself covered it here. But what is perhaps less well known (except inside the fields of additive combinatorics and ergodic theory) is that the above theory of classical Hilbert spaces is just the first case of a hierarchy of higher order Hilbert spaces, in which the binary inner product ${f, g \mapsto \langle f, g \rangle}$ is replaced with a ${2^d}$-ary inner product ${(f_\omega)_{\omega \in \{0,1\}^d} \mapsto \langle (f_\omega)_{\omega \in \{0,1\}^d}}$ that obeys an appropriate generalisation of the conjugate symmetry, sesquilinearity, and positive semi-definiteness axioms. Such inner products then obey a higher order Cauchy-Schwarz inequality, known as the Cauchy-Schwarz-Gowers inequality, and then also obey a triangle inequality and become semi-norms (or norms, if the inner product was non-degenerate). Examples of such norms and spaces include the Gowers uniformity norms ${\| \|_{U^d(G)}}$, the Gowers box norms ${\| \|_{\Box^d(X_1 \times \ldots \times X_d)}}$, and the Gowers-Host-Kra seminorms ${\| \|_{U^d(X)}}$; a more elementary example are the family of Lebesgue spaces ${L^{2^d}(X)}$ when the exponent is a power of two. They play a central role in modern additive combinatorics and to certain aspects of ergodic theory, particularly those relating to Szemerédi’s theorem (or its ergodic counterpart, the Furstenberg multiple recurrence theorem); they also arise in the regularity theory of hypergraphs (which is not unrelated to the other two topics).

A simple example to keep in mind here is the order two Hilbert space ${L^4(X)}$ on a measure space ${X = (X,{\mathcal B},\mu)}$, where the inner product takes the form

$\displaystyle \langle f_{00}, f_{01}, f_{10}, f_{11} \rangle_{L^4(X)} := \int_X f_{00}(x) \overline{f_{01}(x)} \overline{f_{10}(x)} f_{11}(x)\ d\mu(x).$

In this brief note I would like to set out the abstract theory of such higher order Hilbert spaces. This is not new material, being already implicit in the breakthrough papers of Gowers and Host-Kra, but I just wanted to emphasise the fact that the material is abstract, and is not particularly tied to any explicit choice of norm so long as a certain axiom are satisfied. (Also, I wanted to write things down so that I would not have to reconstruct this formalism again in the future.) Unfortunately, the notation is quite heavy and the abstract axiom is a little strange; it may be that there is a better way to formulate things. In this particular case it does seem that a concrete approach is significantly clearer, but abstraction is at least possible.

Note: the discussion below is likely to be comprehensible only to readers who already have some exposure to the Gowers norms.

In the previous lecture notes, we used (linear) Fourier analysis to control the number of three-term arithmetic progressions ${a, a+r, a+2r}$ in a given set ${A}$. The power of the Fourier transform for this problem ultimately stemmed from the identity

$\displaystyle \mathop{\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} 1_A(n) 1_A(n+r) 1_A(n+2r)$

$\displaystyle = \sum_{\alpha \in \frac{1}{N'}{\bf Z} / {\bf Z}} \hat 1_A(\alpha) \hat 1_A(-2\alpha) \hat 1_A(\alpha) \ \ \ \ \ (1)$

for any cyclic group ${{\bf Z}/N'{\bf Z}}$ and any subset ${A}$ of that group (analogues of this identity also exist for other finite abelian groups, and to a lesser extent to non-abelian groups also, although that is not the focus of my current discussion). As it turns out, linear Fourier analysis is not able to discern higher order patterns, such as arithmetic progressions of length four; we give some demonstrations of this below the fold, taking advantage of the polynomial recurrence theory from Notes 1.

The main objective of this course is to introduce the (still nascent) theory of higher order Fourier analysis, which is capable of studying higher order patterns. The full theory is still rather complicated (at least, at our present level of understanding). However, one aspect of the theory is relatively simple, namely that we can largely reduce the study of arbitrary additive patterns to the study of a single type of additive pattern, namely the parallelopipeds

$\displaystyle ( x + \omega_1 h_1 + \ldots + \omega_d h_d )_{\omega_1,\ldots,\omega_d \in \{0,1\}}. \ \ \ \ \ (2)$

Thus for instance, for ${d=1}$ one has the line segments

$\displaystyle x, x+h_1 \ \ \ \ \ (3)$

for ${d=2}$ one has the parallelograms

$\displaystyle x, x+h_1, x+h_2, x+h_1+h_2, \ \ \ \ \ (4)$

for ${d=3}$ one has the parallelopipeds

$\displaystyle x, x+h_1, x+h_2, x+h_3, x+h_1+h_2, x+h_1+h_3, x+h_2+h_3, x+h_1+h_2+h_3. \ \ \ \ \ (5)$

These patterns are particularly pleasant to handle, thanks to the large number of symmetries available on the discrete cube ${\{0,1\}^d}$. For instance, whereas establishing the presence of arbitrarily long arithmetic progressions in dense sets is quite difficult (Szemerédi’s theorem), establishing arbitrarily high-dimensional parallelopipeds is much easier:

Exercise 1 Let ${A \subset [N]}$ be such that ${|A| > \delta N}$ for some ${0 < \delta \leq 1}$. If ${N}$ is sufficiently large depending on ${\delta}$, show that there exists an integer ${1 \leq h \ll 1/\delta}$ such that ${|A \cap (A+h)| \gg \delta^2 N}$. (Hint: obtain upper and lower bounds on the set ${\{ (x,y) \in A \times A: x < y \leq x + 10/\delta \}}$.)

Exercise 2 (Hilbert cube lemma) Let ${A \subset [N]}$ be such that ${|A| > \delta N}$ for some ${0 < \delta \leq 1}$, and let ${d \geq 1}$ be an integer. Show that if ${N}$ is sufficiently large depending on ${\delta,d}$, then ${A}$ contains a parallelopiped of the form (2), with ${1 \leq h_1,\ldots,h_d \ll_\delta 1}$ positive integers. (Hint: use the previous exercise and induction.) Conclude that if ${A \subset {\bf Z}}$ has positive upper density, then it contains infinitely many such parallelopipeds for each ${d}$.

Exercise 3 Show that if ${q \geq 1}$ is an integer, and ${d}$ is sufficiently large depending on ${q}$, then for any parallelopiped (2) in the integers ${{\bf Z}}$, there exists ${\omega_1,\ldots,\omega_d \in \{0,1\}}$, not all zero, such that ${x + h_1 \omega_1 + \ldots + h_d \omega_d = x \hbox{ mod } q}$. (Hint: pigeonhole the ${h_i}$ in the residue classes modulo ${q}$.) Use this to conclude that if ${A}$ is the set of all integers ${n}$ such that ${|n-km!| \geq m}$ for all integers ${k, m \geq 1}$, then ${A}$ is a set of positive upper density (and also positive lower density) which does not contain any infinite parallelopipeds (thus one cannot take ${d=\infty}$ in the Hilbert cube lemma).

The standard way to control the parallelogram patterns (and thus, all other (finite complexity) linear patterns) are the Gowers uniformity norms

$\displaystyle \| f\|_{U^d(G)} := {\bf E}_{x,h_1,\ldots,h_d \in G} \prod_{\omega_1,\ldots,\omega_d \in \{0,1\}^d} {\mathcal C}^{\omega_1+\ldots+\omega_d} f(x+\omega_1 h_1 + \ldots + \omega_d h_d) \ \ \ \ \ (6)$

with ${f: G \rightarrow {\bf C}}$ a function on a finite abelian group ${G}$, and ${{\mathcal C}: z \mapsto \overline{z}}$ is the complex conjugation operator; analogues of this norm also exist for group-like objects such as the progression ${[N]}$, and also for measure-preserving systems (where they are known as the Gowers-Host-Kra uniformity seminorms, see this paper of Host-Kra for more discussion). In this set of notes we will focus on the basic properties of these norms; the deepest fact about them, known as the inverse conjecture for these norms, will be discussed in later notes.

Ben Green, and I have just uploaded to the arXiv a short (six-page) paper “Yet another proof of Szemeredi’s theorem“, submitted to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we put in print a folklore observation, namely that the inverse conjecture for the Gowers norm, together with the density increment argument, easily implies Szemerédi’s famous theorem on arithmetic progressions. This is unsurprising, given that Gowers’ proof of Szemerédi’s theorem proceeds through a weaker version of the inverse conjecture and a density increment argument, and also given that it is possible to derive Szemerédi’s theorem from knowledge of the characteristic factor for multiple recurrence (the ergodic theory analogue of the inverse conjecture, first established by Host and Kra), as was done by Bergelson, Leibman, and Lesigne (and also implicitly in the earlier paper of Bergelson, Host, and Kra); but to our knowledge the exact derivation of Szemerédi’s theorem from the inverse conjecture was not in the literature. Ordinarily this type of folklore might be considered too trifling (and too well known among experts in the field) to publish; but we felt that the venue of the Szemerédi birthday conference provided a natural venue for this particular observation.

The key point is that one can show (by an elementary argument relying primarily an induction on dimension argument and the Weyl recurrence theorem, i.e. that given any real ${\alpha}$ and any integer ${s \geq 1}$, that the expression ${\alpha n^s}$ gets arbitrarily close to an integer) that given a (polynomial) nilsequence ${n \mapsto F(g(n)\Gamma)}$, one can subdivide any long arithmetic progression (such as ${[N] = \{1,\ldots,N\}}$) into a number of medium-sized progressions, where the nilsequence is nearly constant on each progression. As a consequence of this and the inverse conjecture for the Gowers norm, if a set has no arithmetic progressions, then it must have an elevated density on a subprogression; iterating this observation as per the usual density-increment argument as introduced long ago by Roth, one obtains the claim. (This is very close to the scheme of Gowers’ proof.)

Technically, one might call this the shortest proof of Szemerédi’s theorem in the literature (and would be something like the sixteenth such genuinely distinct proof, by our count), but that would be cheating quite a bit, primarily due to the fact that it assumes the inverse conjecture for the Gowers norm, our current proof of which is checking in at about 100 pages…