In the modern theory of additive combinatorics, a large role is played by the Gowers uniformity norms {\|f\|_{U^k(G)}}, where {k \geq 1}, {G = (G,+)} is a finite abelian group, and {f: G \rightarrow {\bf C}} is a function (one can also consider these norms in finite approximate groups such as {[N] = \{1,\dots,N\}} instead of finite groups, but we will focus on the group case here for simplicity). These norms can be defined by the formula

\displaystyle \|f\|_{U^k(G)} := (\mathop{\bf E}_{x,h_1,\dots,h_k \in G} \Delta_{h_1} \dots \Delta_{h_k} f(x))^{1/2^k}

where we use the averaging notation

\displaystyle \mathop{\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)

for any non-empty finite set {A} (with {|A|} denoting the cardinality of {A}), and {\Delta_h} is the multiplicative discrete derivative operator

\displaystyle \Delta_h f(x) := f(x+h) \overline{f(x)}.
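
For readers who like to experiment, here is a minimal brute-force sketch of the definition above in the model case {G = {\bf Z}/N{\bf Z}}. The function names `mult_derivative` and `gowers_norm` are illustrative choices, not notation from the literature, and the running time {O(N^{k+1})} makes this suitable only for very small {N}.

```python
import numpy as np

def mult_derivative(f, h):
    """Delta_h f(x) = f(x+h) * conj(f(x)) on Z/NZ, with f stored as a length-N array."""
    return np.roll(f, -h) * np.conj(f)

def gowers_norm(f, k):
    """Brute-force U^k norm on Z/NZ: average Delta_{h_1}...Delta_{h_k} f(x) over x, h_1..h_k."""
    f = np.asarray(f, dtype=complex)
    N = len(f)

    def average(g, depth):
        # Once all k derivatives have been applied, average over x.
        if depth == 0:
            return g.mean()
        # Otherwise average over the next shift h.
        return sum(average(mult_derivative(g, h), depth - 1) for h in range(N)) / N

    total = average(f, k)
    # For k >= 1 the average is real and non-negative; abs() only guards against rounding.
    return abs(total) ** (1.0 / 2 ** k)

N = 7
x = np.arange(N)
chi = np.exp(2j * np.pi * 3 * x / N)   # a non-trivial character of Z/7Z
u1 = gowers_norm(chi, 1)               # equals |E chi|, which vanishes
u2 = gowers_norm(chi, 2)               # characters have U^2 norm 1
```

The {U^2} case can be cross-checked against the standard Fourier identity {\|f\|_{U^2(G)}^4 = \sum_\xi |\hat f(\xi)|^4}, which is much faster to evaluate.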

One reason why these norms play an important role is that they control various multilinear averages. We give two representative examples here:

Proposition 1 Let {G = {\bf Z}/N{\bf Z}} be a cyclic group of prime order.

  • (i) If {a_1,\dots,a_k} are distinct elements of {G} for some {k \geq 2}, and {f_1,\dots,f_k: G \rightarrow {\bf C}} are {1}-bounded functions (thus {|f_j(x)| \leq 1} for all {j=1,\dots,k} and {x \in G}), then

    \displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x+a_1 h) \dots f_k(x+a_k h)| \leq \|f_i\|_{U^{k-1}(G)} \ \ \ \ \ (1)

     

    for any {i=1,\dots,k}.

  • (ii) If {f_1,f_2,f_3: G \rightarrow {\bf C}} are {1}-bounded, then one has

    \displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x) f_2(x+h) f_3(x+h^2)| \ll \|f_3\|_{U^4(G)} + N^{-1/4}.

We establish these claims a little later in this post.

In some more recent literature (e.g., this paper of Conlon, Fox, and Zhao), the role of Gowers norms has been replaced by (generalisations of) the cut norm, a concept originating from graph theory. In this blog post, it will be convenient to define these cut norms in the language of probability theory (using boldface to denote random variables).

Definition 2 (Cut norm) Let {{\bf X}_1,\dots,{\bf X}_k, {\bf Y}_1,\dots,{\bf Y}_l} be independent random variables with {k,l \geq 0}; to avoid minor technicalities we assume that these random variables are discrete and take values in a finite set. Given a random variable {{\bf F} = F( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} of these independent random variables, we define the cut norm

\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} := \sup | \mathop{\bf E} {\bf F} {\bf B}_1 \dots {\bf B}_k |

where the supremum ranges over all choices {{\bf B}_1,\dots,{\bf B}_k} of random variables {{\bf B}_i = B_i( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} that are {1}-bounded (thus {|{\bf B}_i| \leq 1} surely), and such that {{\bf B}_i} does not depend on {{\bf X}_i}.

If {l=0}, we abbreviate {\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}} as {\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}}.

Strictly speaking, the cut norm is only a cut semi-norm when {k=0,1}, but we will abuse notation by referring to it as a norm nevertheless.

Example 3 If {G = (V_1,V_2,E)} is a bipartite graph, and {\mathbf{v_1}}, {\mathbf{v_2}} are independent random variables chosen uniformly from {V_1,V_2} respectively, then

\displaystyle \| 1_E(\mathbf{v_1},\mathbf{v_2}) \|_{\mathrm{CUT}(\mathbf{v_1}, \mathbf{v_2})}

\displaystyle = \sup_{\|f\|_\infty, \|g\|_\infty \leq 1} |\mathop{\bf E}_{v_1 \in V_1, v_2 \in V_2} 1_E(v_1,v_2) f(v_1) g(v_2)|

where the supremum ranges over all {1}-bounded functions {f: V_1 \rightarrow [-1,1]}, {g: V_2 \rightarrow [-1,1]}. The right hand side is essentially the cut norm of the graph {G}, as defined for instance by Frieze and Kannan.
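
As a small sanity check of Example 3, the following sketch evaluates the supremum by brute force for a toy bipartite graph. The adjacency matrix `A` is a made-up example, and the helper name `cut_norm_bruteforce` is hypothetical. It relies on the fact that a bilinear form on {[-1,1]^{V_1} \times [-1,1]^{V_2}} attains its largest magnitude at sign vectors, so for each sign pattern {f} the optimal {g} is read off from the signs of the column sums.

```python
import itertools
import numpy as np

# Hypothetical toy bipartite graph: A[v1, v2] = 1_E(v1, v2).
A = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]], dtype=float)

def cut_norm_bruteforce(M):
    """sup over f, g in [-1,1] of |E_{v1,v2} M(v1,v2) f(v1) g(v2)|, by enumerating signs of f."""
    n1, n2 = M.shape
    best = 0.0
    for f in itertools.product([-1.0, 1.0], repeat=n1):
        # For fixed f the best g matches the signs of f^T M, giving sum |f^T M|.
        column_sums = np.array(f) @ M
        best = max(best, np.abs(column_sums).sum())
    return best / (n1 * n2)

cut_of_graph = cut_norm_bruteforce(A)              # for a non-negative function this is the edge density
cut_of_balanced = cut_norm_bruteforce(A - A.mean())  # cut norm after subtracting the edge density
```

The enumeration is exponential in {|V_1|}, so this is only meant for illustrating the definition on tiny examples.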

The cut norm is basically an expectation when {k=0,1}:

Example 4 If {k=0}, we see from the definition that

\displaystyle \| {\bf F} \|_{\mathrm{CUT}( ; {\bf Y}_1,\dots,{\bf Y}_l )} =| \mathop{\bf E} {\bf F} |.

If {k=1}, one easily checks that

\displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}; {\bf Y}_1,\dots,{\bf Y}_l )} = \mathop{\bf E} | \mathop{\bf E}_{\bf X} {\bf F} |,

where {\mathop{\bf E}_{\bf X} {\bf F} = \mathop{\bf E}( {\bf F} | {\bf Y}_1,\dots,{\bf Y}_l )} is the conditional expectation of {{\bf F}} to the {\sigma}-algebra generated by all the variables other than {{\bf X}}, i.e., the {\sigma}-algebra generated by {{\bf Y}_1,\dots,{\bf Y}_l}. In particular, if {{\bf X}, {\bf Y}_1,\dots,{\bf Y}_l} are independent random variables drawn uniformly from {X,Y_1,\dots,Y_l} respectively, then

\displaystyle \| F( {\bf X}; {\bf Y}_1,\dots, {\bf Y}_l) \|_{\mathrm{CUT}( {\bf X}; {\bf Y}_1,\dots,{\bf Y}_l )}

\displaystyle = \mathop{\bf E}_{y_1 \in Y_1,\dots, y_l \in Y_l} |\mathop{\bf E}_{x \in X} F(x; y_1,\dots,y_l)|.
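
The {k=1} formula can be checked numerically. The sketch below takes {l=1}, with uniform distributions on small sets of arbitrarily chosen sizes, computes {\mathop{\bf E}_y |\mathop{\bf E}_x F(x;y)|}, and verifies that the multiplier {B(y)} given by the conjugate phase of {\mathop{\bf E}_x F(x;y)} attains this value, while a random unimodular multiplier does no better. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
nx, ny = 5, 4
# F[x, y] is a complex-valued function of x in X and y in Y (uniform distributions assumed).
F = rng.normal(size=(nx, ny)) + 1j * rng.normal(size=(nx, ny))

inner = F.mean(axis=0)              # E_x F(x; y), one value per y
cut_norm = np.abs(inner).mean()     # E_y |E_x F(x; y)|

# Extremal multiplier: B(y) is the conjugate phase of E_x F(x; y); it does not depend on x.
B_opt = np.exp(-1j * np.angle(inner))
attained = abs((F * B_opt[None, :]).mean())

# Another 1-bounded multiplier B(y), chosen at random, gives a value that is no larger.
B_rand = np.exp(2j * np.pi * rng.uniform(size=ny))
other = abs((F * B_rand[None, :]).mean())
```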

Here are some basic properties of the cut norm:

Lemma 5 (Basic properties of cut norm) Let {{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l} be independent discrete random variables, and {{\bf F} = F({\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l)} a function of these variables.

  • (i) (Permutation invariance) The cut norm {\| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}} is invariant with respect to permutations of the {{\bf X}_1,\dots,{\bf X}_k}, or permutations of the {{\bf Y}_1,\dots,{\bf Y}_l}.
  • (ii) (Conditioning) One has

    \displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} = \mathop{\bf E} \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}

    where on the right-hand side we view, for each realisation {y_1,\dots,y_l} of {{\bf Y}_1,\dots,{\bf Y}_l}, {{\bf F}} as a function {F( {\bf X}_1,\dots,{\bf X}_k; y_1,\dots,y_l)} of the random variables {{\bf X}_1,\dots, {\bf X}_k} alone, thus the right-hand side may be expanded as

    \displaystyle \sum_{y_1,\dots,y_l} \| F( {\bf X}_1,\dots,{\bf X}_k; y_1,\dots,y_l) \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k )}

    \displaystyle \times \mathop{\bf P}( {\bf Y}_1=y_1,\dots,{\bf Y}_l=y_l).

  • (iii) (Monotonicity) If {k \geq 1}, we have

    \displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \geq \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_{k-1}; {\bf X}_k, {\bf Y}_1,\dots,{\bf Y}_l )}.

  • (iv) (Multiplicative invariances) If {{\bf B} = B({\bf X}_1,\dots,{\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l)} is a {1}-bounded function that does not depend on one of the {{\bf X}_i}, then

    \displaystyle \| {\bf B} {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.

    In particular, if we additionally assume {|{\bf B}|=1}, then

    \displaystyle \| {\bf B} {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} = \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}.

  • (v) (Cauchy-Schwarz) If {k \geq 1}, one has

    \displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| \Box_{{\bf X}_1, {\bf X}'_1} {\bf F} \|_{\mathrm{CUT}( {\bf X}_2, \dots, {\bf X}_k; {\bf X}_1, {\bf X}'_1, {\bf Y}_1,\dots,{\bf Y}_l )}^{1/2}

    where {{\bf X}'_1} is a copy of {{\bf X}_1} that is independent of {{\bf X}_1,\dots,{\bf X}_k,{\bf Y}_1,\dots,{\bf Y}_l} and {\Box_{{\bf X}_1, {\bf X}'_1} {\bf F}} is the random variable

    \displaystyle \Box_{{\bf X}_1, {\bf X}'_1} {\bf F} := F( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )

    \displaystyle \times \overline{F}( {\bf X}'_1, {\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l ).

  • (vi) (Averaging) If {k \geq 1} and {{\bf F} = \mathop{\bf E}_{\bf Z} {\bf F}_{\bf Z}}, where {{\bf Z}} is another random variable independent of {{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}, and {{\bf F}_{\bf Z} = F_{\bf Z}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} is a random variable depending on both {{\bf Z}} and {{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}, then

    \displaystyle \| {\bf F} \|_{\mathrm{CUT}( {\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )} \leq \| {\bf F}_{\bf Z} \|_{\mathrm{CUT}( ({\bf X}_1, {\bf Z}), {\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l )}

Proof: The claims (i), (ii) are clear from expanding out all the definitions. The claim (iii) also easily follows from the definitions (the left-hand side involves a supremum over a more general class of multipliers {{\bf B}_1,\dots,{\bf B}_{k}}, while the right-hand side omits the {{\bf B}_k} multiplier), as does (iv) (the multiplier {{\bf B}} can be absorbed into one of the multipliers in the definition of the cut norm). The claim (vi) follows by expanding out the definitions, and observing that all of the terms in the supremum appearing in the left-hand side also appear as terms in the supremum on the right-hand side. It remains to prove (v). By definition, the left-hand side is the supremum over all quantities of the form

\displaystyle |{\bf E} {\bf F} {\bf B}_1 \dots {\bf B}_k|

where the {{\bf B}_i} are {1}-bounded functions of {{\bf X}_1, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l} that do not depend on {{\bf X}_i}. We average out in the {{\bf X}_1} direction (that is, we condition out the variables {{\bf X}_2, \dots, {\bf X}_k; {\bf Y}_1,\dots,{\bf Y}_l}), and pull out the factor {{\bf B}_1} (which does not depend on {{\bf X}_1}), to write this as

\displaystyle |{\bf E} {\bf B}_1 {\bf E}_{{\bf X}_1}( {\bf F} {\bf B}_2 \dots {\bf B}_k )|,

which by Cauchy-Schwarz is bounded by

\displaystyle ( {\bf E} |{\bf E}_{{\bf X}_1}( {\bf F} {\bf B}_2 \dots {\bf B}_k )|^2)^{1/2},

which can be expanded using the copy {{\bf X}'_1} as

\displaystyle |{\bf E} \Box_{{\bf X}_1,{\bf X}'_1} ({\bf F} {\bf B}_2 \dots {\bf B}_k) |^{1/2}.

Expanding

\displaystyle \Box_{{\bf X}_1,{\bf X}'_1} ({\bf F} {\bf B}_2 \dots {\bf B}_k) = (\Box_{{\bf X}_1,{\bf X}'_1} {\bf F}) (\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_2) \dots (\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_k)

and noting that each {\Box_{{\bf X}_1,{\bf X}'_1} {\bf B}_i} is {1}-bounded and independent of {{\bf X}_i} for {i=2,\dots,k}, we obtain the claim. \Box
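
To see the Cauchy-Schwarz inequality (v) in the simplest case {k=1}, {l=1}, one can compare both sides numerically. By Example 4 the left-hand side is {\mathop{\bf E}_y |\mathop{\bf E}_x F(x;y)|}, while the right-hand side is {|\mathop{\bf E} \Box_{{\bf X},{\bf X}'} {\bf F}|^{1/2}}. The following sketch assumes uniform distributions on small sets; the array names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 6, 5
F = rng.uniform(-1, 1, size=(nx, ny)) + 1j * rng.uniform(-1, 1, size=(nx, ny))

# Left-hand side: by Example 4, ||F||_{CUT(X; Y)} = E_y |E_x F(x; y)|.
lhs = np.abs(F.mean(axis=0)).mean()

# Box_{X, X'} F = F(X; Y) * conj(F(X'; Y)), stored as an array indexed by (x, x', y).
box = F[:, None, :] * np.conj(F[None, :, :])

# Right-hand side: with no X variables left, the cut norm is |E Box F|; then take a square root.
rhs = abs(box.mean()) ** 0.5

# The expectation of Box F collapses to E_y |E_x F(x; y)|^2, as in the proof.
second_moment = (np.abs(F.mean(axis=0)) ** 2).mean()
```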

Now we can relate the cut norm to Gowers uniformity norms:

Lemma 6 Let {G} be a finite abelian group, let {{\bf x}, {\bf h}_1,\dots,{\bf h}_k} be independent random variables uniformly drawn from {G} for some {k \geq 0}, and let {f: G \rightarrow {\bf C}}. Then

\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x} )} \leq \|f\|_{U^{k+1}(G)} \ \ \ \ \ (2)

and similarly (if {k \geq 1})

\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k; {\bf x} )} \leq \|f\|_{U^{k}(G)} \ \ \ \ \ (3)

If {f} is additionally assumed to be {1}-bounded, we have the converse inequalities

\displaystyle \|f\|_{U^{k+1}(G)}^{2^{k+1}} \leq \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x} )} \ \ \ \ \ (4)

and (if {k \geq 1})

\displaystyle \|f\|_{U^{k}(G)}^{2^{k}} \leq \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k; {\bf x} )}. \ \ \ \ \ (5)

 

Proof: Applying Lemma 5(v) {k} times, we can bound

\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h_1},\dots,{\bf h_k}, {\bf x} )}

by

\displaystyle \| \Box_{{\bf h}_k,{\bf h}'_k} \dots \Box_{{\bf h}_1,{\bf h}'_1} (f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k)) \|_{\mathrm{CUT}( {\bf x}; {\bf h}_1, {\bf h}'_1, \dots, {\bf h}_k, {\bf h}'_k )}^{1/2^k} \ \ \ \ \ (6)

where {{\bf h}'_1,\dots,{\bf h}'_k} are independent copies of {{\bf h}_1,\dots,{\bf h}_k} that are also independent of {{\bf x}}. The expression inside the norm can also be written as

\displaystyle \Delta_{{\bf h}_k - {\bf h}'_k} \dots \Delta_{{\bf h}_1 - {\bf h}'_1} f({\bf x} + {\bf h}'_1 + \dots + {\bf h}'_k)

so by Example 4 one can write (6) as

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,h'_1,\dots,h'_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k - h'_k} \dots \Delta_{h_1 - h'_1} f(x+h'_1+\dots+h'_k)||^{1/2^k}

which after some change of variables simplifies to

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)||^{1/2^k}

which by Cauchy-Schwarz is bounded by

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^2|^{1/2^{k+1}}

which one can rearrange as

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,h_{k+1},x \in G} \Delta_{h_{k+1}} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^{1/2^{k+1}}

giving (2). A similar argument bounds

\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_k) \|_{\mathrm{CUT}( {\bf h_1},\dots,{\bf h_k}; {\bf x} )}

by

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} \mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^{1/2^k}

which gives (3).

For (4), we can reverse the above steps and expand {\|f\|_{U^{k+1}(G)}^{2^{k+1}}} as

\displaystyle \mathop{\bf E}_{h_1,\dots,h_k \in G} |\mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|^2

which we can write as

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k \in G} b(h_1,\dots,h_k) \mathop{\bf E}_{x \in G} \Delta_{h_k} \dots \Delta_{h_1} f(x)|

for some {1}-bounded function {b}. This can in turn be expanded as

\displaystyle |\mathop{\bf E}_{h_1,\dots,h_k,x \in G} f(x+h_1+\dots+h_k) b(h_1,\dots,h_k) \prod_{i=1}^k b_i(x,h_1,\dots,h_k)|

for some {1}-bounded functions {b_i} that do not depend on {h_i}. By Example 4, this can be written as

\displaystyle \| f({\bf x} + {\bf h}_1+\dots+{\bf h}_k) b({\bf h}_1,\dots,{\bf h}_k) \prod_{i=1}^k b_i({\bf x},{\bf h}_1,\dots,{\bf h}_k) \|_{\mathrm{CUT}(; {\bf h}_1,\dots,{\bf h}_k, {\bf x})}

which by several applications of Lemma 5(iii) and then Lemma 5(iv) can be bounded by

\displaystyle \| f({\bf x} + {\bf h_1}+\dots+{\bf h}_k) \|_{\mathrm{CUT}( {\bf h}_1,\dots,{\bf h}_k, {\bf x})},

giving (4). A similar argument gives (5). \Box

Now we can prove Proposition 1. We begin with part (i). By permutation we may assume {i=k}, then by translation we may assume {a_k=0}. Replacing {x} by {x+h_1+\dots+h_{k-1}} and {h} by {h - a_1^{-1} h_1 - \dots - a_{k-1}^{-1} h_{k-1}}, we can write the left-hand side of (1) as

\displaystyle \mathop{\bf E}_{x,h,h_1,\dots,h_{k-1} \in G} f_k(x+h_1+\dots+h_{k-1}) \prod_{i=1}^{k-1} b_i(x,h,h_1,\dots,h_{k-1})

where

\displaystyle b_i(x,h,h_1,\dots,h_{k-1})

\displaystyle := f_i( x + h_1+\dots+h_{k-1}+ a_i(h - a_1^{-1} h_1 - \dots - a_{k-1}^{-1} h_{k-1}))

is a {1}-bounded function that does not depend on {h_i}. Taking {{\bf x}, {\bf h}, {\bf h}_1,\dots,{\bf h}_{k-1}} to be independent random variables drawn uniformly from {G}, the left-hand side of (1) can then be written as

\displaystyle \mathop{\bf E} f_k({\bf x}+{\bf h}_1+\dots+{\bf h}_{k-1}) \prod_{i=1}^{k-1} b_i({\bf x},{\bf h},{\bf h}_1,\dots,{\bf h}_{k-1})

which by Example 4 is bounded in magnitude by

\displaystyle \| f_k({\bf x}+{\bf h}_1+\dots+{\bf h}_{k-1}) \prod_{i=1}^{k-1} b_i({\bf x},{\bf h},{\bf h}_1,\dots,{\bf h}_{k-1}) \|_{\mathrm{CUT}(; {\bf h}_1,\dots,{\bf h}_{k-1}, {\bf x}, {\bf h})}.

After many applications of Lemma 5(iii), (iv), this is bounded by

\displaystyle \| f_k({\bf x}+{\bf h_1}+\dots+{\bf h_{k-1}}) \|_{\mathrm{CUT}({\bf h}_1,\dots,{\bf h}_{k-1}; {\bf x}, {\bf h})}

By Lemma 5(ii) we may drop the {{\bf h}} variable, and then the claim follows from Lemma 6.

For part (ii), we replace {x} by {x+a-h^2} and {h} by {h-a+b} to write the left-hand side as

\displaystyle \mathop{\bf E}_{x, a,b,h \in G} f_1(x+a-h^2) f_2(x+h+b-h^2) f_3(x+a+(h-a+b)^2-h^2);

the point here is that the first factor does not involve {b}, the second factor does not involve {a}, and the third factor has no quadratic terms in {h}. Letting {{\bf x}, {\bf a}, {\bf b}, {\bf h}} be independent variables drawn uniformly from {G}, we can use Example 4 to bound this in magnitude by

\displaystyle \| f_1({\bf x}+{\bf a}-{\bf h}^2) f_2({\bf x}+{\bf h}+{\bf b}-{\bf h}^2)

\displaystyle f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2-{\bf h}^2 ) \|_{\mathrm{CUT}(; {\bf x}, {\bf a}, {\bf b}, {\bf h})}

which by Lemma 5(i),(iii),(iv) is bounded by

\displaystyle \| f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 ) \|_{\mathrm{CUT}({\bf a}, {\bf b}; {\bf x}, {\bf h})}

and then by Lemma 5(v) we may bound this by

\displaystyle \| \Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 ) \|_{\mathrm{CUT}(;{\bf a}, {\bf a}', {\bf b}, {\bf b}', {\bf x}, {\bf h})}^{1/4}

which by Example 4 is

\displaystyle |\mathop{\bf E} \Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 )|^{1/4}

Now the expression inside the expectation is the product of four factors, each of which is {f_3} or {\overline{f}_3} applied to an affine form {{\bf x} + {\bf c} + {\bf d} {\bf h}} where {{\bf c}} depends on {{\bf a}, {\bf a}', {\bf b}, {\bf b}'} and {{\bf d}} is one of {2({\bf b}-{\bf a})}, {2({\bf b}'-{\bf a})}, {2({\bf b}-{\bf a}')}, {2({\bf b}'-{\bf a}')}. With probability {1-O(1/N)}, the four different values of {{\bf d}} are distinct, and then by part (i) we have

\displaystyle |\mathop{\bf E}(\Box_{{\bf a}, {\bf a}'} \Box_{{\bf b}, {\bf b}'} f_3( {\bf x}+{\bf a}+({\bf h}-{\bf a}+{\bf b})^2 - {\bf h}^2 )|{\bf a}, {\bf a}', {\bf b}, {\bf b}')| \leq \|f_3\|_{U^4({\bf Z}/N{\bf Z})}.

When they are not distinct, we can instead bound this quantity by {1}. Taking expectations in {{\bf a}, {\bf a}', {\bf b}, {\bf b}'}, we obtain the claim. \Box
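
As an illustration of Proposition 1(i), the following sketch tests the case {k=3} with coefficients {0,1,2} (three-term progressions) on {{\bf Z}/11{\bf Z}}, comparing the magnitude of the average with the {U^2} norm of each function. The {U^2} norm is computed from the standard identity {\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4}. The function names and the random test functions are assumptions made for the example.

```python
import numpy as np

def u2_norm(f):
    """U^2 norm on Z/NZ via the identity ||f||_{U^2}^4 = sum_xi |fhat(xi)|^4."""
    fhat = np.fft.fft(f) / len(f)
    return np.sum(np.abs(fhat) ** 4) ** 0.25

def linear_average(fs, coeffs):
    """E_{x,h} prod_j f_j(x + a_j h) on Z/NZ, by direct summation."""
    N = len(fs[0])
    x = np.arange(N)
    total = 0
    for h in range(N):
        term = np.ones(N, dtype=complex)
        for f, a in zip(fs, coeffs):
            term = term * f[(x + a * h) % N]
        total += term.mean()
    return total / N

N = 11                                   # prime, so non-zero coefficients are invertible
rng = np.random.default_rng(2)
# Three random functions with |f_j(x)| = 1, hence 1-bounded.
fs = [np.exp(2j * np.pi * rng.uniform(size=N)) for _ in range(3)]
coeffs = [0, 1, 2]                       # distinct elements a_1, a_2, a_3 of Z/NZ

avg = abs(linear_average(fs, coeffs))
bounds = [u2_norm(f) for f in fs]        # right-hand side of (1) for i = 1, 2, 3
```

For random phases both sides are small; replacing the {f_j} by suitable characters makes the average large, in which case the {U^2} norms are forced to be large as well.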

The analogue of the inverse {U^2} theorem for cut norms is the following claim (which I learned from Ben Green):

Lemma 7 ({U^2}-type inverse theorem) Let {\mathbf{x}, \mathbf{h}} be independent random variables drawn uniformly from a finite abelian group {G}, and let {f: G \rightarrow {\bf C}} be {1}-bounded. Then we have

\displaystyle \| f(\mathbf{x} + \mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} = \sup_{\xi \in\hat G} \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})}

where {\hat G} is the group of homomorphisms {\xi: x \mapsto \xi \cdot x} from {G} to {{\bf R}/{\bf Z}}, and {e(\theta) := e^{2\pi i \theta}}.

Proof: Suppose first that {\| f(\mathbf{x} + \mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} > \delta} for some {\delta}, then by definition

\displaystyle |\mathop{\bf E}_{x,h \in G} f(x+h) a(x) b(h)| > \delta

for some {1}-bounded {a,b: G \rightarrow {\bf C}}. By Fourier expansion, the left-hand side is also

\displaystyle \sum_{\xi \in \hat G} \hat f(-\xi) \hat a(\xi) \hat b(\xi)

where {\hat f(\xi) := \mathop{\bf E}_{x \in G} f(x) e(-\xi \cdot x)}. From Plancherel’s theorem we have

\displaystyle \sum_{\xi \in \hat G} |\hat a(\xi)|^2, \sum_{\xi \in \hat G} |\hat b(\xi)|^2 \leq 1

hence by Hölder’s inequality one has {|\hat f(-\xi)| > \delta} for some {\xi \in \hat G}, and hence

\displaystyle \sup_{\xi \in\hat G} \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})} > \delta. \ \ \ \ \ (7)

Conversely, suppose (7) holds. Then there is {\xi \in \hat G} such that

\displaystyle \| f(\mathbf{x}) e(\xi \cdot \mathbf{x}) \|_{\mathrm{CUT}(\mathbf{x})} > \delta

which on substitution and Example 4 implies

\displaystyle \| f(\mathbf{x}+\mathbf{h}) e(\xi \cdot (\mathbf{x}+\mathbf{h})) \|_{\mathrm{CUT}(;\mathbf{x}, \mathbf{h})} > \delta.

The term {e(\xi \cdot (\mathbf{x}+\mathbf{h}))} splits into the product of a factor {e(\xi \cdot \mathbf{x})} not depending on {\mathbf{h}}, and a factor {e(\xi \cdot \mathbf{h})} not depending on {\mathbf{x}}. Applying Lemma 5(iii), (iv) we conclude that

\displaystyle \| f(\mathbf{x}+\mathbf{h}) \|_{\mathrm{CUT}(\mathbf{x}, \mathbf{h})} > \delta.

The claim follows. \Box
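
A quick numerical illustration of Lemma 7 on {G = {\bf Z}/13{\bf Z}}: by Example 4 the right-hand side is the largest Fourier coefficient {\sup_\xi |\hat f(\xi)|}, and the sketch below checks that character multipliers attain it, that random unimodular multipliers do not exceed it, and that it is sandwiched by the {U^2} norm as in (2) and (4) with {k=1}. The helper `bilinear` and the random choices are assumptions made for the example.

```python
import numpy as np

N = 13
rng = np.random.default_rng(3)
x = np.arange(N)
f = np.exp(2j * np.pi * rng.uniform(size=N))      # 1-bounded f on Z/NZ

fhat = np.fft.fft(f) / N                          # fhat(xi) = E_x f(x) e(-xi x / N)
xi = int(np.argmax(np.abs(fhat)))
sup_fourier = np.abs(fhat[xi])                    # right-hand side of Lemma 7

def bilinear(a, b):
    """|E_{x,h} f(x+h) a(x) b(h)| for multipliers a, b on Z/NZ."""
    shifted = f[(x[:, None] + x[None, :]) % N]    # shifted[x, h] = f(x + h)
    return abs((shifted * a[:, None] * b[None, :]).mean())

# The character multipliers a(x) = e(-xi x / N), b(h) = e(-xi h / N) attain the supremum.
char = np.exp(-2j * np.pi * xi * x / N)
attained = bilinear(char, char)

# Random unimodular multipliers never exceed it.
trials = [bilinear(np.exp(2j * np.pi * rng.uniform(size=N)),
                   np.exp(2j * np.pi * rng.uniform(size=N))) for _ in range(200)]

# Comparison with the U^2 norm: ||f||_{U^2}^4 <= sup |fhat| <= ||f||_{U^2}.
u2 = np.sum(np.abs(fhat) ** 4) ** 0.25
```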

The higher order inverse theorems are much less trivial (and the optimal quantitative bounds are not currently known). However, there is a useful degree lowering argument, due to Peluse and Prendiville, that can allow one to lower the order of a uniformity norm in some cases. We give a simple version of this argument here:

Lemma 8 (Degree lowering argument, special case) Let {G} be a finite abelian group, let {Y} be a non-empty finite set, and let {f: G \rightarrow {\bf C}} be a function of the form {f(x) := \mathop{\bf E}_{y \in Y} F_y(x)} for some {1}-bounded functions {F_y: G \rightarrow {\bf C}} indexed by {y \in Y}. Suppose that

\displaystyle \|f\|_{U^k(G)} \geq \delta

for some {k \geq 2} and {0 < \delta \leq 1}. Then one of the following claims holds (with implied constants allowed to depend on {k}):

  • (i) (Degree lowering) one has {\|f\|_{U^{k-1}(G)} \gg \delta^{O(1)}}.
  • (ii) (Non-zero frequency) There exist {h_1,\dots,h_{k-2} \in G} and non-zero {\xi \in \hat G} such that

    \displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{h_1} \dots \Delta_{h_{k-2}} F_y(x) e( \xi \cdot x )| \gg \delta^{O(1)}.

There are more sophisticated versions of this argument in which the frequency {\xi} is “minor arc” rather than “non-zero”, and then the Gowers norms are localised to suitable large arithmetic progressions; this is implicit in the above-mentioned paper of Peluse and Prendiville.

Proof: One can write

\displaystyle \|f\|_{U^k(G)}^{2^k} = \mathop{\bf E}_{h_1,\dots,h_{k-2} \in G} \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)}^4

and hence we conclude that

\displaystyle \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)} \gg \delta^{O(1)}

for a set {\Sigma} of tuples {(h_1,\dots,h_{k-2}) \in G^{k-2}} of density {\gg \delta^{O(1)}}. Applying Lemma 6 and Lemma 7, we see that for each such tuple, there exists {\phi(h_1,\dots,h_{k-2}) \in \hat G} such that

\displaystyle \| \Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x}) e( \phi(h_1,\dots,h_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x})} \gg \delta^{O(1)}, \ \ \ \ \ (8)

where {{\bf x}} is drawn uniformly from {G}.

Let us adopt the convention that {e( \phi( h_1,\dots,h_{k-2}) \cdot {\bf x} ) } vanishes for {(h_1,\dots,h_{k-2})} not in {\Sigma}; then from Lemma 5(ii) we have

\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}; {\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)},

where {{\bf h}_1,\dots,{\bf h}_{k-2}} are independent random variables drawn uniformly from {G} and also independent of {{\bf x}}. By repeated application of Lemma 5(iii) we then have

\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)}.

Expanding out {\Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x})} and using Lemma 5(iv) repeatedly we conclude that

\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})} \gg \delta^{O(1)}.

From the definition of {f} we then have

\displaystyle \| {\bf E}_{y \in Y} F_y({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x},{\bf h}_1,\dots, {\bf h}_{k-2})}

\displaystyle \gg \delta^{O(1)}.

By Lemma 5(vi), we see that the left-hand side is at most

\displaystyle \| F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}(({\bf x}, {\bf y}),{\bf h}_1,\dots, {\bf h}_{k-2})},

where {{\bf y}} is drawn uniformly from {Y}, independently of {{\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2}}. By repeated application of Lemma 5(i), (v), we conclude that

\displaystyle \| \Box_{{\bf h}_1, {\bf h}'_1} \dots \Box_{{\bf h}_{k-2}, {\bf h}'_{k-2}} (F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} )) \|_{\mathrm{CUT}(({\bf x},{\bf y}); {\bf h}_1,{\bf h}'_1,\dots, {\bf h}_{k-2}, {\bf h}'_{k-2})} \gg \delta^{O(1)},

where {{\bf h}'_1,\dots,{\bf h}'_{k-2}} are independent copies of {{\bf h}_1,\dots,{\bf h}_{k-2}} that are also independent of {{\bf x}}, {{\bf y}}. By Lemma 5(ii) and Example 4 we conclude that

\displaystyle |\mathop{\bf E}( \Box_{{\bf h}_1, {\bf h}'_1} \dots \Box_{{\bf h}_{k-2}, {\bf h}'_{k-2}} (F_{\bf y}({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) e( \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} )) | {\bf h}_1,{\bf h}'_1,\dots, {\bf h}_{k-2}, {\bf h}'_{k-2} )| \gg \delta^{O(1)} \ \ \ \ \ (9)

with probability {\gg \delta^{O(1)}}.

The left-hand side can be rewritten as

\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{{\bf h}_1 - {\bf h}'_1} \dots \Delta_{{\bf h}_{k-2} - {\bf h}'_{k-2}} F_y( x + {\bf h}'_1 + \dots + {\bf h}'_{k-2})

\displaystyle e( \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot x )|

where {\delta_{{\bf h}_1, {\bf h}'_1}} is the additive version of {\Box_{{\bf h}_1, {\bf h}'_1}}, thus

\displaystyle \delta_{{\bf h}_1, {\bf h}'_1} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) := \phi({\bf h}_1,\dots,{\bf h}_{k-2}) - \phi({\bf h}'_1,\dots,{\bf h}_{k-2}).

Translating {x}, we can simplify this a little to

\displaystyle |\mathop{\bf E}_{x \in G, y \in Y} \Delta_{{\bf h}_1 - {\bf h}'_1} \dots \Delta_{{\bf h}_{k-2} - {\bf h}'_{k-2}} F_y( x ) e( \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot x )|

If the frequency {\delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2})} is ever non-vanishing in the event (9) then conclusion (ii) applies. We conclude that

\displaystyle \delta_{{\bf h}_1, {\bf h}'_1} \dots \delta_{{\bf h}_{k-2}, {\bf h}'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = 0

with probability {\gg \delta^{O(1)}}. In particular, by the pigeonhole principle, there exist {h'_1,\dots,h'_{k-2} \in G} such that

\displaystyle \delta_{{\bf h}_1, h'_1} \dots \delta_{{\bf h}_{k-2}, h'_{k-2}} \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = 0

with probability {\gg \delta^{O(1)}}. Expanding this out, we obtain a representation of the form

\displaystyle \phi({\bf h}_1,\dots,{\bf h}_{k-2}) = \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2})

holding with probability {\gg \delta^{O(1)}}, where the {\phi_i: G^{k-2} \rightarrow \hat G} are functions that do not depend on the {i^{th}} coordinate. From (8) we conclude that

\displaystyle \| \Delta_{h_1} \dots \Delta_{h_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i(h_1,\dots,h_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x})} \gg \delta^{O(1)}

for {\gg \delta^{O(1)}} of the tuples {(h_1,\dots,h_{k-2}) \in G^{k-2}}. Thus by Lemma 5(ii)

\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}; {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}.

By repeated application of Lemma 5(iii) we then have

\displaystyle \| \Delta_{{\bf h}_1} \dots \Delta_{{\bf h}_{k-2}} f({\bf x}) e( \sum_{i=1}^{k-2} \phi_i({\bf h}_1,\dots,{\bf h}_{k-2}) \cdot {\bf x} ) \|_{\mathrm{CUT}({\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}

and then by repeated application of Lemma 5(iv)

\displaystyle \| f({\bf x} + {\bf h}_1 + \dots + {\bf h}_{k-2}) \|_{\mathrm{CUT}({\bf x}, {\bf h}_1,\dots,{\bf h}_{k-2})} \gg \delta^{O(1)}

and then the conclusion (i) follows from Lemma 6. \Box
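
The identity that opens the proof above, {\|f\|_{U^k(G)}^{2^k} = \mathop{\bf E}_{h_1,\dots,h_{k-2} \in G} \|\Delta_{h_1} \dots \Delta_{h_{k-2}} f \|_{U^2(G)}^4}, can be checked numerically in the case {k=3} on {{\bf Z}/7{\bf Z}}. This is only a sketch: the left-hand side is computed by direct averaging and the {U^2} norms on the right via the Fourier identity {\|g\|_{U^2}^4 = \sum_\xi |\hat g(\xi)|^4}; the function names are illustrative.

```python
import numpy as np

def u2_fourth_power(g):
    """||g||_{U^2}^4 on Z/NZ via sum_xi |ghat(xi)|^4."""
    ghat = np.fft.fft(g) / len(g)
    return np.sum(np.abs(ghat) ** 4)

def delta(g, h):
    """Multiplicative derivative Delta_h g(x) = g(x+h) conj(g(x)) on Z/NZ."""
    return np.roll(g, -h) * np.conj(g)

def u3_eighth_power(f):
    """||f||_{U^3}^8 by direct averaging of three nested derivatives of f over x and the shifts."""
    N = len(f)
    total = 0
    for h1 in range(N):
        g1 = delta(f, h1)
        for h2 in range(N):
            g2 = delta(g1, h2)
            for h3 in range(N):
                total += delta(g2, h3).mean()
    return (total / N ** 3).real

N = 7
rng = np.random.default_rng(4)
f = rng.uniform(-1, 1, size=N) + 1j * rng.uniform(-1, 1, size=N)

lhs = u3_eighth_power(f)
# Right-hand side with k = 3: average ||Delta_h f||_{U^2}^4 over the single shift h.
rhs = np.mean([u2_fourth_power(delta(f, h)) for h in range(N)])
```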

As an application of degree lowering, we give an inverse theorem for the average in Proposition 1(ii), first established by Bourgain-Chang and later reproved by Peluse (by different methods from those given here):

Proposition 9 Let {G = {\bf Z}/N{\bf Z}} be a cyclic group of prime order. Suppose that one has {1}-bounded functions {f_1,f_2,f_3: G \rightarrow {\bf C}} such that

\displaystyle |\mathop{\bf E}_{x, h \in G} f_1(x) f_2(x+h) f_3(x+h^2)| \geq \delta \ \ \ \ \ (10)

for some {\delta > 0}. Then either {N \ll \delta^{-O(1)}}, or one has

\displaystyle |\mathop{\bf E}_{x \in G} f_1(x)|, |\mathop{\bf E}_{x \in G} f_2(x)| \gg \delta^{O(1)}.

We remark that a modification of the arguments below also gives {|\mathop{\bf E}_{x \in G} f_3(x)| \gg \delta^{O(1)}}.

Proof: The left-hand side of (10) can be written as

\displaystyle |\mathop{\bf E}_{x \in G} F(x) f_3(x)|

where {F} is the dual function

\displaystyle F(x) := \mathop{\bf E}_{h \in G} f_1(x-h^2) f_2(x-h^2+h).

By Cauchy-Schwarz one thus has

\displaystyle |\mathop{\bf E}_{x \in G} F(x) \overline{F}(x)| \geq \delta^2

and hence by Proposition 1(ii) (applied with {f_3} replaced by {\overline{F}}), we either have {N \ll \delta^{-O(1)}} (in which case we are done) or

\displaystyle \|F\|_{U^4(G)} \gg \delta^2.

Writing {F = \mathop{\bf E}_{h \in G} F_h} with {F_h(x) := f_1(x-h^2) f_2(x-h^2+h)}, we conclude from Lemma 8 that either {\|F\|_{U^3(G)} \gg \delta^{O(1)}}, or that

\displaystyle |\mathop{\bf E}_{x,h \in G} \Delta_{h_1} \Delta_{h_2} F_h(x) e(\xi x / N )| \gg \delta^{O(1)}

for some {h_1,h_2 \in G} and non-zero {\xi \in G}. The left-hand side can be rewritten as

\displaystyle |\mathop{\bf E}_{x,h \in G} g_1(x-h^2) g_2(x-h^2+h) e(\xi x/N)|

where {g_1 = \Delta_{h_1} \Delta_{h_2} f_1} and {g_2 = \Delta_{h_1} \Delta_{h_2} f_2}. We can rewrite this in turn as

\displaystyle |\mathop{\bf E}_{x,y \in G} g_1(x) g_2(y) e(\xi (x + (y-x)^2) / N)|

which is bounded by

\displaystyle \| e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}, {\bf y})}

where {{\bf x}, {\bf y}} are independent random variables drawn uniformly from {G}. Applying Lemma 5(v), we conclude that

\displaystyle \| \Box_{{\bf y}, {\bf y}'} e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}; {\bf y}, {\bf y}')} \gg \delta^{O(1)}.

However, a routine Gauss sum calculation reveals that the left-hand side is {O(N^{-c})} for some absolute constant {c>0} because {\xi} is non-zero, so that {N \ll \delta^{-O(1)}}. The only remaining case to consider is when

\displaystyle \|F\|_{U^3(G)} \gg \delta^{O(1)}.

Repeating the above arguments we then conclude that

\displaystyle \|F\|_{U^2(G)} \gg \delta^{O(1)},

and then

\displaystyle \|F\|_{U^1(G)} \gg \delta^{O(1)}.

The left-hand side can be computed to equal {|\mathop{\bf E}_{x \in G} f_1(x)| |\mathop{\bf E}_{x \in G} f_2(x)|}, and the claim follows. \Box
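
Two of the computations in this proof are easy to test numerically on {{\bf Z}/11{\bf Z}}. The sketch below builds the dual function {F} for randomly chosen {1}-bounded {f_1, f_2} and checks that {\|F\|_{U^1(G)} = |\mathop{\bf E} F|} factors as {|\mathop{\bf E} f_1| |\mathop{\bf E} f_2|}; it also evaluates the cut norm {\| \Box_{{\bf y}, {\bf y}'} e(\xi({\bf x} + ({\bf y}-{\bf x})^2)/N) \|_{\mathrm{CUT}({\bf x}; {\bf y}, {\bf y}')}} via Example 4 for one non-zero {\xi}. The specific values {N=11} and {\xi=3} and the variable names are arbitrary choices for illustration.

```python
import numpy as np

N = 11                                             # prime modulus
rng = np.random.default_rng(5)
# Random 1-bounded functions: random phase times a random modulus in [0, 1).
f1 = np.exp(2j * np.pi * rng.uniform(size=N)) * rng.uniform(size=N)
f2 = np.exp(2j * np.pi * rng.uniform(size=N)) * rng.uniform(size=N)

x = np.arange(N)
# Dual function F(x) = E_h f1(x - h^2) f2(x - h^2 + h).
F = np.mean([f1[(x - h * h) % N] * f2[(x - h * h + h) % N] for h in range(N)], axis=0)

u1_of_F = abs(F.mean())                            # ||F||_{U^1} = |E_x F(x)|
product = abs(f1.mean()) * abs(f2.mean())          # |E f1| |E f2|

# Phase e(xi (x + (y - x)^2) / N) as a function of x, for fixed y.
xi = 3
def phase(y):
    return np.exp(2j * np.pi * xi * (x + (y - x) ** 2) / N)

# By Example 4 the cut norm of Box_{y,y'} of this phase is E_{y,y'} |E_x phase(y) conj(phase(y'))|.
box_cut = np.mean([abs((phase(y) * np.conj(phase(yp))).mean())
                   for y in range(N) for yp in range(N)])
```

In this example the inner average over {x} is a linear exponential sum that vanishes unless {y = y'}, so `box_cut` comes out as {1/N}, consistent with the power saving {O(N^{-c})} used in the proof.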

This argument was given for the cyclic group setting, but the argument can also be applied to the integers (see Peluse-Prendiville) and can also be used to establish an analogue over the reals (that was first obtained by Bourgain).