In the previous lecture notes, we used (linear) Fourier analysis to control the number of three-term arithmetic progressions {a, a+r, a+2r} in a given set {A}. The power of the Fourier transform for this problem ultimately stemmed from the identity

\displaystyle \mathop{\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} 1_A(n) 1_A(n+r) 1_A(n+2r)

\displaystyle = \sum_{\alpha \in \frac{1}{N'}{\bf Z} / {\bf Z}} \hat 1_A(\alpha) \hat 1_A(-2\alpha) \hat 1_A(\alpha) \ \ \ \ \ (1)

for any cyclic group {{\bf Z}/N'{\bf Z}} and any subset {A} of that group (analogues of this identity also exist for other finite abelian groups, and to a lesser extent to non-abelian groups also, although that is not the focus of my current discussion). As it turns out, linear Fourier analysis is not able to discern higher order patterns, such as arithmetic progressions of length four; we give some demonstrations of this below the fold, taking advantage of the polynomial recurrence theory from Notes 1.

The main objective of this course is to introduce the (still nascent) theory of higher order Fourier analysis, which is capable of studying higher order patterns. The full theory is still rather complicated (at least, at our present level of understanding). However, one aspect of the theory is relatively simple, namely that we can largely reduce the study of arbitrary additive patterns to the study of a single type of additive pattern, namely the parallelopipeds

\displaystyle ( x + \omega_1 h_1 + \ldots + \omega_d h_d )_{\omega_1,\ldots,\omega_d \in \{0,1\}}. \ \ \ \ \ (2)

Thus for instance, for {d=1} one has the line segments

\displaystyle x, x+h_1 \ \ \ \ \ (3)

for {d=2} one has the parallelograms

\displaystyle x, x+h_1, x+h_2, x+h_1+h_2, \ \ \ \ \ (4)

for {d=3} one has the parallelopipeds

\displaystyle x, x+h_1, x+h_2, x+h_3, x+h_1+h_2, x+h_1+h_3, x+h_2+h_3, x+h_1+h_2+h_3. \ \ \ \ \ (5)

These patterns are particularly pleasant to handle, thanks to the large number of symmetries available on the discrete cube {\{0,1\}^d}. For instance, whereas establishing the presence of arbitrarily long arithmetic progressions in dense sets is quite difficult (Szemerédi’s theorem), establishing arbitrarily high-dimensional parallelopipeds is much easier:

Exercise 1 Let {A \subset [N]} be such that {|A| > \delta N} for some {0 < \delta \leq 1}. If {N} is sufficiently large depending on {\delta}, show that there exists an integer {1 \leq h \ll 1/\delta} such that {|A \cap (A+h)| \gg \delta^2 N}. (Hint: obtain upper and lower bounds on the set {\{ (x,y) \in A \times A: x < y \leq x + 10/\delta \}}.)

Exercise 2 (Hilbert cube lemma) Let {A \subset [N]} be such that {|A| > \delta N} for some {0 < \delta \leq 1}, and let {d \geq 1} be an integer. Show that if {N} is sufficiently large depending on {\delta,d}, then {A} contains a parallelopiped of the form (2), with {1 \leq h_1,\ldots,h_d \ll_\delta 1} positive integers. (Hint: use the previous exercise and induction.) Conclude that if {A \subset {\bf Z}} has positive upper density, then it contains infinitely many such parallelopipeds for each {d}.

Exercise 3 Show that if {q \geq 1} is an integer, and {d} is sufficiently large depending on {q}, then for any parallelopiped (2) in the integers {{\bf Z}}, there exists {\omega_1,\ldots,\omega_d \in \{0,1\}}, not all zero, such that {x + h_1 \omega_1 + \ldots + h_d \omega_d = x \hbox{ mod } q}. (Hint: pigeonhole the {h_i} in the residue classes modulo {q}.) Use this to conclude that if {A} is the set of all integers {n} such that {|n-km!| \geq m} for all integers {k, m \geq 1}, then {A} is a set of positive upper density (and also positive lower density) which does not contain any infinite parallelopipeds (thus one cannot take {d=\infty} in the Hilbert cube lemma).

The standard way to control the parallelogram patterns (and thus, all other (finite complexity) linear patterns) are the Gowers uniformity norms

\displaystyle \| f\|_{U^d(G)} := {\bf E}_{x,h_1,\ldots,h_d \in G} \prod_{\omega_1,\ldots,\omega_d \in \{0,1\}^d} {\mathcal C}^{\omega_1+\ldots+\omega_d} f(x+\omega_1 h_1 + \ldots + \omega_d h_d) \ \ \ \ \ (6)

with {f: G \rightarrow {\bf C}} a function on a finite abelian group {G}, and {{\mathcal C}: z \mapsto \overline{z}} is the complex conjugation operator; analogues of this norm also exist for group-like objects such as the progression {[N]}, and also for measure-preserving systems (where they are known as the Gowers-Host-Kra uniformity seminorms, see this paper of Host-Kra for more discussion). In this set of notes we will focus on the basic properties of these norms; the deepest fact about them, known as the inverse conjecture for these norms, will be discussed in later notes.

— 1. Linear Fourier analysis does not control length four progressions —

Let {A \subset {\bf Z}/N{\bf Z}} be a subset of a cyclic group {{\bf Z}/N{\bf Z}} with density {|A| = \delta N}; we think of {0 < \delta \leq 1} as being fixed, and {N} as being very large or goingn off to infinity.

For each {k \geq 1}, consider the number

\displaystyle \{ (n,r) \in {\bf Z}/N{\bf Z} \times {\bf Z}/N{\bf Z}: n, n+r, \ldots, n+(k-1)r \in A \} \ \ \ \ \ (7)

of {k}-term arithmetic progressions in {A} (including degenerate progressions). Heuristically, this expression should typically be close to {\delta^k N^2}. Since there are {N^2} pairs {(n,r)} and we would expect each pair to have a {\delta^k} “probability” that {n, n+r, \ldots, n+(k-1)r} simultaneously lie in {A}. Indeed, using standard probabilistic tools such as Chernoff’s inequality, it is not difficult to justify this heuristic with probability asymptotically close to {1} in the case that {A} is a randomly chosen set of the given density.

Let’s see how this heuristic holds up for small values of {k}. For {k=0,1,2}, this prediction is exactly accurate (with no error term) for any set {A} with cardinality {\delta N}; no randomness hypothesis of any sort is required. For {k=3}, we see from (1) and the observaation that {\hat 1_A(0)=\delta} that (7) is given by the formula

\displaystyle N^2 (\delta^3 + \sum_{\xi \in {\bf Z}/N{\bf Z}: \xi \neq 0} \hat 1_A(\xi)^2 \hat 1_A(-2\xi) ).

Let us informally say that {A} is Fourier-pseudorandom if one has

\displaystyle \sup_{\xi \in {\bf Z}/N{\bf Z}: \xi \neq 0} |\hat 1_A(\xi)| = o(1)

where {o(1)} is a quantity that goes to zero as {N \rightarrow \infty}. Then from applying Plancherel’s formula and Cauchy-Schwarz as in the previous lecture notes, we see that the number of three-term arithmetic progressions is

\displaystyle N^2 (\delta^3+o(1)).

Thus we see that the Fourier-pseudorandomness hypothesis allows us to count three-term arithmetic progressions almost exactly.

On the other hand, without the Fourier-pseudorandomness hypothesis, the count (7) can be significantly different from {\delta^3 N^2}. For instance, if {A} is an interval {A = [\delta N]}, then it is not hard to see that (7) is comparable to {\delta^2 N^2} rather than {\delta^3 N^2}; the point is that with a set as structured as an interval, once {n} and {n+r} lie in {A}, there is already a very strong chance that {n+2r} lies in {A} also. In the other direction, a construction of Behrend (mentioned in the previous notes) shows the quantity (7) can in fact dip below {\delta^C N^2} for any fixed {C} (and in fact one can be as small as {\delta^{c\log \frac{1}{\delta}} N^2} for some absolute constant {c>0}).

Now we consider the {k=4} case of (7), which counts four-term progressions. Here, it turns out that Fourier-pseudorandomness is insufficient; it is possible for the quantity (7) to be significantly larger or smaller than {\delta^4 N^2} even if {A} is pseudorandom, as was observed by Gowers (with a closely related observation in the context of ergodic theory by Furstenberg).

Exercise 4 Let {\alpha} be an irrational real number, let {0 < \delta < 1}, and let {A := \{ n \in [N]: 0 \leq \{ \alpha n^2 \} \leq \delta \}}. Show that {A} is Fourier-pseudorandom (keeping {\alpha} and {\delta} fixed and letting {N \rightarrow \infty}). Hint: One can use Exercise 21 from Notes 1 to show that sums of the form {\mathop{\bf E}_{n \in [N]} e(k \alpha n^2) e(\xi n)} cannot be large.

Exercise 5 Continuing the previous exercise, show that the expression (7) for {k=4} is equal to {(c \delta^3 + o(1)) N^2} as {N \rightarrow \infty}, for some absolute constant {c>0}, if {\delta > 0} is sufficiently small. (Hint: first show, using the machinery in Notes 1, that the two-dimensional sequence {(n,r) \mapsto (\alpha n^2, \alpha (n+r)^2, \alpha(n+2r)^2, \alpha(n+3r)^2) \hbox{ mod } {\bf Z}^4} is asymptotically equidistributed in the torus {\{ (x_1,x_2,x_3,x_4) \in {\bf T}^4: x_1-3x_2+3x_3-x_4=0 \}}.)

The above exercises show that a Fourier-pseudorandom set can have a four-term progression count (7) significantly larger than {\delta^4 N}. One can also make the count significantly smaller than {\delta^4 N} (another observation of Gowers), but this requires more work.

Exercise 6 Let {0 < \delta < 1}. Show that there exists a function {f: {\bf T}^2 \rightarrow [0,1]} with {\int_{\bf T} f(x,y)\ dy = \delta} for all {x \in {\bf T}}, such that the expression

\displaystyle \int_V f(x_1,y_1) \ldots f(x_4,y_4) \ \ \ \ \ (8)

is strictly less than {\delta^4}, where {V \leq ({\bf T}^2)^4} is the subspace of quadruplets {((x_1,y_1),\ldots,(x_4,y_4))} such that {x_1,\ldots,x_4} is in arithmetic progression (i.e. {x_i = x + i r} for some {x,r \in {\bf T}}) and the {y_1,\ldots,y_4} obey the constraint

\displaystyle y_1 - 3y_2 + 3y_3 - y_4 = 0.

Hint: Take {f} of the form

\displaystyle f(x,y) := \delta + \epsilon ( f_1(x) \cos(2\pi y) + f_3(x) \cos(6\pi y) )

where {\epsilon > 0} is a small number, and {f_1, f_3} are carefully chosen to make the {\epsilon^2} term in (8) negative.

Exercise 7 Show that there exists an absolute constant {c>0} such that for all sufficiently small {\delta > 0} and sufficiently large {N} (depending on {\delta}) and a set {A \subset [N]} with {|A| \geq \delta N}, such that (7) with {k=4} is less than {\delta^{4+c} N^2}. (Hint: take {\delta \sim 2^{-m}} for some {m \geq 1}, and let {A} be a random subset of {[N]} with each element {n} of {[N]} lying in {A} with an independent probability of

\displaystyle \prod_{j=1}^m f( \alpha_j n \hbox{ mod } 1, \alpha_j n^2 \hbox{ mod } 1 ),

where {f} is the function in the previous exercise (with {\delta=1/2}), and {\alpha_1,\ldots,\alpha_m} are real numbers which are linearly independent over {{\bf Z}} modulo {1}.)

For further discussion of this topic, see these slides of Wolf.

— 2. The {100\%} case —

Now we consider the question of counting more general linear (or affine) patterns than arithmetic progressions. A reasonably general setting is to count patterns of the form

\displaystyle \Psi(\vec x) := (\psi_1(\vec x), \ldots, \psi_t(\vec x))

in a subset {A} of a finite abelian group {G} (e.g. a cyclic group {G={\bf Z}/N{\bf Z}}), where {\vec x = (x_1,\ldots,x_d) \in G^d}, and the {\psi_1,\ldots,\psi_t: G^d \rightarrow G} are affine-linear forms

\displaystyle \psi_i(x_1,\ldots,x_d) = c_i + \sum_{j=1}^d c_{i,j} x_j

for some fixed integers {c_{i,j} \in {\bf Z}} and group elements {c_{i} \in G}. To avoid degeneracies, we will assume that all the {\psi_i} are surjective (or equivalently, that the {c_{i,1},\ldots,c_{i,d}} do not have a common factor that divides the order of {G}). This count would then be given by

\displaystyle |G|^d \Lambda_\Psi(1_A,\ldots,1_A)

where {\Lambda_\Psi} is the {d}-linear form

\displaystyle \Lambda_\Psi(f_1,\ldots,f_d) := \mathop{\bf E}_{\vec x \in G^d} f_1(\psi_1(\vec x)) \ldots f_t(\psi_t(\vec x)).

For instance, the task of counting arithmetic progressions {n,n+r,\ldots,n+(k-1)r} corresponds to the case {d=2, t=k}, and {\psi_i(x_1,x_2) := x_1 +(i-1) x_2}.

We have the trivial bound

\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| \leq \| f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)} \ \ \ \ \ (9)

where

\displaystyle \|f\|_{L^\infty(G)} := \sup_{x \in G} |f(x)|.

Remark 1 One can replace the {L^\infty} norm on {f_i} in (9) with an {L^{p_i}} norm for various values of {p_1,\ldots,p_t}. The set of all admissible {p_1,\ldots,p_t} is described by the Brascamp-Lieb inequality, see this paper for further discussion. We will not need these variants of (9).

Improving this trivial bound turns out to be a key step in the theory of counting general linear patterns. In particular, it turns out that for any {\epsilon > 0}, one usually has

\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| < \epsilon \| f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)}

except when {f_1,\ldots,f_t} take a very special form (or at least correlate with functions of a very special form, such as linear or higher order characters).

To reiterate: the key to the subject is to understand the inverse problem of characterising those functions {f_1,\ldots,f_d} for which one has

\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| \geq \epsilon \| f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)}.

This problem is of most interest (and the most difficult) in the “{1\%} world” when {\epsilon} is small (e.g. {\epsilon = 0.01}), but it is also instructive to consider the simpler cases of the “{99\%} world” when {\epsilon} is very close to one (e.g. {\epsilon = 0.99}), or the “{100\%} world” when {\epsilon} is exactly equal to one. In these model cases one can use additional techniques (error-correction and similar techniques (often of a theoretical computer science flavour) in the {99\%} world, or exact algebraic manipulation in the {100\%} world) to understand this expression.

Let us thus begin with analysing the {100\%} situation. Specifically, we assume that we are given functions {f_1,\ldots,f_t \in L^\infty(G)} with

\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| = \| f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)}

and wish to classify the functions {f_1,\ldots,f_t} as best we can. We will normalise all the norms on the right-hand side to be one, thus {|f_i(x)| \leq 1} for all {x \in G} and {i=1,\ldots,t}, and

\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| = 1. \ \ \ \ \ (10)

By the triangle inequality, we conclude that

\displaystyle \Lambda_\Psi(|f_1|,\ldots,|f_t|) \geq 1.

On the other hand, we have the crude bound

\displaystyle \Lambda_\Psi(|f_1|,\ldots,|f_t|) \leq 1.

Thus equality occurs, which (by the surjectivity hypothesis on all the {\psi_i}) shows that {|f_i(x)|=1} for all {x \in G} and {i=1,\ldots,t}. Thus we may write {f_i(x) = e(\phi_i(x))} for some phase functions {\phi_i: G \rightarrow {\bf R}/{\bf Z}}. We then have

\displaystyle \Lambda_\Psi(f_1,\ldots,f_t) = \mathop{\bf E}_{\vec x \in G^d} e( \sum_{i=1}^t \phi_i(\psi_i(\vec x)) )

and so from (10) one has the equation

\displaystyle \sum_{i=1}^t \phi_i(\psi_i(\vec x)) = c \ \ \ \ \ (11)

for all {\vec x \in G^d} and some constant {c}.

So the problem now reduces to the algebraic problem of solving functional equations such as (11). To illustrate this type of problem, let us consider a simple case when {d=2, t=3} and

\displaystyle \psi_1(x,y) = x; \psi_2(x,y) = y; \psi_3(x,y) = x+y

in which case we are trying to understand solutions {\phi_1, \phi_2, \phi_3: G \rightarrow {\bf R}/{\bf Z}} to the functional equation

\displaystyle \phi_1(x) + \phi_2(y) + \phi_3(x+y) = c. \ \ \ \ \ (12)

This equation involves three unknown functions {\phi_1,\phi_2,\phi_3}. But we can eliminate two of the functions by taking discrete derivatives. To motivate this idea, let us temporarily assume that {G} is the real line {{\bf R}} rather than a finite group, and that the functions {\phi_1, \phi_2, \phi_3} are smooth. If we then apply the partial derivative operator {\partial_x} to the above functional equation, one eliminates {\phi_2} and obtains

\displaystyle \phi'_1(x) + \phi'_3(x+y) = 0;

applying {\partial_y} then eliminates {\phi_1} and leaves us with

\displaystyle \phi''_3(x+y) = 0,

thus {\phi''_3} vanishes identically; we can integrate this twice to conclude that {\phi_3} is a linear function of its input,

\displaystyle \phi_3(x) = a_3 x + b_3

for some constants {a_3, b_3 \in {\bf R}}. A similar argument (using the partial derivative operator {\partial_x - \partial_y} to eliminate {\phi_3}, or by applying change of variables such as {(x,z) := (x,x+y)}) shows that {\phi_1(x) = a_1 x + b_1} and {\phi_2(x) = a_2 x + b_2} for some additional constants {a_1,b_1,a_2,b_2}. Finally, by returning to (12) and comparing coefficients we obtain the additional compatibility condition {a_3 = -a_1 = -a_2}, which one then easily verifies to completely describe all possible solutions to this equation in the case of smooth functions on {{\bf R}}.

Returning now to the discrete world, we mimic the continuous operation of a partial derivative by introducing difference operators

\displaystyle \partial_h \phi(x) := \phi(x+h) - \phi(x)

for {h \in G}. If we difference (12) in the {x} variable by an arbitrary shift {h \in G} by replacing {x} by {x+h} and then subtracting, we eliminate {\phi_2} and obtain

\displaystyle (\partial_h \phi_1)(x) + (\partial_h \phi_3)(x+y) = 0;

if we then difference in the {y} variable by a second arbitrary shift {k \in G}, one obtains

\displaystyle (\partial_k \partial_h \phi_3)(x+y)=0

for all {x,y,h,k \in G}; in particular, {\partial_k \partial_h \phi_3 \equiv 0} for all {k, h \in G}. Such functions are affine-linear:

Exercise 8 Let {\phi: G \rightarrow {\bf R}/{\bf Z}} be a function. Show that {\partial_k \partial_h \phi = 0} if and only if one has {\phi(x) = a(x) + b} for some {b \in G} and some homomorphism {a: G \rightarrow {\bf R}/{\bf Z}}. Conclude that the solutions to (12) are given by the form {\phi_i(x) = a_i(x) + b_i}, where {b_1,b_2,b_3 \in G} and {a_1,a_2,a_3: G \rightarrow {\bf R}/{\bf Z}} are homomorphisms with {a_1 = a_2 = - a_3}.

Having solved the functional equation (12), let us now look at an equation related to four term arithmetic progressions, namely

\displaystyle \phi_1(x) + \phi_2(x+y) + \phi_3(x+2y) + \phi_4(x+3y) = c \ \ \ \ \ (13)

for all {x,y \in G}, some constant {c \in {\bf R}/{\bf Z}}, and some functions {\phi_1,\phi_2,\phi_3,\phi_4: G \rightarrow {\bf R}/{\bf Z}}. We will try to isolate {\phi_4} by using discrete derivatives as before to eliminate the other functions. Firstly, we differentiate in the {y} direction by an arbitrary shift {h \in G}, leading to

\displaystyle (\partial_h \phi_2)(x+y) + (\partial_{2h} \phi_3)(x+2y) + (\partial_{3h} \phi_4)(x+3y) = 0.

In preparation for then eliminating {\phi_2}, we shift {x} backwards by {y}, obtaining

\displaystyle (\partial_h \phi_2)(x) + (\partial_{2h} \phi_3)(x+y) + (\partial_{3h} \phi_4)(x+2y) = 0.

Differentiating in the {y} direction by another arbitrary shift {k \in G}, we obtain

\displaystyle (\partial_k \partial_{2h} \phi_3)(x+y) + (\partial_{2k} \partial_{3h} \phi_4)(x+2y) = 0.

We shift {x} backwards by {y} again:

\displaystyle (\partial_k \partial_{2h} \phi_3)(x) + (\partial_{2k} \partial_{3h} \phi_4)(x+y) = 0.

One final differentiation in {y} by an arbitrary shift {l \in G} gives

\displaystyle (\partial_l \partial_{2k} \partial_{3h} \phi_4)(x+y) = 0.

For simplicity, we now make the assumption that the order {|G|} of {G} is not divisible by either {2} or {3}, so that the homomorphisms {k \mapsto 2k} and {h \mapsto 3h} are automorphisms of {G}. We conclude that

\displaystyle \partial_l \partial_k \partial_h \phi_4 \equiv 0 \ \ \ \ \ (14)

for all {l,k,h}. Such functions will be called quadratic functions from {G} to {{\bf R}/{\bf Z}}, thus {\phi_4} is quadratic. A similar argument shows that {\phi_1,\phi_2,\phi_3} are quadratic.

Just as (affine-)linear functions can be completely described in terms of homomorphisms, quadratic functions can be described in terms of bilinear forms, as long as one avoids the characteristic {2} case:

Exercise 9 Let {G} be a finite abelian group with {|G|} not divisible by {2}. Show that a map {\phi: G \rightarrow {\bf R}/{\bf Z}} is quadratic if and only one has a representation of the form

\displaystyle \phi(x) = B(x,x) + L(x) + c

where {c \in {\bf R}/{\bf Z}}, {L: G \rightarrow {\bf R}/{\bf Z}} is a homomorphism, and {B: G \times G \rightarrow {\bf R}/{\bf Z}} is a symmetric bihomomorphism (i.e. {B(x,y) = B(y,x)}, and {B} is a homomorphism in each of {x,y} individually (holding the other variable fixed)). (Hint: Heuristically, one should set {B(h,k) := \frac{1}{2} \partial_h \partial_k \phi(x)}, but there is a difficulty because the operation of dividing by {\frac{1}{2}} is not well-defined on {{\bf R}/{\bf Z}}. It is, however, well-defined on {|G|^{th}} roots of unity, thanks to {|G|} not being divisible by two. Once {B} has been constructed, subtract it off and use Exercise 8.) What goes wrong when {|G|} is divisible by {2}?

Exercise 10 Show that when {|G|} is not divisible by {2, 3}, that the complete solution to (13) is given by

\displaystyle \phi_i(x) = B_i(x,x) + L_i(x) + c_i

for {i=1,2,3,4}, {c_i \in {\bf R}/{\bf Z}}, homomorphisms {L_i: G \rightarrow{\bf R}/{\bf Z}}, and symmetric bihomomorphisms {B_i: G \times G \rightarrow {\bf R}/{\bf Z}} with {B_2 = -3B_1, B_3 = 3B_1, B_4 = - B_1} and {L_1 + L_2 + L_3 + L_4 = L_2 + 2L_3 + 3L_4 = 0}.

Exercise 11 Obtain a complete solution to the functional equation (13) in the case when {|G|} is allowed to be divisible by {2} or {3}. (This is an open-ended and surprisingly tricky exercise; it of course depends on what one is willing to call a “solution” to the problem. Use your own judgement.)

Exercise 12 Call a map {\phi: G \rightarrow {\bf R}/{\bf Z}} a polynomial of degree {\leq d} if one has {\partial_{h_1} \ldots \partial_{h_{d+1}} \phi(x) = 0} for all {x,h_1,\ldots,h_{d+1} \in G}. Show that if {k \geq 1} and {\phi_1,\ldots,\phi_k} obey the functional equation

\displaystyle \phi_1(x) + \phi_2(x+y) + \ldots + \phi_k(x+(k-1)y) = c

and {|G|} is not divisible by any integer between {2} and {k-1}, then {\phi_1,\ldots,\phi_k} are polynomials of degree {\leq k-2}.

We are now ready to turn to the general case of solving equations of the form (11). We relied on two main tricks to solve these equations: differentiation, and change of variables. When solving an equation such as (13), we alternated these two tricks in turn. To handle the general case, it is more convenient to rearrange the argument by doing all the change of variables in advance. For instance, another way to solve (13) is to first make the (non-injective) change of variables

\displaystyle (x,y) := (b+2c+3d, -a-b-c-d)

for arbitrary {a,b,c,d \in G}, so that

\displaystyle (x,x+y,x+2y,x+3y) = (b+2c+3d, -a+c+2d, -2a-b+d, -3a-2b-c)

and (13) becomes

\displaystyle \phi_1(b+2c+3d) + \phi_2(-a+c+2d) + \phi_3(-2a-b+d) + \phi_4(-3a-2b-c) = const \ \ \ \ \ (15)

for all {a,b,c,d \in G}. The point of performing this change of variables is that while the {\phi_4} term (for instance) involves all the three variables {a,b,c}, the remaining terms only depend on two of the {a,b,c} at a time. If we now pick {h,k,l \in G} arbitrarily, and then differentiate in the {a,b,c} variables by the shifts {h,k,l} respectively, then we eliminate the {\phi_1, \phi_2, \phi_3} terms and arrive at

\displaystyle (\partial_{-l} \partial_{-2k} \partial_{-3h} \phi_4)(-3a-2b-c) = 0

which soon places us back at (14) (assuming as before that {|G|} is not divisible by {2} or {3}).

Now we can do the general case, once we put in place a definition (from this paper of mine with Ben Green):

Definition 1 (Cauchy-Schwarz complexity) A system {\psi_1,\ldots,\psi_t: G^d \rightarrow G} of affine-linear forms (with linear coefficients in {{\bf Z}}) have Cauchy-Schwarz complexity at most {s} if, for every {1 \leq i \leq t}, one can partition {[t] \backslash \{i\}} into {s+1} classes (some of which may be empty), such that {\psi_i} does not lie in the affine-linear span (over {{\bf Q}}) of the forms in any of these classes. The Cauchy-Schwarz complexity of a system is defined to be the least such {s} with this property, or {\infty} if no such {s} exists.

The adjective “Cauchy-Schwarz” (introduced by Gowers and Wolf) may be puzzling at present, but will be motivated later.

This is a somewhat strange definition to come to grips with at first, so we illustrate it with some examples. The system of forms {x,y,x+y} is of complexity {1}; given any form here, such as {y}, one can partition the remaining forms into two classes, namely {\{x\}} and {\{x+y\}}, such that {y} is not in the affine-linear span of either. On the other hand, as {y} is in the affine linear span of {\{x,x+y\}}, the Cauchy-Schwarz complexity is not zero.

Exercise 13 Show that for any {k \geq 2}, the system of forms {x, x+y, \ldots, x+(k-1)y} has complexity {k-2}.

Exercise 14 Show that a system of non-constant forms has finite Cauchy-Schwarz complexity if and only if no form is an affine-linear combination of another.

There is an equivalent way to formulate the notion of Cauchy-Schwarz complexity, in the spirit of the change of variables mentioned earlier. Define the characteristic of a finite abelian group {G} to be the least order of a non-identity element.

Proposition 2 (Equivalent formulation of Cauchy-Schwarz complexity) Let {\psi_1,\ldots,\psi_t: G^d \rightarrow G} be a system of affine-linear forms. Suppose that the characteristic of {G} is sufficiently large depending on the coefficients of {\psi_1,\ldots,\psi_t}. Then {\psi_1,\ldots,\psi_t} has Cauchy-Schwarz complexity at most {s} if and only if, for each {1 \leq i \leq t}, one can find a linear change of variables {\vec x = L_i( y_1,\ldots,y_{s+1}, z_1,\ldots,z_m )} over {{\bf Q}} such that the form {\dot \psi_i(L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_m))} has non-zero {y_1,\ldots,y_{s+1}} coefficients, but all the other forms {\dot \psi_j(L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_m))} with {j \neq i} have at least one vanishing {y_1,\ldots,y_{s+1}} coefficient, and {\dot \psi_i: {\bf Q}^d \rightarrow {\bf Q}} is the linear form induced by the integer coefficients of {\psi_i}.

Proof: To show the “only if” part, observe that if {1 \leq i \leq t} and {L_i} is as above, then we can partition the {\psi_j}, {j \neq i} into {s+1} classes depending on which {y_k} coefficient vanishes for {k=1,\ldots,s+1} (breaking ties arbitrarily), and then {\psi_i} is not representable as an affine-linear combination of the forms from any of these classes (here we use the large characteristic hypothesis). Conversely, suppose {\psi_1,\ldots,\psi_t} has Cauchy-Schwarz complexity at most {s}, and let {1 \leq i \leq s}. We can then partition the {j \neq i} into {s+1} classes {{\mathcal A}_1,\ldots,{\mathcal A}_{s+1}}, such that {\psi_i} cannot be expressed as an affine-linear combination of the {\psi_j} from {{\mathcal A}_k} for any {1 \leq k \leq s+1}. By duality, one can then find vectors {v_k \in {\bf Q}^d} for each {1 \leq k \leq s+1} such that {\dot \psi_i} does not annihilate {v_k}, but all the {\dot \psi_j} from {{\mathcal A}_k} do. If we then set

\displaystyle L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_d) := (z_1,\ldots,z_d) + y_1 v_1 + \ldots + y_{s+1} v_{s+1}

then we obtain the claim. \Box

Exercise 15 Let {\psi_1,\ldots,\psi_t: G^d \rightarrow G} be a system of affine-linear forms with Cauchy-Schwarz complexity at most {s}, and suppose that the equation (11) holds for some finite abelian group {G} and some {\phi_1,\ldots,\phi_t: G \rightarrow {\bf R}/{\bf Z}}. Suppose also that the characteristic of {G} is sufficiently large depending on the coefficients of {\psi_1,\ldots,\psi_t}. Conclude that all of the {\phi_1,\ldots,\phi_t} are polynomials of degree {\leq t}.

It turns out that this result is not quite best possible. Define the true complexity of a system of affine-linear forms {\psi_1,\ldots,\psi_t: G^d \rightarrow G} to be the largest {s} such that the powers {\dot \psi_1^s, \ldots, \dot\psi_t^s:{\bf Q}^d \rightarrow {\bf Q}} are linearly independent over {{\bf Q}}.

Exercise 16 Show that the true complexity is always less than or equal to the Cauchy-Schwarz complexity, and give an example to show that strict inequality can occur. Also, show that the true complexity is finite if and only if the Cauchy-Schwarz complexity is finite.

Exercise 17 Show that Exercise 15 continues to hold if Cauchy-Schwarz complexity is replaced by true complexity. (Hint: first understand the cyclic case {G = {\bf Z}/N{\bf Z}}, and use Exercise 15 to reduce to the case when all the {\phi_i} are polynomials of bounded degree. The main point is to use a “Lefschetz principle” to lift statements in {{\bf Z}/N{\bf Z}} to a characteristic zero field such as {{\bf Q}}.) Show that the true complexity cannot be replaced by any smaller quantity.

See this paper of Gowers and Wolf for further discussion of the relationship between Cauchy-Schwarz complexity and true complexity.

— 3. The Gowers uniformity norms —

In the previous section, we saw that equality in the trivial inequality (9) only occurred when the functions {f_1,\ldots,f_t} were of the form {f_i = e(\phi_i)} for some polynomials {\phi_i} of degree at most {s}, where {s} was the true complexity (or Cauchy-Schwarz complexity) of the system {\psi_1,\ldots,\psi_t}. Another way of phrasing this latter fact is that one has the identity

\displaystyle \Delta_{h_1} \ldots \Delta_{h_{s+1}} f_i(x) = 1

for all {h_1,\ldots,h_{s+1},x\in G}, where {\Delta_h} is the multiplicative derivative

\displaystyle \Delta_h f(x) := f(x+h) \overline{f(x)}.

This phenomenon extends beyond the “{100\%} world” of exact equalities. For any {f: G \rightarrow {\bf C}} and {d \geq 1}, we define the Gowers norm {\|f\|_{U^d(G)}} by the formula

\displaystyle \|f\|_{U^d(G)} := (\mathop{\bf E}_{h_1,\ldots,h_d,x \in G} \Delta_{h_1} \ldots \Delta_{h_d} f(x))^{1/2^d}; \ \ \ \ \ (16)

note that this is equivalent to (6). Using the identity

\displaystyle \mathop{\bf E}_{h,x \in G} \Delta_h f(x) = |\mathop{\bf E}_{x \in G} f(x)|^2

we easily verify that the expectation in the definition of (16) is a non-negative real. We also have the recursive formula

\displaystyle \|f\|_{U^d(G)} := (\mathop{\bf E}_{h\in G} \|\Delta_{h} f \|_{U^{d-1}(G)}^{2^{d-1}})^{1/2^d} \ \ \ \ \ (17)

for all {d \geq 1}.

The {U^1} norm essentially just the mean:

\displaystyle \|f\|_{U^1(G)} = |\mathop{\bf E}_{x \in G} f(x)|. \ \ \ \ \ (18)

As such, it is actually a seminorm rather than a norm.

The {U^2} norm can be computed in terms of the Fourier transform:

Exercise 18 (Fourier representation of {U^2}) Define the Pontryagin dual {\hat G} of a finite abelian group {G} to be the space of all homomorphisms {\xi: G \rightarrow {\bf R}/{\bf Z}}. For each function {f: G \rightarrow {\bf C}}, define the Fourier transform {\hat f: \hat G \rightarrow {\bf C}} by the formula {\hat f(\xi) := \mathop{\bf E}_{x \in G} f(x) e(-\xi(x))}. Establish the identity

\displaystyle \|f\|_{U^2(G)} = \| \hat f \|_{\ell^4(\hat G)} := (\sum_{\xi \in \hat G} |\hat f(\xi)|^4)^{1/4}.

In particular, the {U^2} norm is a genuine norm (thanks to the norm properties of {\ell^4(G)}, and the injectivity of the Fourier transform).

For the higher Gowers norms, there is not nearly as nice a formula known in terms of things like the Fourier transform, and it is not immediately obvious that these are indeed norms. But this can be established by introducing the more general Gowers inner product

\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)} := \mathop{\bf E}_{x,h_1,\ldots,h_d \in G} \prod_{\omega_1,\ldots,\omega_d \in \{0,1\}^d}

\displaystyle {\mathcal C}^{\omega_1+\ldots+\omega_d} f_{\omega_1,\ldots,\omega_d}(x+\omega_1 h_1 + \ldots + \omega_d h_d)

for any {2^d}-tuple {(f_\omega)_{\omega \in \{0,1\}^d}} of functions {f_\omega: G \rightarrow {\bf C}}, thus in particular

\displaystyle \langle (f)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)} = \|f\|_{U^d(G)}^{2^d}.

The relationship between the Gowers inner product and the Gowers uniformity norm is analogous to that between a Hilbert space inner product and the Hilbert space norm. In particular, we have the following analogue of the Cauchy-Schwarz inequality:

Exercise 19 (Cauchy-Schwarz-Gowers inequality) For any tuple {(f_\omega)_{\omega \in \{0,1\}^d}} of functions {f_\omega: G \rightarrow {\bf C}}, use the Cauchy-Schwarz inequality to show that

\displaystyle |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)}| \leq \prod_{j=0,1} |\langle (f_{\pi_{i,j}(\omega)})_{\omega \in \{0,1\}^d} \rangle_{U^d(G)}|^{1/2}

for all {1 \leq i \leq d}, where for {j=0,1} and {\omega \in \{0,1\}^d}, {\pi_{i,j}(\omega) \in \{0,1\}^d} is formed from {\omega} by replacing the {i^{th}} coordinate with {j}. Iterate this to conclude that

\displaystyle |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)}| \leq \prod_{\omega \in \{0,1\}^d} \|f_\omega\|_{U^d(G)}.

Then use this to conclude the monotonicity formula

\displaystyle \|f\|_{U^d(G)} \leq \|f\|_{U^{d+1}(G)}

for all {d \geq 1}, and the triangle inequality

\displaystyle \|f+g\|_{U^d(G)} \leq \|f\|_{U^d(G)} + \|g\|_{U^d(G)}

for all {f, g: G \rightarrow {\bf C}}. (Hint: For the latter inequality, raise both sides to the power {2^d} and expand the left-hand side.) Conclude in particular that the {U^d(G)} norms are indeed norms for all {d \geq 2}.

The Gowers uniformity norms can be viewed as a quantitative measure of how well a given function behaves like a polynomial. One piece of evidence in this direction is:

Exercise 20 (Inverse conjecture for the Gowers norm, {100\%} case) Let {f: G \rightarrow {\bf C}} be such that {\|f\|_{L^\infty(G)}=1}, and let {s \geq 0}. Show that {\|f\|_{U^{s+1}(G)} \leq 1}, with equality if and only if {f = e(\phi)} for some polynomial {\phi: G \rightarrow {\bf R}/{\bf Z}} of degree at most {s}.

The problem of classifying smaller values of {\|f\|_{U^{s+1}(G)}} is significantly more difficult, and will be discussed in later notes.

Exercise 21 (Polynomial phase invariance) If {f: G \rightarrow {\bf C}} is a function and {\phi: G \rightarrow {\bf R}/{\bf Z}} is a polynomial of degree at most {s}, show that {\| e(\phi) f \|_{U^{s+1}(G)} = \|f\|_{U^{s+1}(G)}}. Conclude in particular that

\displaystyle \sup_{\phi} |\mathop{\bf E}_{x \in G} e( \phi(x) ) f(x) | \leq \|f\|_{U^{s+1}(G)}

where {\phi} ranges over polynomials of degree at most {s}.

The main utility for the Gowers norms in this subject comes from the fact that they control many other expressions of interest. Here is a basic example:

Exercise 22 Let {f: G \rightarrow {\bf C}} be a function, and for each {1 \leq i \leq s+1}, let {g_i: G^{s+1} \rightarrow {\bf C}} be a function bounded in magnitude by {1} which is independent of the {i^{th}} coordinate of {G^{s+1}}. Let {a_1,\ldots,a_{s+1}} be non-zero integers, and suppose that the characteristic of {G} exceeds the magnitude of any of the {a_i}. Show that

\displaystyle |\mathop{\bf E}_{x_1,\ldots,x_{s+1} \in G} f(a_1 x_1 + \ldots + a_{s+1} x_{s+1}) \prod_{i=1}^{s+1} g_i(x_1,\ldots,x_{s+1})|

\displaystyle \leq \|f\|_{U^{s+1}(G)}.

Hint: induct on {s} and use (17) and the Cauchy-Schwarz inequality.

This gives us an analogue of Exercise 15:

Exercise 23 (Generalised von Neumann inequality) Let {\Psi = (\psi_1,\ldots,\psi_t)} be a collection of affine-linear forms {\psi_i: G^d \rightarrow G} with Cauchy-Schwarz complexity {s}. If the characteristic of {G} is sufficiently large depending on the linear coefficients of {\psi_1,\ldots,\psi_t}, show that one has the bound

\displaystyle |\Lambda_\Psi(f_1,\ldots,f_t)| \leq \inf_{1 \leq i \leq t} \|f_i\|_{U^{s+1}(G)}

whenever {f_1,\ldots,f_t: G \rightarrow {\bf C}} are bounded in magnitude by one.

Conclude in particular that if {A} is a subset of {G} with {|A|=\delta|G|}, then

\displaystyle \Lambda_\Psi(1_A,\ldots,1_A) = \delta^t + O_t( \|1_A - \delta \|_{U^{s+1}(G)} ).

From the above inequality, we see that if {A} has some positive density {\delta > 0} but has much fewer than {\delta^t N^d / 2} (say) patterns of the form {\psi_1(\vec x), \ldots, \psi_t(\vec x)} with {\vec x \in G^d}, then we have

\displaystyle \|1_A - \delta \|_{U^{s+1}(G)} \gg_{t,\delta} 1.

This is the initial motivation for studying inverse theorems for the Gowers norms, which give necessary conditions for a (bounded) function to have large {U^{s+1}(G)} norm. This will be a focus of subsequent notes.