In the previous lectures, we have focused mostly on the equidistribution or linear patterns on a subset of the integers {{\bf Z}}, and in particular on intervals {[N]}. The integers are of course a very important domain to study in additive combinatorics; but there are also other fundamental model examples of domains to study. One of these is that of a vector space {V} over a finite field {{\bf F} = {\bf F}_p} of prime order. Such domains are of interest in computer science (particularly when {p=2}) and also in number theory; but they also serve as an important simplified “dyadic model” for the integers. See this survey article of Green for further discussion of this point.

The additive combinatorics of the integers {{\bf Z}}, and of vector spaces {V} over finite fields, are analogous, but not quite identical. For instance, the analogue of an arithmetic progression in {{\bf Z}} is a subspace of {V}. In many cases, the finite field theory is a little bit simpler than the integer theory; for instance, subspaces are closed under addition, whereas arithmetic progressions are only “almost” closed under addition in various senses. (For instance, {[N]} is closed under addition approximately half of the time.) However, there are some ways in which the integers are better behaved. For instance, because the integers can be generated by a single generator, a homomorphism from {{\bf Z}} to some other group {G} can be described by a single group element {g}: {n \mapsto g^n}. However, to specify a homomorphism from a vector space {V} to {G} one would need to specify one group element for each dimension of {V}. Thus we see that there is a tradeoff when passing from {{\bf Z}} (or {[N]}) to a vector space model; one gains a bounded torsion property, at the expense of conceding the bounded generation property. (Of course, if one wants to deal with arbitrarily large domains, one has to concede one or the other; the only additive groups that have both bounded torsion and boundedly many generators, are bounded.)

The starting point for this course (Notes 1) was the study of equidistribution of polynomials {P: {\bf Z} \rightarrow {\bf R}/{\bf Z}} from the integers to the unit circle. We now turn to the parallel theory of equidistribution of polynomials {P: V \rightarrow {\bf R}/{\bf Z}} from vector spaces over finite fields to the unit circle. Actually, for simplicity we will mostly focus on the classical case, when the polynomials in fact take values in the {p^{th}} roots of unity (where {p} is the characteristic of the field {{\bf F} = {\bf F}_p}). As it turns out, the non-classical case is also of importance (particularly in low characteristic), but the theory is more difficult; see these notes for some further discussion.

— 1. Polynomials: basic theory —

Throughout this section, {V} will be a finite-dimensional vector space over a finite field {{\bf F} = {\bf F}_p} of prime order {p}.

Recall from the previous notes that a function {P: V \rightarrow {\bf R}/{\bf Z}} is a function is a polynomial of degree at most {d} if

\displaystyle  \partial_{h_1} \ldots \partial_{h_{d+1}} P(x) = 0

for all {x,h_1,\ldots,h_{d+1} \in V}, where {\partial_h P(x) := P(x+h)-P(x)}. As mentioned in previous notes, this is equivalent to the assertion that the Gowers uniformity norm {\|e(P)\|_{U^{d+1}(V)}=1}. The space of polynomials of degree at most {d} will be denoted {\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}; it is clearly an additive group. Note that a polynomial of degree zero is the same thing as a constant function, thus {\hbox{Poly}_{\leq 0}(V \rightarrow {\bf R}/{\bf Z}) \equiv {\bf R}/{\bf Z}}.

An important special case of polynomials are the classical polynomials, which take values in {{\bf F}} (which we identify with the {p^{th}} roots of unity in {{\bf R}/{\bf Z}} in the obvious manner); the space of such polynomials of degree at most {d} will be denoted {\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})}; this is clearly a vector space over {{\bf F}}. The classical polynomials have a familiar description, once we use a basis to identify {V} with {{\bf F}^n}:

Exercise 1 Let {P: {\bf F}^n \rightarrow {\bf F}} be a function, and {d \geq 0} be an integer. Show that {P} is a (classical) polynomial of degree at most {d} if and only if one has a representation of the form

\displaystyle  P(x_1,\ldots,x_n) := \sum_{i_1,\ldots,i_n \geq 0: i_1+\ldots+i_n \leq d} c_{i_1,\ldots,i_n} x_1^{i_1} \ldots x_n^{i_n}

for some coefficients {c_{i_1,\ldots,i_n} \in {\bf F}}. Furthermore, show that we can restrict the exponents {i_1,\ldots,i_n} to lie in the range {\{0,\ldots,p-1\}}, and that once one does so, the representation is unique. (Hint: First establish the {d=1} case, which can be done for instance by a dimension counting argument, and then induct on dimension.)

Exercise 2 Show that the cardinality of {\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})} is at most {p^{\binom{d+\hbox{dim}(V)}{d}}}, with equality if and only if {d < p}.

Now we study more general polynomials. A basic fact here is that multiplying a polynomial by the characteristic {p} lowers the degree:

Lemma 1 If {P \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}, then {pP \in \hbox{Poly}_{\leq \max(d-p+1,0)}(V \rightarrow {\bf R}/{\bf Z})}.

Proof: Without loss of generality we may take {d \geq p-1}; an easy induction on {d} then shows it suffices to verify the base case {d=p-1}. Our task is now to show that {pP} is constant, or equivalently that {p \Delta_e P = 0} for all {e \in V}.

Fix {e}. The operator {1+\Delta_e} represents a shift by {e}. Since {pe=0}, we conclude that {(1+\Delta_e)^p P = P}. On the other hand, as {P} has degree at most {p-1}, {\Delta_e^p P = 0}, and so

\displaystyle  ((1+\Delta_e)^p - 1 - \Delta_e^p) P = 0.

Using the binomial formula, we can factorise the left-hand side as

\displaystyle  (1 + \frac{p-1}{2} \Delta_e + \ldots + \Delta_e^{p-2}) (p \Delta_e P) = 0.

The first factor can be inverted by Neumann series since {\Delta_e} acts nilpotently on polynomials. We conclude that {p\Delta_e P=0} as required. \Box

Exercise 3 Establish the identity

\displaystyle  p (T^j - 1) = (-1)^{p-1} (T^j - 1) (T-1) (T^2-1) \ldots (T^{p-1}-1)

\displaystyle  \hbox{ mod } T^p-1

for an indeterminate {T} and any integer {j}, by testing on {p^{th}} roots of unity. (This identity was communicated to me by Andrew Granville.) Use this to give an alternate proof of Lemma 1.

This classifies all polynomials in the high characteristic case {p>d}:

Corollary 2 If {p > d}, then {\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z}) = \hbox{Poly}_{\leq d}(V \rightarrow {\bf F}) + ({\bf R}/{\bf Z})}. In other words, every polynomial of degree at most {d} is the sum of a classical polynomial and a constant.

The situation is more complicated in the low characteristic case {p \leq d}, in which non-classical polynomials can occur (polynomials that are not simply a classical polynomial up to constants). For instance, consider the function {P: {\bf F}_2 \rightarrow {\bf R}/{\bf Z}} defined by {P(0)=0} and {P(1)=1/4}. One easily verifies that this is a (non-classical) quadratic (i.e. a polynomial of degree at most {2}), but is clearly not a shifted version of a classical polynomial since its range is not a shift of the second roots {\{0,1/2\} \hbox{ mod } 1} of unity.

Exercise 4 Let {P: {\bf F}_2 \rightarrow {\bf R}/{\bf Z}} be a function. Show that {P} is a polynomial of degree at most {d} if and only if the range of {P} is a translate of the {(2^d)^{th}} roots of unity (i.e. {2^d P} is constant).

For further discussion of non-classical polynomials, see this previous blog post. Henceforth we shall avoid this technical issue by restricting to the high characteristic case {p>d} (or equivalently, the low degree case {d<p}).

— 2. Equidistribution —

Let us now consider the equidistribution theory of a classical polynomial {P: V \rightarrow {\bf F}}, where we think of {{\bf F}} as being a fixed field (in particular, {p=O(1)}), and the dimension of {V} as being very large; {V} will play the role here that the interval {[N]} played in Notes 1. This theory is classical for linear and quadratic polynomials. The general theory was studied first by Ben Green and myself in the high characteristic case {p>d}, and extended to the low characteristic case by Kaufman and Lovett. An analogous theory surely exists for the non-classical case, although this is not currently in the literature.

The situation here is simpler because a classical polynomial can only take {p} values, so that in the equidistributed case one expects each value to be obtained about {|V|/p} times. Inspired by this, let us call a classical polynomial {P} {\delta}-equidistributed if one has

\displaystyle  |\{ x \in V: P(x) = a \} - |V|/p| \leq \delta |V|

for all {a \in {\bf F}}.

Exercise 5 Show that this is equivalent to the notion of {\delta}-equidistribution given in Notes 1, if one gives {{\bf F}} the metric induced from {{\bf R}/{\bf Z}}, and if one is willing to modify {\delta} by a multiplicative factor depending on {p} in the equivalences.

Before we study equidistribution in earnest, we first give a classical estimate.

Exercise 6 (Chevalley-Warning theorem) Let {V} be a finite dimensional space, and let {P: V \rightarrow {\bf F}} be a classical polynomial of degree less than {(p-1) \hbox{dim}(V)}. Show that {\sum_{x \in V} P(x) = 0}. (Hint: Identify {V} with {{\bf F}^n} for some {n} and apply Exercise 1. Use the fact that {\sum_{x \in {\bf F}} x^i = 0} for all {1 \leq i < p-1}, which can be deduced by using a change of variables {x \mapsto bx}.) If furthermore {P} has degree less than {\hbox{dim}(V)}, conclude that for every {a \in {\bf F}}, that {|\{ x \in V: P(x)=a \}|} is a multiple of {p}. (Hint: Apply Fermat’s little theorem to the quantity {(P-a)^{p-1}}.) In particular, if {x_0 \in V}, then there exists at least one further {x \in V} such that {P(x)=P(x_0)}.

If {P} has degree at most {d} and {x_0 \in V}, obtain the recurrence inequality

\displaystyle  |\{ x \in V: P(x) = P(x_0)\}| \gg_{p,d} |V|.

Hint: normalise {x_0=0}, then average the previous claim over all subspaces of {V} of a certain dimension.

The above exercise goes some way towards establishing equidistribution, by showing that every element in the image of {P} is attained a fairly large number of times. But additional techniques will be needed (together with additional hypotheses on {P}) in order to obtain full equidistribution. It will be convenient to work in the ultralimit setting. Define a limit classical polynomial {P: V \rightarrow {\bf F}} on a limit finite-dimensional vector space {V = \prod_{\alpha \rightarrow \alpha_\infty} V_\alpha} of degree at most {d} to be an ultralimit of classical polynomials {P_\alpha: V_\alpha \rightarrow {\bf F}} of degree at most {d} (we keep {{\bf F}} and {d} fixed independently of {\alpha}). We say that a limit classical polynomial {P} is equidistributed if one has

\displaystyle  |\{ x \in V: P(x) = a \}| = |V|/p + o(|V|)

for all {a \in {\bf F}}, where the cardinalities here are of course limit cardinalities.

Exercise 7 Let {V} be a limit finite-dimensional vector space. Show that a limit function {P: V \rightarrow {\bf F}} is a limit classical polynomial of degree at most {d} if and only if it is a classical polynomial of degree at most {d} (observing here that every limit vector space is automatically a vector space).

Exercise 8 Let {P = \lim_{\alpha \rightarrow \alpha_\infty} P_\alpha} be a limit classical polynomial. Show that {P} is equidistributed if and only if, for every {\delta > 0}, {P_\alpha} is {\delta}-equidistributed for {\alpha} sufficiently close to {\alpha_\infty}.

Exercise 9 Let {P: V \rightarrow {\bf F}} be a limit classical polynomial which is linear (i.e. of degree at most {1}). Show that {P} is equidistributed if and only if {P} is non-constant.

There is an analogue of the Weyl criterion in this setting. Call a limit function {P: V \rightarrow {\bf F}} biased if {|{\bf E}_{x \in V} e(P(x))| \gg 1}, and unbiased if {{\bf E}_{x \in V} e(P(x)) = o(1)}, where we identify {P(x) \in {\bf F}} with an element of {{\bf R}/{\bf Z}}.

Exercise 10 (Weyl criterion) Let {P: V \rightarrow {\bf F}} be a limit function. Show that {P} is equidistributed if and only if {kP} is unbiased for all non-zero {k \in {\bf F}}.

Thus to understand the equidistribution of polynomials, it suffices to understand the size of exponential sums {{\bf E}_{x \in V} e(P(x))}. For linear polynomials, this is an easy application of Fourier analysis:

Exercise 11 Let {P: V \rightarrow {\bf F}} be a polynomial of degree at most {1}. Show that {|{\bf E}_{x \in V} e(P(x))|} equals {1} if {P} is constant, and equals {0} if {P} is not constant. (Note that this is completely consistent with the previous two exercises.)

Next, we turn attention to the quadratic case. Here, we can use the Weyl differencing trick, which we phrase as an identity

\displaystyle  |\mathop{\bf E}_{x \in V} f(x)|^2 = \mathop{\bf E}_{h \in V} \mathop{\bf E}_{x \in V} \Delta_h f(x) \ \ \ \ \ (1)

for any finite vector space {V} and function {f: V \rightarrow {\bf C}}, where {\Delta_h f(x) := f(x+h) \overline{f(x)}} is the multiplicative derivative. Taking ultralimits, we see that the identity also holds for limit functions on limit finite dimensional vector spaces. In particular, we have

\displaystyle  |\mathop{\bf E}_{x \in V} e(P(x))|^2 = \mathop{\bf E}_{h \in V} \mathop{\bf E}_{x \in V} e( \partial_h P(x) ) \ \ \ \ \ (2)

for any limit function {P: V \rightarrow {\bf F}} on a limit finite dimensional space.

If {P} is quadratic, then {\partial_h P} is linear. Applying (11), we conclude that if {P} is biased, then {\partial_h P} must be constant for {\gg |V|} values of {h \in V}.

On the other hand, by using the cocycle identity

\displaystyle  \partial_{h+k} P(x) = \partial_h P(x+k) + \partial_k P(x)

we see that the set of {h \in V} for which {\partial_h P} is constant is a limit subspace of {W}. On that subspace, {P} is then linear; passing to a codimension one subspace {W'} of {W}, {P} is then constant on {W'}. As {\partial_h P} is linear for every {h}, {P} is then linear on each coset {h+W'} of {W'}. As {|W'| \gg |V|}, there are only a bounded number of such cosets; thus {P} is piecewise linear, and thus piecewise constant on slightly smaller cosets. Intersecting all the subspaces together, we can thus find another limit subspace {U} with {|U| \gg |V|} such that {P} is constant on each coset of {U}. To put it another way, if we view {U} as the intersection of a bounded number of kernels of linear homomorphism {L_1,\ldots,L_d: V \rightarrow {\bf F}} (where {d=O(1)} is the codimension of {U}), then {P} is constant on every simultaneous level set of {L_1,\ldots,L_d}, and can thus be expressed as a function {F(L_1,\ldots,L_d)} of these linear polynomials.

More generally, let us say that a limit classical polynomial {P} of degree {\leq d} is low rank if it can be expressed as {P = F(Q_1,\ldots,Q_d)} where {Q_1,\ldots,Q_d} are a bounded number of polynomials of degree {\leq d-1}. We can summarise the above discussion (and also Exercise 11) as follows:

Proposition 3 Let {d \leq 2}, and let {P: V \rightarrow {\bf F}} be a limit classical polynomial. If {P} is biased, then {P} is low rank.

In particular, from the Weyl criterion, we see that if {P} is not equidistributed, then {P} is of low rank.

Of course, the claim fails if the low rank hypothesis is dropped. For instance, consider a limit classical quadratic {Q=L_1L_2} that is the product of two linearly independent linear polynomials {L_1, L_2}. Then {Q} attains each non-zero value with a density of {(p-1)/p^2} rather than {1/p} (and attains {0} with a density of {(2p-1)/p^2} rather than {1/p}).

Exercise 12 Suppose that the characteristic {p} of {{\bf F}} is greater than {2}, and suppose that {P: {\bf F}^n \rightarrow {\bf F}} is a quadratic polynomial of the form {P(x) = x^T M x + b^T x + c}, where {c \in {\bf F}}, {b \in {\bf F}^n}, {M} is a symmetric {n \times n} matrix with coefficients in {{\bf F}}, and {x^T} is the transpose of {x}. Show that {|\mathop{\bf E}_{x \in V} e(P(x))| \leq p^{-r/2}}, where {r} is the rank of {M}. Furthermore, if {b} is orthogonal to the kernel of {M}, show that equality is attained, and otherwise {\mathop{\bf E}_{x \in V} e(P(x))} vanishes.

What happens in the even characteristic case (assuming now that {M} is not symmetric)?

Exercise 13 (Van der Corput lemma) Let {P: V \rightarrow {\bf F}} be a limit function on a limit finite dimensional vector space {V}, and suppose that there exists a limit subset {H} of {V} which is sparse in the sense that {|H|=o(|V|)}, and such that {\partial_h P} is equidistributed for all {h \in V \backslash H}. Show that {P} itself is equidistributed. Use this to give an alternate proof of 3.

Exercise 14 (Space of polynomials is discrete) Let {P: V \rightarrow {\bf F}} be a polynomial of degree at most {d} such that {{\bf E}_{x \in V} |e(P(x)) - c| < 2^{-d+1}} for some constant {c \in S^1}. Show that {P} is constant. (Hint: induct on {d}.) Conclude that if {P, Q} are two distinct polynomials of degree at most {d}, that {\|e(P)-e(Q)\|_{L^2(V)} \gg 1}.

The fact that high rank polynomials are equidistributed extends to higher degrees also:

Theorem 4 Let {P: V \rightarrow {\bf F}} be a limit classical polynomial. If {P} is biased, then {P} is low rank.

In particular, from the Weyl criterion, we see that if {P} is not equidistributed, then {P} is of low rank.

In the high characteristic case {p>d}, this claim was obtained by Ben Green and myself; the generalisation to the low characteristic case {p \leq d} was carried out by Kaufman and Lovett. The statement is phrased in the language of ultrafilters, but it has an equivalent finitary analogue:

Exercise 15 Show that Theorem 4 is equivalent to the claim that for every {d \geq 1} and {\delta > 0}, and every classical polynomial {P: V \rightarrow {\bf F}} of degree at most {d} on a finite-dimensional vector space with {|{\bf E}_{x \in V} e(P(x))| \geq \delta}, that {P} can be expressed as a function of at most {O_{\delta,d}(1)} classical polynomials of degree at most {d-1}.

The proof of Theorem 4 is a little lengthy. It splits up into two pieces. We say that a limit function {P: V \rightarrow {\bf F}} (not necessarily a polynomial) is of order {<d} if it can be expressed as a function of a bounded number of polynomials of degree less than {d}. Our task is thus to show that every polynomial of degree {d} which is biased, is of order {<d}. We first get within an epsilon of this goal, using an argument of Bogdanov and Viola:

Lemma 5 (Bogdanov-Viola lemma) Let {P: V\rightarrow {\bf F}} be a limit polynomial of degree {d} which is biased, and let {\epsilon > 0} be standard. Then one can find a limit function {Q: V \rightarrow {\bf F}} of order {<d} such that {|\{ x \in V: P(x) \neq Q(x) \}| \leq \epsilon |V|}.

Proof: Let {\kappa > 0} be a small standard number (depending on {\epsilon}) to be chosen later, let {M} be a large standard integer (depending on {\epsilon,\kappa}) to be chosen later, and let {h_1,\ldots,h_M} be chosen uniformly at random from {V}. An application of the second moment method (which we leave as an exercise) shows that if {M} is large enough, then with probability at least {1-\epsilon}, one has

\displaystyle  |\mathop{\bf E}_{m \in M} e( P( x + h_m ) ) - \mathop{\bf E}_{x \in V} e( P(x) )| \leq \kappa

for at least {(1-\epsilon)|V|} choices of {x}. We can rearrange this as

\displaystyle  |e(P(x)) - \frac{1}{\delta} \mathop{\bf E}_{m \in M} e( -\partial_{h_m} P( x ) )| \leq \kappa/\delta

where {\delta := |\mathop{\bf E}_{x \in V} e(P(x))|}; note from hypothesis that {\delta \gg 1}. If we let {F(x)} be the nearest {p^{th}} root of unity to {\frac{1}{\delta} \mathop{\bf E}_{m \in M} e( -\partial_{h_m} P( x ) )}, then (if {\kappa} is small enough) we conclude that {e(P(x)) = F(x)} for at least {(1-\epsilon)|V|} choices of {x}. On the other hand, {F} is clearly of order {<d}, and the claim follows. \Box

Exercise 16 Establish the claim left as an exercise in the above proof.

To conclude the proof of Theorem 4 from Lemma 5, it thus suffices to show

Proposition 6 (Rigidity) Let {P: V \rightarrow {\bf F}} be a limit polynomial of degree {d} which is equal to a limit function {Q: V \rightarrow {\bf F}} of order {<d} on at least {1-\epsilon} of {V}, where {\epsilon > 0} is standard. If {\epsilon} is sufficiently small with respect to {d}, then {P} is also of order {<d}.

This proposition is somewhat tricky to prove, even in the high characteristic case {p>d}. We fix {d<p} and assume inductively that the proposition (and hence) Theorem 4 has been demonstrated for all smaller values of {d}.

The main idea here is to start with the “noisy polynomial” {Q}, and perform some sort of “error correction” on {Q} to recover {P}; the key is then to show that this error correction procedure preserves the property of being order {<d}. From Exercise 14 we know that in principle, this error correction is possible if {\epsilon} is small enough; but in order to preserve the order {<d} property we need a more explicit error correction algorithm which is tractable for analysis. This is provided by the following lemma.

Lemma 7 (Error correction of polynomials) Let {P: V \rightarrow {\bf F}} be a (limit) classical polynomial of degree at most {d}, and let {Q: V \rightarrow {\bf F}} be a (limit) function which agrees with {P} at least {1-\epsilon} of the time for some {\epsilon \leq 2^{-d-2}}. Then for every {x \in V}, {P(x)} is equal to the most common value (i.e. the mode) of {\sum_{\omega \in \{0,1\}^{d+1} \backslash \{0\}} (-1)^{|\omega|-1} Q(x+\omega_1 h_1+ \ldots + \omega_{d+1} h_{d+1})} as {h_1,\ldots,h_{d+1}} vary in {V}.

Proof: As {P} is a polynomial of degree at most {d}, one has

\displaystyle  \partial_{h_1} \ldots \partial_{h_{d+1}} P(x) = 0

for all {x,h_1,\ldots,h_{d+1} \in V}. We rearrange this as

\displaystyle  P(x) = \sum_{\omega \in \{0,1\}^{d+1} \backslash \{0\}} (-1)^{|\omega|-1} P(x+\omega_1 h_1 + \ldots + \omega_{d+1} h_{d+1} ).

We conclude that

\displaystyle  P(x) = \sum_{\omega \in \{0,1\}^{d+1} \backslash \{0\}} (-1)^{|\omega|-1} Q(x+\omega_1 h_1 + \ldots + \omega_{d+1} h_{d+1} \omega_{d+1}) \ \ \ \ \ (3)

holds unless {P} and {Q} differ at {x+h_1\omega_1+\ldots+h_{d+1} \omega_{d+1}} for some {\omega \in \{0,1\}^{d+1} \backslash \{0\}}.

On the other hand, if {x} is fixed and {h_1,\ldots,h_{d+1}} are chosen independently and uniformly at random from {V}, then for each {\omega \in \{0,1\}^{d+1} \backslash \{0\}}, {x+h_1\omega_1+\ldots+h_{d+1} \omega_{d+1}} is also uniformly distributed in {V}, and so the probability that {P} and {Q} differ at {x+h_1\omega_1+\ldots+h_{d+1} \omega_{d+1}} is at most {2^{-d-2}}. Applying the union bound for the {2^{d+1}-1 < 2^{d+1}} values of {\omega} under consideration, we conclude that (3) happens more than half the time, and the claim follows. \Box

Note that the above argument in fact shows that the mode is attained for at least {1 - 2^{d+1} \epsilon} of the choices of {h_1,\ldots,h_{d+1}}.

In view of this lemma, the goal is now to show that if {Q} is of order {<d} and is sufficiently close to a polynomial of degree {d}, then the mode of {\sum_{\omega \in \{0,1\}^{d+1} \backslash \{0\}} (-1)^{|\omega|-1} Q(x+\omega_1 h_1+ \ldots + \omega_{d+1} h_{d+1})} is also of order {<d}.

By hypothesis, we have {Q = F(R_1,\ldots,R_m)} for some standard {m} and some polynomials {R_1,\ldots,R_m} of degree {d-1}. To motivate the general argument, let us first work in an easy model case, in which the {R_1,\ldots,R_m} are polynomials of degree {d-1} that are linearly independent modulo low rank (i.e. order {<d-2}) errors, i.e. no non-trivial linear combination of {R_1,\ldots,R_m} over {{\bf F}} is of low rank. This is not the most general case, but is somewhat simpler and will serve to illustrate the main ideas.

The linear independence, combined with the inductive hypothesis, implies that any non-trivial linear combination of {R_1,\ldots,R_m} is unbiased. From this and Fourier analysis, we see that {\vec R := (R_1,\ldots,R_m)} is jointly equidistributed, thus in particular we have

\displaystyle  |S_r| = (p^{-m} + o(1)) |V| \ \ \ \ \ (4)

for all {r \in {\bf F}^m}, where {S_r := \{ x \in V: \vec R(x)=r\}}.

In fact, we have a much stronger equidistribution property than this; not only do we understand the distribution of {\vec R(x)} for a single {x}, but more generally we can control the distribution of an entire parallelopiped

\displaystyle  \vec R^{[D]}(x,h_1,\ldots,h_D) := ( \vec R(x + \omega_1 h_1 + \ldots + \omega_D h_D ) )_{\omega_1,\ldots,\omega_D \in \{0,1\}}

for any standard integer {D \geq 0}. Because all the components {\vec R} are polynomials of degree {d-1}, the quantity {\vec R^{[D]}(x,h_1,\ldots,h_D)} is constrained to the space {\Sigma^{[D]}}, defined as the subspace of {({\bf F}^m)^{2^D}} consisting of all tuples {r = ( r_{\omega} )_{\omega \in \{0,1\}^D}} obeying the constraints

\displaystyle  \sum_{\omega \in F} (-1)^{|\omega|} r_\omega = 0

for all faces {F \subset \{0,1\}^D} of dimension {d}, where {|\omega| := \omega_1+\ldots+\omega_D} is the sign of {\omega}. (This constraint is of course vacuous if {D<d}.)

Proposition 8 {\vec R^{[D]}} is equidistributed in {\Sigma^{[D]}}, thus

\displaystyle  |\{ (x,h_1,\ldots,h_D) \in V^{d+1}: \vec R^{[D]}(x,h_1,\ldots,h_D) = r \}|

\displaystyle  = (\frac{1}{|\Sigma^{[D]}|}+o(1)) |V|^{d+1}

for all {r \in \Sigma^{[D]}}. Furthermore, we have the refined bound

\displaystyle  |\{ (h_1,\ldots,h_D) \in V^{d}: \vec R^{[D]}(x,h_1,\ldots,h_D) = r \}|

\displaystyle  = (\frac{p^m}{|\Sigma^{[D]}|}+o(1)) |V|^{d}

for all {r \in \Sigma^{[D]}} and all {x \in S_{r_0}}.

Proof: It suffices to prove the second claim. Fix {x} and {r = (r_\omega)_{\omega \in \{0,1\}^D}} From the definition of {\Sigma^{[D]}}, we see that {r} is uniquely determined by the component {r_0} and {r_{<d} := (r_\omega)_{\omega \in \{0,1\}^D: 0 < |\omega| < d}}. It will thus suffice to show that

\displaystyle  |\{ (x,h_1,\ldots,h_D) \in V^{d}: \vec R^{[D]}_{<d}(x,h_1,\ldots,h_D) = r_{<d} \}|

\displaystyle = (\frac{p^m}{|\Sigma^{[D]}|}+o(1)) |V|^{d}

for all {r_{<d} \in ({\bf F}^m)^{\{ \omega \in \{0,1\}^D: 0 < |\omega| < d \}}}, where

\displaystyle  \vec R^{[D]}_{<d}(x,h_1,\ldots,h_D) :=

\displaystyle ( \vec R(x + \omega_1 h_1 + \ldots + \omega_D h_D ) )_{\omega \in \{0,1\}^D: 0 < |\omega| < d}.

By Fourier analysis, it suffices to show that

\displaystyle  \mathop{\bf E}_{h_1,\ldots,h_D \in V} e( \xi \cdot \vec R^{[D]}_{<d}(x,h_1,\ldots,h_D) ) = o(1)

for any non-zero {\xi \in ({\bf F}^m)^{\{ \omega \in \{0,1\}^D: 0 < |\omega| < d \}}}. In other words, we need to show that

\displaystyle  \mathop{\bf E}_{h_1,\ldots,h_D \in V} e(\sum_{\omega \in \{0,1\}^D: |\omega| < d} \xi_\omega \cdot \vec R(x + \omega_1 h_1 + \ldots + \omega_D h_D ) ) = o(1) \ \ \ \ \ (5)

whenever the {\xi_\omega \in {\bf F}^m} for {\omega \in \{0,1\}^D, 0<|\omega|<d} are not all zero.

Let {\omega_0} be such that {\xi_{\omega_0} \neq 0}, and such that {|\omega|} is as large as possible; let us write {d' := |\omega_0|}, so that {0 \leq d' < d}. Without loss of generality we may take {\omega_0 = (1,\ldots,1,0,\ldots,0)}. Suppose (5) failed, then by the pigeonhole principle one can find {h_{d'+1},\ldots,h_D} such that

\displaystyle  | \mathop{\bf E}_{h_1,\ldots,h_{d'} \in V} e(\sum_{\omega \in \{0,1\}^D: |\omega| < d} \xi_\omega \cdot \vec R(x + \omega_1 h_1 + \ldots + \omega_D h_D ) )| \gg 1.

We write the left-hand side as

\displaystyle  | \mathop{\bf E}_{h_1,\ldots,h_{d'} \in V} e( \xi_{\omega_0} \cdot \vec R( x + h_1 + \ldots + h_{d'} ) ) \prod_{j=1}^{d'} f_j(h_1,\ldots,h_{d'}) |

where {f_j} are bounded limit functions depending on {x,h_{d'+1},\ldots,h_D} that are independent of {h_j}.

We can eliminate each {f_j} term in turn by the Cauchy-Schwarz argument used in Notes 3, and conclude that

\displaystyle  \| e( \xi_{\omega_0} \cdot \vec R ) \|_{U^{d'}(V)} \gg 1,

and thus by the monotonicity of Gowers norms

\displaystyle  \| e( \xi_{\omega_0} \cdot \vec R ) \|_{U^{d-1}(V)} \gg 1,

or in other words that the degree {d-1} polynomial {(x,h_1,\ldots,h_{d-1}) \mapsto \partial_{h_1} \ldots \partial_{h_{d-1}} (\xi_{\omega_0} \cdot \vec R)(x)} is biased. By the induction hypothesis, this polynomial must be low rank.

At this point we crucially exploit the high characteristic hypothesis by noting the Taylor expansion formula

\displaystyle  P(y) = \frac{1}{(d-1)!} \partial_y^{d-1} P(y) + \hbox{ low rank errors}.

The high characteristic is necessary here to invert {(d-1)!}. We conclude that {\xi_{\omega_0} \cdot \vec R} is of low rank, but this contradicts the hypothesis on the {R_1,\ldots,R_m} and the non-zero nature of {\xi_{\omega_0}}, and the claim follows. \Box

Let {x \in V} and {r = (r_\omega)_{\omega \in \{0,1\}^D} \in \Sigma^{[D]}}. From the above proposition we have an equidistribution result for a cube pinned at {x}:

\displaystyle  |\{ (h_1,\ldots,h_D) \in V^{D}: x+\omega_1 h_1 + \ldots + \omega_D h_D \in S_{r_\omega} \ \ \ \ \ (6)

\displaystyle  \hbox{ for all } \omega \in \{0,1\}^D \}| = (\frac{p^m}{|\Sigma^{[D]}|} + o(1)) |V|^{D}.

In fact, we can do a bit better than this, and obtain equidistribution even after fixing a second vertex:

Exercise 17 (Equidistribution of doubly pinned cubes) Let {(r_\omega)_{\omega \in \{0,1\}^D} \in \Sigma^{[D]}}, let {x \in S_{r_0}}, let {\omega' \in \{0,1\}^D \backslash \{0\}}. Then for all but {o(|V|)} elements {y} of {S_{r_{\omega_0}}}, one has

\displaystyle  |\{ (h_1,\ldots,h_D) \in V^{D}: x+\omega_1 h_1 + \ldots + \omega_D h_D \in S_{r_\omega} \ \ \ \ \ (7)

\displaystyle  \hbox{ for all } \omega \in \{0,1\}^D; x + \omega'_1 h_1 + \ldots + \omega'_D h_D = y \}|

\displaystyle  = (\frac{p^m}{|\Sigma^{[D]}|} + o(1)) |V|^{D-1}.

(Hint: One can proceed by applying Proposition 8 with {D} replaced by a larger dimension, such as {2D}; details can be found in the paper of Ben Green and myself.)

We can now establish Proposition 6 in the case where the {R_1,\ldots,R_m} are independent modulo low rank errors. Let {r_0 \in {\bf F}^m} and {x \in S_{r_0}}. It will suffice to show that {P(x)} does not depend on {x} as long as {x} stays inside {r_0}.

Call an atom {S_r} good if {P} and {Q} agree for at least {1-\sqrt{\epsilon}} of the elements of {S_r}; by Markov’s inequality (and (4)) we see that at least {1-\sqrt{\epsilon}+o(1)} of the atoms are good. From this and an easy counting argument we can find an element {r = (r_\omega)_{\omega \in \{0,1\}^d}} in {\Sigma^{[d]}} with the specified value of {r_0}, such that {r_\omega} is good for every {\{0,1\}^d \backslash \{0\}}.

Fix {r}. Now consider all the pinned cubes {(x+h_1 \omega_1+\ldots+h_d\omega_d)_{\omega_1,\ldots,\omega_d \in \{0,1\}^d}} with {x+h_1\omega_1+\ldots+h_d\omega_d \in S_{r_\omega}} for all {\omega \in \{0,1\}^d \backslash \{0\}}. By (6), the number of such cubes is {(\frac{p^m}{|\Sigma^{[d]}|} + o(1)) |V|^{d}}. On the other hand, by Proposition 17, the total number of such cubes for which

\displaystyle  P(x+h_1\omega_1+\ldots+h_d\omega_d) \neq Q(x+h_1\omega_1+\ldots+h_d\omega_d)

for some {\omega \in \{0,1\}^d \backslash \{0\}} is {o(|V|^{d-1})}. We conclude that there exists a pinned cube for which

\displaystyle  P(x+h_1\omega_1+\ldots+h_d\omega_d) = Q(x+h_1\omega_1+\ldots+h_d\omega_d)

for all {\omega \in \{0,1\}^d \backslash \{0\}}, and in particular (3) holds. However, as {Q} is constant on each of the {S_r}, we see that the right-hand side of (3) does not depend on {x}, and so the left-hand side does also.

This completes the proof of Proposition 6 in the independent case. In the general case, one reduces to a (slight generalisation of) this case by the following regularity lemma:

Lemma 9 (Regularity lemma) Let {R_1,\ldots,R_m} be a bounded number of limit classical polynomials of degree {\leq d-1}. Then there exists a limit classical bounded number of polynomials {S_{d',1},\ldots,S_{d',m_{d'}}} of degree {\leq d'} for each {1 \leq d' \leq d-1}, such that each {R_1,\ldots,R_m} is a function of the {S_{d',i}} for {1 \leq d' \leq d} and {1 \leq i \leq m_{d'}}, and such that for each {d'}, the {S_{d',1},\ldots,S_{d',m_{d'}}} are independent modulo low rank polynomials of degree {d'}.

Proof: We induct on {d}. The claim is vacuously true for {d=1}, so suppose that {d>1} and that the claim has already been proven for {d-1}.

Let {Poly_{d-1}} be the space of limit classical polynomials of degree {\leq d-1}, and let {Poly_{d-1}^0} be the subspace of low rank limit classical polynomials. Working in the quotient space {Poly_{d-1}/Poly_{d-1}^0}, we see that {R_1,\ldots,R_m} generates a finite-dimensional space here, which thus has a basis {S_{d-1,1},\ldots,S_{d-1,m_{d-1}} \hbox{ mod } Poly_{d-1}^0}, thus {S_{d-1,1},\ldots,S_{d-1,m_{d-1}}} are linearly independent modulo low rank polynomials of degree {d-1}, and the {R_1,\ldots,R_m} are linear combinations of the {S_{d-1,1},\ldots,S_{d-1,m_{d-1}}} plus combinations of some additional polynomials {R'_1,\ldots,R'_{m'}} of degree {d-2}. Applying the induction hypothesis to those additional polynomials, one obtains the claim. \Box

Exercise 18 Show that the polynomials {S := (S_{d',i})_{1 \leq d' \leq d-1; 1 \leq i \leq m_{d'}} )} appearing in the above lemma are equidistributed in the sense that

\displaystyle  |\{ x \in V: S(x) = s \}| = (\frac{1}{p^{\sum_{d'=1}^d m_{d'}}} +o(1)) |V|

for any {s = ( s_{d',i})_{1 \leq d' \leq d-1; 1\leq i \leq m_{d'} }} with {s_{d',i} \in {\bf F}}.

Applying the above lemma, one can express any order {<d} function {Q} in the form {Q = F( (S_{d',i})_{1 \leq d' \leq d-1; 1 \leq i \leq m_{d'}} )}. It is then possible to modify the previous arguments to obtain Proposition 6; see the paper of Ben Green and myself for more details. (We phrase the arguments in a finitary setting rather than a nonstandard one, but the two approaches are equivalent; see this blog post for more discussion.)

It is possible to modify the above arguments to handle the low characteristic case, but due to the lack of a good Taylor expansion, one has to regularise the derivatives of the polynomials, as well as the polynomials themselves; see the paper of Kaufman and Lovett for details.

— 3. Analytic rank —

Define the rank {rank_{d-1}(P)} of a degree {d} (limit) classical polynomial {P} to be the least number {m} of degree {\leq d-1} (limit) classical polynomials {R_1,\ldots,R_m} such that {P} is a function of {R_1,\ldots,R_m}. Theorem 3 tells us that {P} is equidistributed whenever the rank is unbounded. However, the proof was rather involved. There is a more elementary approach to equidistribution due to Gowers and Wolf which replaces the rank by a different object, called analytic rank, and which can serve as a simpler substitute for the concept of rank in some applications.

Definition 10 (Analytic rank) The analytic rank {arank_{d-1}(P)} of a (limit) classical polynomial {P: V \rightarrow {\bf F}} of degree {\leq d} is defined to be the quantity

\displaystyle  arank_d(P) := - \log_p \mathop{\bf E}_{x,h_1,\ldots,h_d \in V} e( \partial_{h_1} \ldots \partial_{h_d} P(x) )

\displaystyle  = -2^{d} \log_p \| e(P) \|_{U^d(V)}.

From the properties of the Gowers norms we see that this quantity is non-negative, is zero if and only if {P} is a polynomial of degree {<d}, and is finite (or limit finite) for {d>2}. (For {d=1}, the analytic rank is infinite if {P} is non-constant and zero if {P} is constant.)

Exercise 19 Show that if {p>2} and {P} is a (limit) classical polynomial of degree {2}, then {rank_1(P)=arank_1(P)}.

Exercise 20 Show that if the analytic rank {arank_{d-1}(P)} of a limit classical polynomial {P} of degree {d} is unbounded, then {P} is equidistributed.

Exercise 21 Suppose we are in the high characteristic case {p>d} Using Theorem 3, show that a limit classical polynomial has bounded analytic rank if and only if it has bounded rank. (Hint: One direction follows from the preceding exercise. For the other direction, use the Taylor formula {P(x) = \frac{1}{d!} \partial_x^d P(x)}.) This is a special case of the inverse conjecture for the Gowers norms, which we will discuss in more detail in later notes.

Conclude the following finitary version: if {P: V \rightarrow {\bf F}} is a classical polynomial of degree {d} on a finite-dimensoinal vector space {V}, and {arank_{d-1}(P) \leq M}, then {rank_{d-1}(P) \ll_{M,p,d} 1}; conversely, if {rank_{d-1}(P) \leq M}, then {arank_{d-1}(P) \ll_{M,p,d} 1}.

Exercise 22 Show that if {P} is a (limit) classical polynomial of degree {d}, then {rank_{d-1}(P)=rank_{d-1}(cP)} and {arank_{d-1}(P) = arank_{d-1}(cP)} for all {c \in {\bf F} \backslash 0}, and {rank_{d-1}(P+Q) = rank_{d-1}(P)} and {arank_{d-1}(P+Q) = arank_{d-1}(P)} for all (limit) classical polynomials {Q} of degree {\leq d-1}.

It is clear that the rank obeys the triangle inequality {rank_{d-1}(P+Q) \leq rank_{d-1}(P)+rank_{d-1}(Q)} for all (limit) classical polynomials of degree {\leq d}. There is an analogue for analytic rank, due to Gowers and Wolf:

Proposition 11 (Quasi-triangle inequality for analytic rank) Let {P, Q: V \rightarrow {\bf F}} be (limit) classical polynomials of degree {\leq d}. Then {arank_{d-1}(P+Q) \leq 2^d( arank_{d-1}(P) + arank_{d-1}(Q) )}.

Proof: Let {T_1(h_1,\ldots,h_d)} be the {d}-linear form

\displaystyle  T_1(h_1,\ldots,h_d) := \partial_{h_1} \ldots \partial_{h_d} P(x)

(note that the right-hand side is independent of {x}); similarly define

\displaystyle  T_2(h_1,\ldots,h_d) := \partial_{h_1} \ldots \partial_{h_d} P(x)

By definition, we have

\displaystyle  {\bf E}_{h_1,\ldots,h_d \in V} e( T_1(h_1,\ldots,h_d) ) = p^{-arank_{d-1}(P)}

and

\displaystyle  {\bf E}_{h_1,\ldots,h_d \in V} e( T_2(h_1,\ldots,h_d) ) = p^{-arank_{d-1}(Q)}

and thus

\displaystyle  {\bf E}_{h_1,\ldots,h_d,h'_1,\ldots,h'_d \in V} e( T_1(h_1,\ldots,h_d) + T_2(h'_1,\ldots,h'_d) )

\displaystyle = p^{-arank_{d-1}(P)-arank_{d-1}(Q)}.

We make the substitution {h'_j = h_j + k_j}. Using the multilinearity of {T_2}, we can write the left-hand side as

\displaystyle  {\bf E}_{k_1,\ldots,k_d \in V} {\bf E}_{h_1,\ldots,h_d \in V} e( (T_1+T_2)(h_1,\ldots,h_d) ) \times

\displaystyle  \prod_{j=1}^d f_j( h_1,\ldots,h_d,k_1,\ldots,k_d)

where the {f_j} are functions bounded in magnitude by {1} that are independent of the {h_j} variable. Eliminating all these factors by Cauchy-Schwarz as in the previous notes, we can bound the above expression by

\displaystyle  (\mathop{\bf E}_{h^0_1,\ldots,h^0_d,h^1_1,\ldots,h^1_d \in V} e( \sum_{\omega \in \{0,1\}^d} (-1)^{|\omega|} (T_1+T_2)(h_1^{\omega_1}, \ldots, h_d^{\omega_d})|^{1/2^d}

which using the substitution {h_i := h^1_i - h^0_i} and the multilinearity of {T_1+T_2} simplifies to

\displaystyle  (\mathop{\bf E}_{h_1,\ldots,h_d \in V} e( (T_1+T_2)(h_1,\ldots,h_d) )|^{1/2^d}

which by definition of analytic rank is

\displaystyle  p^{-arank_{d-1}(P+Q)/2^d},

and the claim follows. \Box