
In analytic number theory, there is a well known analogy between the prime factorisation of a large integer, and the cycle decomposition of a large permutation; this analogy is central to the topic of “anatomy of the integers”, as discussed for instance in this survey article of Granville. Consider for instance the following two parallel lists of facts (stated somewhat informally). Firstly, some facts about the prime factorisation of large integers:

• Every positive integer ${m}$ has a prime factorisation

$\displaystyle m = p_1 p_2 \dots p_r$

into (not necessarily distinct) primes ${p_1,\dots,p_r}$, which is unique up to rearrangement. Taking logarithms, we obtain a partition

$\displaystyle \log m = \log p_1 + \log p_2 + \dots + \log p_r$

of ${\log m}$.

• (Prime number theorem) A randomly selected integer ${m}$ of size ${m \sim N}$ will be prime with probability ${\approx \frac{1}{\log N}}$ when ${N}$ is large.
• If ${m \sim N}$ is a randomly selected large integer of size ${N}$, and ${p = p_i}$ is a randomly selected prime factor of ${m = p_1 \dots p_r}$ (with each index ${i}$ being chosen with probability ${\frac{\log p_i}{\log m}}$), then ${\log p_i}$ is approximately uniformly distributed between ${0}$ and ${\log N}$. (See Proposition 9 of this previous blog post.)
• The set of real numbers ${\{ \frac{\log p_i}{\log m}: i=1,\dots,r \}}$ arising from the prime factorisation ${m = p_1 \dots p_r}$ of a large random number ${m \sim N}$ converges (away from the origin, and in a suitable weak sense) to the Poisson-Dirichlet process in the limit ${N \rightarrow \infty}$. (See the previously mentioned blog post for a definition of the Poisson-Dirichlet process, and a proof of this claim.)

Now for the facts about the cycle decomposition of large permutations:

• Every permutation ${\sigma \in S_n}$ has a cycle decomposition

$\displaystyle \sigma = C_1 \dots C_r$

into disjoint cycles ${C_1,\dots,C_r}$, which is unique up to rearrangement, and where we count each fixed point of ${\sigma}$ as a cycle of length ${1}$. If ${|C_i|}$ is the length of the cycle ${C_i}$, we obtain a partition

$\displaystyle n = |C_1| + \dots + |C_r|$

of ${n}$.

• (Prime number theorem for permutations) A randomly selected permutation of ${S_n}$ will be an ${n}$-cycle with probability exactly ${1/n}$. (This was noted in this previous blog post.)
• If ${\sigma}$ is a random permutation in ${S_n}$, and ${C_i}$ is a randomly selected cycle of ${\sigma}$ (with each ${i}$ being selected with probability ${|C_i|/n}$), then ${|C_i|}$ is exactly uniformly distributed on ${\{1,\dots,n\}}$. (See Proposition 8 of this blog post.)
• The set of real numbers ${\{ \frac{|C_i|}{n} \}}$ arising from the cycle decomposition ${\sigma = C_1 \dots C_r}$ of a random permutation ${\sigma \in S_n}$ converges (in a suitable sense) to the Poisson-Dirichlet process in the limit ${n \rightarrow \infty}$. (Again, see this previous blog post for details.)
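Both exact statements about random permutations above (the ${1/n}$ chance of being an ${n}$-cycle, and the exact uniformity of the size-biased cycle length) can be confirmed by brute force for small ${n}$. Choosing ${C_i}$ with probability ${|C_i|/n}$ is the same as taking the cycle containing a uniformly random element. The Python sketch below enumerates all of ${S_n}$, encoding a permutation of ${\{0,\dots,n-1\}}$ as a tuple, and uses exact rational arithmetic; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations


def cycle_lengths(perm):
    """Cycle lengths of a permutation of {0, ..., n-1} given as a tuple i -> perm[i]."""
    n = len(perm)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if not seen[start]:
            length, i = 0, start
            while not seen[i]:  # walk around the cycle containing `start`
                seen[i] = True
                i = perm[i]
                length += 1
            lengths.append(length)
    return lengths


def permutation_statistics(n):
    """Exact statistics over all n! permutations in S_n.

    Returns (prob, dist): prob is the probability of being an n-cycle, and
    dist[k] is the probability that a cycle C_i, chosen with weight |C_i|/n,
    has length k.
    """
    total = n_cycles = 0
    dist = {k: Fraction(0) for k in range(1, n + 1)}
    for perm in permutations(range(n)):
        total += 1
        lengths = cycle_lengths(perm)
        if lengths == [n]:
            n_cycles += 1
        for length in lengths:
            dist[length] += Fraction(length, n)  # size-biased choice of a cycle
    return Fraction(n_cycles, total), {k: w / total for k, w in dist.items()}


for n in range(1, 7):
    prob, dist = permutation_statistics(n)
    print(n, prob, [str(w) for w in dist.values()])
```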

See this previous blog post (or the aforementioned article of Granville, or the Notices article of Arratia, Barbour, and Tavaré) for further exploration of the analogy between prime factorisation of integers and cycle decomposition of permutations.

There is however something unsatisfying about the analogy, in that it is not clear why there should be such a kinship between integer prime factorisation and permutation cycle decomposition. It turns out that the situation is clarified if one uses another fundamental analogy in number theory, namely the analogy between integers and polynomials ${P \in {\mathbf F}_q[T]}$ over a finite field ${{\mathbf F}_q}$, discussed for instance in this previous post; this is the simplest case of the more general function field analogy between number fields and function fields. Just as we restrict attention to positive integers when talking about prime factorisation, it will be reasonable to restrict attention to monic polynomials ${P}$. We then have another analogous list of facts, proven very similarly to the corresponding list of facts for the integers:

• Every monic polynomial ${f \in {\mathbf F}_q[T]}$ has a factorisation

$\displaystyle f = P_1 \dots P_r$

into irreducible monic polynomials ${P_1,\dots,P_r \in {\mathbf F}_q[T]}$, which is unique up to rearrangement. Taking degrees, we obtain a partition

$\displaystyle \hbox{deg} f = \hbox{deg} P_1 + \dots + \hbox{deg} P_r$

of ${\hbox{deg} f}$.

• (Prime number theorem for polynomials) A randomly selected monic polynomial ${f \in {\mathbf F}_q[T]}$ of degree ${n}$ will be irreducible with probability ${\approx \frac{1}{n}}$ when ${q}$ is fixed and ${n}$ is large.
• If ${f \in {\mathbf F}_q[T]}$ is a random monic polynomial of degree ${n}$, and ${P_i}$ is a random irreducible factor of ${f = P_1 \dots P_r}$ (with each ${i}$ selected with probability ${\hbox{deg} P_i / n}$), then ${\hbox{deg} P_i}$ is approximately uniformly distributed in ${\{1,\dots,n\}}$ when ${q}$ is fixed and ${n}$ is large.
• The set of real numbers ${\{ \hbox{deg} P_i / n \}}$ arising from the factorisation ${f = P_1 \dots P_r}$ of a randomly selected polynomial ${f \in {\mathbf F}_q[T]}$ of degree ${n}$ converges (in a suitable sense) to the Poisson-Dirichlet process when ${q}$ is fixed and ${n}$ is large.

The above list of facts addressed the large ${n}$ limit of the polynomial ring ${{\mathbf F}_q[T]}$, where the order ${q}$ of the field is held fixed, but the degrees of the polynomials go to infinity. This is the limit that is most closely analogous to the integers ${{\bf Z}}$. However, there is another interesting asymptotic limit of polynomial rings to consider, namely the large ${q}$ limit where it is now the degree ${n}$ that is held fixed, but the order ${q}$ of the field goes to infinity. Actually to simplify the exposition we will use the slightly more restrictive limit where the characteristic ${p}$ of the field goes to infinity (again keeping the degree ${n}$ fixed), although all of the results proven below for the large ${p}$ limit turn out to be true as well in the large ${q}$ limit.

The large ${q}$ (or large ${p}$) limit is technically a different limit than the large ${n}$ limit, but in practice the asymptotic statistics of the two limits often agree quite closely. For instance, here is the prime number theorem in the large ${q}$ limit:

Theorem 1 (Prime number theorem) The probability that a random monic polynomial ${f \in {\mathbf F}_q[T]}$ of degree ${n}$ is irreducible is ${\frac{1}{n}+o(1)}$ in the limit where ${n}$ is fixed and the characteristic ${p}$ goes to infinity.

Proof: There are ${q^n}$ monic polynomials ${f \in {\mathbf F}_q[T]}$ of degree ${n}$. If ${f}$ is irreducible, then the ${n}$ zeroes of ${f}$ are distinct and lie in the finite field ${{\mathbf F}_{q^n}}$, but do not lie in any proper subfield of that field containing ${{\mathbf F}_q}$. Conversely, every element ${\alpha}$ of ${{\mathbf F}_{q^n}}$ that does not lie in such a proper subfield is the root of a unique monic irreducible polynomial in ${{\mathbf F}_q[T]}$ of degree ${n}$ (the minimal polynomial of ${\alpha}$). Since the union of all the proper subfields of ${{\mathbf F}_{q^n}}$ has size ${o(q^n)}$, the total number of irreducible polynomials of degree ${n}$ is thus ${\frac{q^n - o(q^n)}{n}}$, and the claim follows. $\Box$

Remark 2 The above argument and inclusion-exclusion in fact give the well-known exact formula ${\frac{1}{n} \sum_{d|n} \mu(\frac{n}{d}) q^d}$ for the number of irreducible monic polynomials of degree ${n}$.
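For small parameters the formula in Remark 2 can be checked against a brute-force count. The Python sketch below restricts, as an assumption of convenience, to prime ${q = p}$, so that arithmetic in ${{\mathbf F}_p}$ is just arithmetic modulo ${p}$; it counts the irreducibles of degree ${n}$ as the monic polynomials that are not a product of two monic factors of positive degree. The function names are illustrative.

```python
from itertools import product


def mobius(n):
    """Möbius function mu(n), by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # n had a square factor
            result = -result
        d += 1
    return -result if n > 1 else result


def irreducible_count_formula(q, n):
    """(1/n) * sum_{d | n} mu(n/d) q^d, the formula of Remark 2."""
    total = sum(mobius(n // d) * q**d for d in range(1, n + 1) if n % d == 0)
    return total // n  # the sum is always divisible by n


def poly_mul(a, b, p):
    """Product of two polynomials over F_p; coefficient tuples, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return tuple(out)


def monic_polys(p, d):
    """All p^d monic polynomials of degree d over F_p."""
    for coeffs in product(range(p), repeat=d):
        yield coeffs + (1,)


def irreducible_count_brute(p, n):
    """Monic irreducibles of degree n over F_p (p prime): all monics minus all products."""
    reducible = set()
    for d in range(1, n // 2 + 1):  # a reducible f has a factor of degree <= n/2
        for f in monic_polys(p, d):
            for g in monic_polys(p, n - d):
                reducible.add(poly_mul(f, g, p))
    return p**n - len(reducible)


for p in (2, 3):
    print(p, [irreducible_count_formula(p, n) for n in range(1, 6)])
# Theorem 1: n * (number of irreducibles) / q^n -> 1 as q grows with n fixed.
print([round(4 * irreducible_count_formula(q, 4) / q**4, 4) for q in (3, 101, 10007)])
```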

Now we can give a precise connection between the cycle distribution of a random permutation, and (the large ${p}$ limit of) the irreducible factorisation of a polynomial, giving a (somewhat indirect, but still connected) link between permutation cycle decomposition and integer factorisation:

Theorem 3 The partition ${\{ \hbox{deg}(P_1), \dots, \hbox{deg}(P_r) \}}$ of a random monic polynomial ${f= P_1 \dots P_r\in {\mathbf F}_q[T]}$ of degree ${n}$ converges in distribution to the partition ${\{ |C_1|, \dots, |C_r|\}}$ of a random permutation ${\sigma = C_1 \dots C_r \in S_n}$ of length ${n}$, in the limit where ${n}$ is fixed and the characteristic ${p}$ goes to infinity.

We can quickly prove this theorem as follows. We first need a basic fact:

Lemma 4 (Most polynomials square-free in large ${q}$ limit) A random monic polynomial ${f \in {\mathbf F}_q[T]}$ of degree ${n}$ will be square-free with probability ${1-o(1)}$ when ${n}$ is fixed and ${q}$ (or ${p}$) goes to infinity. In a similar spirit, two randomly selected monic polynomials ${f,g}$ of degree ${n,m}$ will be coprime with probability ${1-o(1)}$ if ${n,m}$ are fixed and ${q}$ or ${p}$ goes to infinity.

Proof: For any monic polynomial ${g}$ of degree ${m}$, the probability that ${f}$ is divisible by ${g^2}$ is at most ${1/q^{2m}}$. Summing over all ${q^m}$ monic polynomials ${g}$ of each degree ${1 \leq m \leq n/2}$, and using the union bound, we see that the probability that ${f}$ is not squarefree is at most ${\sum_{1 \leq m \leq n/2} \frac{q^m}{q^{2m}} = o(1)}$, giving the first claim. For the second, observe that if ${f}$ and ${g}$ share a common factor, then ${fg}$ is not squarefree. Since each monic polynomial of degree ${n+m}$ has only a bounded number (bounded in terms of ${n+m}$) of factorisations as a product ${fg}$ of monic polynomials of degrees ${n}$ and ${m}$, the first claim (applied in degree ${n+m}$) shows that ${fg}$ is squarefree with probability ${1-o(1)}$, giving the claim. $\Box$
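The union bound in this proof can be compared with an exact count: for ${n \geq 2}$ the number of squarefree monic polynomials of degree ${n}$ over ${{\mathbf F}_q}$ is exactly ${q^n - q^{n-1}}$, so the non-squarefree proportion is ${1/q}$. The Python sketch below (again taking ${q = p}$ prime, with illustrative names) enumerates all products ${g^2 h}$ with ${g}$ monic of degree ${m \geq 1}$, mirroring the proof above, and prints the exact non-squarefree proportion next to the union bound.

```python
from itertools import product


def poly_mul(a, b, p):
    """Product of two polynomials over F_p; coefficient tuples, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return tuple(out)


def monic_polys(p, d):
    """All p^d monic polynomials of degree d over F_p."""
    for coeffs in product(range(p), repeat=d):
        yield coeffs + (1,)


def squarefree_count(p, n):
    """Number of squarefree monic polynomials of degree n over F_p (p prime).

    Mirrors the proof of Lemma 4: f fails to be squarefree exactly when
    f = g^2 h for some monic g of degree m >= 1 and monic h of degree n - 2m.
    """
    non_squarefree = set()
    for m in range(1, n // 2 + 1):
        for g in monic_polys(p, m):
            g2 = poly_mul(g, g, p)
            for h in monic_polys(p, n - 2 * m):
                non_squarefree.add(poly_mul(g2, h, p))
    return p**n - len(non_squarefree)


n = 4
for p in (2, 3, 5):
    exact = 1 - squarefree_count(p, n) / p**n  # proportion that is not squarefree
    union_bound = sum(p**m / p**(2 * m) for m in range(1, n // 2 + 1))
    print(p, exact, union_bound)
```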

Now we can prove the theorem. Elementary combinatorics tells us that the probability of a random permutation ${\sigma \in S_n}$ consisting of ${c_k}$ cycles of length ${k}$ for ${k=1,\dots,r}$, where ${c_k}$ are nonnegative integers with ${\sum_{k=1}^r k c_k = n}$, is precisely

$\displaystyle \frac{1}{\prod_{k=1}^r c_k! k^{c_k}},$

since there are ${\prod_{k=1}^r c_k! k^{c_k}}$ ways to write a given tuple of cycles ${C_1,\dots,C_r}$ in cycle notation in nondecreasing order of length, and ${n!}$ ways to select the labels for the cycle notation. On the other hand, by Theorem 1 (and using Lemma 4 to isolate the small number of cases involving repeated factors) the number of monic polynomials of degree ${n}$ that are the product of ${c_k}$ irreducible polynomials of degree ${k}$ is

$\displaystyle \frac{1}{\prod_{k=1}^r c_k!} \prod_{k=1}^r ( (\frac{1}{k}+o(1)) q^k )^{c_k} + o( q^n )$

which simplifies to

$\displaystyle \frac{1+o(1)}{\prod_{k=1}^r c_k! k^{c_k}} q^n,$

and the claim follows.
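The limit in Theorem 3 can be watched numerically without factoring anything. By Lemma 4 the non-squarefree polynomials contribute only ${o(1)}$, and a squarefree monic polynomial with ${c_k}$ irreducible factors of each degree ${k}$ is the same thing as a choice of ${c_k}$ distinct monic irreducibles of each degree ${k}$, giving ${\prod_k \binom{I_k(q)}{c_k}}$ such polynomials, where ${I_k(q)}$ is the count from Remark 2. The Python sketch below (names illustrative; ${n=4}$ and the sample values of ${q}$ are arbitrary choices) compares these proportions with the permutation probabilities ${1/\prod_k c_k! k^{c_k}}$.

```python
from fractions import Fraction
from math import comb, factorial, prod


def mobius(n):
    """Möbius function mu(n), by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def irreducible_count(q, n):
    """Number of monic irreducibles of degree n over F_q (Remark 2)."""
    return sum(mobius(n // d) * q**d for d in range(1, n + 1) if n % d == 0) // n


def partitions(n, max_part=None):
    """Yield the partitions of n as dicts {part k: multiplicity c_k}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            c = dict(rest)
            c[k] = c.get(k, 0) + 1
            yield c


def permutation_probability(c):
    """Probability that a random permutation has cycle type c: 1 / prod c_k! k^{c_k}."""
    return Fraction(1, prod(factorial(ck) * k**ck for k, ck in c.items()))


def squarefree_polynomial_proportion(q, c):
    """Proportion of monic degree-n polynomials over F_q that are squarefree with
    exactly c_k irreducible factors of degree k: prod_k binom(I_k(q), c_k) / q^n."""
    n = sum(k * ck for k, ck in c.items())
    count = prod(comb(irreducible_count(q, k), ck) for k, ck in c.items())
    return Fraction(count, q**n)


def max_gap(q, n):
    """Largest discrepancy between the two distributions over partitions of n."""
    return max(abs(squarefree_polynomial_proportion(q, c) - permutation_probability(c))
               for c in partitions(n))


for q in (3, 101, 10007):
    print(q, float(max_gap(q, 4)))
```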

This was a fairly short calculation, but it still doesn’t quite explain why there is such a link between the cycle decomposition ${\sigma = C_1 \dots C_r}$ of permutations and the factorisation ${f = P_1 \dots P_r}$ of a polynomial. One immediate thought might be to try to link the multiplication structure of permutations in ${S_n}$ with the multiplication structure of polynomials; however, these structures are too dissimilar to set up a convincing analogy. For instance, the multiplication law on polynomials is abelian and non-invertible, whilst the multiplication law on ${S_n}$ is (extremely) non-abelian but invertible. Also, the multiplication of a degree ${n}$ and a degree ${m}$ polynomial is a degree ${n+m}$ polynomial, whereas the group multiplication law on permutations does not take a permutation in ${S_n}$ and a permutation in ${S_m}$ and return a permutation in ${S_{n+m}}$.

I recently found (after some discussions with Ben Green) what I feel to be a satisfying conceptual (as opposed to computational) explanation of this link, which I will place below the fold.

Let ${V}$ be a quasiprojective variety defined over a finite field ${{\bf F}_q}$, thus for instance ${V}$ could be an affine variety

$\displaystyle V = \{ x \in {\bf A}^d: P_1(x) = \dots = P_m(x) = 0\} \ \ \ \ \ (1)$

where ${{\bf A}^d}$ is ${d}$-dimensional affine space and ${P_1,\dots,P_m: {\bf A}^d \rightarrow {\bf A}}$ are a finite collection of polynomials with coefficients in ${{\bf F}_q}$. Then one can define the set ${V[{\bf F}_q]}$ of ${{\bf F}_q}$-rational points, and more generally the set ${V[{\bf F}_{q^n}]}$ of ${{\bf F}_{q^n}}$-rational points for any ${n \geq 1}$, since ${{\bf F}_{q^n}}$ can be viewed as a field extension of ${{\bf F}_q}$. Thus for instance in the affine case (1) we have

$\displaystyle V[{\bf F}_{q^n}] := \{ x \in {\bf F}_{q^n}^d: P_1(x) = \dots = P_m(x) = 0\}.$

The Weil conjectures are concerned with understanding the number

$\displaystyle S_n := |V[{\bf F}_{q^n}]| \ \ \ \ \ (2)$

of ${{\bf F}_{q^n}}$-rational points over a variety ${V}$. The first of these conjectures was proven by Dwork, and can be phrased as follows.

Theorem 1 (Rationality of the zeta function) Let ${V}$ be a quasiprojective variety defined over a finite field ${{\bf F}_q}$, and let ${S_n}$ be given by (2). Then there exist a finite number of algebraic integers ${\alpha_1,\dots,\alpha_k, \beta_1,\dots,\beta_{k'} \in O_{\overline{{\bf Q}}}}$ (known as characteristic values of ${V}$), such that

$\displaystyle S_n = \alpha_1^n + \dots + \alpha_k^n - \beta_1^n - \dots - \beta_{k'}^n$

for all ${n \geq 1}$.

After cancelling, we may of course assume that ${\alpha_i \neq \beta_j}$ for any ${i=1,\dots,k}$ and ${j=1,\dots,k'}$, and then (as we will see below) the ${\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'}}$ become uniquely determined up to permutations of the ${\alpha_1,\dots,\alpha_k}$ and ${\beta_1,\dots,\beta_{k'}}$. These values are known as the characteristic values of ${V}$. Since ${S_n}$ is a rational integer (i.e. an element of ${{\bf Z}}$) rather than merely an algebraic integer (i.e. an element of the ring of integers ${O_{\overline{{\bf Q}}}}$ of the algebraic closure ${\overline{{\bf Q}}}$ of ${{\bf Q}}$), we conclude from the above-mentioned uniqueness that the set of characteristic values is invariant with respect to the Galois group ${Gal(\overline{{\bf Q}} / {\bf Q} )}$. To emphasise this Galois invariance, we will not fix a specific embedding ${\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}$ of the algebraic numbers into the complex field ${{\bf C} = {\bf C}_\infty}$, but work with all such embeddings simultaneously. (Thus, for instance, ${\overline{{\bf Q}}}$ contains three cube roots of ${2}$, but which of these is assigned to the complex numbers ${2^{1/3}}$, ${e^{2\pi i/3} 2^{1/3}}$, ${e^{4\pi i/3} 2^{1/3}}$ will depend on the choice of embedding ${\iota_\infty}$.)

An equivalent way of phrasing Dwork’s theorem is that the (${T}$-form of the) zeta function

$\displaystyle \zeta_V(T) := \exp( \sum_{n=1}^\infty \frac{S_n}{n} T^n )$

associated to ${V}$ (which is well defined as a formal power series in ${T}$, at least) is equal to a rational function of ${T}$ (with the ${\alpha_1,\dots,\alpha_k}$ and ${\beta_1,\dots,\beta_{k'}}$ being the reciprocals of the poles and zeroes of ${\zeta_V}$ respectively). Here, we use the formal exponential

$\displaystyle \exp(X) := 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \dots.$

Equivalently, the (${s}$-form of the) zeta-function ${s \mapsto \zeta_V(q^{-s})}$ is a meromorphic function on the complex numbers ${{\bf C}}$ which is also periodic with period ${2\pi i/\log q}$, and which has only finitely many poles and zeroes up to this periodicity.
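Since ${\zeta_V}$ is a formal power series, its coefficients ${c_n}$ can be computed exactly from the ${S_n}$: differentiating ${\zeta_V = \exp( \sum_n S_n T^n / n )}$ gives ${n c_n = \sum_{k=1}^n S_k c_{n-k}}$. The Python sketch below uses this recursion and compares the result with the expansion of ${\prod_j (1-\beta_j T) / \prod_i (1-\alpha_i T)}$ for three standard examples: the affine plane (${S_n = q^{2n}}$), the projective line (${S_n = q^n+1}$) and ${{\bf A}^1}$ with one point removed (${S_n = q^n - 1}$). The value ${q=5}$, the truncation order and the function names are arbitrary illustrative choices.

```python
from fractions import Fraction


def zeta_coefficients(S, N):
    """Coefficients c_0..c_N of exp(sum_{n>=1} S(n) T^n / n) as a formal power series.

    Differentiating gives n c_n = sum_{k=1}^n S(k) c_{n-k}, solved term by term.
    """
    c = [Fraction(1)]
    for n in range(1, N + 1):
        c.append(sum(S(k) * c[n - k] for k in range(1, n + 1)) / n)
    return c


def rational_coefficients(alphas, betas, N):
    """Coefficients c_0..c_N of prod_j (1 - beta_j T) / prod_i (1 - alpha_i T)."""
    c = [Fraction(1)] + [Fraction(0)] * N
    for a in alphas:  # multiply by 1/(1 - aT) = sum_n a^n T^n
        c = [sum(c[j] * a**(n - j) for j in range(n + 1)) for n in range(N + 1)]
    for b in betas:   # multiply by (1 - bT)
        c = [c[n] - (b * c[n - 1] if n > 0 else 0) for n in range(N + 1)]
    return c


q, N = 5, 8
examples = {
    "affine plane, S_n = q^(2n)": (lambda n: q**(2 * n), [q**2], []),
    "projective line, S_n = q^n + 1": (lambda n: q**n + 1, [1, q], []),
    "A^1 minus a point, S_n = q^n - 1": (lambda n: q**n - 1, [q], [1]),
}
for name, (S, alphas, betas) in examples.items():
    print(name, zeta_coefficients(S, N) == rational_coefficients(alphas, betas, N))
```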

Dwork’s argument relies primarily on ${p}$-adic analysis – an analogue of complex analysis, but over an algebraically closed (and metrically complete) extension ${{\bf C}_p}$ of the ${p}$-adic field ${{\bf Q}_p}$, rather than over the Archimedean complex numbers ${{\bf C}}$. The argument is quite effective, and in particular gives explicit upper bounds for the number ${k+k'}$ of characteristic values in terms of the complexity of the variety ${V}$; for instance, in the affine case (1) with ${V}$ of degree ${D}$, Bombieri used Dwork’s methods (in combination with Deligne’s theorem below) to obtain the bound ${k+k' \leq (4D+9)^{2d+1}}$, and a subsequent paper of Hooley established the slightly weaker bound ${k+k' \leq (11D+11)^{d+m+2}}$ purely from Dwork’s methods (a similar bound had also been pointed out in unpublished work of Dwork). In particular, one has bounds that are uniform in the field ${{\bf F}_q}$, which is an important fact for many analytic number theory applications.

These ${p}$-adic arguments stand in contrast with Deligne’s resolution of the last (and deepest) of the Weil conjectures:

Theorem 2 (Riemann hypothesis) Let ${V}$ be a quasiprojective variety defined over a finite field ${{\bf F}_q}$, and let ${\lambda \in \overline{{\bf Q}}}$ be a characteristic value of ${V}$. Then there exists a natural number ${w}$ such that ${|\iota_\infty(\lambda)|_\infty = q^{w/2}}$ for every embedding ${\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}$, where ${| |_\infty}$ denotes the usual absolute value on the complex numbers ${{\bf C} = {\bf C}_\infty}$. (Informally: ${\lambda}$ and all of its Galois conjugates have complex magnitude ${q^{w/2}}$.)

To put it another way that closely resembles the classical Riemann hypothesis, all the zeroes and poles of the ${s}$-form ${s \mapsto \zeta_V(q^{-s})}$ lie on the critical lines ${\{ s \in {\bf C}: \hbox{Re}(s) = \frac{w}{2} \}}$ for ${w=0,1,2,\dots}$. (See this previous blog post for further comparison of various instantiations of the Riemann hypothesis.) Whereas Dwork uses ${p}$-adic analysis, Deligne uses the essentially orthogonal technique of ell-adic cohomology to establish his theorem. However, ell-adic methods can be used (via the Grothendieck-Lefschetz trace formula) to establish rationality, and conversely, in this paper of Kedlaya, ${p}$-adic methods are used to establish the Riemann hypothesis. As pointed out by Kedlaya, the ell-adic methods are tied to the intrinsic geometry of ${V}$ (such as the structure of sheaves and covers over ${V}$), while the ${p}$-adic methods are more tied to the extrinsic geometry of ${V}$ (how ${V}$ sits inside its ambient affine or projective space).

In this post, I would like to record my notes on Dwork’s proof of Theorem 1, drawing heavily on the expositions of Serre, Hooley, Koblitz, and others.

The basic strategy is to control the rational integers ${S_n}$ both in an “Archimedean” sense (embedding the rational integers inside the complex numbers ${{\bf C}_\infty}$ with the usual norm ${||_\infty}$) as well as in the “${p}$-adic” sense, with ${p}$ the characteristic of ${{\bf F}_q}$ (embedding the integers now in the “complexification” ${{\bf C}_p}$ of the ${p}$-adic numbers ${{\bf Q}_p}$, which is equipped with a norm ${||_p}$ that we will recall later). (This is in contrast to the methods of ell-adic cohomology, in which one primarily works over an ${\ell}$-adic field ${{\bf Q}_\ell}$ with ${\ell \neq p,\infty}$.) The Archimedean control is trivial:

Proposition 3 (Archimedean control of ${S_n}$) With ${S_n}$ as above, and any embedding ${\iota_\infty: \overline{{\bf Q}} \rightarrow {\bf C}}$, we have

$\displaystyle |\iota_\infty(S_n)|_\infty \leq C q^{A n}$

for all ${n}$ and some ${C, A >0}$ independent of ${n}$.

Proof: Since ${S_n}$ is a rational integer, ${|\iota_\infty(S_n)|_\infty}$ is just ${|S_n|_\infty}$. By covering ${V}$ with finitely many affine pieces, it suffices to treat the case when ${V}$ is of the affine form (1), in which case ${V[{\bf F}_{q^n}] \subset {\bf F}_{q^n}^d}$ gives the trivial bound ${|S_n|_\infty \leq q^{nd}}$, and the claim follows. $\Box$

Another way of thinking about this Archimedean control is that it guarantees that the zeta function ${T \mapsto \zeta_V(T)}$ can be defined holomorphically on the open disk in ${{\bf C}_\infty}$ of radius ${q^{-A}}$ centred at the origin.

The ${p}$-adic control is significantly more difficult, and is the main component of Dwork’s argument:

Proposition 4 (${p}$-adic control of ${S_n}$) With ${S_n}$ as above, and using an embedding ${\iota_p: \overline{{\bf Q}} \rightarrow {\bf C}_p}$ (defined later) with ${p}$ the characteristic of ${{\bf F}_q}$, we can find for any real ${A > 0}$ a finite number of elements ${\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'} \in {\bf C}_p}$ such that

$\displaystyle |\iota_p(S_n) - (\alpha_1^n + \dots + \alpha_k^n - \beta_1^n - \dots - \beta_{k'}^n)|_p \leq C q^{-An}$

for all ${n}$ and some ${C>0}$.

Another way of thinking about this ${p}$-adic control is that it guarantees that the zeta function ${T \mapsto \zeta_V(T)}$ can be defined meromorphically on the entire ${p}$-adic complex field ${{\bf C}_p}$.

Proposition 4 is ostensibly much weaker than Theorem 1 because of (a) the error term of ${p}$-adic magnitude at most ${Cq^{-An}}$; (b) the fact that the number ${k+k'}$ of potential characteristic values here may go to infinity as ${A \rightarrow \infty}$; and (c) the potential characteristic values ${\alpha_1,\dots,\alpha_k,\beta_1,\dots,\beta_{k'}}$ only exist inside the complexified ${p}$-adics ${{\bf C}_p}$, rather than in the algebraic integers ${O_{\overline{{\bf Q}}}}$. However, it turns out that by combining ${p}$-adic control on ${S_n}$ in Proposition 4 with the trivial control on ${S_n}$ in Proposition 3, one can obtain Theorem 1 by an elementary argument that does not use any further properties of ${S_n}$ (other than the obvious fact that the ${S_n}$ are rational integers), with the ${A}$ in Proposition 4 chosen to exceed the ${A}$ in Proposition 3. We give this argument (essentially due to Borel) below the fold.

The proof of Proposition 4 can be split into two pieces. The first piece, which can be viewed as the number-theoretic component of the proof, uses external descriptions of ${V}$ such as (1) to obtain the following decomposition of ${S_n}$:

Proposition 5 (Decomposition of ${S_n}$) With ${\iota_p}$ and ${S_n}$ as above, we can decompose ${\iota_p(S_n)}$ as a finite linear combination (over the integers) of sequences ${S'_n \in {\bf C}_p}$, such that for each such sequence ${n \mapsto S'_n}$, the zeta functions

$\displaystyle \zeta'(T) := \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n ) = \sum_{n=0}^\infty c_n T^n$

are entire in ${{\bf C}_p}$, by which we mean that

$\displaystyle |c_n|_p^{1/n} \rightarrow 0$

as ${n \rightarrow \infty}$.
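To make the notion of "entire" concrete: writing ${|c|_p = p^{-v_p(c)}}$, the condition ${|c_n|_p^{1/n} \rightarrow 0}$ says exactly that ${-v_p(c_n)/n \rightarrow -\infty}$. The Python sketch below computes this quantity for two coefficient sequences of our own choosing (they are illustrations, not sequences arising in Dwork's argument): ${c_n = p^{n^2}}$, which gives an entire series, and ${c_n = 1/n!}$, the coefficients of the formal exponential, for which Legendre's formula gives ${v_p(n!)/n \rightarrow 1/(p-1)}$, so the formal exponential is not entire over ${{\bf C}_p}$.

```python
from fractions import Fraction
from math import factorial


def vp(x, p):
    """p-adic valuation of a non-zero rational number x (so |x|_p = p^(-vp(x, p)))."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is infinite")

    def v(m):
        m, count = abs(m), 0
        while m % p == 0:
            m //= p
            count += 1
        return count

    return v(x.numerator) - v(x.denominator)


def legendre(n, p):
    """v_p(n!) by Legendre's formula: sum over i >= 1 of floor(n / p^i)."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def log_root_abs(valuation, n):
    """log_p of |c_n|_p^(1/n), given v_p(c_n); the series is entire iff this tends to -infinity."""
    return Fraction(-valuation, n)


p = 3
print(legendre(10, p), vp(factorial(10), p))  # two ways of computing v_3(10!)
for n in (3, 27, 243, 2187):
    entire_example = log_root_abs(n * n, n)         # c_n = p^(n^2): equals -n
    exp_example = log_root_abs(-legendre(n, p), n)  # c_n = 1/n!: tends to 1/(p-1)
    print(n, entire_example, float(exp_example))
```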

This proposition will ultimately be a consequence of the properties of the Teichmuller lifting ${\tau: \overline{{\bf F}_p}^\times \rightarrow {\bf C}_p^\times}$.
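In the simplest case of the prime field ${{\bf F}_p}$, the Teichmuller lift of ${a \in {\bf F}_p^\times}$ is the unique ${(p-1)}$-st root of unity in ${{\bf Z}_p}$ congruent to ${a}$ modulo ${p}$, and modulo ${p^k}$ it can be computed as ${a^{p^{k-1}} \bmod p^k}$. The Python sketch below does exactly this and nothing more; it does not handle ${{\bf F}_q}$ with ${q \neq p}$, which would require arithmetic in unramified extensions of ${{\bf Q}_p}$. The function name is illustrative.

```python
def teichmuller_lift(a, p, k):
    """Teichmuller representative of a mod p (p prime), computed modulo p^k.

    a^(p^(k-1)) mod p^k is congruent to a mod p, is a (p-1)-st root of unity
    mod p^k, and depends only on a mod p.
    """
    if a % p == 0:
        raise ValueError("a must be non-zero mod p")
    return pow(a, p**(k - 1), p**k)


p, k = 7, 5
for a in range(1, p):
    t = teichmuller_lift(a, p, k)
    print(a, t, t % p, pow(t, p - 1, p**k))  # columns 3 and 4: a again, and 1
```

The lifts are multiplicative, and increasing ${k}$ only refines the previous approximation, which is what makes the limit ${k \rightarrow \infty}$ well defined in ${{\bf Z}_p}$.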

The second piece, which can be viewed as the “${p}$-adic complex analytic” component of the proof, relates the ${p}$-adic entire nature of a zeta function with control on the associated sequence ${S'_n}$, and can be interpreted (after some manipulation) as a ${p}$-adic version of the Weierstrass preparation theorem:

Proposition 6 (${p}$-adic Weierstrass preparation theorem) Let ${S'_n}$ be a sequence in ${{\bf C}_p}$, such that the zeta function

$\displaystyle \zeta'(T) := \exp( \sum_{n=1}^\infty \frac{S'_n}{n} T^n )$

is entire in ${{\bf C}_p}$. Then for any real ${A > 0}$, there exist a finite number of elements ${\beta_1,\dots,\beta_{k'} \in {\bf C}_p}$ such that

$\displaystyle |S'_n + \beta_1^n + \dots + \beta_{k'}^n|_p \leq C q^{-An}$

for all ${n}$ and some ${C>0}$.

Clearly, the combination of Proposition 5 and Proposition 6 (and the non-Archimedean nature of the ${||_p}$ norm) implies Proposition 4.

Let ${{\bf F}_q}$ be a finite field of order ${q = p^r}$, and let ${C}$ be an absolutely irreducible smooth projective curve defined over ${{\bf F}_q}$ (and hence over the algebraic closure ${k := \overline{{\bf F}_q}}$ of that field). For instance, ${C}$ could be the projective elliptic curve

$\displaystyle C = \{ [x,y,z]: y^2 z = x^3 + ax z^2 + b z^3 \}$

in the projective plane ${{\bf P}^2 = \{ [x,y,z]: (x,y,z) \neq (0,0,0) \}}$, where ${a,b \in {\bf F}_q}$ are coefficients whose discriminant ${-16(4a^3+27b^2)}$ is non-vanishing, which is the projective version of the affine elliptic curve

$\displaystyle \{ (x,y): y^2 = x^3 + ax + b \}.$

To each such curve ${C}$ one can associate a genus ${g}$, which we will define later; for instance, elliptic curves have genus ${1}$. We can also count the cardinality ${|C({\bf F}_q)|}$ of the set ${C({\bf F}_q)}$ of ${{\bf F}_q}$-points of ${C}$. The Hasse-Weil bound relates the two:

Theorem 1 (Hasse-Weil bound) ${||C({\bf F}_q)| - q - 1| \leq 2g\sqrt{q}}$.
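For the elliptic curves above (genus ${1}$), Theorem 1 says ${||C({\bf F}_q)| - q - 1| \leq 2\sqrt{q}}$, which is easy to test numerically when ${q = p}$ is a prime larger than ${3}$. The Python sketch below (names illustrative) counts projective points on ${y^2 z = x^3 + axz^2 + bz^3}$ as the affine solutions of ${y^2 = x^3+ax+b}$ plus the single point ${[0,1,0]}$ at infinity, and checks the bound in the squared form ${(|C({\bf F}_p)| - p - 1)^2 \leq 4p}$ to avoid floating point.

```python
def count_points(a, b, p):
    """|C(F_p)| for C: y^2 z = x^3 + a x z^2 + b z^3, p a prime > 3.

    Affine solutions of y^2 = x^3 + a x + b, plus the point [0, 1, 0] at infinity.
    """
    square_roots = [0] * p  # square_roots[r] = #{y in F_p : y^2 = r}
    for y in range(p):
        square_roots[y * y % p] += 1
    affine = sum(square_roots[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1


def satisfies_hasse_weil(a, b, p, g=1):
    """Check ||C(F_p)| - p - 1| <= 2 g sqrt(p) in exact integer arithmetic."""
    deviation = count_points(a, b, p) - p - 1
    return deviation * deviation <= 4 * g * g * p


p = 31
curves = [(a, b) for a in range(p) for b in range(p) if (4 * a**3 + 27 * b**2) % p]
print(len(curves), all(satisfies_hasse_weil(a, b, p) for a, b in curves))
```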

The usual proofs of this bound proceed by first establishing a trace formula of the form

$\displaystyle |C({\bf F}_{q^n})| = q^n - \sum_{i=1}^{2g} \alpha_i^n + 1 \ \ \ \ \ (1)$

for some complex numbers ${\alpha_1,\dots,\alpha_{2g}}$ independent of ${n}$; this is in fact a special case of the Lefschetz-Grothendieck trace formula, and can be interpreted as an assertion that the zeta function associated to the curve ${C}$ is rational. The task is then to establish a bound ${|\alpha_i| \leq \sqrt{q}}$ for all ${i=1,\dots,2g}$; this (or more precisely, the slightly stronger assertion ${|\alpha_i| = \sqrt{q}}$) is the Riemann hypothesis for such curves. This can be done either by passing to the Jacobian variety of ${C}$ and using a certain duality available on the cohomology of such varieties, known as the Rosati involution, or by passing to the product surface ${C \times C}$ and applying the Riemann-Roch theorem for that surface.
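For an elliptic curve (${g=1}$) over a prime field ${{\bf F}_p}$, the two values in the trace formula satisfy ${\alpha_1 + \alpha_2 = t := p + 1 - |C({\bf F}_p)|}$ and ${\alpha_1 \alpha_2 = p}$ (a standard fact, assumed here rather than proved), so the formula predicts ${|C({\bf F}_{p^2})| = p^2 + 1 - (t^2 - 2p)}$. The Python sketch below checks this prediction by brute force, modelling ${{\bf F}_{p^2}}$ as ${{\bf F}_p[i]/(i^2-d)}$ for a quadratic non-residue ${d}$; this model and all function names are implementation choices, not anything from the text.

```python
def nonresidue(p):
    """Smallest quadratic non-residue mod an odd prime p."""
    return next(d for d in range(2, p) if pow(d, (p - 1) // 2, p) == p - 1)


def fp2_mul(x, y, p, d):
    """Multiply u + v*i in F_p[i]/(i^2 - d), elements stored as pairs (u, v)."""
    (a, b), (c, e) = x, y
    return ((a * c + d * b * e) % p, (a * e + b * c) % p)


def count_points_fp(a, b, p):
    """|C(F_p)| for y^2 = x^3 + a x + b: affine solutions plus the point at infinity."""
    return 1 + sum(1 for x in range(p) for y in range(p)
                   if (y * y - x**3 - a * x - b) % p == 0)


def count_points_fp2(a, b, p):
    """|C(F_{p^2})| for the same curve, with F_{p^2} modelled as F_p[i]/(i^2 - d)."""
    d = nonresidue(p)
    elements = [(u, v) for u in range(p) for v in range(p)]
    square_roots = {}  # element -> number of its square roots in F_{p^2}
    for y in elements:
        s = fp2_mul(y, y, p, d)
        square_roots[s] = square_roots.get(s, 0) + 1
    total = 1  # the point at infinity
    for x in elements:
        x3 = fp2_mul(fp2_mul(x, x, p, d), x, p, d)
        rhs = ((x3[0] + a * x[0] + b) % p, (x3[1] + a * x[1]) % p)  # a, b lie in F_p
        total += square_roots.get(rhs, 0)
    return total


def trace_formula_prediction(a, b, p):
    """|C(F_{p^2})| predicted by the trace formula with g = 1: p^2 + 1 - (t^2 - 2p)."""
    t = p + 1 - count_points_fp(a, b, p)
    return p * p + 1 - (t * t - 2 * p)


p, a, b = 11, 1, 3
print(count_points_fp2(a, b, p), trace_formula_prediction(a, b, p))
```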

In 1969, Stepanov introduced an elementary method (a version of what is now known as the polynomial method) to count (or at least to upper bound) the quantity ${|C({\bf F}_q)|}$. The method was initially restricted to hyperelliptic curves, but was soon extended to general curves. In particular, Bombieri used this method to give a short proof of the following weaker version of the Hasse-Weil bound:

Theorem 2 (Weak Hasse-Weil bound) If ${q}$ is a perfect square, and ${q \geq (g+1)^4}$, then ${|C({\bf F}_q)| \leq q + (2g+1) \sqrt{q} + 1}$.

In fact, the bound on ${|C({\bf F}_q)|}$ can be sharpened a little bit further, as we will soon see.

Theorem 2 is only an upper bound on ${|C({\bf F}_q)|}$, but there is a Galois-theoretic trick to convert (a slight generalisation of) this upper bound to a matching lower bound, and if one then uses the trace formula (1) (and the “tensor power trick” of sending ${n}$ to infinity to control the weights ${\alpha_i}$) one can then recover the full Hasse-Weil bound. We discuss these steps below the fold.

I’ve discussed Bombieri’s proof of Theorem 2 in this previous post (in the special case of hyperelliptic curves), but now wish to present the full proof, with some minor simplifications from Bombieri’s original presentation; it is mostly elementary, with the deepest fact from algebraic geometry needed being Riemann’s inequality (a weak form of the Riemann-Roch theorem).

The first step is to reinterpret ${|C({\bf F}_q)|}$ as the number of points of intersection between two curves ${C_1,C_2}$ in the surface ${C \times C}$. Indeed, if we define the Frobenius endomorphism ${\hbox{Frob}_q}$ on any projective space by

$\displaystyle \hbox{Frob}_q( [x_0,\dots,x_n] ) := [x_0^q, \dots, x_n^q]$

then this map preserves the curve ${C}$, and the fixed points of this map are precisely the ${{\bf F}_q}$ points of ${C}$:

$\displaystyle C({\bf F}_q) = \{ z \in C: \hbox{Frob}_q(z) = z \}.$

Thus one can interpret ${|C({\bf F}_q)|}$ as the number of points of intersection between the diagonal curve

$\displaystyle \{ (z,z): z \in C \}$

and the Frobenius graph

$\displaystyle \{ (z, \hbox{Frob}_q(z)): z \in C \}$

which are copies of ${C}$ inside ${C \times C}$. But we can use the additional hypothesis that ${q}$ is a perfect square to write this more symmetrically, by taking advantage of the fact that the Frobenius map has a square root

$\displaystyle \hbox{Frob}_q = \hbox{Frob}_{\sqrt{q}}^2$

with ${\hbox{Frob}_{\sqrt{q}}}$ also preserving ${C}$. One can then also interpret ${|C({\bf F}_q)|}$ as the number of points of intersection between the curve

$\displaystyle C_1 := \{ (z, \hbox{Frob}_{\sqrt{q}}(z)): z \in C \} \ \ \ \ \ (2)$

and its transpose

$\displaystyle C_2 := \{ (\hbox{Frob}_{\sqrt{q}}(w), w): w \in C \}.$
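For a concrete instance of the fixed-point description of ${C({\bf F}_q)}$, one can take ${q = p}$ prime and work inside ${{\bf F}_{p^2}}$, modelled (as an assumption of convenience) as ${{\bf F}_p[i]/(i^2 - d)}$ for a quadratic non-residue ${d}$. The Python sketch below checks that the fixed points of ${x \mapsto x^p}$ on ${{\bf F}_{p^2}}$ are exactly ${{\bf F}_p}$, and that the affine points of an elliptic curve ${y^2 = x^3+ax+b}$ over ${{\bf F}_{p^2}}$ fixed by ${\hbox{Frob}_p}$ (acting coordinate-wise in the chart ${z=1}$) are exactly its ${{\bf F}_p}$-points. The function names are illustrative.

```python
def nonresidue(p):
    """Smallest quadratic non-residue mod an odd prime p."""
    return next(d for d in range(2, p) if pow(d, (p - 1) // 2, p) == p - 1)


def fp2_mul(x, y, p, d):
    """Multiply u + v*i in F_p[i]/(i^2 - d), elements stored as pairs (u, v)."""
    (a, b), (c, e) = x, y
    return ((a * c + d * b * e) % p, (a * e + b * c) % p)


def fp2_pow(x, e, p, d):
    """x^e in F_p[i]/(i^2 - d) by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = fp2_mul(result, x, p, d)
        x = fp2_mul(x, x, p, d)
        e >>= 1
    return result


def frobenius_fixed_field(p):
    """Elements of F_{p^2} fixed by Frob_p: x -> x^p."""
    d = nonresidue(p)
    return [(u, v) for u in range(p) for v in range(p) if fp2_pow((u, v), p, p, d) == (u, v)]


def frobenius_fixed_curve_points(a, b, p):
    """Affine points of y^2 = x^3 + a x + b over F_{p^2} fixed by Frob_p."""
    d = nonresidue(p)
    elements = [(u, v) for u in range(p) for v in range(p)]
    fixed = set()
    for x in elements:
        x3 = fp2_mul(fp2_mul(x, x, p, d), x, p, d)
        rhs = ((x3[0] + a * x[0] + b) % p, (x3[1] + a * x[1]) % p)
        for y in elements:
            if (fp2_mul(y, y, p, d) == rhs
                    and fp2_pow(x, p, p, d) == x and fp2_pow(y, p, p, d) == y):
                fixed.add((x, y))
    return fixed


def fp_points_embedded(a, b, p):
    """Affine F_p-points of the same curve, viewed inside F_{p^2} as pairs (u, 0)."""
    return {((x, 0), (y, 0)) for x in range(p) for y in range(p)
            if (y * y - x**3 - a * x - b) % p == 0}


p, a, b = 7, 1, 3
print(frobenius_fixed_field(p))
print(frobenius_fixed_curve_points(a, b, p) == fp_points_embedded(a, b, p))
```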

Let ${k(C \times C)}$ be the field of rational functions on ${C \times C}$ (with coefficients in ${k}$), and define ${k(C_1)}$, ${k(C_2)}$, and ${k(C_1 \cap C_2)}$ analogously (although ${C_1 \cap C_2}$ is likely to be disconnected, so ${k(C_1 \cap C_2)}$ will just be a ring rather than a field). We then (morally) have the commuting square

$\displaystyle \begin{array}{ccccc} && k(C \times C) && \\ & \swarrow & & \searrow & \\ k(C_1) & & & & k(C_2) \\ & \searrow & & \swarrow & \\ && k(C_1 \cap C_2) && \end{array},$

if we ignore the issue that a rational function on, say, ${C \times C}$, might blow up on all of ${C_1}$ and thus not have a well-defined restriction to ${C_1}$. We use ${\pi_1: k(C \times C) \rightarrow k(C_1)}$ and ${\pi_2: k(C \times C) \rightarrow k(C_2)}$ to denote the restriction maps. Furthermore, we have obvious isomorphisms ${\iota_1: k(C_1) \rightarrow k(C)}$, ${\iota_2: k(C_2) \rightarrow k(C)}$ coming from composing with the graphing maps ${z \mapsto (z, \hbox{Frob}_{\sqrt{q}}(z))}$ and ${w \mapsto (\hbox{Frob}_{\sqrt{q}}(w), w)}$.

The idea now is to find a rational function ${f \in k(C \times C)}$ on the surface ${C \times C}$ of controlled degree which vanishes when restricted to ${C_1}$, but is non-vanishing (and not blowing up) when restricted to ${C_2}$. On ${C_2}$, we thus get a non-zero rational function ${f \downharpoonright_{C_2}}$ of controlled degree which vanishes on ${C_1 \cap C_2}$ – which then lets us bound the cardinality of ${C_1 \cap C_2}$ in terms of the degree of ${f \downharpoonright_{C_2}}$. (In Bombieri’s original argument, one required vanishing to high order on the ${C_1}$ side, but in our presentation, we have factored out a ${\hbox{Frob}_{\sqrt{q}}}$ term which removes this high order vanishing condition.)

To find this ${f}$, we will use linear algebra. Namely, we will locate a finite-dimensional subspace ${V}$ of ${k(C \times C)}$ (consisting of certain “controlled degree” rational functions) which projects injectively to ${k(C_2)}$, but whose projection to ${k(C_1)}$ has strictly smaller dimension than ${V}$ itself. The rank-nullity theorem then forces the existence of a non-zero element ${f}$ of ${V}$ whose projection to ${k(C_1)}$ vanishes, but whose projection to ${k(C_2)}$ is non-zero.
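The rank-nullity step is effective: once the projection of ${V}$ to ${k(C_1)}$ has been written as a matrix with respect to bases (the substantive work, which is not attempted here), a non-zero kernel vector can be found by Gaussian elimination. The Python sketch below performs this elimination over a prime field ${{\bf F}_p}$ as a stand-in for the field ${k}$; the function name and the random matrix input are hypothetical illustrations.

```python
import random


def kernel_vector_mod_p(rows, p):
    """Non-zero v with M v = 0 over F_p, for a matrix M with fewer rows than columns.

    Row-reduces M; rank-nullity guarantees a free (non-pivot) column, and setting
    that free variable to 1 and the other free variables to 0 determines v.
    """
    m = [list(row) for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    if n_rows >= n_cols:
        raise ValueError("expected more columns than rows")
    pivots, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] % p), None)
        if pivot is None:
            continue  # no pivot: column c is free
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][c], -1, p)  # modular inverse (Python 3.8+)
        m[r] = [x * inv % p for x in m[r]]
        for i in range(n_rows):  # clear column c in every other row
            if i != r and m[i][c] % p:
                factor = m[i][c]
                m[i] = [(x - factor * y) % p for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    free = next(c for c in range(n_cols) if c not in pivots)
    v = [0] * n_cols
    v[free] = 1
    for row_index, c in enumerate(pivots):
        v[c] = -m[row_index][free] % p
    return v


rng = random.Random(0)
M = [[rng.randrange(7) for _ in range(5)] for _ in range(3)]
v = kernel_vector_mod_p(M, 7)
print(v, [sum(x * y for x, y in zip(row, v)) % 7 for row in M])
```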

Now we build ${V}$. Pick an ${{\bf F}_q}$-point ${P_\infty}$ of ${C}$, which we will think of as being a point at infinity. (For the purposes of proving Theorem 2, we may clearly assume that ${C({\bf F}_q)}$ is non-empty.) Thus ${P_\infty}$ is fixed by ${\hbox{Frob}_q}$. To simplify the exposition, we will also assume that ${P_\infty}$ is fixed by the square root ${\hbox{Frob}_{\sqrt{q}}}$ of ${\hbox{Frob}_q}$; in the opposite case when ${\hbox{Frob}_{\sqrt{q}}}$ has order two when acting on ${P_\infty}$, the argument is essentially the same, but all references to ${P_\infty}$ in the second factor of ${C \times C}$ need to be replaced by ${\hbox{Frob}_{\sqrt{q}} P_\infty}$ (we leave the details to the interested reader).

For any natural number ${n}$, define ${R_n}$ to be the set of rational functions ${f \in k(C)}$ which are allowed to have a pole of order up to ${n}$ at ${P_\infty}$, but have no other poles on ${C}$; note that as we are assuming ${C}$ to be smooth, it is unambiguous what a pole is (and what order it will have). (In the fancier language of divisors and Cech cohomology, we have ${R_n = H^0( C, {\mathcal O}_C(n P_\infty) )}$.) The space ${R_n}$ is clearly a vector space over ${k}$; one can view it intuitively as the space of “polynomials” on ${C}$ of “degree” at most ${n}$. When ${n=0}$, ${R_0}$ consists just of the constant functions. Indeed, if ${f \in R_0}$, then the image ${f(C)}$ of ${f}$ avoids ${\infty}$ and so lies in the affine line ${k = {\mathbf P}^1 \backslash \{\infty\}}$; but as ${C}$ is projective, the image ${f(C)}$ needs to be compact (hence closed) in ${{\mathbf P}^1}$, and must therefore be a point, giving the claim.

For higher ${n \geq 1}$, we have the easy relations

$\displaystyle \hbox{dim}(R_{n-1}) \leq \hbox{dim}(R_n) \leq \hbox{dim}(R_{n-1})+1. \ \ \ \ \ (3)$

The former inequality just comes from the trivial inclusion ${R_{n-1} \subset R_n}$. For the latter, observe that if two functions ${f, g}$ lie in ${R_n}$, so that they each have a pole of order at most ${n}$ at ${P_\infty}$, then some non-trivial linear combination of these functions (chosen to cancel the coefficients of the order ${n}$ pole) must have a pole of order at most ${n-1}$ at ${P_\infty}$; thus ${R_{n-1}}$ has codimension at most one in ${R_n}$, giving the claim.

From (3) and induction we see that each ${R_n}$ is finite-dimensional, with the trivial upper bound

$\displaystyle \hbox{dim}(R_n) \leq n+1. \ \ \ \ \ (4)$

Riemann’s inequality complements this with the lower bound

$\displaystyle \hbox{dim}(R_n) \geq n+1-g, \ \ \ \ \ (5)$

thus one has ${\hbox{dim}(R_n) = \hbox{dim}(R_{n-1})+1}$ for all but at most ${g}$ values of ${n \geq 1}$ (in fact, exactly ${g}$ exceptions as it turns out). This is a consequence of the Riemann-Roch theorem; it can be proven from abstract nonsense (the snake lemma) if one defines the genus ${g}$ in a non-standard fashion (as the dimension of the first Cech cohomology ${H^1(C)}$ of the structure sheaf ${{\mathcal O}_C}$ of ${C}$), but to obtain this inequality with a standard definition of ${g}$ (e.g. as the dimension of the zeroth Cech cohomology ${H^0(C, \Omega_C^1)}$ of the line bundle of differentials) requires the more non-trivial tool of Serre duality.

At any rate, now that we have these vector spaces ${R_n}$, we will define ${V \subset k(C \times C)}$ to be a tensor product space

$\displaystyle V = R_\ell \otimes R_m$

for some natural numbers ${\ell, m \geq 0}$ which we will optimise later. That is to say, ${V}$ is spanned by functions of the form ${(z,w) \mapsto f(z) g(w)}$ with ${f \in R_\ell}$ and ${g \in R_m}$. This is clearly a linear subspace of ${k(C \times C)}$ of dimension ${\hbox{dim}(R_\ell) \hbox{dim}(R_m)}$, and hence by Riemann’s inequality we have

$\displaystyle \hbox{dim}(V) \geq (\ell+1-g) (m+1-g) \ \ \ \ \ (6)$

if

$\displaystyle \ell,m \geq g-1. \ \ \ \ \ (7)$

Observe that ${\iota_1 \circ \pi_1}$ maps a tensor product ${(z,w) \mapsto f(z) g(w)}$ to a function ${z \mapsto f(z) g(\hbox{Frob}_{\sqrt{q}} z)}$. If ${f \in R_\ell}$ and ${g \in R_m}$, then we see that the function ${z \mapsto f(z) g(\hbox{Frob}_{\sqrt{q}} z)}$ has a pole of order at most ${\ell+m\sqrt{q}}$ at ${P_\infty}$. We conclude that

$\displaystyle \iota_1 \circ \pi_1( V ) \subset R_{\ell + m\sqrt{q}} \ \ \ \ \ (8)$

and in particular by (4)

$\displaystyle \hbox{dim}(\pi_1(V)) \leq \ell + m \sqrt{q} + 1 \ \ \ \ \ (9)$

and similarly

$\displaystyle \hbox{dim}(\pi_2(V)) \leq \ell \sqrt{q} + m + 1. \ \ \ \ \ (10)$

We will choose ${m}$ to be a bit bigger than ${\ell}$, to make the ${\pi_2}$ image of ${V}$ smaller than that of ${\pi_1}$. From (6), (10) we see that if we have the inequality

$\displaystyle (\ell+1-g) (m+1-g) > \ell \sqrt{q}+m + 1 \ \ \ \ \ (11)$

(together with (7)) then ${\pi_2}$ cannot be injective.

On the other hand, we have the following basic fact:

Lemma 3 (Injectivity) If

$\displaystyle \ell < \sqrt{q}, \ \ \ \ \ (12)$

then ${\pi_1: V \rightarrow \pi_1(V)}$ is injective.

Proof: From (3), we can find a linear basis ${f_1,\dots,f_a}$ of ${R_\ell}$ such that the ${f_i}$ have distinct orders ${d_i}$ of pole at ${P_\infty}$ (somewhere between ${0}$ and ${\ell}$ inclusive). Similarly, we may find a linear basis ${g_1,\dots,g_b}$ of ${R_m}$ such that the ${g_j}$ have distinct orders ${e_j}$ of pole at ${P_\infty}$ (somewhere between ${0}$ and ${m}$ inclusive). The tensor products ${f_i \otimes g_j}$ then form a basis of ${V}$, and their images ${z \mapsto f_i(z) g_j(\hbox{Frob}_{\sqrt{q}} z)}$ span ${\iota_1(\pi_1(V))}$; the order of pole of such an image at ${P_\infty}$ is ${d_i + \sqrt{q} e_j}$. But since ${0 \leq d_i \leq \ell < \sqrt{q}}$, these orders are all distinct, and so these functions must be linearly independent (a non-trivial linear combination has a pole at ${P_\infty}$ of order equal to the largest order occurring in it). Thus a basis of ${V}$ is mapped to a linearly independent set, and the claim follows. $\Box$
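The combinatorial heart of this proof is that the integers ${d + \sqrt{q} e}$ with ${0 \leq d \leq \ell}$ are pairwise distinct once ${\ell < \sqrt{q}}$, since ${d}$ is then the last digit of a base-${\sqrt{q}}$ expansion. Here is a toy Python check of just this step (our own illustration: it models pole orders as plain integers and assumes the worst case in which every order from ${0}$ to ${\ell}$, resp. ${0}$ to ${m}$, actually occurs; the function name is hypothetical).

```python
def pole_orders_distinct(ell: int, m: int, s: int) -> bool:
    """Return True if the integers d + s*e (0 <= d <= ell, 0 <= e <= m) are
    pairwise distinct.  Here s plays the role of sqrt(q), d the pole order of
    f_i and e the pole order of g_j (worst case: every order occurs)."""
    orders = [d + s * e for d in range(ell + 1) for e in range(m + 1)]
    return len(set(orders)) == len(orders)


# ell = 3 < s = 4: d is the last base-4 digit of d + 4e, so no collisions.
print(pole_orders_distinct(3, 5, 4))  # True
# ell = s = 4 violates (12): 4 + 4*0 == 0 + 4*1 is a collision.
print(pole_orders_distinct(4, 5, 4))  # False
```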

This gives us the following bound:

Proposition 4 Let ${\ell,m}$ be natural numbers such that (7), (11), (12) hold. Then ${|C({\bf F}_q)| \leq \ell + m \sqrt{q}}$.

Proof: As ${\pi_2}$ is not injective, we can find a non-zero ${f \in V}$ with ${\pi_2(f)}$ vanishing. By the above lemma, the function ${\iota_1(\pi_1(f))}$ is then non-zero, but it must also vanish on ${\iota_1(C_1 \cap C_2)}$, which has cardinality ${|C({\bf F}_q)|}$. On the other hand, by (8), ${\iota_1(\pi_1(f))}$ has a pole of order at most ${\ell+m\sqrt{q}}$ at ${P_\infty}$ and no other poles. Since a non-zero rational function on a smooth projective curve has exactly as many zeroes as poles (counted with multiplicity), the claim follows. $\Box$

If ${q \geq (g+1)^4}$, we may make the explicit choice

$\displaystyle m := \sqrt{q}+2g; \quad \ell := \lfloor \frac{g}{g+1} \sqrt{q} \rfloor + g + 1$

and a brief calculation then gives Theorem 2. In some cases one can optimise things a bit further. For instance, in the genus zero case ${g=0}$ (e.g. if ${C}$ is just the projective line ${{\mathbf P}^1}$) one may take ${\ell=1, m = \sqrt{q}}$ and conclude the absolutely sharp bound ${|C({\bf F}_q)| \leq q+1}$ in this case; in the case of the projective line ${{\mathbf P}^1}$, the function ${f}$ is in fact the very concrete function ${f(z,w) := z - w^{\sqrt{q}}}$.
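For explicit small values of ${g}$ and ${\sqrt{q}}$ one can also simply search for the admissible pair ${(\ell,m)}$ giving the smallest bound in Proposition 4. The following Python sketch (our own illustration, not part of the argument above; the function name is hypothetical) writes ${s}$ for ${\sqrt{q}}$, imposes (7), (11), (12) exactly as stated, and uses the observation that for fixed ${\ell}$ the inequality (11) is linear in ${m}$.

```python
def best_parameters(g: int, s: int):
    """Minimise the bound ell + m*s of Proposition 4 over natural numbers
    ell, m obeying (7), (11), (12), where s = sqrt(q).

    Returns (bound, ell, m), or None if no admissible pair exists."""
    best = None
    # (12) forces ell < s.  By (7), ell >= g - 1; but for ell = g - 1 or
    # ell = g the left side of (11) never exceeds the right, so start at g + 1.
    for ell in range(g + 1, s):
        # (11) rearranges to (ell - g) * m > rhs, a linear condition in m.
        rhs = ell * s + 1 - (ell + 1 - g) * (1 - g)
        m = rhs // (ell - g) + 1          # smallest m with (ell - g) * m > rhs
        m = max(m, g - 1, 0)              # enforce (7); this keeps (11) true
        assert (ell + 1 - g) * (m + 1 - g) > ell * s + m + 1  # (11) holds
        candidate = (ell + m * s, ell, m)
        if best is None or candidate < best:
            best = candidate
    return best


print(best_parameters(0, 5))  # (26, 1, 5), i.e. the bound q + 1 for q = 25
print(best_parameters(1, 8))  # (86, 6, 10)
```

For ${g=0}$ the search recovers the choice ${\ell=1, m=\sqrt{q}}$ and the bound ${q+1}$ mentioned above.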

Remark 1 When ${q = p^{2n+1}}$ is not a perfect square, one can try to run the above argument using the factorisation ${\hbox{Frob}_q = \hbox{Frob}_{p^n} \hbox{Frob}_{p^{n+1}}}$ instead of ${\hbox{Frob}_q = \hbox{Frob}_{\sqrt{q}} \hbox{Frob}_{\sqrt{q}}}$. This gives a weaker version of the above bound, of the shape ${|C({\bf F}_q)| \leq q + O( \sqrt{p} \sqrt{q} )}$. In the hyperelliptic case at least, one can erase this loss by working with a variant of the argument in which one requires ${f}$ to vanish to high order at ${C_1}$, rather than just to first order; see this survey article of mine for details.

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to ${{\bf F}_{p}^{\omega}}$-actions”. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if ${(X,{\mathcal X}, \mu,T)}$ is a measure-preserving ${{\bf Z}}$-system (which, in this post, means that ${(X,{\mathcal X}, \mu)}$ is a probability space and ${T: X \rightarrow X}$ is measure-preserving and invertible, thus giving an action ${(T^n)_{n \in {\bf Z}}}$ of the integers), and ${f,g \in L^2(X,{\mathcal X}, \mu)}$ are functions, and ${X}$ is ergodic (which means that ${L^2(X,{\mathcal X}, \mu)}$ contains no ${T}$-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x)\ d\mu \ \ \ \ \ (1)$

converges as ${N \rightarrow \infty}$ to the expression

$\displaystyle (\int_X f(x)\ d\mu) (\int_X g(x)\ d\mu);$

see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if ${x}$ is drawn at random from ${X}$ (using the probability measure ${\mu}$), and ${n}$ is drawn at random from ${\{1,\ldots,N\}}$ for some large ${N}$, then the pair ${(x, T^n x)}$ becomes uniformly distributed in the product space ${X \times X}$ (using product measure ${\mu \times \mu}$) in the limit as ${N \rightarrow \infty}$.
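The simplest ergodic example is the rotation ${T x = x + \alpha}$ on ${{\bf R}/{\bf Z}}$ with ${\alpha}$ irrational (our choice of example, not one discussed above). Writing ${e(\theta) := e^{2\pi i \theta}}$ and taking ${f(x) = e(kx)}$, ${g = \overline{f}}$ with ${k \neq 0}$, each integral in (1) equals ${e(-kn\alpha)}$ exactly, so the average is a geometric sum of size ${O(1/N)}$ and tends to ${0 = (\int_X f\ d\mu)(\int_X g\ d\mu)}$. A short Python sketch (a floating-point ${\alpha}$ stands in for an irrational one):

```python
import cmath
import math


def e(theta: float) -> complex:
    """e(theta) = exp(2*pi*i*theta), a character of R/Z."""
    return cmath.exp(2j * math.pi * theta)


def correlation_average(k: int, alpha: float, N: int) -> complex:
    """The average (1) for T x = x + alpha, f(x) = e(k x), g = conj(f).

    Each integral  int_0^1 e(k x) e(-k (x + n alpha)) dx  equals e(-k n alpha)
    exactly, so no numerical integration is needed."""
    return sum(e(-k * n * alpha) for n in range(1, N + 1)) / N


alpha = math.sqrt(2)  # stand-in for an irrational rotation number
for N in (10, 100, 1000, 10000):
    print(N, abs(correlation_average(1, alpha, N)))  # O(1/N)
```

With ${k=0}$ (so ${f=g=1}$) the average is identically ${1}$, again matching ${(\int_X f\ d\mu)(\int_X g\ d\mu)}$.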

If we allow ${(X,\mu)}$ to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let ${{\mathcal X}^T}$ be the ${T}$-invariant measurable sets in ${{\mathcal X}}$; the ${{\bf Z}}$-system ${(X, {\mathcal X}^T, \mu, T)}$ can then be viewed as a factor of the original system ${(X, {\mathcal X}, \mu, T)}$, which is equivalent (in the sense of measure-preserving systems) to a trivial system ${(Z_0, {\mathcal Z}_0, \mu_{Z_0}, 1)}$ (known as the invariant factor) in which the shift is trivial. There is then a projection map ${\pi_0: X \rightarrow Z_0}$ to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression

$\displaystyle \int_{Z_0} (\pi_0)_* f(z) (\pi_0)_* g(z)\ d\mu_{Z_0}(z), \ \ \ \ \ (2)$

where ${(\pi_0)_*: L^2(X,{\mathcal X},\mu) \rightarrow L^2(Z_0,{\mathcal Z}_0,\mu_{Z_0})}$ is the pushforward map associated to the map ${\pi_0: X \rightarrow Z_0}$; see e.g. this previous blog post. We can interpret this as an equidistribution result. If ${(x,T^n x)}$ is a pair as before, then we no longer expect complete equidistribution in ${X \times X}$ in the non-ergodic case, because there are now non-trivial constraints relating ${x}$ with ${T^n x}$; indeed, for any ${T}$-invariant function ${f: X \rightarrow {\bf C}}$, we have the constraint ${f(x) = f(T^n x)}$; putting all these constraints together we see that ${\pi_0(x) = \pi_0(T^n x)}$ (for almost every ${x}$, at least). The limit (2) can be viewed as an assertion that this constraint ${\pi_0(x) = \pi_0(T^n x)}$ is in some sense the “only” constraint between ${x}$ and ${T^n x}$, and that the pair ${(x,T^n x)}$ is uniformly distributed relative to this constraint.

Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x)\ d\mu \ \ \ \ \ (3)$

for three functions ${f,g,h \in L^\infty(X, {\mathcal X}, \mu)}$; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system ${(X,{\mathcal X},\mu,T)}$ to be ergodic. Naively, one might expect this average to converge to

$\displaystyle (\int_X f\ d\mu) (\int_X g\ d\mu) (\int_X h\ d\mu)$

which would roughly speaking correspond to an assertion that the triplet ${(x,T^n x, T^{2n} x)}$ is asymptotically equidistributed in ${X \times X \times X}$. However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs ${(x,T^n x)}$, ${(x, T^{2n} x)}$. The key obstruction here is that of eigenfunctions of the shift ${T: X \rightarrow X}$, that is to say non-trivial functions ${f: X \rightarrow S^1}$ that obey the eigenfunction equation ${Tf = \lambda f}$ almost everywhere for some constant (or ${T}$-invariant) ${\lambda}$. Each such eigenfunction generates a constraint

$\displaystyle f(x) \overline{f(T^n x)}^2 f(T^{2n} x) = 1 \ \ \ \ \ (4)$

tying together ${x}$, ${T^n x}$, and ${T^{2n} x}$. However, it turns out that these are in some sense the only constraints on ${x,T^n x, T^{2n} x}$ that are relevant for the limit (3). More precisely, if one sets ${{\mathcal X}_1}$ to be the sub-algebra of ${{\mathcal X}}$ generated by the eigenfunctions of ${T}$, then it turns out that the factor ${(X, {\mathcal X}_1, \mu, T)}$ is isomorphic to a shift system ${(Z_1, {\mathcal Z}_1, \mu_{Z_1}, x \mapsto x+\alpha)}$ known as the Kronecker factor, for some compact abelian group ${Z_1 = (Z_1,+)}$ and some (irrational) shift ${\alpha \in Z_1}$; the factor map ${\pi_1: X \rightarrow Z_1}$ pushes eigenfunctions forward to (affine) characters on ${Z_1}$. It is then known that the limit of (3) is

$\displaystyle \int_\Sigma (\pi_1)_* f(x_1) (\pi_1)_* g(x_2) (\pi_1)_* h(x_3)\ d\mu_\Sigma(x_1,x_2,x_3)$

where ${\Sigma \subset Z_1^3}$ is the closed subgroup

$\displaystyle \Sigma = \{ (x_1,x_2,x_3) \in Z_1^3: x_1-2x_2+x_3=0 \}$

and ${\mu_\Sigma}$ is the Haar probability measure on ${\Sigma}$; see this previous blog post. The equation ${x_1-2x_2+x_3=0}$ defining ${\Sigma}$ corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when ${f=g=h}$ is non-negative and not identically vanishing.
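To see the constraint (4) concretely, take again the rotation ${T x = x+\alpha}$ on ${{\bf R}/{\bf Z}}$ (an example of our choosing), for which ${f(x) = e(kx)}$, with ${e(\theta) := e^{2\pi i \theta}}$, is an eigenfunction with ${Tf = e(k\alpha) f}$. The triple ${(x, T^n x, T^{2n} x)}$ then satisfies ${x_1 - 2x_2 + x_3 = 0}$, and (4) holds identically. A Python spot check at random points (a sketch; floating-point rounding is the only reason the result is not exactly ${1}$):

```python
import cmath
import math
import random


def e(theta: float) -> complex:
    return cmath.exp(2j * math.pi * theta)


def constraint_4(x: float, n: int, alpha: float, k: int = 1) -> complex:
    """Left side of (4) for the eigenfunction f(y) = e(k y) of T y = y + alpha,
    evaluated at the triple (x, T^n x, T^{2n} x)."""
    def f(y: float) -> complex:
        return e(k * y)
    return f(x) * f(x + n * alpha).conjugate() ** 2 * f(x + 2 * n * alpha)


rng = random.Random(0)
alpha = math.sqrt(2)
worst = max(abs(constraint_4(rng.random(), rng.randrange(1, 1000), alpha,
                             rng.randrange(1, 6)) - 1) for _ in range(200))
print(worst)  # on the order of floating-point rounding error
```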

If one considers a quadruple average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x) k(T^{3n} x)\ d\mu \ \ \ \ \ (5)$

(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions ${f: X \rightarrow S^1}$, which obey an eigenfunction equation ${Tf = \lambda f}$ in which ${\lambda}$ is no longer constant, but is now a linear eigenfunction. For such functions, ${f(T^n x)}$ behaves quadratically in ${n}$, and one can verify the existence of a constraint

$\displaystyle f(x) \overline{f(T^n x)}^3 f(T^{2n} x)^3 \overline{f(T^{3n} x)} = 1 \ \ \ \ \ (6)$

between ${x}$, ${T^n x}$, ${T^{2n} x}$, and ${T^{3n} x}$ that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor ${(Z_2, {\mathcal Z}_2, \mu_{Z_2}, S)}$ which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.
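A standard example of a quadratic eigenfunction (not taken from the discussion above) is ${f(x,y) = e(y)}$ on the skew shift ${T(x,y) = (x+\alpha, y+x)}$ of the ${2}$-torus, where ${e(\theta) := e^{2\pi i \theta}}$. Here ${Tf = \lambda f}$ with ${\lambda(x,y) = e(x)}$, which is itself a linear eigenfunction, and ${f(T^n(x,y)) = e(y + nx + \binom{n}{2}\alpha)}$ is quadratic in ${n}$. The Python sketch below checks numerically that (6) holds for this ${f}$, whereas the linear constraint (4) fails (its left-hand side works out to ${e(n^2 \alpha)}$); the helper names are hypothetical.

```python
import cmath
import math


def e(theta: float) -> complex:
    return cmath.exp(2j * math.pi * theta)


def skew_iterate(x: float, y: float, alpha: float, n: int):
    """Apply T(x, y) = (x + alpha, y + x) n times on the 2-torus (R/Z)^2."""
    for _ in range(n):
        x, y = (x + alpha) % 1.0, (y + x) % 1.0  # y is updated with the old x
    return x, y


def f(x: float, y: float) -> complex:
    # f(T(x, y)) = e(y + x) = e(x) f(x, y): the "eigenvalue" e(x) is itself a
    # linear eigenfunction of T, so f is a quadratic eigenfunction.
    return e(y)


def lhs_6(x, y, alpha, n):
    """Left side of (6) at (x, y), T^n(x, y), T^{2n}(x, y), T^{3n}(x, y)."""
    v = [f(*skew_iterate(x, y, alpha, j * n)) for j in range(4)]
    return v[0] * v[1].conjugate() ** 3 * v[2] ** 3 * v[3].conjugate()


def lhs_4(x, y, alpha, n):
    """Left side of the linear constraint (4), applied to the same f."""
    v = [f(*skew_iterate(x, y, alpha, j * n)) for j in range(3)]
    return v[0] * v[1].conjugate() ** 2 * v[2]


alpha = math.sqrt(2) - 1
print(abs(lhs_6(0.3, 0.7, alpha, 5) - 1))  # rounding-error size: (6) holds
print(abs(lhs_4(0.3, 0.7, alpha, 5) - 1))  # far from 0: (4) fails for this f
```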

The above discussion was concerned with ${{\bf Z}}$-systems, but one can adapt much of the theory to measure-preserving ${G}$-systems for other discrete countable abelian groups ${G}$, in which one now has a family ${(T_g)_{g \in G}}$ of shifts indexed by ${G}$ rather than a single shift, obeying the compatibility relation ${T_{g+h}=T_g T_h}$. The role of the intervals ${\{1,\ldots,N\}}$ in this more general setting is replaced by that of Folner sequences. For arbitrary countable abelian ${G}$, the theory for double averages (1) and triple averages (3) is essentially identical to the ${{\bf Z}}$-system case. But when one turns to quadruple and higher averages, the situation becomes more complicated (and, for arbitrary ${G}$, still not fully understood). However one model case which is now well understood is the finite field case when ${G = {\bf F}_p^\omega = \bigcup_{n=1}^\infty {\bf F}_p^n}$ is an infinite-dimensional vector space over a finite field ${{\bf F}_p}$ (with the finite subspaces ${{\bf F}_p^n}$ then being a good choice for the Folner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if ${x}$ is drawn at random from a ${{\bf F}_p^\omega}$-system and ${n}$ drawn randomly from a large subspace of ${{\bf F}_p^\omega}$, then the only constraints between ${x, T^n x, \ldots, T^{(p-1)n} x}$ are those that arise from phase polynomials. The main result of this paper establishes this limit formula (which, again, is a little complicated to state explicitly and will not be done here).
In particular, we establish for the first time that the limit actually exists (a result which, for ${{\bf Z}}$-systems, was one of the main results of this paper of Host and Kra).

As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for ${{\bf Z}}$-systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set ${A}$ in an ergodic ${{\bf F}_p^\omega}$-system and any ${\epsilon>0}$, one has

$\displaystyle \mu( T_{c_1 n} A \cap \ldots \cap T_{c_k n} A ) > \mu(A)^k - \epsilon$

for a syndetic set of ${n}$, where ${c_1,\ldots,c_k \in {\bf F}_p}$ are distinct residue classes. It turns out that Khintchine-type theorems always hold for ${k=1,2,3}$ (and for ${k=1,2}$ ergodicity is not required), and for ${k=4}$ it holds whenever ${c_1,c_2,c_3,c_4}$ form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger ${k}$ we could show that the Khintchine property failed for generic choices of ${c_1,\ldots,c_k}$, though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.

Much as group theory is the study of groups, or graph theory is the study of graphs, model theory is the study of models (also known as structures) of some language ${{\mathcal L}}$ (which, in this post, will always be a single-sorted, first-order language). A structure is a set ${X}$, equipped with one or more operations, constants, and relations. This is of course an extremely general type of mathematical object, but (quite remarkably) one can still say a substantial number of interesting things about very broad classes of structures.

We will observe the common abuse of notation of using the set ${X}$ as a metonym for the entire structure, much as we usually refer to a group ${(G,1,\cdot,()^{-1})}$ simply as ${G}$, a vector space ${(V, 0, +, \cdot)}$ simply as ${V}$, and so forth. Following another common bending of the rules, we also allow some operations on structures (such as the multiplicative inverse operation on a group or field) to only be partially defined, and we allow use of the usual simplifying conventions for mathematical formulas (e.g. writing ${a+b+c}$ instead of ${(a+b)+c}$ or ${a+(b+c)}$, in cases where associativity is known). We will also deviate slightly from the usual practice in logic by emphasising individual structures, rather than the theory of general classes of structures; for instance, we will talk about the theory of a single field such as ${{\bf R}}$ or ${{\bf C}}$, rather than the theory of all fields of a certain type (e.g. real closed fields or algebraically closed fields).

Once one has a structure ${X}$, one can introduce the notion of a definable subset of ${X}$, or more generally of a Cartesian power ${X^n}$ of ${X}$, defined as a set ${E \subset X^n}$ of the form

$\displaystyle E = \{ (x_1,\ldots,x_n): P(x_1,\ldots,x_n) \hbox{ true} \} \ \ \ \ \ (1)$

for some formula ${P}$ in the language ${{\mathcal L}}$ with ${n}$ free variables and any number of constants from ${X}$ (that is, ${P(x_1,\ldots,x_n)}$ is a well-formed formula built up from a finite number of constants ${c_1,\ldots,c_m}$ in ${X}$, the relations and operations on ${X}$, logical connectives such as ${\neg}$, ${\wedge}$, ${\implies}$, and the quantifiers ${\forall, \exists}$). Thus, for instance, in the theory of the arithmetic of the natural numbers ${{\bf N} = ({\bf N}, 0, 1, +, \times)}$, the set of primes ${{\mathcal P}}$ is a definable set, since we have

$\displaystyle {\mathcal P} = \{ x \in {\bf N}: (\exists y: x=y+2) \wedge \neg (\exists z,w: x = (z+2)(w+2)) \}.$
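Since any witnesses ${y, z, w}$ in the above formula are at most ${x}$, the quantifiers can be decided by bounded search, and one can check the formula mechanically. A small Python sketch (our own illustration; the function name is hypothetical):

```python
def is_prime_by_formula(x: int) -> bool:
    """Evaluate (exists y: x = y+2) and not (exists z, w: x = (z+2)(w+2)) in N.

    Any witnesses satisfy y, z, w <= x, so bounded searches over range(x + 1)
    decide the unbounded quantifiers exactly."""
    exists_y = any(x == y + 2 for y in range(x + 1))
    exists_zw = any(x == (z + 2) * (w + 2)
                    for z in range(x + 1) for w in range(x + 1))
    return exists_y and not exists_zw


print([x for x in range(30) if is_prime_by_formula(x)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```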

In the theory of the field of reals ${{\bf R} = ({\bf R}, 0, 1, +, -, \times, ()^{-1})}$, the unit circle ${S^1}$ is an example of a definable set,

$\displaystyle S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2 = 1 \},$

but so is the complement of the circle,

$\displaystyle {\bf R}^2 \backslash S^1 = \{ (x,y) \in {\bf R}^2: \neg(x^2+y^2 = 1) \}$

and the interval ${[-1,1]}$:

$\displaystyle [-1,1] = \{ x \in {\bf R}: \exists y: x^2+y^2 = 1\}.$

Due to the unlimited use of constants, any finite subset of a power ${X^n}$ of any structure ${X}$ is, by our conventions, definable in that structure. (One can of course also consider definability without parameters (also known as ${0}$-definability), in which arbitrary constants are not permitted, but we will not do so here.)

We can isolate some special subclasses of definable sets:

• An atomic definable set is a set of the form (1) in which ${P()}$ is an atomic formula (i.e. it does not contain any logical connectives or quantifiers).
• A quantifier-free definable set is a set of the form (1) in which ${P()}$ is quantifier-free (i.e. it can contain logical connectives, but does not contain the quantifiers ${\forall, \exists}$).

Example 1 In the theory of a field such as ${{\bf R}}$, an atomic definable set is the same thing as an affine algebraic set (also known as an affine algebraic variety, with the understanding that varieties are not necessarily assumed to be irreducible), and a quantifier-free definable set is known as a constructible set; thus we see that algebraic geometry can be viewed in some sense as a special case of model theory. (Conversely, it can in fact be quite profitable to think of model theory as an abstraction of algebraic geometry; for instance, the concepts of Morley rank and Morley degree in model theory (discussed in this previous blog post) directly generalise the concepts of dimension and degree in algebraic geometry.) Over ${{\bf R}}$, the interval ${[-1,1]}$ is a definable set, but not a quantifier-free definable set (and certainly not an atomic definable set); and similarly for the primes over ${{\bf N}}$.

A quantifier-free definable set in ${X^n}$ is nothing more than a finite boolean combination of atomic definable sets; in other words, the class of quantifier-free definable sets over ${X}$ is the smallest class that contains the atomic definable sets and is closed under boolean operations such as complementation and union (which generate all the other boolean operations). Similarly, the class of definable sets over ${X}$ is the smallest class that contains the quantifier-free definable sets, and is also closed under the operation of projection ${\pi_n: E \mapsto \pi_n(E)}$ from ${X^{n+1}}$ to ${X^n}$ for every natural number ${n}$, where ${\pi_n: X^{n+1} \rightarrow X^n}$ is the map ${\pi_n(x_1,\ldots,x_n,x_{n+1}) := (x_1,\ldots,x_n)}$.
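In a finite structure every set can be listed explicitly, and the correspondence between an existential quantifier and the projection ${\pi_n}$ becomes literal. A minimal Python sketch over the prime field ${{\bf F}_7}$ (a finite structure chosen purely for illustration; the helper names are hypothetical), showing that projecting the atomic set ${\{(x,y): xy = 1\}}$ gives the quantifier-free definable set ${\{x: \neg(x=0)\}}$:

```python
from itertools import product


def definable_set(p, arity, predicate):
    """All tuples in F_p^arity (p prime) satisfying a given predicate."""
    return {pt for pt in product(range(p), repeat=arity) if predicate(*pt)}


def project(E, n):
    """The projection pi_n: X^{n+1} -> X^n, applied to a set E of (n+1)-tuples."""
    return {pt[:n] for pt in E}


p = 7
hyperbola = definable_set(p, 2, lambda x, y: (x * y) % p == 1)  # atomic: xy = 1
shadow = project(hyperbola, 1)                                  # {x : exists y, xy = 1}
nonzero = definable_set(p, 1, lambda x: not (x == 0))           # {x : not(x = 0)}
print(shadow == nonzero)  # True: every nonzero x has an inverse mod 7
```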

Some structures have the property of enjoying quantifier elimination, which means that every definable set is in fact a quantifier-free definable set, or equivalently that the projection of a quantifier-free definable set is again quantifier-free. For instance, an algebraically closed field ${k}$ (with the field operations) has quantifier elimination (i.e. the projection of a constructible set is again constructible); this fact can be proven by the classical tool of resultants, and among other things can be used to give a proof of Hilbert’s nullstellensatz. (Note though that projection does not necessarily preserve the property of being atomic; for instance, the projection of the atomic set ${\{ (x,y) \in k^2: xy=1 \}}$ is the non-atomic, but still quantifier-free definable, set ${\{ x \in k: \neg (x=0) \}}$.) In the converse direction, it is not difficult to use the nullstellensatz to deduce quantifier elimination. For the theory of the real field ${{\bf R}}$, which is not algebraically closed, one does not have quantifier elimination, as one can see from the example of the unit circle (which is a quantifier-free definable set) projecting down to the interval ${[-1,1]}$ (which is definable, but not quantifier-free definable). However, if one adds the additional order relation ${<}$ to the reals, giving it the language of an ordered field rather than just a field, then quantifier elimination is recovered (the class of quantifier-free definable sets now enlarges to match the class of definable sets, which in this case is also the class of semi-algebraic sets); this is the famous Tarski-Seidenberg theorem.

On the other hand, many important structures do not have quantifier elimination; typically, the projection of a quantifier-free definable set is not, in general, quantifier-free definable. This failure of the projection property also shows up in many contexts outside of model theory; for instance, Lebesgue famously made the error of thinking that the projection of a Borel measurable set remained Borel measurable (it is merely an analytic set instead). Turing’s halting theorem can be viewed as an assertion that the projection of a decidable set (also known as a computable or recursive set) is not necessarily decidable (it is merely semi-decidable (or recursively enumerable) instead). The notorious P=NP problem can also be essentially viewed in this spirit; roughly speaking (and glossing over the placement of some quantifiers), it asks whether the projection of a polynomial-time decidable set is again polynomial-time decidable. And so forth. (See this blog post of Dick Lipton for further discussion of the subtleties of projections.)

Now we consider the status of quantifier elimination for the theory of a finite field ${F}$. If interpreted naively, quantifier elimination is trivial for a finite field ${F}$, since every subset of ${F^n}$ is finite and thus quantifier-free definable. However, we can recover an interesting question in one of two (essentially equivalent) ways. One is to work in the asymptotic regime in which the field ${F}$ is large, but the length of the formulae used to construct one’s definable sets stays bounded uniformly in the size of ${F}$ (where we view any constant in ${F}$ as contributing a unit amount to the length of a formula, no matter how large ${F}$ is). A simple counting argument then shows that only a small number of subsets of ${F^n}$ become definable in the asymptotic limit ${|F| \rightarrow \infty}$, since the number of definable sets clearly grows at most polynomially in ${|F|}$ for any fixed bound on the formula length, while the number of all subsets of ${F^n}$ is ${2^{|F|^n}}$, which grows exponentially in ${|F|^n}$.

Another way to proceed is to work not with a single finite field ${F}$, or even with a sequence ${F_m}$ of finite fields, but with the ultraproduct ${F = \prod_{m \rightarrow p} F_m}$ of a sequence of finite fields, and to study the properties of definable sets over this ultraproduct. (We will be using the notation of ultraproducts and nonstandard analysis from this previous blog post.) This approach is equivalent to the more finitary approach mentioned in the previous paragraph, at least if one does not care to keep track of the exact bounds on the length of the formulae involved. Indeed, thanks to Los’s theorem, a definable subset ${E}$ of ${F^n}$ is nothing more than the ultraproduct ${E = \prod_{m \rightarrow p} E_m}$ of definable subsets ${E_m}$ of ${F_m^n}$ for all ${m}$ sufficiently close to ${p}$, with the length of the formulae used to define ${E_m}$ uniformly bounded in ${m}$. In the language of nonstandard analysis, one can view ${F}$ as a nonstandard finite field.

The ultraproduct ${F}$ of finite fields is an important example of a pseudo-finite field – a field that satisfies every sentence in the language of fields that holds in all finite fields, but is not necessarily itself a finite field. The model theory of pseudo-finite fields was first studied systematically by Ax (in the same paper where the Ax-Grothendieck theorem, discussed previously on this blog, was established), with important further contributions by Kiefe, by Fried-Sacerdote, by two papers of Chatzidakis-van den Dries-Macintyre, and by many other authors.

As mentioned before, quantifier elimination trivially holds for finite fields. But for infinite pseudo-finite fields, such as the ultraproduct ${F = \prod_{m \rightarrow p} F_m}$ of finite fields with ${|F_m|}$ going to infinity, quantifier elimination fails. For instance, in a finite field ${F_m}$, the set ${E_m := \{ x \in F_m: \exists y \in F_m: x=y^2 \}}$ of quadratic residues is a definable set, with a bounded formula length, and so in the ultraproduct ${F =\prod_{m \rightarrow p} F_m}$, the set ${E := \prod_{m\rightarrow p} E_m}$ of nonstandard quadratic residues is also a definable set. However, in one dimension, we see from the factor theorem that the only atomic definable sets are either finite or the whole field ${F}$, and so the only constructible sets (i.e. the only quantifier-free definable sets) are either finite or cofinite in ${F}$. Since the quadratic residues have asymptotic density ${1/2}$ in a large finite field, they cannot form a quantifier-free definable set, despite being definable.
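The density claim is easy to check numerically in prime fields ${{\bf F}_p}$ (a special case we pick for convenience): for odd ${p}$ the squaring map is two-to-one on ${{\bf F}_p \backslash \{0\}}$, so ${E_p}$ has exactly ${(p+1)/2}$ elements (counting ${0}$), and neither ${E_p}$ nor its complement stays bounded in size as ${p \rightarrow \infty}$. A Python sketch (the function name is hypothetical):

```python
def quadratic_residues(p: int) -> set:
    """E_p = {x in F_p : exists y, x = y^2}, i.e. the image of y -> y^2 (p prime)."""
    return {(y * y) % p for y in range(p)}


for p in (3, 5, 7, 11, 101, 1009):
    E = quadratic_residues(p)
    print(p, len(E), len(E) / p)  # len(E) == (p + 1) // 2, density -> 1/2
```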

Nevertheless, there is a very nice almost quantifier elimination result for these fields, in characteristic zero at least, which we phrase here as follows:

Theorem 1 (Almost quantifier elimination) Let ${F}$ be a nonstandard finite field of characteristic zero, and let ${E \subset F^n}$ be a definable set over ${F}$. Then ${E}$ is the union of finitely many sets of the form

$\displaystyle E = \{ x \in V(F): \exists t \in F: P(x,t) = 0 \} \ \ \ \ \ (2)$

where ${V(F)}$ is an atomic definable subset of ${F^n}$ (i.e. the ${F}$-points of an algebraic variety ${V}$ defined over ${F}$ in ${F^n}$) and ${P: F^{n+1} \rightarrow F}$ is a polynomial.

Results of this type were essentially first obtained by Catarina Kiefe, although the formulation here is closer to that of Chatzidakis-van den Dries-Macintyre.

Informally, this theorem says that while we cannot quite eliminate all quantifiers from a definable set over a nonstandard finite field, we can eliminate all but one existential quantifier. Note that negation has also been eliminated in this theorem; for instance, the definable set ${F \backslash \{0\} = \{ x \in F: \neg(x=0) \}}$ uses a negation, but can also be described using a single existential quantifier as ${\{ x \in F: \exists t: xt = 1 \}}$. I believe that there are more complicated analogues of this result in positive characteristic, but I have not studied this case in detail (Kiefe’s result does not assume characteristic zero, but her conclusion is slightly different from the one given here). In the one-dimensional case ${n=1}$, the only varieties ${V}$ are the affine line and finite sets, and the above statement simplifies to the assertion that any definable subset of ${F}$ takes the form ${\{ x\in F: \exists t \in F: P(x,t) = 0 \}}$ for some polynomial ${P}$ (i.e. definable sets in ${F}$ are nothing more than the projections of the ${F}$-points of a plane curve).

There is an equivalent formulation of this theorem for standard finite fields, namely that if ${F}$ is a finite field and ${E \subset F^n}$ is definable using a formula of length at most ${M}$, then ${E}$ can be expressed in the form (2) with the degree of ${P}$ bounded by some quantity ${C_{M,n}}$ depending on ${M}$ and ${n}$, assuming that the characteristic of ${F}$ is sufficiently large depending on ${M, n}$.

The theorem gives quite a satisfactory description of definable sets in either standard or nonstandard finite fields (at least if one does not care about effective bounds in some of the constants, and if one is willing to exclude the small characteristic case); for instance, in conjunction with the Lang-Weil bound discussed in this recent blog post, it shows that any non-empty definable subset of ${F^n}$ for a nonstandard finite field ${F}$ has a nonstandard cardinality of ${(\alpha + O(|F|^{-1/2})) |F|^d}$ for some positive standard rational ${\alpha}$ and integer ${d}$. Equivalently, any non-empty definable subset of ${F^n}$ for some standard finite field ${F}$ using a formula of length at most ${M}$ has a standard cardinality of ${(\alpha + O_{M,n}(|F|^{-1/2})) |F|^d}$ for some positive rational ${\alpha}$ of height ${O_{M,n}(1)}$ and some natural number ${d}$ between ${0}$ and ${n}$. (For instance, in the example of the quadratic residues given above, ${d}$ is equal to ${1}$ and ${\alpha}$ equal to ${1/2}$.) There is a more precise statement to this effect, namely that the Poincaré series of a definable set is rational; see Kiefe’s paper for details.
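As a one-dimensional illustration (our own example, over prime fields ${{\bf F}_p}$ only), take ${P(x,t) = x - t^3}$, so that the definable set ${\{ x: \exists t: P(x,t) = 0\}}$ is the set of cubes. For ${p \equiv 2 \hbox{ mod } 3}$ cubing is a bijection and the set has ${p}$ elements, while for ${p \equiv 1 \hbox{ mod } 3}$ cubing is three-to-one on ${{\bf F}_p \backslash \{0\}}$ and the set has ${(p+2)/3}$ elements; thus ${d=1}$ and ${\alpha \in \{1/3, 1\}}$, a rational of bounded height that depends on the field. A brute-force Python sketch (helper names are hypothetical):

```python
def projection_size(p: int, P) -> int:
    """|{x in F_p : exists t in F_p, P(x, t) = 0}| by brute force (p prime)."""
    return sum(1 for x in range(p) if any(P(x, t) % p == 0 for t in range(p)))


def cube_relation(x: int, t: int) -> int:
    # P(x, t) = x - t^3, so the projection is the set of cubes in F_p
    return x - t ** 3


for p in (5, 7, 11, 13, 101, 103):
    size = projection_size(p, cube_relation)
    print(p, p % 3, size, size / p)  # ratio ~1 if p % 3 == 2, ~1/3 if p % 3 == 1
```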

Below the fold I give a proof of Theorem 1, which relies primarily on the Lang-Weil bound mentioned above.

Let ${F}$ be a finite field, with algebraic closure ${\overline{F}}$, and let ${V}$ be an (affine) algebraic variety defined over ${\overline{F}}$, by which I mean a set of the form

$\displaystyle V = \{ x \in \overline{F}^d: P_1(x) = \ldots = P_m(x) = 0 \}$

for some ambient dimension ${d \geq 0}$, and some finite number of polynomials ${P_1,\ldots,P_m: \overline{F}^d \rightarrow \overline{F}}$. In order to reduce the number of subscripts later on, let us say that ${V}$ has complexity at most ${M}$ if ${d}$, ${m}$, and the degrees of the ${P_1,\ldots,P_m}$ are all less than or equal to ${M}$. Note that we do not require at this stage that ${V}$ be irreducible (i.e. not the union of two strictly smaller varieties), or defined over ${F}$, though we will often specialise to these cases later in this post. (Also, everything said here can also be applied with almost no changes to projective varieties, but we will stick with affine varieties for sake of concreteness.)

One can consider two crude measures of how “big” the variety ${V}$ is. The first measure, which is algebraic geometric in nature, is the dimension ${\hbox{dim}(V)}$ of the variety ${V}$, which is an integer between ${0}$ and ${d}$ (or, depending on convention, ${-\infty}$, ${-1}$, or undefined, if ${V}$ is empty) that can be defined in a large number of ways (e.g. it is the largest ${r}$ for which the generic linear projection from ${V}$ to ${\overline{F}^r}$ is dominant, or the largest ${r}$ for which the intersection with a generic codimension ${r}$ subspace is non-empty). The second measure, which is number-theoretic in nature, is the number ${|V(F)| = |V \cap F^d|}$ of ${F}$-points of ${V}$, i.e. points ${x = (x_1,\ldots,x_d)}$ in ${V}$ all of whose coordinates lie in the finite field ${F}$, or equivalently the number of solutions to the system of equations ${P_i(x_1,\ldots,x_d) = 0}$ for ${i=1,\ldots,m}$ with variables ${x_1,\ldots,x_d}$ in ${F}$.

These two measures are linked together in a number of ways. For instance, we have the basic Schwartz-Zippel type bound (which, in this qualitative form, goes back at least to Lemma 1 of the work of Lang and Weil in 1954).

Lemma 1 (Schwartz-Zippel type bound) Let ${V}$ be a variety of complexity at most ${M}$. Then we have ${|V(F)| \ll_M |F|^{\hbox{dim}(V)}}$.

Proof: (Sketch) For the purposes of exposition, we will not carefully track the dependencies of implied constants on the complexity ${M}$, instead simply assuming that all of these quantities remain controlled throughout the argument. (If one wished, one could obtain ineffective bounds on these quantities by an ultralimit argument, as discussed in this previous post, or equivalently by moving everything over to a nonstandard analysis framework; one could also obtain such uniformity using the machinery of schemes.)

We argue by induction on the ambient dimension ${d}$ of the variety ${V}$. The ${d=0}$ case is trivial, so suppose ${d \geq 1}$ and that the claim has already been proven for ${d-1}$. By breaking up ${V}$ into irreducible components we may assume that ${V}$ is irreducible (this requires some control on the number and complexity of these components, but this is available, as discussed in this previous post). For each ${x_1,\ldots,x_{d-1} \in \overline{F}}$, the fibre ${\{ x_d \in \overline{F}: (x_1,\ldots,x_{d-1},x_d) \in V \}}$ is either one-dimensional (and thus all of ${\overline{F}}$) or zero-dimensional. In the latter case, one has ${O_M(1)}$ points in the fibre from the fundamental theorem of algebra (indeed one has a bound of ${M}$ in this case), and ${(x_1,\ldots,x_{d-1})}$ lies in the Zariski closure of the projection of ${V}$ to ${\overline{F}^{d-1}}$, which is a variety of dimension at most ${\hbox{dim}(V)}$ and controlled complexity, so the contribution of this case is acceptable from the induction hypothesis. In the former case, the fibre contributes ${|F|}$ ${F}$-points, but ${(x_1,\ldots,x_{d-1})}$ lies in a variety in ${\overline{F}^{d-1}}$ of dimension at most ${\hbox{dim}(V)-1}$ (since otherwise ${V}$ would contain a subvariety of dimension at least ${\hbox{dim}(V)+1}$, which is absurd) and controlled complexity, and so the contribution of this case is also acceptable from the induction hypothesis. $\Box$
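Schematically, the two cases of the induction combine as follows. Here ${V_0 \subset \overline{F}^{d-1}}$ (my notation, not used elsewhere in the post) denotes the Zariski closure of the projection of ${V}$, of dimension at most ${\hbox{dim}(V)}$, and ${V_1 \subset \overline{F}^{d-1}}$ denotes the set of ${(x_1,\ldots,x_{d-1})}$ whose fibre is all of ${\overline{F}}$, which is cut out by the coefficients (in ${x_d}$) of the defining polynomials and has dimension at most ${\hbox{dim}(V)-1}$. The factor ${M}$ bounds the number of ${F}$-points in a zero-dimensional fibre, since some defining polynomial restricts there to a non-zero polynomial in ${x_d}$ of degree at most ${M}$:

$\displaystyle |V(F)| \leq M |V_0(F)| + |F| |V_1(F)| \ll_M |F|^{\hbox{dim}(V)} + |F| \cdot |F|^{\hbox{dim}(V)-1} \ll_M |F|^{\hbox{dim}(V)},$

where the bounds on ${|V_0(F)|}$ and ${|V_1(F)|}$ both come from the induction hypothesis in ambient dimension ${d-1}$.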

One can improve the bound on the implied constant to be linear in the degree of ${V}$ (see e.g. Claim 7.2 of this paper of Dvir, Kollar, and Lovett, or Lemma A.3 of this paper of Ellenberg, Oberlin, and myself), but we will not be concerned with these improvements here.

Without further hypotheses on ${V}$, the above upper bound is sharp (except for improvements in the implied constants). For instance, the variety

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: \prod_{j=1}^D (x_d - a_j) = 0\},$

where ${a_1,\ldots,a_D \in F}$ are distinct, is the union of ${D}$ distinct hyperplanes of dimension ${d-1}$, with ${|V(F)| = D |F|^{d-1}}$ and complexity ${\max(D,d)}$; similar examples can easily be concocted for other choices of ${\hbox{dim}(V)}$. In the other direction, there is also no non-trivial lower bound for ${|V(F)|}$ without further hypotheses on ${V}$. For a trivial example, if ${a}$ is an element of ${\overline{F}}$ that does not lie in ${F}$, then the hyperplane

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d - a = 0 \}$

clearly has no ${F}$-points whatsoever, despite being a ${(d-1)}$-dimensional variety in ${\overline{F}^d}$ of complexity ${d}$. For a slightly less trivial example, if ${a}$ is an element of ${F}$ that is not a quadratic residue, then the variety

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d^2 - a = 0 \},$

which is the union of two hyperplanes, still has no ${F}$-points, even though this time the variety is defined over ${F}$ instead of ${\overline{F}}$ (by which we mean that the defining polynomial(s) have all of their coefficients in ${F}$). There is however the important Lang-Weil bound that allows for a much better estimate as long as ${V}$ is both defined over ${F}$ and irreducible:

Theorem 2 (Lang-Weil bound) Let ${V}$ be a variety of complexity at most ${M}$. Assume that ${V}$ is defined over ${F}$, and that ${V}$ is irreducible as a variety over ${\overline{F}}$ (i.e. ${V}$ is geometrically irreducible or absolutely irreducible). Then

$\displaystyle |V(F)| = (1 + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}.$

Again, more explicit bounds on the implied constant here are known, but will not be the focus of this post. As the previous examples show, the hypotheses of definability over ${F}$ and geometric irreducibility are both necessary.
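Two of the examples preceding Theorem 2 (the union of ${D}$ parallel hyperplanes, and ${x_d^2 - a = 0}$ with ${a}$ a non-residue) can be checked by brute force over a small prime field ${{\bf F}_p}$. The sketch below uses hypothetical helper names and arbitrary choices ${p=11}$, ${d=3}$, ${a_j \in \{1,2,5\}}$; the hyperplane example with ${a \notin F}$ needs arithmetic in an extension field and is omitted.

```python
import itertools


def count_points(polys, p, d):
    """Brute-force |V(F_p)| for V = {x in F_p^d : P(x) = 0 mod p for all P in polys}."""
    return sum(
        1
        for x in itertools.product(range(p), repeat=d)
        if all(P(x) % p == 0 for P in polys)
    )


def is_square(a, p):
    """Euler's criterion: True iff a (non-zero mod the odd prime p) is a quadratic residue."""
    return pow(a, (p - 1) // 2, p) == 1


p, d = 11, 3
a_list = [1, 2, 5]  # distinct elements of F_p (hypothetical choice, so D = 3)


def hyperplanes(x):
    """prod_j (x_d - a_j): vanishes exactly on D parallel hyperplanes."""
    value = 1
    for a in a_list:
        value *= x[-1] - a
    return value


nonresidue = next(a for a in range(2, p) if not is_square(a, p))

print(count_points([hyperplanes], p, d))  # 363 = D * p^(d-1): Lemma 1 is sharp here
print(count_points([lambda x: x[-1] ** 2 - nonresidue], p, d))  # 0 F-points
```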

The Lang-Weil bound is already non-trivial in the model case ${d=2, \hbox{dim}(V)=1}$ of plane curves:

Theorem 3 (Hasse-Weil bound) Let ${P: \overline{F}^2 \rightarrow \overline{F}}$ be an irreducible polynomial of degree ${D}$ with coefficients in ${F}$. Then

$\displaystyle |\{ (x,y) \in F^2: P(x,y) = 0 \}| = |F| + O_D( |F|^{1/2} ).$

Thus, for instance, if ${a,b \in F}$, then the elliptic curve ${\{ (x,y) \in F^2: y^2 = x^3 + ax + b \}}$ has ${|F| + O(|F|^{1/2})}$ ${F}$-points, a result first established by Hasse. The Hasse-Weil bound is already quite non-trivial, being the analogue of the Riemann hypothesis for plane curves. For hyper-elliptic curves, an elementary proof (due to Stepanov) is discussed in this previous post. For general plane curves, the first proof was by Weil (leading to his famous Weil conjectures); there is also a nice version of Stepanov’s argument due to Bombieri covering this case which is a little less elementary (relying crucially on the Riemann-Roch theorem for the upper bound, and a lifting trick to then get the lower bound), which I briefly summarise later in this post. The full Lang-Weil bound is deduced from the Hasse-Weil bound by an induction argument using generic hyperplane slicing, as I will also summarise later in this post.
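Here is a brute-force illustration of the elliptic curve case over a prime field ${{\bf F}_p}$ with ${p > 3}$ (my choice of setting; helper names are hypothetical). By Hasse's theorem the projective curve has ${p+1-t}$ points with ${|t| \leq 2\sqrt{p}}$, and the affine model omits only the point at infinity, so the affine count ${N}$ should satisfy ${|N - p| \leq 2\sqrt{p}}$ for every non-singular curve; that is, the ratio printed last should be at most ${1}$.

```python
import math


def affine_point_count(a, b, p):
    """Number of (x, y) in F_p^2 with y^2 = x^3 + a*x + b (p an odd prime)."""
    sqrt_count = [0] * p  # sqrt_count[c] = #{y in F_p : y^2 = c}
    for y in range(p):
        sqrt_count[y * y % p] += 1
    return sum(sqrt_count[(x ** 3 + a * x + b) % p] for x in range(p))


def worst_hasse_ratio(p):
    """max over non-singular curves y^2 = x^3 + a x + b of |N - p| / (2 sqrt(p))."""
    worst = 0.0
    for a in range(p):
        for b in range(p):
            if (4 * a ** 3 + 27 * b ** 2) % p == 0:
                continue  # singular cubic: skip
            n = affine_point_count(a, b, p)
            worst = max(worst, abs(n - p) / (2 * math.sqrt(p)))
    return worst


print(affine_point_count(0, 1, 5))  # 5 affine points on y^2 = x^3 + 1 over F_5
print(worst_hasse_ratio(53))        # at most 1 by Hasse's bound
```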

The hypotheses of definability over ${F}$ and geometric irreducibility in the Lang-Weil bound can be removed after inserting a geometric factor:

Corollary 4 (Lang-Weil bound, alternate form) Let ${V}$ be a variety of complexity at most ${M}$. Then one has

$\displaystyle |V(F)| = (c(V) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}$

where ${c(V)}$ is the number of top-dimensional components of ${V}$ (i.e. geometrically irreducible components of ${V}$ of dimension ${\hbox{dim}(V)}$) that are definable over ${F}$, or equivalently are invariant with respect to the Frobenius endomorphism ${x \mapsto x^{|F|}}$ that defines ${F}$.

Proof: By breaking up a general variety ${V}$ into components (and using Lemma 1 to dispose of any lower-dimensional components), it suffices to establish this claim when ${V}$ is itself geometrically irreducible. If ${V}$ is definable over ${F}$, the claim follows from Theorem 2. If ${V}$ is not definable over ${F}$, then it is not fixed by the Frobenius endomorphism ${Frob}$ (since otherwise one could produce a set of defining polynomials that were fixed by Frobenius and thus defined over ${F}$ by using some canonical basis (such as a reduced Grobner basis) for the associated ideal), and so ${V \cap Frob(V)}$ has strictly smaller dimension than ${V}$. But ${V \cap Frob(V)}$ captures all the ${F}$-points of ${V}$, so in this case the claim follows from Lemma 1. $\Box$

Note that if ${V}$ is reducible but is itself defined over ${F}$, then the Frobenius endomorphism preserves ${V}$ itself, but may permute the components of ${V}$ around. In this case, ${c(V)}$ is the number of fixed points of this permutation action of Frobenius on the components. In particular, ${c(V)}$ is always a natural number between ${0}$ and ${O_M(1)}$; thus we see that regardless of the geometry of ${V}$, the normalised count ${|V(F)|/|F|^{\hbox{dim}(V)}}$ is asymptotically restricted to a bounded range of natural numbers (in the regime where the complexity stays bounded and ${|F|}$ goes to infinity).

Example 1 Consider the variety

$\displaystyle V := \{ (x,y) \in \overline{F}^2: x^2 - ay^2 = 0 \}$

for some non-zero parameter ${a \in F}$. Geometrically (by which we basically mean “when viewed over the algebraically closed field ${\overline{F}}$“), this is the union of two lines, with slopes corresponding to the two square roots of ${a}$. If ${a}$ is a quadratic residue, then both of these lines are defined over ${F}$, and are fixed by Frobenius, and ${c(V) = 2}$ in this case. If ${a}$ is not a quadratic residue, then the lines are not defined over ${F}$, and the Frobenius automorphism permutes the two lines while preserving ${V}$ as a whole, giving ${c(V)=0}$ in this case.
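Example 1 is easy to check numerically over ${{\bf F}_{101}}$ (a prime of my choosing; the helper name is hypothetical). With ${a = 4}$, a residue, one gets ${2p-1}$ points (two lines through the origin), while with ${a = 2}$, a non-residue because ${101 = 5 \hbox{ mod } 8}$, only the origin survives. So ${|V(F)|/|F|}$ is close to ${c(V) = 2}$ or ${c(V) = 0}$ respectively.

```python
def count_line_pair(a, p):
    """|V(F_p)| for V = {(x, y) : x^2 - a*y^2 = 0}, with a non-zero mod the odd prime p."""
    return sum(
        1 for x in range(p) for y in range(p) if (x * x - a * y * y) % p == 0
    )


p = 101
for a in (4, 2):  # 4 = 2^2 is a residue; 2 is a non-residue since 101 = 5 mod 8
    n = count_line_pair(a, p)
    print(a, n, n / p)  # a=4: 201 points (ratio ~2); a=2: 1 point (ratio ~0)
```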

Corollary 4 effectively computes (at least to leading order) the number-theoretic size ${|V(F)|}$ of a variety in terms of geometric information about ${V}$, namely its dimension ${\hbox{dim}(V)}$ and the number ${c(V)}$ of top-dimensional components fixed by Frobenius. It turns out that with a little bit more effort, one can extend this connection to cover not just a single variety ${V}$, but a family of varieties indexed by points in some base space ${W}$. More precisely, suppose we now have two affine varieties ${V,W}$ of bounded complexity, together with a regular map ${\phi: V \rightarrow W}$ of bounded complexity (the definition of complexity of a regular map is a bit technical, see e.g. this paper, but one can think for instance of a polynomial or rational map of bounded degree as a good example). It will be convenient to assume that the base space ${W}$ is irreducible. If the map ${\phi}$ is a dominant map (i.e. the image ${\phi(V)}$ is Zariski dense in ${W}$), then standard algebraic geometry results tell us that the fibres ${\phi^{-1}(\{w\})}$ are an unramified family of ${\hbox{dim}(V)-\hbox{dim}(W)}$-dimensional varieties outside of an exceptional subset ${W'}$ of ${W}$ of dimension strictly smaller than ${\hbox{dim}(W)}$ (and with ${\phi^{-1}(W')}$ having dimension strictly smaller than ${\hbox{dim}(V)}$); see e.g. Section I.6.3 of Shafarevich.

Now suppose that ${V}$, ${W}$, and ${\phi}$ are defined over ${F}$. Then, by Lang-Weil, ${W}$ has ${(1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(W)}}$ ${F}$-points, and by Schwartz-Zippel, for all but ${O( |F|^{\hbox{dim}(W)-1})}$ of these ${F}$-points ${w}$ (the ones that lie in the subvariety ${W'}$), the fibre ${\phi^{-1}(\{w\})}$ is an algebraic variety defined over ${F}$ of dimension ${\hbox{dim}(V)-\hbox{dim}(W)}$. By using ultraproduct arguments (see e.g. Lemma 3.7 of this paper of mine with Emmanuel Breuillard and Ben Green), this variety can be shown to have bounded complexity, and thus by Corollary 4, has ${(c(\phi^{-1}(\{w\})) + O(|F|^{-1/2})) |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}$ ${F}$-points. One can then ask how the quantity ${c(\phi^{-1}(\{w\}))}$ is distributed. A simple but illustrative example occurs when ${V=W=F}$ and ${\phi: F \rightarrow F}$ is the polynomial ${\phi(x) := x^2}$. Then ${c(\phi^{-1}(\{w\}))}$ equals ${2}$ when ${w}$ is a non-zero quadratic residue and ${0}$ when ${w}$ is a non-zero quadratic non-residue (and ${1}$ when ${w}$ is zero, but this is a negligible fraction of all ${w}$). In particular, in the asymptotic limit ${|F| \rightarrow \infty}$, ${c(\phi^{-1}(\{w\}))}$ is equal to ${2}$ half of the time and ${0}$ half of the time.

Now we describe the asymptotic distribution of the ${c(\phi^{-1}(\{w\}))}$. We need some additional notation. Let ${w_0}$ be an ${F}$-point in ${W \backslash W'}$, and let ${\pi_0( \phi^{-1}(\{w_0\}) )}$ be the set of connected components of the fibre ${\phi^{-1}(\{w_0\})}$. As ${\phi^{-1}(\{w_0\})}$ is defined over ${F}$, this set of components is permuted by the Frobenius endomorphism ${Frob}$. But there is also an action by monodromy of the fundamental group ${\pi_1(W \backslash W')}$ (this requires a certain amount of étale machinery to properly set up, as we are working over a positive characteristic field rather than over the complex numbers, but I am going to ignore this rather important detail here, as I still don’t fully understand it). This fundamental group may be infinite, but (by the étale construction) is always profinite, and in particular has a Haar probability measure, in which every finite index subgroup (and each of its cosets) is measurable. Thus we may meaningfully talk about elements drawn uniformly at random from this group, so long as we work only with the profinite ${\sigma}$-algebra on ${\pi_1(W \backslash W')}$ that is generated by the cosets of the finite index subgroups of this group (which will be the only relevant sets we need to measure when considering the action of this group on finite sets, such as the components of a generic fibre).

Theorem 5 (Lang-Weil with parameters) Let ${V, W}$ be varieties of complexity at most ${M}$ with ${W}$ irreducible, and let ${\phi: V \rightarrow W}$ be a dominant map of complexity at most ${M}$. Let ${w_0}$ be an ${F}$-point of ${W \backslash W'}$. Then, for any natural number ${a}$, one has ${c(\phi^{-1}(\{w\})) = a}$ for ${(\mathop{\bf P}( X = a ) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(W)}}$ values of ${w \in W(F)}$, where ${X}$ is the random variable that counts the number of components of a generic fibre ${\phi^{-1}(\{w_0\})}$ that are invariant under ${g \circ Frob}$, where ${g}$ is an element chosen uniformly at random from the étale fundamental group ${\pi_1(W \backslash W')}$. In particular, in the asymptotic limit ${|F| \rightarrow \infty}$, and with ${w}$ chosen uniformly at random from ${W(F)}$, ${c(\phi^{-1}(\{w\}))}$ (or, equivalently, ${|\phi^{-1}(\{w\})(F)| / |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}$) and ${X}$ have the same asymptotic distribution.

This theorem generalises Corollary 4 (which is the case when ${W}$ is just a point, so that ${\phi^{-1}(\{w\})}$ is just ${V}$ and ${g}$ is trivial). Informally, the effect of a non-trivial parameter space ${W}$ on the Lang-Weil bound is to push around the Frobenius map by monodromy for the purposes of counting invariant components, and a randomly chosen set of parameters corresponds to a randomly chosen loop on which to perform monodromy.

Example 2 Let ${V=W=F}$ and ${\phi(x) = x^m}$ for some fixed ${m \geq 1}$; to avoid some technical issues let us suppose that ${m}$ is coprime to ${|F|}$. Then ${W'}$ can be taken to be ${\{0\}}$, and for a base point ${w_0 \in W \backslash W'}$ we can take ${w_0=1}$. The fibre ${\phi^{-1}(\{1\})}$ – the ${m^{th}}$ roots of unity – can be identified with the cyclic group ${{\bf Z}/m{\bf Z}}$ by using a primitive root of unity. The étale fundamental group ${\pi_1(W \backslash W') = \pi_1(\overline{F} \backslash \{0\})}$ is (I think) isomorphic to the profinite completion ${\hat {\bf Z}}$ of the integers ${{\bf Z}}$ (excluding the part of that completion coming from the characteristic of ${F}$). Not coincidentally, the integers ${{\bf Z}}$ are the fundamental group of the complex analogue ${{\bf C} \backslash \{0\}}$ of ${W \backslash W'}$. (Brian Conrad points out to me though that for more complicated varieties, such as covers of ${\overline{F} \backslash \{0\}}$ by a power of the characteristic, the etale fundamental group is more complicated than just a profinite completion of the ordinary fundamental group, due to the presence of Artin-Schreier covers that are only ramified at infinity.) The action of this fundamental group on the fibre ${{\bf Z}/m{\bf Z}}$ is given by translation. Meanwhile, the Frobenius map ${Frob}$ on ${{\bf Z}/m{\bf Z}}$ is given by multiplication by ${|F|}$. A random element ${g \circ Frob}$ then becomes a random affine map ${x \mapsto |F|x+b}$ on ${{\bf Z}/m{\bf Z}}$, where ${b}$ is chosen uniformly at random from ${{\bf Z}/m{\bf Z}}$. The number of fixed points of this map is equal to the greatest common divisor ${(|F|-1,m)}$ of ${|F|-1}$ and ${m}$ when ${b}$ is divisible by ${(|F|-1,m)}$, and equal to ${0}$ otherwise. This matches up with the elementary number-theoretic fact that a randomly chosen non-zero element of ${F}$ will be an ${m^{th}}$ power with probability ${1/(|F|-1,m)}$, and when this occurs, the number of ${m^{th}}$ roots in ${F}$ will be ${(|F|-1,m)}$.
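The agreement in Example 2 can be checked exactly when ${F = {\bf F}_p}$ is a prime field (an assumption made here for simplicity), by comparing the fixed-point distribution of the random affine maps ${x \mapsto px + b}$ on ${{\bf Z}/m{\bf Z}}$ with the distribution of the number of ${m^{th}}$ roots of a uniformly random non-zero ${w \in {\bf F}_p}$. In this example the two distributions coincide exactly, not just asymptotically. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def monodromy_model(q, m):
    """Distribution of the number of fixed points of x -> q*x + b on Z/mZ,
    with b uniform in Z/mZ (the random g o Frob of Example 2, with q = |F|)."""
    counts = Counter(
        sum(1 for x in range(m) if (q * x + b - x) % m == 0) for b in range(m)
    )
    return {k: Fraction(v, m) for k, v in counts.items()}


def direct_count(p, m):
    """Distribution of #{x in F_p : x^m = w}, for w uniform among non-zero elements of F_p."""
    preimages = Counter(pow(x, m, p) for x in range(1, p))
    counts = Counter(preimages[w] for w in range(1, p))
    return {k: Fraction(v, p - 1) for k, v in counts.items()}


for p, m in [(101, 4), (101, 6), (103, 6)]:  # m coprime to p, as in the example
    print(p, m, monodromy_model(p, m), direct_count(p, m))
```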

Example 3 (Thanks to Jordan Ellenberg for this example.) Consider a random elliptic curve ${E = \{ y^2 = x^3 + ax + b \}}$, where ${a,b}$ are chosen uniformly at random, and let ${m \geq 1}$. Let ${E[m]}$ be the ${m}$-torsion points of ${E}$ (i.e. those elements ${g \in E}$ with ${mg = 0}$ using the elliptic curve addition law); as a group, this is isomorphic to ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$ (assuming that ${F}$ has sufficiently large characteristic, for simplicity). Consider the number of ${F}$-points of ${E[m]}$, which is a random variable taking values in the natural numbers between ${0}$ and ${m^2}$. In this case, the base variety ${W}$ is the modular curve ${X(1)}$, and the covering variety ${V}$ is the modular curve ${X_1(m)}$. The generic fibre here can be identified with ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$, the monodromy action projects down to the action of ${SL_2({\bf Z}/m{\bf Z})}$, and the action of Frobenius on this fibre can be shown to be given by a ${2 \times 2}$ matrix with determinant ${|F|}$ (with the exact choice of matrix depending on the choice of fibre and of the identification), so the distribution of the number of ${F}$-points of ${E[m]}$ is asymptotic to the distribution of the number of fixed points ${X}$ of a random linear map of determinant ${|F|}$ on ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$.
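For ${m=2}$, Example 3 can be tested by brute force over a prime field ${{\bf F}_p}$ with ${p > 3}$ (my choice of setting; function names are hypothetical). The sketch assumes, as in the example, that ${g \circ Frob}$ is uniformly distributed on the matrices mod ${m}$ of determinant ${p}$, and compares the resulting fixed-point distribution with the empirical distribution of ${|E[2]({\bf F}_p)|}$ over all non-singular ${(a,b)}$; the latter equals ${1}$ (the point at infinity, matching the always-fixed zero vector) plus the number of roots of ${x^3+ax+b}$ in ${{\bf F}_p}$. For ${m=2}$ the model gives the values ${4, 2, 1}$ with probabilities ${1/6, 1/2, 1/3}$.

```python
import itertools
from collections import Counter
from fractions import Fraction


def fixed_point_model(m, q):
    """Distribution of #{v in (Z/mZ)^2 : A v = v} over the 2x2 matrices A mod m
    with det A = q mod m (the coset in which g o Frob is assumed to be uniform)."""
    counts = Counter()
    for a, b, c, d in itertools.product(range(m), repeat=4):
        if (a * d - b * c - q) % m:
            continue
        fixed = sum(
            1
            for x, y in itertools.product(range(m), repeat=2)
            if (a * x + b * y - x) % m == 0 and (c * x + d * y - y) % m == 0
        )
        counts[fixed] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


def two_torsion_distribution(p):
    """Empirical distribution of |E[2](F_p)| = 1 + #roots of x^3 + a x + b,
    over all non-singular (a, b) in F_p^2 (p > 3 prime)."""
    counts = Counter()
    for a, b in itertools.product(range(p), repeat=2):
        if (4 * a ** 3 + 27 * b ** 2) % p == 0:
            continue  # singular cubic: not an elliptic curve
        roots = sum(1 for x in range(p) if (x ** 3 + a * x + b) % p == 0)
        counts[1 + roots] += 1  # the point at infinity is always 2-torsion
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


p = 61
print(fixed_point_model(2, p))
print(two_torsion_distribution(p))
```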

Theorem 5 seems to be well known “folklore” among arithmetic geometers, though I do not know of an explicit reference for it. I enjoyed deriving it for myself (though my derivation is somewhat incomplete due to my lack of understanding of étale cohomology) from the ordinary Lang-Weil theorem and the moment method. I’m recording this derivation later in this post, mostly for my own benefit (as I am still in the process of learning this material), though perhaps some other readers may also be interested in it.

Caveat: not all details are fully fleshed out in this writeup, particularly those involving the finer points of algebraic geometry and étale cohomology, as my understanding of these topics is not as complete as I would like it to be.

Many thanks to Brian Conrad and Jordan Ellenberg for helpful discussions on these topics.

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity ${r_4(F^n)}$ for a vector space ${F^n}$ over a finite field ${F}$ of characteristic greater than ${4}$, where ${r_4(F^n)}$ is defined as the cardinality of the largest subset of ${F^n}$ that does not contain an arithmetic progression of length ${4}$. In our earlier paper, we gave two arguments that bounded ${r_4(F^n)}$ in the regime when the field ${F}$ was fixed and ${n}$ was large. The first “cheap” argument gave the bound

$\displaystyle r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )$

and the more complicated “expensive” argument gave the improvement

$\displaystyle r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)$

for some constant ${c>0}$ depending only on ${F}$.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset ${A}$ of ${F^n}$ that has no arithmetic progressions of length ${4}$, and seeks to locate a subspace on which ${A}$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse ${U^3}$ theorem of Ben and myself (and also independently by Samorodnitsky), one approximates ${A}$ by a “quadratically structured” function ${f}$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because ${A}$ has no progressions of length ${4}$, the count of progressions of length ${4}$ weighted by ${f}$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which ${f}$ has increased density.

The error in the paper was to conclude from this that the original function ${1_A}$ also had increased density on the same subspace; it turns out that the manner in which ${f}$ approximates ${1_A}$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on ${r_4(F^n)}$ one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of ${c = 2^{-22}}$ for the exponent ${c}$ in (1). In fact, it gives the following stronger result:

Theorem 1 Let ${A}$ be a subset of ${F^n}$ of density at least ${\alpha}$, and let ${\epsilon>0}$. Then there is a subspace ${W}$ of ${F^n}$ of codimension ${O( \epsilon^{-2^{20}})}$ such that the number of (possibly degenerate) progressions ${a, a+r, a+2r, a+3r}$ in ${A \cap W}$ is at least ${(\alpha^4-\epsilon)|W|^2}$.

The bound (1) is an easy consequence of this theorem after choosing ${\epsilon := \alpha^4/2}$ and removing the degenerate progressions from the conclusion of the theorem.
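To make the quantity in Theorem 1 concrete, the following sketch counts by brute force the pairs ${(a,r)}$ with ${a, a+r, a+2r, a+3r}$ all in ${A}$, degenerate progressions ${r=0}$ included, in the simplest case ${W = F^n}$ with ${F = {\bf F}_5}$ (so the characteristic exceeds ${4}$). The function name `count_4aps` and the random test set are illustrative only; for comparison it also prints ${\alpha^4 |F^n|^2}$, with ${\alpha}$ the density of the set.

```python
import itertools
import random


def count_4aps(A, p, n):
    """Number of pairs (a, r) in (F_p^n)^2 with a, a+r, a+2r, a+3r all in A.

    Degenerate progressions (r = 0) are included, as in Theorem 1.
    """
    A = set(A)
    V = list(itertools.product(range(p), repeat=n))

    def term(a, r, i):
        return tuple((x + i * y) % p for x, y in zip(a, r))

    return sum(1 for a in A for r in V if all(term(a, r, i) in A for i in (1, 2, 3)))


p, n = 5, 3
V = list(itertools.product(range(p), repeat=n))
rng = random.Random(0)
A = [v for v in V if rng.random() < 0.5]  # a random set of density about 1/2
alpha = len(A) / len(V)
print(count_4aps(A, p, n), alpha ** 4 * len(V) ** 2)
```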

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to ${1_A}$ for a significantly stronger local approximation to ${1_A}$ on a subspace ${W}$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse ${U^3}$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace ${W}$ on which ${A}$ is quite dense (of density ${\alpha-O(\epsilon)}$) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

Tamar Ziegler and I have just uploaded to the arXiv our paper “The inverse conjecture for the Gowers norm over finite fields in low characteristic“, submitted to Annals of Combinatorics. This paper completes another case of the inverse conjecture for the Gowers norm, this time for vector spaces ${{\bf F}^n}$ over a fixed finite field ${{\bf F} = {\bf F}_p}$ of prime order; with Vitaly Bergelson, we had previously established this claim when the characteristic of the field was large, so the main new result here is the extension to the low characteristic case. (The case of a cyclic group ${{\bf Z}/N{\bf Z}}$ or interval ${[N]}$ was established by Ben Green and ourselves in another recent paper. For an arbitrary abelian (or nilpotent) group, a general but less explicit description of the obstructions to Gowers uniformity was recently obtained by Szegedy; the latter result recovers the high-characteristic case of our result (as was done in a subsequent paper of Szegedy), as well as our results with Green, but it is not immediately evident whether Szegedy’s description of the obstructions matches up with the one predicted by the inverse conjecture in low characteristic.)

The statement of the main theorem is as follows. Given a finite-dimensional vector space ${V = {\bf F}^n}$ and a function ${f: V \rightarrow {\bf C}}$, and an integer ${s \geq 0}$, one can define the Gowers uniformity norm ${\|f\|_{U^{s+1}(V)}}$ by the formula

$\displaystyle \|f\|_{U^{s+1}(V)} := \left( \mathop{\bf E}_{x,h_1,\ldots,h_{s+1} \in V} \Delta_{h_1} \ldots \Delta_{h_{s+1}} f(x) \right)^{1/2^{s+1}}$

where ${\Delta_h f(x) := f(x+h) \overline{f(x)}}$. If ${f}$ is bounded in magnitude by ${1}$, it is easy to see that ${\|f\|_{U^{s+1}(V)}}$ is bounded by ${1}$ also, with equality if and only if ${f = e(P)}$ for some non-classical polynomial ${P: V \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${s}$, where ${e(x) := e^{2\pi ix}}$, and a non-classical polynomial of degree at most ${s}$ is a function whose ${(s+1)^{th}}$ “derivatives” vanish in the sense that

$\displaystyle \partial_{h_1} \ldots \partial_{h_{s+1}} P(x) = 0$

for all ${x,h_1,\ldots,h_{s+1} \in V}$, where ${\partial_h P(x) := P(x+h) - P(x)}$. Our result generalises this to the case when the uniformity norm is not equal to ${1}$, but is still bounded away from zero:

Theorem 1 (Inverse conjecture) Let ${f: V \rightarrow {\bf C}}$ be bounded by ${1}$ with ${\|f\|_{U^{s+1}(V)} \geq \delta > 0}$ for some ${s \geq 0}$. Then there exists a non-classical polynomial ${P: V \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${s}$ such that ${|\langle f, e(P) \rangle_{L^2(V)}| := |{\bf E}_{x \in V} f(x) e(-P(x))| \geq c(s,p, \delta) > 0}$, where ${c(s,p, \delta)}$ is a positive quantity depending only on the indicated parameters.

This theorem is trivial for ${s=0}$, and follows easily from Fourier analysis for ${s=1}$. The case ${s=2}$ was done in odd characteristic by Ben Green and myself, and in even characteristic by Samorodnitsky. In two papers, one with Vitaly Bergelson, we established this theorem in the “high characteristic” case when the characteristic ${p}$ of ${{\bf F}}$ was greater than ${s}$ (in which case there is essentially no distinction between non-classical polynomials and their classical counterparts, as discussed previously on this blog). The need to deal with genuinely non-classical polynomials is the main new difficulty in this paper that was not dealt with in previous literature.
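The definition of the Gowers norm can be evaluated by brute force on a tiny space. The sketch below (helper names hypothetical, with ${V = {\bf F}_3^2}$) checks the equality case ${\|e(P)\|_{U^{s+1}} = 1}$ for the quadratic phase ${f = e((x_0^2+x_1^2)/3)}$ with ${s=2}$, and that the same phase has ${U^2}$ norm only ${3^{-1/2}}$. The latter is consistent with the Fourier identity ${\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4}$, since every Fourier coefficient of this non-degenerate quadratic phase has magnitude ${1/3}$.

```python
import cmath
import itertools


def gowers_norm(f, p, n, k):
    """Brute-force ||f||_{U^k(V)} on V = F_p^n, with f a dict {vector: complex}.

    Implements ( E_{x, h_1..h_k} Delta_{h_1} ... Delta_{h_k} f(x) )^(1/2^k),
    where Delta_h f(x) = f(x + h) * conj(f(x)). Cost is about |V|^(k+1).
    """
    V = list(itertools.product(range(p), repeat=n))

    def add(x, h):
        return tuple((a + b) % p for a, b in zip(x, h))

    def delta(g, h):
        return {x: g[add(x, h)] * g[x].conjugate() for x in V}

    total = 0
    for hs in itertools.product(V, repeat=k):
        g = f
        for h in hs:
            g = delta(g, h)
        total += sum(g.values())
    mean = total / len(V) ** (k + 1)
    # The mean is real and non-negative in exact arithmetic; abs() discards rounding noise.
    return abs(mean) ** (1 / 2 ** k)


def e(t):
    return cmath.exp(2j * cmath.pi * t)


p, n = 3, 2
V = list(itertools.product(range(p), repeat=n))
quadratic_phase = {x: e((x[0] ** 2 + x[1] ** 2) / p) for x in V}
print(gowers_norm(quadratic_phase, p, n, 3))  # ~1.0: a phase of degree 2
print(gowers_norm(quadratic_phase, p, n, 2))  # ~0.577 = 3**-0.5
```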

In our previous paper with Bergelson, a “weak” version of the above theorem was proven, in which the polynomial ${P}$ in the conclusion had bounded degree ${O_{s,p}(1)}$, rather than being of degree at most ${s}$. In the current paper, we use this weak inverse theorem to reduce the inverse conjecture to a statement purely about polynomials:

Theorem 2 (Inverse conjecture for polynomials) Let ${s \geq 0}$, and let ${P: V \rightarrow {\bf R}/{\bf Z}}$ be a non-classical polynomial of degree at most ${s+1}$ such that ${\|e(P)\|_{U^{s+1}(V)} \geq \delta > 0}$. Then ${P}$ has bounded rank in the sense that ${P}$ is a function of ${O_{s,p,\delta}(1)}$ polynomials of degree at most ${s}$.

This type of inverse theorem was first introduced by Bogdanov and Viola. The deduction of Theorem 1 from Theorem 2 and the weak inverse Gowers conjecture is fairly standard, so the main difficulty is to show Theorem 2.

The quantity ${-\log_{|{\bf F}|} \|e(P)\|_{U^{s+1}(V)}^{2^{s+1}}}$ of a polynomial ${P}$ of degree at most ${s+1}$ was called the analytic rank of ${P}$ by Gowers and Wolf. They observed that the analytic rank of ${P}$ was closely related to the rank of ${P}$, defined as the least number of degree ${s}$ polynomials needed to express ${P}$. For instance, in the quadratic case ${s=1}$ the two ranks are identical (in odd characteristic, at least). For general ${s}$, it was easy to see that bounded rank implied bounded analytic rank; Theorem 2 is the converse statement.
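In the quadratic case ${s=1}$ the agreement of the two ranks can be seen numerically. The sketch below (a hypothetical helper over the prime field ${{\bf F}_3}$) computes ${-\log_3}$ of the bias ${\|e(Q/3)\|_{U^2}^4 = {\bf E}_{h,k} e(d^2 Q(h,k)/3)}$ of a quadratic ${Q}$. For ${x_0^2+x_1^2}$ on ${{\bf F}_3^3}$ and for ${x_0 x_1}$ on ${{\bf F}_3^2}$ it returns ${2}$, while for ${(x_0+x_1)^2}$, which is a function of the single linear form ${x_0+x_1}$, it returns ${1}$, matching the rank in each case.

```python
import cmath
import itertools
import math


def analytic_rank(Q, p, n):
    """-log_p of the bias E_{h,k} e(d^2 Q(h, k) / p) of a quadratic Q on F_p^n.

    d^2 Q(h, k) = Q(h + k) - Q(h) - Q(k) + Q(0) is the second derivative of Q
    (independent of the base point, as Q has degree 2), and the bias equals
    ||e(Q/p)||_{U^2}^4.
    """
    V = list(itertools.product(range(p), repeat=n))

    def add(h, k):
        return tuple((a + b) % p for a, b in zip(h, k))

    zero = (0,) * n
    bias = sum(
        cmath.exp(2j * cmath.pi * (Q(add(h, k)) - Q(h) - Q(k) + Q(zero)) / p)
        for h in V
        for k in V
    ) / len(V) ** 2
    return -math.log(abs(bias), p)


print(analytic_rank(lambda x: x[0] ** 2 + x[1] ** 2, 3, 3))  # ~2
print(analytic_rank(lambda x: x[0] * x[1], 3, 2))            # ~2
print(analytic_rank(lambda x: (x[0] + x[1]) ** 2, 3, 2))     # ~1
```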

We tried a number of ways to show that bounded analytic rank implied bounded rank, in particular spending a lot of time on ergodic-theoretic approaches, but eventually we settled on a “brute force” approach that relies on classifying those polynomials of bounded analytic rank as precisely as possible. The argument splits up into establishing three separate facts:

1. (Classical case) If a classical polynomial has bounded analytic rank, then it has bounded rank.
2. (Multiplication by ${p}$) If a non-classical polynomial ${P}$ (of degree at most ${s+1}$) has bounded analytic rank, then ${pP}$ (which can be shown to have degree at most ${\max(s+1-(p-1),0)}$) also has bounded analytic rank.
3. (Division by ${p}$) If ${Q}$ is a non-classical polynomial of degree at most ${\max(s+1-(p-1),0)}$ of bounded rank, then there is a non-classical polynomial ${P}$ of degree at most ${s+1}$ of bounded rank such that ${pP=Q}$.

The multiplication by ${p}$ and division by ${p}$ facts allow one to easily extend the classical case of the theorem to the non-classical case of the theorem, basically because classical polynomials are the kernel of the multiplication-by-${p}$ homomorphism. Indeed, if ${P}$ is a non-classical polynomial of bounded analytic rank of the right degree, then the multiplication by ${p}$ claim tells us that ${pP}$ also has bounded analytic rank, which by an induction hypothesis implies that ${pP}$ has bounded rank. Applying the division by ${p}$ claim, we find a bounded rank polynomial ${P'}$ such that ${pP = pP'}$, thus ${P}$ differs from ${P'}$ by a classical polynomial, which necessarily has bounded analytic rank, hence bounded rank by the classical claim, and the claim follows.
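Non-classical polynomials already appear in the simplest case ${p = 2}$, ${n=1}$. Identifying ${x \in {\bf F}_2}$ with the integer ${0}$ or ${1}$, the map ${x \mapsto x/4 \hbox{ mod } 1}$ is a non-classical polynomial of degree ${2}$ (not ${1}$), while twice it, ${x \mapsto x/2}$, is a classical polynomial of degree ${1}$; this illustrates multiplication by ${p}$ lowering the degree by ${p-1}$. The brute-force checker below (a hypothetical helper using exact fractions mod ${1}$) computes the degree directly from the vanishing of the iterated derivatives ${\partial_{h_1} \ldots \partial_{h_{d+1}} P}$.

```python
import itertools
from fractions import Fraction


def degree(P, p, n, max_deg=8):
    """Smallest d <= max_deg such that every (d+1)-fold derivative
    partial_{h_1} ... partial_{h_{d+1}} P vanishes identically on F_p^n,
    for P : F_p^n -> Q/Z given with exact Fraction values (taken mod 1).
    Returns None if there is no such d. Brute force: |V|^(d+1) derivative tuples."""
    V = list(itertools.product(range(p), repeat=n))

    def add(x, h):
        return tuple((a + b) % p for a, b in zip(x, h))

    def deriv(g, h):
        return {x: (g[add(x, h)] - g[x]) % 1 for x in V}

    table = {x: Fraction(P(x)) % 1 for x in V}
    for d in range(max_deg + 1):
        vanishes = True
        for hs in itertools.product(V, repeat=d + 1):
            g = table
            for h in hs:
                g = deriv(g, h)
            if any(g.values()):  # some non-zero value mod 1
                vanishes = False
                break
        if vanishes:
            return d
    return None


P = lambda x: Fraction(x[0], 4)  # x[0] in {0, 1} read as an integer
print(degree(P, 2, 1))                   # 2
print(degree(lambda x: 2 * P(x), 2, 1))  # 1: multiplying by p = 2 lowers the degree by 1
```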

Of the three claims, the multiplication-by-${p}$ claim is the easiest to prove using known results; after a bit of Fourier analysis, it turns out to follow more or less immediately from the multidimensional Szemerédi theorem over finite fields of Bergelson, Leibman, and McCutcheon (one can also use the density Hales-Jewett theorem here if one desires).

The next easiest claim is the classical case. Here, the idea is to analyse a degree ${s+1}$ classical polynomial ${P: V \rightarrow {\bf F}}$ via its derivative ${d^{s+1} P: V^{s+1} \rightarrow {\bf F}}$, defined by the formula

$\displaystyle d^{s+1} P( h_1,\ldots,h_{s+1}) := \partial_{h_1} \ldots \partial_{h_{s+1}} P(x)$

for any ${x,h_1,\ldots,h_{s+1} \in V}$ (the RHS is independent of ${x}$ as ${P}$ has degree at most ${s+1}$). This is a multilinear form, and if ${P}$ has bounded analytic rank, this form is biased (in the sense that the mean of ${e(d^{s+1} P)}$ is bounded away from zero). By a general equidistribution theorem of Kaufman and Lovett (based on this earlier paper of Green and myself), this implies that ${d^{s+1} P}$ is a function of a bounded number of multilinear forms of lower degree. Using some “regularity lemma” theory to clean up these forms so that they have good equidistribution properties, it is possible to understand exactly how the original multilinear form ${d^{s+1} P}$ depends on these lower degree forms; indeed, the description one eventually obtains is so explicit that one can write down by inspection another bounded rank polynomial ${Q}$ such that ${d^{s+1} P}$ is equal to ${d^{s+1} Q}$. Thus ${P}$ differs from the bounded rank polynomial ${Q}$ by a lower degree error, which is automatically of bounded rank also, and the claim follows.
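As a small numerical illustration of this bias (a sketch under stated assumptions, not code from the paper), the following Python computes by brute force the mean of ${e(d^k P)}$ for a classical polynomial ${P: {\bf F}_p^n \rightarrow {\bf F}_p}$. Here ${e(a) := \exp(2\pi i a/p)}$, and ${d^k P}$ is evaluated by inclusion-exclusion over subsets of ${\{h_1,\ldots,h_k\}}$. The function names and the sample quadratic forms are hypothetical choices made for illustration. For ${k=2}$ and a quadratic form, the mean should equal ${p^{-r}}$, where ${r}$ is the rank of the associated symmetric bilinear form. Thus ${-\log_p}$ of the mean recovers ${r}$; this is the quadratic case in which the two notions of rank agree.

```python
import cmath
import math
from itertools import product


def e(a, p):
    """The additive character e(a) = exp(2 pi i a / p) on F_p."""
    return cmath.exp(2j * cmath.pi * a / p)


def shift(x, hs, p):
    """Return x + h_1 + ... + h_m in F_p^n for the vectors in hs (coordinatewise mod p)."""
    return tuple((xi + sum(h[i] for h in hs)) % p for i, xi in enumerate(x))


def derivative(P, x, hs, p):
    """Iterated difference d_{h_1} ... d_{h_k} P(x), via inclusion-exclusion:
    sum over subsets S of {1..k} of (-1)^(k-|S|) P(x + sum_{i in S} h_i)."""
    k = len(hs)
    total = 0
    for mask in product((0, 1), repeat=k):
        chosen = [h for h, bit in zip(hs, mask) if bit]
        total += (-1) ** (k - len(chosen)) * P(shift(x, chosen, p))
    return total % p


def bias(P, p, n, k):
    """Mean of e(d^k P(h_1, ..., h_k)) over all (h_1, ..., h_k) in (F_p^n)^k.
    Evaluated at x = 0; for a classical P of degree at most k this does not depend on x."""
    pts = list(product(range(p), repeat=n))
    origin = (0,) * n
    total = sum(e(derivative(P, origin, hs, p), p) for hs in product(pts, repeat=k))
    return total / len(pts) ** k


p, n = 3, 2
samples = {
    "x1*x2": lambda x: (x[0] * x[1]) % p,  # associated bilinear form has rank 2
    "x1^2": lambda x: (x[0] * x[0]) % p,   # associated bilinear form has rank 1
}
for name, P in samples.items():
    b = bias(P, p, n, 2).real
    print(name, round(b, 6), round(-math.log(b, p), 6))
```

This should print a mean of about ${1/9}$ (so ${-\log_3}$ of the mean is ${2}$) for ${x_1 x_2}$, and about ${1/3}$ (log value ${1}$) for ${x_1^2}$. Passing ${k=3}$ and a cubic ${P}$ runs the same computation for the trilinear form ${d^3 P}$, at a cost of ${p^{3n}}$ evaluations.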

The trickiest thing to establish is the division by ${p}$ claim. Since ${Q}$ has bounded rank, it is some function ${F(R_1,\ldots,R_m)}$ of lower degree polynomials ${R_1,\ldots,R_m}$. Ideally, one would like to find a function ${F'(R_1,\ldots,R_m)}$ of the same polynomials with ${pF' = F}$, such that ${F'(R_1,\ldots,R_m)}$ has the correct degree; however, we have counterexamples that show that this is not always possible. (These counterexamples are the main obstruction to making the ergodic theory approach work: in ergodic theory, one is only allowed to work with “measurable” functions, which are roughly analogous in this context to functions of the indicated polynomials ${Q, R_1,\ldots,R_m}$ and their shifts.) To get around this, we first have to apply a regularity lemma to place ${R_1,\ldots,R_m}$ in a suitably equidistributed form (although the fact that ${R_1,\ldots,R_m}$ may be non-classical leads to a rather messy and technical description of this equidistribution), and then we have to extend each ${R_j}$ to a higher degree polynomial ${R'_j}$ with ${pR'_j = R_j}$. There is a crucial “exact roots” property of polynomials that allows one to do this, with ${R'_j}$ having degree exactly ${p-1}$ higher than ${R_j}$. It turns out that it is possible to find a function ${P = F'(R'_1,\ldots,R'_m)}$ of these extended polynomials which has the right degree and solves the required equation ${pP=Q}$; this is established by classifying completely all functions of the equidistributed polynomials ${R_1,\ldots,R_m}$ or ${R'_1,\ldots,R'_m}$ that are of a given degree.
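The “exact roots” phenomenon can be checked by hand in tiny cases. The Python sketch below is only an illustration, not the construction used in the paper. It computes the degree of a function ${f: {\bf F}_p^n \rightarrow {\bf Q}/{\bf Z}}$ by brute force, as the least ${d \geq 0}$ for which every ${(d+1)}$-fold difference ${\partial_{h_1} \ldots \partial_{h_{d+1}} f}$ vanishes mod ${1}$. Elements of ${{\bf F}_p}$ are read as integers in ${\{0,\ldots,p-1\}}$. The sample functions, such as ${x \mapsto x/8}$ (mod ${1}$) on ${{\bf F}_2}$, are hypothetical examples chosen for illustration. In each of them, multiplying by ${p}$ lowers the degree by exactly ${p-1}$. For instance, ${x/9}$ is a ${3^{rd}}$ root of the classical linear polynomial ${x/3}$ on ${{\bf F}_3}$, and its degree is higher by exactly ${p-1=2}$.

```python
from fractions import Fraction
from itertools import product


def degree(f, p, n, max_deg=12):
    """Least d >= 0 such that every (d+1)-fold difference of f: F_p^n -> Q/Z
    vanishes mod 1. Points of F_p^n are tuples of integers in range(p);
    f returns a Fraction (or int), which is reduced mod 1."""
    pts = list(product(range(p), repeat=n))
    index = {x: i for i, x in enumerate(pts)}

    def shifted(x, h):
        # position of x + h (coordinatewise mod p) in the list pts
        return index[tuple((a + b) % p for a, b in zip(x, h))]

    # Each table stores one iterated difference evaluated at every point;
    # `level` holds all distinct d-fold differences of f.
    level = {tuple(Fraction(f(x)) % 1 for x in pts)}
    for d in range(max_deg + 1):
        nxt = {tuple((t[shifted(x, h)] - t[i]) % 1 for i, x in enumerate(pts))
               for t in level for h in pts}
        if all(v == 0 for t in nxt for v in t):
            return d
        level = nxt
    raise ValueError("degree exceeds max_deg")


# x/8 on F_2 and its multiples by powers of p = 2: each step drops the degree by p - 1 = 1.
P = lambda x: Fraction(x[0], 8)
print([degree(lambda x, m=m: m * P(x), 2, 1) for m in (1, 2, 4, 8)])
# x/9 on F_3 satisfies 3 * (x/9) = x/3, a classical linear polynomial.
print(degree(lambda x: Fraction(x[0], 9), 3, 1), degree(lambda x: Fraction(x[0], 3), 3, 1))
```

The first line should read ${[3, 2, 1, 0]}$ (the last entry is the zero function), and the second should read ${3}$ and ${1}$. The search is exponential in the degree and the dimension, so it is only practical for toy examples.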

In the previous lectures, we have focused mostly on the equidistribution of, and linear patterns in, subsets of the integers ${{\bf Z}}$, and in particular on intervals ${[N]}$. The integers are of course a very important domain to study in additive combinatorics; but there are also other fundamental model examples of domains to study. One of these is that of a vector space ${V}$ over a finite field ${{\bf F} = {\bf F}_p}$ of prime order. Such domains are of interest in computer science (particularly when ${p=2}$) and also in number theory; but they also serve as an important simplified “dyadic model” for the integers. See this survey article of Green for further discussion of this point.

The additive combinatorics of the integers ${{\bf Z}}$, and of vector spaces ${V}$ over finite fields, are analogous, but not quite identical. For instance, the analogue of an arithmetic progression in ${{\bf Z}}$ is a subspace of ${V}$. In many cases, the finite field theory is a little bit simpler than the integer theory; for instance, subspaces are closed under addition, whereas arithmetic progressions are only “almost” closed under addition in various senses. (For example, if ${a,b}$ are chosen at random from ${[N]}$, then ${a+b}$ lies in ${[N]}$ approximately half of the time.) However, there are some ways in which the integers are better behaved. For instance, because the integers can be generated by a single generator, a homomorphism from ${{\bf Z}}$ to some other group ${G}$ can be described by a single group element ${g}$: ${n \mapsto g^n}$. By contrast, to specify a homomorphism from a vector space ${V}$ to ${G}$ one would need to specify one group element for each dimension of ${V}$. Thus we see that there is a tradeoff when passing from ${{\bf Z}}$ (or ${[N]}$) to a vector space model; one gains a bounded torsion property, at the expense of conceding the bounded generation property. (Of course, if one wants to deal with arbitrarily large domains, one has to concede one or the other; the only additive groups that have both bounded torsion and boundedly many generators are bounded.)
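The parenthetical remark about ${[N]}$ can be made exact by a short count. The Python below assumes the convention ${[N] = \{1,\ldots,N\}}$ and uses hypothetical helper names. It shows that the proportion of ordered pairs ${(a,b) \in [N]^2}$ with ${a+b \in [N]}$ is exactly ${\frac{N-1}{2N}}$. By contrast, a line ${\{tv: t \in {\bf F}_p\}}$ in ${{\bf F}_p^n}$ is closed under addition all of the time.

```python
from fractions import Fraction
from itertools import product


def progression_closure(N):
    """Proportion of ordered pairs (a, b) in [N]^2 = {1..N}^2 with a + b in [N]."""
    good = sum(1 for a in range(1, N + 1) for b in range(1, N + 1) if a + b <= N)
    return Fraction(good, N * N)


def subspace_closure(p, v):
    """Same proportion for the line W = {t v : t in F_p} inside F_p^n."""
    W = {tuple((t * c) % p for c in v) for t in range(p)}
    good = sum(1 for a, b in product(W, repeat=2)
               if tuple((x + y) % p for x, y in zip(a, b)) in W)
    return Fraction(good, len(W) ** 2)


print(progression_closure(100), subspace_closure(5, (1, 2, 3)))
```

For ${N=100}$ this gives ${99/200}$, which tends to ${1/2}$ as ${N \rightarrow \infty}$. The subspace proportion is ${1}$.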

The starting point for this course (Notes 1) was the study of equidistribution of polynomials ${P: {\bf Z} \rightarrow {\bf R}/{\bf Z}}$ from the integers to the unit circle. We now turn to the parallel theory of equidistribution of polynomials ${P: V \rightarrow {\bf R}/{\bf Z}}$ from vector spaces over finite fields to the unit circle. Actually, for simplicity we will mostly focus on the classical case, when the polynomials in fact take values in the ${p^{th}}$ roots of unity (where ${p}$ is the characteristic of the field ${{\bf F} = {\bf F}_p}$). As it turns out, the non-classical case is also of importance (particularly in low characteristic), but the theory is more difficult; see these notes for some further discussion.

Jean-Pierre Serre (whose papers are, of course, always worth reading) recently posted a lovely lecture on the arXiv entitled “How to use finite fields for problems concerning infinite fields”. In it, he describes several ways in which algebraic statements over fields of zero characteristic, such as ${{\mathbb C}}$, can be deduced from their positive characteristic counterparts such as ${F_{p^m}}$, despite the fact that there is no non-trivial field homomorphism between the two types of fields. In particular, finitary tools, including such basic concepts as cardinality, can now be deployed to establish infinitary results. This leads to some simple and elegant proofs of non-trivial algebraic results which are not easy to establish by other means.

One deduction of this type is based on the idea that positive characteristic fields can partially model zero characteristic fields, and proceeds like this: if a certain algebraic statement failed over (say) ${{\mathbb C}}$, then there should be a “finitary algebraic” obstruction that “witnesses” this failure over ${{\mathbb C}}$. Because this obstruction is both finitary and algebraic, it must also be definable in some (large) finite characteristic, thus leading to a comparable failure over a finite characteristic field. Taking contrapositives, one obtains the claim.

Algebra is definitely not my own field of expertise, but it is interesting to note that similar themes have also come up in my own area of additive combinatorics (and more generally arithmetic combinatorics), because the combinatorics of addition and multiplication on finite sets is definitely of a “finitary algebraic” nature. For instance, a recent paper of Vu, Wood, and Wood establishes a finitary “Freiman-type” homomorphism from (finite subsets of) the complex numbers to large finite fields that allows them to pull back many results in arithmetic combinatorics in finite fields (e.g. the sum-product theorem) to the complex plane. (Van Vu and I also used a similar trick to control the singularity property of random sign matrices by first mapping them into finite fields in which cardinality arguments became available.) And I have a particular fondness for correspondences between finitary and infinitary mathematics; the correspondence Serre discusses is slightly different from the one I discuss for instance here or here, although there seems to be a common theme of “compactness” (or of model theory) tying these correspondences together.

As one of his examples, Serre cites one of my own favourite results in algebra, discovered independently by Ax and by Grothendieck (and then rediscovered many times since). Here is a special case of that theorem:

Theorem 1 (Ax-Grothendieck theorem, special case) Let ${P: {\mathbb C}^n \rightarrow {\mathbb C}^n}$ be a polynomial map from a complex vector space to itself. If ${P}$ is injective, then ${P}$ is bijective.

The full version of the theorem allows one to replace ${{\mathbb C}^n}$ by an algebraic variety ${X}$ over any algebraically closed field, and for ${P}$ to be a morphism from the algebraic variety ${X}$ to itself, but for simplicity I will just discuss the above special case. This theorem is not at all obvious; it is not too difficult (see Lemma 4 below) to show that the Jacobian of ${P}$ is non-degenerate, but this does not come close to solving the problem since one would then be faced with the notorious Jacobian conjecture. Also, the claim fails if “polynomial” is replaced by “holomorphic”, due to the existence of Fatou-Bieberbach domains.
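Over a finite field, the analogue of Theorem 1 is just the pigeonhole principle: an injective map from a finite set such as ${{\bf F}_p^n}$ to itself is automatically surjective. The Python sketch below checks only this finite step, by brute force, for a hypothetical family of polynomial maps ${(x,y) \mapsto (x + a y^2, y + b x^2)}$ on ${{\bf F}_5^2}$. It does not model the transfer to ${{\mathbb C}}$. In the standard finite field proof, that transfer passes through the algebraic closure of ${{\bf F}_p}$, which is a union of the finite fields ${F_{p^m}}$.

```python
from itertools import product


def image(F, p, n):
    """The set of values of a map F: F_p^n -> F_p^n (points are tuples of ints)."""
    return {F(x) for x in product(range(p), repeat=n)}


def is_injective(F, p, n):
    # F_p^n has p**n points, so F is injective iff its image has p**n points
    return len(image(F, p, n)) == p ** n


def is_surjective(F, p, n):
    return image(F, p, n) == set(product(range(p), repeat=n))


def make_map(a, b, p):
    """Hypothetical sample polynomial map (x, y) -> (x + a*y^2, y + b*x^2) mod p."""
    return lambda v: ((v[0] + a * v[1] ** 2) % p, (v[1] + b * v[0] ** 2) % p)


p, n = 5, 2
injective_count = 0
for a, b in product(range(p), repeat=2):
    F = make_map(a, b, p)
    if is_injective(F, p, n):
        injective_count += 1
        # pigeonhole: on a finite set, injective forces surjective
        assert is_surjective(F, p, n)
print(injective_count, "injective maps in the family, all of them bijective")
```

The triangular maps with ${a=0}$ or ${b=0}$ are visibly injective, because their inverses are again polynomial. The point of the check is that no cleverness is needed once the field is finite.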

In this post I would like to give the proof of Theorem 1 based on finite fields as mentioned by Serre, as well as another elegant proof of Rudin that combines algebra with some elementary complex variable methods. (There are several other proofs of this theorem and its generalisations, for instance a topological proof by Borel, which I will not discuss here.)

Update, March 8: Some corrections to the finite field proof. Thanks to Matthias Aschenbrenner also for clarifying the relationship with Tarski’s theorem and some further references.