You are currently browsing the tag archive for the ‘finite fields’ tag.

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to ${{\bf F}_{p}^{\omega}}$-actions“. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if ${(X,{\mathcal X}, \mu,T)}$ is a measure-preserving ${{\bf Z}}$-system (which, in this post, means that ${(X,{\mathcal X}, \mu)}$ is a probability space and ${T: X \mapsto X}$ is measure-preserving and invertible, thus giving an action ${(T^n)_{n \in {\bf Z}}}$ of the integers), and ${f,g \in L^2(X,{\mathcal X}, \mu)}$ are functions, and ${X}$ is ergodic (which means that ${L^2(X,{\mathcal X}, \mu)}$ contains no ${T}$-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x)\ d\mu \ \ \ \ \ (1)$

converges as ${N \rightarrow \infty}$ to the expression

$\displaystyle (\int_X f(x)\ d\mu) (\int_X g(x)\ d\mu);$

see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if ${x}$ is drawn at random from ${X}$ (using the probability measure ${\mu}$), and ${n}$ is drawn at random from ${\{1,\ldots,N\}}$ for some large ${N}$, then the pair ${(x, T^n x)}$ becomes uniformly distributed in the product space ${X \times X}$ (using product measure ${\mu \times \mu}$) in the limit as ${N \rightarrow \infty}$.

If we allow ${(X,\mu)}$ to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let ${{\mathcal X}^T}$ be the ${T}$-invariant measurable sets in ${{\mathcal X}}$; the ${{\bf Z}}$-system ${(X, {\mathcal X}^T, \mu, T)}$ can then be viewed as a factor of the original system ${(X, {\mathcal X}, \mu, T)}$, which is equivalent (in the sense of measure-preserving systems) to a trivial system ${(Z_0, {\mathcal Z}_0, \mu_{Z_0}, 1)}$ (known as the invariant factor) in which the shift is trivial. There is then a projection map ${\pi_0: X \rightarrow Z_0}$ to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression

$\displaystyle \int_{Z_0} (\pi_0)_* f(z) (\pi_0)_* g(z)\ d\mu_{Z_0}(x), \ \ \ \ \ (2)$

where ${(\pi_0)_*: L^2(X,{\mathcal X},\mu) \rightarrow L^2(Z_0,{\mathcal Z}_0,\mu_{Z_0})}$ is the pushforward map associated to the map ${\pi_0: X \rightarrow Z_0}$; see e.g. this previous blog post. We can interpret this as an equidistribution result. If ${(x,T^n x)}$ is a pair as before, then we no longer expect complete equidistribution in ${X \times X}$ in the non-ergodic, because there are now non-trivial constraints relating ${x}$ with ${T^n x}$; indeed, for any ${T}$-invariant function ${f: X \rightarrow {\bf C}}$, we have the constraint ${f(x) = f(T^n x)}$; putting all these constraints together we see that ${\pi_0(x) = \pi_0(T^n x)}$ (for almost every ${x}$, at least). The limit (2) can be viewed as an assertion that this constraint ${\pi_0(x) = \pi_0(T^n x)}$ are in some sense the “only” constraints between ${x}$ and ${T^n x}$, and that the pair ${(x,T^n x)}$ is uniformly distributed relative to these constraints.

Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x)\ d\mu \ \ \ \ \ (3)$

for three functions ${f,g,h \in L^\infty(X, {\mathcal X}, \mu)}$; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system ${(X,{\mathcal X},\mu,T)}$ to be ergodic. Naively one might expect this limit to then converge to

$\displaystyle (\int_X f\ d\mu) (\int_X g\ d\mu) (\int_X h\ d\mu)$

which would roughly speaking correspond to an assertion that the triplet ${(x,T^n x, T^{2n} x)}$ is asymptotically equidistributed in ${X \times X \times X}$. However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs ${(x,T^n x)}$, ${(x, T^{2n} x)}$. The key obstruction here is that of eigenfunctions of the shift ${T: X \rightarrow X}$, that is to say non-trivial functions ${f: X \rightarrow S^1}$ that obey the eigenfunction equation ${Tf = \lambda f}$ almost everywhere for some constant (or ${T}$-invariant) ${\lambda}$. Each such eigenfunction generates a constraint

$\displaystyle f(x) \overline{f(T^n x)}^2 f(T^{2n} x) = 1 \ \ \ \ \ (4)$

tying together ${x}$, ${T^n x}$, and ${T^{2n} x}$. However, it turns out that these are in some sense the only constraints on ${x,T^n x, T^{2n} x}$ that are relevant for the limit (3). More precisely, if one sets ${{\mathcal X}_1}$ to be the sub-algebra of ${{\mathcal X}}$ generated by the eigenfunctions of ${T}$, then it turns out that the factor ${(X, {\mathcal X}_1, \mu, T)}$ is isomorphic to a shift system ${(Z_1, {\mathcal Z}_1, \mu_{Z_1}, x \mapsto x+\alpha)}$ known as the Kronecker factor, for some compact abelian group ${Z_1 = (Z_1,+)}$ and some (irrational) shift ${\alpha \in Z_1}$; the factor map ${\pi_1: X \rightarrow Z_1}$ pushes eigenfunctions forward to (affine) characters on ${Z_1}$. It is then known that the limit of (3) is

$\displaystyle \int_\Sigma (\pi_1)_* f(x_0) (\pi_1)_* g(x_1) (\pi_1)_* h(x_2)\ d\mu_\Sigma$

where ${\Sigma \subset Z_1^3}$ is the closed subgroup

$\displaystyle \Sigma = \{ (x_1,x_2,x_3) \in Z_1^3: x_1-2x_2+x_3=0 \}$

and ${\mu_\Sigma}$ is the Haar probability measure on ${\Sigma}$; see this previous blog post. The equation ${x_1-2x_2+x_3=0}$ defining ${\Sigma}$ corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when ${f=g=h}$ is non-negative and not identically vanishing.

If one considers a quadruple average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x) k(T^{3n} x)\ d\mu \ \ \ \ \ (5)$

(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions ${f: X \rightarrow S^1}$, which obey an eigenfunction equation ${Tf = \lambda f}$ in which ${\lambda}$ is no longer constant, but is now a linear eigenfunction. For such functions, ${f(T^n x)}$ behaves quadratically in ${n}$, and one can compute the existence of a constraint

$\displaystyle f(x) \overline{f(T^n x)}^3 f(T^{2n} x)^3 \overline{f(T^{3n} x)} = 1 \ \ \ \ \ (6)$

between ${x}$, ${T^n x}$, ${T^{2n} x}$, and ${T^{3n} x}$ that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor ${(Z_2, {\mathcal Z}_2, \mu_{Z_2}, S)}$ which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.

The above discussion was concerned with ${{\bf Z}}$-systems, but one can adapt much of the theory to measure-preserving ${G}$-systems for other discrete countable abelian groups ${G}$, in which one now has a family ${(T_g)_{g \in G}}$ of shifts indexed by ${G}$ rather than a single shift, obeying the compatibility relation ${T_{g+h}=T_g T_h}$. The role of the intervals ${\{1,\ldots,N\}}$ in this more general setting is replaced by that of Folner sequences. For arbitrary countable abelian ${G}$, the theory for double averages (1) and triple limits (3) is essentially identical to the ${{\bf Z}}$-system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary ${G}$, still not fully understood). However one model case which is now well understood is the finite field case when ${G = {\bf F}_p^\omega = \bigcup_{n=1}^\infty {\bf F}_p^n}$ is an infinite-dimensional vector space over a finite field ${{\bf F}_p}$ (with the finite subspaces ${{\bf F}_p^n}$ then being a good choice for the Folner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if ${x}$ is drawn at random from a ${{\bf F}_p^\omega}$-system and ${n}$ drawn randomly from a large subspace of ${{\bf F}_p^\omega}$, then the only constraints between ${x, T^n x, \ldots, T^{(p-1)n} x}$ are those that arise from phase polynomials. The main theorem of this paper is to establish this limit formula (which, again, is a little complicated to state explicitly and will not be done here). In particular, we establish for the first time that the limit actually exists (a result which, for ${{\bf Z}}$-systems, was one of the main results of this paper of Host and Kra).

As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for ${{\bf Z}}$-systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set ${A}$ in an ergodic ${{\bf F}_p^\omega}$-system and any ${\epsilon>0}$, one has

$\displaystyle \mu( T_{c_1 n} A \cap \ldots \cap T_{c_k n} A ) > \mu(A)^k - \epsilon$

for a syndetic set of ${n}$, where ${c_1,\ldots,c_k \in {\bf F}_p}$ are distinct residue classes. It turns out that Khintchine-type theorems always hold for ${k=1,2,3}$ (and for ${k=1,2}$ ergodicity is not required), and for ${k=4}$ it holds whenever ${c_1,c_2,c_3,c_4}$ form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger ${k}$ we could show that the Khintchine property failed for generic choices of ${c_1,\ldots,c_k}$, though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.

Much as group theory is the study of groups, or graph theory is the study of graphs, model theory is the study of models (also known as structures) of some language ${{\mathcal L}}$ (which, in this post, will always be a single-sorted, first-order language). A structure is a set ${X}$, equipped with one or more operations, constants, and relations. This is of course an extremely general type of mathematical object, but (quite remarkably) one can still say a substantial number of interesting things about very broad classes of structures.

We will observe the common abuse of notation of using the set ${X}$ as a metonym for the entire structure, much as we usually refer to a group ${(G,1,\cdot,()^{-1})}$ simply as ${G}$, a vector space ${(V, 0, +, \cdot)}$ simply as ${V}$, and so forth. Following another common bending of the rules, we also allow some operations on structures (such as the multiplicative inverse operation on a group or field) to only be partially defined, and we allow use of the usual simplifying conventions for mathematical formulas (e.g. writing ${a+b+c}$ instead of ${(a+b)+c}$ or ${a+(b+c)}$, in cases where associativity is known). We will also deviate slightly from the usual practice in logic by emphasising individual structures, rather than the theory of general classes of structures; for instance, we will talk about the theory of a single field such as ${{\bf R}}$ or ${{\bf C}}$, rather than the theory of all fields of a certain type (e.g. real closed fields or algebraically closed fields).

Once one has a structure ${X}$, one can introduce the notion of a definable subset of ${X}$, or more generally of a Cartesian power ${X^n}$ of ${X}$, defined as a set ${E \subset X^n}$ of the form

$\displaystyle E = \{ (x_1,\ldots,x_n): P(x_1,\ldots,x_n) \hbox{ true} \} \ \ \ \ \ (1)$

for some formula ${P}$ in the language ${{\mathcal L}}$ with ${n}$ free variables and any number of constants from ${X}$ (that is, ${P(x_1,\ldots,x_n)}$ is a well-formed formula built up from a finite number of constants ${c_1,\ldots,c_m}$ in ${X}$, the relations and operations on ${X}$, logical connectives such as ${\neg}$, ${\wedge}$, ${\implies}$, and the quantifiers ${\forall, \exists}$). Thus, for instance, in the theory of the arithmetic of the natural numbers ${{\bf N} = ({\bf N}, 0, 1, +, \times)}$, the set of primes ${{\mathcal P}}$ is a definable set, since we have

$\displaystyle {\mathcal P} = \{ x \in {\bf N}: (\exists y: x=y+2) \wedge \neg (\exists z,w: x = (z+2)(w+2)) \}.$

In the theory of the field of reals ${{\bf R} = ({\bf R}, 0, 1, +, -, \times, ()^{-1})}$, the unit circle ${S^1}$ is an example of a definable set,

$\displaystyle S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2 = 1 \},$

but so is the the complement of the circle,

$\displaystyle {\bf R}^2 \backslash S^1 = \{ (x,y) \in {\bf R}^2: \neg(x^2+y^2 = 1) \}$

and the interval ${[-1,1]}$:

$\displaystyle [-1,1] = \{ x \in {\bf R}: \exists y: x^2+y^2 = 1\}.$

Due to the unlimited use of constants, any finite subset of a power ${X^n}$ of any structure ${X}$ is, by our conventions, definable in that structure. (One can of course also consider definability without parameters (also known as ${0}$-definability), in which arbitrary constants are not permitted, but we will not do so here.)

We can isolate some special subclasses of definable sets:

• An atomic definable set is a set of the form (1) in which ${P()}$ is an atomic formula (i.e. it does not contain any logical connectives or quantifiers).
• A quantifier-free definable set is a set of the form (1) in which ${P()}$ is quantifier-free (i.e. it can contain logical connectives, but does not contain the quantifiers ${\forall, \exists}$).

Example 1 In the theory of a field such as ${{\bf R}}$, an atomic definable set is the same thing as an affine algebraic set (also known as an affine algebraic variety, with the understanding that varieties are not necessarily assumed to be irreducible), and a quantifier-free definable set is known as a constructible set; thus we see that algebraic geometry can be viewed in some sense as a special case of model theory. (Conversely, it can in fact be quite profitable to think of model theory as an abstraction of algebraic geometry; for instance, the concepts of Morley rank and Morley degree in model theory (discussed in this previous blog post) directly generalises the concepts of dimension and degree in algebraic geometry.) Over ${{\bf R}}$, the interval ${[-1,1]}$ is a definable set, but not a quantifier-free definable set (and certainly not an atomic definable set); and similarly for the primes over ${{\bf N}}$.

A quantifier-free definable set in ${X^n}$ is nothing more than a finite boolean combination of atomic definable sets; in other words, the class of quantifier-free definable sets over ${X}$ is the smallest class that contains the atomic definable sets and is closed under boolean operations such as complementation and union (which generate all the other boolean operations). Similarly, the class of definable sets over ${X}$ is the smallest class that contains the quantifier-free definable sets, and is also closed under the operation of projection ${\pi_n: E \mapsto \pi_n(E)}$ from ${X^{n+1}}$ to ${X^n}$ for every natural number ${n}$, where ${\pi_n: X^{n+1} \rightarrow X^n}$ is the map ${\pi_n(x_1,\ldots,x_n,x_{n+1}) := (x_1,\ldots,x_n)}$.

Some structures have the property of enjoying quantifier elimination, which means that every definable set is in fact a quantifier-free definable set, or equivalently that the projection of a quantifier-free definable set is again quantifier-free. For instance, an algebraically closed field ${k}$ (with the field operations) has quantifier elimination (i.e. the projection of a constructible set is again constructible); this fact can be proven by the classical tool of resultants, and among other things can be used to give a proof of Hilbert’s nullstellensatz. (Note though that projection does not necessary preserve the property of being atomic; for instance, the projection of the atomic set ${\{ (x,y) \in k^2: xy=1 \}}$ is the non-atomic, but still quantifier-free definable, set ${\{ x \in k: \neg (k=0) \}}$.) In the converse direction, it is not difficult to use the nullstellensatz to deduce quantifier elimination. For theory of the real field ${{\bf R}}$, which is not algebraically closed, one does not have quantifier elimination, as one can see from the example of the unit circle (which is a quantifier-free definable set) projecting down to the interval ${[-1,1]}$ (which is definable, but not quantifer-free definable). However, if one adds the additional operation of order ${<}$ to the reals, giving it the language of an ordered field rather than just a field, then quantifier elimination is recovered (the class of quantifier-free definable sets now enlarges to match the class of definable sets, which in this case is also the class of semi-algebraic sets); this is the famous Tarski-Seidenberg theorem.

On the other hand, many important structures do not have quantifier elimination; typically, the projection of a quantifier-free definable set is not, in general, quantifier-free definable. This failure of the projection property also shows up in many contexts outside of model theory; for instance, Lebesgue famously made the error of thinking that the projection of a Borel measurable set remained Borel measurable (it is merely an analytic set instead). Turing’s halting theorem can be viewed as an assertion that the projection of a decidable set (also known as a computable or recursive set) is not necessarily decidable (it is merely semi-decidable (or recursively enumerable) instead). The notorious P=NP problem can also be essentially viewed in this spirit; roughly speaking (and glossing over the placement of some quantifiers), it asks whether the projection of a polynomial-time decidable set is again polynomial-time decidable. And so forth. (See this blog post of Dick Lipton for further discussion of the subtleties of projections.)

Now we consider the status of quantifier elimination for the theory of a finite field ${F}$. If interpreted naively, quantifier elimination is trivial for a finite field ${F}$, since every subset of ${F^n}$ is finite and thus quantifier-free definable. However, we can recover an interesting question in one of two (essentially equivalent) ways. One is to work in the asymptotic regime in which the field ${F}$ is large, but the length of the formulae used to construct one’s definable sets stays bounded uniformly in the size of ${F}$ (where we view any constant in ${F}$ as contributing a unit amount to the length of a formula, no matter how large ${F}$ is). A simple counting argument then shows that only a small number of subsets of ${F^n}$ become definable in the asymptotic limit ${|F| \rightarrow \infty}$, since the number of definable sets clearly grows at most polynomially in ${|F|}$ for any fixed bound on the formula length, while the number of all subsets of ${|F|^n}$ grows exponentially in ${n}$.

Another way to proceed is to work not with a single finite field ${F}$, or even with a sequence ${F_m}$ of finite fields, but with the ultraproduct ${F = \prod_{m \rightarrow p} F_m}$ of a sequence of finite fields, and to study the properties of definable sets over this ultraproduct. (We will be using the notation of ultraproducts and nonstandard analysis from this previous blog post.) This approach is equivalent to the more finitary approach mentioned in the previous paragraph, at least if one does not care to track of the exact bounds on the length of the formulae involved. Indeed, thanks to Los’s theorem, a definable subset ${E}$ of ${F^n}$ is nothing more than the ultraproduct ${E = \prod_{m \rightarrow p} E_m}$ of definable subsets ${E_m}$ of ${F_m^n}$ for all ${m}$ sufficiently close to ${p}$, with the length of the formulae used to define ${E_m}$ uniformly bounded in ${m}$. In the language of nonstandard analysis, one can view ${F}$ as a nonstandard finite field.

The ultraproduct ${F}$ of finite fields is an important example of a pseudo-finite field – a field that obeys all the sentences in the languages of fields that finite fields do, but is not necessarily itself a finite field. The model theory of pseudo-finite fields was first studied systematically by Ax (in the same paper where the Ax-Grothendieck theorem, discussed previously on this blog, was established), with important further contributions by Kiefe, by Fried-Sacerdote, by two papers of Chatzidakis-van den Dries-Macintyre, and many other authors.

As mentioned before, quantifier elimination trivially holds for finite fields. But for infinite pseudo-finite fields, such as the ultraproduct ${F = \prod_{m \rightarrow p} F_m}$ of finite fields with ${|F_m|}$ going to infinity, quantifier elimination fails. For instance, in a finite field ${F_m}$, the set ${E_m := \{ x \in F_m: \exists y \in F_m: x=y^2 \}}$ of quadratic residues is a definable set, with a bounded formula length, and so in the ultraproduct ${F =\prod_{m \rightarrow p} F_m}$, the set ${E := \prod_{m\rightarrow p} E_m}$ of nonstandard quadratic residues is also a definable set. However, in one dimension, we see from the factor theorem that the only atomic definable sets are either finite or the whole field ${F}$, and so the only constructible sets (i.e. the only quantifier-free definable sets) are either finite or cofinite in ${F}$. Since the quadratic residues have asymptotic density ${1/2}$ in a large finite field, they cannot form a quantifier-free definable set, despite being definable.

Nevertheless, there is a very nice almost quantifier elimination result for these fields, in characteristic zero at least, which we phrase here as follows:

Theorem 1 (Almost quantifier elimination) Let ${F}$ be a nonstandard finite field of characteristic zero, and let ${E \subset F^n}$ be a definable set over ${F}$. Then ${E}$ is the union of finitely many sets of the form

$\displaystyle E = \{ x \in V(F): \exists t \in F: P(x,t) = 0 \} \ \ \ \ \ (2)$

where ${V(F)}$ is an atomic definable subset of ${F^n}$ (i.e. the ${F}$-points of an algebraic variety ${V}$ defined over ${F}$ in ${F^n}$) and ${P: F^{n+1} \rightarrow F}$ is a polynomial.

Results of this type were first obtained essentially due to Catarina Kiefe, although the formulation here is closer to that of Chatzidakis-van den Dries-Macintyre.

Informally, this theorem says that while we cannot quite eliminate all quantifiers from a definable set over a nonstandard finite field, we can eliminate all but one existential quantifier. Note that negation has also been eliminated in this theorem; for instance, the definable set ${F \backslash \{0\} = \{ x \in F: \neg(x=0) \}}$ uses a negation, but can also be described using a single existential quantifier as ${\{ x \in F: \exists t: xt = 1 \}}$.) I believe that there are more complicated analogues of this result in positive characteristic, but I have not studied this case in detail (Kiefe’s result does not assume characteristic zero, but her conclusion is slightly different from the one given here). In the one-dimensional case ${n=1}$, the only varieties ${V}$ are the affine line and finite sets, and we can simplify the above statement, namely that any definable subset of ${F}$ takes the form ${\{ x\in F: \exists t \in F: P(x,t) = 0 \}}$ for some polynomial ${P}$ (i.e. definable sets in ${F}$ are nothing more than the projections of the ${F}$-points of a plane curve).

There is an equivalent formulation of this theorem for standard finite fields, namely that if ${F}$ is a finite field and ${E \subset F^n}$ is definable using a formula of length at most ${M}$, then ${E}$ can be expressed in the form (2) with the degree of ${P}$ bounded by some quantity ${C_{M,n}}$ depending on ${M}$ and ${n}$, assuming that the characteristic of ${F}$ is sufficiently large depending on ${M, n}$.

The theorem gives quite a satisfactory description of definable sets in either standard or nonstandard finite fields (at least if one does not care about effective bounds in some of the constants, and if one is willing to exclude the small characteristic case); for instance, in conjunction with the Lang-Weil bound discussed in this recent blog post, it shows that any non-empty definable subset of a nonstandard finite field has a nonstandard cardinality of ${(\alpha + O(|F|^{-1/2})) |F|^d}$ for some positive standard rational ${\alpha}$ and integer ${d}$. Equivalently, any non-empty definable subset of ${F^n}$ for some standard finite field ${F}$ using a formula of length at most ${M}$ has a standard cardinality of ${(\alpha + O_{M,n}(|F|^{-1/2})) |F|^d}$ for some positive rational of height ${O_{M,n}(1)}$ and some natural number ${d}$ between ${0}$ and ${n}$. (For instance, in the example of the quadratic residues given above, ${d}$ is equal to ${1}$ and ${\alpha}$ equal to ${1/2}$.) There is a more precise statement to this effect, namely that the Poincaré series of a definable set is rational; see Kiefe’s paper for details.

Below the fold I give a proof of Theorem 1, which relies primarily on the Lang-Weil bound mentioned above.

Let ${F}$ be a finite field, with algebraic closure ${\overline{F}}$, and let ${V}$ be an (affine) algebraic variety defined over ${\overline{F}}$, by which I mean a set of the form

$\displaystyle V = \{ x \in \overline{F}^d: P_1(x) = \ldots = P_m(x) = 0 \}$

for some ambient dimension ${d \geq 0}$, and some finite number of polynomials ${P_1,\ldots,P_m: \overline{F}^d \rightarrow \overline{F}}$. In order to reduce the number of subscripts later on, let us say that ${V}$ has complexity at most ${M}$ if ${d}$, ${m}$, and the degrees of the ${P_1,\ldots,P_m}$ are all less than or equal to ${M}$. Note that we do not require at this stage that ${V}$ be irreducible (i.e. not the union of two strictly smaller varieties), or defined over ${F}$, though we will often specialise to these cases later in this post. (Also, everything said here can also be applied with almost no changes to projective varieties, but we will stick with affine varieties for sake of concreteness.)

One can consider two crude measures of how “big” the variety ${V}$ is. The first measure, which is algebraic geometric in nature, is the dimension ${\hbox{dim}(V)}$ of the variety ${V}$, which is an integer between ${0}$ and ${d}$ (or, depending on convention, ${-\infty}$, ${-1}$, or undefined, if ${V}$ is empty) that can be defined in a large number of ways (e.g. it is the largest ${r}$ for which the generic linear projection from ${V}$ to ${\overline{F}^r}$ is dominant, or the smallest ${r}$ for which the intersection with a generic codimension ${r}$ subspace is non-empty). The second measure, which is number-theoretic in nature, is the number ${|V(F)| = |V \cap F^d|}$ of ${F}$-points of ${V}$, i.e. points ${x = (x_1,\ldots,x_d)}$ in ${V}$ all of whose coefficients lie in the finite field, or equivalently the number of solutions to the system of equations ${P_i(x_1,\ldots,x_d) = 0}$ for ${i=1,\ldots,m}$ with variables ${x_1,\ldots,x_d}$ in ${F}$.

These two measures are linked together in a number of ways. For instance, we have the basic Schwarz-Zippel type bound (which, in this qualitative form, goes back at least to Lemma 1 of the work of Lang and Weil in 1954).

Lemma 1 (Schwarz-Zippel type bound) Let ${V}$ be a variety of complexity at most ${M}$. Then we have ${|V(F)| \ll_M |F|^{\hbox{dim}(V)}}$.

Proof: (Sketch) For the purposes of exposition, we will not carefully track the dependencies of implied constants on the complexity ${M}$, instead simply assuming that all of these quantities remain controlled throughout the argument. (If one wished, one could obtain ineffective bounds on these quantities by an ultralimit argument, as discussed in this previous post, or equivalently by moving everything over to a nonstandard analysis framework; one could also obtain such uniformity using the machinery of schemes.)

We argue by induction on the ambient dimension ${d}$ of the variety ${V}$. The ${d=0}$ case is trivial, so suppose ${d \geq 1}$ and that the claim has already been proven for ${d-1}$. By breaking up ${V}$ into irreducible components we may assume that ${V}$ is irreducible (this requires some control on the number and complexity of these components, but this is available, as discussed in this previous post). For each ${x_1,\ldots,x_{d-1} \in \overline{F}}$, the fibre ${\{ x_d \in \overline{F}: (x_1,\ldots,x_{d-1},x_d) \in V \}}$ is either one-dimensional (and thus all of ${\overline{F}}$) or zero-dimensional. In the latter case, one has ${O_M(1)}$ points in the fibre from the fundamental theorem of algebra (indeed one has a bound of ${D}$ in this case), and ${(x_1,\ldots,x_{d-1})}$ lives in the projection of ${V}$ to ${\overline{F}^{d-1}}$, which is a variety of dimension at most ${\hbox{dim}(V)}$ and controlled complexity, so the contribution of this case is acceptable from the induction hypothesis. In the former case, the fibre contributes ${|F|}$ ${F}$-points, but ${(x_1,\ldots,x_{d-1})}$ lies in a variety in ${\overline{F}^{d-1}}$ of dimension at most ${\hbox{dim}(V)-1}$ (since otherwise ${V}$ would contain a subvariety of dimension at least ${\hbox{dim}(V)+1}$, which is absurd) and controlled complexity, and so the contribution of this case is also acceptable from the induction hypothesis. $\Box$

One can improve the bound on the implied constant to be linear in the degree of ${V}$ (see e.g. Claim 7.2 of this paper of Dvir, Kollar, and Lovett, or Lemma A.3 of this paper of Ellenberg, Oberlin, and myself), but we will not be concerned with these improvements here.

Without further hypotheses on ${V}$, the above upper bound is sharp (except for improvements in the implied constants). For instance, the variety

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: \prod_{j=1}^D (x_d - a_j) = 0\},$

where ${a_1,\ldots,a_D \in F}$ are distict, is the union of ${D}$ distinct hyperplanes of dimension ${d-1}$, with ${|V(F)| = D |F|^{d-1}}$ and complexity ${\max(D,d)}$; similar examples can easily be concocted for other choices of ${\hbox{dim}(V)}$. In the other direction, there is also no non-trivial lower bound for ${|V(F)|}$ without further hypotheses on ${V}$. For a trivial example, if ${a}$ is an element of ${\overline{F}}$ that does not lie in ${F}$, then the hyperplane

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d - a = 0 \}$

clearly has no ${F}$-points whatsoever, despite being a ${d-1}$-dimensional variety in ${\overline{F}^d}$ of complexity ${d}$. For a slightly less non-trivial example, if ${a}$ is an element of ${F}$ that is not a quadratic residue, then the variety

$\displaystyle V := \{ (x_1,\ldots,x_d) \in \overline{F}^d: x_d^2 - a = 0 \},$

which is the union of two hyperplanes, still has no ${F}$-points, even though this time the variety is defined over ${F}$ instead of ${\overline{F}}$ (by which we mean that the defining polynomial(s) have all of their coefficients in ${F}$). There is however the important Lang-Weil bound that allows for a much better estimate as long as ${V}$ is both defined over ${F}$ and irreducible:

Theorem 2 (Lang-Weil bound) Let ${V}$ be a variety of complexity at most ${M}$. Assume that ${V}$ is defined over ${F}$, and that ${V}$ is irreducible as a variety over ${\overline{F}}$ (i.e. ${V}$ is geometrically irreducible or absolutely irreducible). Then

$\displaystyle |V(F)| = (1 + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}.$

Again, more explicit bounds on the implied constant here are known, but will not be the focus of this post. As the previous examples show, the hypotheses of definability over ${F}$ and geometric irreducibility are both necessary.

The Lang-Weil bound is already non-trivial in the model case ${d=2, \hbox{dim}(V)=1}$ of plane curves:

Theorem 3 (Hasse-Weil bound) Let ${P: \overline{F}^2 \rightarrow \overline{F}}$ be an irreducible polynomial of degree ${D}$ with coefficients in ${F}$. Then

$\displaystyle |\{ (x,y) \in F^2: P(x,y) = 0 \}| = |F| + O_D( |F|^{1/2} ).$

Thus, for instance, if ${a,b \in F}$, then the elliptic curve ${\{ (x,y) \in F^2: y^2 = x^3 + ax + b \}}$ has ${|F| + O(|F|^{1/2})}$ ${F}$-points, a result first established by Hasse. The Hasse-Weil bound is already quite non-trivial, being the analogue of the Riemann hypothesis for plane curves. For hyper-elliptic curves, an elementary proof (due to Stepanov) is discussed in this previous post. For general plane curves, the first proof was by Weil (leading to his famous Weil conjectures); there is also a nice version of Stepanov’s argument due to Bombieri covering this case which is a little less elementary (relying crucially on the Riemann-Roch theorem for the upper bound, and a lifting trick to then get the lower bound), which I briefly summarise later in this post. The full Lang-Weil bound is deduced from the Hasse-Weil bound by an induction argument using generic hyperplane slicing, as I will also summarise later in this post.

The hypotheses of definability over ${F}$ and geometric irreducibility in the Lang-Weil can be removed after inserting a geometric factor:

Corollary 4 (Lang-Weil bound, alternate form) Let ${V}$ be a variety of complexity at most ${M}$. Then one has

$\displaystyle |V(F)| = (c(V) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(V)}$

where ${c(V)}$ is the number of top-dimensional components of ${V}$ (i.e. geometrically irreducible components of ${V}$ of dimension ${\hbox{dim}(V)}$) that are definable over ${F}$, or equivalently are invariant with respect to the Frobenius endomorphism ${x \mapsto x^{|F|}}$ that defines ${F}$.

Proof: By breaking up a general variety ${V}$ into components (and using Lemma 1 to dispose of any lower-dimensional components), it suffices to establish this claim when ${V}$ is itself geometrically irreducible. If ${V}$ is definable over ${F}$, the claim follows from Theorem 2. If ${V}$ is not definable over ${F}$, then it is not fixed by the Frobenius endomorphism ${Frob}$ (since otherwise one could produce a set of defining polynomials that were fixed by Frobenius and thus defined over ${F}$ by using some canonical basis (such as a reduced Grobner basis) for the associated ideal), and so ${V \cap Frob(V)}$ has strictly smaller dimension than ${V}$. But ${V \cap Frob(V)}$ captures all the ${F}$-points of ${V}$, so in this case the claim follows from Lemma 1. $\Box$

Note that if ${V}$ is reducible but is itself defined over ${F}$, then the Frobenius endomorphism preserves ${V}$ itself, but may permute the components of ${V}$ around. In this case, ${c(V)}$ is the number of fixed points of this permutation action of Frobenius on the components. In particular, ${c(V)}$ is always a natural number between ${0}$ and ${O_M(1)}$; thus we see that regardless of the geometry of ${V}$, the normalised count ${|V(F)|/|F|^{\hbox{dim}(V)}}$ is asymptotically restricted to a bounded range of natural numbers (in the regime where the complexity stays bounded and ${|F|}$ goes to infinity).

Example 1 Consider the variety

$\displaystyle V := \{ (x,y) \in \overline{F}^2: x^2 - ay^2 = 0 \}$

for some non-zero parameter ${a \in F}$. Geometrically (by which we basically mean “when viewed over the algebraically closed field ${\overline{F}}$“), this is the union of two lines, with slopes corresponding to the two square roots of ${a}$. If ${a}$ is a quadratic residue, then both of these lines are defined over ${F}$, and are fixed by Frobenius, and ${c(V) = 2}$ in this case. If ${a}$ is not a quadratic residue, then the lines are not defined over ${F}$, and the Frobenius automorphism permutes the two lines while preserving ${V}$ as a whole, giving ${c(V)=0}$ in this case.

Corollary 4 effectively computes (at least to leading order) the number-theoretic size ${|V(F)|}$ of a variety in terms of geometric information about ${V}$, namely its dimension ${\hbox{dim}(V)}$ and the number ${c(V)}$ of top-dimensional components fixed by Frobenius. It turns out that with a little bit more effort, one can extend this connection to cover not just a single variety ${V}$, but a family of varieties indexed by points in some base space ${W}$. More precisely, suppose we now have two affine varieties ${V,W}$ of bounded complexity, together with a regular map ${\phi: V \rightarrow W}$ of bounded complexity (the definition of complexity of a regular map is a bit technical, see e.g. this paper, but one can think for instance of a polynomial or rational map of bounded degree as a good example). It will be convenient to assume that the base space ${W}$ is irreducible. If the map ${\phi}$ is a dominant map (i.e. the image ${\phi(V)}$ is Zariski dense in ${W}$), then standard algebraic geometry results tell us that the fibres ${\phi^{-1}(\{w\})}$ are an unramified family of ${\hbox{dim}(V)-\hbox{dim}(W)}$-dimensional varieties outside of an exceptional subset ${W'}$ of ${W}$ of dimension strictly smaller than ${\hbox{dim}(W)}$ (and with ${\phi^{-1}(W')}$ having dimension strictly smaller than ${\hbox{dim}(V)}$); see e.g. Section I.6.3 of Shafarevich.

Now suppose that ${V}$, ${W}$, and ${\phi}$ are defined over ${F}$. Then, by Lang-Weil, ${W(F)}$ has ${(1 + O(|F|^{-1/2})) |F|^{\hbox{dim}(W)}}$ ${F}$-points, and by Schwarz-Zippel, for all but ${O( |F|^{\hbox{dim}(W)-1})}$ of these ${F}$-points ${w}$ (the ones that lie in the subvariety ${W'}$), the fibre ${\phi^{-1}(\{w\})}$ is an algebraic variety defined over ${F}$ of dimension ${\hbox{dim}(V)-\hbox{dim}(W)}$. By using ultraproduct arguments (see e.g. Lemma 3.7 of this paper of mine with Emmanuel Breuillard and Ben Green), this variety can be shown to have bounded complexity, and thus by Corollary 4, has ${(c(\phi^{-1}(\{w\})) + O(|F|^{-1/2}) |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}$ ${F}$-points. One can then ask how the quantity ${c(\phi^{-1}(\{w\})}$ is distributed. A simple but illustrative example occurs when ${V=W=F}$ and ${\phi: F \rightarrow F}$ is the polynomial ${\phi(x) := x^2}$. Then ${c(\phi^{-1}(\{w\})}$ equals ${2}$ when ${w}$ is a non-zero quadratic residue and ${0}$ when ${w}$ is a non-zero quadratic non-residue (and ${1}$ when ${w}$ is zero, but this is a negligible fraction of all ${w}$). In particular, in the asymptotic limit ${|F| \rightarrow \infty}$, ${c(\phi^{-1}(\{w\})}$ is equal to ${2}$ half of the time and ${0}$ half of the time.

Now we describe the asymptotic distribution of the ${c(\phi^{-1}(\{w\}))}$. We need some additional notation. Let ${w_0}$ be an ${F}$-point in ${W \backslash W'}$, and let ${\pi_0( \phi^{-1}(\{w_0\}) )}$ be the connected components of the fibre ${\phi^{-1}(\{w_0\})}$. As ${\phi^{-1}(\{w_0\})}$ is defined over ${F}$, this set of components is permuted by the Frobenius endomorphism ${Frob}$. But there is also an action by monodromy of the fundamental group ${\pi_1(W \backslash W')}$ (this requires a certain amount of étale machinery to properly set up, as we are working over a positive characteristic field rather than over the complex numbers, but I am going to ignore this rather important detail here, as I still don’t fully understand it). This fundamental group may be infinite, but (by the étale construction) is always profinite, and in particular has a Haar probability measure, in which every finite index subgroup (and their cosets) are measurable. Thus we may meaningfully talk about elements drawn uniformly at random from this group, so long as we work only with the profinite ${\sigma}$-algebra on ${\pi_1(W \backslash W')}$ that is generated by the cosets of the finite index subgroups of this group (which will be the only relevant sets we need to measure when considering the action of this group on finite sets, such as the components of a generic fibre).

Theorem 5 (Lang-Weil with parameters) Let ${V, W}$ be varieties of complexity at most ${M}$ with ${W}$ irreducible, and let ${\phi: V \rightarrow W}$ be a dominant map of complexity at most ${M}$. Let ${w_0}$ be an ${F}$-point of ${W \backslash W'}$. Then, for any natural number ${a}$, one has ${c(\phi^{-1}(\{w\})) = a}$ for ${(\mathop{\bf P}( X = a ) + O_M(|F|^{-1/2})) |F|^{\hbox{dim}(W)}}$ values of ${w \in W(F)}$, where ${X}$ is the random variable that counts the number of components of a generic fibre ${\phi^{-1}(w_0)}$ that are invariant under ${g \circ Frob}$, where ${g}$ is an element chosen uniformly at random from the étale fundamental group ${\pi_1(W \backslash W')}$. In particular, in the asymptotic limit ${|F| \rightarrow \infty}$, and with ${w}$ chosen uniformly at random from ${W(F)}$, ${c(\phi^{-1}(\{w\}))}$ (or, equivalently, ${|\phi^{-1}(\{w\})(F)| / |F|^{\hbox{dim}(V)-\hbox{dim}(W)}}$) and ${X}$ have the same asymptotic distribution.

This theorem generalises Corollary 4 (which is the case when ${W}$ is just a point, so that ${\phi^{-1}(\{w\})}$ is just ${V}$ and ${g}$ is trivial). Informally, the effect of a non-trivial parameter space ${W}$ on the Lang-Weil bound is to push around the Frobenius map by monodromy for the purposes of counting invariant components, and a randomly chosen set of parameters corresponds to a randomly chosen loop on which to perform monodromy.

Example 2 Let ${V=W=F}$ and ${\phi(x) = x^m}$ for some fixed ${m \geq 1}$; to avoid some technical issues let us suppose that ${m}$ is coprime to ${|F|}$. Then ${W'}$ can be taken to be ${\{0\}}$, and for a base point ${w_0 \in W \backslash W'}$ we can take ${w_0=1}$. The fibre ${\phi^{-1}(\{1\})}$ – the ${m^{th}}$ roots of unity – can be identified with the cyclic group ${{\bf Z}/m{\bf Z}}$ by using a primitive root of unity. The étale fundamental group ${\pi(W \backslash W') = \pi(\overline{F} \backslash 0)}$ is (I think) isomorphic to the profinite closure ${\hat {\bf Z}}$ of the integers ${{\bf Z}}$ (excluding the part of that closure coming from the characteristic of ${F}$). Not coincidentally, the integers ${{\bf Z}}$ are the fundamental group of the complex analogue ${{\bf C} \backslash \{0\}}$ of ${W \backslash W'}$. (Brian Conrad points out to me though that for more complicated varieties, such as covers of ${\overline{F} \backslash \{0\}}$ by a power of the characteristic, the etale fundamental group is more complicated than just a profinite closure of the ordinary fundamental group, due to the presence of Artin-Schreier covers that are only ramified at infinity.) The action of this fundamental group on the fibres ${{\bf Z}/m{\bf Z}}$ can given by translation. Meanwhile, the Frobenius map ${Frob}$ on ${{\bf Z}/m{\bf Z}}$ is given by multiplication by ${|F|}$. A random element ${g \circ Frob}$ then becomes a random affine map ${x \mapsto |F|x+b}$ on ${{\bf Z}/m{\bf Z}}$, where ${b}$ chosen uniformly at random from ${{\bf Z}/m{\bf Z}}$. The number of fixed points of this map is equal to the greatest common divisor ${(|F|-1,m)}$ of ${|F|-1}$ and ${m}$ when ${b}$ is divisible by ${(|F|-1,m)}$, and equal to ${0}$ otherwise. This matches up with the elementary number fact that a randomly chosen non-zero element of ${F}$ will be an ${m^{th}}$ power with probability ${1/(|F|-1,m)}$, and when this occurs, the number of ${m^{th}}$ roots in ${F}$ will be ${(|F|-1,m)}$.

Example 3 (Thanks to Jordan Ellenberg for this example.) Consider a random elliptic curve ${E = \{ y^2 = x^3 + ax + b \}}$, where ${a,b}$ are chosen uniformly at random, and let ${m \geq 1}$. Let ${E[m]}$ be the ${m}$-torsion points of ${E}$ (i.e. those elements ${g \in E}$ with ${mg = 0}$ using the elliptic curve addition law); as a group, this is isomorphic to ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$ (assuming that ${F}$ has sufficiently large characteristic, for simplicity), and consider the number of ${F}$ points of ${E[m]}$, which is a random variable taking values in the natural numbers between ${0}$ and ${m^2}$. In this case, the base variety ${W}$ is the modular curve ${X(1)}$, and the covering variety ${V}$ is the modular curve ${X_1(m)}$. The generic fibre here can be identified with ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$, the monodromy action projects down to the action of ${SL_2({\bf Z}/m{\bf Z})}$, and the action of Frobenius on this fibre can be shown to be given by a ${2 \times 2}$ matrix with determinant ${|F|}$ (with the exact choice of matrix depending on the choice of fibre and of the identification), so the distribution of the number of ${F}$-points of ${E[m]}$ is asymptotic to the distribution of the number of fixed points ${X}$ of a random linear map of determinant ${|F|}$ on ${{\bf Z}/m{\bf Z} \times {\bf Z}/m{\bf Z}}$.

Theorem 5 seems to be well known “folklore” among arithmetic geometers, though I do not know of an explicit reference for it. I enjoyed deriving it for myself (though my derivation is somewhat incomplete due to my lack of understanding of étale cohomology) from the ordinary Lang-Weil theorem and the moment method. I’m recording this derivation later in this post, mostly for my own benefit (as I am still in the process of learning this material), though perhaps some other readers may also be interested in it.

Caveat: not all details are fully fleshed out in this writeup, particularly those involving the finer points of algebraic geometry and étale cohomology, as my understanding of these topics is not as complete as I would like it to be.

Many thanks to Brian Conrad and Jordan Ellenberg for helpful discussions on these topics.

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity ${r_4(F^n)}$ for a vector space ${F^n}$ over a finite field ${F}$ of characteristic greater than ${4}$, where ${r_4(F^n)}$ is defined as the cardinality of the largest subset of ${F^n}$ that does not contain an arithmetic progression of length ${4}$. In our earlier paper, we gave two arguments that bounded ${r_4(F^n)}$ in the regime when the field ${F}$ was fixed and ${n}$ was large. The first “cheap” argument gave the bound

$\displaystyle r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )$

and the more complicated “expensive” argument gave the improvement

$\displaystyle r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)$

for some constant ${c>0}$ depending only on ${F}$.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset ${A}$ of ${F^n}$ that has no arithmetic progressions of length ${4}$, and seeks to locate a subspace on which ${A}$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse ${U^3}$ theorem of Ben and myself (and also independently by Samorodnitsky), one approximates ${A}$ by a “quadratically structured” function ${f}$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because ${A}$ has no progressions of length ${4}$, the count of progressions of length ${4}$ weighted by ${f}$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which ${f}$ has increased density.

The error in the paper was to conclude from this that the original function ${1_A}$ also had increased density on the same subspace; it turns out that the manner in which ${f}$ approximates ${1_A}$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on ${r_4(F^n)}$ one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of ${c = 2^{-22}}$ for the exponent ${c}$ in (1). In fact, it gives the following stronger result:

Theorem 1 Let ${A}$ be a subset of ${F^n}$ of density at least ${\alpha}$, and let ${\epsilon>0}$. Then there is a subspace ${W}$ of ${F^n}$ of codimension ${O( \epsilon^{-2^{20}})}$ such that the number of (possibly degenerate) progressions ${a, a+r, a+2r, a+3r}$ in ${A \cap W}$ is at least ${(\alpha^4-\epsilon)|W|^2}$.

The bound (1) is an easy consequence of this theorem after choosing ${\epsilon := \alpha^4/2}$ and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to ${1_A}$ with a significantly stronger local approximation to ${1_A}$ on a subspace ${W}$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse ${U^3}$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace ${W}$ on which ${A}$ is quite dense (of density ${\alpha-O(\epsilon)}$) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

Tamar Ziegler and I have just uploaded to the arXiv our paper “The inverse conjecture for the Gowers norm over finite fields in low characteristic“, submitted to Annals of Combinatorics. This paper completes another case of the inverse conjecture for the Gowers norm, this time for vector spaces ${{\bf F}^n}$ over a fixed finite field ${{\bf F} = {\bf F}_p}$ of prime order; with Vitaly Bergelson, we had previously established this claim when the characteristic of the field was large, so the main new result here is the extension to the low characteristic case. (The case of a cyclic group ${{\bf Z}/N{\bf Z}}$ or interval ${[N]}$ was established by Ben Green and ourselves in another recent paper. For an arbitrary abelian (or nilpotent) group, a general but less explicit description of the obstructions to Gowers uniformity was recently obtained by Szegedy; the latter result recovers the high-characteristic case of our result (as was done in a subsequent paper of Szegedy), as well as our results with Green, but it is not immediately evident whether Szegedy’s description of the obstructions matches up with the one predicted by the inverse conjecture in low characteristic.)

The statement of the main theorem is as follows. Given a finite-dimensional vector space ${V = {\bf F}^n}$ and a function ${f: V \rightarrow {\bf C}}$, and an integer ${s \geq 0}$, one can define the Gowers uniformity norm ${\|f\|_{U^{s+1}(V)}}$ by the formula

$\displaystyle \|f\|_{U^{s+1}(V)} := \left( \mathop{\bf E}_{x,h_1,\ldots,h_{s+1} \in V} \Delta_{h_1} \ldots \Delta_{h_{s+1}} f(x) \right)^{1/2^{s+1}}$

where ${\Delta_h f(x) := f(x+h) \overline{f(x)}}$. If ${f}$ is bounded in magnitude by ${1}$, it is easy to see that ${\|f\|_{U^{s+1}(V)}}$ is bounded by ${1}$ also, with equality if and only if ${f(x) = e(P)}$ for some non-classical polynomial ${P: V \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${s}$, where ${e(x) := e^{2\pi ix}}$, and a non-classical polynomial of degree at most ${s}$ is a function whose ${s+1^{th}}$ “derivatives” vanish in the sense that

$\displaystyle \partial_{h_1} \ldots \partial_{h_{s+1}} P(x) = 0$

for all ${x,h_1,\ldots,h_{s+1} \in V}$, where ${\partial_h P(x) := P(x+h) - P(x)}$. Our result generalises this to the case when the uniformity norm is not equal to ${1}$, but is still bounded away from zero:

Theorem 1 (Inverse conjecture) Let ${f: V \rightarrow {\bf C}}$ be bounded by ${1}$ with ${\|f\|_{U^{s+1}(V)} \geq \delta > 0}$ for some ${s \geq 0}$. Then there exists a non-classical polynomial ${P: V \rightarrow {\bf R}/{\bf Z}}$ of degree at most ${s}$ such that ${|\langle f, e(P) \rangle_{L^2(V)}| := |{\bf E}_{x \in V} f(x) e(-P(x))| \geq c(s,p, \delta) > 0}$, where ${c(s,p, \delta)}$ is a positive quantity depending only on the indicated parameters.

This theorem is trivial for ${s=0}$, and follows easily from Fourier analysis for ${s=1}$. The case ${s=2}$ was done in odd characteristic by Ben Green and myself, and in even characteristic by Samorodnitsky. In two papers, one with Vitaly Bergelson, we established this theorem in the “high characteristic” case when the characteristic ${p}$ of ${{\bf F}}$ was greater than ${s}$ (in which case there is essentially no distinction between non-classical polynomials and their classical counterparts, as discussed previously on this blog). The need to deal with genuinely non-classical polynomials is the main new difficulty in this paper that was not dealt with in previous literature.

In our previous paper with Bergelson, a “weak” version of the above theorem was proven, in which the polynomial ${P}$ in the conclusion had bounded degree ${O_{s,p}(1)}$, rather than being of degree at most ${s}$. In the current paper, we use this weak inverse theorem to reduce the inverse conjecture to a statement purely about polynomials:

Theorem 2 (Inverse conjecture for polynomials) Let ${s \geq 0}$, and let ${P: V \rightarrow {\bf C}}$ be a non-classical polynomial of degree at most ${s+1}$ such that ${\|e(P)\|_{U^{s+1}(V)} \geq \delta > 0}$. Then ${P}$ has bounded rank in the sense that ${P}$ is a function of ${O_{s,p,\delta}(1)}$ polynomials of degree at most ${s}$.

This type of inverse theorem was first introduced by Bogdanov and Viola. The deduction of Theorem 1 from Theorem 2 and the weak inverse Gowers conjecture is fairly standard, so the main difficulty is to show Theorem 2.

The quantity ${-\log_{|{\bf F}|} \|e(P)\|_{U^{s+1}(V)}^{1/2^{s+1}}}$ of a polynomial ${P}$ of degree at most ${s+1}$ was denoted the analytic rank of ${P}$ by Gowers and Wolf. They observed that the analytic rank of ${P}$ was closely related to the rank of ${P}$, defined as the least number of degree ${s}$ polynomials needed to express ${P}$. For instance, in the quadratic case ${s=1}$ the two ranks are identical (in odd characteristic, at least). For general ${s}$, it was easy to see that bounded rank implied bounded analytic rank; Theorem 2 is the converse statement.

We tried a number of ways to show that bounded analytic rank implied bounded rank, in particular spending a lot of time on ergodic-theoretic approaches, but eventually we settled on a “brute force” approach that relies on classifying those polynomials of bounded analytic rank as precisely as possible. The argument splits up into establishing three separate facts:

1. (Classical case) If a classical polynomial has bounded analytic rank, then it has bounded rank.
2. (Multiplication by ${p}$) If a non-classical polynomial ${P}$ (of degree at most ${s+1}$) has bounded analytic rank, then ${pP}$ (which can be shown to have degree at most ${\max(s-p,0)}$) also has bounded analytic rank.
3. (Division by ${p}$) If ${Q}$ is a non-clsasical polynomial of degree ${\max(s-p,0)}$ of bounded rank, then there is a non-classical polynomial ${P}$ of degree at most ${s+1}$ of bounded rank such that ${pQ=P}$.

The multiplication by ${p}$ and division by ${p}$ facts allow one to easily extend the classical case of the theorem to the non-classical case of the theorem, basically because classical polynomials are the kernel of the multiplication-by-${p}$ homomorphism. Indeed, if ${P}$ is a non-classical polynomial of bounded analytic rank of the right degree, then the multiplication by ${p}$ claim tells us that ${pP}$ also has bounded analytic rank, which by an induction hypothesis implies that ${pP}$ has bounded rank. Applying the division by ${p}$ claim, we find a bounded rank polynomial ${P'}$ such that ${pP = pP'}$, thus ${P}$ differs from ${P'}$ by a classical polynomial, which necessarily has bounded analytic rank, hence bounded rank by the classical claim, and the claim follows.

Of the three claims, the multiplication-by-${p}$ claim is the easiest to prove using known results; after a bit of Fourier analysis, it turns out to follow more or less immediately from the multidimensional Szemerédi theorem over finite fields of Bergelson, Leibman, and McCutcheon (one can also use the density Hales-Jewett theorem here if one desires).

The next easiest claim is the classical case. Here, the idea is to analyse a degree ${s+1}$ classical polynomial ${P: V \rightarrow {\bf F}}$ via its derivative ${d^{s+1} P: V^{s+1} \rightarrow {\bf F}}$, defined by the formula

$\displaystyle d^{s+1} P( h_1,\ldots,h_{s+1}) := \partial_{h_1} \ldots \partial_{h_{s+1}} P(x)$

for any ${x,h_1,\ldots,h_{s+1} \in V}$ (the RHS is independent of ${x}$ as ${P}$ has degree ${s+1}$). This is a multilinear form, and if ${P}$ has bounded analytic rank, this form is biased (in the sense that the mean of ${e(d^{s+1} P)}$ is large). Applying a general equidistribution theorem of Kaufman and Lovett (based on this earlier paper of Green and myself) this implies that ${d^{s+1} P}$ is a function of a bounded number of multilinear forms of lower degree. Using some “regularity lemma” theory to clean up these forms so that they have good equidistribution properties, it is possible to understand exactly how the original multilinear form ${d^{s+1} P}$ depends on these lower degree forms; indeed, the description one eventually obtains is so explicit that one can write down by inspection another bounded rank polynomial ${Q}$ such that ${d^{s+1} P}$ is equal to ${d^{s+1} Q}$. Thus ${P}$ differs from the bounded rank polynomial ${Q}$ by a lower degree error, which is automatically of bounded rank also, and the claim follows.

The trickiest thing to establish is the division by ${p}$ claim. The polynomial ${Q}$ is some function ${F(R_1,\ldots,R_m)}$ of lower degree polynomials ${R_1,\ldots,R_m}$. Ideally, one would like to find a function ${F'(R_1,\ldots,R_m)}$ of the same polynomials with ${pF' = F}$, such that ${F'(R_1,\ldots,R_m)}$ has the correct degree; however, we have counterexamples that show that this is not always possible. (These counterexamples are the main obstruction to making the ergodic theory approach work: in ergodic theory, one is only allowed to work with “measurable” functions, which are roughly analogous in this context to functions of the indicated polynomials ${Q, R_1,\ldots,R_m}$ and their shifts.) To get around this we have to first apply a regularity lemma to place ${R_1,\ldots,R_m}$ in a suitably equidistributed form (although the fact that ${R_1,\ldots,R_m}$ may be non-classical leads to a rather messy and technical description of this equidistribution), and then we have to extend each ${R_j}$ to a higher degree polynomial ${R'_j}$ with ${pR'_j = R_j}$. There is a crucial “exact roots” property of polynomials that allows one to do this, with ${R'_j}$ having degree exactly ${p-1}$ higher than ${R_j}$. It turns out that it is possible to find a function ${P = F'(R'_1,\ldots,R'_m)}$ of these extended polynomials that have the right degree and which solves the required equation ${pP=Q}$; this is established by classifying completely all functions of the equidistributed polynomials ${R_1,\ldots,R_m}$ or ${R'_1,\ldots,R'_m}$ that are of a given degree.

In the previous lectures, we have focused mostly on the equidistribution or linear patterns on a subset of the integers ${{\bf Z}}$, and in particular on intervals ${[N]}$. The integers are of course a very important domain to study in additive combinatorics; but there are also other fundamental model examples of domains to study. One of these is that of a vector space ${V}$ over a finite field ${{\bf F} = {\bf F}_p}$ of prime order. Such domains are of interest in computer science (particularly when ${p=2}$) and also in number theory; but they also serve as an important simplified “dyadic model” for the integers. See this survey article of Green for further discussion of this point.

The additive combinatorics of the integers ${{\bf Z}}$, and of vector spaces ${V}$ over finite fields, are analogous, but not quite identical. For instance, the analogue of an arithmetic progression in ${{\bf Z}}$ is a subspace of ${V}$. In many cases, the finite field theory is a little bit simpler than the integer theory; for instance, subspaces are closed under addition, whereas arithmetic progressions are only “almost” closed under addition in various senses. (For instance, ${[N]}$ is closed under addition approximately half of the time.) However, there are some ways in which the integers are better behaved. For instance, because the integers can be generated by a single generator, a homomorphism from ${{\bf Z}}$ to some other group ${G}$ can be described by a single group element ${g}$: ${n \mapsto g^n}$. However, to specify a homomorphism from a vector space ${V}$ to ${G}$ one would need to specify one group element for each dimension of ${V}$. Thus we see that there is a tradeoff when passing from ${{\bf Z}}$ (or ${[N]}$) to a vector space model; one gains a bounded torsion property, at the expense of conceding the bounded generation property. (Of course, if one wants to deal with arbitrarily large domains, one has to concede one or the other; the only additive groups that have both bounded torsion and boundedly many generators, are bounded.)

The starting point for this course (Notes 1) was the study of equidistribution of polynomials ${P: {\bf Z} \rightarrow {\bf R}/{\bf Z}}$ from the integers to the unit circle. We now turn to the parallel theory of equidistribution of polynomials ${P: V \rightarrow {\bf R}/{\bf Z}}$ from vector spaces over finite fields to the unit circle. Actually, for simplicity we will mostly focus on the classical case, when the polynomials in fact take values in the ${p^{th}}$ roots of unity (where ${p}$ is the characteristic of the field ${{\bf F} = {\bf F}_p}$). As it turns out, the non-classical case is also of importance (particularly in low characteristic), but the theory is more difficult; see these notes for some further discussion.

Jean-Pierre Serre (whose papers are, of course, always worth reading) recently posted a lovely lecture on the arXiv entitled “How to use finite fields for problems concerning infinite fields”. In it, he describes several ways in which algebraic statements over fields of zero characteristic, such as ${{\mathbb C}}$, can be deduced from their positive characteristic counterparts such as ${F_{p^m}}$, despite the fact that there is no non-trivial field homomorphism between the two types of fields. In particular finitary tools, including such basic concepts as cardinality, can now be deployed to establish infinitary results. This leads to some simple and elegant proofs of non-trivial algebraic results which are not easy to establish by other means.

One deduction of this type is based on the idea that positive characteristic fields can partially model zero characteristic fields, and proceeds like this: if a certain algebraic statement failed over (say) ${{\mathbb C}}$, then there should be a “finitary algebraic” obstruction that “witnesses” this failure over ${{\mathbb C}}$. Because this obstruction is both finitary and algebraic, it must also be definable in some (large) finite characteristic, thus leading to a comparable failure over a finite characteristic field. Taking contrapositives, one obtains the claim.

Algebra is definitely not my own field of expertise, but it is interesting to note that similar themes have also come up in my own area of additive combinatorics (and more generally arithmetic combinatorics), because the combinatorics of addition and multiplication on finite sets is definitely of a “finitary algebraic” nature. For instance, a recent paper of Vu, Wood, and Wood establishes a finitary “Freiman-type” homomorphism from (finite subsets of) the complex numbers to large finite fields that allows them to pull back many results in arithmetic combinatorics in finite fields (e.g. the sum-product theorem) to the complex plane. (Van Vu and I also used a similar trick to control the singularity property of random sign matrices by first mapping them into finite fields in which cardinality arguments became available.) And I have a particular fondness for correspondences between finitary and infinitary mathematics; the correspondence Serre discusses is slightly different from the one I discuss for instance in here or here, although there seems to be a common theme of “compactness” (or of model theory) tying these correspondences together.

As one of his examples, Serre cites one of my own favourite results in algebra, discovered independently by Ax and by Grothendieck (and then rediscovered many times since). Here is a special case of that theorem:

Theorem 1 (Ax-Grothendieck theorem, special case) Let ${P: {\mathbb C}^n \rightarrow {\mathbb C}^n}$ be a polynomial map from a complex vector space to itself. If ${P}$ is injective, then ${P}$ is bijective.

The full version of the theorem allows one to replace ${{\mathbb C}^n}$ by an algebraic variety ${X}$ over any algebraically closed field, and for ${P}$ to be an morphism from the algebraic variety ${X}$ to itself, but for simplicity I will just discuss the above special case. This theorem is not at all obvious; it is not too difficult (see Lemma 4 below) to show that the Jacobian of ${P}$ is non-degenerate, but this does not come close to solving the problem since one would then be faced with the notorious Jacobian conjecture. Also, the claim fails if “polynomial” is replaced by “holomorphic”, due to the existence of Fatou-Bieberbach domains.

In this post I would like to give the proof of Theorem 1 based on finite fields as mentioned by Serre, as well as another elegant proof of Rudin that combines algebra with some elementary complex variable methods. (There are several other proofs of this theorem and its generalisations, for instance a topological proof by Borel, which I will not discuss here.)

Update, March 8: Some corrections to the finite field proof. Thanks to Matthias Aschenbrenner also for clarifying the relationship with Tarski’s theorem and some further references.

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the uniformity seminorms associated with the action of $F^\infty_p$“. This paper establishes the ergodic inverse theorems that are needed in our other recent paper to establish the inverse conjecture for the Gowers norms over finite fields in high characteristic (and to establish a partial result in low characteristic), as follows:

Theorem. Let ${\Bbb F}$ be a finite field of characteristic p.  Suppose that $X = (X,{\mathcal B},\mu)$ is a probability space with an ergodic measure-preserving action $(T_g)_{g \in {\Bbb F}^\omega}$ of ${\Bbb F}^\omega$.  Let $f \in L^\infty(X)$ be such that the Gowers-Host-Kra seminorm $\|f\|_{U^k(X)}$ (defined in a previous post) is non-zero.

1. In the high-characteristic case $p \geq k$, there exists a phase polynomial g of degree <k (as defined in the previous post) such that $|\int_X f \overline{g}\ d\mu| > 0$.
2. In general characteristic, there exists a phase polynomial of degree <C(k) for some C(k) depending only on k such that $|\int_X f \overline{g}\ d\mu| > 0$.

This theorem is closely analogous to a similar theorem of Host and Kra on ergodic actions of ${\Bbb Z}$, in which the role of phase polynomials is played by functions that arise from nilsystem factors of X.  Indeed, our arguments rely heavily on the machinery of Host and Kra.

The paper is rather technical (60+ pages!) and difficult to describe in detail here, but I will try to sketch out (in very broad brush strokes) what the key steps in the proof of part 2 of the theorem are.  (Part 1 is similar but requires a more delicate analysis at various stages, keeping more careful track of the degrees of various polynomials.)

Let $k \geq 0$ be an integer.  The concept of a polynomial $P: {\Bbb R} \to {\Bbb R}$ of one variable of degree $ (or $\leq k-1$) can be defined in one of two equivalent ways:

• (Global definition) $P: {\Bbb R} \to {\Bbb R}$ is a polynomial of degree $ iff it can be written in the form $P(x) = \sum_{0 \leq j < k} c_j x^j$ for some coefficients $c_j \in {\Bbb R}$.
• (Local definition) $P: {\Bbb R} \to {\Bbb R}$ is a polynomial of degree $ if it is k-times continuously differentiable and $\frac{d^k}{dx^k} P \equiv 0$.

From single variable calculus we know that if P is a polynomial in the global sense, then it is a polynomial in the local sense; conversely, if P is a polynomial in the local sense, then from the Taylor series expansion

$\displaystyle P(x) = \sum_{0 \leq j < k} \frac{P^{(j)}(0)}{j!} x^j$

we see that P is a polynomial in the global sense. We make the trivial remark that we have no difficulty dividing by $j!$ here, because the field ${\Bbb R}$ is of characteristic zero.

The above equivalence carries over to higher dimensions:

• (Global definition) $P: {\Bbb R}^n \to {\Bbb R}$ is a polynomial of degree $ iff it can be written in the form $P(x_1,\ldots,x_n) = \sum_{0 \leq j_1,\ldots,j_n; j_1+\ldots+j_n < k} c_{j_1,\ldots,j_n} x_1^{j_1} \ldots x_n^{j_n}$ for some coefficients $c_{j_1,\ldots,j_n} \in {\Bbb R}$.
• (Local definition) $P: {\Bbb R}^n \to {\Bbb R}$ is a polynomial of degree $ if it is k-times continuously differentiable and $(h_1 \cdot \nabla) \ldots (h_k \cdot \nabla) P \equiv 0$ for all $h_1,\ldots,h_k \in {\Bbb R}^n$.

Again, it is not difficult to use several variable calculus to show that these two definitions of a polynomial are equivalent.

The purpose of this (somewhat technical) post here is to record some basic analogues of the above facts in finite characteristic, in which the underlying domain of the polynomial P is F or $F^n$ for some finite field F.  In the “classical” case when the range of P is also the field F, it is a well-known fact (which we reproduce here) that the local and global definitions of polynomial are equivalent.  But in the “non-classical” case, when P ranges in a more general group (and in particular in the unit circle ${\Bbb R}/{\Bbb Z}$), the global definition needs to be corrected somewhat by adding some new monomials to the classical ones $x_1^{j_1} \ldots x_n^{j_n}$.  Once one does this, one can recover the equivalence between the local and global definitions.

(The results here are derived from forthcoming work with Vitaly Bergelson and Tamar Ziegler.)

Tamar Ziegler and I have just uploaded to the arXiv our paper, “The inverse conjecture for the Gowers norm over finite fields via the correspondence principle“, submitted to Analysis & PDE.  As announced a few months ago in this blog post, this paper establishes (most of) the inverse conjecture for the Gowers norm from an ergodic theory analogue of this conjecture (in a forthcoming paper by Vitaly Bergelson, Tamar Ziegler, and myself, which should be ready shortly), using a variant of the Furstenberg correspondence principle.  Our papers were held up for a while due to some unexpected technical difficulties arising in the low characteristic case; as a consequence, our paper only establishes the full inverse conjecture in the high characteristic case $p \geq k$, and gives a partial result in the low characteristic case $p < k$.

In the rest of this post, I would like to describe the inverse conjecture (in both combinatorial and ergodic forms), and sketch how one deduces one from the other via the correspondence principle (together with two additional ingredients, namely a statistical sampling lemma and a local testability result for polynomials).