
Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to ${{\bf F}_{p}^{\omega}}$-actions“. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if ${(X,{\mathcal X}, \mu,T)}$ is a measure-preserving ${{\bf Z}}$-system (which, in this post, means that ${(X,{\mathcal X}, \mu)}$ is a probability space and ${T: X \mapsto X}$ is measure-preserving and invertible, thus giving an action ${(T^n)_{n \in {\bf Z}}}$ of the integers), and ${f,g \in L^2(X,{\mathcal X}, \mu)}$ are functions, and ${X}$ is ergodic (which means that ${L^2(X,{\mathcal X}, \mu)}$ contains no ${T}$-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x)\ d\mu \ \ \ \ \ (1)$

converges as ${N \rightarrow \infty}$ to the expression

$\displaystyle (\int_X f(x)\ d\mu) (\int_X g(x)\ d\mu);$

see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if ${x}$ is drawn at random from ${X}$ (using the probability measure ${\mu}$), and ${n}$ is drawn at random from ${\{1,\ldots,N\}}$ for some large ${N}$, then the pair ${(x, T^n x)}$ becomes uniformly distributed in the product space ${X \times X}$ (using product measure ${\mu \times \mu}$) in the limit as ${N \rightarrow \infty}$.
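As a toy illustration of this limit formula (my own, not taken from the paper), one can test the mean ergodic theorem numerically for an irrational circle rotation ${Tx = x + \alpha \hbox{ mod } 1}$, which is ergodic for irrational ${\alpha}$; the average (1) should approach the product of the two integrals:

```python
import math

# Toy sanity check of the mean ergodic theorem for the ergodic rotation
# T x = x + alpha (mod 1) on the unit circle with Lebesgue measure.
alpha = math.sqrt(2) - 1                      # irrational rotation angle
f = lambda x: math.cos(2 * math.pi * x) ** 2  # integral of f is 1/2
g = lambda x: math.sin(2 * math.pi * x) ** 2  # integral of g is 1/2

M, N = 500, 1000                              # x-grid size, number of shifts

avg = 0.0
for n in range(1, N + 1):
    # Riemann-sum approximation of int f(x) g(T^n x) dx
    avg += sum(f(i / M) * g((i / M + n * alpha) % 1.0) for i in range(M)) / M
avg /= N

product = 0.5 * 0.5                           # (int f)(int g)
print(avg, product)                           # avg is close to 0.25
```

The functions and grid sizes here are arbitrary choices for illustration; any smooth ${f,g}$ would behave similarly.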

If we allow ${(X,\mu)}$ to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let ${{\mathcal X}^T}$ be the ${T}$-invariant measurable sets in ${{\mathcal X}}$; the ${{\bf Z}}$-system ${(X, {\mathcal X}^T, \mu, T)}$ can then be viewed as a factor of the original system ${(X, {\mathcal X}, \mu, T)}$, which is equivalent (in the sense of measure-preserving systems) to a trivial system ${(Z_0, {\mathcal Z}_0, \mu_{Z_0}, 1)}$ (known as the invariant factor) in which the shift is trivial. There is then a projection map ${\pi_0: X \rightarrow Z_0}$ to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression

$\displaystyle \int_{Z_0} (\pi_0)_* f(z) (\pi_0)_* g(z)\ d\mu_{Z_0}(z), \ \ \ \ \ (2)$

where ${(\pi_0)_*: L^2(X,{\mathcal X},\mu) \rightarrow L^2(Z_0,{\mathcal Z}_0,\mu_{Z_0})}$ is the pushforward map associated to the map ${\pi_0: X \rightarrow Z_0}$; see e.g. this previous blog post. We can interpret this as an equidistribution result. If ${(x,T^n x)}$ is a pair as before, then we no longer expect complete equidistribution in ${X \times X}$ in the non-ergodic case, because there are now non-trivial constraints relating ${x}$ with ${T^n x}$; indeed, for any ${T}$-invariant function ${f: X \rightarrow {\bf C}}$, we have the constraint ${f(x) = f(T^n x)}$; putting all these constraints together we see that ${\pi_0(x) = \pi_0(T^n x)}$ (for almost every ${x}$, at least). The limit (2) can be viewed as an assertion that this constraint ${\pi_0(x) = \pi_0(T^n x)}$ is in some sense the “only” constraint between ${x}$ and ${T^n x}$, and that the pair ${(x,T^n x)}$ is uniformly distributed relative to this constraint.
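Again as a toy illustration of my own choosing, one can simulate a simple non-ergodic system, namely two invariant circles rotating at different irrational speeds. The invariant factor ${Z_0}$ is then the two-point space recording which circle we are on, and the average (1) converges to the invariant-factor expression (2):

```python
import math

# Toy non-ergodic system: two invariant circles, rotated by different
# irrational angles.  The invariant factor Z_0 is the two-point space
# recording which circle we are on, and the limit of (1) is the integral
# over Z_0 of the product of the conditional means of f and g.
alphas = [math.sqrt(2) - 1, math.sqrt(3) - 1]

f = lambda i, x: (i + 1) * math.cos(2 * math.pi * x) ** 2
g = lambda i, x: math.sin(2 * math.pi * x) ** 2 + i

M, N = 400, 1000

avg = 0.0
for n in range(1, N + 1):
    for i, a in enumerate(alphas):            # each circle has measure 1/2
        avg += 0.5 * sum(f(i, j / M) * g(i, (j / M + n * a) % 1.0)
                         for j in range(M)) / M
avg /= N

# right-hand side (2): integral over Z_0 of E(f|Z_0) E(g|Z_0)
Ef = [sum(f(i, j / M) for j in range(M)) / M for i in range(2)]
Eg = [sum(g(i, j / M) for j in range(M)) / M for i in range(2)]
limit = 0.5 * (Ef[0] * Eg[0] + Ef[1] * Eg[1])
print(avg, limit)                             # the two values nearly agree
```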

Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x)\ d\mu \ \ \ \ \ (3)$

for three functions ${f,g,h \in L^\infty(X, {\mathcal X}, \mu)}$; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system ${(X,{\mathcal X},\mu,T)}$ to be ergodic. Naively one might expect this limit to then converge to

$\displaystyle (\int_X f\ d\mu) (\int_X g\ d\mu) (\int_X h\ d\mu)$

which would roughly speaking correspond to an assertion that the triplet ${(x,T^n x, T^{2n} x)}$ is asymptotically equidistributed in ${X \times X \times X}$. However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs ${(x,T^n x)}$, ${(x, T^{2n} x)}$. The key obstruction here is that of eigenfunctions of the shift ${T: X \rightarrow X}$, that is to say non-trivial functions ${f: X \rightarrow S^1}$ that obey the eigenfunction equation ${Tf = \lambda f}$ almost everywhere for some constant (or ${T}$-invariant) ${\lambda}$. Each such eigenfunction generates a constraint

$\displaystyle f(x) \overline{f(T^n x)}^2 f(T^{2n} x) = 1 \ \ \ \ \ (4)$

tying together ${x}$, ${T^n x}$, and ${T^{2n} x}$. However, it turns out that these are in some sense the only constraints on ${x,T^n x, T^{2n} x}$ that are relevant for the limit (3). More precisely, if one sets ${{\mathcal X}_1}$ to be the sub-algebra of ${{\mathcal X}}$ generated by the eigenfunctions of ${T}$, then it turns out that the factor ${(X, {\mathcal X}_1, \mu, T)}$ is isomorphic to a shift system ${(Z_1, {\mathcal Z}_1, \mu_{Z_1}, x \mapsto x+\alpha)}$ known as the Kronecker factor, for some compact abelian group ${Z_1 = (Z_1,+)}$ and some (irrational) shift ${\alpha \in Z_1}$; the factor map ${\pi_1: X \rightarrow Z_1}$ pushes eigenfunctions forward to (affine) characters on ${Z_1}$. It is then known that the limit of (3) is

$\displaystyle \int_\Sigma (\pi_1)_* f(x_1) (\pi_1)_* g(x_2) (\pi_1)_* h(x_3)\ d\mu_\Sigma$

where ${\Sigma \subset Z_1^3}$ is the closed subgroup

$\displaystyle \Sigma = \{ (x_1,x_2,x_3) \in Z_1^3: x_1-2x_2+x_3=0 \}$

and ${\mu_\Sigma}$ is the Haar probability measure on ${\Sigma}$; see this previous blog post. The equation ${x_1-2x_2+x_3=0}$ defining ${\Sigma}$ corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when ${f=g=h}$ is non-negative and not identically vanishing.
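The constraint (4) is easy to verify concretely. For the circle rotation ${Tx = x+\alpha}$ with eigenfunction ${f(x) = e^{2\pi i x}}$ (so ${Tf = e^{2\pi i \alpha} f}$), the exponent ${x - 2(x+n\alpha) + (x+2n\alpha)}$ cancels identically; the following snippet (an illustration of mine, not from the paper) confirms this numerically:

```python
import cmath, math, random

# Direct check of the eigenfunction constraint (4) for the circle rotation
# T x = x + alpha: the eigenfunction f(x) = e^{2 pi i x} satisfies
# T f = e^{2 pi i alpha} f, and the exponent x - 2(x + n alpha) + (x + 2 n alpha)
# cancels identically, so (4) holds exactly for every x and n.
alpha = math.sqrt(2) - 1
f = lambda x: cmath.exp(2j * math.pi * x)

random.seed(0)
errs = []
for _ in range(100):
    x = random.random()
    n = random.randrange(1, 10 ** 6)
    value = (f(x)
             * f((x + n * alpha) % 1.0).conjugate() ** 2
             * f((x + 2 * n * alpha) % 1.0))
    errs.append(abs(value - 1))
max_err = max(errs)
print(max_err)                                # essentially zero
```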

If one considers a quadruple average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x) k(T^{3n} x)\ d\mu \ \ \ \ \ (5)$

(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions ${f: X \rightarrow S^1}$, which obey an eigenfunction equation ${Tf = \lambda f}$ in which ${\lambda}$ is no longer constant, but is now a linear eigenfunction. For such functions, ${f(T^n x)}$ behaves quadratically in ${n}$, and a short computation reveals the constraint

$\displaystyle f(x) \overline{f(T^n x)}^3 f(T^{2n} x)^3 \overline{f(T^{3n} x)} = 1 \ \ \ \ \ (6)$

between ${x}$, ${T^n x}$, ${T^{2n} x}$, and ${T^{3n} x}$ that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor ${(Z_2, {\mathcal Z}_2, \mu_{Z_2}, S)}$ which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.
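As with (4), the constraint (6) can be checked by hand or by machine: for a quadratic phase ${q(n) = an^2+bn+c}$, the combination ${q(0) - 3q(n) + 3q(2n) - q(3n)}$ is (up to sign) a third finite difference of a quadratic and so vanishes identically. A quick numerical confirmation (my own illustration, with randomly chosen quadratic phases):

```python
import cmath, random

# Check of the quadratic constraint (6): if f(T^n x) = e^{2 pi i q(n)} for a
# quadratic phase q(n) = a n^2 + b n + c, then the alternating combination
# q(0) - 3 q(n) + 3 q(2n) - q(3n) is a third finite difference of a
# quadratic, hence vanishes identically, which is exactly (6).
random.seed(0)
errs = []
for _ in range(100):
    a, b, c = (random.uniform(-1, 1) for _ in range(3))
    q = lambda m: a * m * m + b * m + c
    F = lambda m: cmath.exp(2j * cmath.pi * q(m))
    n = random.randrange(1, 100)
    value = F(0) * F(n).conjugate() ** 3 * F(2 * n) ** 3 * F(3 * n).conjugate()
    errs.append(abs(value - 1))
max_err = max(errs)
print(max_err)                                # essentially zero
```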

The above discussion was concerned with ${{\bf Z}}$-systems, but one can adapt much of the theory to measure-preserving ${G}$-systems for other discrete countable abelian groups ${G}$, in which one now has a family ${(T_g)_{g \in G}}$ of shifts indexed by ${G}$ rather than a single shift, obeying the compatibility relation ${T_{g+h}=T_g T_h}$. The role of the intervals ${\{1,\ldots,N\}}$ in this more general setting is replaced by that of Følner sequences. For arbitrary countable abelian ${G}$, the theory for double averages (1) and triple averages (3) is essentially identical to the ${{\bf Z}}$-system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary ${G}$, still not fully understood). However, one model case which is now well understood is the finite field case when ${G = {\bf F}_p^\omega = \bigcup_{n=1}^\infty {\bf F}_p^n}$ is an infinite-dimensional vector space over a finite field ${{\bf F}_p}$ (with the finite subspaces ${{\bf F}_p^n}$ then being a good choice for the Følner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if ${x}$ is drawn at random from an ${{\bf F}_p^\omega}$-system and ${n}$ is drawn randomly from a large subspace of ${{\bf F}_p^\omega}$, then the only constraints between ${x, T^n x, \ldots, T^{(p-1)n} x}$ are those that arise from phase polynomials. The main theorem of this paper establishes this limit formula (which, again, is a little complicated to state explicitly and will not be done here).
In particular, we establish for the first time that the limit actually exists (a result which, for ${{\bf Z}}$-systems, was one of the main results of this paper of Host and Kra).

As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for ${{\bf Z}}$-systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set ${A}$ in an ergodic ${{\bf F}_p^\omega}$-system and any ${\epsilon>0}$, one has

$\displaystyle \mu( T_{c_1 n} A \cap \ldots \cap T_{c_k n} A ) > \mu(A)^k - \epsilon$

for a syndetic set of ${n}$, where ${c_1,\ldots,c_k \in {\bf F}_p}$ are distinct residue classes. It turns out that Khintchine-type theorems always hold for ${k=1,2,3}$ (and for ${k=1,2}$ ergodicity is not required). For ${k=4}$, the Khintchine property holds whenever ${c_1,c_2,c_3,c_4}$ form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead). For larger ${k}$ we could show that the Khintchine property fails for generic choices of ${c_1,\ldots,c_k}$, though the problem of determining exactly the tuples for which it fails looked to be rather messy and we did not completely settle it.

I’ve just uploaded to the arXiv my joint paper with Vitaly Bergelson, “Multiple recurrence in quasirandom groups“, which is submitted to Geom. Func. Anal.. This paper builds upon a paper of Gowers in which he introduced the concept of a quasirandom group, and established some mixing (or recurrence) properties of such groups. A ${D}$-quasirandom group is a finite group with no non-trivial unitary representations of dimension at most ${D}$. We will informally refer to a “quasirandom group” as a ${D}$-quasirandom group with the quasirandomness parameter ${D}$ large (more formally, one can work with a sequence of ${D_n}$-quasirandom groups with ${D_n}$ going to infinity). A typical example of a quasirandom group is ${SL_2(F_p)}$ where ${p}$ is a large prime. Quasirandom groups are discussed in depth in this blog post. One of the key properties of quasirandom groups established in Gowers’ paper is the following “weak mixing” property: if ${A, B}$ are subsets of ${G}$, then for “almost all” ${g \in G}$, one has

$\displaystyle \mu( A \cap gB ) \approx \mu(A) \mu(B) \ \ \ \ \ (1)$

where ${\mu(A) := |A|/|G|}$ denotes the density of ${A}$ in ${G}$. Here, we use ${x \approx y}$ to informally represent an estimate of the form ${x=y+o(1)}$ (where ${o(1)}$ is a quantity that goes to zero when the quasirandomness parameter ${D}$ goes to infinity), and “almost all ${g \in G}$” denotes “for all ${g}$ in a subset of ${G}$ of density ${1-o(1)}$“. As a corollary, if ${A,B,C}$ have positive density in ${G}$ (by which we mean that ${\mu(A)}$ is bounded away from zero, uniformly in the quasirandomness parameter ${D}$, and similarly for ${B,C}$), then (if the quasirandomness parameter ${D}$ is sufficiently large) we can find elements ${g, x \in G}$ such that ${g \in A}$, ${x \in B}$, ${gx \in C}$. In fact we can find approximately ${\mu(A)\mu(B)\mu(C) |G|^2}$ such pairs ${(g,x)}$. To put it another way: if we choose ${g,x}$ uniformly and independently at random from ${G}$, then the events ${g \in A}$, ${x \in B}$, ${gx \in C}$ are approximately independent (thus the random variable ${(g,x,gx) \in G^3}$ resembles a uniformly distributed random variable on ${G^3}$ in some weak sense). One can also express this mixing property in integral form as

$\displaystyle \int_G \int_G f_1(g) f_2(x) f_3(gx)\ d\mu(g) d\mu(x) \approx (\int_G f_1\ d\mu) (\int_G f_2\ d\mu) (\int_G f_3\ d\mu)$

for any bounded functions ${f_1,f_2,f_3: G \rightarrow {\bf R}}$. (Of course, with ${G}$ being finite, one could replace the integrals here by finite averages if desired.) Or in probabilistic language, we have

$\displaystyle \mathop{\bf E} f_1(g) f_2(x) f_3(gx) \approx \mathop{\bf E} f_1(x_1) f_2(x_2) f_3(x_3)$

where ${g, x, x_1, x_2, x_3}$ are drawn uniformly and independently at random from ${G}$.
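One can see the weak mixing property (1) in a small numerical experiment (my own illustration; note that for the randomly chosen test sets below, mixing would hold even without quasirandomness, so this only illustrates the statement rather than the strength of Gowers' theorem, which applies to all dense sets):

```python
import itertools, random

# Small numerical experiment illustrating (1): weak mixing in the
# quasirandom group SL_2(F_p).  (For random test sets A, B, mixing would
# hold even without quasirandomness; Gowers' theorem says it holds for
# ALL dense sets when the quasirandomness parameter D is large.)
p = 11
G = [(a, b, c, d) for a, b, c, d in itertools.product(range(p), repeat=4)
     if (a * d - b * c) % p == 1]             # |G| = p^3 - p = 1320

def mul(g, h):
    a, b, c, d = g
    e, f, u, v = h
    return ((a * e + b * u) % p, (a * f + b * v) % p,
            (c * e + d * u) % p, (c * f + d * v) % p)

random.seed(0)
A = set(random.sample(G, len(G) // 2))        # two dense subsets
B = set(random.sample(G, len(G) // 2))
muA, muB = len(A) / len(G), len(B) / len(G)

devs = []
for g in random.sample(G, 50):                # 50 random group elements g
    gB = {mul(g, b) for b in B}
    devs.append(abs(len(A & gB) / len(G) - muA * muB))
mean_dev = sum(devs) / len(devs)
print(mean_dev)                               # small: mu(A ∩ gB) ≈ mu(A) mu(B)
```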

As observed in Gowers’ paper, one can iterate this observation to find “parallelepipeds” of any given dimension in dense subsets of ${G}$. For instance, applying (1) with ${A,B,C}$ replaced by ${A \cap hB}$, ${C \cap hD}$, and ${E \cap hF}$ one can assert (after some relabeling) that for ${g,h,x}$ chosen uniformly and independently at random from ${G}$, the events ${g \in A}$, ${h \in B}$, ${gh \in C}$, ${x \in D}$, ${gx \in E}$, ${hx \in F}$, ${ghx \in H}$ are approximately independent whenever ${A,B,C,D,E,F,H}$ are dense subsets of ${G}$; thus the tuple ${(g,h,gh,x,gx,hx,ghx)}$ resembles a uniformly distributed random variable in ${G^7}$ in some weak sense.

However, there are other tuples for which the above iteration argument does not seem to apply. One of the simplest tuples in this vein is the tuple ${(g, x, xg, gx)}$ in ${G^4}$, when ${g, x}$ are drawn uniformly at random from a quasirandom group ${G}$. Here, one does not expect the tuple to behave as if it were uniformly distributed in ${G^4}$, because there is an obvious constraint connecting the last two components ${gx, xg}$ of this tuple: they must lie in the same conjugacy class! In particular, if ${A}$ is a subset of ${G}$ that is the union of conjugacy classes, then the events ${gx \in A}$, ${xg \in A}$ are perfectly correlated, so that ${\mu( gx \in A, xg \in A)}$ is equal to ${\mu(A)}$ rather than ${\mu(A)^2}$. Our main result, though, is that in a quasirandom group, this is (approximately) the only constraint on the tuple. More precisely, we have

Theorem 1 Let ${G}$ be a ${D}$-quasirandom group, and let ${g, x}$ be drawn uniformly at random from ${G}$. Then for any ${f_1,f_2,f_3,f_4: G \rightarrow [-1,1]}$, we have

$\displaystyle \mathop{\bf E} f_1(g) f_2(x) f_3(gx) f_4(xg) = \mathop{\bf E} f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_4) + o(1)$

where ${o(1)}$ goes to zero as ${D \rightarrow \infty}$, ${x_1,x_2,x_3}$ are drawn uniformly and independently at random from ${G}$, and ${x_4}$ is drawn uniformly at random from the conjugates of ${x_3}$ for each fixed choice of ${x_1,x_2,x_3}$.

This is the probabilistic formulation of the above theorem; one can also phrase the theorem in other formulations (such as an integral formulation), and this is detailed in the paper. This theorem leads to a number of recurrence results; for instance, as a corollary of this result, we have

$\displaystyle \mu(A) \mu(B)^2 - o(1) \leq \mu( A \cap gB \cap Bg ) \leq \mu(A) \mu(B) + o(1)$

for almost all ${g \in G}$, and any dense subsets ${A, B}$ of ${G}$; the lower and upper bounds are sharp, with the lower bound being attained when ${B}$ is randomly distributed, and the upper bound when ${B}$ is conjugation-invariant.
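The conjugacy constraint underlying all of this can be checked exhaustively in a small group (an illustration of my own; ${S_4}$ is of course far from quasirandom, but the constraint is purely algebraic):

```python
from itertools import permutations

# The constraint behind Theorem 1, checked exhaustively in S_4: gx and xg
# are always conjugate (xg = x (gx) x^{-1}), so for a conjugation-invariant
# set A the events gx in A and xg in A coincide, and the joint probability
# equals mu(A) rather than mu(A)^2.
G = list(permutations(range(4)))

def mul(g, x):                                # (g x)(i) = g(x(i))
    return tuple(g[x[i]] for i in range(4))

def cycle_type(g):                            # conjugacy class = cycle type
    seen, lengths = set(), []
    for i in range(4):
        if i not in seen:
            j, L = i, 0
            while j not in seen:
                seen.add(j)
                j = g[j]
                L += 1
            lengths.append(L)
    return tuple(sorted(lengths))

A = {g for g in G if cycle_type(g) == (1, 3)}  # the 3-cycles: a union of classes
count = sum(1 for g in G for x in G if mul(g, x) in A and mul(x, g) in A)
print(count / len(G) ** 2, len(A) / len(G))    # both equal mu(A) = 1/3
```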

To me, the more interesting thing here is not the result itself, but how it is proven. Vitaly and I were not able to find a purely finitary way to establish this mixing theorem. Instead, we had to first use the machinery of ultraproducts (as discussed in this previous post) to convert the finitary statement about a quasirandom group to an infinitary statement about a type of infinite group which we call an ultra quasirandom group (basically, an ultraproduct of increasingly quasirandom finite groups). This is analogous to how the Furstenberg correspondence principle is used to convert a finitary combinatorial problem into an infinitary ergodic theory problem.

Ultra quasirandom groups come equipped with a finite, countably additive measure known as Loeb measure ${\mu_G}$, which is very analogous to the Haar measure of a compact group, except that in the case of ultra quasirandom groups one does not quite have a topological structure that would give compactness. Instead, one has a slightly weaker structure known as a ${\sigma}$-topology, which is like a topology except that open sets are only closed under countable unions rather than arbitrary ones. There are some interesting measure-theoretic and topological issues regarding the distinction between topologies and ${\sigma}$-topologies (and between Haar measure and Loeb measure), but for this post it is perhaps best to gloss over these issues and pretend that ultra quasirandom groups ${G}$ come with a Haar measure. One can then recast Theorem 1 as a mixing theorem for the left and right actions of the ultra quasirandom group ${G}$ on itself, which roughly speaking is the assertion that

$\displaystyle \int_G f_1(x) L_g f_2(x) L_g R_g f_3(x)\ d\mu_G(x) \approx 0 \ \ \ \ \ (2)$

for “almost all” ${g \in G}$, if ${f_1, f_2, f_3}$ are bounded measurable functions on ${G}$, with ${f_3}$ having zero mean on all conjugacy classes of ${G}$, where ${L_g, R_g}$ are the left and right translation operators

$\displaystyle L_g f(x) := f(g^{-1} x); \quad R_g f(x) := f(xg).$

To establish this mixing theorem, we use the machinery of idempotent ultrafilters, which is a particularly useful tool for understanding the ergodic theory of actions of countable groups ${G}$ that need not be amenable; in the non-amenable setting the classical ergodic averages do not make much sense, but ultrafilter-based averages are still available. To oversimplify substantially, the idempotent ultrafilter arguments let one establish mixing estimates of the form (2) for “many” elements ${g}$ of an infinite-dimensional parallelepiped known as an IP system (provided that the actions ${L_g,R_g}$ of this IP system obey some technical mixing hypotheses, but let’s ignore that for sake of this discussion). The claim then follows by using the quasirandomness hypothesis to show that if the estimate (2) failed for a large set of ${g \in G}$, then this large set would contain an IP system, contradicting the previous claim.

Idempotent ultrafilters are an extremely infinitary type of mathematical object (one has to use Zorn’s lemma no fewer than three times just to construct one of these objects!). So it is quite remarkable that they can be used to establish a finitary theorem such as Theorem 1, though as is often the case with such infinitary arguments, one gets absolutely no quantitative control whatsoever on the error terms ${o(1)}$ appearing in that theorem. (It is also mildly amusing to note that our arguments involve the use of ultrafilters in two completely different ways: firstly in order to set up the ultraproduct that converts the finitary mixing problem to an infinitary one, and secondly to solve the infinitary mixing problem. Despite some superficial similarities, there appear to be no substantial commonalities between these two usages of ultrafilters.) There is already a fair amount of literature on using idempotent ultrafilter methods in infinitary ergodic theory, and perhaps by further development of ultraproduct correspondence principles, one can use such methods to obtain further finitary consequences (although the state of the art for idempotent ultrafilter ergodic theory has not advanced much beyond the analysis of two commuting shifts ${L_g, R_g}$ currently, which is the main reason why our arguments only handle the pattern ${(g,x,xg,gx)}$ and not more sophisticated patterns).

We also have some miscellaneous other results in the paper. It turns out that by using the triangle removal lemma from graph theory, one can obtain a recurrence result that asserts that whenever ${A}$ is a dense subset of a finite group ${G}$ (not necessarily quasirandom), then there are ${\gg |G|^2}$ pairs ${(x,g)}$ such that ${x, gx, xg}$ all lie in ${A}$. Using a hypergraph generalisation of the triangle removal lemma known as the hypergraph removal lemma, one can obtain more complicated versions of this statement; for instance, if ${A}$ is a dense subset of ${G^2}$, then one can find ${\gg |G|^2}$ triples ${(x,y,g)}$ such that ${(x,y), (gx, y), (gx, gy), (gxg^{-1}, gyg^{-1})}$ all lie in ${A}$. But the method is tailored to the specific types of patterns given here, and we do not have a general method for obtaining recurrence or mixing properties for arbitrary patterns of words in some finite alphabet such as ${g,x,y}$.

We also give some properties of a model example of an ultra quasirandom group, namely the ultraproduct ${SL_2(F)}$ of ${SL_2(F_{p_n})}$ where ${p_n}$ is a sequence of primes going off to infinity. Thanks to the substantial recent progress (by Helfgott, Bourgain, Gamburd, Breuillard, and others) on understanding the expansion properties of the finite groups ${SL_2(F_{p_n})}$, we have a fair amount of knowledge on the ultraproduct ${SL_2(F)}$ as well; for instance, two elements of ${SL_2(F)}$ chosen at random will almost surely generate a subgroup whose action has a spectral gap. We don’t have any direct application of this particular ultra quasirandom group, but it might be interesting to study it further.

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our paper “An inverse theorem for the uniformity seminorms associated with the action of $F^\infty_p$“. This paper establishes the ergodic inverse theorems that are needed in our other recent paper to establish the inverse conjecture for the Gowers norms over finite fields in high characteristic (and to establish a partial result in low characteristic), as follows:

Theorem. Let ${\Bbb F}$ be a finite field of characteristic p.  Suppose that $X = (X,{\mathcal B},\mu)$ is a probability space with an ergodic measure-preserving action $(T_g)_{g \in {\Bbb F}^\omega}$ of ${\Bbb F}^\omega$.  Let $f \in L^\infty(X)$ be such that the Gowers-Host-Kra seminorm $\|f\|_{U^k(X)}$ (defined in a previous post) is non-zero.

1. In the high-characteristic case $p \geq k$, there exists a phase polynomial g of degree <k (as defined in the previous post) such that $|\int_X f \overline{g}\ d\mu| > 0$.
2. In general characteristic, there exists a phase polynomial $g$ of degree <C(k) for some C(k) depending only on k such that $|\int_X f \overline{g}\ d\mu| > 0$.

This theorem is closely analogous to a theorem of Host and Kra on ergodic actions of ${\Bbb Z}$, in which the role of phase polynomials is played by functions that arise from nilsystem factors of X.  Indeed, our arguments rely heavily on the machinery of Host and Kra.

The paper is rather technical (60+ pages!) and difficult to describe in detail here, but I will try to sketch out (in very broad brush strokes) what the key steps in the proof of part 2 of the theorem are.  (Part 1 is similar but requires a more delicate analysis at various stages, keeping more careful track of the degrees of various polynomials.)

Let $k \geq 0$ be an integer.  The concept of a polynomial $P: {\Bbb R} \to {\Bbb R}$ of one variable of degree $<k$ (or equivalently $\leq k-1$) can be defined in one of two equivalent ways:

• (Global definition) $P: {\Bbb R} \to {\Bbb R}$ is a polynomial of degree $<k$ iff it can be written in the form $P(x) = \sum_{0 \leq j < k} c_j x^j$ for some coefficients $c_j \in {\Bbb R}$.
• (Local definition) $P: {\Bbb R} \to {\Bbb R}$ is a polynomial of degree $<k$ iff it is k-times continuously differentiable and $\frac{d^k}{dx^k} P \equiv 0$.

From single variable calculus we know that if P is a polynomial in the global sense, then it is a polynomial in the local sense; conversely, if P is a polynomial in the local sense, then from the Taylor series expansion

$\displaystyle P(x) = \sum_{0 \leq j < k} \frac{P^{(j)}(0)}{j!} x^j$

we see that P is a polynomial in the global sense. We make the trivial remark that we have no difficulty dividing by $j!$ here, because the field ${\Bbb R}$ is of characteristic zero.

The above equivalence carries over to higher dimensions:

• (Global definition) $P: {\Bbb R}^n \to {\Bbb R}$ is a polynomial of degree $<k$ iff it can be written in the form $P(x_1,\ldots,x_n) = \sum_{0 \leq j_1,\ldots,j_n; j_1+\ldots+j_n < k} c_{j_1,\ldots,j_n} x_1^{j_1} \ldots x_n^{j_n}$ for some coefficients $c_{j_1,\ldots,j_n} \in {\Bbb R}$.
• (Local definition) $P: {\Bbb R}^n \to {\Bbb R}$ is a polynomial of degree $<k$ iff it is k-times continuously differentiable and $(h_1 \cdot \nabla) \ldots (h_k \cdot \nabla) P \equiv 0$ for all $h_1,\ldots,h_k \in {\Bbb R}^n$.

Again, it is not difficult to use several variable calculus to show that these two definitions of a polynomial are equivalent.
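As a concrete sanity check (an illustration of mine, with an arbitrarily chosen cubic), one can verify the local definition over a finite field directly, using the difference operator $\Delta_h P(x) := P(x+h) - P(x)$ in place of the derivative:

```python
# Local/global polynomials over a finite field F_p: the difference operator
# (Delta_h P)(x) = P(x + h) - P(x) lowers the degree by one, so a classical
# polynomial of degree < k is annihilated by any k difference operators.
# (Contrast with characteristic 0: here one must also beware collapses such
# as x^p = x, by Fermat's little theorem.)
p = 7

def poly_eval(coeffs, x):                     # P(x) = sum_j coeffs[j] x^j in F_p
    return sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p

def delta(f, h):                              # Delta_h f
    return lambda x: (f((x + h) % p) - f(x)) % p

P = lambda x: poly_eval([3, 1, 4, 1], x)      # degree 3, i.e. degree < 4

Q = P
for h in [1, 2, 3, 5]:                        # four arbitrary shifts
    Q = delta(Q, h)
assert all(Q(x) == 0 for x in range(p))       # local condition of degree < 4

# the global monomial x^p equals the function x, so its "local" degree is 1
assert all(pow(x, p, p) == x for x in range(p))
print("local/global checks passed")
```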

The purpose of this (somewhat technical) post here is to record some basic analogues of the above facts in finite characteristic, in which the underlying domain of the polynomial P is F or $F^n$ for some finite field F.  In the “classical” case when the range of P is also the field F, it is a well-known fact (which we reproduce here) that the local and global definitions of polynomial are equivalent.  But in the “non-classical” case, when P ranges in a more general group (and in particular in the unit circle ${\Bbb R}/{\Bbb Z}$), the global definition needs to be corrected somewhat by adding some new monomials to the classical ones $x_1^{j_1} \ldots x_n^{j_n}$.  Once one does this, one can recover the equivalence between the local and global definitions.
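To give the flavour of the non-classical case, here is a minimal example (the specific function is my choice for illustration): the map $P: {\Bbb F}_2 \to {\Bbb R}/{\Bbb Z}$ with $P(0)=0$, $P(1)=1/4$ takes values outside $\frac{1}{2}{\Bbb Z}/{\Bbb Z}$, so it is not classical, yet it satisfies the local definition of degree $<3$: its third difference vanishes, while its second difference is the nonzero constant $1/2$.

```python
from fractions import Fraction

# A non-classical polynomial on F_2 with values in the circle R/Z:
# P(0) = 0, P(1) = 1/4.  Its values do not lie in (1/2)Z/Z, so P is not
# classical, yet its third difference vanishes while its second difference
# is the constant 1/2: P is a polynomial of degree < 3 in the local sense.
P = {0: Fraction(0), 1: Fraction(1, 4)}

def delta(f):                                 # (Delta f)(x) = f(x+1) - f(x) mod 1
    return {x: (f[(x + 1) % 2] - f[x]) % 1 for x in (0, 1)}

d1 = delta(P)
d2 = delta(d1)
d3 = delta(d2)
print(d1, d2, d3)
assert d2 == {0: Fraction(1, 2), 1: Fraction(1, 2)}   # second difference: 1/2
assert d3 == {0: Fraction(0), 1: Fraction(0)}          # third difference: 0
```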

(The results here are derived from forthcoming work with Vitaly Bergelson and Tamar Ziegler.)