You are currently browsing the tag archive for the ‘nonstandard analysis’ tag.

In graph theory, the recently developed theory of graph limits has proven to be a useful tool for analysing large dense graphs, being a convenient reformulation of the Szemerédi regularity lemma. Roughly speaking, the theory asserts that given any sequence {G_n = (V_n, E_n)} of finite graphs, one can extract a subsequence {G_{n_j} = (V_{n_j}, E_{n_j})} which converges (in a specific sense) to a continuous object known as a “graphon” – a symmetric measurable function {p\colon [0,1] \times [0,1] \rightarrow [0,1]}. What “converges” means in this context is that subgraph densities converge to the associated integrals of the graphon {p}. For instance, the edge density

\displaystyle  \frac{1}{|V_{n_j}|^2} |E_{n_j}|

converge to the integral

\displaystyle  \int_0^1 \int_0^1 p(x,y)\ dx dy,

the triangle density

\displaystyle  \frac{1}{|V_{n_j}|^3} \lvert \{ (v_1,v_2,v_3) \in V_{n_j}^3: \{v_1,v_2\}, \{v_2,v_3\}, \{v_3,v_1\} \in E_{n_j} \} \rvert

converges to the integral

\displaystyle  \int_0^1 \int_0^1 \int_0^1 p(x_1,x_2) p(x_2,x_3) p(x_3,x_1)\ dx_1 dx_2 dx_3,

the four-cycle density

\displaystyle  \frac{1}{|V_{n_j}|^4} \lvert \{ (v_1,v_2,v_3,v_4) \in V_{n_j}^4: \{v_1,v_2\}, \{v_2,v_3\}, \{v_3,v_4\}, \{v_4,v_1\} \in E_{n_j} \} \rvert

converges to the integral

\displaystyle  \int_0^1 \int_0^1 \int_0^1 \int_0^1 p(x_1,x_2) p(x_2,x_3) p(x_3,x_4) p(x_4,x_1)\ dx_1 dx_2 dx_3 dx_4,

and so forth. One can use graph limits to prove many results in graph theory that were traditionally proven using the regularity lemma, such as the triangle removal lemma, and can also reduce many asymptotic graph theory problems to continuous problems involving multilinear integrals (although the latter problems are not necessarily easy to solve!). See this text of Lovasz for a detailed study of graph limits and their applications.

One can also express graph limits (and more generally hypergraph limits) in the language of nonstandard analysis (or of ultraproducts); see for instance this paper of Elek and Szegedy, Section 6 of this previous blog post, or this paper of Towsner. (In this post we assume some familiarity with nonstandard analysis, as reviewed for instance in the previous blog post.) Here, one starts as before with a sequence {G_n = (V_n,E_n)} of finite graphs, and then takes an ultraproduct (with respect to some arbitrarily chosen non-principal ultrafilter {\alpha \in\beta {\bf N} \backslash {\bf N}}) to obtain a nonstandard graph {G_\alpha = (V_\alpha,E_\alpha)}, where {V_\alpha = \prod_{n\rightarrow \alpha} V_n} is the ultraproduct of the {V_n}, and similarly for the {E_\alpha}. The set {E_\alpha} can then be viewed as a symmetric subset of {V_\alpha \times V_\alpha} which is measurable with respect to the Loeb {\sigma}-algebra {{\mathcal L}_{V_\alpha \times V_\alpha}} of the product {V_\alpha \times V_\alpha} (see this previous blog post for the construction of Loeb measure). A crucial point is that this {\sigma}-algebra is larger than the product {{\mathcal L}_{V_\alpha} \times {\mathcal L}_{V_\alpha}} of the Loeb {\sigma}-algebra of the individual vertex set {V_\alpha}. This leads to a decomposition

\displaystyle  1_{E_\alpha} = p + e

where the “graphon” {p} is the orthogonal projection of {1_{E_\alpha}} onto {L^2( {\mathcal L}_{V_\alpha} \times {\mathcal L}_{V_\alpha} )}, and the “regular error” {e} is orthogonal to all product sets {A \times B} for {A, B \in {\mathcal L}_{V_\alpha}}. The graphon {p\colon V_\alpha \times V_\alpha \rightarrow [0,1]} then captures the statistics of the nonstandard graph {G_\alpha}, in exact analogy with the more traditional graph limits: for instance, the edge density

\displaystyle  \hbox{st} \frac{1}{|V_\alpha|^2} |E_\alpha|

(or equivalently, the limit of the {\frac{1}{|V_n|^2} |E_n|} along the ultrafilter {\alpha}) is equal to the integral

\displaystyle  \int_{V_\alpha} \int_{V_\alpha} p(x,y)\ d\mu_{V_\alpha}(x) d\mu_{V_\alpha}(y)

where {d\mu_V} denotes Loeb measure on a nonstandard finite set {V}; the triangle density

\displaystyle  \hbox{st} \frac{1}{|V_\alpha|^3} \lvert \{ (v_1,v_2,v_3) \in V_\alpha^3: \{v_1,v_2\}, \{v_2,v_3\}, \{v_3,v_1\} \in E_\alpha \} \rvert

(or equivalently, the limit along {\alpha} of the triangle densities of {E_n}) is equal to the integral

\displaystyle  \int_{V_\alpha} \int_{V_\alpha} \int_{V_\alpha} p(x_1,x_2) p(x_2,x_3) p(x_3,x_1)\ d\mu_{V_\alpha}(x_1) d\mu_{V_\alpha}(x_2) d\mu_{V_\alpha}(x_3),

and so forth. Note that with this construction, the graphon {p} is living on the Cartesian square of an abstract probability space {V_\alpha}, which is likely to be inseparable; but it is possible to cut down the Loeb {\sigma}-algebra on {V_\alpha} to minimal countable {\sigma}-algebra for which {p} remains measurable (up to null sets), and then one can identify {V_\alpha} with {[0,1]}, bringing this construction of a graphon in line with the traditional notion of a graphon. (See Remark 5 of this previous blog post for more discussion of this point.)

Additive combinatorics, which studies things like the additive structure of finite subsets {A} of an abelian group {G = (G,+)}, has many analogies and connections with asymptotic graph theory; in particular, there is the arithmetic regularity lemma of Green which is analogous to the graph regularity lemma of Szemerédi. (There is also a higher order arithmetic regularity lemma analogous to hypergraph regularity lemmas, but this is not the focus of the discussion here.) Given this, it is natural to suspect that there is a theory of “additive limits” for large additive sets of bounded doubling, analogous to the theory of graph limits for large dense graphs. The purpose of this post is to record a candidate for such an additive limit. This limit can be used as a substitute for the arithmetic regularity lemma in certain results in additive combinatorics, at least if one is willing to settle for qualitative results rather than quantitative ones; I give a few examples of this below the fold.

It seems that to allow for the most flexible and powerful manifestation of this theory, it is convenient to use the nonstandard formulation (among other things, it allows for full use of the transfer principle, whereas a more traditional limit formulation would only allow for a transfer of those quantities continuous with respect to the notion of convergence). Here, the analogue of a nonstandard graph is an ultra approximate group {A_\alpha} in a nonstandard group {G_\alpha = \prod_{n \rightarrow \alpha} G_n}, defined as the ultraproduct of finite {K}-approximate groups {A_n \subset G_n} for some standard {K}. (A {K}-approximate group {A_n} is a symmetric set containing the origin such that {A_n+A_n} can be covered by {K} or fewer translates of {A_n}.) We then let {O(A_\alpha)} be the external subgroup of {G_\alpha} generated by {A_\alpha}; equivalently, {A_\alpha} is the union of {A_\alpha^m} over all standard {m}. This space has a Loeb measure {\mu_{O(A_\alpha)}}, defined by setting

\displaystyle \mu_{O(A_\alpha)}(E_\alpha) := \hbox{st} \frac{|E_\alpha|}{|A_\alpha|}

whenever {E_\alpha} is an internal subset of {A_\alpha^m} for any standard {m}, and extended to a countably additive measure; the arguments in Section 6 of this previous blog post can be easily modified to give a construction of this measure.

The Loeb measure {\mu_{O(A_\alpha)}} is a translation invariant measure on {O(A_{\alpha})}, normalised so that {A_\alpha} has Loeb measure one. As such, one should think of {O(A_\alpha)} as being analogous to a locally compact abelian group equipped with a Haar measure. It should be noted though that {O(A_\alpha)} is not actually a locally compact group with Haar measure, for two reasons:

  • There is not an obvious topology on {O(A_\alpha)} that makes it simultaneously locally compact, Hausdorff, and {\sigma}-compact. (One can get one or two out of three without difficulty, though.)
  • The addition operation {+\colon O(A_\alpha) \times O(A_\alpha) \rightarrow O(A_\alpha)} is not measurable from the product Loeb algebra {{\mathcal L}_{O(A_\alpha)} \times {\mathcal L}_{O(A_\alpha)}} to {{\mathcal L}_{O(\alpha)}}. Instead, it is measurable from the coarser Loeb algebra {{\mathcal L}_{O(A_\alpha) \times O(A_\alpha)}} to {{\mathcal L}_{O(\alpha)}} (compare with the analogous situation for nonstandard graphs).

Nevertheless, the analogy is a useful guide for the arguments that follow.

Let {L(O(A_\alpha))} denote the space of bounded Loeb measurable functions {f\colon O(A_\alpha) \rightarrow {\bf C}} (modulo almost everywhere equivalence) that are supported on {A_\alpha^m} for some standard {m}; this is a complex algebra with respect to pointwise multiplication. There is also a convolution operation {\star\colon L(O(A_\alpha)) \times L(O(A_\alpha)) \rightarrow L(O(A_\alpha))}, defined by setting

\displaystyle  \hbox{st} f \star \hbox{st} g(x) := \hbox{st} \frac{1}{|A_\alpha|} \sum_{y \in A_\alpha^m} f(y) g(x-y)

whenever {f\colon A_\alpha^m \rightarrow {}^* {\bf C}}, {g\colon A_\alpha^l \rightarrow {}^* {\bf C}} are bounded nonstandard functions (extended by zero to all of {O(A_\alpha)}), and then extending to arbitrary elements of {L(O(A_\alpha))} by density. Equivalently, {f \star g} is the pushforward of the {{\mathcal L}_{O(A_\alpha) \times O(A_\alpha)}}-measurable function {(x,y) \mapsto f(x) g(y)} under the map {(x,y) \mapsto x+y}.

The basic structural theorem is then as follows.

Theorem 1 (Kronecker factor) Let {A_\alpha} be an ultra approximate group. Then there exists a (standard) locally compact abelian group {G} of the form

\displaystyle  G = {\bf R}^d \times {\bf Z}^m \times T

for some standard {d,m} and some compact abelian group {T}, equipped with a Haar measure {\mu_G} and a measurable homomorphism {\pi\colon O(A_\alpha) \rightarrow G} (using the Loeb {\sigma}-algebra on {O(A_\alpha)} and the Borel {\sigma}-algebra on {G}), with the following properties:

  • (i) {\pi} has dense image, and {\mu_G} is the pushforward of Loeb measure {\mu_{O(A_\alpha)}} by {\pi}.
  • (ii) There exists sets {\{0\} \subset U_0 \subset K_0 \subset G} with {U_0} open and {K_0} compact, such that

    \displaystyle  \pi^{-1}(U_0) \subset 4A_\alpha \subset \pi^{-1}(K_0). \ \ \ \ \ (1)

  • (iii) Whenever {K \subset U \subset G} with {K} compact and {U} open, there exists a nonstandard finite set {B} such that

    \displaystyle  \pi^{-1}(K) \subset B \subset \pi^{-1}(U). \ \ \ \ \ (2)

  • (iv) If {f, g \in L}, then we have the convolution formula

    \displaystyle  f \star g = \pi^*( (\pi_* f) \star (\pi_* g) ) \ \ \ \ \ (3)

    where {\pi_* f,\pi_* g} are the pushforwards of {f,g} to {L^2(G, \mu_G)}, the convolution {\star} on the right-hand side is convolution using {\mu_G}, and {\pi^*} is the pullback map from {L^2(G,\mu_G)} to {L^2(O(A_\alpha), \mu_{O(A_\alpha)})}. In particular, if {\pi_* f = 0}, then {f*g=0} for all {g \in L}.

One can view the locally compact abelian group {G} as a “model “or “Kronecker factor” for the ultra approximate group {A_\alpha} (in close analogy with the Kronecker factor from ergodic theory). In the case that {A_\alpha} is a genuine nonstandard finite group rather than an ultra approximate group, the non-compact components {{\bf R}^d \times {\bf Z}^m} of the Kronecker group {G} are trivial, and this theorem was implicitly established by Szegedy. The compact group {T} is quite large, and in particular is likely to be inseparable; but as with the case of graphons, when one is only studying at most countably many functions {f}, one can cut down the size of this group to be separable (or equivalently, second countable or metrisable) if desired, so one often works with a “reduced Kronecker factor” which is a quotient of the full Kronecker factor {G}.

Given any sequence of uniformly bounded functions {f_n\colon A_n^m \rightarrow {\bf C}} for some fixed {m}, we can view the function {f \in L} defined by

\displaystyle  f := \pi_* \hbox{st} \lim_{n \rightarrow \alpha} f_n \ \ \ \ \ (4)

as an “additive limit” of the {f_n}, in much the same way that graphons {p\colon V_\alpha \times V_\alpha \rightarrow [0,1]} are limits of the indicator functions {1_{E_n}\colon V_n \times V_n \rightarrow \{0,1\}}. The additive limits capture some of the statistics of the {f_n}, for instance the normalised means

\displaystyle  \frac{1}{|A_n|} \sum_{x \in A_n^m} f_n(x)

converge (along the ultrafilter {\alpha}) to the mean

\displaystyle  \int_G f(x)\ d\mu_G(x),

and for three sequences {f_n,g_n,h_n\colon A_n^m \rightarrow {\bf C}} of functions, the normalised correlation

\displaystyle  \frac{1}{|A_n|^2} \sum_{x,y \in A_n^m} f_n(x) g_n(y) h_n(x+y)

converges along {\alpha} to the correlation

\displaystyle  \int_G \int_G f(x) g(y) h(x+y)\ d\mu_G(x) d\mu_G(y),

the normalised {U^2} Gowers norm

\displaystyle  ( \frac{1}{|A_n|^3} \sum_{x,y,z,w \in A_n^m: x+w=y+z} f_n(x) \overline{f_n(y)} \overline{f_n(z)} f_n(w))^{1/4}

converges along {\alpha} to the {U^2} Gowers norm

\displaystyle  ( \int_{G \times G \times G} f(x) \overline{f(y)} \overline{f(z)} f_n(x+y-z)\ d\mu_G(x) d\mu_G(y) d\mu_G(z))^{1/4}

and so forth. We caution however that some correlations that involve evaluating more than one function at the same point will not necessarily be preserved in the additive limit; for instance the normalised {\ell^2} norm

\displaystyle  (\frac{1}{|A_n|} \sum_{x \in A_n^m} |f_n(x)|^2)^{1/2}

does not necessarily converge to the {L^2} norm

\displaystyle  (\int_G |f(x)|^2\ d\mu_G(x))^{1/2},

but can converge instead to a larger quantity, due to the presence of the orthogonal projection {\pi_*} in the definition (4) of {f}.

An important special case of an additive limit occurs when the functions {f_n\colon A_n^m \rightarrow {\bf C}} involved are indicator functions {f_n = 1_{E_n}} of some subsets {E_n} of {A_n^m}. The additive limit {f \in L} does not necessarily remain an indicator function, but instead takes values in {[0,1]} (much as a graphon {p} takes values in {[0,1]} even though the original indicators {1_{E_n}} take values in {\{0,1\}}). The convolution {f \star f\colon G \rightarrow [0,1]} is then the ultralimit of the normalised convolutions {\frac{1}{|A_n|} 1_{E_n} \star 1_{E_n}}; in particular, the measure of the support of {f \star f} provides a lower bound on the limiting normalised cardinality {\frac{1}{|A_n|} |E_n + E_n|} of a sumset. In many situations this lower bound is an equality, but this is not necessarily the case, because the sumset {2E_n = E_n + E_n} could contain a large number of elements which have very few ({o(|A_n|)}) representations as the sum of two elements of {E_n}, and in the limit these portions of the sumset fall outside of the support of {f \star f}. (One can think of the support of {f \star f} as describing the “essential” sumset of {2E_n = E_n + E_n}, discarding those elements that have only very few representations.) Similarly for higher convolutions of {f}. Thus one can use additive limits to partially control the growth {k E_n} of iterated sumsets of subsets {E_n} of approximate groups {A_n}, in the regime where {k} stays bounded and {n} goes to infinity.

Theorem 1 can be proven by Fourier-analytic means (combined with Freiman’s theorem from additive combinatorics), and we will do so below the fold. For now, we give some illustrative examples of additive limits.

Example 1 (Bohr sets) We take {A_n} to be the intervals {A_n := \{ x \in {\bf Z}: |x| \leq N_n \}}, where {N_n} is a sequence going to infinity; these are {2}-approximate groups for all {n}. Let {\theta} be an irrational real number, let {I} be an interval in {{\bf R}/{\bf Z}}, and for each natural number {n} let {B_n} be the Bohr set

\displaystyle  B_n := \{ x \in A^{(n)}: \theta x \hbox{ mod } 1 \in I \}.

In this case, the (reduced) Kronecker factor {G} can be taken to be the infinite cylinder {{\bf R} \times {\bf R}/{\bf Z}} with the usual Lebesgue measure {\mu_G}. The additive limits of {1_{A_n}} and {1_{B_n}} end up being {1_A} and {1_B}, where {A} is the finite cylinder

\displaystyle  A := \{ (x,t) \in {\bf R} \times {\bf R}/{\bf Z}: x \in [-1,1]\}

and {B} is the rectangle

\displaystyle  B := \{ (x,t) \in {\bf R} \times {\bf R}/{\bf Z}: x \in [-1,1]; t \in I \}.

Geometrically, one should think of {A_n} and {B_n} as being wrapped around the cylinder {{\bf R} \times {\bf R}/{\bf Z}} via the homomorphism {x \mapsto (\frac{x}{N_n}, \theta x \hbox{ mod } 1)}, and then one sees that {B_n} is converging in some normalised weak sense to {B}, and similarly for {A_n} and {A}. In particular, the additive limit predicts the growth rate of the iterated sumsets {kB_n} to be quadratic in {k} until {k|I|} becomes comparable to {1}, at which point the growth transitions to linear growth, in the regime where {k} is bounded and {n} is large.

If {\theta = \frac{p}{q}} were rational instead of irrational, then one would need to replace {{\bf R}/{\bf Z}} by the finite subgroup {\frac{1}{q}{\bf Z}/{\bf Z}} here.

Example 2 (Structured subsets of progressions) We take {A_n} be the rank two progression

\displaystyle  A_n := \{ a + b N_n^2: a,b \in {\bf Z}; |a|, |b| \leq N_n \},

where {N_n} is a sequence going to infinity; these are {4}-approximate groups for all {n}. Let {B_n} be the subset

\displaystyle  B_n := \{ a + b N_n^2: a,b \in {\bf Z}; |a|^2 + |b|^2 \leq N_n^2 \}.

Then the (reduced) Kronecker factor can be taken to be {G = {\bf R}^2} with Lebesgue measure {\mu_G}, and the additive limits of the {1_{A_n}} and {1_{B_n}} are then {1_A} and {1_B}, where {A} is the square

\displaystyle  A := \{ (a,b) \in {\bf R}^2: |a|, |b| \leq 1 \}

and {B} is the circle

\displaystyle  B := \{ (a,b) \in {\bf R}^2: a^2+b^2 \leq 1 \}.

Geometrically, the picture is similar to the Bohr set one, except now one uses a Freiman homomorphism {a + b N_n^2 \mapsto (\frac{a}{N_n}, \frac{b}{N_n})} for {a,b = O( N_n )} to embed the original sets {A_n, B_n} into the plane {{\bf R}^2}. In particular, one now expects the growth rate of the iterated sumsets {k A_n} and {k B_n} to be quadratic in {k}, in the regime where {k} is bounded and {n} is large.

Example 3 (Dissociated sets) Let {d} be a fixed natural number, and take

\displaystyle  A_n = \{0, v_1,\dots,v_d,-v_1,\dots,-v_d \}

where {v_1,\dots,v_d} are randomly chosen elements of a large cyclic group {{\bf Z}/p_n{\bf Z}}, where {p_n} is a sequence of primes going to infinity. These are {O(d)}-approximate groups. The (reduced) Kronecker factor {G} can (almost surely) then be taken to be {{\bf Z}^d} with counting measure, and the additive limit of {1_{A_n}} is {1_A}, where {A = \{ 0, e_1,\dots,e_d,-e_1,\dots,-e_d\}} and {e_1,\dots,e_d} is the standard basis of {{\bf Z}^d}. In particular, the growth rates of {k A_n} should grow approximately like {k^d} for {k} bounded and {n} large.

Example 4 (Random subsets of groups) Let {A_n = G_n} be a sequence of finite additive groups whose order is going to infinity. Let {B_n} be a random subset of {G_n} of some fixed density {0 \leq \lambda \leq 1}. Then (almost surely) the Kronecker factor here can be reduced all the way to the trivial group {\{0\}}, and the additive limit of the {1_{B_n}} is the constant function {\lambda}. The convolutions {\frac{1}{|G_n|} 1_{B_n} * 1_{B_n}} then converge in the ultralimit (modulo almost everywhere equivalence) to the pullback of {\lambda^2}; this reflects the fact that {(1-o(1))|G_n|} of the elements of {G_n} can be represented as the sum of two elements of {B_n} in {(\lambda^2 + o(1)) |G_n|} ways. In particular, {B_n+B_n} occupies a proportion {1-o(1)} of {G_n}.

Example 5 (Trigonometric series) Take {A_n = G_n = {\bf Z}/p_n {\bf C}} for a sequence {p_n} of primes going to infinity, and for each {n} let {\xi_{n,1},\xi_{n,2},\dots} be an infinite sequence of frequencies chosen uniformly and independently from {{\bf Z}/p_n{\bf Z}}. Let {f_n\colon {\bf Z}/p_n{\bf Z} \rightarrow {\bf C}} denote the random trigonometric series

\displaystyle  f_n(x) := \sum_{j=1}^\infty 2^{-j} e^{2\pi i \xi_{n,j} x / p_n }.

Then (almost surely) we can take the reduced Kronecker factor {G} to be the infinite torus {({\bf R}/{\bf Z})^{\bf N}} (with the Haar probability measure {\mu_G}), and the additive limit of the {f_n} then becomes the function {f\colon ({\bf R}/{\bf Z})^{\bf N} \rightarrow {\bf R}} defined by the formula

\displaystyle  f( (x_j)_{j=1}^\infty ) := \sum_{j=1}^\infty e^{2\pi i x_j}.

In fact, the pullback {\pi^* f} is the ultralimit of the {f_n}. As such, for any standard exponent {1 \leq q < \infty}, the normalised {l^q} norm

\displaystyle  (\frac{1}{p_n} \sum_{x \in {\bf Z}/p_n{\bf Z}} |f_n(x)|^q)^{1/q}

can be seen to converge to the limit

\displaystyle  (\int_{({\bf R}/{\bf Z})^{\bf N}} |f(x)|^q\ d\mu_G(x))^{1/q}.

The reader is invited to consider combinations of the above examples, e.g. random subsets of Bohr sets, to get a sense of the general case of Theorem 1.

It is likely that this theorem can be extended to the noncommutative setting, using the noncommutative Freiman theorem of Emmanuel Breuillard, Ben Green, and myself, but I have not attempted to do so here (see though this recent preprint of Anush Tserunyan for some related explorations); in a separate direction, there should be extensions that can control higher Gowers norms, in the spirit of the work of Szegedy.

Note: the arguments below will presume some familiarity with additive combinatorics and with nonstandard analysis, and will be a little sketchy in places.

Read the rest of this entry »

Let {\bar{{\bf Q}}} be the algebraic closure of {{\bf Q}}, that is to say the field of algebraic numbers. We fix an embedding of {\bar{{\bf Q}}} into {{\bf C}}, giving rise to a complex absolute value {z \mapsto |z|} for algebraic numbers {z \in \bar{{\bf Q}}}.

Let {\alpha \in \bar{{\bf Q}}} be of degree {D > 1}, so that {\alpha} is irrational. A classical theorem of Liouville gives the quantitative bound

\displaystyle  |\alpha - \frac{p}{q}| \geq c \frac{1}{|q|^D} \ \ \ \ \ (1)

for the irrationality of {\alpha} fails to be approximated by rational numbers {p/q}, where {c>0} depends on {\alpha,D} but not on {p,q}. Indeed, if one lets {\alpha = \alpha_1, \alpha_2, \dots, \alpha_D} be the Galois conjugates of {\alpha}, then the quantity {\prod_{i=1}^D |q \alpha_i - p|} is a non-zero natural number divided by a constant, and so we have the trivial lower bound

\displaystyle  \prod_{i=1}^D |q \alpha_i - p| \geq c

from which the bound (1) easily follows. A well known corollary of the bound (1) is that Liouville numbers are automatically transcendental.

The famous theorem of Thue, Siegel and Roth improves the bound (1) to

\displaystyle  |\alpha - \frac{p}{q}| \geq c \frac{1}{|q|^{2+\epsilon}} \ \ \ \ \ (2)

for any {\epsilon>0} and rationals {\frac{p}{q}}, where {c>0} depends on {\alpha,\epsilon} but not on {p,q}. Apart from the {\epsilon} in the exponent and the implied constant, this bound is optimal, as can be seen from Dirichlet’s theorem. This theorem is a good example of the ineffectivity phenomenon that affects a large portion of modern number theory: the implied constant in the {\gg} notation is known to be finite, but there is no explicit bound for it in terms of the coefficients of the polynomial defining {\alpha} (in contrast to (1), for which an effective bound may be easily established). This is ultimately due to the reliance on the “dueling conspiracy” (or “repulsion phenomenon”) strategy. We do not as yet have a good way to rule out one counterexample to (2), in which {\frac{p}{q}} is far closer to {\alpha} than {\frac{1}{|q|^{2+\epsilon}}}; however we can rule out two such counterexamples, by playing them off of each other.

A powerful strengthening of the Thue-Siegel-Roth theorem is given by the subspace theorem, first proven by Schmidt and then generalised further by several authors. To motivate the theorem, first observe that the Thue-Siegel-Roth theorem may be rephrased as a bound of the form

\displaystyle  | \alpha p - \beta q | \times | \alpha' p - \beta' q | \geq c (1 + |p| + |q|)^{-\epsilon} \ \ \ \ \ (3)

for any algebraic numbers {\alpha,\beta,\alpha',\beta'} with {(\alpha,\beta)} and {(\alpha',\beta')} linearly independent (over the algebraic numbers), and any {(p,q) \in {\bf Z}^2} and {\epsilon>0}, with the exception when {\alpha,\beta} or {\alpha',\beta'} are rationally dependent (i.e. one is a rational multiple of the other), in which case one has to remove some lines (i.e. subspaces in {{\bf Q}^2}) of rational slope from the space {{\bf Z}^2} of pairs {(p,q)} to which the bound (3) does not apply (namely, those lines for which the left-hand side vanishes). Here {c>0} can depend on {\alpha,\beta,\alpha',\beta',\epsilon} but not on {p,q}. More generally, we have

Theorem 1 (Schmidt subspace theorem) Let {d} be a natural number. Let {L_1,\dots,L_d: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}} be linearly independent linear forms. Then for any {\epsilon>0}, one has the bound

\displaystyle  \prod_{i=1}^d |L_i(x)| \geq c (1 + \|x\| )^{-\epsilon}

for all {x \in {\bf Z}^d}, outside of a finite number of proper subspaces of {{\bf Q}^d}, where

\displaystyle  \| (x_1,\dots,x_d) \| := \max( |x_1|, \dots, |x_d| )

and {c>0} depends on {\epsilon, d} and the {\alpha_{i,j}}, but is independent of {x}.

Being a generalisation of the Thue-Siegel-Roth theorem, it is unsurprising that the known proofs of the subspace theorem are also ineffective with regards to the constant {c}. (However, the number of exceptional subspaces may be bounded effectively; cf. the situation with the Skolem-Mahler-Lech theorem, discussed in this previous blog post.) Once again, the lower bound here is basically sharp except for the {\epsilon} factor and the implied constant: given any {\delta_1,\dots,\delta_d > 0} with {\delta_1 \dots \delta_d = 1}, a simple volume packing argument (the same one used to prove the Dirichlet approximation theorem) shows that for any sufficiently large {N \geq 1}, one can find integers {x_1,\dots,x_d \in [-N,N]}, not all zero, such that

\displaystyle  |L_i(x)| \ll \delta_i

for all {i=1,\dots,d}. Thus one can get {\prod_{i=1}^d |L_i(x)|} comparable to {1} in many different ways.

There are important generalisations of the subspace theorem to other number fields than the rationals (and to other valuations than the Archimedean valuation {z \mapsto |z|}); we will develop one such generalisation below.

The subspace theorem is one of many finiteness theorems in Diophantine geometry; in this case, it is the number of exceptional subspaces which is finite. It turns out that finiteness theorems are very compatible with the language of nonstandard analysis. (See this previous blog post for a review of the basics of nonstandard analysis, and in particular for the nonstandard interpretation of asymptotic notation such as {\ll} and {o()}.) The reason for this is that a standard set {X} is finite if and only if it contains no strictly nonstandard elements (that is to say, elements of {{}^* X \backslash X}). This makes for a clean formulation of finiteness theorems in the nonstandard setting. For instance, the standard form of Bezout’s theorem asserts that if {P(x,y), Q(x,y)} are coprime polynomials over some field, then the curves {\{ (x,y): P(x,y) = 0\}} and {\{ (x,y): Q(x,y)=0\}} intersect in only finitely many points. The nonstandard version of this is then

Theorem 2 (Bezout’s theorem, nonstandard form) Let {P(x,y), Q(x,y)} be standard coprime polynomials. Then there are no strictly nonstandard solutions to {P(x,y)=Q(x,y)=0}.

Now we reformulate Theorem 1 in nonstandard language. We need a definition:

Definition 3 (General position) Let {K \subset L} be nested fields. A point {x = (x_1,\dots,x_d)} in {L^d} is said to be in {K}-general position if it is not contained in any hyperplane of {L^d} definable over {K}, or equivalently if one has

\displaystyle  a_1 x_1 + \dots + a_d x_d = 0 \iff a_1=\dots = a_d = 0

for any {a_1,\dots,a_d \in K}.

Theorem 4 (Schmidt subspace theorem, nonstandard version) Let {d} be a standard natural number. Let {L_1,\dots,L_d: \bar{{\bf Q}}^d \rightarrow \bar{{\bf Q}}} be linearly independent standard linear forms. Let {x \in {}^* {\bf Z}^d} be a tuple of nonstandard integers which is in {{\bf Q}}-general position (in particular, this forces {x} to be strictly nonstandard). Then one has

\displaystyle  \prod_{i=1}^d |L_i(x)| \gg \|x\|^{-o(1)},

where we extend {L_i} from {\bar{{\bf Q}}} to {{}^* \bar{{\bf Q}}} (and also similarly extend {\| \|} from {{\bf Z}^d} to {{}^* {\bf Z}^d}) in the usual fashion.

Observe that (as is usual when translating to nonstandard analysis) some of the epsilons and quantifiers that are present in the standard version become hidden in the nonstandard framework, being moved inside concepts such as “strictly nonstandard” or “general position”. We remark that as {x} is in {{\bf Q}}-general position, it is also in {\bar{{\bf Q}}}-general position (as an easy Galois-theoretic argument shows), and the requirement that the {L_1,\dots,L_d} are linearly independent is thus equivalent to {L_1(x),\dots,L_d(x)} being {\bar{{\bf Q}}}-linearly independent.

Exercise 1 Verify that Theorem 1 and Theorem 4 are equivalent. (Hint: there are only countably many proper subspaces of {{\bf Q}^d}.)

We will not prove the subspace theorem here, but instead focus on a particular application of the subspace theorem, namely to counting integer points on curves. In this paper of Corvaja and Zannier, the subspace theorem was used to give a new proof of the following basic result of Siegel:

Theorem 5 (Siegel’s theorem on integer points) Let {P \in {\bf Q}[x,y]} be an irreducible polynomial of two variables, such that the affine plane curve {C := \{ (x,y): P(x,y)=0\}} either has genus at least one, or has at least three points on the line at infinity, or both. Then {C} has only finitely many integer points {(x,y) \in {\bf Z}^2}.

This is a finiteness theorem, and as such may be easily converted to a nonstandard form:

Theorem 6 (Siegel’s theorem, nonstandard form) Let {P \in {\bf Q}[x,y]} be a standard irreducible polynomial of two variables, such that the affine plane curve {C := \{ (x,y): P(x,y)=0\}} either has genus at least one, or has at least three points on the line at infinity, or both. Then {C} does not contain any strictly nonstandard integer points {(x_*,y_*) \in {}^* {\bf Z}^2 \backslash {\bf Z}^2}.

Note that Siegel’s theorem can fail for genus zero curves that only meet the line at infinity at just one or two points; the key examples here are the graphs {\{ (x,y): y - f(x) = 0\}} for a polynomial {f \in {\bf Z}[x]}, and the Pell equation curves {\{ (x,y): x^2 - dy^2 = 1 \}}. Siegel’s theorem can be compared with the more difficult theorem of Faltings, which establishes finiteness of rational points (not just integer points), but now needs the stricter requirement that the curve {C} has genus at least two (to avoid the additional counterexample of elliptic curves of positive rank, which have infinitely many rational points).

The standard proofs of Siegel’s theorem rely on a combination of the Thue-Siegel-Roth theorem and a number of results on abelian varieties (notably the Mordell-Weil theorem). The Corvaja-Zannier argument rebalances the difficulty of the argument by replacing the Thue-Siegel-Roth theorem by the more powerful subspace theorem (in fact, they need one of the stronger versions of this theorem alluded to earlier), while greatly reducing the reliance on results on abelian varieties. Indeed, for curves with three or more points at infinity, no theory from abelian varieties is needed at all, while for the remaining cases, one mainly needs the existence of the Abel-Jacobi embedding, together with a relatively elementary theorem of Chevalley-Weil which is used in the proof of the Mordell-Weil theorem, but is significantly easier to prove.

The Corvaja-Zannier argument (together with several further applications of the subspace theorem) is presented nicely in this Bourbaki expose of Bilu. To establish the theorem in full generality requires a certain amount of algebraic number theory machinery, such as the theory of valuations on number fields, or of relative discriminants between such number fields. However, the basic ideas can be presented without much of this machinery by focusing on simple special cases of Siegel’s theorem. For instance, we can handle irreducible cubics that meet the line at infinity at exactly three points {[1,\alpha_1,0], [1,\alpha_2,0], [1,\alpha_3,0]}:

Theorem 7 (Siegel’s theorem with three points at infinity) Siegel’s theorem holds when the irreducible polynomial {P(x,y)} takes the form

\displaystyle  P(x,y) = (y - \alpha_1 x) (y - \alpha_2 x) (y - \alpha_3 x) + Q(x,y)

for some quadratic polynomial {Q \in {\bf Q}[x,y]} and some distinct algebraic numbers {\alpha_1,\alpha_2,\alpha_3}.

Proof: We use the nonstandard formalism. Suppose for sake of contradiction that we can find a strictly nonstandard integer point {(x_*,y_*) \in {}^* {\bf Z}^2 \backslash {\bf Z}^2} on a curve {C := \{ (x,y): P(x,y)=0\}} of the indicated form. As this point is infinitesimally close to the line at infinity, {y_*/x_*} must be infinitesimally close to one of {\alpha_1,\alpha_2,\alpha_3}; without loss of generality we may assume that {y_*/x_*} is infinitesimally close to {\alpha_1}.

We now use a version of the polynomial method, to find some polynomials of controlled degree that vanish to high order on the “arm” of the cubic curve {C} that asymptotes to {[1,\alpha_1,0]}. More precisely, let {D \geq 3} be a large integer (actually {D=3} will already suffice here), and consider the {\bar{{\bf Q}}}-vector space {V} of polynomials {R(x,y) \in \bar{{\bf Q}}[x,y]} of degree at most {D}, and of degree at most {2} in the {y} variable; this space has dimension {3D}. Also, as one traverses the arm {y/x \rightarrow \alpha_1} of {C}, any polynomial {R} in {V} grows at a rate of at most {D}, that is to say {R} has a pole of order at most {D} at the point at infinity {[1,\alpha_1,0]}. By performing Laurent expansions around this point (which is a non-singular point of {C}, as the {\alpha_i} are assumed to be distinct), we may thus find a basis {R_1, \dots, R_{3D}} of {V}, with the property that {R_j} has a pole of order at most {D+1-j} at {[1,\alpha_1,0]} for each {j=1,\dots,3D}.

From the control of the pole at {[1,\alpha_1,0]}, we have

\displaystyle  |R_j(x_*,y_*)| \ll (|x_*|+|y_*|)^{D+1-j}

for all {j=1,\dots,3D}. The exponents here become negative for {j > D+1}, and on multiplying them all together we see that

\displaystyle  \prod_{j=1}^{3D} |R_j(x_*,y_*)| \ll (|x_*|+|y_*|)^{3D(D+1) - \frac{3D(3D+1)}{2}}.

This exponent is negative for {D} large enough (or just take {D=3}). If we expand

\displaystyle  R_j(x_*,y_*) = \sum_{a+b \leq D; b \leq 2} \alpha_{j,a,b} x_*^a y_*^b

for some algebraic numbers {\alpha_{j,a,b}}, then we thus have

\displaystyle  \prod_{j=1}^{3D} |\sum_{a+b \leq D; b \leq 2} \alpha_{j,a,b} x_*^a y_*^b| \ll (|x_*|+|y_*|)^{-\epsilon}

for some standard {\epsilon>0}. Note that the {3D}-dimensional vectors {(\alpha_{j,a,b})_{a+b \leq D; b \leq 2}} are linearly independent in {{\bf C}^{3D}}, because the {R_j} are linearly independent in {V}. Applying the Schmidt subspace theorem in the contrapositive, we conclude that the {3D}-tuple {( x_*^a y_*^b )_{a+b \leq D; b \leq 2} \in {}^* {\bf Z}^{3D}} is not in {{\bf Q}}-general position. That is to say, one has a non-trivial constraint of the form

\displaystyle  \sum_{a+b \leq D; b \leq 2} c_{a,b} x_*^a y_*^b = 0 \ \ \ \ \ (4)

for some standard rational coefficients {c_{a,b}}, not all zero. But, as {P} is irreducible and cubic in {y}, it has no common factor with the standard polynomial {\sum_{a+b \leq D; b \leq 2} c_{a,b} x^a y^b}, so by Bezout’s theorem (Theorem 2) the constraint (4) only has standard solutions, contradicting the strictly nonstandard nature of {(x_*,y_*)}. \Box

Exercise 2 Rewrite the above argument so that it makes no reference to nonstandard analysis. (In this case, the rewriting is quite straightforward; however, there will be a subsequent argument in which the standard version is significantly messier than the nonstandard counterpart, which is the reason why I am working with the nonstandard formalism in this blog post.)

A similar argument works for higher degree curves that meet the line at infinity in three or more points, though if the curve has singularities at infinity then it becomes convenient to rely on the Riemann-Roch theorem to control the dimension of the analogue of the space {V}. Note that when there are only two or fewer points at infinity, though, one cannot get the negative exponent of {-\epsilon} needed to usefully apply the subspace theorem. To deal with this case we require some additional tricks. For simplicity we focus on the case of Mordell curves, although it will be convenient to work with more general number fields {{\bf Q} \subset K \subset \bar{{\bf Q}}} than the rationals:

Theorem 8 (Siegel’s theorem for Mordell curves) Let {k} be a non-zero integer. Then there are only finitely many integer solutions {(x,y) \in {\bf Z}^2} to {y^2 - x^3 = k}. More generally, for any number field {K}, and any nonzero {k \in K}, there are only finitely many algebraic integer solutions {(x,y) \in {\mathcal O}_K^2} to {y^2-x^3=k}, where {{\mathcal O}_K} is the ring of algebraic integers in {K}.

Again, we will establish the nonstandard version. We need some additional notation:

Definition 9

  • We define an almost rational integer to be a nonstandard {x \in {}^* {\bf Q}} such that {Mx \in {}^* {\bf Z}} for some standard positive integer {M}, and write {{\bf Q} {}^* {\bf Z}} for the {{\bf Q}}-algebra of almost rational integers.
  • If {K} is a standard number field, we define an almost {K}-integer to be a nonstandard {x \in {}^* K} such that {Mx \in {}^* {\mathcal O}_K} for some standard positive integer {M}, and write {K {}^* {\bf Z} = K {\mathcal O}_K} for the {K}-algebra of almost {K}-integers.
  • We define an almost algebraic integer to be a nonstandard {x \in {}^* {\bar Q}} such that {Mx} is a nonstandard algebraic integer for some standard positive integer {M}, and write {\bar{{\bf Q}} {}^* {\bf Z}} for the {\bar{{\bf Q}}}-algebra of almost algebraic integers.
  • Theorem 10 (Siegel for Mordell, nonstandard version) Let {k} be a non-zero standard algebraic number. Then the curve {\{ (x,y): y^2 - x^3 = k \}} does not contain any strictly nonstandard almost algebraic integer point.

    Another way of phrasing this theorem is that if {x,y} are strictly nonstandard almost algebraic integers, then {y^2-x^3} is either strictly nonstandard or zero.

    Exercise 3 Verify that Theorem 8 and Theorem 10 are equivalent.

    Due to all the ineffectivity, our proof does not supply any bound on the solutions {x,y} in terms of {k}, even if one removes all references to nonstandard analysis. It is a conjecture of Hall (a special case of the notorious ABC conjecture) that one has the bound {|x| \ll_\epsilon |k|^{2+\epsilon}} for all {\epsilon>0} (or equivalently {|y| \ll_\epsilon |k|^{3+\epsilon}}), but even the weaker conjecture that {x,y} are of polynomial size in {k} is open. (The best known bounds are of exponential nature, and are proven using a version of Baker’s method: see for instance this text of Sprindzuk.)

    A direct repetition of the arguments used to prove Theorem 7 will not work here, because the Mordell curve {\{ (x,y): y^2 - x^3 = k \}} only hits the line at infinity at one point, {[0,1,0]}. To get around this we will exploit the fact that the Mordell curve is an elliptic curve and thus has a group law on it. We will then divide all the integer points on this curve by two; as elliptic curves have four 2-torsion points, this will end up placing us in a situation like Theorem 7, with four points at infinity. However, there is an obstruction: it is not obvious that dividing an integer point on the Mordell curve by two will produce another integer point. However, this is essentially true (after enlarging the ring of integers slightly) thanks to a general principle of Chevalley and Weil, which can be worked out explicitly in the case of division by two on Mordell curves by relatively elementary means (relying mostly on unique factorisation of ideals of algebraic integers). We give the details below the fold.

    Read the rest of this entry »

    There are a number of ways to construct the real numbers {{\bf R}}, for instance

    • as the metric completion of {{\bf Q}} (thus, {{\bf R}} is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
    • as the space of Dedekind cuts on the rationals {{\bf Q}};
    • as the space of quasimorphisms {\phi: {\bf Z} \rightarrow {\bf Z}} on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)

    There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number {N \in {}^* {\bf N} \backslash {\bf N}}, one can define two external additive subgroups of the nonstandard integers {{}^* {\bf Z}}:

    • The group {O(N) := \{ n \in {}^* {\bf Z}: |n| \leq CN \hbox{ for some } C \in {\bf N} \}} of all nonstandard integers of magnitude less than or comparable to {N}; and
    • The group {o(N) := \{ n \in {}^* {\bf Z}: |n| \leq C^{-1} N \hbox{ for all } C \in {\bf N} \}} of nonstandard integers of magnitude infinitesimally smaller than {N}.

    The group {o(N)} is a subgroup of {O(N)}, so we may form the quotient group {O(N)/o(N)}. This space is isomorphic to the reals {{\bf R}}, and can in fact be used to construct the reals:

    Proposition 1 For any coset {n + o(N)} of {O(N)/o(N)}, there is a unique real number {\hbox{st} \frac{n}{N}} with the property that {\frac{n}{N} = \hbox{st} \frac{n}{N} + o(1)}. The map {n + o(N) \mapsto \hbox{st} \frac{n}{N}} is then an isomorphism between the additive groups {O(N)/o(N)} and {{\bf R}}.

    Proof: Uniqueness is clear. For existence, observe that the set {\{ x \in {\bf R}: Nx \leq n + o(N) \}} is a Dedekind cut, and its supremum can be verified to have the required properties for {\hbox{st} \frac{n}{N}}. \Box

    In a similar vein, we can view the unit interval {[0,1]} in the reals as the quotient

    \displaystyle  [0,1] \equiv [N] / o(N) \ \ \ \ \ (1)

    where {[N]} is the nonstandard (i.e. internal) set {\{ n \in {\bf N}: n \leq N \}}; of course, {[N]} is not a group, so one should interpret {[N]/o(N)} as the image of {[N]} under the quotient map {{}^* {\bf Z} \rightarrow {}^* {\bf Z} / o(N)} (or {O(N) \rightarrow O(N)/o(N)}, if one prefers). Or to put it another way, (1) asserts that {[0,1]} is the image of {[N]} with respect to the map {\pi: n \mapsto \hbox{st} \frac{n}{N}}.

    In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on {[N]}. Given an internal subset {A} of {[N]}, we may define the elementary measure {\mu_0(A)} of {A} by the formula

    \displaystyle  \mu_0(A) := \hbox{st} \frac{|A|}{N}.

    This is a finitely additive probability measure on the Boolean algebra of internal subsets of {[N]}. We can then construct the Loeb outer measure {\mu^*(A)} of any subset {A \subset [N]} in complete analogy with Lebesgue outer measure by the formula

    \displaystyle  \mu^*(A) := \inf \sum_{n=1}^\infty \mu_0(A_n)

    where {(A_n)_{n=1}^\infty} ranges over all sequences of internal subsets of {[N]} that cover {A}. We say that a subset {A} of {[N]} is Loeb measurable if, for any (standard) {\epsilon>0}, one can find an internal subset {B} of {[N]} which differs from {A} by a set of Loeb outer measure at most {\epsilon}, and in that case we define the Loeb measure {\mu(A)} of {A} to be {\mu^*(A)}. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space {{\mathcal L}} of Loeb measurable sets is a {\sigma}-algebra, and that {\mu} is a countably additive probability measure on this space that extends the elementary measure {\mu_0}. Thus {[N]} now has the structure of a probability space {([N], {\mathcal L}, \mu)}.

    Now, the group {o(N)} acts (Loeb-almost everywhere) on the probability space {[N]} by the addition map, thus {T^h n := n+h} for {n \in [N]} and {h \in o(N)} (excluding a set of Loeb measure zero where {n+h} exits {[N]}). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor {Z^0_{o(N)}([N]) = ([N], {\mathcal L}^{o(N)}, \mu\downharpoonright_{{\mathcal L}^{o(N)}})}, defined by restricting attention to those Loeb measurable sets {A \subset [N]} with the property that {T^h A} is equal {\mu}-almost everywhere to {A} for each {h \in o(N)}.

    The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval {[0,1]} with Lebesgue measure {m} (and the trivial action of {o(N)}), by the same factor map {\pi: n \mapsto \hbox{st} \frac{n}{N}} used in (1). More precisely:

    Theorem 2 Given a set {A \in {\mathcal L}^{o(N)}}, there exists a Lebesgue measurable set {B \subset [0,1]}, unique up to {m}-a.e. equivalence, such that {A} is {\mu}-a.e. equivalent to the set {\pi^{-1}(B) := \{ n \in [N]: \hbox{st} \frac{n}{N} \in B \}}. Conversely, if {B \in [0,1]} is Lebesgue measurable, then {\pi^{-1}(B)} is in {{\mathcal L}^{o(N)}}, and {\mu( \pi^{-1}(B) ) = m( B )}.

    More informally, we have the measure-theoretic version

    \displaystyle  [0,1] \equiv Z^0_{o(N)}( [N] )

    of (1).

    Proof: We first prove the converse. It is clear that {\pi^{-1}(B)} is {o(N)}-invariant, so it suffices to show that {\pi^{-1}(B)} is Loeb measurable with Loeb measure {m(B)}. This is easily verified when {B} is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that Loeb outer measure of {\pi^{-1}(E)} is bounded by the Lebesgue outer measure of {E} for any set {E \subset [0,1]}; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.

    Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let {A \in {\mathcal L}^{o(N)}}. Let {\epsilon>0} be an arbitrary standard real number, then we can find an internal set {A_\epsilon \subset [N]} which differs from {A} by a set of Loeb measure at most {\epsilon}. As {A} is {o(N)}-invariant, we conclude that for every {h \in o(N)}, {A_\epsilon} and {T^h A_\epsilon} differ by a set of Loeb measure (and hence elementary measure) at most {2\epsilon}. By the (contrapositive of the) underspill principle, there must exist a standard {\delta>0} such that {A_\epsilon} and {T^h A_\epsilon} differ by a set of elementary measure at most {2\epsilon} for all {|h| \leq \delta N}. If we then define the nonstandard function {f_\epsilon: [N] \rightarrow {}^* {\bf R}} by the formula

    \displaystyle  f(n) := \hbox{st} \frac{1}{\delta N} \sum_{m \in [N]: m \leq n \leq m+\delta N} 1_{A_\epsilon}(m),

    then from the (nonstandard) triangle inequality we have

    \displaystyle  \frac{1}{N} \sum_{n \in [N]} |f(n) - 1_{A_\epsilon}(n)| \leq 3\epsilon

    (say). On the other hand, {f} has the Lipschitz continuity property

    \displaystyle  |f(n)-f(m)| \leq \frac{2|n-m|}{\delta N}

    and so in particular we see that

    \displaystyle  \hbox{st} f(n) = \tilde f( \hbox{st} \frac{n}{N} )

    for some Lipschitz continuous function {\tilde f: [0,1] \rightarrow [0,1]}. If we then let {E_\epsilon} be the set where {\tilde f \geq 1 - \sqrt{\epsilon}}, one can check that {A_\epsilon} differs from {\pi^{-1}(E_\epsilon)} by a set of Loeb outer measure {O(\sqrt{\epsilon})}, and hence {A} does so also. Sending {\epsilon} to zero, we see (from the converse claim) that {1_{E_\epsilon}} is a Cauchy sequence in {L^1} and thus converges in {L^1} for some Lebesgue measurable {E}. The sets {A_\epsilon} then converge in Loeb outer measure to {\pi^{-1}(E)}, giving the claim. \Box

    Thanks to the Lebesgue differentiation theorem, the conditional expectation {{\bf E}( f | Z^0_{o(N)}([N]))} of a bounded Loeb-measurable function {f: [N] \rightarrow {\bf R}} can be expressed (as a function on {[0,1]}, defined {m}-a.e.) as

    \displaystyle  {\bf E}( f | Z^0_{o(N)}([N]))(x) := \lim_{\epsilon \rightarrow 0} \frac{1}{2\epsilon} \int_{[x-\epsilon N,x+\epsilon N]} f\ d\mu.

    By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts {T^h f}, {h = o(N)} of minimal {L^2} norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages {\frac{1}{H} \sum_{h=1}^H T^h f} for {H=O(N)} converge (as a net, rather than a sequence) in {L^2} to {{\bf E}( f | Z^0_{o(N)}([N]))}.

    If {f: [N] \rightarrow [-1,1]} is (the standard part of) an internal function, that is to say the ultralimit of a sequence {f_n: [N_n] \rightarrow [-1,1]} of finitary bounded functions, one can view the measurable function {F := {\bf E}( f | Z^0_{o(N)}([N]))} as a limit of the {f_n} that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function {F: [0,1] \rightarrow [-1,1]} is related to the discrete functions {f_n: [N_n] \rightarrow [-1,1]} by the formula

    \displaystyle  \int_a^b F(x)\ dx = \hbox{st} \lim_{n \rightarrow p} \frac{1}{N_n} \sum_{a N_n \leq m \leq b N_n} f_n(m)

    for all {0 \leq a < b \leq 1}, where {p} is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence {n_j} such that

    \displaystyle  \int_a^b F(x)\ dx = \lim_{j \rightarrow \infty} \frac{1}{N_{n_j}} \sum_{a N_{n_j} \leq m \leq b N_{n_j}} f_n(m),

    thus {F} is the asymptotic density function of the {f_n}. For instance, if {f_n} is the indicator function of a randomly chosen subset of {[N_n]}, then the asymptotic density function would equal {1/2} (almost everywhere, at least).

    I’m continuing to look into understanding the ergodic theory of {o(N)} actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter {N}, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.

    (This is an extended blog post version of my talk “Ultraproducts as a Bridge Between Discrete and Continuous Analysis” that I gave at the Simons institute for the theory of computing at the workshop “Neo-Classical methods in discrete analysis“. Some of the material here is drawn from previous blog posts, notably “Ultraproducts as a bridge between hard analysis and soft analysis” and “Ultralimit analysis and quantitative algebraic geometry“‘. The text here has substantially more details than the talk; one may wish to skip all of the proofs given here to obtain a closer approximation to the original talk.)

    Discrete analysis, of course, is primarily interested in the study of discrete (or “finitary”) mathematical objects: integers, rational numbers (which can be viewed as ratios of integers), finite sets, finite graphs, finite or discrete metric spaces, and so forth. However, many powerful tools in mathematics (e.g. ergodic theory, measure theory, topological group theory, algebraic geometry, spectral theory, etc.) work best when applied to continuous (or “infinitary”) mathematical objects: real or complex numbers, manifolds, algebraic varieties, continuous topological or metric spaces, etc. In order to apply results and ideas from continuous mathematics to discrete settings, there are basically two approaches. One is to directly discretise the arguments used in continuous mathematics, which often requires one to keep careful track of all the bounds on various quantities of interest, particularly with regard to various error terms arising from discretisation which would otherwise have been negligible in the continuous setting. The other is to construct continuous objects as limits of sequences of discrete objects of interest, so that results from continuous mathematics may be applied (often as a “black box”) to the continuous limit, which then can be used to deduce consequences for the original discrete objects which are quantitative (though often ineffectively so). The latter approach is the focus of this current talk.

    The following table gives some examples of a discrete theory and its continuous counterpart, together with a limiting procedure that might be used to pass from the former to the latter:

    (Discrete) (Continuous) (Limit method)
    Ramsey theory Topological dynamics Compactness
    Density Ramsey theory Ergodic theory Furstenberg correspondence principle
    Graph/hypergraph regularity Measure theory Graph limits
    Polynomial regularity Linear algebra Ultralimits
    Structural decompositions Hilbert space geometry Ultralimits
    Fourier analysis Spectral theory Direct and inverse limits
    Quantitative algebraic geometry Algebraic geometry Schemes
    Discrete metric spaces Continuous metric spaces Gromov-Hausdorff limits
    Approximate group theory Topological group theory Model theory

    As the above table illustrates, there are a variety of different ways to form a limiting continuous object. Roughly speaking, one can divide limits into three categories:

    • Topological and metric limits. These notions of limits are commonly used by analysts. Here, one starts with a sequence (or perhaps a net) of objects {x_n} in a common space {X}, which one then endows with the structure of a topological space or a metric space, by defining a notion of distance between two points of the space, or a notion of open neighbourhoods or open sets in the space. Provided that the sequence or net is convergent, this produces a limit object {\lim_{n \rightarrow \infty} x_n}, which remains in the same space, and is “close” to many of the original objects {x_n} with respect to the given metric or topology.
    • Categorical limits. These notions of limits are commonly used by algebraists. Here, one starts with a sequence (or more generally, a diagram) of objects {x_n} in a category {X}, which are connected to each other by various morphisms. If the ambient category is well-behaved, one can then form the direct limit {\varinjlim x_n} or the inverse limit {\varprojlim x_n} of these objects, which is another object in the same category {X}, and is connected to the original objects {x_n} by various morphisms.
    • Logical limits. These notions of limits are commonly used by model theorists. Here, one starts with a sequence of objects {x_{\bf n}} or of spaces {X_{\bf n}}, each of which is (a component of) a model for given (first-order) mathematical language (e.g. if one is working in the language of groups, {X_{\bf n}} might be groups and {x_{\bf n}} might be elements of these groups). By using devices such as the ultraproduct construction, or the compactness theorem in logic, one can then create a new object {\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}} or a new space {\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}, which is still a model of the same language (e.g. if the spaces {X_{\bf n}} were all groups, then the limiting space {\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}} will also be a group), and is “close” to the original objects or spaces in the sense that any assertion (in the given language) that is true for the limiting object or space, will also be true for many of the original objects or spaces, and conversely. (For instance, if {\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}} is an abelian group, then the {X_{\bf n}} will also be abelian groups for many {{\bf n}}.)

    The purpose of this talk is to highlight the third type of limit, and specifically the ultraproduct construction, as being a “universal” limiting procedure that can be used to replace most of the limits previously mentioned. Unlike the topological or metric limits, one does not need the original objects {x_{\bf n}} to all lie in a common space {X} in order to form an ultralimit {\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}; they are permitted to lie in different spaces {X_{\bf n}}; this is more natural in many discrete contexts, e.g. when considering graphs on {{\bf n}} vertices in the limit when {{\bf n}} goes to infinity. Also, no convergence properties on the {x_{\bf n}} are required in order for the ultralimit to exist. Similarly, ultraproduct limits differ from categorical limits in that no morphisms between the various spaces {X_{\bf n}} involved are required in order to construct the ultraproduct.

    With so few requirements on the objects {x_{\bf n}} or spaces {X_{\bf n}}, the ultraproduct construction is necessarily a very “soft” one. Nevertheless, the construction has two very useful properties which make it particularly useful for the purpose of extracting good continuous limit objects out of a sequence of discrete objects. First of all, there is Łos’s theorem, which roughly speaking asserts that any first-order sentence which is asymptotically obeyed by the {x_{\bf n}}, will be exactly obeyed by the limit object {\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}; in particular, one can often take a discrete sequence of “partial counterexamples” to some assertion, and produce a continuous “complete counterexample” that same assertion via an ultraproduct construction; taking the contrapositives, one can often then establish a rigorous equivalence between a quantitative discrete statement and its qualitative continuous counterpart. Secondly, there is the countable saturation property that ultraproducts automatically enjoy, which is a property closely analogous to that of compactness in topological spaces, and can often be used to ensure that the continuous objects produced by ultraproduct methods are “complete” or “compact” in various senses, which is particularly useful in being able to upgrade qualitative (or “pointwise”) bounds to quantitative (or “uniform”) bounds, more or less “for free”, thus reducing significantly the burden of “epsilon management” (although the price one pays for this is that one needs to pay attention to which mathematical objects of study are “standard” and which are “nonstandard”). To achieve this compactness or completeness, one sometimes has to restrict to the “bounded” portion of the ultraproduct, and it is often also convenient to quotient out the “infinitesimal” portion in order to complement these compactness properties with a matching “Hausdorff” property, thus creating familiar examples of continuous spaces, such as locally compact Hausdorff spaces.

    Ultraproducts are not the only logical limit in the model theorist’s toolbox, but they are one of the simplest to set up and use, and already suffice for many of the applications of logical limits outside of model theory. In this post, I will set out the basic theory of these ultraproducts, and illustrate how they can be used to pass between discrete and continuous theories in each of the examples listed in the above table.

    Apart from the initial “one-time cost” of setting up the ultraproduct machinery, the main loss one incurs when using ultraproduct methods is that it becomes very difficult to extract explicit quantitative bounds from results that are proven by transferring qualitative continuous results to the discrete setting via ultraproducts. However, in many cases (particularly those involving regularity-type lemmas) the bounds are already of tower-exponential type or worse, and there is arguably not much to be lost by abandoning the explicit quantitative bounds altogether.

    Read the rest of this entry »

    The rectification principle in arithmetic combinatorics asserts, roughly speaking, that very small subsets (or, alternatively, small structured subsets) of an additive group or a field of large characteristic can be modeled (for the purposes of arithmetic combinatorics) by subsets of a group or field of zero characteristic, such as the integers {{\bf Z}} or the complex numbers {{\bf C}}. The additive form of this principle is known as the Freiman rectification principle; it has several formulations, going back of course to the original work of Freiman. Here is one formulation as given by Bilu, Lev, and Ruzsa:

    Proposition 1 (Additive rectification) Let {A} be a subset of the additive group {{\bf Z}/p{\bf Z}} for some prime {p}, and let {s \geq 1} be an integer. Suppose that {|A| \leq \log_{2s} p}. Then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the integers which is a Freiman isomorphism of order {s} in the sense that for any {x_1,\ldots,x_s,y_1,\ldots,y_s \in A}, one has

    \displaystyle  x_1+\ldots+x_s = y_1+\ldots+y_s

    if and only if

    \displaystyle  \phi(x_1)+\ldots+\phi(x_s) = \phi(y_1)+\ldots+\phi(y_s).

    Furthermore {\phi} is a right-inverse of the obvious projection homomorphism from {{\bf Z}} to {{\bf Z}/p{\bf Z}}.

    The original version of the rectification principle allowed the sets involved to be substantially larger in size (cardinality up to a small constant multiple of {p}), but with the additional hypothesis of bounded doubling involved; see the above-mentioned papers, as well as this later paper of Green and Ruzsa, for further discussion.

    The proof of Proposition 1 is quite short (see Theorem 3.1 of Bilu-Lev-Ruzsa); the main idea is to use Minkowski’s theorem to find a non-trivial dilate {aA} of {A} that is contained in a small neighbourhood of the origin in {{\bf Z}/p{\bf Z}}, at which point the rectification map {\phi} can be constructed by hand.

    Very recently, Codrut Grosu obtained an arithmetic analogue of the above theorem, in which the rectification map {\phi} preserves both additive and multiplicative structure:

    Theorem 2 (Arithmetic rectification) Let {A} be a subset of the finite field {{\bf F}_p} for some prime {p \geq 3}, and let {s \geq 1} be an integer. Suppose that {|A| < \log_2 \log_{2s} \log_{2s^2} p - 1}. Then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the complex numbers which is a Freiman field isomorphism of order {s} in the sense that for any {x_1,\ldots,x_n \in A} and any polynomial {P(x_1,\ldots,x_n)} of degree at most {s} and integer coefficients of magnitude summing to at most {s}, one has

    \displaystyle  P(x_1,\ldots,x_n)=0

    if and only if

    \displaystyle  P(\phi(x_1),\ldots,\phi(x_n))=0.

    Note that it is necessary to use an algebraically closed field such as {{\bf C}} for this theorem, in contrast to the integers used in Proposition 1, as {{\bf F}_p} can contain objects such as square roots of {-1} which can only map to {\pm i} in the complex numbers (once {s} is at least {2}).

    Using Theorem 2, one can transfer results in arithmetic combinatorics (e.g. sum-product or Szemerédi-Trotter type theorems) regarding finite subsets of {{\bf C}} to analogous results regarding sufficiently small subsets of {{\bf F}_p}; see the paper of Grosu for several examples of this. This should be compared with the paper of Vu, Wood, and Wood, which introduces a converse principle that embeds finite subsets of {{\bf C}} (or more generally, a characteristic zero integral domain) in a Freiman field-isomorphic fashion into finite subsets of {{\bf F}_p} for arbitrarily large primes {p}, allowing one to transfer arithmetic combinatorical facts from the latter setting to the former.

    Grosu’s argument uses some quantitative elimination theory, and in particular a quantitative variant of a lemma of Chang that was discussed previously on this blog. In that previous blog post, it was observed that (an ineffective version of) Chang’s theorem could be obtained using only qualitative algebraic geometry (as opposed to quantitative algebraic geometry tools such as elimination theory results with explicit bounds) by means of nonstandard analysis (or, in what amounts to essentially the same thing in this context, the use of ultraproducts). One can then ask whether one can similarly establish an ineffective version of Grosu’s result by nonstandard means. The purpose of this post is to record that this can indeed be done without much difficulty, though the result obtained, being ineffective, is somewhat weaker than that in Theorem 2. More precisely, we obtain

    Theorem 3 (Ineffective arithmetic rectification) Let {s, n \geq 1}. Then if {{\bf F}} is a field of characteristic at least {C_{s,n}} for some {C_{s,n}} depending on {s,n}, and {A} is a subset of {{\bf F}} of cardinality {n}, then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the complex numbers which is a Freiman field isomorphism of order {s}.

    Our arguments will not provide any effective bound on the quantity {C_{s,n}} (though one could in principle eventually extract such a bound by deconstructing the proof of Proposition 4 below), making this result weaker than Theorem 2 (save for the minor generalisation that it can handle fields of prime power order as well as fields of prime order as long as the characteristic remains large).

    Following the principle that ultraproducts can be used as a bridge to connect quantitative and qualitative results (as discussed in these previous blog posts), we will deduce Theorem 3 from the following (well-known) qualitative version:

    Proposition 4 (Baby Lefschetz principle) Let {k} be a field of characteristic zero that is finitely generated over the rationals. Then there is an isomorphism {\phi: k \rightarrow \phi(k)} from {k} to a subfield {\phi(k)} of {{\bf C}}.

    This principle (first laid out in an appendix of Lefschetz’s book), among other things, often allows one to use the methods of complex analysis (e.g. Riemann surface theory) to study many other fields of characteristic zero. There are many variants and extensions of this principle; see for instance this MathOverflow post for some discussion of these. I used this baby version of the Lefschetz principle recently in a paper on expanding polynomial maps.

    Proof: We give two proofs of this fact, one using transcendence bases and the other using Hilbert’s nullstellensatz.

    We begin with the former proof. As {k} is finitely generated over {{\bf Q}}, it has finite transcendence degree, thus one can find algebraically independent elements {x_1,\ldots,x_m} of {k} over {{\bf Q}} such that {k} is a finite extension of {{\bf Q}(x_1,\ldots,x_m)}, and in particular by the primitive element theorem {k} is generated by {{\bf Q}(x_1,\ldots,x_m)} and an element {\alpha} which is algebraic over {{\bf Q}(x_1,\ldots,x_m)}. (Here we use the fact that characteristic zero fields are separable.) If we then define {\phi} by first mapping {x_1,\ldots,x_m} to generic (and thus algebraically independent) complex numbers {z_1,\ldots,z_m}, and then setting {\phi(\alpha)} to be a complex root of of the minimal polynomial for {\alpha} over {{\bf Q}(x_1,\ldots,x_m)} after replacing each {x_i} with the complex number {z_i}, we obtain a field isomorphism {\phi: k \rightarrow \phi(k)} with the required properties.

    Now we give the latter proof. Let {x_1,\ldots,x_m} be elements of {k} that generate that field over {{\bf Q}}, but which are not necessarily algebraically independent. Our task is then equivalent to that of finding complex numbers {z_1,\ldots,z_m} with the property that, for any polynomial {P(x_1,\ldots,x_m)} with rational coefficients, one has

    \displaystyle  P(x_1,\ldots,x_m) = 0

    if and only if

    \displaystyle  P(z_1,\ldots,z_m) = 0.

    Let {{\mathcal P}} be the collection of all polynomials {P} with rational coefficients with {P(x_1,\ldots,x_m)=0}, and {{\mathcal Q}} be the collection of all polynomials {P} with rational coefficients with {P(x_1,\ldots,x_m) \neq 0}. The set

    \displaystyle  S := \{ (z_1,\ldots,z_m) \in {\bf C}^m: P(z_1,\ldots,z_m)=0 \hbox{ for all } P \in {\mathcal P} \}

    is the intersection of countably many algebraic sets and is thus also an algebraic set (by the Hilbert basis theorem or the Noetherian property of algebraic sets). If the desired claim failed, then {S} could be covered by the algebraic sets {\{ (z_1,\ldots,z_m) \in {\bf C}^m: Q(z_1,\ldots,z_m) = 0 \}} for {Q \in {\mathcal Q}}. By decomposing into irreducible varieties and observing (e.g. from the Baire category theorem) that a variety of a given dimension over {{\bf C}} cannot be covered by countably many varieties of smaller dimension, we conclude that {S} must in fact be covered by a finite number of such sets, thus

    \displaystyle  S \subset \bigcup_{i=1}^n \{ (z_1,\ldots,z_m) \in {\bf C}^m: Q_i(z_1,\ldots,z_m) = 0 \}

    for some {Q_1,\ldots,Q_n \in {\bf C}^m}. By the nullstellensatz, we thus have an identity of the form

    \displaystyle  (Q_1 \ldots Q_n)^l = P_1 R_1 + \ldots + P_r R_r

    for some natural numbers {l,r \geq 1}, polynomials {P_1,\ldots,P_r \in {\mathcal P}}, and polynomials {R_1,\ldots,R_r} with coefficients in {\overline{{\bf Q}}}. In particular, this identity also holds in the algebraic closure {\overline{k}} of {k}. Evaluating this identity at {(x_1,\ldots,x_m)} we see that the right-hand side is zero but the left-hand side is non-zero, a contradiction, and the claim follows. \Box

    From Proposition 4 one can now deduce Theorem 3 by a routine ultraproduct argument (the same one used in these previous blog posts). Suppose for contradiction that Theorem 3 fails. Then there exists natural numbers {s,n \geq 1}, a sequence of finite fields {{\bf F}_i} of characteristic at least {i}, and subsets {A_i=\{a_{i,1},\ldots,a_{i,n}\}} of {{\bf F}_i} of cardinality {n} such that for each {i}, there does not exist a Freiman field isomorphism of order {s} from {A_i} to the complex numbers. Now we select a non-principal ultrafilter {\alpha \in \beta {\bf N} \backslash {\bf N}}, and construct the ultraproduct {{\bf F} := \prod_{i \rightarrow \alpha} {\bf F}_i} of the finite fields {{\bf F}_i}. This is again a field (and is a basic example of what is known as a pseudo-finite field); because the characteristic of {{\bf F}_i} goes to infinity as {i \rightarrow \infty}, it is easy to see (using Los’s theorem) that {{\bf F}} has characteristic zero and can thus be viewed as an extension of the rationals {{\bf Q}}.

    Now let {a_j := \lim_{i \rightarrow \alpha} a_{i,j}} be the ultralimit of the {a_{i,j}}, so that {A := \{a_1,\ldots,a_n\}} is the ultraproduct of the {A_i}, then {A} is a subset of {{\bf F}} of cardinality {n}. In particular, if {k} is the field generated by {{\bf Q}} and {A}, then {k} is a finitely generated extension of the rationals and thus, by Proposition 4 there is an isomorphism {\phi: k \rightarrow \phi(k)} from {k} to a subfield {\phi(k)} of the complex numbers. In particular, {\phi(a_1),\ldots,\phi(a_n)} are complex numbers, and for any polynomial {P(x_1,\ldots,x_n)} with integer coefficients, one has

    \displaystyle  P(a_1,\ldots,a_n) = 0

    if and only if

    \displaystyle  P(\phi(a_1),\ldots,\phi(a_n)) = 0.

    By Los’s theorem, we then conclude that for all {i} sufficiently close to {\alpha}, one has for all polynomials {P(x_1,\ldots,x_n)} of degree at most {s} and whose coefficients are integers whose magnitude sums up to {s}, one has

    \displaystyle  P(a_{i,1},\ldots,a_{i,n}) = 0

    if and only if

    \displaystyle  P(\phi(a_1),\ldots,\phi(a_n)) = 0.

    But this gives a Freiman field isomorphism of order {s} between {A_i} and {\phi(A)}, contradicting the construction of {A_i}, and Theorem 3 follows.

    Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).

    Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

    Theorem 1 (Mean ergodic theorem) Let {(X,\mu,T)} be a measure-preserving system (a probability space {(X,\mu)} with an invertible measure-preserving transformation {T}). Then for any {f \in L^2(X,\mu)}, the averages {\frac{1}{N} \sum_{n=1}^N T^n f} converge in {L^2(X,\mu)} norm as {N \rightarrow \infty}, where {T^n f(x) := f(T^{-n} x)}.

    In this post, all functions in {L^2(X,\mu)} and similar spaces will be taken to be real instead of complex-valued for simplicity, though the extension to the complex setting is routine.

    Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of {f} to the {T}-invariant factors. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:

    Theorem 2 (von Neumann mean ergodic theorem) Let {H} be a Hilbert space, and let {U: H \rightarrow H} be a unitary operator on {H}. Then for any {f \in H}, the averages {\frac{1}{N} \sum_{n=1}^N U^n f} converge strongly in {H} as {N \rightarrow \infty}.

    Again, see my lecture notes (or just about any text in ergodic theory) for a proof.

    Now we turn to Walsh’s theorem.

    Theorem 3 (Walsh’s convergence theorem) Let {(X,\mu)} be a measure space with a measure-preserving action of a nilpotent group {G}. Let {g_1,\ldots,g_k: {\bf Z} \rightarrow G} be polynomial sequences in {G} (i.e. each {g_i} takes the form {g_i(n) = a_{i,1}^{p_{i,1}(n)} \ldots a_{i,j}^{p_{i,j}(n)}} for some {a_{i,1},\ldots,a_{i,j} \in G} and polynomials {p_{i,1},\ldots,p_{i,j}: {\bf Z} \rightarrow {\bf Z}}). Then for any {f_1,\ldots,f_k \in L^\infty(X,\mu)}, the averages {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)} converge in {L^2(X,\mu)} norm as {N \rightarrow \infty}, where {g(n) f(x) := f(g(n)^{-1} x)}.

    It turns out that this theorem can also be abstracted to some extent, although due to the multiplication in the summand {(g_1(n) f_1) \ldots (g_k(n) f_k)}, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra {L^\infty(X,\mu)}. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space {({\mathcal A},\tau)}, which for us will be a commutative unital algebra {{\mathcal A}} over the reals together with a linear functional {\tau: {\mathcal A} \rightarrow {\bf R}} which maps {1} to {1} and obeys the non-negativity axiom {\tau(f^2) \ge 0} for all {f}. The key example to keep in mind here is {{\mathcal A} = L^\infty(X,\mu)} of essentially bounded real-valued measurable functions with the supremum norm, and with the trace {\tau(f) := \int_X f\ d\mu}. We will also assume in our definition of commutative probability spaces that all elements {f} of {{\mathcal A}} are bounded in the sense that the spectral radius {\rho(f) := \lim_{k \rightarrow \infty} \tau(f^{2k})^{1/2k}} is finite. (In the concrete case of {L^\infty(X,\mu)}, the spectral radius is just the {L^\infty} norm.)

    Given a commutative probability space, we can form an inner product {\langle, \rangle_{L^2(\tau)}} on it by the formula

    \displaystyle  \langle f, g \rangle_{L^2(\tau)} := \tau(fg).

    This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on {{\mathcal A}}. We could complete this structure into a Hilbert space {L^2(\tau)} (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing {L^2(\tau)} as providing a semi-metric on {{\mathcal A}}. For future reference we record the inequalities

    \displaystyle  \rho(fg) \leq \rho(f) \rho(g)

    \displaystyle  \rho(f+g) \leq \rho(f) + \rho(g)

    \displaystyle  \| fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)

    for any {f,g}, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the {L^\infty(X,\mu)} case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)

    The abstract version of Theorem 3 is then

    Theorem 4 (Walsh’s theorem, abstract version) Let {({\mathcal A},\tau)} be a commutative probability space, and let {G} be a nilpotent group acting on {{\mathcal A}} by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and {L^2(\tau)} norm). Let {g_1,\ldots,g_k: {\bf Z} \rightarrow G} be polynomial sequences. Then for any {f_1,\ldots,f_k \in {\mathcal A}}, the averages {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)} form a Cauchy sequence in {L^2(\tau)} (semi-)norm as {N \rightarrow \infty}.

    It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

    Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression {\|\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)\|_{L^2(\tau)}^2} can be viewed as the inner product of (say) {f_k} with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split {f_k} into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}, except with the polynomials {g_1,\ldots,g_k} replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.

    Read the rest of this entry »

    Much as group theory is the study of groups, or graph theory is the study of graphs, model theory is the study of models (also known as structures) of some language {{\mathcal L}} (which, in this post, will always be a single-sorted, first-order language). A structure is a set {X}, equipped with one or more operations, constants, and relations. This is of course an extremely general type of mathematical object, but (quite remarkably) one can still say a substantial number of interesting things about very broad classes of structures.

    We will observe the common abuse of notation of using the set {X} as a metonym for the entire structure, much as we usually refer to a group {(G,1,\cdot,()^{-1})} simply as {G}, a vector space {(V, 0, +, \cdot)} simply as {V}, and so forth. Following another common bending of the rules, we also allow some operations on structures (such as the multiplicative inverse operation on a group or field) to only be partially defined, and we allow use of the usual simplifying conventions for mathematical formulas (e.g. writing {a+b+c} instead of {(a+b)+c} or {a+(b+c)}, in cases where associativity is known). We will also deviate slightly from the usual practice in logic by emphasising individual structures, rather than the theory of general classes of structures; for instance, we will talk about the theory of a single field such as {{\bf R}} or {{\bf C}}, rather than the theory of all fields of a certain type (e.g. real closed fields or algebraically closed fields).

    Once one has a structure {X}, one can introduce the notion of a definable subset of {X}, or more generally of a Cartesian power {X^n} of {X}, defined as a set {E \subset X^n} of the form

    \displaystyle  E = \{ (x_1,\ldots,x_n): P(x_1,\ldots,x_n) \hbox{ true} \} \ \ \ \ \ (1)

    for some formula {P} in the language {{\mathcal L}} with {n} free variables and any number of constants from {X} (that is, {P(x_1,\ldots,x_n)} is a well-formed formula built up from a finite number of constants {c_1,\ldots,c_m} in {X}, the relations and operations on {X}, logical connectives such as {\neg}, {\wedge}, {\implies}, and the quantifiers {\forall, \exists}). Thus, for instance, in the theory of the arithmetic of the natural numbers {{\bf N} = ({\bf N}, 0, 1, +, \times)}, the set of primes {{\mathcal P}} is a definable set, since we have

    \displaystyle  {\mathcal P} = \{ x \in {\bf N}: (\exists y: x=y+2) \wedge \neg (\exists z,w: x = (z+2)(w+2)) \}.

    In the theory of the field of reals {{\bf R} = ({\bf R}, 0, 1, +, -, \times, ()^{-1})}, the unit circle {S^1} is an example of a definable set,

    \displaystyle  S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2 = 1 \},

    but so is the the complement of the circle,

    \displaystyle  {\bf R}^2 \backslash S^1 = \{ (x,y) \in {\bf R}^2: \neg(x^2+y^2 = 1) \}

    and the interval {[-1,1]}:

    \displaystyle  [-1,1] = \{ x \in {\bf R}: \exists y: x^2+y^2 = 1\}.

    Due to the unlimited use of constants, any finite subset of a power {X^n} of any structure {X} is, by our conventions, definable in that structure. (One can of course also consider definability without parameters (also known as {0}-definability), in which arbitrary constants are not permitted, but we will not do so here.)

    We can isolate some special subclasses of definable sets:

    • An atomic definable set is a set of the form (1) in which {P()} is an atomic formula (i.e. it does not contain any logical connectives or quantifiers).
    • A quantifier-free definable set is a set of the form (1) in which {P()} is quantifier-free (i.e. it can contain logical connectives, but does not contain the quantifiers {\forall, \exists}).

    Example 1 In the theory of a field such as {{\bf R}}, an atomic definable set is the same thing as an affine algebraic set (also known as an affine algebraic variety, with the understanding that varieties are not necessarily assumed to be irreducible), and a quantifier-free definable set is known as a constructible set; thus we see that algebraic geometry can be viewed in some sense as a special case of model theory. (Conversely, it can in fact be quite profitable to think of model theory as an abstraction of algebraic geometry; for instance, the concepts of Morley rank and Morley degree in model theory (discussed in this previous blog post) directly generalises the concepts of dimension and degree in algebraic geometry.) Over {{\bf R}}, the interval {[-1,1]} is a definable set, but not a quantifier-free definable set (and certainly not an atomic definable set); and similarly for the primes over {{\bf N}}.

    A quantifier-free definable set in {X^n} is nothing more than a finite boolean combination of atomic definable sets; in other words, the class of quantifier-free definable sets over {X} is the smallest class that contains the atomic definable sets and is closed under boolean operations such as complementation and union (which generate all the other boolean operations). Similarly, the class of definable sets over {X} is the smallest class that contains the quantifier-free definable sets, and is also closed under the operation of projection {\pi_n: E \mapsto \pi_n(E)} from {X^{n+1}} to {X^n} for every natural number {n}, where {\pi_n: X^{n+1} \rightarrow X^n} is the map {\pi_n(x_1,\ldots,x_n,x_{n+1}) := (x_1,\ldots,x_n)}.

    Some structures have the property of enjoying quantifier elimination, which means that every definable set is in fact a quantifier-free definable set, or equivalently that the projection of a quantifier-free definable set is again quantifier-free. For instance, an algebraically closed field {k} (with the field operations) has quantifier elimination (i.e. the projection of a constructible set is again constructible); this fact can be proven by the classical tool of resultants, and among other things can be used to give a proof of Hilbert’s nullstellensatz. (Note though that projection does not necessary preserve the property of being atomic; for instance, the projection of the atomic set {\{ (x,y) \in k^2: xy=1 \}} is the non-atomic, but still quantifier-free definable, set {\{ x \in k: \neg (k=0) \}}.) In the converse direction, it is not difficult to use the nullstellensatz to deduce quantifier elimination. For theory of the real field {{\bf R}}, which is not algebraically closed, one does not have quantifier elimination, as one can see from the example of the unit circle (which is a quantifier-free definable set) projecting down to the interval {[-1,1]} (which is definable, but not quantifer-free definable). However, if one adds the additional operation of order {<} to the reals, giving it the language of an ordered field rather than just a field, then quantifier elimination is recovered (the class of quantifier-free definable sets now enlarges to match the class of definable sets, which in this case is also the class of semi-algebraic sets); this is the famous Tarski-Seidenberg theorem.

    On the other hand, many important structures do not have quantifier elimination; typically, the projection of a quantifier-free definable set is not, in general, quantifier-free definable. This failure of the projection property also shows up in many contexts outside of model theory; for instance, Lebesgue famously made the error of thinking that the projection of a Borel measurable set remained Borel measurable (it is merely an analytic set instead). Turing’s halting theorem can be viewed as an assertion that the projection of a decidable set (also known as a computable or recursive set) is not necessarily decidable (it is merely semi-decidable (or recursively enumerable) instead). The notorious P=NP problem can also be essentially viewed in this spirit; roughly speaking (and glossing over the placement of some quantifiers), it asks whether the projection of a polynomial-time decidable set is again polynomial-time decidable. And so forth. (See this blog post of Dick Lipton for further discussion of the subtleties of projections.)

    Now we consider the status of quantifier elimination for the theory of a finite field {F}. If interpreted naively, quantifier elimination is trivial for a finite field {F}, since every subset of {F^n} is finite and thus quantifier-free definable. However, we can recover an interesting question in one of two (essentially equivalent) ways. One is to work in the asymptotic regime in which the field {F} is large, but the length of the formulae used to construct one’s definable sets stays bounded uniformly in the size of {F} (where we view any constant in {F} as contributing a unit amount to the length of a formula, no matter how large {F} is). A simple counting argument then shows that only a small number of subsets of {F^n} become definable in the asymptotic limit {|F| \rightarrow \infty}, since the number of definable sets clearly grows at most polynomially in {|F|} for any fixed bound on the formula length, while the number of all subsets of {|F|^n} grows exponentially in {n}.

    Another way to proceed is to work not with a single finite field {F}, or even with a sequence {F_m} of finite fields, but with the ultraproduct {F = \prod_{m \rightarrow p} F_m} of a sequence of finite fields, and to study the properties of definable sets over this ultraproduct. (We will be using the notation of ultraproducts and nonstandard analysis from this previous blog post.) This approach is equivalent to the more finitary approach mentioned in the previous paragraph, at least if one does not care to track of the exact bounds on the length of the formulae involved. Indeed, thanks to Los’s theorem, a definable subset {E} of {F^n} is nothing more than the ultraproduct {E = \prod_{m \rightarrow p} E_m} of definable subsets {E_m} of {F_m^n} for all {m} sufficiently close to {p}, with the length of the formulae used to define {E_m} uniformly bounded in {m}. In the language of nonstandard analysis, one can view {F} as a nonstandard finite field.

    The ultraproduct {F} of finite fields is an important example of a pseudo-finite field – a field that obeys all the sentences in the languages of fields that finite fields do, but is not necessarily itself a finite field. The model theory of pseudo-finite fields was first studied systematically by Ax (in the same paper where the Ax-Grothendieck theorem, discussed previously on this blog, was established), with important further contributions by Kiefe, by Fried-Sacerdote, by two papers of Chatzidakis-van den Dries-Macintyre, and many other authors.

    As mentioned before, quantifier elimination trivially holds for finite fields. But for infinite pseudo-finite fields, such as the ultraproduct {F = \prod_{m \rightarrow p} F_m} of finite fields with {|F_m|} going to infinity, quantifier elimination fails. For instance, in a finite field {F_m}, the set {E_m := \{ x \in F_m: \exists y \in F_m: x=y^2 \}} of quadratic residues is a definable set, with a bounded formula length, and so in the ultraproduct {F =\prod_{m \rightarrow p} F_m}, the set {E := \prod_{m\rightarrow p} E_m} of nonstandard quadratic residues is also a definable set. However, in one dimension, we see from the factor theorem that the only atomic definable sets are either finite or the whole field {F}, and so the only constructible sets (i.e. the only quantifier-free definable sets) are either finite or cofinite in {F}. Since the quadratic residues have asymptotic density {1/2} in a large finite field, they cannot form a quantifier-free definable set, despite being definable.

    Nevertheless, there is a very nice almost quantifier elimination result for these fields, in characteristic zero at least, which we phrase here as follows:

    Theorem 1 (Almost quantifier elimination) Let {F} be a nonstandard finite field of characteristic zero, and let {E \subset F^n} be a definable set over {F}. Then {E} is the union of finitely many sets of the form

    \displaystyle  E = \{ x \in V(F): \exists t \in F: P(x,t) = 0 \} \ \ \ \ \ (2)

    where {V(F)} is an atomic definable subset of {F^n} (i.e. the {F}-points of an algebraic variety {V} defined over {F} in {F^n}) and {P: F^{n+1} \rightarrow F} is a polynomial.

    Results of this type were first obtained essentially due to Catarina Kiefe, although the formulation here is closer to that of Chatzidakis-van den Dries-Macintyre.

    Informally, this theorem says that while we cannot quite eliminate all quantifiers from a definable set over a nonstandard finite field, we can eliminate all but one existential quantifier. Note that negation has also been eliminated in this theorem; for instance, the definable set {F \backslash \{0\} = \{ x \in F: \neg(x=0) \}} uses a negation, but can also be described using a single existential quantifier as {\{ x \in F: \exists t: xt = 1 \}}.) I believe that there are more complicated analogues of this result in positive characteristic, but I have not studied this case in detail (Kiefe’s result does not assume characteristic zero, but her conclusion is slightly different from the one given here). In the one-dimensional case {n=1}, the only varieties {V} are the affine line and finite sets, and we can simplify the above statement, namely that any definable subset of {F} takes the form {\{ x\in F: \exists t \in F: P(x,t) = 0 \}} for some polynomial {P} (i.e. definable sets in {F} are nothing more than the projections of the {F}-points of a plane curve).

    There is an equivalent formulation of this theorem for standard finite fields, namely that if {F} is a finite field and {E \subset F^n} is definable using a formula of length at most {M}, then {E} can be expressed in the form (2) with the degree of {P} bounded by some quantity {C_{M,n}} depending on {M} and {n}, assuming that the characteristic of {F} is sufficiently large depending on {M, n}.

    The theorem gives quite a satisfactory description of definable sets in either standard or nonstandard finite fields (at least if one does not care about effective bounds in some of the constants, and if one is willing to exclude the small characteristic case); for instance, in conjunction with the Lang-Weil bound discussed in this recent blog post, it shows that any non-empty definable subset of a nonstandard finite field has a nonstandard cardinality of {(\alpha + O(|F|^{-1/2})) |F|^d} for some positive standard rational {\alpha} and integer {d}. Equivalently, any non-empty definable subset of {F^n} for some standard finite field {F} using a formula of length at most {M} has a standard cardinality of {(\alpha + O_{M,n}(|F|^{-1/2})) |F|^d} for some positive rational of height {O_{M,n}(1)} and some natural number {d} between {0} and {n}. (For instance, in the example of the quadratic residues given above, {d} is equal to {1} and {\alpha} equal to {1/2}.) There is a more precise statement to this effect, namely that the Poincaré series of a definable set is rational; see Kiefe’s paper for details.

    Below the fold I give a proof of Theorem 1, which relies primarily on the Lang-Weil bound mentioned above.

    Read the rest of this entry »

    Nonstandard analysis is a mathematical framework in which one extends the standard mathematical universe {{\mathfrak U}} of standard numbers, standard sets, standard functions, etc. into a larger nonstandard universe {{}^* {\mathfrak U}} of nonstandard numbers, nonstandard sets, nonstandard functions, etc., somewhat analogously to how one places the real numbers inside the complex numbers, or the rationals inside the reals. This nonstandard universe enjoys many of the same properties as the standard one; in particular, we have the transfer principle that asserts that any statement in the language of first order logic is true in the standard universe if and only if it is true in the nonstandard one. (For instance, because Fermat’s last theorem is known to be true for standard natural numbers, it is automatically true for nonstandard natural numbers as well.) However, the nonstandard universe also enjoys some additional useful properties that the standard one does not, most notably the countable saturation property, which is a property somewhat analogous to the completeness property of a metric space; much as metric completeness allows one to assert that the intersection of a countable family of nested closed balls is non-empty, countable saturation allows one to assert that the intersection of a countable family of nested satisfiable formulae is simultaneously satisfiable. (See this previous blog post for more on the analogy between the use of nonstandard analysis and the use of metric completions.) Furthermore, by viewing both the standard and nonstandard universes externally (placing them both inside a larger metatheory, such as a model of Zermelo-Frankel-Choice (ZFC) set theory; in some more advanced set-theoretic applications one may also wish to add some large cardinal axioms), one can place some useful additional definitions and constructions on these universes, such as defining the concept of an infinitesimal nonstandard number (a number which is smaller in magnitude than any positive standard number). The ability to rigorously manipulate infinitesimals is of course one of the most well-known advantages of working with nonstandard analysis.

    To build a nonstandard universe {{}^* {\mathfrak U}} from a standard one {{\mathfrak U}}, the most common approach is to take an ultrapower of {{\mathfrak U}} with respect to some non-principal ultrafilter over the natural numbers; see e.g. this blog post for details. Once one is comfortable with ultrafilters and ultrapowers, this becomes quite a simple and elegant construction, and greatly demystifies the nature of nonstandard analysis.

    On the other hand, nonprincipal ultrafilters do have some unappealing features. The most notable one is that their very existence requires the axiom of choice (or more precisely, a weaker form of this axiom known as the boolean prime ideal theorem). Closely related to this is the fact that one cannot actually write down any explicit example of a nonprincipal ultrafilter, but must instead rely on nonconstructive tools such as Zorn’s lemma, the Hahn-Banach theorem, Tychonoff’s theorem, the Stone-Cech compactification, or the boolean prime ideal theorem to locate one. As such, ultrafilters definitely belong to the “infinitary” side of mathematics, and one may feel that it is inappropriate to use such tools for “finitary” mathematical applications, such as those which arise in hard analysis. From a more practical viewpoint, because of the presence of the infinitary ultrafilter, it can be quite difficult (though usually not impossible, with sufficient patience and effort) to take a finitary result proven via nonstandard analysis and coax an effective quantitative bound from it.

    There is however a “cheap” version of nonstandard analysis which is less powerful than the full version, but is not as infinitary in that it is constructive (in the sense of not requiring any sort of choice-type axiom), and which can be translated into standard analysis somewhat more easily than a fully nonstandard argument; indeed, a cheap nonstandard argument can often be presented (by judicious use of asymptotic notation) in a way which is nearly indistinguishable from a standard one. It is obtained by replacing the nonprincipal ultrafilter in fully nonstandard analysis with the more classical Fréchet filter of cofinite subsets of the natural numbers, which is the filter that implicitly underlies the concept of the classical limit {\lim_{{\bf n} \rightarrow \infty} a_{\bf n}} of a sequence when the underlying asymptotic parameter {{\bf n}} goes off to infinity. As such, “cheap nonstandard analysis” aligns very well with traditional mathematics, in which one often allows one’s objects to be parameterised by some external parameter such as {{\bf n}}, which is then allowed to approach some limit such as {\infty}. The catch is that the Fréchet filter is merely a filter and not an ultrafilter, and as such some of the key features of fully nonstandard analysis are lost. Most notably, the law of the excluded middle does not transfer over perfectly from standard analysis to cheap nonstandard analysis; much as there exist bounded sequences of real numbers (such as {0,1,0,1,\ldots}) which do not converge to a (classical) limit, there exist statements in cheap nonstandard analysis which are neither true nor false (at least without passing to a subsequence, see below). The loss of such a fundamental law of mathematical reasoning may seem like a major disadvantage for cheap nonstandard analysis, and it does indeed make cheap nonstandard analysis somewhat weaker than fully nonstandard analysis. But in some situations (particularly when one is reasoning in a “constructivist” or “intuitionistic” fashion, and in particular if one is avoiding too much reliance on set theory) it turns out that one can survive the loss of this law; and furthermore, the law of the excluded middle is still available for standard analysis, and so one can often proceed by working from time to time in the standard universe to temporarily take advantage of this law, and then transferring the results obtained there back to the cheap nonstandard universe once one no longer needs to invoke the law of the excluded middle. Furthermore, the law of the excluded middle can be recovered by adopting the freedom to pass to subsequences with regards to the asymptotic parameter {{\bf n}}; this technique is already in widespread use in the analysis of partial differential equations, although it is generally referred to by names such as “the compactness method” rather than as “cheap nonstandard analysis”.

    Below the fold, I would like to describe this cheap version of nonstandard analysis, which I think can serve as a pedagogical stepping stone towards fully nonstandard analysis, as it is formally similar to (though weaker than) fully nonstandard analysis, but on the other hand is closer in practice to standard analysis. As we shall see below, the relation between cheap nonstandard analysis and standard analysis is analogous in many ways to the relation between probabilistic reasoning and deterministic reasoning; it also resembles somewhat the preference in much of modern mathematics for viewing mathematical objects as belonging to families (or to categories) to be manipulated en masse, rather than treating each object individually. (For instance, nonstandard analysis can be used as a partial substitute for scheme theory in order to obtain uniformly quantitative results in algebraic geometry, as discussed for instance in this previous blog post.)

    Read the rest of this entry »

    Emmanuel Breuillard, Ben Green, and I have just uploaded to the arXiv our paper “The structure of approximate groups“, submitted to Pub. IHES. We had announced the main results of this paper in various forums (including this blog) for a few months now, but it had taken some time to fully write up the paper and put in various refinements and applications.

    As announced previously, the main result of this paper is what is a (virtually, qualitatively) complete description of finite approximate groups in an arbitrary (local or global) group {G}. For simplicity let us work in the much more familiar setting of global groups, although our results also apply (but are a bit more technical to state) in the local group setting.

    Recall that in a global group {G = (G,\cdot)}, a {K}-approximate group is a symmetric subset {A} of {G} containing the origin, with the property that the product set {A \cdot A} is covered by {K} left-translates of {A}. Examples of {O(1)}-approximate groups include genuine groups, convex bodies in a bounded dimensional vector space, small balls in a bounded dimensional Lie group, large balls in a discrete nilpotent group of bounded rank or step, or generalised arithmetic progressions (or more generally, coset progressions) of bounded rank in an abelian group. Specialising now to finite approximate groups, a key example of such a group is what we call a coset nilprogression: a set of the form {\pi^{-1}(P)}, where {\pi: G' \rightarrow N} is a homomorphism with finite kernel from a subgroup {G'} of {G} to a nilpotent group {N} of bounded step, and {P = P(u_1,\ldots,u_r;N_1,\ldots,N_r)} is a nilprogression with a bounded number of generators {u_1,\ldots,u_r} in {N} and some lengths {N_1,\ldots,N_r \gg 1}, where {P(u_1,\ldots,u_r;N_1,\ldots,N_r)} consists of all the words involving at most {N_1} copies of {u_1^{\pm 1}}, {N_2} copies of {u_2^{\pm 1}}, and so forth up to {N_r} copies of {u_r^{\pm 1}}. One can show (by some nilpotent algebra) that all such coset nilprogressions are {O(1)}-approximate groups so long as the step and the rank {r} are bounded (and if {N_1,\ldots,N_r} are sufficiently large).

    Our main theorem (which was essentially conjectured independently by Helfgott and by Lindenstrauss) asserts, roughly speaking, that coset nilprogressions are essentially the only examples of approximate groups.

    Theorem 1 Let {A} be a {K}-approximate group. Then {A^4} contains a coset nilprogression {P} of rank and step {O_K(1)}, such that {A} can be covered by {O_K(1)} left-translates of {P}.

    In the torsion-free abelian case, this result is essentially Freiman’s theorem (with an alternate proof by Ruzsa); for general abelian case, it is due to Green and Ruzsa. Various partial results in this direction for some other groups (e.g. free groups, nilpotent groups, solvable groups, or simple groups of Lie type) are also known; see these previous blog posts for a summary of several of these results.

    This result has a number of applications to geometric growth theory, and in particular to variants of Gromov’s theorem of groups of polynomial growth, which asserts that a finitely generated group is of polynomial growth if and only if it is virtually nilpotent. The connection lies in the fact that if the balls {B_S(R)} associated to a finite set of generators {S} has polynomial growth, then some simple volume-packing arguments combined with the pigeonhole principle will show that {B_S(R)} will end up being a {O(1)}-approximate group for many radii {R}. In fact, since our theorem only needs a single approximate group to obtain virtually nilpotent structure, we are able to obtain some new strengthenings of Gromov’s theorem. For instance, if {A} is any {K}-approximate group in a finitely generated group {G} that contains {B_S(R_0)} for some set of generators {S} and some {R_0} that is sufficiently large depending on {K}, our theorem implies that {G} is virtually nilpotent, answering a question of Petrunin. Among other things, this gives an alternate proof of a recent result of Kapovitch and Wilking (see also this previous paper of Cheeger and Colding) that a compact manifold of bounded diameter and Ricci curvature at least {-\epsilon} necessarily has a virtually nilpotent fundamental group if {\epsilon} is sufficiently small (depending only on dimension). The main point here is that no lower bound on the injectivity radius is required. Another application is a “Margulis-type lemma”, which asserts that if a metric space {X} has “bounded packing” (in the sense that any ball of radius (say) {4} is covered by a bounded number of balls of radius {1}), and {\Gamma} is a group of isometries on {X} that acts discretely (i.e. every orbit has only finitely many elements (counting multiplicity) in each bounded set), then the near-stabiliser {\{ \gamma \in \Gamma: d(\gamma x, x) \leq \epsilon \}} of a point {x} is virtually nilpotent if {\epsilon} is small enough depending on the packing constant.

    There are also some variants and refinements to the main theorem proved in the paper, such as an extension to local groups, and also an improvement on the bound on the rank and step from {O_K(1)} to {O(\log K)} (but at the cost of replacing {A^4} in the theorem with {A^{O(1)}}).

    I’ll be discussing the proof of the main theorem in detail in the next few lecture notes of my current graduate course. The full proof is somewhat lengthy (occupying about 50 pages of the 90-page paper), but can be summarised in the following steps:

    1. (Hrushovski) Take an arbitrary sequence {A_n} of finite {K}-approximate groups, and show that an appropriate limit {A} of such groups can be “modeled” in some sense by an open bounded subset of a locally compact group. (The precise definition of “model” is technical, but “macroscopically faithful representation” is a good first approximation.) As discussed in the previous lecture notes, we use an ultralimit for this purpose; the paper of Hrushovski where this strategy was first employed also considered more sophisticated model-theoretic limits. To build a locally compact topology, Hrushovski used some tools from definability theory; in our paper, we instead use a combinatorial lemma of Sanders (closely related to a similar result of Croot and Sisask.)
    2. (Gleason-Yamabe) The locally compact group can in turn be “modeled” by a Lie group (possibly after shrinking the group, and thus the ultralimit {A}, slightly). (This result arose from the solution to Hilbert’s fifth problem, as discussed here. For our extension to local groups, we use a recent local version of the Gleason-Yamabe theorem, due to Goldbring.)
    3. (Gleason) Using the escape properties of the Lie model, construct a norm {\| \|} (and thus a left-invariant metric {d}) on the ultralimit approximate group {A} (and also on the finitary groups {A_n}) that obeys a number of good properties, such as a commutator estimate {\| [g,h]\| \ll \|g\| \|h\|}. (This is modeled on an analogous construction used in the theory of Hilbert’s fifth problem, as discussed in this previous set of lecture notes.) This norm is essentially an escape norm associated to (a slight modification) of {A} or {A_n}.
    4. (Jordan-Bieberbach-Frobenius) We now take advantage of the finite nature of the {A_n} by locating the non-trivial element {e} of {A_n} with minimal escape norm (but one has to first quotient out the elements of zero escape norm first). The commutator estimate mentioned previously ensures that this element is essentially “central” in {A_n}. One can then quotient out a progression {P(e;N)} generated by this central element (reducing the dimension of the Lie model by one in the process) and iterates the process until the dimension of the model drops to zero. Reversing the process, this constructs a coset nilprogression inside {A_n^4}. This argument is based on the classic proof of Jordan’s theorem due to Bieberbach and Frobenius, as discussed in this blog post.

    One quirk of the argument is that it requires one to work in the category of local groups rather than global groups. (This is somewhat analogous to how, in the standard proofs of Freiman’s theorem, one needs to work with the category of Freiman homomorphisms, rather than group homomorphisms.) The reason for this arises when performing the quotienting step in the Jordan-Bieberbach-Frobenius leg of the argument. The obvious way to perform this step (and the thing that we tried first) would be to quotient out by the entire cyclic group {\langle e \rangle} generated by the element {e} of minimal escape norm. However, it turns out that this doesn’t work too well, because the group quotiented out is so “large” that it can create a lot of torsion in the quotient. In particular, elements which used to have positive escape norm, can now become trapped in the quotient of {A_n}, thus sending their escape norm to zero. This leads to an inferior conclusion (in which a coset nilprogression is replaced by a more complicated tower of alternating extensions between central progressions and finite groups, similar to the towers encountered in my previous paper on this topic). To prevent this unwanted creation of torsion, one has to truncate the cyclic group {\langle e \rangle} before it escapes {A_n}, so that one quotients out by a geometric progression {P(e;N)} rather than the cyclic group. But the operation of quotienting out by a {P(e;N)}, which is a local group rather than a global one, cannot be formalised in the category of global groups, but only in the category of local groups. Because of this, we were forced to carry out the entire argument using the language of local groups. As it turns out, the arguments are ultimately more natural in this setting, although there is an initial investment of notation required, given that global group theory is much more familiar and well-developed than local group theory.

    One interesting feature of the argument is that it does not use much of the existing theory of Freiman-type theorems, instead building the coset nilprogression directly from the geometric properties of the approximate group. In particular, our argument gives a new proof of Freiman’s theorem in the abelian case, which largely avoids Fourier analysis (except through the use of the theory of Hilbert’s fifth problem, which uses the Peter-Weyl theorem (or, in the abelian case, Pontryagin duality), which is basically a version of Fourier analysis).

    I have blogged several times in the past about nonstandard analysis, which among other things is useful in allowing one to import tools from infinitary (or qualitative) mathematics in order to establish results in finitary (or quantitative) mathematics. One drawback, though, to using nonstandard analysis methods is that the bounds one obtains by such methods are usually ineffective: in particular, the conclusions of a nonstandard analysis argument may involve an unspecified constant {C} that is known to be finite but for which no explicit bound is obviously available. (In many cases, a bound can eventually be worked out by performing proof mining on the argument, and in particular by carefully unpacking the proofs of all the various results from infinitary mathematics that were used in the argument, as opposed to simply using them as “black boxes”, but this is a time-consuming task and the bounds that one eventually obtains tend to be quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).)

    Because of this fact, it would seem that quantitative bounds, such as polynomial type bounds {X \leq C Y^C} that show that one quantity {X} is controlled in a polynomial fashion by another quantity {Y}, are not easily obtainable through the ineffective methods of nonstandard analysis. Actually, this is not the case; as I will demonstrate by an example below, nonstandard analysis can certainly yield polynomial type bounds. The catch is that the exponent {C} in such bounds will be ineffective; but nevertheless such bounds are still good enough for many applications.

    Let us now illustrate this by reproving a lemma from this paper of Mei-Chu Chang (Lemma 2.14, to be precise), which was recently pointed out to me by Van Vu. Chang’s paper is focused primarily on the sum-product problem, but she uses a quantitative lemma from algebraic geometry which is of independent interest. To motivate the lemma, let us first establish a qualitative version:

    Lemma 1 (Qualitative solvability) Let {P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}} be a finite number of polynomials in several variables with rational coefficients. If there is a complex solution {z = (z_1,\ldots,z_d) \in {\bf C}^d} to the simultaneous system of equations

    \displaystyle  P_1(z) = \ldots = P_r(z) = 0,

    then there also exists a solution {z \in \overline{{\bf Q}}^d} whose coefficients are algebraic numbers (i.e. they lie in the algebraic closure {{\bf Q}} of the rationals).

    Proof: Suppose there was no solution to {P_1(z)=\ldots=P_r(z)=0} over {\overline{{\bf Q}}}. Applying Hilbert’s nullstellensatz (which is available as {\overline{{\bf Q}}} is algebraically closed), we conclude the existence of some polynomials {Q_1,\ldots,Q_r} (with coefficients in {\overline{{\bf Q}}}) such that

    \displaystyle  P_1 Q_1 + \ldots + P_r Q_r = 1

    as polynomials. In particular, we have

    \displaystyle  P_1(z) Q_1(z) + \ldots + P_r(z) Q_r(z) = 1

    for all {z \in {\bf C}^d}. This shows that there is no solution to {P_1(z)=\ldots=P_r(z)=0} over {{\bf C}}, as required. \Box

    Remark 1 Observe that in the above argument, one could replace {{\bf Q}} and {{\bf C}} by any other pair of fields, with the latter containing the algebraic closure of the former, and still obtain the same result.

    The above lemma asserts that if a system of rational equations is solvable at all, then it is solvable with some algebraic solution. But it gives no bound on the complexity of that solution in terms of the complexity of the original equation. Chang’s lemma provides such a bound. If {H \geq 1} is an integer, let us say that an algebraic number has height at most {H} if its minimal polynomial (after clearing denominators) consists of integers of magnitude at most {H}.

    Lemma 2 (Quantitative solvability) Let {P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}} be a finite number of polynomials of degree at most {D} with rational coefficients, each of height at most {H}. If there is a complex solution {z = (z_1,\ldots,z_d) \in {\bf C}^d} to the simultaneous system of equations

    \displaystyle  P_1(z) = \ldots = P_r(z) = 0,

    then there also exists a solution {z \in \overline{{\bf Q}}^d} whose coefficients are algebraic numbers of degree at most {C} and height at most {CH^C}, where {C = C_{D, d,r}} depends only on {D}, {d} and {r}.

    Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in my own blog post). She also notes that one could also establish the result through the machinery of Gröbner bases. In each of these arguments, it was not possible to use Lemma 1 (or the closely related nullstellensatz) as a black box; one actually had to unpack one of the proofs of that lemma or nullstellensatz to get the polynomial bound. However, using nonstandard analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant {C}) directly from Lemma 1 (or more precisely, the generalisation in Remark 1) without having to inspect the proof, and instead simply using it as a black box, thus providing a “soft” proof of Lemma 2 that is an alternative to the “hard” proofs mentioned above.

    Here’s how the proof works. Informally, the idea is that Lemma 2 should follow from Lemma 1 after replacing the field of rationals {{\bf Q}} with “the field of rationals of polynomially bounded height”. Unfortunately, the latter object does not really make sense as a field in standard analysis; nevertheless, it is a perfectly sensible object in nonstandard analysis, and this allows the above informal argument to be made rigorous.

    We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a “compactness and contradiction” argument (or more precisely, an “ultralimit and contradiction” argument). Suppose for contradiction that Lemma 2 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exists {D, d, r} such that for each natural number {n}, there is a positive integer {H^{(n)}} and a family {P_1^{(n)}, \ldots, P_r^{(n)}: {\bf C}^d \rightarrow {\bf C}} of polynomials of degree at most {D} and rational coefficients of height at most {H^{(n)}}, such that there exist at least one complex solution {z^{(n)} \in {\bf C}^d} to

    \displaystyle  P_1^{(n)}(z^{(n)}) = \ldots = P_r(z^{(n)}) = 0, \ \ \ \ \ (1)

    but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most {n} and height at most {n (H^{(n)})^n}.

    Now we take ultralimits (see e.g. this previous blog post of a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let {p \in \beta {\bf N} \backslash {\bf N}} be a non-principal ultrafilter. For each {i=1,\ldots,r}, the ultralimit

    \displaystyle  P_i := \lim_{n \rightarrow p} P_i^{(n)}

    of the (standard) polynomials {P_i^{(n)}} is a nonstandard polynomial {P_i: {}^* {\bf C}^d \rightarrow {}^* {\bf C}} of degree at most {D}, whose coefficients now lie in the nonstandard rationals {{}^* {\bf Q}}. Actually, due to the height restriction, we can say more. Let {H := \lim_{n \rightarrow p} H^{(n)} \in {}^* {\bf N}} be the ultralimit of the {H^{(n)}}, this is a nonstandard natural number (which will almost certainly be unbounded, but we will not need to use this). Let us say that a nonstandard integer {a} is of polynomial size if we have {|a| \leq C H^C} for some standard natural number {C}, and say that a nonstandard rational number {a/b} is of polynomial height if {a}, {b} are of polynomial size. Let {{\bf Q}_{poly(H)}} be the collection of all nonstandard rationals of polynomial height. (In the language of nonstandard analysis, {{\bf Q}_{poly(H)}} is an external set rather than an internal one, because it is not itself an ultraproduct of standard sets; but this will not be relevant for the argument that follows.) It is easy to see that {{\bf Q}_{poly(H)}} is a field, basically because the sum or product of two integers of polynomial size, remains of polynomial size. By construction, it is clear that the coefficients of {P_i} are nonstandard rationals of polynomial height, and thus {P_1,\ldots,P_r} are defined over {{\bf Q}_{poly(H)}}.

    Meanwhile, if we let {z := \lim_{n \rightarrow p} z^{(n)} \in {}^* {\bf C}^d} be the ultralimit of the solutions {z^{(n)}} in (1), we have

    \displaystyle  P_1(z) = \ldots = P_r(z) = 0,

    thus {P_1,\ldots,P_r} are solvable in {{}^* {\bf C}}. Applying Lemma 1 (or more precisely, the generalisation in Remark 1), we see that {P_1,\ldots,P_r} are also solvable in {\overline{{\bf Q}_{poly(H)}}}. (Note that as {{\bf C}} is algebraically closed, {{}^*{\bf C}} is also (by Los’s theorem), and so {{}^* {\bf C}} contains {\overline{{\bf Q}_{poly(H)}}}.) Thus, there exists {w \in \overline{{\bf Q}_{poly(H)}}^d} with

    \displaystyle  P_1(w) = \ldots = P_r(w) = 0.

    As {\overline{{\bf Q}_{poly(H)}}^d} lies in {{}^* {\bf C}^d}, we can write {w} as an ultralimit {w = \lim_{n \rightarrow p} w^{(n)}} of standard complex vectors {w^{(n)} \in {\bf C}^d}. By construction, the coefficients of {w} each obey a non-trivial polynomial equation of degree at most {C} and whose coefficients are nonstandard integers of magnitude at most {C H^C}, for some standard natural number {C}. Undoing the ultralimit, we conclude that for {n} sufficiently close to {p}, the coefficients of {w^{(n)}} obey a non-trivial polynomial equation of degree at most {C} whose coefficients are standard integers of magnitude at most {C (H^{(n)})^C}. In particular, these coefficients have height at most {C (H^{(n)})^C}. Also, we have

    \displaystyle  P_1^{(n)}(w^{(n)}) = \ldots = P_r^{(n)}(w^{(n)}) = 0.

    But for {n} larger than {C}, this contradicts the construction of the {P_i^{(n)}}, and the claim follows. (Note that as {p} is non-principal, any neighbourhood of {p} in {{\bf N}} will contain arbitrarily large natural numbers.)

    Remark 2 The same argument actually gives a slightly stronger version of Lemma 2, namely that the integer coefficients used to define the algebraic solution {z} can be taken to be polynomials in the coefficients of {P_1,\ldots,P_r}, with degree and coefficients bounded by {C_{D,d,r}}.

    Archives

    RSS Google+ feed

    • An error has occurred; the feed is probably down. Try again later.
    Follow

    Get every new post delivered to your Inbox.

    Join 4,043 other followers