You are currently browsing the category archive for the ‘math.LO’ category.

The rectification principle in arithmetic combinatorics asserts, roughly speaking, that very small subsets (or, alternatively, small structured subsets) of an additive group or a field of large characteristic can be modeled (for the purposes of arithmetic combinatorics) by subsets of a group or field of zero characteristic, such as the integers {{\bf Z}} or the complex numbers {{\bf C}}. The additive form of this principle is known as the Freiman rectification principle; it has several formulations, going back of course to the original work of Freiman. Here is one formulation as given by Bilu, Lev, and Ruzsa:

Proposition 1 (Additive rectification) Let {A} be a subset of the additive group {{\bf Z}/p{\bf Z}} for some prime {p}, and let {s \geq 1} be an integer. Suppose that {|A| \leq \log_{2s} p}. Then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the integers which is a Freiman isomorphism of order {s} in the sense that for any {x_1,\ldots,x_s,y_1,\ldots,y_s \in A}, one has

\displaystyle  x_1+\ldots+x_s = y_1+\ldots+y_s

if and only if

\displaystyle  \phi(x_1)+\ldots+\phi(x_s) = \phi(y_1)+\ldots+\phi(y_s).

Furthermore {\phi} is a right-inverse of the obvious projection homomorphism from {{\bf Z}} to {{\bf Z}/p{\bf Z}}.

The original version of the rectification principle allowed the sets involved to be substantially larger in size (cardinality up to a small constant multiple of {p}), but with the additional hypothesis of bounded doubling involved; see the above-mentioned papers, as well as this later paper of Green and Ruzsa, for further discussion.

The proof of Proposition 1 is quite short (see Theorem 3.1 of Bilu-Lev-Ruzsa); the main idea is to use Minkowski’s theorem to find a non-trivial dilate {aA} of {A} that is contained in a small neighbourhood of the origin in {{\bf Z}/p{\bf Z}}, at which point the rectification map {\phi} can be constructed by hand.

Very recently, Codrut Grosu obtained an arithmetic analogue of the above theorem, in which the rectification map {\phi} preserves both additive and multiplicative structure:

Theorem 2 (Arithmetic rectification) Let {A} be a subset of the finite field {{\bf F}_p} for some prime {p \geq 3}, and let {s \geq 1} be an integer. Suppose that {|A| < \log_2 \log_{2s} \log_{2s^2} p - 1}. Then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the complex numbers which is a Freiman field isomorphism of order {s} in the sense that for any {x_1,\ldots,x_n \in A} and any polynomial {P(x_1,\ldots,x_n)} of degree at most {s} and integer coefficients of magnitude summing to at most {s}, one has

\displaystyle  P(x_1,\ldots,x_n)=0

if and only if

\displaystyle  P(\phi(x_1),\ldots,\phi(x_n))=0.

Note that it is necessary to use an algebraically closed field such as {{\bf C}} for this theorem, in contrast to the integers used in Proposition 1, as {{\bf F}_p} can contain objects such as square roots of {-1} which can only map to {\pm i} in the complex numbers (once {s} is at least {2}).

Using Theorem 2, one can transfer results in arithmetic combinatorics (e.g. sum-product or Szemerédi-Trotter type theorems) regarding finite subsets of {{\bf C}} to analogous results regarding sufficiently small subsets of {{\bf F}_p}; see the paper of Grosu for several examples of this. This should be compared with the paper of Vu, Wood, and Wood, which introduces a converse principle that embeds finite subsets of {{\bf C}} (or more generally, a characteristic zero integral domain) in a Freiman field-isomorphic fashion into finite subsets of {{\bf F}_p} for arbitrarily large primes {p}, allowing one to transfer arithmetic combinatorical facts from the latter setting to the former.

Grosu’s argument uses some quantitative elimination theory, and in particular a quantitative variant of a lemma of Chang that was discussed previously on this blog. In that previous blog post, it was observed that (an ineffective version of) Chang’s theorem could be obtained using only qualitative algebraic geometry (as opposed to quantitative algebraic geometry tools such as elimination theory results with explicit bounds) by means of nonstandard analysis (or, in what amounts to essentially the same thing in this context, the use of ultraproducts). One can then ask whether one can similarly establish an ineffective version of Grosu’s result by nonstandard means. The purpose of this post is to record that this can indeed be done without much difficulty, though the result obtained, being ineffective, is somewhat weaker than that in Theorem 2. More precisely, we obtain

Theorem 3 (Ineffective arithmetic rectification) Let {s, n \geq 1}. Then if {{\bf F}} is a field of characteristic at least {C_{s,n}} for some {C_{s,n}} depending on {s,n}, and {A} is a subset of {{\bf F}} of cardinality {n}, then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the complex numbers which is a Freiman field isomorphism of order {s}.

Our arguments will not provide any effective bound on the quantity {C_{s,n}} (though one could in principle eventually extract such a bound by deconstructing the proof of Proposition 4 below), making this result weaker than Theorem 2 (save for the minor generalisation that it can handle fields of prime power order as well as fields of prime order as long as the characteristic remains large).

Following the principle that ultraproducts can be used as a bridge to connect quantitative and qualitative results (as discussed in these previous blog posts), we will deduce Theorem 3 from the following (well-known) qualitative version:

Proposition 4 (Baby Lefschetz principle) Let {k} be a field of characteristic zero that is finitely generated over the rationals. Then there is an isomorphism {\phi: k \rightarrow \phi(k)} from {k} to a subfield {\phi(k)} of {{\bf C}}.

This principle (first laid out in an appendix of Lefschetz’s book), among other things, often allows one to use the methods of complex analysis (e.g. Riemann surface theory) to study many other fields of characteristic zero. There are many variants and extensions of this principle; see for instance this MathOverflow post for some discussion of these. I used this baby version of the Lefschetz principle recently in a paper on expanding polynomial maps.

Proof: We give two proofs of this fact, one using transcendence bases and the other using Hilbert’s nullstellensatz.

We begin with the former proof. As {k} is finitely generated over {{\bf Q}}, it has finite transcendence degree, thus one can find algebraically independent elements {x_1,\ldots,x_m} of {k} over {{\bf Q}} such that {k} is a finite extension of {{\bf Q}(x_1,\ldots,x_m)}, and in particular by the primitive element theorem {k} is generated by {{\bf Q}(x_1,\ldots,x_m)} and an element {\alpha} which is algebraic over {{\bf Q}(x_1,\ldots,x_m)}. (Here we use the fact that characteristic zero fields are separable.) If we then define {\phi} by first mapping {x_1,\ldots,x_m} to generic (and thus algebraically independent) complex numbers {z_1,\ldots,z_m}, and then setting {\phi(\alpha)} to be a complex root of of the minimal polynomial for {\alpha} over {{\bf Q}(x_1,\ldots,x_m)} after replacing each {x_i} with the complex number {z_i}, we obtain a field isomorphism {\phi: k \rightarrow \phi(k)} with the required properties.

Now we give the latter proof. Let {x_1,\ldots,x_m} be elements of {k} that generate that field over {{\bf Q}}, but which are not necessarily algebraically independent. Our task is then equivalent to that of finding complex numbers {z_1,\ldots,z_m} with the property that, for any polynomial {P(x_1,\ldots,x_m)} with rational coefficients, one has

\displaystyle  P(x_1,\ldots,x_m) = 0

if and only if

\displaystyle  P(z_1,\ldots,z_m) = 0.

Let {{\mathcal P}} be the collection of all polynomials {P} with rational coefficients with {P(x_1,\ldots,x_m)=0}, and {{\mathcal Q}} be the collection of all polynomials {P} with rational coefficients with {P(x_1,\ldots,x_m) \neq 0}. The set

\displaystyle  S := \{ (z_1,\ldots,z_m) \in {\bf C}^m: P(z_1,\ldots,z_m)=0 \hbox{ for all } P \in {\mathcal P} \}

is the intersection of countably many algebraic sets and is thus also an algebraic set (by the Hilbert basis theorem or the Noetherian property of algebraic sets). If the desired claim failed, then {S} could be covered by the algebraic sets {\{ (z_1,\ldots,z_m) \in {\bf C}^m: Q(z_1,\ldots,z_m) = 0 \}} for {Q \in {\mathcal Q}}. By decomposing into irreducible varieties and observing (e.g. from the Baire category theorem) that a variety of a given dimension over {{\bf C}} cannot be covered by countably many varieties of smaller dimension, we conclude that {S} must in fact be covered by a finite number of such sets, thus

\displaystyle  S \subset \bigcup_{i=1}^n \{ (z_1,\ldots,z_m) \in {\bf C}^m: Q_i(z_1,\ldots,z_m) = 0 \}

for some {Q_1,\ldots,Q_n \in {\bf C}^m}. By the nullstellensatz, we thus have an identity of the form

\displaystyle  (Q_1 \ldots Q_n)^l = P_1 R_1 + \ldots + P_r R_r

for some natural numbers {l,r \geq 1}, polynomials {P_1,\ldots,P_r \in {\mathcal P}}, and polynomials {R_1,\ldots,R_r} with coefficients in {\overline{{\bf Q}}}. In particular, this identity also holds in the algebraic closure {\overline{k}} of {k}. Evaluating this identity at {(x_1,\ldots,x_m)} we see that the right-hand side is zero but the left-hand side is non-zero, a contradiction, and the claim follows. \Box

From Proposition 4 one can now deduce Theorem 3 by a routine ultraproduct argument (the same one used in these previous blog posts). Suppose for contradiction that Theorem 3 fails. Then there exists natural numbers {s,n \geq 1}, a sequence of finite fields {{\bf F}_i} of characteristic at least {i}, and subsets {A_i=\{a_{i,1},\ldots,a_{i,n}\}} of {{\bf F}_i} of cardinality {n} such that for each {i}, there does not exist a Freiman field isomorphism of order {s} from {A_i} to the complex numbers. Now we select a non-principal ultrafilter {\alpha \in \beta {\bf N} \backslash {\bf N}}, and construct the ultraproduct {{\bf F} := \prod_{i \rightarrow \alpha} {\bf F}_i} of the finite fields {{\bf F}_i}. This is again a field (and is a basic example of what is known as a pseudo-finite field); because the characteristic of {{\bf F}_i} goes to infinity as {i \rightarrow \infty}, it is easy to see (using Los’s theorem) that {{\bf F}} has characteristic zero and can thus be viewed as an extension of the rationals {{\bf Q}}.

Now let {a_j := \lim_{i \rightarrow \alpha} a_{i,j}} be the ultralimit of the {a_{i,j}}, so that {A := \{a_1,\ldots,a_n\}} is the ultraproduct of the {A_i}, then {A} is a subset of {{\bf F}} of cardinality {n}. In particular, if {k} is the field generated by {{\bf Q}} and {A}, then {k} is a finitely generated extension of the rationals and thus, by Proposition 4 there is an isomorphism {\phi: k \rightarrow \phi(k)} from {k} to a subfield {\phi(k)} of the complex numbers. In particular, {\phi(a_1),\ldots,\phi(a_n)} are complex numbers, and for any polynomial {P(x_1,\ldots,x_n)} with integer coefficients, one has

\displaystyle  P(a_1,\ldots,a_n) = 0

if and only if

\displaystyle  P(\phi(a_1),\ldots,\phi(a_n)) = 0.

By Los’s theorem, we then conclude that for all {i} sufficiently close to {\alpha}, one has for all polynomials {P(x_1,\ldots,x_n)} of degree at most {s} and whose coefficients are integers whose magnitude sums up to {s}, one has

\displaystyle  P(a_{i,1},\ldots,a_{i,n}) = 0

if and only if

\displaystyle  P(\phi(a_1),\ldots,\phi(a_n)) = 0.

But this gives a Freiman field isomorphism of order {s} between {A_i} and {\phi(A)}, contradicting the construction of {A_i}, and Theorem 3 follows.

Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).

Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

Theorem 1 (Mean ergodic theorem) Let {(X,\mu,T)} be a measure-preserving system (a probability space {(X,\mu)} with an invertible measure-preserving transformation {T}). Then for any {f \in L^2(X,\mu)}, the averages {\frac{1}{N} \sum_{n=1}^N T^n f} converge in {L^2(X,\mu)} norm as {N \rightarrow \infty}, where {T^n f(x) := f(T^{-n} x)}.

In this post, all functions in {L^2(X,\mu)} and similar spaces will be taken to be real instead of complex-valued for simplicity, though the extension to the complex setting is routine.

Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of {f} to the {T}-invariant factors. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:

Theorem 2 (von Neumann mean ergodic theorem) Let {H} be a Hilbert space, and let {U: H \rightarrow H} be a unitary operator on {H}. Then for any {f \in H}, the averages {\frac{1}{N} \sum_{n=1}^N U^n f} converge strongly in {H} as {N \rightarrow \infty}.

Again, see my lecture notes (or just about any text in ergodic theory) for a proof.

Now we turn to Walsh’s theorem.

Theorem 3 (Walsh’s convergence theorem) Let {(X,\mu)} be a measure space with a measure-preserving action of a nilpotent group {G}. Let {g_1,\ldots,g_k: {\bf Z} \rightarrow G} be polynomial sequences in {G} (i.e. each {g_i} takes the form {g_i(n) = a_{i,1}^{p_{i,1}(n)} \ldots a_{i,j}^{p_{i,j}(n)}} for some {a_{i,1},\ldots,a_{i,j} \in G} and polynomials {p_{i,1},\ldots,p_{i,j}: {\bf Z} \rightarrow {\bf Z}}). Then for any {f_1,\ldots,f_k \in L^\infty(X,\mu)}, the averages {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)} converge in {L^2(X,\mu)} norm as {N \rightarrow \infty}, where {g(n) f(x) := f(g(n)^{-1} x)}.

It turns out that this theorem can also be abstracted to some extent, although due to the multiplication in the summand {(g_1(n) f_1) \ldots (g_k(n) f_k)}, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra {L^\infty(X,\mu)}. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space {({\mathcal A},\tau)}, which for us will be a commutative unital algebra {{\mathcal A}} over the reals together with a linear functional {\tau: {\mathcal A} \rightarrow {\bf R}} which maps {1} to {1} and obeys the non-negativity axiom {\tau(f^2) \ge 0} for all {f}. The key example to keep in mind here is {{\mathcal A} = L^\infty(X,\mu)} of essentially bounded real-valued measurable functions with the supremum norm, and with the trace {\tau(f) := \int_X f\ d\mu}. We will also assume in our definition of commutative probability spaces that all elements {f} of {{\mathcal A}} are bounded in the sense that the spectral radius {\rho(f) := \lim_{k \rightarrow \infty} \tau(f^{2k})^{1/2k}} is finite. (In the concrete case of {L^\infty(X,\mu)}, the spectral radius is just the {L^\infty} norm.)

Given a commutative probability space, we can form an inner product {\langle, \rangle_{L^2(\tau)}} on it by the formula

\displaystyle  \langle f, g \rangle_{L^2(\tau)} := \tau(fg).

This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on {{\mathcal A}}. We could complete this structure into a Hilbert space {L^2(\tau)} (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing {L^2(\tau)} as providing a semi-metric on {{\mathcal A}}. For future reference we record the inequalities

\displaystyle  \rho(fg) \leq \rho(f) \rho(g)

\displaystyle  \rho(f+g) \leq \rho(f) + \rho(g)

\displaystyle  \| fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)

for any {f,g}, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the {L^\infty(X,\mu)} case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)

The abstract version of Theorem 3 is then

Theorem 4 (Walsh’s theorem, abstract version) Let {({\mathcal A},\tau)} be a commutative probability space, and let {G} be a nilpotent group acting on {{\mathcal A}} by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and {L^2(\tau)} norm). Let {g_1,\ldots,g_k: {\bf Z} \rightarrow G} be polynomial sequences. Then for any {f_1,\ldots,f_k \in {\mathcal A}}, the averages {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)} form a Cauchy sequence in {L^2(\tau)} (semi-)norm as {N \rightarrow \infty}.

It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression {\|\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)\|_{L^2(\tau)}^2} can be viewed as the inner product of (say) {f_k} with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split {f_k} into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}, except with the polynomials {g_1,\ldots,g_k} replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.

Read the rest of this entry »

Much as group theory is the study of groups, or graph theory is the study of graphs, model theory is the study of models (also known as structures) of some language {{\mathcal L}} (which, in this post, will always be a single-sorted, first-order language). A structure is a set {X}, equipped with one or more operations, constants, and relations. This is of course an extremely general type of mathematical object, but (quite remarkably) one can still say a substantial number of interesting things about very broad classes of structures.

We will observe the common abuse of notation of using the set {X} as a metonym for the entire structure, much as we usually refer to a group {(G,1,\cdot,()^{-1})} simply as {G}, a vector space {(V, 0, +, \cdot)} simply as {V}, and so forth. Following another common bending of the rules, we also allow some operations on structures (such as the multiplicative inverse operation on a group or field) to only be partially defined, and we allow use of the usual simplifying conventions for mathematical formulas (e.g. writing {a+b+c} instead of {(a+b)+c} or {a+(b+c)}, in cases where associativity is known). We will also deviate slightly from the usual practice in logic by emphasising individual structures, rather than the theory of general classes of structures; for instance, we will talk about the theory of a single field such as {{\bf R}} or {{\bf C}}, rather than the theory of all fields of a certain type (e.g. real closed fields or algebraically closed fields).

Once one has a structure {X}, one can introduce the notion of a definable subset of {X}, or more generally of a Cartesian power {X^n} of {X}, defined as a set {E \subset X^n} of the form

\displaystyle  E = \{ (x_1,\ldots,x_n): P(x_1,\ldots,x_n) \hbox{ true} \} \ \ \ \ \ (1)

for some formula {P} in the language {{\mathcal L}} with {n} free variables and any number of constants from {X} (that is, {P(x_1,\ldots,x_n)} is a well-formed formula built up from a finite number of constants {c_1,\ldots,c_m} in {X}, the relations and operations on {X}, logical connectives such as {\neg}, {\wedge}, {\implies}, and the quantifiers {\forall, \exists}). Thus, for instance, in the theory of the arithmetic of the natural numbers {{\bf N} = ({\bf N}, 0, 1, +, \times)}, the set of primes {{\mathcal P}} is a definable set, since we have

\displaystyle  {\mathcal P} = \{ x \in {\bf N}: (\exists y: x=y+2) \wedge \neg (\exists z,w: x = (z+2)(w+2)) \}.

In the theory of the field of reals {{\bf R} = ({\bf R}, 0, 1, +, -, \times, ()^{-1})}, the unit circle {S^1} is an example of a definable set,

\displaystyle  S^1 = \{ (x,y) \in {\bf R}^2: x^2+y^2 = 1 \},

but so is the the complement of the circle,

\displaystyle  {\bf R}^2 \backslash S^1 = \{ (x,y) \in {\bf R}^2: \neg(x^2+y^2 = 1) \}

and the interval {[-1,1]}:

\displaystyle  [-1,1] = \{ x \in {\bf R}: \exists y: x^2+y^2 = 1\}.

Due to the unlimited use of constants, any finite subset of a power {X^n} of any structure {X} is, by our conventions, definable in that structure. (One can of course also consider definability without parameters (also known as {0}-definability), in which arbitrary constants are not permitted, but we will not do so here.)

We can isolate some special subclasses of definable sets:

  • An atomic definable set is a set of the form (1) in which {P()} is an atomic formula (i.e. it does not contain any logical connectives or quantifiers).
  • A quantifier-free definable set is a set of the form (1) in which {P()} is quantifier-free (i.e. it can contain logical connectives, but does not contain the quantifiers {\forall, \exists}).

Example 1 In the theory of a field such as {{\bf R}}, an atomic definable set is the same thing as an affine algebraic set (also known as an affine algebraic variety, with the understanding that varieties are not necessarily assumed to be irreducible), and a quantifier-free definable set is known as a constructible set; thus we see that algebraic geometry can be viewed in some sense as a special case of model theory. (Conversely, it can in fact be quite profitable to think of model theory as an abstraction of algebraic geometry; for instance, the concepts of Morley rank and Morley degree in model theory (discussed in this previous blog post) directly generalises the concepts of dimension and degree in algebraic geometry.) Over {{\bf R}}, the interval {[-1,1]} is a definable set, but not a quantifier-free definable set (and certainly not an atomic definable set); and similarly for the primes over {{\bf N}}.

A quantifier-free definable set in {X^n} is nothing more than a finite boolean combination of atomic definable sets; in other words, the class of quantifier-free definable sets over {X} is the smallest class that contains the atomic definable sets and is closed under boolean operations such as complementation and union (which generate all the other boolean operations). Similarly, the class of definable sets over {X} is the smallest class that contains the quantifier-free definable sets, and is also closed under the operation of projection {\pi_n: E \mapsto \pi_n(E)} from {X^{n+1}} to {X^n} for every natural number {n}, where {\pi_n: X^{n+1} \rightarrow X^n} is the map {\pi_n(x_1,\ldots,x_n,x_{n+1}) := (x_1,\ldots,x_n)}.

Some structures have the property of enjoying quantifier elimination, which means that every definable set is in fact a quantifier-free definable set, or equivalently that the projection of a quantifier-free definable set is again quantifier-free. For instance, an algebraically closed field {k} (with the field operations) has quantifier elimination (i.e. the projection of a constructible set is again constructible); this fact can be proven by the classical tool of resultants, and among other things can be used to give a proof of Hilbert’s nullstellensatz. (Note though that projection does not necessary preserve the property of being atomic; for instance, the projection of the atomic set {\{ (x,y) \in k^2: xy=1 \}} is the non-atomic, but still quantifier-free definable, set {\{ x \in k: \neg (k=0) \}}.) In the converse direction, it is not difficult to use the nullstellensatz to deduce quantifier elimination. For theory of the real field {{\bf R}}, which is not algebraically closed, one does not have quantifier elimination, as one can see from the example of the unit circle (which is a quantifier-free definable set) projecting down to the interval {[-1,1]} (which is definable, but not quantifer-free definable). However, if one adds the additional operation of order {<} to the reals, giving it the language of an ordered field rather than just a field, then quantifier elimination is recovered (the class of quantifier-free definable sets now enlarges to match the class of definable sets, which in this case is also the class of semi-algebraic sets); this is the famous Tarski-Seidenberg theorem.

On the other hand, many important structures do not have quantifier elimination; typically, the projection of a quantifier-free definable set is not, in general, quantifier-free definable. This failure of the projection property also shows up in many contexts outside of model theory; for instance, Lebesgue famously made the error of thinking that the projection of a Borel measurable set remained Borel measurable (it is merely an analytic set instead). Turing’s halting theorem can be viewed as an assertion that the projection of a decidable set (also known as a computable or recursive set) is not necessarily decidable (it is merely semi-decidable (or recursively enumerable) instead). The notorious P=NP problem can also be essentially viewed in this spirit; roughly speaking (and glossing over the placement of some quantifiers), it asks whether the projection of a polynomial-time decidable set is again polynomial-time decidable. And so forth. (See this blog post of Dick Lipton for further discussion of the subtleties of projections.)

Now we consider the status of quantifier elimination for the theory of a finite field {F}. If interpreted naively, quantifier elimination is trivial for a finite field {F}, since every subset of {F^n} is finite and thus quantifier-free definable. However, we can recover an interesting question in one of two (essentially equivalent) ways. One is to work in the asymptotic regime in which the field {F} is large, but the length of the formulae used to construct one’s definable sets stays bounded uniformly in the size of {F} (where we view any constant in {F} as contributing a unit amount to the length of a formula, no matter how large {F} is). A simple counting argument then shows that only a small number of subsets of {F^n} become definable in the asymptotic limit {|F| \rightarrow \infty}, since the number of definable sets clearly grows at most polynomially in {|F|} for any fixed bound on the formula length, while the number of all subsets of {|F|^n} grows exponentially in {n}.

Another way to proceed is to work not with a single finite field {F}, or even with a sequence {F_m} of finite fields, but with the ultraproduct {F = \prod_{m \rightarrow p} F_m} of a sequence of finite fields, and to study the properties of definable sets over this ultraproduct. (We will be using the notation of ultraproducts and nonstandard analysis from this previous blog post.) This approach is equivalent to the more finitary approach mentioned in the previous paragraph, at least if one does not care to track of the exact bounds on the length of the formulae involved. Indeed, thanks to Los’s theorem, a definable subset {E} of {F^n} is nothing more than the ultraproduct {E = \prod_{m \rightarrow p} E_m} of definable subsets {E_m} of {F_m^n} for all {m} sufficiently close to {p}, with the length of the formulae used to define {E_m} uniformly bounded in {m}. In the language of nonstandard analysis, one can view {F} as a nonstandard finite field.

The ultraproduct {F} of finite fields is an important example of a pseudo-finite field – a field that obeys all the sentences in the languages of fields that finite fields do, but is not necessarily itself a finite field. The model theory of pseudo-finite fields was first studied systematically by Ax (in the same paper where the Ax-Grothendieck theorem, discussed previously on this blog, was established), with important further contributions by Kiefe, by Fried-Sacerdote, by two papers of Chatzidakis-van den Dries-Macintyre, and many other authors.

As mentioned before, quantifier elimination trivially holds for finite fields. But for infinite pseudo-finite fields, such as the ultraproduct {F = \prod_{m \rightarrow p} F_m} of finite fields with {|F_m|} going to infinity, quantifier elimination fails. For instance, in a finite field {F_m}, the set {E_m := \{ x \in F_m: \exists y \in F_m: x=y^2 \}} of quadratic residues is a definable set, with a bounded formula length, and so in the ultraproduct {F =\prod_{m \rightarrow p} F_m}, the set {E := \prod_{m\rightarrow p} E_m} of nonstandard quadratic residues is also a definable set. However, in one dimension, we see from the factor theorem that the only atomic definable sets are either finite or the whole field {F}, and so the only constructible sets (i.e. the only quantifier-free definable sets) are either finite or cofinite in {F}. Since the quadratic residues have asymptotic density {1/2} in a large finite field, they cannot form a quantifier-free definable set, despite being definable.

Nevertheless, there is a very nice almost quantifier elimination result for these fields, in characteristic zero at least, which we phrase here as follows:

Theorem 1 (Almost quantifier elimination) Let {F} be a nonstandard finite field of characteristic zero, and let {E \subset F^n} be a definable set over {F}. Then {E} is the union of finitely many sets of the form

\displaystyle  E = \{ x \in V(F): \exists t \in F: P(x,t) = 0 \} \ \ \ \ \ (2)

where {V(F)} is an atomic definable subset of {F^n} (i.e. the {F}-points of an algebraic variety {V} defined over {F} in {F^n}) and {P: F^{n+1} \rightarrow F} is a polynomial.

Results of this type were first obtained essentially due to Catarina Kiefe, although the formulation here is closer to that of Cherlin-van den Dries-Macintyre.

Informally, this theorem says that while we cannot quite eliminate all quantifiers from a definable set over a nonstandard finite field, we can eliminate all but one existential quantifier. Note that negation has also been eliminated in this theorem; for instance, the definable set {F \backslash \{0\} = \{ x \in F: \neg(x=0) \}} uses a negation, but can also be described using a single existential quantifier as {\{ x \in F: \exists t: xt = 1 \}}.) I believe that there are more complicated analogues of this result in positive characteristic, but I have not studied this case in detail (Kiefe’s result does not assume characteristic zero, but her conclusion is slightly different from the one given here). In the one-dimensional case {n=1}, the only varieties {V} are the affine line and finite sets, and we can simplify the above statement, namely that any definable subset of {F} takes the form {\{ x\in F: \exists t \in F: P(x,t) = 0 \}} for some polynomial {P} (i.e. definable sets in {F} are nothing more than the projections of the {F}-points of a plane curve).

There is an equivalent formulation of this theorem for standard finite fields, namely that if {F} is a finite field and {E \subset F^n} is definable using a formula of length at most {M}, then {E} can be expressed in the form (2) with the degree of {P} bounded by some quantity {C_{M,n}} depending on {M} and {n}, assuming that the characteristic of {F} is sufficiently large depending on {M, n}.

The theorem gives quite a satisfactory description of definable sets in either standard or nonstandard finite fields (at least if one does not care about effective bounds in some of the constants, and if one is willing to exclude the small characteristic case); for instance, in conjunction with the Lang-Weil bound discussed in this recent blog post, it shows that any non-empty definable subset of a nonstandard finite field has a nonstandard cardinality of {(\alpha + O(|F|^{-1/2})) |F|^d} for some positive standard rational {\alpha} and integer {d}. Equivalently, any non-empty definable subset of {F^n} for some standard finite field {F} using a formula of length at most {M} has a standard cardinality of {(\alpha + O_{M,n}(|F|^{-1/2})) |F|^d} for some positive rational of height {O_{M,n}(1)} and some natural number {d} between {0} and {n}. (For instance, in the example of the quadratic residues given above, {d} is equal to {1} and {\alpha} equal to {1/2}.) There is a more precise statement to this effect, namely that the Poincaré series of a definable set is rational; see Kiefe’s paper for details.

Below the fold I give a proof of Theorem 1, which relies primarily on the Lang-Weil bound mentioned above.

Read the rest of this entry »

I had another long plane flight recently, so I decided to try making another game, to explore exactly what types of mathematical reasoning might be amenable to gamification.  I decided to start with one of the simplest types of logical argument (and one of the few that avoids the disjunction problem mentioned in the previous post), namely the Aristotelian logic of syllogistic reasoning, most famously exemplified by the classic syllogism:

  • Major premise: All men are mortal.
  • Minor premise: Socrates is a man.
  • Conclusion: Socrates is a mortal.

There is a classic collection of logic puzzles of Lewis Carroll (from his book on symbolic logic), in which he presents a set of premises and asks to derive a conclusion using all of the premises.  Here are four examples of such sets:

  • Babies are illogical;
  • Nobody is despised who can manage a crocodile;
  • Illogical persons are despised.
  • My saucepans are the only things that I have that are made of tin;
  • I find all your presents very useful;
  • None of my saucepans are of the slightest use.
  • No  potatoes of mine, that are new, have been boiled;
  • All of my potatoes in this dish are fit to eat;
  • No unboiled potatoes of mine are fit to eat.
  • No ducks waltz;
  • No officers ever decline to waltz;
  • All my poultry are ducks.

After a certain amount of effort, I was able to gamify the solution process to these sort of puzzles in a Scratch game, although I am not fully satisfied with the results (in part due to the inherent technical limitations of the Scratch software, but also because I have not yet found a smooth user interface for this process).   In order to not have to build a natural language parser, I modified Lewis Carroll’s sentences somewhat in order to be machine-readable.  Here is a typical screenshot:

Unfortunately, the gameplay is somewhat clunkier than in the algebra game, basically because one needs three or four clicks and a keyboard press in order to make a move, whereas in the algebra game each click corresponded to one move. This is in part due to Scratch not having an easy way to have drag-and-drop or right-click commands, but even with a fully featured GUI, I am unsure how to make an interface that would make the process of performing a deduction easy; one may need a “smart” interface that is able to guess some possible intended moves from a minimal amount of input from the user, and then suggest these choices (somewhat similarly to the “auto-complete” feature in a search box).   This would require more effort than I could expend on a plane trip, though (as well as the use of a more powerful language than Scratch).

There are of course several existing proof assistants one could try to use as a model (Coq, Isabelle, etc.), but my impression is that the syntax for such assistants would only be easily mastered by someone who already is quite experienced with computer languages as well as proof writing, which would defeat the purpose of the games I have in mind.  But perhaps it is possible to create a proof assistant for a very restricted logic (such as one without disjunction) that can be easily used by non-experts…

Nonstandard analysis is a mathematical framework in which one extends the standard mathematical universe {{\mathfrak U}} of standard numbers, standard sets, standard functions, etc. into a larger nonstandard universe {{}^* {\mathfrak U}} of nonstandard numbers, nonstandard sets, nonstandard functions, etc., somewhat analogously to how one places the real numbers inside the complex numbers, or the rationals inside the reals. This nonstandard universe enjoys many of the same properties as the standard one; in particular, we have the transfer principle that asserts that any statement in the language of first order logic is true in the standard universe if and only if it is true in the nonstandard one. (For instance, because Fermat’s last theorem is known to be true for standard natural numbers, it is automatically true for nonstandard natural numbers as well.) However, the nonstandard universe also enjoys some additional useful properties that the standard one does not, most notably the countable saturation property, which is a property somewhat analogous to the completeness property of a metric space; much as metric completeness allows one to assert that the intersection of a countable family of nested closed balls is non-empty, countable saturation allows one to assert that the intersection of a countable family of nested satisfiable formulae is simultaneously satisfiable. (See this previous blog post for more on the analogy between the use of nonstandard analysis and the use of metric completions.) Furthermore, by viewing both the standard and nonstandard universes externally (placing them both inside a larger metatheory, such as a model of Zermelo-Frankel-Choice (ZFC) set theory; in some more advanced set-theoretic applications one may also wish to add some large cardinal axioms), one can place some useful additional definitions and constructions on these universes, such as defining the concept of an infinitesimal nonstandard number (a number which is smaller in magnitude than any positive standard number). The ability to rigorously manipulate infinitesimals is of course one of the most well-known advantages of working with nonstandard analysis.

To build a nonstandard universe {{}^* {\mathfrak U}} from a standard one {{\mathfrak U}}, the most common approach is to take an ultrapower of {{\mathfrak U}} with respect to some non-principal ultrafilter over the natural numbers; see e.g. this blog post for details. Once one is comfortable with ultrafilters and ultrapowers, this becomes quite a simple and elegant construction, and greatly demystifies the nature of nonstandard analysis.

On the other hand, nonprincipal ultrafilters do have some unappealing features. The most notable one is that their very existence requires the axiom of choice (or more precisely, a weaker form of this axiom known as the boolean prime ideal theorem). Closely related to this is the fact that one cannot actually write down any explicit example of a nonprincipal ultrafilter, but must instead rely on nonconstructive tools such as Zorn’s lemma, the Hahn-Banach theorem, Tychonoff’s theorem, the Stone-Cech compactification, or the boolean prime ideal theorem to locate one. As such, ultrafilters definitely belong to the “infinitary” side of mathematics, and one may feel that it is inappropriate to use such tools for “finitary” mathematical applications, such as those which arise in hard analysis. From a more practical viewpoint, because of the presence of the infinitary ultrafilter, it can be quite difficult (though usually not impossible, with sufficient patience and effort) to take a finitary result proven via nonstandard analysis and coax an effective quantitative bound from it.

There is however a “cheap” version of nonstandard analysis which is less powerful than the full version, but is not as infinitary in that it is constructive (in the sense of not requiring any sort of choice-type axiom), and which can be translated into standard analysis somewhat more easily than a fully nonstandard argument; indeed, a cheap nonstandard argument can often be presented (by judicious use of asymptotic notation) in a way which is nearly indistinguishable from a standard one. It is obtained by replacing the nonprincipal ultrafilter in fully nonstandard analysis with the more classical Fréchet filter of cofinite subsets of the natural numbers, which is the filter that implicitly underlies the concept of the classical limit {\lim_{{\bf n} \rightarrow \infty} a_{\bf n}} of a sequence when the underlying asymptotic parameter {{\bf n}} goes off to infinity. As such, “cheap nonstandard analysis” aligns very well with traditional mathematics, in which one often allows one’s objects to be parameterised by some external parameter such as {{\bf n}}, which is then allowed to approach some limit such as {\infty}. The catch is that the Fréchet filter is merely a filter and not an ultrafilter, and as such some of the key features of fully nonstandard analysis are lost. Most notably, the law of the excluded middle does not transfer over perfectly from standard analysis to cheap nonstandard analysis; much as there exist bounded sequences of real numbers (such as {0,1,0,1,\ldots}) which do not converge to a (classical) limit, there exist statements in cheap nonstandard analysis which are neither true nor false (at least without passing to a subsequence, see below). The loss of such a fundamental law of mathematical reasoning may seem like a major disadvantage for cheap nonstandard analysis, and it does indeed make cheap nonstandard analysis somewhat weaker than fully nonstandard analysis. But in some situations (particularly when one is reasoning in a “constructivist” or “intuitionistic” fashion, and in particular if one is avoiding too much reliance on set theory) it turns out that one can survive the loss of this law; and furthermore, the law of the excluded middle is still available for standard analysis, and so one can often proceed by working from time to time in the standard universe to temporarily take advantage of this law, and then transferring the results obtained there back to the cheap nonstandard universe once one no longer needs to invoke the law of the excluded middle. Furthermore, the law of the excluded middle can be recovered by adopting the freedom to pass to subsequences with regards to the asymptotic parameter {{\bf n}}; this technique is already in widespread use in the analysis of partial differential equations, although it is generally referred to by names such as “the compactness method” rather than as “cheap nonstandard analysis”.

Below the fold, I would like to describe this cheap version of nonstandard analysis, which I think can serve as a pedagogical stepping stone towards fully nonstandard analysis, as it is formally similar to (though weaker than) fully nonstandard analysis, but on the other hand is closer in practice to standard analysis. As we shall see below, the relation between cheap nonstandard analysis and standard analysis is analogous in many ways to the relation between probabilistic reasoning and deterministic reasoning; it also resembles somewhat the preference in much of modern mathematics for viewing mathematical objects as belonging to families (or to categories) to be manipulated en masse, rather than treating each object individually. (For instance, nonstandard analysis can be used as a partial substitute for scheme theory in order to obtain uniformly quantitative results in algebraic geometry, as discussed for instance in this previous blog post.)

Read the rest of this entry »

In the previous set of notes, we introduced the notion of an ultra approximate group – an ultraproduct {A = \prod_{n \rightarrow\alpha} A_n} of finite {K}-approximate groups {A_n} for some {K} independent of {n}, where each {K}-approximate group {A_n} may lie in a distinct ambient group {G_n}. Although these objects arise initially from the “finitary” objects {A_n}, it turns out that ultra approximate groups {A} can be profitably analysed by means of infinitary groups {L} (and in particular, locally compact groups or Lie groups {L}), by means of certain models {\rho: \langle A \rangle \rightarrow L} of {A} (or of the group {\langle A \rangle} generated by {A}). We will define precisely what we mean by a model later, but as a first approximation one can view a model as a representation of the ultra approximate group {A} (or of {\langle A \rangle}) that is “macroscopically faithful” in that it accurately describes the “large scale” behaviour of {A} (or equivalently, that the kernel of the representation is “microscopic” in some sense). In the next section we will see how one can use “Gleason lemma” technology to convert this macroscopic control of an ultra approximate group into microscopic control, which will be the key to classifying approximate groups.

Models of ultra approximate groups can be viewed as the multiplicative combinatorics analogue of the more well known concept of an ultralimit of metric spaces, which we briefly review below the fold as motivation.

The crucial observation is that ultra approximate groups enjoy a local compactness property which allows them to be usefully modeled by locally compact groups (and hence, through the Gleason-Yamabe theorem from previous notes, by Lie groups also). As per the Heine-Borel theorem, the local compactness will come from a combination of a completeness property and a local total boundedness property. The completeness property turns out to be a direct consequence of the countable saturation property of ultraproducts, thus illustrating one of the key advantages of the ultraproduct setting. The local total boundedness property is more interesting. Roughly speaking, it asserts that “large bounded sets” (such as {A} or {A^{100}}) can be covered by finitely many translates of “small bounded sets” {S}, where “small” is a topological group sense, implying in particular that large powers {S^m} of {S} lie inside a set such as {A} or {A^4}. The easiest way to obtain such a property comes from the following lemma of Sanders:

Lemma 1 (Sanders lemma) Let {A} be a finite {K}-approximate group in a (global) group {G}, and let {m \geq 1}. Then there exists a symmetric subset {S} of {A^4} with {|S| \gg_{K,m} |A|} containing the identity such that {S^m \subset A^4}.

This lemma has an elementary combinatorial proof, and is the key to endowing an ultra approximate group with locally compact structure. There is also a closely related lemma of Croot and Sisask which can achieve similar results, and which will also be discussed below. (The locally compact structure can also be established more abstractly using the much more general methods of definability theory, as was first done by Hrushovski, but we will not discuss this approach here.)

By combining the locally compact structure of ultra approximate groups {A} with the Gleason-Yamabe theorem, one ends up being able to model a large “ultra approximate subgroup” {A'} of {A} by a Lie group {L}. Such Lie models serve a number of important purposes in the structure theory of approximate groups. Firstly, as all Lie groups have a dimension which is a natural number, they allow one to assign a natural number “dimension” to ultra approximate groups, which opens up the ability to perform “induction on dimension” arguments. Secondly, Lie groups have an escape property (which is in fact equivalent to no small subgroups property): if a group element {g} lies outside of a very small ball {B_\epsilon}, then some power {g^n} of it will escape a somewhat larger ball {B_1}. Or equivalently: if a long orbit {g, g^2, \ldots, g^n} lies inside the larger ball {B_1}, one can deduce that the original element {g} lies inside the small ball {B_\epsilon}. Because all Lie groups have this property, we will be able to show that all ultra approximate groups {A} “essentially” have a similar property, in that they are “controlled” by a nearby ultra approximate group which obeys a number of escape-type properties analogous to those enjoyed by small balls in a Lie group, and which we will call a strong ultra approximate group. This will be discussed in the next set of notes, where we will also see how these escape-type properties can be exploited to create a metric structure on strong approximate groups analogous to the Gleason metrics studied in previous notes, which can in turn be exploited (together with an induction on dimension argument) to fully classify such approximate groups (in the finite case, at least).

There are some cases where the analysis is particularly simple. For instance, in the bounded torsion case, one can show that the associated Lie model {L} is necessarily zero-dimensional, which allows for a easy classification of approximate groups of bounded torsion.

Some of the material here is drawn from my recent paper with Ben Green and Emmanuel Breuillard, which is in turn inspired by a previous paper of Hrushovski.

Read the rest of this entry »

Roughly speaking, mathematical analysis can be divided into two major styles, namely hard analysis and soft analysis. The precise distinction between the two types of analysis is imprecise (and in some cases one may use a blend the two styles), but some key differences can be listed as follows.

  • Hard analysis tends to be concerned with quantitative or effective properties such as estimates, upper and lower bounds, convergence rates, and growth rates or decay rates. In contrast, soft analysis tends to be concerned with qualitative or ineffective properties such as existence and uniqueness, finiteness, measurability, continuity, differentiability, connectedness, or compactness.
  • Hard analysis tends to be focused on finitary, finite-dimensional or discrete objects, such as finite sets, finitely generated groups, finite Boolean combination of boxes or balls, or “finite-complexity” functions, such as polynomials or functions on a finite set. In contrast, soft analysis tends to be focused on infinitary, infinite-dimensional, or continuous objects, such as arbitrary measurable sets or measurable functions, or abstract locally compact groups.
  • Hard analysis tends to involve explicit use of many parameters such as {\epsilon}, {\delta}, {N}, etc. In contrast, soft analysis tends to rely instead on properties such as continuity, differentiability, compactness, etc., which implicitly are defined using a similar set of parameters, but whose parameters often do not make an explicit appearance in arguments.
  • In hard analysis, it is often the case that a key lemma in the literature is not quite optimised for the application at hand, and one has to reprove a slight variant of that lemma (using a variant of the proof of the original lemma) in order for it to be suitable for applications. In contrast, in soft analysis, key results can often be used as “black boxes”, without need of further modification or inspection of the proof.
  • The properties in soft analysis tend to enjoy precise closure properties; for instance, the composition or linear combination of continuous functions is again continuous, and similarly for measurability, differentiability, etc. In contrast, the closure properties in hard analysis tend to be fuzzier, in that the parameters in the conclusion are often different from the parameters in the hypotheses. For instance, the composition of two Lipschitz functions with Lipschitz constant {K} is still Lipschitz, but now with Lipschitz constant {K^2} instead of {K}. These changes in parameters mean that hard analysis arguments often require more “bookkeeping” than their soft analysis counterparts, and are less able to utilise algebraic constructions (e.g. quotient space constructions) that rely heavily on precise closure properties.

In the lectures so far, focusing on the theory surrounding Hilbert’s fifth problem, the results and techniques have fallen well inside the category of soft analysis. However, we will now turn to the theory of approximate groups, which is a topic which is traditionally studied using the methods of hard analysis. (Later we will also study groups of polynomial growth, which lies on an intermediate position in the spectrum between hard and soft analysis, and which can be profitably analysed using both styles of analysis.)

Despite the superficial differences between hard and soft analysis, though, there are a number of important correspondences between results in hard analysis and results in soft analysis. For instance, if one has some sort of uniform quantitative bound on some expression relating to finitary objects, one can often use limiting arguments to then conclude a qualitative bound on analogous expressions on infinitary objects, by viewing the latter objects as some sort of “limit” of the former objects. Conversely, if one has a qualitative bound on infinitary objects, one can often use compactness and contradiction arguments to recover uniform quantitative bounds on finitary objects as a corollary.

Remark 1 Another type of correspondence between hard analysis and soft analysis, which is “syntactical” rather than “semantical” in nature, arises by taking the proofs of a soft analysis result, and translating such a qualitative proof somehow (e.g. by carefully manipulating quantifiers) into a quantitative proof of an analogous hard analysis result. This type of technique is sometimes referred to as proof mining in the proof theory literature, and is discussed in this previous blog post (and its comments). We will however not employ systematic proof mining techniques here, although in later posts we will informally borrow arguments from infinitary settings (such as the methods used to construct Gleason metrics) and adapt them to finitary ones.

Let us illustrate the correspondence between hard and soft analysis results with a simple example.

Proposition 1 Let {X} be a sequentially compact topological space, let {S} be a dense subset of {X}, and let {f: X \rightarrow [0,+\infty]} be a continuous function (giving the extended half-line {[0,+\infty]} the usual order topology). Then the following statements are equivalent:

  • (i) (Qualitative bound on infinitary objects) For all {x \in X}, one has {f(x) < +\infty}.
  • (ii) (Quantitative bound on finitary objects) There exists {M < +\infty} such that {f(x) \leq M} for all {x \in S}.

In applications, {S} is typically a (non-compact) set of “finitary” (or “finite complexity”) objects of a certain class, and {X} is some sort of “completion” or “compactification” of {S} which admits additional “infinitary” objects that may be viewed as limits of finitary objects.

Proof: To see that (ii) implies (i), observe from density that every point {x} in {X} is adherent to {S}, and so given any neighbourhood {U} of {x}, there exists {y \in S \cap U}. Since {f(y) \leq M}, we conclude from the continuity of {f} that {f(x) \leq M} also, and the claim follows.

Conversely, to show that (i) implies (ii), we use the “compactness and contradiction” argument. Suppose for sake of contradiction that (ii) failed. Then for any natural number {n}, there exists {x_n \in S} such that {f(x_n) \geq n}. (Here we have used the axiom of choice, which we will assume throughout this course.) Using sequential compactness, and passing to a subsequence if necessary, we may assume that the {x_n} converge to a limit {x \in X}. By continuity of {f}, this implies that {f(x) = +\infty}, contradicting (i). \Box

Remark 2 Note that the above deduction of (ii) from (i) is ineffective in that it gives no explicit bound on the uniform bound {M} in (ii). Without any further information on how the qualitative bound (i) is proven, this is the best one can do in general (and this is one of the most significant weaknesses of infinitary methods when used to solve finitary problems); but if one has access to the proof of (i), one can often finitise or proof mine that argument to extract an effective bound for {M}, although often the bound one obtains in the process is quite poor (particularly if the proof of (i) relied extensively on infinitary tools, such as limits). See this blog post for some related discussion.

The above simple example illustrates that in order to get from an “infinitary” statement such as (i) to a “finitary” statement such as (ii), a key step is to be able to take a sequence {(x_n)_{n \in {\bf N}}} (or in some cases, a more general net {(x_\alpha)_{\alpha \in A}}) of finitary objects and extract a suitable infinitary limit object {x}. In the literature, there are three main ways in which one can extract such a limit:

  • (Topological limit) If the {x_n} are all elements of some topological space {S} (e.g. an incomplete function space) which has a suitable “compactification” or “completion” {X} (e.g. a Banach space), then (after passing to a subsequence if necessary) one can often ensure the {x_n} converge in a topological sense (or in a metrical sense) to a limit {x}. The use of this type of limit to pass between quantitative/finitary and qualitative/infinitary results is particularly common in the more analytical areas of mathematics (such as ergodic theory, asymptotic combinatorics, or PDE), due to the abundance of useful compactness results in analysis such as the (sequential) Banach-Alaoglu theorem, Prokhorov’s theorem, the Helly selection theorem, the Arzelá-Ascoli theorem, or even the humble Bolzano-Weierstrass theorem. However, one often has to take care with the nature of convergence, as many compactness theorems only guarantee convergence in a weak sense rather than in a strong one.
  • (Categorical limit) If the {x_n} are all objects in some category (e.g. metric spaces, groups, fields, etc.) with a number of morphisms between the {x_n} (e.g. morphisms from {x_{n+1}} to {x_n}, or vice versa), then one can often form a direct limit {\lim_{\rightarrow} x_n} or inverse limit {\lim_{\leftarrow} x_n} of these objects to form a limiting object {x}. The use of these types of limits to connect quantitative and qualitative results is common in subjects such as algebraic geometry that are particularly amenable to categorical ways of thinking. (We have seen inverse limits appear in the discussion of Hilbert’s fifth problem, although in that context they were not really used to connect quantitative and qualitative results together.)
  • (Logical limit) If the {x_n} are all distinct spaces (or elements or subsets of distinct spaces), with few morphisms connecting them together, then topological and categorical limits are often unavailable or unhelpful. In such cases, however, one can still tie together such objects using an ultraproduct construction (or similar device) to create a limiting object {\lim_{n \rightarrow \alpha} x_n} or limiting space {\prod_{n \rightarrow \alpha} x_n} that is a logical limit of the {x_n}, in the sense that various properties of the {x_n} (particularly those that can be phrased using the language of first-order logic) are preserved in the limit. As such, logical limits are often very well suited for the task of connecting finitary and infinitary mathematics together. Ultralimit type constructions are of course used extensively in logic (particularly in model theory), but are also popular in metric geometry. They can also be used in many of the previously mentioned areas of mathematics, such as algebraic geometry (as discussed in this previous post).

The three types of limits are analogous in many ways, with a number of connections between them. For instance, in the study of groups of polynomial growth, both topological limits (using the metric notion of Gromov-Hausdorff convergence) and logical limits (using the ultralimit construction) are commonly used, and to some extent the two constructions are at least partially interchangeable in this setting. (See also these previous posts for the use of ultralimits as a substitute for topological limits.) In the theory of approximate groups, though, it was observed by Hrushovski that logical limits (and in particular, ultraproducts) are the most useful type of limit to connect finitary approximate groups to their infinitary counterparts. One reason for this is that one is often interested in obtaining results on approximate groups {A} that are uniform in the choice of ambient group {G}. As such, one often seeks to take a limit of approximate groups {A_n} that lie in completely unrelated ambient groups {G_n}, with no obvious morphisms or metrics tying the {G_n} to each other. As such, the topological and categorical limits are not easily usable, whereas the logical limits can still be employed without much difficulty.

Logical limits are closely tied with non-standard analysis. Indeed, by applying an ultraproduct construction to standard number systems such as the natural numbers {{\bf N}} or the reals {{\bf R}}, one can obtain nonstandard number systems such as the nonstandard natural numbers {{}^* {\bf N}} or the nonstandard real numbers (or hyperreals) {{}^* {\bf R}}. These nonstandard number systems behave very similarly to their standard counterparts, but also enjoy the advantage of containing the standard number systems as proper subsystems (e.g. {{\bf R}} is a subring of {{}^* {\bf R}}), which allows for some convenient algebraic manipulations (such as the quotient space construction to create spaces such as {{}^* {\bf R} / {\bf R}}) which are not easily accessible in the purely standard universe. Nonstandard spaces also enjoy a useful completeness property, known as countable saturation, which is analogous to metric completeness (as discussed in this previous blog post) and which will be particularly useful for us in tying together the theory of approximate groups with the theory of Hilbert’s fifth problem. See this previous post for more discussion on ultrafilters and nonstandard analysis.

In these notes, we lay out the basic theory of ultraproducts and ultralimits (in particular, proving Los’s theorem, which roughly speaking asserts that ultralimits are limits in a logical sense, as well as the countable saturation property alluded to earlier). We also lay out some of the basic foundations of nonstandard analysis, although we will not rely too heavily on nonstandard tools in this course. Finally, we apply this general theory to approximate groups, to connect finite approximate groups to an infinitary type of approximate group which we will call an ultra approximate group. We will then study these ultra approximate groups (and models of such groups) in more detail in the next set of notes.

Remark 3 Throughout these notes (and in the rest of the course), we will assume the axiom of choice, in order to easily use ultrafilter-based tools. If one really wanted to expend the effort, though, one could eliminate the axiom of choice from the proofs of the final “finitary” results that one is ultimately interested in proving, at the cost of making the proofs significantly lengthier. Indeed, there is a general result of Gödel that any result which can be stated in the language of Peano arithmetic (which, roughly speaking, means that the result is “finitary” in nature), and can be proven in set theory using the axiom of choice (or more precisely, in the ZFC axiom system), can also be proven in set theory without the axiom of choice (i.e. in the ZF system). As this course is not focused on foundations, we shall simply assume the axiom of choice henceforth to avoid further distraction by such issues.

Read the rest of this entry »

This fall (starting Monday, September 26), I will be teaching a graduate topics course which I have entitled “Hilbert’s fifth problem and related topics.” The course is going to focus on three related topics:

  • Hilbert’s fifth problem on the topological description of Lie groups, as well as the closely related (local) classification of locally compact groups (the Gleason-Yamabe theorem).
  • Approximate groups in nonabelian groups, and their classification via the Gleason-Yamabe theorem (this is very recent work of Emmanuel Breuillard, Ben Green, Tom Sanders, and myself, building upon earlier work of Hrushovski);
  • Gromov’s theorem on groups of polynomial growth, as proven via the classification of approximate groups (as well as some consequences to fundamental groups of Riemannian manifolds).

I have already blogged about these topics repeatedly in the past (particularly with regard to Hilbert’s fifth problem), and I intend to recycle some of that material in the lecture notes for this course.

The above three families of results exemplify two broad principles (part of what I like to call “the dichotomy between structure and randomness“):

  • (Rigidity) If a group-like object exhibits a weak amount of regularity, then it (or a large portion thereof) often automatically exhibits a strong amount of regularity as well;
  • (Structure) This strong regularity manifests itself either as Lie type structure (in continuous settings) or nilpotent type structure (in discrete settings). (In some cases, “nilpotent” should be replaced by sister properties such as “abelian“, “solvable“, or “polycyclic“.)

Let me illustrate what I mean by these two principles with two simple examples, one in the continuous setting and one in the discrete setting. We begin with a continuous example. Given an {n \times n} complex matrix {A \in M_n({\bf C})}, define the matrix exponential {\exp(A)} of {A} by the formula

\displaystyle  \exp(A) := \sum_{k=0}^\infty \frac{A^k}{k!} = 1 + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \ldots

which can easily be verified to be an absolutely convergent series.

Exercise 1 Show that the map {A \mapsto \exp(A)} is a real analytic (and even complex analytic) map from {M_n({\bf C})} to {M_n({\bf C})}, and obeys the restricted homomorphism property

\displaystyle  \exp(sA) \exp(tA) = \exp((s+t)A) \ \ \ \ \ (1)

for all {A \in M_n({\bf C})} and {s,t \in {\bf C}}.

Proposition 1 (Rigidity and structure of matrix homomorphisms) Let {n} be a natural number. Let {GL_n({\bf C})} be the group of invertible {n \times n} complex matrices. Let {\Phi: {\bf R} \rightarrow GL_n({\bf C})} be a map obeying two properties:

  • (Group-like object) {\Phi} is a homomorphism, thus {\Phi(s) \Phi(t) = \Phi(s+t)} for all {s,t \in {\bf R}}.
  • (Weak regularity) The map {t \mapsto \Phi(t)} is continuous.

Then:

  • (Strong regularity) The map {t \mapsto \Phi(t)} is smooth (i.e. infinitely differentiable). In fact it is even real analytic.
  • (Lie-type structure) There exists a (unique) complex {n \times n} matrix {A} such that {\Phi(t) = \exp(tA)} for all {t \in {\bf R}}.

Proof: Let {\Phi} be as above. Let {\epsilon > 0} be a small number (depending only on {n}). By the homomorphism property, {\Phi(0) = 1} (where we use {1} here to denote the identity element of {GL_n({\bf C})}), and so by continuity we may find a small {t_0>0} such that {\Phi(t) = 1 + O(\epsilon)} for all {t \in [-t_0,t_0]} (we use some arbitrary norm here on the space of {n \times n} matrices, and allow implied constants in the {O()} notation to depend on {n}).

The map {A \mapsto \exp(A)} is real analytic and (by the inverse function theorem) is a diffeomorphism near {0}. Thus, by the inverse function theorem, we can (if {\epsilon} is small enough) find a matrix {B} of size {B = O(\epsilon)} such that {\Phi(t_0) = \exp(B)}. By the homomorphism property and (1), we thus have

\displaystyle  \Phi(t_0/2)^2 = \Phi(t_0) = \exp(B) = \exp(B/2)^2.

On the other hand, by another application of the inverse function theorem we see that the squaring map {A \mapsto A^2} is a diffeomorphism near {1} in {GL_n({\bf C})}, and thus (if {\epsilon} is small enough)

\displaystyle  \Phi(t_0/2) = \exp(B/2).

We may iterate this argument (for a fixed, but small, value of {\epsilon}) and conclude that

\displaystyle  \Phi(t_0/2^k) = \exp(B/2^k)

for all {k = 0,1,2,\ldots}. By the homomorphism property and (1) we thus have

\displaystyle  \Phi(qt_0) = \exp(qB)

whenever {q} is a dyadic rational, i.e. a rational of the form {a/2^k} for some integer {a} and natural number {k}. By continuity we thus have

\displaystyle  \Phi(st_0) = \exp(sB)

for all real {s}. Setting {A := B/t_0} we conclude that

\displaystyle  \Phi(t) = \exp(tA)

for all real {t}, which gives existence of the representation and also real analyticity and smoothness. Finally, uniqueness of the representation {\Phi(t) = \exp(tA)} follows from the identity

\displaystyle  A = \frac{d}{dt} \exp(tA)|_{t=0}.

\Box

Exercise 2 Generalise Proposition 1 by replacing the hypothesis that {\Phi} is continuous with the hypothesis that {\Phi} is Lebesgue measurable (Hint: use the Steinhaus theorem.). Show that the proposition fails (assuming the axiom of choice) if this hypothesis is omitted entirely.

Note how one needs both the group-like structure and the weak regularity in combination in order to ensure the strong regularity; neither is sufficient on its own. We will see variants of the above basic argument throughout the course. Here, the task of obtaining smooth (or real analytic structure) was relatively easy, because we could borrow the smooth (or real analytic) structure of the domain {{\bf R}} and range {M_n({\bf C})}; but, somewhat remarkably, we shall see that one can still build such smooth or analytic structures even when none of the original objects have any such structure to begin with.

Now we turn to a second illustration of the above principles, namely Jordan’s theorem, which uses a discreteness hypothesis to upgrade Lie type structure to nilpotent (and in this case, abelian) structure. We shall formulate Jordan’s theorem in a slightly stilted fashion in order to emphasise the adherence to the above-mentioned principles.

Theorem 2 (Jordan’s theorem) Let {G} be an object with the following properties:

  • (Group-like object) {G} is a group.
  • (Discreteness) {G} is finite.
  • (Lie-type structure) {G} is contained in {U_n({\bf C})} (the group of unitary {n \times n} matrices) for some {n}.

Then there is a subgroup {G'} of {G} such that

  • ({G'} is close to {G}) The index {|G/G'|} of {G'} in {G} is {O_n(1)} (i.e. bounded by {C_n} for some quantity {C_n} depending only on {n}).
  • (Nilpotent-type structure) {G'} is abelian.

A key observation in the proof of Jordan’s theorem is that if two unitary elements {g, h \in U_n({\bf C})} are close to the identity, then their commutator {[g,h] = g^{-1}h^{-1}gh} is even closer to the identity (in, say, the operator norm {\| \|_{op}}). Indeed, since multiplication on the left or right by unitary elements does not affect the operator norm, we have

\displaystyle  \| [g,h] - 1 \|_{op} = \| gh - hg \|_{op}

\displaystyle  = \| (g-1)(h-1) - (h-1)(g-1) \|_{op}

and so by the triangle inequality

\displaystyle  \| [g,h] - 1 \|_{op} \leq 2 \|g-1\|_{op} \|h-1\|_{op}. \ \ \ \ \ (2)

Now we can prove Jordan’s theorem.

Proof: We induct on {n}, the case {n=1} being trivial. Suppose first that {G} contains a central element {g} which is not a multiple of the identity. Then, by definition, {G} is contained in the centraliser {Z(g)} of {g}, which by the spectral theorem is isomorphic to a product {U_{n_1}({\bf C}) \times \ldots \times U_{n_k}({\bf C})} of smaller unitary groups. Projecting {G} to each of these factor groups and applying the induction hypothesis, we obtain the claim.

Thus we may assume that {G} contains no central elements other than multiples of the identity. Now pick a small {\epsilon > 0} (one could take {\epsilon=\frac{1}{10d}} in fact) and consider the subgroup {G'} of {G} generated by those elements of {G} that are within {\epsilon} of the identity (in the operator norm). By considering a maximal {\epsilon}-net of {G} we see that {G'} has index at most {O_{n,\epsilon}(1)} in {G}. By arguing as before, we may assume that {G'} has no central elements other than multiples of the identity.

If {G'} consists only of multiples of the identity, then we are done. If not, take an element {g} of {G'} that is not a multiple of the identity, and which is as close as possible to the identity (here is where we crucially use that {G} is finite). By (2), we see that if {\epsilon} is sufficiently small depending on {n}, and if {h} is one of the generators of {G'}, then {[g,h]} lies in {G'} and is closer to the identity than {g}, and is thus a multiple of the identity. On the other hand, {[g,h]} has determinant {1}. Given that it is so close to the identity, it must therefore be the identity (if {\epsilon} is small enough). In other words, {g} is central in {G'}, and is thus a multiple of the identity. But this contradicts the hypothesis that there are no central elements other than multiples of the identity, and we are done. \Box

Commutator estimates such as (2) will play a fundamental role in many of the arguments we will see in this course; as we saw above, such estimates combine very well with a discreteness hypothesis, but will also be very useful in the continuous setting.

Exercise 3 Generalise Jordan’s theorem to the case when {G} is a finite subgroup of {GL_n({\bf C})} rather than of {U_n({\bf C})}. (Hint: The elements of {G} are not necessarily unitary, and thus do not necessarily preserve the standard Hilbert inner product of {{\bf C}^n}. However, if one averages that inner product by the finite group {G}, one obtains a new inner product on {{\bf C}^n} that is preserved by {G}, which allows one to conjugate {G} to a subgroup of {U_n({\bf C})}. This averaging trick is (a small) part of Weyl’s unitary trick in representation theory.)

Exercise 4 (Inability to discretise nonabelian Lie groups) Show that if {n \geq 3}, then the orthogonal group {O_n({\bf R})} cannot contain arbitrarily dense finite subgroups, in the sense that there exists an {\epsilon = \epsilon_n > 0} depending only on {n} such that for every finite subgroup {G} of {O_n({\bf R})}, there exists a ball of radius {\epsilon} in {O_n({\bf R})} (with, say, the operator norm metric) that is disjoint from {G}. What happens in the {n=2} case?

Remark 1 More precise classifications of the finite subgroups of {U_n({\bf C})} are known, particularly in low dimensions. For instance, one can show that the only finite subgroups of {SO_3({\bf R})} (which {SU_2({\bf C})} is a double cover of) are isomorphic to either a cyclic group, a dihedral group, or the symmetry group of one of the Platonic solids.

Read the rest of this entry »

I have blogged several times in the past about nonstandard analysis, which among other things is useful in allowing one to import tools from infinitary (or qualitative) mathematics in order to establish results in finitary (or quantitative) mathematics. One drawback, though, to using nonstandard analysis methods is that the bounds one obtains by such methods are usually ineffective: in particular, the conclusions of a nonstandard analysis argument may involve an unspecified constant {C} that is known to be finite but for which no explicit bound is obviously available. (In many cases, a bound can eventually be worked out by performing proof mining on the argument, and in particular by carefully unpacking the proofs of all the various results from infinitary mathematics that were used in the argument, as opposed to simply using them as “black boxes”, but this is a time-consuming task and the bounds that one eventually obtains tend to be quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).)

Because of this fact, it would seem that quantitative bounds, such as polynomial type bounds {X \leq C Y^C} that show that one quantity {X} is controlled in a polynomial fashion by another quantity {Y}, are not easily obtainable through the ineffective methods of nonstandard analysis. Actually, this is not the case; as I will demonstrate by an example below, nonstandard analysis can certainly yield polynomial type bounds. The catch is that the exponent {C} in such bounds will be ineffective; but nevertheless such bounds are still good enough for many applications.

Let us now illustrate this by reproving a lemma from this paper of Mei-Chu Chang (Lemma 2.14, to be precise), which was recently pointed out to me by Van Vu. Chang’s paper is focused primarily on the sum-product problem, but she uses a quantitative lemma from algebraic geometry which is of independent interest. To motivate the lemma, let us first establish a qualitative version:

Lemma 1 (Qualitative solvability) Let {P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}} be a finite number of polynomials in several variables with rational coefficients. If there is a complex solution {z = (z_1,\ldots,z_d) \in {\bf C}^d} to the simultaneous system of equations

\displaystyle  P_1(z) = \ldots = P_r(z) = 0,

then there also exists a solution {z \in \overline{{\bf Q}}^d} whose coefficients are algebraic numbers (i.e. they lie in the algebraic closure {{\bf Q}} of the rationals).

Proof: Suppose there was no solution to {P_1(z)=\ldots=P_r(z)=0} over {\overline{{\bf Q}}}. Applying Hilbert’s nullstellensatz (which is available as {\overline{{\bf Q}}} is algebraically closed), we conclude the existence of some polynomials {Q_1,\ldots,Q_r} (with coefficients in {\overline{{\bf Q}}}) such that

\displaystyle  P_1 Q_1 + \ldots + P_r Q_r = 1

as polynomials. In particular, we have

\displaystyle  P_1(z) Q_1(z) + \ldots + P_r(z) Q_r(z) = 1

for all {z \in {\bf C}^d}. This shows that there is no solution to {P_1(z)=\ldots=P_r(z)=0} over {{\bf C}}, as required. \Box

Remark 1 Observe that in the above argument, one could replace {{\bf Q}} and {{\bf C}} by any other pair of fields, with the latter containing the algebraic closure of the former, and still obtain the same result.

The above lemma asserts that if a system of rational equations is solvable at all, then it is solvable with some algebraic solution. But it gives no bound on the complexity of that solution in terms of the complexity of the original equation. Chang’s lemma provides such a bound. If {H \geq 1} is an integer, let us say that an algebraic number has height at most {H} if its minimal polynomial (after clearing denominators) consists of integers of magnitude at most {H}.

Lemma 2 (Quantitative solvability) Let {P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}} be a finite number of polynomials of degree at most {D} with rational coefficients, each of height at most {H}. If there is a complex solution {z = (z_1,\ldots,z_d) \in {\bf C}^d} to the simultaneous system of equations

\displaystyle  P_1(z) = \ldots = P_r(z) = 0,

then there also exists a solution {z \in \overline{{\bf Q}}^d} whose coefficients are algebraic numbers of degree at most {C} and height at most {CH^C}, where {C = C_{D, d,r}} depends only on {D}, {d} and {r}.

Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in my own blog post). She also notes that one could also establish the result through the machinery of Gröbner bases. In each of these arguments, it was not possible to use Lemma 1 (or the closely related nullstellensatz) as a black box; one actually had to unpack one of the proofs of that lemma or nullstellensatz to get the polynomial bound. However, using nonstandard analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant {C}) directly from Lemma 1 (or more precisely, the generalisation in Remark 1) without having to inspect the proof, and instead simply using it as a black box, thus providing a “soft” proof of Lemma 2 that is an alternative to the “hard” proofs mentioned above.

Here’s how the proof works. Informally, the idea is that Lemma 2 should follow from Lemma 1 after replacing the field of rationals {{\bf Q}} with “the field of rationals of polynomially bounded height”. Unfortunately, the latter object does not really make sense as a field in standard analysis; nevertheless, it is a perfectly sensible object in nonstandard analysis, and this allows the above informal argument to be made rigorous.

We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a “compactness and contradiction” argument (or more precisely, an “ultralimit and contradiction” argument). Suppose for contradiction that Lemma 2 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exists {D, d, r} such that for each natural number {n}, there is a positive integer {H^{(n)}} and a family {P_1^{(n)}, \ldots, P_r^{(n)}: {\bf C}^d \rightarrow {\bf C}} of polynomials of degree at most {D} and rational coefficients of height at most {H^{(n)}}, such that there exist at least one complex solution {z^{(n)} \in {\bf C}^d} to

\displaystyle  P_1^{(n)}(z^{(n)}) = \ldots = P_r(z^{(n)}) = 0, \ \ \ \ \ (1)

but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most {n} and height at most {n (H^{(n)})^n}.

Now we take ultralimits (see e.g. this previous blog post of a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let {p \in \beta {\bf N} \backslash {\bf N}} be a non-principal ultrafilter. For each {i=1,\ldots,r}, the ultralimit

\displaystyle  P_i := \lim_{n \rightarrow p} P_i^{(n)}

of the (standard) polynomials {P_i^{(n)}} is a nonstandard polynomial {P_i: {}^* {\bf C}^d \rightarrow {}^* {\bf C}} of degree at most {D}, whose coefficients now lie in the nonstandard rationals {{}^* {\bf Q}}. Actually, due to the height restriction, we can say more. Let {H := \lim_{n \rightarrow p} H^{(n)} \in {}^* {\bf N}} be the ultralimit of the {H^{(n)}}, this is a nonstandard natural number (which will almost certainly be unbounded, but we will not need to use this). Let us say that a nonstandard integer {a} is of polynomial size if we have {|a| \leq C H^C} for some standard natural number {C}, and say that a nonstandard rational number {a/b} is of polynomial height if {a}, {b} are of polynomial size. Let {{\bf Q}_{poly(H)}} be the collection of all nonstandard rationals of polynomial height. (In the language of nonstandard analysis, {{\bf Q}_{poly(H)}} is an external set rather than an internal one, because it is not itself an ultraproduct of standard sets; but this will not be relevant for the argument that follows.) It is easy to see that {{\bf Q}_{poly(H)}} is a field, basically because the sum or product of two integers of polynomial size, remains of polynomial size. By construction, it is clear that the coefficients of {P_i} are nonstandard rationals of polynomial height, and thus {P_1,\ldots,P_r} are defined over {{\bf Q}_{poly(H)}}.

Meanwhile, if we let {z := \lim_{n \rightarrow p} z^{(n)} \in {}^* {\bf C}^d} be the ultralimit of the solutions {z^{(n)}} in (1), we have

\displaystyle  P_1(z) = \ldots = P_r(z) = 0,

thus {P_1,\ldots,P_r} are solvable in {{}^* {\bf C}}. Applying Lemma 1 (or more precisely, the generalisation in Remark 1), we see that {P_1,\ldots,P_r} are also solvable in {\overline{{\bf Q}_{poly(H)}}}. (Note that as {{\bf C}} is algebraically closed, {{}^*{\bf C}} is also (by Los’s theorem), and so {{}^* {\bf C}} contains {\overline{{\bf Q}_{poly(H)}}}.) Thus, there exists {w \in \overline{{\bf Q}_{poly(H)}}^d} with

\displaystyle  P_1(w) = \ldots = P_r(w) = 0.

As {\overline{{\bf Q}_{poly(H)}}^d} lies in {{}^* {\bf C}^d}, we can write {w} as an ultralimit {w = \lim_{n \rightarrow p} w^{(n)}} of standard complex vectors {w^{(n)} \in {\bf C}^d}. By construction, the coefficients of {w} each obey a non-trivial polynomial equation of degree at most {C} and whose coefficients are nonstandard integers of magnitude at most {C H^C}, for some standard natural number {C}. Undoing the ultralimit, we conclude that for {n} sufficiently close to {p}, the coefficients of {w^{(n)}} obey a non-trivial polynomial equation of degree at most {C} whose coefficients are standard integers of magnitude at most {C (H^{(n)})^C}. In particular, these coefficients have height at most {C (H^{(n)})^C}. Also, we have

\displaystyle  P_1^{(n)}(w^{(n)}) = \ldots = P_r^{(n)}(w^{(n)}) = 0.

But for {n} larger than {C}, this contradicts the construction of the {P_i^{(n)}}, and the claim follows. (Note that as {p} is non-principal, any neighbourhood of {p} in {{\bf N}} will contain arbitrarily large natural numbers.)

Remark 2 The same argument actually gives a slightly stronger version of Lemma 2, namely that the integer coefficients used to define the algebraic solution {z} can be taken to be polynomials in the coefficients of {P_1,\ldots,P_r}, with degree and coefficients bounded by {C_{D,d,r}}.

I recently reposted my favourite logic puzzle, namely the blue-eyed islander puzzle. I am fond of this puzzle because in order to properly understand the correct solution (and to properly understand why the alternative solution is incorrect), one has to think very clearly (but unintuitively) about the nature of knowledge.

There is however an additional subtlety to the puzzle that was pointed out in comments, in that the correct solution to the puzzle has two components, a (necessary) upper bound and a (possible) lower bound (I’ll explain this further below the fold, in order to avoid blatantly spoiling the puzzle here). Only the upper bound is correctly explained in the puzzle (and even then, there are some slight inaccuracies, as will be discussed below). The lower bound, however, is substantially more difficult to establish, in part because the bound is merely possible and not necessary. Ultimately, this is because to demonstrate the upper bound, one merely has to show that a certain statement is logically deducible from an islander’s state of knowledge, which can be done by presenting an appropriate chain of logical deductions. But to demonstrate the lower bound, one needs to show that certain statements are not logically deducible from an islander’s state of knowledge, which is much harder, as one has to rule out all possible chains of deductive reasoning from arriving at this particular conclusion. In fact, to rigorously establish such impossiblity statements, one ends up having to leave the “syntactic” side of logic (deductive reasoning), and move instead to the dual “semantic” side of logic (creation of models). As we shall see, semantics requires substantially more mathematical setup than syntax, and the demonstration of the lower bound will therefore be much lengthier than that of the upper bound.

To complicate things further, the particular logic that is used in the blue-eyed islander puzzle is not the same as the logics that are commonly used in mathematics, namely propositional logic and first-order logic. Because the logical reasoning here depends so crucially on the concept of knowledge, one must work instead with an epistemic logic (or more precisely, an epistemic modal logic) which can properly work with, and model, the knowledge of various agents. To add even more complication, the role of time is also important (an islander may not know a certain fact on one day, but learn it on the next day), so one also needs to incorporate the language of temporal logic in order to fully model the situation. This makes both the syntax and semantics of the logic quite intricate; to see this, one only needs to contemplate the task of programming a computer with enough epistemic and temporal deductive reasoning powers that it would be able to solve the islander puzzle (or even a smaller version thereof, say with just three or four islanders) without being deliberately “fed” the solution. (The fact, therefore, that humans can grasp the correct solution without any formal logical training is therefore quite remarkable.)

As difficult as the syntax of temporal epistemic modal logic is, though, the semantics is more intricate still. For instance, it turns out that in order to completely model the epistemic state of a finite number of agents (such as 1000 islanders), one requires an infinite model, due to the existence of arbitrarily long nested chains of knowledge (e.g. “{A} knows that {B} knows that {C} knows that {D} has blue eyes”), which cannot be automatically reduced to shorter chains of knowledge. Furthermore, because each agent has only an incomplete knowledge of the world, one must take into account multiple hypothetical worlds, which differ from the real world but which are considered to be possible worlds by one or more agents, thus introducing modality into the logic. More subtly, one must also consider worlds which each agent knows to be impossible, but are not commonly known to be impossible, so that (for instance) one agent is willing to admit the possibility that another agent considers that world to be possible; it is the consideration of such worlds which is crucial to the resolution of the blue-eyed islander puzzle. And this is even before one adds the temporal aspect (e.g. “On Tuesday, {A} knows that on Monday, {B} knew that by Wednesday, {C} will know that {D} has blue eyes”).

Despite all this fearsome complexity, it is still possible to set up both the syntax and semantics of temporal epistemic modal logic in such a way that one can formulate the blue-eyed islander problem rigorously, and in such a way that one has both an upper and a lower bound in the solution. The purpose of this post is to construct such a setup and to explain the lower bound in particular. The same logic is also useful for analysing another well-known paradox, the unexpected hanging paradox, and I will do so at the end of the post. Note though that there is more than one way to set up epistemic logics, and they are not all equivalent to each other.

(On the other hand, for puzzles such as the islander puzzle in which there are only a finite number of atomic propositions and no free variables, one at least can avoid the need to admit predicate logic, in which one has to discuss quantifiers such as {\forall} and {\exists}. A fully formed predicate temporal epistemic modal logic would indeed be of terrifying complexity.)

Our approach here will be a little different from the approach commonly found in the epistemic logic literature, in which one jumps straight to “arbitrary-order epistemic logic” in which arbitrarily long nested chains of knowledge (“{A} knows that {B} knows that {C} knows that \ldots”) are allowed. Instead, we will adopt a hierarchical approach, recursively defining for {k=0,1,2,\ldots} a “{k^{th}}-order epistemic logic” in which knowledge chains of depth up to {k}, but no greater, are permitted. The arbitrarily order epistemic logic is then obtained as a limit (a direct limit on the syntactic side, and an inverse limit on the semantic side, which is dual to the syntactic side) of the finite order epistemic logics.

I should warn that this is going to be a rather formal and mathematical post. Readers who simply want to know the answer to the islander puzzle would probably be better off reading the discussion at the puzzle’s own blog post instead.

Read the rest of this entry »

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 2,339 other followers