You are currently browsing the category archive for the ‘math.LO’ category.
In orthodox first-order logic, variables and expressions are only allowed to take one value at a time; a variable $x$, for instance, is not allowed to equal $+3$ and $-3$ simultaneously. We will call such variables completely specified. If one really wants to deal with multiple values of objects simultaneously, one is encouraged to use the language of set theory and/or logical quantifiers to do so.
However, the ability to allow expressions to become only partially specified is undeniably convenient, and also rather intuitive. A classic example here is that of the quadratic formula:

$x = \frac{-b \pm \sqrt{b^2-4ac}}{2a}. \qquad (1)$

Strictly speaking, the expression (1) is not well-formed according to the grammar of first-order logic; one should instead use something like

$x = \frac{-b - \sqrt{b^2-4ac}}{2a} \hbox{ or } x = \frac{-b + \sqrt{b^2-4ac}}{2a} \qquad (2)$

or

$x \in \left\{ \frac{-b - \sqrt{b^2-4ac}}{2a}, \frac{-b + \sqrt{b^2-4ac}}{2a} \right\} \qquad (3)$

or

$x = \frac{-b + \epsilon \sqrt{b^2-4ac}}{2a} \hbox{ for some } \epsilon \in \{-1,+1\} \qquad (4)$

in order to strictly adhere to this grammar. But none of these three reformulations are as compact or as conceptually clear as the original one. In a similar spirit, a mathematical English sentence such as

"the sum of two odd numbers is an even number"

is also not a first-order sentence; one would instead have to write something like

"for all odd numbers $x$ and $y$, the sum $x+y$ is an even number."

Another example of partially specified notation is the innocuous $O()$ notation. For instance, the assertion

$\pi = 3.14 + O(0.01)$

is, strictly speaking, not a first-order sentence, since the expression $O(0.01)$ is only partially specified. Below the fold I'll try to assign a formal meaning to partially specified expressions such as (1), for instance allowing one to condense (2), (3), (4) to just the original expression (1).
Rachel Greenfeld and I have just uploaded to the arXiv our preprint "Undecidable translational tilings with only two tiles, or one nonabelian tile". This paper studies the following question: given a finitely generated group $G$, a (periodic) subset $E$ of $G$, and finite sets $F_1, \dots, F_J$ in $G$, is it possible to tile $E$ by translations $a_j + F_j$ of the tiles $F_1, \dots, F_J$? That is to say, is there a solution $A_1, \dots, A_J$ to the (translational) tiling equation

$(A_1 \oplus F_1) \uplus \dots \uplus (A_J \oplus F_J) = E? \qquad (1)$

(Here $A \oplus F$ denotes the sumset $\{a + f: a \in A, f \in F\}$ with the requirement that all the sums be distinct, and $\uplus$ denotes disjoint union, so that (1) asserts that every element of $E$ is covered exactly once by the translates of the tiles.)
A bit more specifically, the paper studies the decidability of the above question. There are two slightly different types of decidability one could consider here:

- Logical decidability. For a given choice of data $G, E, F_1, \dots, F_J$, one can ask whether the solvability of the tiling equation (1) is provable or disprovable in ZFC (where we encode all the data $G, E, F_1, \dots, F_J$ by appropriate constructions in ZFC). If this is the case we say that the tiling equation (1) (or more precisely, the solvability of this equation) is logically decidable, otherwise it is logically undecidable.
- Algorithmic decidability. For data $G, E, F_1, \dots, F_J$ in some specified class (and encoded somehow as binary strings), one can ask whether the solvability of the tiling equation (1) can be correctly determined for all choices of data in this class by the output of some Turing machine that takes the data as input (encoded as a binary string) and halts in finite time, returning either YES if the equation can be solved or NO otherwise. If this is the case, we say the tiling problem of solving (1) for data in the given class is algorithmically decidable, otherwise it is algorithmically undecidable.
Note that the notion of logical decidability is "pointwise" in the sense that it pertains to a single choice of data $G, E, F_1, \dots, F_J$, whereas the notion of algorithmic decidability pertains instead to classes of data, and is only interesting when this class is infinite. Indeed, any tiling problem with a finite class of data is trivially decidable because one could simply code a Turing machine that is basically a lookup table that returns the correct answer for each choice of data in the class. (This is akin to how a student with a good memory could pass any exam if the questions are drawn from a finite list, merely by memorising an answer key for that list of questions.)
The two notions are related as follows: if a tiling problem (1) is algorithmically undecidable for some class of data, then the tiling equation must be logically undecidable for at least one choice of data in this class. For if this were not the case, one could algorithmically decide the tiling problem by searching for proofs or disproofs that the equation (1) is solvable for a given choice of data; the logical decidability of all such solvability questions would ensure that this algorithm always terminates in finite time.
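To make the search argument concrete, here is a minimal Python sketch (ours, not from the paper) of the classical proof-search semi-algorithm; the proof verifier `is_valid_zfc_proof` is a hypothetical subroutine (mechanical proof checkers for ZFC exist but are elaborate), and the encoding of the tiling data into a formal statement is likewise elided.

```python
from itertools import count

ALPHABET = " ()=,+ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def nth_string(n: int) -> str:
    """Enumerate all finite strings over ALPHABET (bijective numeration),
    so that every candidate proof is eventually produced."""
    s, base = "", len(ALPHABET)
    while n > 0:
        n, r = divmod(n - 1, base)
        s = ALPHABET[r] + s
    return s

def decide_by_proof_search(statement: str, is_valid_zfc_proof) -> str:
    """Semi-algorithm: returns YES or NO whenever `statement` is provable
    or disprovable in ZFC, and runs forever otherwise.  Hence if every
    instance in a class of tiling data is logically decidable, this is a
    genuine decision procedure for that class."""
    for n in count(1):
        candidate = nth_string(n)
        if is_valid_zfc_proof(candidate, statement):
            return "YES"
        if is_valid_zfc_proof(candidate, "NOT (" + statement + ")"):
            return "NO"
```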
One can use the Gödel completeness theorem to interpret logical decidability in terms of universes (also known as structures or models) of ZFC. In addition to the "standard" universe ${\mathfrak U}$ of sets that we believe satisfies the axioms of ZFC, there are also other "nonstandard" universes ${\mathfrak U}^*$ that also obey the axioms of ZFC. If the solvability of a tiling equation (1) is logically undecidable, this means that such a tiling exists in some universes of ZFC, but not in others.
(To continue the exam analogy, we thus see that a yes-no exam question is logically undecidable if the answer to the question is yes in some parallel universes, but not in others. A course syllabus is algorithmically undecidable if there is no way to prepare for the final exam for the course in a way that guarantees a perfect score (in the standard universe).)
Questions of decidability are also related to the notion of aperiodicity. For a given choice of data $G, E, F_1, \dots, F_J$, a tiling equation (1) is said to be aperiodic if the equation (1) is solvable (in the standard universe ${\mathfrak U}$ of ZFC), but none of the solutions (in that universe) are completely periodic (i.e., there are no solutions $A_1, \dots, A_J$ where all of the $A_1, \dots, A_J$ are periodic). Perhaps the most well-known example of an aperiodic tiling (in the context of $G = {\bf R}^2$, and using rotations as well as translations) comes from the Penrose tilings, but there are many others besides.
It was (essentially) observed by Hao Wang in the 1960s that if a tiling equation is logically undecidable, then it must necessarily be aperiodic. Indeed, if a tiling equation fails to be aperiodic, then (in the standard universe) either there is a periodic tiling, or there are no tilings whatsoever. In the former case, the periodic tiling can be used to give a finite proof that the tiling equation is solvable; in the latter case, the compactness theorem implies that there is some finite fragment of $E$ that is not compatible with being tiled by the $F_1, \dots, F_J$, and this provides a finite proof that the tiling equation is unsolvable. Thus in either case the tiling equation is logically decidable.
This observation of Wang clarifies somewhat how logically undecidable tiling equations behave in the various universes of ZFC. In the standard universe, tilings exist, but none of them will be periodic. In nonstandard universes, tilings may or may not exist, and the tilings that do exist may be periodic (albeit with a nonstandard period); but there must be at least one universe in which no tiling exists at all.
In one dimension when $G = {\bf Z}$ (or more generally $G = {\bf Z} \times G_0$ with $G_0$ a finite group), a simple pigeonholing argument shows that no tiling equations are aperiodic, and hence all tiling equations are decidable. However the situation changes in two dimensions. In 1966, Berger (a student of Wang) famously showed that there exist tiling equations (1) in the discrete plane $G = {\bf Z}^2$ that are aperiodic, or even logically undecidable; in fact he showed that the tiling problem in this case (with arbitrary choices of data $E, F_1, \dots, F_J$) was algorithmically undecidable. (Strictly speaking, Berger established this for a variant of the tiling problem known as the domino problem, but later work of Golomb showed that the domino problem could be easily encoded within the tiling problem.) This was accomplished by encoding the halting problem for Turing machines into the tiling problem (or domino problem); the former is well known to be algorithmically undecidable (and thus to have logically undecidable instances), and so the latter is also. However, the number of tiles $J$ required for Berger's construction was quite large: his construction of an aperiodic tiling required $J = 20{,}426$ tiles, and his construction of a logically undecidable tiling required an even larger (and not explicitly specified) collection of tiles. Subsequent work by many authors did reduce the number of tiles required; in the ${\bf Z}^2$ setting, the current world record for the fewest number of tiles in an aperiodic tiling is $J = 8$ (due to Ammann, Grünbaum, and Shephard) and for a logically undecidable tiling is $J = 11$ (due to Ollinger). On the other hand, it is conjectured (see Grünbaum-Shephard and Lagarias-Wang) that one cannot lower $J$ all the way to $1$:
Conjecture 1 (Periodic tiling conjecture) If $E$ is a periodic subset of a finitely generated abelian group $G$, and $F$ is a finite subset of $G$, then the tiling equation $A \oplus F = E$ is not aperiodic.
This conjecture is known to be true in two dimensions (by work of Bhattacharya when $G = {\bf Z}^2$, and more recently by us when $G = {\bf Z}^2 \times G_0$ for an arbitrary finite abelian group $G_0$), but remains open in higher dimensions. By the preceding discussion, the conjecture implies that every tiling equation with a single tile is logically decidable, and the problem of whether a given periodic set can be tiled by a single tile is algorithmically decidable.
In this paper we show on the other hand that aperiodic and undecidable tilings exist when $J = 2$, at least if one is permitted to enlarge the group $G$ a bit:
Theorem 2 (Logically undecidable tilings)

- (i) There exists a group $G$ of the form $G = {\bf Z}^2 \times G_0$ for some finite abelian $G_0$, a subset $E$ of $G$, and finite sets $F_1, F_2 \subset G$ such that the tiling equation $(A_1 \oplus F_1) \uplus (A_2 \oplus F_2) = E$ is logically undecidable (and hence also aperiodic).
- (ii) There exists a dimension $d$, a periodic subset $E$ of ${\bf Z}^d$, and finite sets $F_1, F_2 \subset {\bf Z}^d$ such that the tiling equation $(A_1 \oplus F_1) \uplus (A_2 \oplus F_2) = E$ is logically undecidable (and hence also aperiodic).
- (iii) There exists a non-abelian finite group $G_0$ (with the group law still written additively), a subset $E$ of ${\bf Z} \times G_0$, and a finite set $F \subset {\bf Z} \times G_0$ such that the nonabelian tiling equation $A \oplus F = E$ is logically undecidable (and hence also aperiodic).
We also have algorithmic versions of this theorem. For instance, the algorithmic version of (i) is that the problem of determining solvability of the tiling equation $(A_1 \oplus F_1) \uplus (A_2 \oplus F_2) = E$ for a given choice of finite abelian group $G_0$, subset $E$ of $G = {\bf Z}^2 \times G_0$, and finite sets $F_1, F_2 \subset G$ is algorithmically undecidable. Similarly for (ii), (iii).
This result (together with a negative result discussed below) suggests to us that there is a significant qualitative difference between the theory of tiling by a single (abelian) tile, and the theory of tiling with multiple tiles (or one non-abelian tile). (The positive results on the periodic tiling conjecture certainly rely heavily on the fact that there is only one tile, in particular there is a "dilation lemma" that is only available in this setting that is of key importance in the two dimensional theory.) It would be nice to eliminate the group $G_0$ from (i) (or to set $d = 2$ in (ii)), but I think this would require a fairly significant modification of our methods.
Like many other undecidability results, the proof of Theorem 2 proceeds by a sequence of reductions, in which the undecidability of one problem is shown to follow from the undecidability of another, more “expressive” problem that can be encoded inside the original problem, until one reaches a problem that is so expressive that it encodes a problem already known to be undecidable. Indeed, all three undecidability results are ultimately obtained from Berger’s undecidability result on the domino problem.
The first step in increasing expressiveness is to observe that the undecidability of a single tiling equation follows from the undecidability of a system of tiling equations. More precisely, suppose we have non-empty finite subsets $F_j^{(m)}$ of a finitely generated group $G$ for $j = 1, \dots, J$ and $m = 1, \dots, M$, as well as periodic sets $E^{(m)}$ of $G$ for $m = 1, \dots, M$, such that it is logically undecidable whether the system of tiling equations

$(A_1 \oplus F_1^{(m)}) \uplus \dots \uplus (A_J \oplus F_J^{(m)}) = E^{(m)} \hbox{ for all } m = 1, \dots, M \qquad (2)$

has a solution $A_1, \dots, A_J$; then the solvability of a single tiling equation of the form (1), with slightly enlarged data, is also logically undecidable.
We view systems of the form (2) as belonging to a kind of "language" in which each equation in the system is a "sentence" in the language imposing additional constraints on a tiling. One can now pick and choose various sentences in this language to try to encode various interesting problems. For instance, one can encode the concept of a function $f: G \rightarrow H$ taking values in a finite group $H$ as a single tiling equation, basically by identifying $f$ with its graph $A := \{ (x, f(x)): x \in G \}$ and observing that such graphs are precisely the solutions of the tiling equation

$A \oplus (\{0\} \times H) = G \times H.$

This begins to resemble the equations that come up in the domino problem. Here one has a finite set ${\mathcal W}$ of Wang tiles – unit squares where each of the four sides is colored with a color (corresponding to the four cardinal directions North, South, East, and West) from some finite set ${\mathcal C}$ of colors. The domino problem is then to tile the plane with copies of these tiles in such a way that adjacent sides match. In terms of equations, one is seeking to find functions $c_N, c_S, c_E, c_W: {\bf Z}^2 \rightarrow {\mathcal C}$ obeying the pointwise constraint

$(c_N(x), c_S(x), c_E(x), c_W(x)) \in {\mathcal W} \hbox{ for all } x \in {\bf Z}^2,$

together with the matching constraints $c_S(x + (0,1)) = c_N(x)$ and $c_W(x + (1,0)) = c_E(x)$.
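As a concrete (and purely illustrative) rendering of these pointwise and matching constraints, the following Python snippet checks a finite patch of Wang tiles; the representation of tiles as (N, S, E, W) colour 4-tuples is our own convention, not notation from the paper.

```python
def valid_wang_patch(patch, tile_set):
    """patch: dict mapping grid points (x, y) to (N, S, E, W) colour tuples.
    Checks membership in tile_set together with the matching constraints:
    the North colour of each tile equals the South colour of the tile above,
    and the East colour equals the West colour of the tile to the right
    (whenever those neighbours lie in the patch)."""
    N, S, E, W = 0, 1, 2, 3
    for (x, y), tile in patch.items():
        if tile not in tile_set:
            return False
        up = patch.get((x, y + 1))
        if up is not None and tile[N] != up[S]:
            return False
        right = patch.get((x + 1, y))
        if right is not None and tile[E] != right[W]:
            return False
    return True

# A single tile whose opposite sides match can tile a row:
tiles = {("r", "r", "b", "b")}
print(valid_wang_patch({(0, 0): ("r", "r", "b", "b"),
                        (1, 0): ("r", "r", "b", "b")}, tiles))  # True
```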
Proposition 3 (Swapping property) Consider the solutions to a tiling equation

$A \oplus F = E \qquad (9)$

in a one-dimensional group $G = {\bf Z} \times G_0$ (with $G_0$ a finite abelian group, $F$ finite, and $E$ periodic). Suppose there are two solutions $A_0, A_1$ to this equation that agree on the left in the sense that

$A_0 \cap (\{n\} \times G_0) = A_1 \cap (\{n\} \times G_0) \hbox{ for all sufficiently negative } n.$

For any function $\omega: {\bf Z} \rightarrow \{0,1\}$, define the "swap" $A_\omega$ of $A_0$ and $A_1$ to be the set

$A_\omega := \{ (n, g): (n, g) \in A_{\omega(n)} \}.$

Then $A_\omega$ also solves the equation (9).
One can think of $A_0$ and $A_1$ as "genes" with "nucleotides" $A_0 \cap (\{n\} \times G_0)$, $A_1 \cap (\{n\} \times G_0)$ at each position $n$, and $A_\omega$ is a new gene formed by choosing one of the nucleotides from the "parent" genes $A_0$, $A_1$ at each position. The above proposition then says that the solutions to the equation (9) must be closed under "genetic transfer" among any pair of genes that agree on the left. This seems to present an obstruction to trying to encode equations, such as the domino-type constraints discussed above, whose solution sets are not closed under such swaps, into a single one-dimensional tiling equation.
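To make the swap operation concrete, here is a small Python sketch (illustrative only; the representation of solutions by their fibres is our own) that forms the swap of two solutions position by position.

```python
def swap(A0, A1, omega):
    """A0, A1: dicts mapping each position n (an integer) to the 'nucleotide'
    of the solution at n, i.e. the set of elements g of G_0 with (n, g) in
    the solution.  omega: map from positions to {0, 1}.  Returns the swap,
    which takes the nucleotide of A0 or A1 at each position according to
    omega."""
    parents = (A0, A1)
    return {n: parents[omega(n)][n] for n in A0}

# Toy example with G_0 = Z/2: two 'genes' that agree for n <= 0.
A0 = {-1: {0}, 0: {0}, 1: {0}, 2: {1}}
A1 = {-1: {0}, 0: {0}, 1: {0, 1}, 2: set()}
print(swap(A0, A1, lambda n: 1 if n >= 2 else 0))
# {-1: {0}, 0: {0}, 1: {0}, 2: set()}
```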
Abdul Basit, Artem Chernikov, Sergei Starchenko, Chiu-Minh Tran and I have uploaded to the arXiv our paper Zarankiewicz’s problem for semilinear hypergraphs. This paper is in the spirit of a number of results in extremal graph theory in which the bounds for various graph-theoretic problems or results can be greatly improved if one makes some additional hypotheses regarding the structure of the graph, for instance by requiring that the graph be “definable” with respect to some theory with good model-theoretic properties.
A basic motivating example is the question of counting the number of incidences between points and lines (or between points and other geometric objects). Suppose one has $m$ points and $n$ lines in a space. How many incidences can there be between these points and lines? The utterly trivial bound is $mn$, but by using the basic fact that two points determine a line (or two lines intersect in at most one point), a simple application of Cauchy-Schwarz improves this bound to $O(m n^{1/2} + n)$. In graph theoretic terms, the point is that the bipartite incidence graph between points and lines does not contain a copy of $K_{2,2}$ (there do not exist two points and two lines that are all incident to each other). Without any other further hypotheses, this bound is basically sharp: consider for instance the collection of $p^2$ points and roughly $p^2$ lines in a finite plane ${\bf F}_p^2$, which have $\asymp p^3$ incidences (one can make the situation more symmetric by working with a projective plane rather than an affine plane). If however one considers lines in the real plane ${\bf R}^2$, the famous Szemerédi-Trotter theorem improves the incidence bound further from $O(m n^{1/2} + n)$ to $O(m^{2/3} n^{2/3} + m + n)$. Thus the incidence graph between real points and lines contains more structure than merely the absence of $K_{2,2}$.
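For the reader's convenience, here is the standard Cauchy-Schwarz (Kővári–Sós–Turán) computation behind the $O(mn^{1/2} + n)$ bound, sketched in LaTeX; $I$ denotes the total number of incidences and $I_\ell$ the number of points on a line $\ell$.

```latex
% Two points determine at most one line, so counting pairs of points on a
% common line gives  \sum_\ell \binom{I_\ell}{2} \le \binom{m}{2},  hence
\sum_\ell I_\ell^2 \le m^2 + \sum_\ell I_\ell = m^2 + I.
% Cauchy-Schwarz then bounds the incidence count:
I = \sum_\ell I_\ell \le \Big( n \sum_\ell I_\ell^2 \Big)^{1/2}
  \le \big( n (m^2 + I) \big)^{1/2},
% and solving the resulting quadratic inequality in I yields
I \ll m n^{1/2} + n.
```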
More generally, bounding the size of bipartite graphs (or multipartite hypergraphs) not containing a copy of some complete bipartite subgraph $K_{k,k}$ (or its complete multipartite analogue in the hypergraph case) is known as Zarankiewicz's problem. We have results for all $k$ and all orders of hypergraph, but for sake of this post I will focus on the bipartite graph case.
In our paper we improve the bound to a near-linear bound in the case that the incidence graph is "semilinear". A model case occurs when one considers incidences between points and axis-parallel rectangles in the plane. Now the $K_{2,2}$-free condition is not automatic (it is of course possible for two distinct points to both lie in two distinct rectangles), so we impose this condition by fiat:
Theorem 1 Suppose one has $m$ points and $n$ axis-parallel rectangles in the plane, whose incidence graph contains no $K_{k,k}$'s, for some large $k$.

- (i) The total number of incidences is $O_k\big( (m+n) \log^{O(1)}(m+n) \big)$.
- (ii) If all the rectangles are dyadic, the bound can be improved to $O_k\big( (m+n) \frac{\log(m+n)}{\log\log(m+n)} \big)$.
- (iii) The bound in (ii) is best possible (up to the choice of implied constant).
We don’t know whether the bound in (i) is similarly tight for non-dyadic boxes; the usual tricks for reducing the non-dyadic case to the dyadic case strangely fail to apply here. One can generalise to higher dimensions, replacing rectangles by polytopes with faces in some fixed finite set of orientations, at the cost of adding several more logarithmic factors; also, one can replace the reals by other ordered division rings, and replace polytopes by other sets of bounded “semilinear descriptive complexity”, e.g., unions of boundedly many polytopes, or which are cut out by boundedly many functions that enjoy coordinatewise monotonicity properties. For certain specific graphs we can remove the logarithmic factors entirely. We refer to the preprint for precise details.
The proof techniques are combinatorial. The proof of (i) relies primarily on the order structure of ${\bf R}$ to implement a "divide and conquer" strategy in which one can efficiently control incidences between $m$ points and $n$ rectangles by incidences between approximately half as many points and rectangles. For (ii) there is additional order-theoretic structure one can work with: first there is an easy pruning device to reduce to the case when no rectangle is completely contained inside another, and then one can impose the "tile partial order" in which one dyadic rectangle $I_1 \times J_1$ is less than another $I_2 \times J_2$ if $I_1 \subseteq I_2$ and $J_2 \subseteq J_1$. The point is that this order is "locally linear" in the sense that for any two dyadic rectangles $R_1 \leq R_2$ (after the pruning mentioned above), the set $\{ R: R_1 \leq R \leq R_2 \}$ is linearly ordered, and this can be exploited by elementary double counting arguments to obtain a bound which eventually becomes $O_k\big( (m+n) \frac{\log(m+n)}{\log\log(m+n)} \big)$ after optimising certain parameters in the argument. The proof also suggests how to construct the counterexample in (iii), which is achieved by an elementary iterative construction.
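To illustrate the tile partial order concretely, here is a small Python sketch (our own illustration, not code from the paper); a dyadic interval is encoded as a pair (j, k) representing $[k \cdot 2^j, (k+1) \cdot 2^j)$.

```python
def contains(I, J):
    """Does the dyadic interval I = (j, k) contain J = (j2, k2)?"""
    (j, k), (j2, k2) = I, J
    return j >= j2 and k * 2**j <= k2 * 2**j2 and (k2 + 1) * 2**j2 <= (k + 1) * 2**j

def tile_leq(R, S):
    """Tile order on dyadic rectangles R = (I, J), S = (I2, J2): the
    horizontal side shrinks while the vertical side grows."""
    (I, J), (I2, J2) = R, S
    return contains(I2, I) and contains(J, J2)

# A chain in the tile order: [0,1) x [0,4)  <=  [0,2) x [0,2)  <=  [0,4) x [0,1).
R = ((0, 0), (2, 0))
T = ((1, 0), (1, 0))
S = ((2, 0), (0, 0))
print(tile_leq(R, T), tile_leq(T, S), tile_leq(R, S))  # True True True
```

Within a family of dyadic rectangles in which no rectangle contains another, any two rectangles lying between $R_1$ and $R_2$ in this order are themselves comparable, which is the "locally linear" property used above.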
Asgar Jamneshan and I have just uploaded to the arXiv our paper "An uncountable Moore-Schmidt theorem". This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let $X = (X, {\mathcal X}, \mu)$ be a probability space, and let $\mathrm{Aut}(X, {\mathcal X}, \mu)$ be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps $T: X \rightarrow X$ that preserve the measure $\mu$: $T_* \mu = \mu$. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call $X = (X, {\mathcal X}, \mu)$ a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)
Let $\Gamma$ be a discrete group. A (concrete) measure-preserving action of $\Gamma$ on $X$ is a group homomorphism $\gamma \mapsto T^\gamma$ from $\Gamma$ to $\mathrm{Aut}(X, {\mathcal X}, \mu)$, thus $T^1$ is the identity map and $T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}$ for all $\gamma_1, \gamma_2 \in \Gamma$. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when $\Gamma$ is the integers (with the additive group law).
Let $K = (K, +)$ be a compact Hausdorff abelian group, which we can endow with the Borel $\sigma$-algebra ${\mathcal B}(K)$. A (concrete measurable) $K$-valued cocycle is a collection $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ of concrete measurable maps $\rho_\gamma: X \rightarrow K$ obeying the cocycle equation

$\rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1}(T^{\gamma_2} x) + \rho_{\gamma_2}(x)$

for $\mu$-almost every $x \in X$. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action $\gamma \mapsto S^\gamma$ on $X \times K$ (which we endow with the product of $\mu$ with Haar probability measure on $K$), defined by

$S^\gamma(x, k) := (T^\gamma x, k + \rho_\gamma(x)).$
This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.
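As a quick sanity check (standard, but worth recording), the cocycle equation is precisely what makes $\gamma \mapsto S^\gamma$ a homomorphism:

```latex
S^{\gamma_1}\big(S^{\gamma_2}(x,k)\big)
  = \big(T^{\gamma_1} T^{\gamma_2} x,\; k + \rho_{\gamma_2}(x)
      + \rho_{\gamma_1}(T^{\gamma_2} x)\big)
  = \big(T^{\gamma_1 \gamma_2} x,\; k + \rho_{\gamma_1 \gamma_2}(x)\big)
  = S^{\gamma_1 \gamma_2}(x,k),
% where the middle step uses the cocycle equation
% \rho_{\gamma_1\gamma_2} = \rho_{\gamma_1} \circ T^{\gamma_2} + \rho_{\gamma_2}
% (almost everywhere).
```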
A special case of a $K$-valued cocycle is a (concrete measurable) $K$-valued coboundary, in which $\rho_\gamma$ for each $\gamma \in \Gamma$ takes the special form

$\rho_\gamma(x) = F(T^\gamma x) - F(x)$

for $\mu$-almost every $x \in X$, where $F: X \rightarrow K$ is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable $K$-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which $K$-valued cocycles are in fact $K$-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when $K$ is the unit circle ${\bf T} = {\bf R}/{\bf Z}$, by taking advantage of the Pontryagin dual group $\hat K$ of characters $\hat k: K \rightarrow {\bf T}$, that is to say the collection of continuous homomorphisms to the unit circle. More precisely, we have
Theorem 1 (Countable Moore-Schmidt theorem) Let $\Gamma$ be a discrete group acting in a concrete measure-preserving fashion on a probability space $X = (X, \mu)$. Let $K$ be a compact Hausdorff abelian group. Assume the following additional hypotheses:

- (i) $\Gamma$ is at most countable.
- (ii) $X$ is a standard Borel space.
- (iii) $K$ is metrisable.

Then a $K$-valued concrete measurable cocycle $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ is a concrete coboundary if and only if for each character $\hat k \in \hat K$, the ${\bf T}$-valued cocycles $\hat k \circ \rho = (\hat k \circ \rho_\gamma)_{\gamma \in \Gamma}$ are concrete coboundaries.
The hypotheses (i), (ii), (iii) are saying in some sense that the data $\Gamma, X, K$ are not too "large"; in all three cases they are saying in some sense that the data are only "countably complicated". For instance, (iii) is equivalent to $K$ being second countable, and (ii) is equivalent to $X$ being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a "countable" Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.
Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each $\hat k \circ \rho$ is a coboundary, then so is $\rho$. By hypothesis, we then have an equation of the form

$\hat k \circ \rho_\gamma = \alpha_{\hat k} \circ T^\gamma - \alpha_{\hat k} \qquad (2)$

for all $\hat k \in \hat K$, $\gamma \in \Gamma$ and some functions $\alpha_{\hat k}: X \rightarrow {\bf T}$, and our task is then to produce a function $F: X \rightarrow K$ for which

$\rho_\gamma = F \circ T^\gamma - F$

for all $\gamma \in \Gamma$.

Comparing the two equations, the task would be easy if we could find an $F$ for which

$\hat k \circ F = \alpha_{\hat k} \qquad (3)$

for all $\hat k \in \hat K$. However there is an obstruction to this: the left-hand side of (3) is additive in $\hat k$, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

$\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0 \qquad (4)$

for all $\hat k_1, \hat k_2 \in \hat K$. On the other hand, the good news is that if we somehow manage to obtain the equation (4), then we can obtain a function $F$ obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between $K$ and the homomorphisms of the (discrete) group $\hat K$ to ${\bf T}$.
Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in $\hat k$, so the right-hand side must be also. Manipulating this fact, we eventually arrive at the conclusion that the expression $\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}$ is invariant under the action of $\Gamma$. In other words, we don't get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is $\Gamma$-invariant. Now let us assume for sake of argument that the action of $\Gamma$ is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only $\Gamma$-invariant functions are constant. So now we get a weaker version of (4), namely

$\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = c_{\hat k_1, \hat k_2} \qquad (5)$

for some constants $c_{\hat k_1, \hat k_2} \in {\bf T}$.
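For the reader's convenience, here is the short computation behind this invariance, using only (2) and the additivity of characters:

```latex
\big(\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}\big) \circ T^\gamma
 - \big(\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}\big)
 = (\hat k_1 + \hat k_2) \circ \rho_\gamma
   - \hat k_1 \circ \rho_\gamma - \hat k_2 \circ \rho_\gamma = 0,
% so the expression \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1}
% - \alpha_{\hat k_2} is invariant under every T^\gamma.
```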
Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let $L^0(X \rightarrow {\bf T})$ denote the space of concrete measurable maps $\alpha: X \rightarrow {\bf T}$ from $X$ to ${\bf T}$, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup ${\bf T}$ of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because ${\bf T}$ is a divisible group, there is an application of Zorn's lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction $w: L^0(X \rightarrow {\bf T}) \rightarrow {\bf T}$, that is to say a group homomorphism that is the identity on the subgroup ${\bf T}$. We can use this retraction, or more precisely the complement $\alpha \mapsto \alpha - w(\alpha)$, to eliminate the constant in (5). Indeed, if we set

$\tilde \alpha_{\hat k} := \alpha_{\hat k} - w(\alpha_{\hat k})$

then from (5) we see that

$\tilde \alpha_{\hat k_1 + \hat k_2} - \tilde \alpha_{\hat k_1} - \tilde \alpha_{\hat k_2} = 0$

while from (2) one has

$\hat k \circ \rho_\gamma = \tilde \alpha_{\hat k} \circ T^\gamma - \tilde \alpha_{\hat k}$

and now the previous strategy works with $\alpha_{\hat k}$ replaced by $\tilde \alpha_{\hat k}$. This concludes the sketch of proof of Theorem 1.
In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and one uses the hypothesis (i) and the hypothesis (iii) (which is equivalent to $\hat K$ being at most countable) to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).
My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to "uncountable" settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give "uncountable" objects rather than "countable" ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.
If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation $+: K \times K \rightarrow K$, while still continuous, can fail to be measurable as a map from $(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))$ to $(K, {\mathcal B}(K))$! Thus for instance the sum of two measurable functions $F, G: X \rightarrow K$ need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when $K$ is the uncountable torus ${\bf T}^{{\bf R}}$, endowed with the product topology. Crucially, the Borel $\sigma$-algebra ${\mathcal B}(K)$ generated by this uncountable product is not the product ${\mathcal B}({\bf T})^{\otimes {\bf R}}$ of the factor Borel $\sigma$-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but $\sigma$-algebras do not); relating to this, the product $\sigma$-algebra ${\mathcal B}(K) \otimes {\mathcal B}(K)$ is not the same as the Borel $\sigma$-algebra ${\mathcal B}(K \times K)$, but is instead a strict sub-algebra. If the group operations on $K$ were measurable, then the diagonal set

$K^\Delta := \{ (k, k') \in K \times K: k = k' \}$

would be measurable in ${\mathcal B}(K) \otimes {\mathcal B}(K)$. But it is an easy exercise in manipulation of $\sigma$-algebras to show that if $(X, {\mathcal X})$ and $(Y, {\mathcal Y})$ are any two measurable spaces and $E \subset X \times Y$ is measurable in ${\mathcal X} \otimes {\mathcal Y}$, then the fibres $E_x := \{ y \in Y: (x, y) \in E \}$ of $E$ are contained in some countably generated subalgebra of ${\mathcal Y}$. Thus if $K^\Delta$ were ${\mathcal B}(K) \otimes {\mathcal B}(K)$-measurable, then all the points of $K$ would lie in a single countably generated $\sigma$-algebra. But the cardinality of such an algebra is at most $2^{\aleph_0}$ while the cardinality of $K$ is $2^{2^{\aleph_0}}$, and Cantor's theorem then gives a contradiction.
To resolve this problem, we give $K$ a coarser $\sigma$-algebra than the Borel $\sigma$-algebra, namely the Baire $\sigma$-algebra ${\mathcal B}a(K)$, thus coarsening the measurable space structure on $K = (K, {\mathcal B}(K))$ to a new measurable space $K_{{\mathcal B}a} := (K, {\mathcal B}a(K))$. In the case of compact Hausdorff abelian groups, ${\mathcal B}a(K)$ can be defined as the $\sigma$-algebra generated by the characters $\hat k: K \rightarrow {\bf T}$; for more general compact Hausdorff groups, one can define ${\mathcal B}a(K)$ as the $\sigma$-algebra generated by all continuous maps into metric spaces. This $\sigma$-algebra is equal to ${\mathcal B}(K)$ when $K$ is metrisable but can be smaller for other $K$. With this measurable structure, $K_{{\mathcal B}a}$ becomes a measurable group; it seems that once one leaves the metrisable world that $K_{{\mathcal B}a}$ is a superior (or at least equally good) space to work with than $K$ for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel's theorem, we see that if $m_K$ is the Haar probability measure on $K$, then $L^2(K, {\mathcal B}(K), m_K) = L^2(K, {\mathcal B}a(K), m_K)$ (thus, every ${\mathcal B}(K)$-measurable set is equivalent modulo $m_K$-null sets to a ${\mathcal B}a(K)$-measurable set), so there is no damage to Plancherel caused by passing to the Baire $\sigma$-algebra.)
Passing to the Baire $\sigma$-algebra ${\mathcal B}a(K)$ fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of "points" and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of $\sigma$-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form ${\mathcal X}^{\mathrm{op}}$, where ${\mathcal X}$ is an (abstract) $\sigma$-algebra and ${\mathrm{op}}$ is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map $T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}$ is an object of the form $(T^*)^{\mathrm{op}}$, where $T^*: {\mathcal Y} \rightarrow {\mathcal X}$ is a Boolean algebra homomorphism and ${\mathrm{op}}$ is again used as a formal placeholder; we call $T^*$ the pullback map associated to $T$. [UPDATE: It turns out that this definition of a measurable map led to technical issues. In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be $\sigma$-complete (i.e., it respects countable joins).] The composition $S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}$ of two abstract measurable maps $T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}$, $S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}$ is defined by the formula $S \circ T := (T^* \circ S^*)^{\mathrm{op}}$, or equivalently $(S \circ T)^* = T^* \circ S^*$.
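The contravariance here is purely mechanical, and can be sketched in a few lines of Python (an illustration of the formalism only, with pullbacks modelled as functions on sets): an abstract map stores nothing but its pullback, and composition reverses the order of the pullbacks.

```python
class AbstractMap:
    """An abstract measurable map X^op -> Y^op, stored as its pullback
    from Y to X (here modelled simply as a function on sets)."""
    def __init__(self, pullback):
        self.pullback = pullback

    def after(self, other):
        """Composition self o other; note (S o T)^* = T^* o S^*."""
        return AbstractMap(lambda E: other.pullback(self.pullback(E)))

# Concrete maps embed contravariantly, via preimage.
T = AbstractMap(lambda E: {x for x in range(10) if x % 5 in E})  # x -> x mod 5
S = AbstractMap(lambda E: {x for x in range(5) if x % 2 in E})   # x -> x mod 2
print(S.after(T).pullback({0}))  # preimage of {0} under x -> (x mod 5) mod 2
# {0, 2, 4, 5, 7, 9}
```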
Every concrete measurable space $X = (X, {\mathcal X})$ can be identified with an abstract counterpart ${\mathcal X}^{\mathrm{op}}$, and similarly every concrete measurable map $T: X \rightarrow Y$ can be identified with an abstract counterpart $(T^*)^{\mathrm{op}}$, where $T^*: {\mathcal Y} \rightarrow {\mathcal X}$ is the pullback map $T^* E := T^{-1}(E)$. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra $X_\mu$ of $X$, defined as $({\mathcal X}/{\mathcal N})^{\mathrm{op}}$ where ${\mathcal N}$ is the ideal of null sets in ${\mathcal X}$. Informally, $X_\mu$ is the space $X$ with all the null sets removed; there is a canonical abstract embedding map $\iota: X_\mu \rightarrow X$, which allows one to convert any concrete measurable map $f: X \rightarrow Y$ into an abstract one $f \circ \iota: X_\mu \rightarrow Y$. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing $X$ with the opposite measure algebra $X_\mu$; see the paper for details. Our main theorem is then
Theorem 2 (Uncountable Moore-Schmidt theorem) Let $\Gamma$ be a discrete group acting abstractly on a $\sigma$-finite measure space $X = (X, \mu)$. Let $K$ be a compact Hausdorff abelian group. Then a $K_{{\mathcal B}a}$-valued abstract measurable cocycle $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ is an abstract coboundary if and only if for each character $\hat k \in \hat K$, the ${\bf T}$-valued cocycles $\hat k \circ \rho$ are abstract coboundaries.
With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a "conditional Pontryagin duality" theorem, which asserts that if one has an abstract measurable map $\alpha_{\hat k}: X_\mu \rightarrow {\bf T}$ for each $\hat k \in \hat K$ obeying the identity $\alpha_{\hat k_1 + \hat k_2} = \alpha_{\hat k_1} + \alpha_{\hat k_2}$ for all $\hat k_1, \hat k_2 \in \hat K$, then there is an abstract measurable map $F: X_\mu \rightarrow K_{{\mathcal B}a}$ such that $\alpha_{\hat k} = \hat k \circ F$ for all $\hat k \in \hat K$. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the $\sigma$-algebra of $X_\mu$, and the Sikorski extension theorem.
We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:
- (i) Suppose one has an abstract measurable map $T: X_\mu \rightarrow Y$ into a concrete measurable space $Y$. Does there exist a representation of $T$ by a concrete measurable map $\tilde T: X \rightarrow Y$? Is it unique up to almost everywhere equivalence?
- (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?
For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):
- If $Y$ does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If $Y$ is not compact or Polish, there can be counterexamples to existence.
- If $Y$ is a compact metric space or a Polish space, then one always has existence and uniqueness.
- If $Y$ is a compact Hausdorff abelian group, one always has existence.
- If $X$ is a complete measure space, then one always has existence (from a theorem of Maharam).
- If $X$ is the unit interval with the Borel $\sigma$-algebra and Lebesgue measure, then one has existence for all compact Hausdorff $Y$ assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
- For more general $X$, existence for all compact Hausdorff $Y$ is equivalent to the existence of a lifting from the $\sigma$-algebra ${\mathcal X}/{\mathcal N}$ to ${\mathcal X}$ (or, in the language of abstract measurable spaces, the existence of an abstract retraction from $X$ to $X_\mu$).
- It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever $Y$ is compact Hausdorff.
Our understanding of (ii) is much less complete:
- If $K$ is metrisable, the answer is "always" (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
- If $\Gamma$ is at most countable and $X$ is a complete measure space, then the answer is again "always".
In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.
As readers who have followed my previous post will know, I have been spending the last few weeks extending my previous interactive text on propositional logic (entitled "QED") to also cover first-order logic. The text has now reached what seems to be a stable form, with a complete set of deductive rules for first-order logic with equality, and no major bugs as far as I can tell (apart from one weird visual bug I can't eradicate, in that some graphics elements can occasionally temporarily disappear when one clicks on an item). So it will likely not change much going forward.
I feel though that there could be more that could be done with this sort of framework (e.g., improved GUI, modification to other logics, developing the ability to write one’s own texts and libraries, exploring mathematical theories such as Peano arithmetic, etc.). But writing this text (particularly the first-order logic sections) has brought me close to the limit of my programming ability, as the number of bugs introduced with each new feature implemented has begun to grow at an alarming rate. I would like to repackage the code so that it can be re-used by more adept programmers for further possible applications, though I have never done something like this before and would appreciate advice on how to do so. The code is already available under a Creative Commons licence, but I am not sure how readable and modifiable it will be to others currently. [Update: it is now on GitHub.]
[One thing I noticed is that I would probably have to make more of a decoupling between the GUI elements, the underlying logical elements, and the interactive text. For instance, at some point I made the decision (convenient at the time) to use some GUI elements to store some of the state variables of the text, e.g. the exercise buttons are currently storing the status of what exercises are unlocked or not. This is presumably not an example of good programming practice, though it would be relatively easy to fix. More seriously, due to my inability to come up with a good general-purpose matching algorithm (or even specification of such an algorithm) for the laws of first-order logic, many of the laws have to be hard-coded into the matching routine, so one cannot currently remove them from the text. It may well be that the best thing to do in fact is to rework the entire codebase from scratch using more professional software design methods.]
[Update, Aug 23: links moved to GitHub version.]
About six years ago on this blog, I started thinking about trying to make a web-based game based around high-school algebra, and ended up using Scratch to write a short but playable puzzle game in which one solves linear equations for an unknown using a restricted set of moves. (At almost the same time, there were a number of more professionally made games released along similar lines, most notably Dragonbox.)
Since then, I have thought a couple times about whether there were other parts of mathematics which could be gamified in a similar fashion. Shortly after my first blog posts on this topic, I experimented with a similar gamification of Lewis Carroll’s classic list of logic puzzles, but the results were quite clunky, and I was never satisfied with the results.
Over the last few weeks I returned to this topic though, thinking in particular about how to gamify the rules of inference of propositional logic, in a manner that at least vaguely resembles how mathematicians actually go about making logical arguments (e.g., splitting into cases, arguing by contradiction, using previous results as lemmas to help with subsequent ones, and so forth). The rules of inference are a list of a dozen or so deductive rules concerning propositional sentences (things like "($A$ AND $B$) OR (NOT $C$)", where $A, B, C$ are some formulas). A typical such rule is Modus Ponens: if the sentence $A$ is known to be true, and the implication "$A$ IMPLIES $B$" is also known to be true, then one can deduce that $B$ is also true. Furthermore, in this deductive calculus it is possible to temporarily introduce some unproven statements as an assumption, only to discharge them later. In particular, we have the deduction theorem: if, after making an assumption $A$, one is able to derive the statement $B$, then one can conclude that the implication "$A$ IMPLIES $B$" is true without any further assumption.
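For concreteness, here is a toy Python sketch (not the actual Javascript code of the text) of how a root environment might mechanically apply Modus Ponens by matching against the known sentences; the tuple encoding of implications is a hypothetical convention for this example.

```python
def modus_ponens(known):
    """Given a set of sentences, with implications encoded as tuples
    ("IMPLIES", A, B), return all new conclusions obtainable by one
    application of Modus Ponens: from A and "A IMPLIES B", deduce B."""
    new = set()
    for s in known:
        if isinstance(s, tuple) and s[0] == "IMPLIES" and s[1] in known:
            new.add(s[2])
    return new - known

known = {"A", ("IMPLIES", "A", "B"), ("IMPLIES", "B", "C")}
step1 = modus_ponens(known)          # {'B'}
step2 = modus_ponens(known | step1)  # {'C'}
print(step1, step2)
```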
It took a while for me to come up with a workable game-like graphical interface for all of this, but I finally managed to set one up, now using Javascript instead of Scratch (which would be hopelessly inadequate for this task); indeed, part of the motivation of this project was to finally learn how to program in Javascript, which turned out to be not as formidable as I had feared (certainly having experience with other C-like languages like C++, Java, or lua, as well as some prior knowledge of HTML, was very helpful). The main code for this project is available here. Using this code, I have created an interactive textbook in the style of a computer game, which I have titled “QED”. This text contains thirty-odd exercises arranged in twelve sections that function as game “levels”, in which one has to use a given set of rules of inference, together with a given set of hypotheses, to reach a desired conclusion. The set of available rules increases as one advances through the text; in particular, each new section gives one or more rules, and additionally each exercise one solves automatically becomes a new deduction rule one can exploit in later levels, much as lemmas and propositions are used in actual mathematics to prove more difficult theorems. The text automatically tries to match available deduction rules to the sentences one clicks on or drags, to try to minimise the amount of manual input one needs to actually make a deduction.
Most of one’s proof activity takes place in a “root environment” of statements that are known to be true (under the given hypothesis), but for more advanced exercises one has to also work in sub-environments in which additional assumptions are made. I found the graphical metaphor of nested boxes to be useful to depict this tree of sub-environments, and it seems to combine well with the drag-and-drop interface.
The text also logs one’s moves in a more traditional proof format, which shows how the mechanics of the game correspond to a traditional mathematical argument. My hope is that this will give students a way to understand the underlying concept of forming a proof in a manner that is more difficult to achieve using traditional, non-interactive textbooks.
I have tried to organise the exercises in a game-like progression in which one first works with easy levels that train the player on a small number of moves, and then introduce more advanced moves one at a time. As such, the order in which the rules of inference are introduced is a little idiosyncratic. The most powerful rule (the law of the excluded middle, which is what separates classical logic from intuitionistic logic) is saved for the final section of the text.
Anyway, I am now satisfied enough with the state of the code and the interactive text that I am willing to make both available (and open source; I selected a CC-BY licence for both), and would be happy to receive feedback on any aspect of the either. In principle one could extend the game mechanics to other mathematical topics than the propositional calculus – the rules of inference for first-order logic being an obvious next candidate – but it seems to make sense to focus just on propositional logic for now.
In graph theory, the recently developed theory of graph limits has proven to be a useful tool for analysing large dense graphs, being a convenient reformulation of the Szemerédi regularity lemma. Roughly speaking, the theory asserts that given any sequence $G_n = (V_n, E_n)$ of finite graphs, one can extract a subsequence which converges (in a specific sense) to a continuous object known as a "graphon" – a symmetric measurable function $p: [0,1] \times [0,1] \rightarrow [0,1]$. What "converges" means in this context is that subgraph densities converge to the associated integrals of the graphon $p$. For instance, the edge density

$\frac{1}{|V_n|^2} |E_n|$

(where we view the edge set $E_n$ as a symmetric subset of $V_n \times V_n$) converges to the integral

$\int_0^1 \int_0^1 p(x, y)\ dx\ dy,$

the triangle density

$\frac{1}{|V_n|^3} \lvert \{ (v_1, v_2, v_3) \in V_n^3: (v_1, v_2), (v_2, v_3), (v_3, v_1) \in E_n \} \rvert$

converges to the integral

$\int_0^1 \int_0^1 \int_0^1 p(x_1, x_2) p(x_2, x_3) p(x_3, x_1)\ dx_1\ dx_2\ dx_3,$

the four-cycle density

$\frac{1}{|V_n|^4} \lvert \{ (v_1, v_2, v_3, v_4) \in V_n^4: (v_1, v_2), (v_2, v_3), (v_3, v_4), (v_4, v_1) \in E_n \} \rvert$

converges to the integral

$\int_0^1 \int_0^1 \int_0^1 \int_0^1 p(x_1, x_2) p(x_2, x_3) p(x_3, x_4) p(x_4, x_1)\ dx_1\ dx_2\ dx_3\ dx_4,$

and so forth. One can use graph limits to prove many results in graph theory that were traditionally proven using the regularity lemma, such as the triangle removal lemma, and can also reduce many asymptotic graph theory problems to continuous problems involving multilinear integrals (although the latter problems are not necessarily easy to solve!). See this text of Lovasz for a detailed study of graph limits and their applications.
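As a quick numerical illustration of this convergence (a toy experiment of our own, not from the text), one can sample a random graph from a graphon $p$ and compare its edge density with the corresponding integral.

```python
import random

def sample_graph(p, n):
    """Sample an n-vertex graph from the graphon p: each vertex i receives a
    uniform label x_i in [0,1], and each pair {i, j} becomes an edge
    independently with probability p(x_i, x_j)."""
    xs = [random.random() for _ in range(n)]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if random.random() < p(xs[i], xs[j])]

p = lambda x, y: x * y   # a simple graphon whose edge-density integral is 1/4
n = 2000
edges = sample_graph(p, n)
print(2 * len(edges) / (n * (n - 1)))   # should be close to 0.25
```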
One can also express graph limits (and more generally hypergraph limits) in the language of nonstandard analysis (or of ultraproducts); see for instance this paper of Elek and Szegedy, Section 6 of this previous blog post, or this paper of Towsner. (In this post we assume some familiarity with nonstandard analysis, as reviewed for instance in the previous blog post.) Here, one starts as before with a sequence $G_n = (V_n, E_n)$ of finite graphs, and then takes an ultraproduct (with respect to some arbitrarily chosen non-principal ultrafilter $\alpha$) to obtain a nonstandard graph $G_\alpha = (V_\alpha, E_\alpha)$, where $V_\alpha$ is the ultraproduct of the $V_n$, and similarly for the $E_\alpha$. The set $E_\alpha$ can then be viewed as a symmetric subset of $V_\alpha \times V_\alpha$ which is measurable with respect to the Loeb $\sigma$-algebra ${\mathcal L}_{V_\alpha \times V_\alpha}$ of the product (see this previous blog post for the construction of Loeb measure). A crucial point is that this $\sigma$-algebra is larger than the product ${\mathcal L}_{V_\alpha} \times {\mathcal L}_{V_\alpha}$ of the Loeb $\sigma$-algebras of the individual vertex set $V_\alpha$. This leads to a decomposition

$1_{E_\alpha} = p + e,$

where the "graphon" $p$ is the orthogonal projection of $1_{E_\alpha}$ onto $L^2({\mathcal L}_{V_\alpha} \times {\mathcal L}_{V_\alpha})$, and the "regular error" $e$ is orthogonal to all product sets $A \times B$ for $A, B$ in ${\mathcal L}_{V_\alpha}$. The graphon $p$ then captures the statistics of the nonstandard graph $G_\alpha$, in exact analogy with the more traditional graph limits: for instance, the edge density

$\frac{1}{|V_\alpha|^2} |E_\alpha|$

(or equivalently, the limit of the $\frac{1}{|V_n|^2} |E_n|$ along the ultrafilter $\alpha$) is equal to the integral

$\int_{V_\alpha} \int_{V_\alpha} p(x, y)\ d\mu_{V_\alpha}(x)\ d\mu_{V_\alpha}(y),$

where $\mu_V$ denotes Loeb measure on a nonstandard finite set $V$; the triangle density

$\frac{1}{|V_\alpha|^3} \lvert \{ (v_1, v_2, v_3) \in V_\alpha^3: (v_1, v_2), (v_2, v_3), (v_3, v_1) \in E_\alpha \} \rvert$

(or equivalently, the limit along $\alpha$ of the triangle densities of the $G_n$) is equal to the integral

$\int_{V_\alpha} \int_{V_\alpha} \int_{V_\alpha} p(x_1, x_2) p(x_2, x_3) p(x_3, x_1)\ d\mu_{V_\alpha}(x_1)\ d\mu_{V_\alpha}(x_2)\ d\mu_{V_\alpha}(x_3),$

and so forth. Note that with this construction, the graphon $p$ is living on the Cartesian square of an abstract probability space $V_\alpha$, which is likely to be inseparable; but it is possible to cut down the Loeb $\sigma$-algebra on $V_\alpha$ to a minimal countably generated $\sigma$-algebra for which $p$ remains measurable (up to null sets), and then one can identify $V_\alpha$ with $[0,1]$, bringing this construction of a graphon in line with the traditional notion of a graphon. (See Remark 5 of this previous blog post for more discussion of this point.)
Additive combinatorics, which studies things like the additive structure of finite subsets $A$ of an abelian group $G = (G, +)$, has many analogies and connections with asymptotic graph theory; in particular, there is the arithmetic regularity lemma of Green which is analogous to the graph regularity lemma of Szemerédi. (There is also a higher order arithmetic regularity lemma analogous to hypergraph regularity lemmas, but this is not the focus of the discussion here.) Given this, it is natural to suspect that there is a theory of "additive limits" for large additive sets of bounded doubling, analogous to the theory of graph limits for large dense graphs. The purpose of this post is to record a candidate for such an additive limit. This limit can be used as a substitute for the arithmetic regularity lemma in certain results in additive combinatorics, at least if one is willing to settle for qualitative results rather than quantitative ones; I give a few examples of this below the fold.
It seems that to allow for the most flexible and powerful manifestation of this theory, it is convenient to use the nonstandard formulation (among other things, it allows for full use of the transfer principle, whereas a more traditional limit formulation would only allow for a transfer of those quantities continuous with respect to the notion of convergence). Here, the analogue of a nonstandard graph is an ultra approximate group $A$ in a nonstandard group $G = \prod_{n \rightarrow \alpha} G_n$, defined as the ultraproduct of finite $K$-approximate groups $A_n \subset G_n$ for some standard $K$. (A $K$-approximate group $A_n$ is a symmetric set containing the origin such that $A_n + A_n$ can be covered by $K$ or fewer translates of $A_n$.) We then let $\langle A \rangle$ be the external subgroup of $G$ generated by $A$; equivalently, $\langle A \rangle$ is the union of the iterated sumsets $kA = A + \dots + A$ over all standard $k$. This space has a Loeb measure $\mu$, defined by setting

$\mu(E) := \mathrm{st} \frac{|E|}{|A|}$

whenever $E$ is an internal subset of $kA$ for any standard $k$, and extended to a countably additive measure; the arguments in Section 6 of this previous blog post can be easily modified to give a construction of this measure.
The Loeb measure $\mu$ is a translation invariant measure on $\langle A \rangle$, normalised so that $A$ has Loeb measure one. As such, one should think of $\langle A \rangle$ as being analogous to a locally compact abelian group equipped with a Haar measure. It should be noted though that $\langle A \rangle$ is not actually a locally compact group with Haar measure, for two reasons:

- There is not an obvious topology on $\langle A \rangle$ that makes it simultaneously locally compact, Hausdorff, and $\sigma$-compact. (One can get one or two out of three without difficulty, though.)
- The addition operation $+: \langle A \rangle \times \langle A \rangle \rightarrow \langle A \rangle$ is not measurable from the product Loeb algebra ${\mathcal L}_{\langle A \rangle} \times {\mathcal L}_{\langle A \rangle}$ to ${\mathcal L}_{\langle A \rangle}$. Instead, it is measurable from the coarser Loeb algebra ${\mathcal L}_{\langle A \rangle \times \langle A \rangle}$ to ${\mathcal L}_{\langle A \rangle}$ (compare with the analogous situation for nonstandard graphs).
Nevertheless, the analogy is a useful guide for the arguments that follow.
Let $L^\infty(\langle A \rangle)$ denote the space of bounded Loeb measurable functions $f: \langle A \rangle \rightarrow {\bf C}$ (modulo almost everywhere equivalence) that are supported on $kA$ for some standard $k$; this is a complex algebra with respect to pointwise multiplication. There is also a convolution operation $f, g \mapsto f * g$, defined by setting

$f * g(x) := \mathrm{st} \frac{1}{|A|} \sum_y f(y) g(x - y)$

whenever $f, g$ are bounded nonstandard functions (extended by zero to all of $\langle A \rangle$), and then extending to arbitrary elements of $L^\infty(\langle A \rangle)$ by density. Equivalently, $f * g$ is the pushforward of the ${\mathcal L}_{\langle A \rangle \times \langle A \rangle}$-measurable function $(x, y) \mapsto f(x) g(y)$ under the map $(x, y) \mapsto x + y$.
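In the finitary setting of which this construction is an ultralimit, the normalised convolution is simply the following (a Python sketch with functions modelled as dicts on ${\bf Z}$; the names are ours):

```python
from collections import defaultdict

def normalised_convolution(f, g, A):
    """Compute f*g(x) = (1/|A|) * sum_y f(y) g(x - y) for finitely
    supported functions f, g on Z, represented as dicts; A is the
    normalising approximate group."""
    h = defaultdict(float)
    for y, fy in f.items():
        for z, gz in g.items():
            h[y + z] += fy * gz
    return {x: v / len(A) for x, v in h.items()}

A = range(-10, 11)                # the interval {-10, ..., 10}
one_A = {x: 1.0 for x in A}
conv = normalised_convolution(one_A, one_A, A)
print(conv[0], max(conv.values()))  # triangle-shaped profile on {-20, ..., 20}
```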
The basic structural theorem is then as follows.
Theorem 1 (Kronecker factor) Let $A$ be an ultra approximate group. Then there exists a (standard) locally compact abelian group $G_0$ of the form

$G_0 = {\bf R}^{d_1} \times {\bf Z}^{d_2} \times T$

for some standard $d_1, d_2$ and some compact abelian group $T$, equipped with a Haar measure $\mu_{G_0}$ and a measurable homomorphism $\pi: \langle A \rangle \rightarrow G_0$ (using the Loeb $\sigma$-algebra on $\langle A \rangle$ and the Baire $\sigma$-algebra on $G_0$), with the following properties:

- (i) $\pi$ has dense image, and $\mu_{G_0}$ is the pushforward of Loeb measure $\mu$ by $\pi$.
- (ii) There exists sets $U, K$ with $U$ open and $K$ compact, such that

$\pi^{-1}(U) \subset A \subset \pi^{-1}(K).$

- (iii) Whenever $K \subset U$ with $K$ compact and $U$ open, there exists a nonstandard finite set $B$ such that

$\pi^{-1}(K) \subset B \subset \pi^{-1}(U).$

- (iv) If $f, g \in L^\infty(\langle A \rangle)$, then we have the convolution formula

$f * g = \pi^* \big( (\pi_* f) * (\pi_* g) \big),$

where $\pi_* f, \pi_* g$ are the pushforwards of $f, g$ to $G_0$, the convolution $*$ on the right-hand side is convolution using $\mu_{G_0}$, and $\pi^*$ is the pullback map from $L^\infty(G_0)$ to $L^\infty(\langle A \rangle)$. In particular, if $\pi_* f = 0$, then $f * g = 0$ for all $g \in L^\infty(\langle A \rangle)$.
One can view the locally compact abelian group $G_0$ as a "model" or "Kronecker factor" for the ultra approximate group $A$ (in close analogy with the Kronecker factor from ergodic theory). In the case that $A$ is a genuine nonstandard finite group rather than an ultra approximate group, the non-compact components ${\bf R}^{d_1} \times {\bf Z}^{d_2}$ of the Kronecker group $G_0$ are trivial, and this theorem was implicitly established by Szegedy. The compact group $T$ is quite large, and in particular is likely to be inseparable; but as with the case of graphons, when one is only studying at most countably many functions $f \in L^\infty(\langle A \rangle)$, one can cut down the size of this group to be separable (or equivalently, second countable or metrisable) if desired, so one often works with a "reduced Kronecker factor" which is a quotient of the full Kronecker factor $G_0$. Once one is in the separable case, the Baire sigma algebra is identical with the more familiar Borel sigma algebra.
Given any sequence of uniformly bounded functions $f_n: kA_n \rightarrow {\bf C}$ for some fixed $k$, we can view the function $f \in L^\infty(G_0)$ defined by

$f := \pi_* \mathrm{st} \lim_{n \rightarrow \alpha} f_n \qquad (4)$

(where the pushforward $\pi_*$ is defined using the orthogonal projection, i.e. conditional expectation, onto the pullback algebra $\pi^* L^\infty(G_0)$) as an "additive limit" of the $f_n$, in much the same way that graphons $p$ are limits of the indicator functions $1_{E_n}$. The additive limits capture some of the statistics of the $f_n$, for instance the normalised means

$\frac{1}{|A_n|} \sum_{x} f_n(x)$

converge (along the ultrafilter $\alpha$) to the mean

$\int_{G_0} f(x)\ d\mu_{G_0}(x),$

and for three sequences $f_n, g_n, h_n$ of functions, the normalised correlation

$\frac{1}{|A_n|^2} \sum_{x, y} f_n(x) g_n(y) h_n(x + y)$

converges along $\alpha$ to the correlation

$\int_{G_0} \int_{G_0} f(x) g(y) h(x + y)\ d\mu_{G_0}(x)\ d\mu_{G_0}(y),$

the normalised Gowers $U^2$ norm

$\left( \frac{1}{|A_n|^3} \sum_{x, h_1, h_2} f_n(x) \overline{f_n(x + h_1)} \overline{f_n(x + h_2)} f_n(x + h_1 + h_2) \right)^{1/4}$

converges along $\alpha$ to the $U^2$ Gowers norm

$\left( \int_{G_0} \int_{G_0} \int_{G_0} f(x) \overline{f(x + h_1)} \overline{f(x + h_2)} f(x + h_1 + h_2)\ d\mu_{G_0}(x)\ d\mu_{G_0}(h_1)\ d\mu_{G_0}(h_2) \right)^{1/4},$

and so forth. We caution however that some correlations that involve evaluating more than one function at the same point will not necessarily be preserved in the additive limit; for instance the normalised $\ell^2$ norm

$\left( \frac{1}{|A_n|} \sum_{x} |f_n(x)|^2 \right)^{1/2}$

does not necessarily converge to the $L^2$ norm

$\left( \int_{G_0} |f(x)|^2\ d\mu_{G_0}(x) \right)^{1/2},$

but can converge instead to a larger quantity, due to the presence of the orthogonal projection in the definition (4) of $f$.
An important special case of an additive limit occurs when the functions $f_n = 1_{B_n}$ involved are indicator functions of some subsets $B_n$ of $A_n$. The additive limit $f$ does not necessarily remain an indicator function, but instead takes values in $[0,1]$ (much as a graphon $p$ takes values in $[0,1]$ even though the original indicators $1_{E_n}$ take values in $\{0,1\}$). The convolution $f * f$ is then the additive limit of the normalised convolutions $\frac{1}{|A_n|} 1_{B_n} * 1_{B_n}$; in particular, the measure of the support of $f * f$ provides a lower bound on the limiting normalised cardinality $\frac{|B_n + B_n|}{|A_n|}$ of a sumset. In many situations this lower bound is an equality, but this is not necessarily the case, because the sumset $B_n + B_n$ could contain a large number of elements which have very few ($o(|A_n|)$) representations as the sum of two elements of $B_n$, and in the limit these portions of the sumset fall outside of the support of $f * f$. (One can think of the support of $f * f$ as describing the "essential" sumset of $B_n + B_n$, discarding those elements that have only very few representations.) Similarly for higher convolutions of $f$. Thus one can use additive limits to partially control the growth $|kB_n|$ of iterated sumsets of subsets $B_n$ of approximate groups $A_n$, in the regime where $k$ stays bounded and $n$ goes to infinity.
Theorem 1 can be proven by Fourier-analytic means (combined with Freiman’s theorem from additive combinatorics), and we will do so below the fold. For now, we give some illustrative examples of additive limits.
Example 2 (Bohr sets) We take $A_n$ to be the intervals $A_n := \{ m \in {\bf Z}: |m| \leq N_n \}$, where $N_n$ is a sequence going to infinity; these are $2$-approximate groups for all $n$. Let $\theta$ be an irrational real number, let $I$ be an interval in ${\bf R}/{\bf Z}$, and for each natural number $n$ let $B_n$ be the Bohr set

$B_n := \{ m \in A_n: \theta m \ (\mathrm{mod}\ 1) \in I \}.$

In this case, the (reduced) Kronecker factor $G_0$ can be taken to be the infinite cylinder ${\bf R} \times ({\bf R}/{\bf Z})$ with the usual Lebesgue measure $\mu_{G_0}$. The additive limits of $1_{A_n}$ and $1_{B_n}$ end up being $1_\Sigma$ and $1_\Theta$, where $\Sigma$ is the finite cylinder

$\Sigma := [-1, 1] \times ({\bf R}/{\bf Z})$

and $\Theta$ is the rectangle

$\Theta := [-1, 1] \times I.$

Geometrically, one should think of $A_n$ and $B_n$ as being wrapped around the cylinder ${\bf R} \times ({\bf R}/{\bf Z})$ via the homomorphism $m \mapsto (\frac{m}{N_n}, \theta m \ (\mathrm{mod}\ 1))$, and then one sees that $B_n$ is converging in some normalised weak sense to $\Theta$, and similarly for $A_n$ and $\Sigma$. In particular, the additive limit predicts the growth rate of the iterated sumsets $|kB_n|$ to be quadratic in $k$ until $k|I|$ becomes comparable to $1$, at which point the growth transitions to linear growth, in the regime where $k$ is bounded and $n$ is large.

If $\theta = p/q$ were rational instead of irrational, then one would need to replace ${\bf R}/{\bf Z}$ by the finite subgroup $\frac{1}{q}{\bf Z}/{\bf Z}$ here.
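The predicted quadratic-to-linear transition can be seen numerically; the following toy Python computation (ours, with small illustrative parameters) iterates sumsets of a Bohr set with $I = [-\delta, \delta]$.

```python
import math

def bohr_set(theta, delta, N):
    """B = { m in {-N, ..., N} : theta * m is within delta of an integer }."""
    return {m for m in range(-N, N + 1)
            if abs(theta * m - round(theta * m)) <= delta}

def sumset(X, Y):
    return {x + y for x in X for y in Y}

theta, delta, N = math.sqrt(2), 0.1, 1000
B = bohr_set(theta, delta, N)
kB = set(B)
for k in range(2, 9):
    kB = sumset(kB, B)
    # growth is roughly quadratic in k while 2*k*delta < 1, then linear
    print(k, len(kB))
```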
Example 3 (Structured subsets of progressions) We take $A_n$ to be the rank two progression

$A_n := \{ a + b N_n^2: a, b \in {\bf Z}; |a|, |b| \leq N_n \},$

where $N_n$ is a sequence going to infinity; these are $4$-approximate groups for all $n$. Let $B_n$ be the subset

$B_n := \{ a + b N_n^2: a, b \in {\bf Z}; a^2 + b^2 \leq N_n^2 \}.$

Then the (reduced) Kronecker factor can be taken to be $G_0 = {\bf R}^2$ with Lebesgue measure $\mu_{G_0}$, and the additive limits of the $1_{A_n}$ and $1_{B_n}$ are then $1_\Sigma$ and $1_\Theta$, where $\Sigma$ is the square

$\Sigma := \{ (x, y) \in {\bf R}^2: |x|, |y| \leq 1 \}$

and $\Theta$ is the circle

$\Theta := \{ (x, y) \in {\bf R}^2: x^2 + y^2 \leq 1 \}.$

Geometrically, the picture is similar to the Bohr set one, except now one uses a Freiman homomorphism $a + b N_n^2 \mapsto (\frac{a}{N_n}, \frac{b}{N_n})$ for $|a|, |b| \leq N_n$ to embed the original sets $A_n, B_n$ into the plane ${\bf R}^2$. In particular, one now expects the growth rate of the iterated sumsets $|kA_n|$ and $|kB_n|$ to be quadratic in $k$, in the regime where $k$ is bounded and $n$ is large.
Example 4 (Dissociated sets) Let $s$ be a fixed natural number, and take

$A_n := \{ \epsilon_1 a_{n,1} + \dots + \epsilon_s a_{n,s}: \epsilon_1, \dots, \epsilon_s \in \{-1, 0, 1\} \},$

where $a_{n,1}, \dots, a_{n,s}$ are randomly chosen elements of a large cyclic group ${\bf Z}/p_n{\bf Z}$, where $p_n$ is a sequence of primes going to infinity. These are $O_s(1)$-approximate groups. The (reduced) Kronecker factor $G_0$ can (almost surely) then be taken to be ${\bf Z}^s$ with counting measure, and the additive limit of $1_{A_n}$ is $1_\Sigma$, where

$\Sigma := \{ \epsilon_1 e_1 + \dots + \epsilon_s e_s: \epsilon_1, \dots, \epsilon_s \in \{-1, 0, 1\} \}$

and $e_1, \dots, e_s$ is the standard basis of ${\bf Z}^s$. In particular, the growth rates of $|kA_n|$ should grow approximately like $(2k+1)^s$ for $k$ bounded and $n$ large.
Example 5 (Random subsets of groups) Let $A_n = G_n$ be a sequence of finite additive groups whose order is going to infinity. Let $B_n$ be a random subset of $G_n$ of some fixed density $0 < \lambda < 1$. Then (almost surely) the Kronecker factor here can be reduced all the way to the trivial group $\{0\}$, and the additive limit of the $1_{B_n}$ is the constant function $\lambda$. The convolutions $\frac{1}{|G_n|} 1_{B_n} * 1_{B_n}$ then converge in the ultralimit (modulo almost everywhere equivalence) to the pullback of $\lambda^2$; this reflects the fact that $(1 - o(1))|G_n|$ of the elements of $G_n$ can be represented as the sum of two elements of $B_n$ in $(\lambda^2 + o(1))|G_n|$ ways. In particular, $B_n + B_n$ occupies a proportion $1 - o(1)$ of $G_n$.
Example 6 (Trigonometric series) Take $A_n = G_n := {\bf Z}/p_n{\bf Z}$ for a sequence $p_n$ of primes going to infinity, and for each $n$ let $\xi_{n,1}, \xi_{n,2}, \dots$ be an infinite sequence of frequencies chosen uniformly and independently from ${\bf Z}/p_n{\bf Z}$. Let $f_n: {\bf Z}/p_n{\bf Z} \rightarrow {\bf C}$ denote the random trigonometric series

$f_n(x) := \sum_{j=1}^\infty 2^{-j} e^{2\pi i \xi_{n,j} x / p_n}.$

Then (almost surely) we can take the reduced Kronecker factor $G_0$ to be the infinite torus $({\bf R}/{\bf Z})^{\bf N}$ (with the Haar probability measure $\mu_{G_0}$), and the additive limit of the $f_n$ then becomes the function $f: ({\bf R}/{\bf Z})^{\bf N} \rightarrow {\bf C}$ defined by the formula

$f( (x_j)_{j=1}^\infty ) := \sum_{j=1}^\infty 2^{-j} e^{2\pi i x_j}.$

In fact, the pullback $\pi^* f$ is the ultralimit of the $f_n$. As such, for any standard exponent $1 \leq q < \infty$, the normalised $\ell^q$ norm

$\left( \frac{1}{p_n} \sum_{x \in {\bf Z}/p_n{\bf Z}} |f_n(x)|^q \right)^{1/q}$

can be seen to converge to the limit

$\left( \int_{({\bf R}/{\bf Z})^{\bf N}} |f(x)|^q\ d\mu_{G_0}(x) \right)^{1/q}.$
The reader is invited to consider combinations of the above examples, e.g. random subsets of Bohr sets, to get a sense of the general case of Theorem 1.
It is likely that this theorem can be extended to the noncommutative setting, using the noncommutative Freiman theorem of Emmanuel Breuillard, Ben Green, and myself, but I have not attempted to do so here (see though this recent preprint of Anush Tserunyan for some related explorations); in a separate direction, there should be extensions that can control higher Gowers norms, in the spirit of the work of Szegedy.
Note: the arguments below will presume some familiarity with additive combinatorics and with nonstandard analysis, and will be a little sketchy in places.
Let $\overline{{\bf Q}}$ be the algebraic closure of ${\bf Q}$, that is to say the field of algebraic numbers. We fix an embedding of $\overline{{\bf Q}}$ into ${\bf C}$, giving rise to a complex absolute value $|\alpha|$ for algebraic numbers $\alpha \in \overline{{\bf Q}}$.
Let $\alpha \in \overline{{\bf Q}}$ be of degree $d \geq 2$, so that $\alpha$ is irrational. A classical theorem of Liouville gives the quantitative bound

$\left| \alpha - \frac{p}{q} \right| \geq \frac{c_\alpha}{q^d} \qquad (1)$

quantifying the extent to which $\alpha$ fails to be approximated by rational numbers $\frac{p}{q}$, where $c_\alpha > 0$ depends on $\alpha$ but not on $\frac{p}{q}$. Indeed, if one lets $\alpha = \alpha_1, \alpha_2, \dots, \alpha_d$ be the Galois conjugates of $\alpha$, then the quantity

$\prod_{i=1}^d |q \alpha_i - p|$

is a non-zero natural number divided by a constant, and so we have the trivial lower bound

$\prod_{i=1}^d |q \alpha_i - p| \gg_\alpha 1,$

from which the bound (1) easily follows. A well known corollary of the bound (1) is that Liouville numbers are automatically transcendental.
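Filling in the short (and standard) computation: for $p/q$ in a bounded neighbourhood of $\alpha$ (the only case of interest, as (1) is trivial otherwise), each factor with $i \geq 2$ is of size $O_\alpha(q)$, so

```latex
1 \ll_\alpha \prod_{i=1}^d |q \alpha_i - p|
  = |q \alpha - p| \prod_{i=2}^d |q \alpha_i - p|
  \ll_\alpha |q \alpha - p| \, q^{d-1},
% since |q alpha_i - p| <= q(|alpha_i| + |p/q|) = O_alpha(q) when p/q = O_alpha(1),
```

giving $|q\alpha - p| \gg_\alpha q^{1-d}$ and hence $|\alpha - p/q| \gg_\alpha q^{-d}$, which is (1).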
The famous theorem of Thue, Siegel and Roth improves the bound (1) to

$$\left| \alpha - \frac{p}{q} \right| \geq \frac{c}{q^{2+\epsilon}} \ \ \ \ \ (2)$$

for any $\epsilon > 0$ and rationals $\frac{p}{q}$, where $c > 0$ depends on $\alpha$ and $\epsilon$ but not on $\frac{p}{q}$. Apart from the $\epsilon$ in the exponent and the implied constant, this bound is optimal, as can be seen from Dirichlet's theorem. This theorem is a good example of the ineffectivity phenomenon that affects a large portion of modern number theory: the constant $c$ is known to be positive, but there is no explicit bound for it in terms of the coefficients of the polynomial defining $\alpha$ (in contrast to (1), for which an effective bound may be easily established). This is ultimately due to the reliance on the “dueling conspiracy” (or “repulsion phenomenon”) strategy. We do not as yet have a good way to rule out one counterexample to (2), in which $\frac{p}{q}$ is far closer to $\alpha$ than $\frac{c}{q^{2+\epsilon}}$; however we can rule out two such counterexamples, by playing them off of each other.
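For comparison, the Dirichlet theorem mentioned here is the pigeonhole assertion that for any irrational $\alpha$ there are infinitely many rationals $\frac{p}{q}$ with

$$\left| \alpha - \frac{p}{q} \right| \leq \frac{1}{q^2},$$

so no bound of the shape (2) can hold with the exponent $2+\epsilon$ replaced by any exponent strictly less than $2$.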
A powerful strengthening of the Thue-Siegel-Roth theorem is given by the subspace theorem, first proven by Schmidt and then generalised further by several authors. To motivate the theorem, first observe that the Thue-Siegel-Roth theorem may be rephrased as a bound of the form

$$|L_1(p,q)| \cdot |L_2(p,q)| \geq c \max(|p|, |q|)^{-\epsilon} \ \ \ \ \ (3)$$

for any pair $L_1, L_2$ of linear forms with algebraic coefficients, with $L_1$ and $L_2$ linearly independent (over the algebraic numbers), and any $\epsilon > 0$ and integers $p, q$, with the exception when the coefficients of $L_1$ or $L_2$ are rationally dependent (i.e. one is a rational multiple of the other), in which case one has to remove some lines (i.e. subspaces in ${\bf Q}^2$) of rational slope from the space ${\bf Z}^2$ of pairs $(p,q)$ to which the bound (3) does not apply (namely, those lines for which the left-hand side vanishes). Here $c > 0$ can depend on $L_1, L_2, \epsilon$ but not on $p, q$. More generally, we have
Theorem 1 (Schmidt subspace theorem) Let $n$ be a natural number. Let $L_1, \dots, L_n: \overline{{\bf Q}}^n \rightarrow \overline{{\bf Q}}$ be linearly independent linear forms. Then for any $\epsilon > 0$, one has the bound

$$\prod_{i=1}^n |L_i(x)| \geq \frac{c}{\|x\|^\epsilon}$$

for all $x \in {\bf Z}^n$, outside of a finite number of proper subspaces of ${\bf Q}^n$, where

$$\|x\| := \max( |x_1|, \dots, |x_n| )$$

and $c > 0$ depends on $\epsilon$, $n$ and the $L_1, \dots, L_n$, but is independent of $x$.
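To see concretely how Theorem 1 contains the Thue-Siegel-Roth theorem (a standard specialisation, phrased in the notation used above), one can take $n = 2$ and

$$L_1(x_1, x_2) := x_1 - \alpha x_2, \qquad L_2(x_1, x_2) := x_2,$$

so that, outside of finitely many rational lines, one has $|p - \alpha q| \cdot |q| \geq c \max(|p|, |q|)^{-\epsilon}$ for integer pairs $(p,q)$, which rearranges to a bound of the form (2) after adjusting $\epsilon$ and $c$; the finitely many exceptional lines are easily handled directly, since $\alpha$ is irrational.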
Being a generalisation of the Thue-Siegel-Roth theorem, it is unsurprising that the known proofs of the subspace theorem are also ineffective with regards to the constant $c$. (However, the number of exceptional subspaces may be bounded effectively; cf. the situation with the Skolem-Mahler-Lech theorem, discussed in this previous blog post.) Once again, the lower bound here is basically sharp except for the $\|x\|^{-\epsilon}$ factor and the implied constant: given any $A_1, \dots, A_n > 0$ with $A_1 \cdots A_n = 1$, a simple volume packing argument (the same one used to prove the Dirichlet approximation theorem) shows that for any sufficiently large $C$, one can find integers $x_1, \dots, x_n$, not all zero, such that

$$|L_i(x)| \leq C A_i$$

for all $i = 1, \dots, n$. Thus one can get $\prod_{i=1}^n |L_i(x)|$ comparable to $1$ in many different ways.
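The volume packing argument can be made precise via Minkowski's theorem on linear forms (a classical statement, recalled here for convenience): if the real linear forms $L_1, \dots, L_n$ are linearly independent and $c_1, \dots, c_n > 0$ obey

$$c_1 \cdots c_n \geq |\det( L_1, \dots, L_n )|,$$

then there is a non-zero integer vector $x$ with $|L_i(x)| \leq c_i$ for all $i$; varying the $c_i$ while keeping their product fixed then produces integer points with $\prod_{i=1}^n |L_i(x)| = O(1)$ in many different ways.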
There are important generalisations of the subspace theorem to other number fields than the rationals (and to other valuations than the Archimedean valuation $z \mapsto |z|$); we will develop one such generalisation below.
The subspace theorem is one of many finiteness theorems in Diophantine geometry; in this case, it is the number of exceptional subspaces which is finite. It turns out that finiteness theorems are very compatible with the language of nonstandard analysis. (See this previous blog post for a review of the basics of nonstandard analysis, and in particular for the nonstandard interpretation of asymptotic notation such as $O()$ and $o()$.) The reason for this is that a standard set $X$ is finite if and only if it contains no strictly nonstandard elements (that is to say, elements of ${}^* X \backslash X$). This makes for a clean formulation of finiteness theorems in the nonstandard setting. For instance, the standard form of Bezout's theorem asserts that if $P(x,y), Q(x,y)$ are coprime polynomials over some field, then the curves $\{ P(x,y) = 0 \}$ and $\{ Q(x,y) = 0 \}$ intersect in only finitely many points. The nonstandard version of this is then

Theorem 2 (Bezout's theorem, nonstandard form) Let $P(x,y), Q(x,y)$ be standard coprime polynomials. Then there are no strictly nonstandard solutions to $P(x,y) = Q(x,y) = 0$.
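One half of the finiteness criterion invoked above is immediate from the transfer principle (a routine verification): if $X = \{ a_1, \dots, a_k \}$ is a finite standard set, then the first-order sentence

$$\forall x: \ x \in X \implies (x = a_1) \vee \dots \vee (x = a_k)$$

transfers to the nonstandard universe, so that ${}^* X = X$ and $X$ acquires no strictly nonstandard elements; conversely, an infinite standard set always picks up elements of ${}^* X \backslash X$, as can be seen for instance from the ultrapower construction.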
Now we reformulate Theorem 1 in nonstandard language. We need a definition:
Definition 3 (General position) Let $K \subset L$ be nested fields. A point $x = (x_1, \dots, x_n)$ in $L^n$ is said to be in $K$-general position if it is not contained in any hyperplane of $L^n$ definable over $K$, or equivalently if one has

$$a_1 x_1 + \dots + a_n x_n \neq 0$$

for any $a_1, \dots, a_n \in K$, not all zero.
Theorem 4 (Schmidt subspace theorem, nonstandard version) Let $n$ be a standard natural number. Let $L_1, \dots, L_n: \overline{{\bf Q}}^n \rightarrow \overline{{\bf Q}}$ be linearly independent standard linear forms. Let $x \in ({}^* {\bf Z})^n$ be a tuple of nonstandard integers which is in ${\bf Q}$-general position (in particular, this forces $x$ to be strictly nonstandard). Then one has

$$\prod_{i=1}^n |L_i(x)| \geq \|x\|^{-o(1)},$$

where we extend $L_1, \dots, L_n$ from $\overline{{\bf Q}}^n$ to $({}^* \overline{{\bf Q}})^n$ (and also similarly extend the absolute value $z \mapsto |z|$ from $\overline{{\bf Q}}$ to ${}^* \overline{{\bf Q}}$) in the usual fashion.
Observe that (as is usual when translating to nonstandard analysis) some of the epsilons and quantifiers that are present in the standard version become hidden in the nonstandard framework, being moved inside concepts such as “strictly nonstandard” or “general position”. We remark that as $x$ is in ${\bf Q}$-general position, it is also in $\overline{{\bf Q}}$-general position (as an easy Galois-theoretic argument shows), and the requirement that the $L_1, \dots, L_n$ are linearly independent is thus equivalent to $L_1, \dots, L_n$ being ${\bf Q}$-linearly independent.
Exercise 1 Verify that Theorem 1 and Theorem 4 are equivalent. (Hint: there are only countably many proper subspaces of ${\bf Q}^n$.)
We will not prove the subspace theorem here, but instead focus on a particular application of the subspace theorem, namely to counting integer points on curves. In this paper of Corvaja and Zannier, the subspace theorem was used to give a new proof of the following basic result of Siegel:

Theorem 5 (Siegel's theorem on integer points) Let $P(x,y)$ be an irreducible polynomial of two variables, such that the affine plane curve $C := \{ (x,y): P(x,y) = 0 \}$ either has genus at least one, or has at least three points on the line at infinity, or both. Then $C$ has only finitely many integer points $(x,y) \in {\bf Z}^2$.
This is a finiteness theorem, and as such may be easily converted to a nonstandard form:

Theorem 6 (Siegel's theorem, nonstandard form) Let $P(x,y)$ be a standard irreducible polynomial of two variables, such that the affine plane curve $C := \{ (x,y): P(x,y) = 0 \}$ either has genus at least one, or has at least three points on the line at infinity, or both. Then $C$ does not contain any strictly nonstandard integer points $(x,y) \in ({}^* {\bf Z})^2$.
Note that Siegel’s theorem can fail for genus zero curves that only meet the line at infinity at just one or two points; the key examples here are the graphs $\{ (x,y): y = f(x) \}$ for a polynomial $f$ of one variable, and the Pell equation curves $\{ (x,y): x^2 - d y^2 = 1 \}$ for non-square natural numbers $d$. Siegel’s theorem can be compared with the more difficult theorem of Faltings, which establishes finiteness of rational points (not just integer points), but now needs the stricter requirement that the curve $C$ has genus at least two (to avoid the additional counterexample of elliptic curves of positive rank, which have infinitely many rational points).
The standard proofs of Siegel’s theorem rely on a combination of the Thue-Siegel-Roth theorem and a number of results on abelian varieties (notably the Mordell-Weil theorem). The Corvaja-Zannier argument rebalances the difficulty of the argument by replacing the Thue-Siegel-Roth theorem by the more powerful subspace theorem (in fact, they need one of the stronger versions of this theorem alluded to earlier), while greatly reducing the reliance on results on abelian varieties. Indeed, for curves with three or more points at infinity, no theory from abelian varieties is needed at all, while for the remaining cases, one mainly needs the existence of the Abel-Jacobi embedding, together with a relatively elementary theorem of Chevalley-Weil which is used in the proof of the Mordell-Weil theorem, but is significantly easier to prove.
The Corvaja-Zannier argument (together with several further applications of the subspace theorem) is presented nicely in this Bourbaki exposé of Bilu. To establish the theorem in full generality requires a certain amount of algebraic number theory machinery, such as the theory of valuations on number fields, or of relative discriminants between such number fields. However, the basic ideas can be presented without much of this machinery by focusing on simple special cases of Siegel’s theorem. For instance, we can handle irreducible cubics that meet the line at infinity at exactly three distinct points:
Theorem 7 (Siegel's theorem with three points at infinity) Siegel's theorem holds when the irreducible polynomial $P(x,y)$ takes the form

$$P(x,y) = (y - \alpha_1 x)(y - \alpha_2 x)(y - \alpha_3 x) + Q(x,y)$$

for some quadratic polynomial $Q(x,y)$ and some distinct algebraic numbers $\alpha_1, \alpha_2, \alpha_3$.
Proof: We use the nonstandard formalism. Suppose for sake of contradiction that we can find a strictly nonstandard integer point $(x,y) \in ({}^* {\bf Z})^2$ on a curve $C = \{ P = 0 \}$ of the indicated form. As this point is infinitesimally close to the line at infinity, $y/x$ must be infinitesimally close to one of $\alpha_1, \alpha_2, \alpha_3$; without loss of generality we may assume that $y/x$ is infinitesimally close to $\alpha_1$.
We now use a version of the polynomial method, to find some polynomials of controlled degree that vanish to high order on the “arm” of the cubic curve that asymptotes to the line $y = \alpha_1 x$. More precisely, let $D$ be a large integer (actually $D = 2$ will already suffice here), and consider the $\overline{{\bf Q}}$-vector space $V$ of polynomials $R(x,y)$ of degree at most $D$, and of degree at most $2$ in the $y$ variable; this space has dimension $3D$. Also, as one traverses the arm of $C$ asymptoting to this line, any polynomial $R$ in $V$ grows at a rate of at most $|x|^D$, that is to say $R$ has a pole of order at most $D$ at the point at infinity $[1 : \alpha_1 : 0]$. By performing Laurent expansions around this point (which is a non-singular point of $C$, as the $\alpha_1, \alpha_2, \alpha_3$ are assumed to be distinct), we may thus find a basis $R_1, \dots, R_{3D}$ of $V$, with the property that $R_j$ has a pole of order at most $D + 1 - j$ at $[1 : \alpha_1 : 0]$ for each $1 \leq j \leq 3D$.
From the control of the pole at $[1 : \alpha_1 : 0]$, we have

$$|R_j(x,y)| \ll |x|^{D + 1 - j}$$

for all $1 \leq j \leq 3D$. The exponents here become negative for $j > D + 1$, and on multiplying them all together we see that

$$\prod_{j=1}^{3D} |R_j(x,y)| \ll |x|^{3D(D+1) - \frac{3D(3D+1)}{2}}.$$

This exponent is negative for $D$ large enough (or just take $D = 2$). If we expand

$$R_j(x,y) = \sum_{a + b \leq D; b \leq 2} c_{j,a,b} x^a y^b$$

for some algebraic numbers $c_{j,a,b}$, then we thus have

$$\prod_{j=1}^{3D} \Big| \sum_{a + b \leq D; b \leq 2} c_{j,a,b} x^a y^b \Big| \leq \big\| (x^a y^b)_{a + b \leq D; b \leq 2} \big\|^{-\epsilon}$$
for some standard $\epsilon > 0$. Note that the $3D$-dimensional vectors $(c_{j,a,b})_{a + b \leq D; b \leq 2}$ are linearly independent in $\overline{{\bf Q}}^{3D}$, because the $R_1, \dots, R_{3D}$ are linearly independent in $V$. Applying the Schmidt subspace theorem in the contrapositive, we conclude that the $3D$-tuple $(x^a y^b)_{a + b \leq D; b \leq 2}$ is not in ${\bf Q}$-general position. That is to say, one has a non-trivial constraint of the form

$$\sum_{a + b \leq D; b \leq 2} q_{a,b} x^a y^b = 0 \ \ \ \ \ (4)$$

for some standard rational coefficients $q_{a,b}$, not all zero. But, as $P$ is irreducible and cubic in $y$, it has no common factor with the standard polynomial $\sum_{a + b \leq D; b \leq 2} q_{a,b} x^a y^b$, so by Bezout’s theorem (Theorem 2) the constraint (4) only has standard solutions, contradicting the strictly nonstandard nature of $(x,y)$.
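To make the numerology concrete in the notation I used to reconstruct the argument above (so $D = 2$, giving $3D = 6$ basis polynomials with pole orders at most $3 - j$), the total exponent is

$$\sum_{j=1}^{6} (3 - j) = 2 + 1 + 0 - 1 - 2 - 3 = -3 < 0,$$

so the product $\prod_j |R_j(x,y)|$ decays like a negative power of $|x|$, which is exactly the gain needed to invoke the subspace theorem.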
Exercise 2 Rewrite the above argument so that it makes no reference to nonstandard analysis. (In this case, the rewriting is quite straightforward; however, there will be a subsequent argument in which the standard version is significantly messier than the nonstandard counterpart, which is the reason why I am working with the nonstandard formalism in this blog post.)
A similar argument works for higher degree curves that meet the line at infinity in three or more points, though if the curve has singularities at infinity then it becomes convenient to rely on the Riemann-Roch theorem to control the dimension of the analogue of the space $V$. Note that when there are only two or fewer points at infinity, though, one cannot get the negative exponent needed to usefully apply the subspace theorem. To deal with this case we require some additional tricks. For simplicity we focus on the case of Mordell curves, although it will be convenient to work with more general number fields $K$ than the rationals:
Theorem 8 (Siegel's theorem for Mordell curves) Let $k$ be a non-zero integer. Then there are only finitely many integer solutions $(x,y) \in {\bf Z}^2$ to $y^2 = x^3 + k$. More generally, for any number field $K$, and any nonzero $k \in K$, there are only finitely many algebraic integer solutions $(x,y) \in {\cal O}_K^2$ to $y^2 = x^3 + k$, where ${\cal O}_K$ is the ring of algebraic integers in $K$.
Again, we will establish the nonstandard version. We need some additional notation:
Definition 9
- We define an almost rational integer to be a nonstandard $x \in {}^* {\bf Q}$ such that $qx \in {}^* {\bf Z}$ for some standard positive integer $q$; the almost rational integers form a ${\bf Q}$-algebra.
- If $K$ is a standard number field, we define an almost $K$-integer to be a nonstandard $x \in {}^* K$ such that $qx \in {}^* {\cal O}_K$ for some standard positive integer $q$, where ${\cal O}_K$ is the ring of integers of $K$; the almost $K$-integers form a $K$-algebra.
- We define an almost algebraic integer to be a nonstandard $x \in {}^* \overline{{\bf Q}}$ such that $qx$ is a nonstandard algebraic integer for some standard positive integer $q$; the almost algebraic integers form a $\overline{{\bf Q}}$-algebra.
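As a sanity check on these definitions (my own examples, not taken from the original text): if $n$ is a nonstandard integer, then $n/6$ is an almost rational integer, since $6 \cdot (n/6) = n$ lies in ${}^* {\bf Z}$; and the almost algebraic integers are indeed closed under addition and multiplication, since if $qx$ and $ry$ are nonstandard algebraic integers with $q, r$ standard positive integers, then

$$qr(x + y) = r(qx) + q(ry), \qquad qr(xy) = (qx)(ry)$$

are again nonstandard algebraic integers, and $qr$ is again a standard positive integer.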
Theorem 10 (Siegel for Mordell, nonstandard version) Let $k$ be a non-zero standard algebraic number. Then the curve $\{ (x,y): y^2 = x^3 + k \}$ does not contain any strictly nonstandard almost algebraic integer point.

Another way of phrasing this theorem is that if $x, y$ are strictly nonstandard almost algebraic integers, then $y^2 - x^3$ is either strictly nonstandard or zero.
Exercise 3 Verify that Theorem 8 and Theorem 10 are equivalent.
Due to all the ineffectivity, our proof does not supply any bound on the size of the solutions $x, y$ in terms of $k$, even if one removes all references to nonstandard analysis. It is a conjecture of Hall (a special case of the notorious ABC conjecture) that one has the bound

$$|x| \ll_\epsilon |k|^{2 + \epsilon}$$

for all $\epsilon > 0$ (or equivalently $|y| \ll_\epsilon |k|^{3 + \epsilon}$), but even the weaker conjecture that $x, y$ are of polynomial size in $k$ is open. (The best known bounds are of exponential nature, and are proven using a version of Baker’s method: see for instance this text of Sprindzuk.)
A direct repetition of the arguments used to prove Theorem 7 will not work here, because the Mordell curve only hits the line at infinity at one point, $[0 : 1 : 0]$. To get around this we will exploit the fact that the Mordell curve is an elliptic curve and thus has a group law on it. We will then divide all the integer points on this curve by two; as elliptic curves have four $2$-torsion points, this will end up placing us in a situation like Theorem 7, with four points at infinity. However, there is an obstruction: it is not obvious that dividing an integer point on the Mordell curve by two will produce another integer point. It turns out that this is essentially true (after enlarging the ring of integers slightly) thanks to a general principle of Chevalley and Weil, which can be worked out explicitly in the case of division by two on Mordell curves by relatively elementary means (relying mostly on unique factorisation of ideals of algebraic integers). We give the details below the fold.
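To see why division by two naturally produces four points at infinity (a heuristic count consistent with the sketch above, not a quotation from the paper): the $2$-torsion of the elliptic curve $y^2 = x^3 + k$ consists of the point at infinity together with the three points $(x, 0)$ with

$$x^3 + k = 0,$$

which are distinct since $k \neq 0$; thus the multiplication-by-two map is four-to-one, and the preimage of the single point at infinity under this map consists of four points, placing the halved curve in the three-or-more-points-at-infinity regime of the preceding discussion.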
There are a number of ways to construct the real numbers ${\bf R}$, for instance
- as the metric completion of ${\bf Q}$ (thus, ${\bf R}$ is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
- as the space of Dedekind cuts on the rationals ${\bf Q}$;
- as the space of quasimorphisms $\phi: {\bf Z} \rightarrow {\bf Z}$ on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number $N \in {}^* {\bf N}$, one can define two external additive subgroups of the nonstandard integers ${}^* {\bf Z}$:
- The group $O(N) := \{ n \in {}^* {\bf Z}: |n| \leq C N \hbox{ for some standard } C > 0 \}$ of all nonstandard integers of magnitude less than or comparable to $N$; and
- The group $o(N) := \{ n \in {}^* {\bf Z}: |n| \leq c N \hbox{ for all standard } c > 0 \}$ of nonstandard integers of magnitude infinitesimally smaller than $N$.
The group $o(N)$ is a subgroup of $O(N)$, so we may form the quotient group $O(N)/o(N)$. This space is isomorphic to the reals ${\bf R}$, and can in fact be used to construct the reals:

Proposition 1 For any coset $n + o(N)$ of $o(N)$ (with $n \in O(N)$), there is a unique real number $x$ with the property that $n = xN + o(N)$. The map $n + o(N) \mapsto x$ is then an isomorphism between the additive groups $O(N)/o(N)$ and ${\bf R}$.
Proof: Uniqueness is clear. For existence, observe that the set $\{ q \in {\bf Q}: qN \leq n \}$ is a Dedekind cut, and its supremum can be verified to have the required properties for $x$.
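For instance (an illustration in the notation above), the map in Proposition 1 is surjective because for any real number $x$ one can take the nonstandard integer

$$n := \lfloor x N \rfloor \in O(N),$$

for which $n - xN$ is bounded (and hence certainly in $o(N)$), so that the coset $n + o(N)$ is mapped to $x$.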
In a similar vein, we can view the unit interval $[0,1]$ in the reals as the quotient

$$[0,1] \equiv [N] / o(N) \ \ \ \ \ (1)$$

where $[N]$ is the nonstandard (i.e. internal) set $\{ 1, 2, \dots, N \}$; of course, $[N]$ is not a group, so one should interpret $[N]/o(N)$ as the image of $[N]$ under the quotient map ${}^* {\bf Z} \rightarrow {}^* {\bf Z} / o(N)$ (or $O(N) \rightarrow O(N)/o(N)$, if one prefers). Or to put it another way, (1) asserts that $[0,1]$ is the image of $[N]$ with respect to the map $\pi: n \mapsto \hbox{st}(n/N)$, where $\hbox{st}$ denotes the standard part.
In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on $[N]$. Given an internal subset $A$ of $[N]$, we may define the elementary measure $\mu(A)$ of $A$ by the formula

$$\mu(A) := \hbox{st} \frac{|A|}{N}.$$

This is a finitely additive probability measure on the Boolean algebra of internal subsets of $[N]$. We can then construct the Loeb outer measure $\mu^*(E)$ of any subset $E \subset [N]$ in complete analogy with Lebesgue outer measure by the formula

$$\mu^*(E) := \inf \sum_{m=1}^\infty \mu(A_m),$$

where $(A_m)_{m=1}^\infty$ ranges over all sequences of internal subsets of $[N]$ that cover $E$. We say that a subset $E$ of $[N]$ is Loeb measurable if, for any (standard) $\epsilon > 0$, one can find an internal subset $A$ of $[N]$ which differs from $E$ by a set of Loeb outer measure at most $\epsilon$, and in that case we define the Loeb measure $\mu(E)$ of $E$ to be $\mu^*(E)$. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space ${\cal L}$ of Loeb measurable sets is a $\sigma$-algebra, and that $\mu$ is a countably additive probability measure on this space that extends the elementary measure defined earlier. Thus $[N]$ now has the structure of a probability space $([N], {\cal L}, \mu)$.
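As a simple worked example of these definitions (mine, using the formulas as reconstructed above): the internal set

$$A := \{ n \in [N]: n \leq N/2 \}$$

has cardinality $\lfloor N/2 \rfloor$ and hence elementary measure $\hbox{st}( \lfloor N/2 \rfloor / N ) = 1/2$; under the factor map $\pi(n) = \hbox{st}(n/N)$ it corresponds, up to null sets, to the interval $[0, 1/2]$, which has Lebesgue measure $1/2$, in accordance with Theorem 2 below.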
Now, the group $o(N)$ acts (Loeb-almost everywhere) on the probability space $[N]$ by the addition map, thus $T^h: n \mapsto n + h$ for $n \in [N]$ and $h \in o(N)$ (excluding a set of Loeb measure zero where $n + h$ exits $[N]$). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor, defined by restricting attention to those Loeb measurable sets $E$ with the property that $T^h E$ is equal $\mu$-almost everywhere to $E$ for each $h \in o(N)$.
The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval $[0,1]$ with Lebesgue measure $m$ (and the trivial action of $o(N)$), by the same factor map $\pi: n \mapsto \hbox{st}(n/N)$ used in (1). More precisely:
Theorem 2 Given a set $E$ in the invariant factor, there exists a Lebesgue measurable set $F \subset [0,1]$, unique up to $m$-a.e. equivalence, such that $E$ is $\mu$-a.e. equivalent to the set $\pi^{-1}(F)$. Conversely, if $F \subset [0,1]$ is Lebesgue measurable, then $\pi^{-1}(F)$ is in the invariant factor, and $\mu( \pi^{-1}(F) ) = m(F)$.
More informally, we have the measure-theoretic version

$$([0,1], m) \equiv [N] / o(N)$$

of (1).
Proof: We first prove the converse. It is clear that $\pi^{-1}(F)$ is $o(N)$-invariant, so it suffices to show that $\pi^{-1}(F)$ is Loeb measurable with Loeb measure $m(F)$. This is easily verified when $F$ is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that the Loeb outer measure of $\pi^{-1}(F)$ is bounded by the Lebesgue outer measure of $F$ for any set $F \subset [0,1]$; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.
Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let $E$ be a set in the invariant factor. Let $\epsilon > 0$ be an arbitrary standard real number, then we can find an internal set $A$ which differs from $E$ by a set of Loeb measure at most $\epsilon$. As $E$ is $o(N)$-invariant, we conclude that for every $h \in o(N)$, $A$ and $A + h$ differ by a set of Loeb measure (and hence elementary measure) at most $2\epsilon$. By the (contrapositive of the) underspill principle, there must exist a standard $\delta > 0$ (which we may take to be less than $\epsilon$) such that $A$ and $A + h$ differ by a set of elementary measure at most $2\epsilon$ for all $|h| \leq \delta N$. If we then define the nonstandard function $f$ by the formula

$$f(n) := \frac{1}{2 \lfloor \delta N \rfloor + 1} \sum_{|h| \leq \delta N} 1_A(n + h),$$

then from the (nonstandard) triangle inequality we have

$$\frac{1}{N} \sum_{n \in [N]} |f(n) - 1_A(n)| \leq 3 \epsilon$$

(say). On the other hand, $f$ has the Lipschitz continuity property

$$|f(n) - f(m)| \ll \frac{|n - m|}{\delta N}$$

and so in particular we see that

$$f(n) = G_\epsilon( \hbox{st}(n/N) ) + o(1)$$

for some Lipschitz continuous function $G_\epsilon: [0,1] \rightarrow [0,1]$. If we then let $F_\epsilon \subset [0,1]$ be the set where $G_\epsilon \geq 1/2$, one can check that $\pi^{-1}(F_\epsilon)$ differs from $A$ by a set of Loeb outer measure $O(\epsilon)$, and hence $E$ does so also. Sending $\epsilon$ to zero, we see (from the converse claim) that $1_{F_\epsilon}$ is a Cauchy sequence in $L^1([0,1], m)$ and thus converges in $L^1$ to $1_F$ for some Lebesgue measurable $F \subset [0,1]$. The sets $\pi^{-1}(F_\epsilon)$ then converge in Loeb outer measure to $\pi^{-1}(F)$, giving the claim.
Thanks to the Lebesgue differentiation theorem, the conditional expectation of a bounded Loeb-measurable function $f: [N] \rightarrow {\bf R}$ to the invariant factor can be expressed (as a function on $[0,1]$, defined $m$-a.e.) as

$${\bf E}(f)(x) := \lim_{\delta \rightarrow 0} \hbox{st} \frac{1}{2 \lfloor \delta N \rfloor + 1} \sum_{|h| \leq \delta N} f( \lfloor x N \rfloor + h ).$$

By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts $T^h f$, $h \in o(N)$ of minimal $L^2([N])$ norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages

$$\frac{1}{2H + 1} \sum_{h = -H}^{H} T^h f$$

for $H \in o(N)$ converge (as a net, rather than a sequence) in $L^2([N])$ to this conditional expectation.
If $f$ is (the standard part of) an internal function, that is to say the ultralimit of a sequence $f_i: [N_i] \rightarrow {\bf R}$ of finitary bounded functions, one can view the measurable function $F := {\bf E}(f)$ as a limit of the $f_i$ that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function $F$ is related to the discrete functions $f_i$ by the formula

$$\int_x^y F(t)\ dt = \lim_{i \rightarrow \alpha} \frac{1}{N_i} \sum_{x N_i \leq n \leq y N_i} f_i(n)$$

for all $0 \leq x \leq y \leq 1$, where $\alpha$ is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence $f_{i_j}$ such that

$$\int_x^y F(t)\ dt = \lim_{j \rightarrow \infty} \frac{1}{N_{i_j}} \sum_{x N_{i_j} \leq n \leq y N_{i_j}} f_{i_j}(n),$$

thus $F$ is the asymptotic density function of the $f_{i_j}$. For instance, if $f_i$ is the indicator function of a randomly chosen subset of $[N_i]$, then the asymptotic density function would equal $\frac{1}{2}$ (almost everywhere, at least).
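Two simple instances of this asymptotic density function (my own examples, in the notation of the reconstruction above): taking $f_i$ to be the indicator of the even numbers in $[N_i]$, or of the initial segment $\{ 1, \dots, \lfloor N_i / 2 \rfloor \}$, one obtains respectively

$$F \equiv \frac{1}{2} \qquad \hbox{and} \qquad F = 1_{[0, 1/2]} \hbox{ (almost everywhere)},$$

as the local averages over scales $o(N)$ wash out the parity structure in the first case but retain the macroscopic position in the second.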
I’m continuing to look into understanding the ergodic theory of $o(N)$ actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter $N$, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.
(This is an extended blog post version of my talk “Ultraproducts as a Bridge Between Discrete and Continuous Analysis” that I gave at the Simons Institute for the Theory of Computing at the workshop “Neo-Classical methods in discrete analysis”. Some of the material here is drawn from previous blog posts, notably “Ultraproducts as a bridge between hard analysis and soft analysis” and “Ultralimit analysis and quantitative algebraic geometry”. The text here has substantially more details than the talk; one may wish to skip all of the proofs given here to obtain a closer approximation to the original talk.)
Discrete analysis, of course, is primarily interested in the study of discrete (or “finitary”) mathematical objects: integers, rational numbers (which can be viewed as ratios of integers), finite sets, finite graphs, finite or discrete metric spaces, and so forth. However, many powerful tools in mathematics (e.g. ergodic theory, measure theory, topological group theory, algebraic geometry, spectral theory, etc.) work best when applied to continuous (or “infinitary”) mathematical objects: real or complex numbers, manifolds, algebraic varieties, continuous topological or metric spaces, etc. In order to apply results and ideas from continuous mathematics to discrete settings, there are basically two approaches. One is to directly discretise the arguments used in continuous mathematics, which often requires one to keep careful track of all the bounds on various quantities of interest, particularly with regard to various error terms arising from discretisation which would otherwise have been negligible in the continuous setting. The other is to construct continuous objects as limits of sequences of discrete objects of interest, so that results from continuous mathematics may be applied (often as a “black box”) to the continuous limit, which then can be used to deduce consequences for the original discrete objects which are quantitative (though often ineffectively so). The latter approach is the focus of this current talk.
The following table gives some examples of a discrete theory and its continuous counterpart, together with a limiting procedure that might be used to pass from the former to the latter:
(Discrete) | (Continuous) | (Limit method) |
Ramsey theory | Topological dynamics | Compactness |
Density Ramsey theory | Ergodic theory | Furstenberg correspondence principle |
Graph/hypergraph regularity | Measure theory | Graph limits |
Polynomial regularity | Linear algebra | Ultralimits |
Structural decompositions | Hilbert space geometry | Ultralimits |
Fourier analysis | Spectral theory | Direct and inverse limits |
Quantitative algebraic geometry | Algebraic geometry | Schemes |
Discrete metric spaces | Continuous metric spaces | Gromov-Hausdorff limits |
Approximate group theory | Topological group theory | Model theory |
As the above table illustrates, there are a variety of different ways to form a limiting continuous object. Roughly speaking, one can divide limits into three categories:
- Topological and metric limits. These notions of limits are commonly used by analysts. Here, one starts with a sequence (or perhaps a net) of objects $x_n$ in a common space $X$, which one then endows with the structure of a topological space or a metric space, by defining a notion of distance between two points of the space, or a notion of open neighbourhoods or open sets in the space. Provided that the sequence or net is convergent, this produces a limit object $\lim_{n \rightarrow \infty} x_n$, which remains in the same space, and is “close” to many of the original objects $x_n$ with respect to the given metric or topology.
- Categorical limits. These notions of limits are commonly used by algebraists. Here, one starts with a sequence (or more generally, a diagram) of objects $x_n$ in a category $C$, which are connected to each other by various morphisms. If the ambient category is well-behaved, one can then form the direct limit $\varinjlim x_n$ or the inverse limit $\varprojlim x_n$ of these objects, which is another object in the same category $C$, and is connected to the original objects $x_n$ by various morphisms.
- Logical limits. These notions of limits are commonly used by model theorists. Here, one starts with a sequence of objects $x_n$ or of spaces $X_n$, each of which is (a component of) a model for a given (first-order) mathematical language (e.g. if one is working in the language of groups, the $X_n$ might be groups and the $x_n$ might be elements of these groups). By using devices such as the ultraproduct construction, or the compactness theorem in logic, one can then create a new object $x$ or a new space $X$, which is still a model of the same language (e.g. if the spaces $X_n$ were all groups, then the limiting space $X$ will also be a group), and is “close” to the original objects or spaces in the sense that any assertion (in the given language) that is true for the limiting object or space, will also be true for many of the original objects or spaces, and conversely. (For instance, if $X$ is an abelian group, then the $X_n$ will also be abelian groups for many $n$.)
The purpose of this talk is to highlight the third type of limit, and specifically the ultraproduct construction, as being a “universal” limiting procedure that can be used to replace most of the limits previously mentioned. Unlike the topological or metric limits, one does not need the original objects $x_n$ to all lie in a common space $X$ in order to form an ultralimit $\lim_{n \rightarrow \alpha} x_n$; they are permitted to lie in different spaces $X_n$; this is more natural in many discrete contexts, e.g. when considering graphs on $n$ vertices in the limit when $n$ goes to infinity. Also, no convergence properties on the $x_n$ are required in order for the ultralimit to exist. Similarly, ultraproduct limits differ from categorical limits in that no morphisms between the various spaces $X_n$ involved are required in order to construct the ultraproduct.
With so few requirements on the objects $x_n$ or spaces $X_n$, the ultraproduct construction is necessarily a very “soft” one. Nevertheless, the construction has two very useful properties which make it particularly useful for the purpose of extracting good continuous limit objects out of a sequence of discrete objects. First of all, there is Łos’s theorem, which roughly speaking asserts that any first-order sentence which is asymptotically obeyed by the $x_n$, will be exactly obeyed by the limit object $\lim_{n \rightarrow \alpha} x_n$; in particular, one can often take a discrete sequence of “partial counterexamples” to some assertion, and produce a continuous “complete counterexample” to that same assertion via an ultraproduct construction; taking the contrapositives, one can often then establish a rigorous equivalence between a quantitative discrete statement and its qualitative continuous counterpart. Secondly, there is the countable saturation property that ultraproducts automatically enjoy, which is a property closely analogous to that of compactness in topological spaces, and can often be used to ensure that the continuous objects produced by ultraproduct methods are “complete” or “compact” in various senses, which is particularly useful in being able to upgrade qualitative (or “pointwise”) bounds to quantitative (or “uniform”) bounds, more or less “for free”, thus reducing significantly the burden of “epsilon management” (although the price one pays for this is that one needs to pay attention to which mathematical objects of study are “standard” and which are “nonstandard”). To achieve this compactness or completeness, one sometimes has to restrict to the “bounded” portion of the ultraproduct, and it is often also convenient to quotient out the “infinitesimal” portion in order to complement these compactness properties with a matching “Hausdorff” property, thus creating familiar examples of continuous spaces, such as locally compact Hausdorff spaces.
Ultraproducts are not the only logical limit in the model theorist’s toolbox, but they are one of the simplest to set up and use, and already suffice for many of the applications of logical limits outside of model theory. In this post, I will set out the basic theory of these ultraproducts, and illustrate how they can be used to pass between discrete and continuous theories in each of the examples listed in the above table.
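For reference, here is the basic construction in symbols (the standard definition, with $\alpha$ denoting a fixed nonprincipal ultrafilter on the natural numbers): the ultraproduct of a sequence of spaces $X_n$ is

$$\prod_{n \rightarrow \alpha} X_n := \Big( \prod_n X_n \Big) / \sim, \qquad (x_n) \sim (y_n) \iff \{ n : x_n = y_n \} \in \alpha,$$

and Łos’s theorem then asserts that a first-order sentence holds in $\prod_{n \rightarrow \alpha} X_n$ if and only if it holds in $X_n$ for an $\alpha$-large set of $n$.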
Apart from the initial “one-time cost” of setting up the ultraproduct machinery, the main loss one incurs when using ultraproduct methods is that it becomes very difficult to extract explicit quantitative bounds from results that are proven by transferring qualitative continuous results to the discrete setting via ultraproducts. However, in many cases (particularly those involving regularity-type lemmas) the bounds are already of tower-exponential type or worse, and there is arguably not much to be lost by abandoning the explicit quantitative bounds altogether.