One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.
As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.
Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and suprema, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.
Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals $\mathbb{Q}$ or the reals $\mathbb{R}$ are algebraically incomplete, because there are some non-trivial algebraic equations (such as $x^2 = 2$ in the case of the rationals, or $x^2 = -1$ in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as $1 = 0$, just using the laws of algebra), but do not actually have solutions in the specified field.
Similarly, the rationals $\mathbb{Q}$, when viewed now as a metric space rather than as a field, are also metrically incomplete, because there exist sequences in the rationals (e.g. the decimal approximations $3, 3.1, 3.14, 3.141, \ldots$ of the irrational number $\pi$) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.
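To make the metric incompleteness of the rationals concrete, here is a minimal sketch in exact rational arithmetic. It uses $\sqrt{2}$ as the irrational number (an assumption made here purely for convenience, since its decimal truncations can be computed exactly with integer square roots); the function name `sqrt2_truncation` is hypothetical.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(k: int) -> Fraction:
    """Decimal truncation of sqrt(2) to k places, as an exact rational."""
    # isqrt(2 * 10**(2k)) is floor(sqrt(2) * 10**k), computed with integers only.
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

approximations = [sqrt2_truncation(k) for k in range(8)]

# Cauchy property: terms k and m (with k <= m) differ by less than 10**-k.
cauchy_ok = all(
    abs(approximations[m] - approximations[k]) < Fraction(1, 10 ** k)
    for k in range(8)
    for m in range(k, 8)
)

# None of the rational terms is itself a square root of 2.
never_exact = all(a * a != 2 for a in approximations)
```

The sequence is Cauchy inside the rationals, yet the only candidate limit is not rational, which is exactly the sense of "potentially convergent but not actually convergent" used above.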
A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Fraenkel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because they do not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but are not actually provable in this theory.
A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line is elementarily incomplete, because there exists a sequence of statements (such as the statements $0 < x < 1/n$ for natural numbers $n = 1, 2, 3, \ldots$) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number $x$) but are not actually simultaneously satisfiable in this theory.
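The gap between "finitely satisfiable" and "simultaneously satisfiable" can be illustrated with a small sketch. It assumes the statements in question are $0 < x < 1/n$ for $n = 1, 2, 3, \ldots$, and works with rationals for exactness; the helper names `statement_holds`, `witness` and `failing_index` are hypothetical.

```python
from fractions import Fraction
from math import ceil

def statement_holds(x: Fraction, n: int) -> bool:
    """The n-th statement: 0 < x < 1/n."""
    return 0 < x < Fraction(1, n)

def witness(indices) -> Fraction:
    """A rational satisfying the n-th statement for every n in a finite, non-empty set."""
    return Fraction(1, max(indices) + 1)

def failing_index(x: Fraction) -> int:
    """For a fixed x > 0, an index n whose statement x violates (since x >= 1/n)."""
    return ceil(1 / x)

# Any finite family of the statements has a common witness...
finite_family = {1, 5, 12}
x = witness(finite_family)
finite_ok = all(statement_holds(x, n) for n in finite_family)

# ...but that witness (like any fixed positive number) fails some later statement.
n_bad = failing_index(x)
later_fails = not statement_holds(x, n_bad)
```

Every finite family is handled by `witness`, but `failing_index` shows that no single positive number survives all of the statements at once.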
In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field $k$, one can take its algebraic completion (or algebraic closure) $\overline{k}$; for instance, $\mathbb{C} = \overline{\mathbb{R}}$ can be viewed as the algebraic completion of $\mathbb{R}$. This field is usually significantly larger than the original field $k$, but contains $k$ as a subfield, and every element of $\overline{k}$ can be described as the solution to some polynomial equation with coefficients in $k$. Furthermore, $\overline{k}$ is now algebraically complete (or algebraically closed): every polynomial equation in $\overline{k}$ which is potentially satisfiable (in the sense that it does not lead to a contradiction such as $1 = 0$ from the laws of algebra), is actually satisfiable in $\overline{k}$.
Similarly, starting with an arbitrary metric space $X$, one can take its metric completion $\overline{X}$; for instance, $\mathbb{R} = \overline{\mathbb{Q}}$ can be viewed as the metric completion of $\mathbb{Q}$. Again, the completion $\overline{X}$ is usually much larger than the original metric space $X$, but contains $X$ as a subspace, and every element of $\overline{X}$ can be described as the limit of some Cauchy sequence in $X$. Furthermore, $\overline{X}$ is now a complete metric space: every sequence in $\overline{X}$ which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convergent in $\overline{X}$.
In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory $T$ for a first-order language $L$, there exists at least one completion $\overline{T}$ of that theory $T$, which is a consistent theory in which every sentence in $L$ which is potentially true in $\overline{T}$ (because it does not lead to a contradiction in $\overline{T}$) is actually true in $\overline{T}$. Indeed, the completeness theorem provides at least one model (or structure) $\mathfrak{U}$ of the consistent theory $T$, and then the completion $\overline{T}$ can be formed by interpreting every sentence in $L$ using $\mathfrak{U}$ to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory $T$ can have multiple inequivalent models $\mathfrak{U}$, giving rise to distinct completions of the same theory.
Finally, if one starts with an arbitrary structure $\mathfrak{U}$, one can form an elementary completion ${}^*\mathfrak{U}$ of it, which is a significantly larger structure which contains $\mathfrak{U}$ as a substructure, and such that every element of ${}^*\mathfrak{U}$ is an elementary limit of a sequence of elements in $\mathfrak{U}$ (I will define this term shortly). Furthermore, ${}^*\mathfrak{U}$ is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in ${}^*\mathfrak{U}$ (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure $\mathfrak{U}$. If $\mathfrak{U}$ is the standard universe of all the standard objects one considers in mathematics, then its elementary completion ${}^*\mathfrak{U}$ is known as the nonstandard universe, and is the setting for nonstandard analysis.
As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as $\mathbb{Q}$, then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers $\mathbb{Z}$, then the resulting completion ${}^*\mathbb{Z}$ will necessarily be uncountable.
However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.
In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.
This is the third in a series of posts on the “no self-defeating object” argument in mathematics – a powerful and useful argument based on formalising the observation that any object or structure that is so powerful that it can “defeat” even itself, cannot actually exist. This argument is used to establish many basic impossibility results in mathematics, such as Gödel’s theorem that it is impossible for any sufficiently sophisticated formal axiom system to prove its own consistency, Turing’s theorem that it is impossible for any sufficiently sophisticated programming language to solve its own halting problem, or Cantor’s theorem that it is impossible for any set to enumerate its own power set (and as a corollary, the natural numbers cannot enumerate the real numbers).
As remarked in the previous posts, many people who encounter these theorems can feel uneasy about their conclusions, and their method of proof; this seems to be particularly the case with regard to Cantor’s result that the reals are uncountable. In the previous post in this series, I focused on one particular aspect of the standard proofs which one might be uncomfortable with, namely their counterfactual nature, and observed that many of these proofs can be largely (though not completely) converted to non-counterfactual form. However, this does not fully dispel the sense that the conclusions of these theorems – that the reals are not countable, that the class of all sets is not itself a set, that truth cannot be captured by a predicate, that consistency is not provable, etc. – are highly unintuitive, and even objectionable to “common sense” in some cases.
How can intuition lead one to doubt the conclusions of these mathematical results? I believe that one reason is because these results are sensitive to the amount of vagueness in one’s mental model of mathematics. In the formal mathematical world, where every statement is either absolutely true or absolutely false with no middle ground, and all concepts require a precise definition (or at least a precise axiomatisation) before they can be used, then one can rigorously state and prove Cantor’s theorem, Gödel’s theorem, and all the other results mentioned in the previous posts without difficulty. However, in the vague and fuzzy world of mathematical intuition, in which one’s impression of the truth or falsity of a statement may be influenced by recent mental reference points, definitions are malleable and blurry with no sharp dividing lines between what is and what is not covered by such definitions, and key mathematical objects may be incompletely specified and thus “moving targets” subject to interpretation, then one can argue with some degree of justification that the conclusions of the above results are incorrect; in the vague world, it seems quite plausible that one can always enumerate all the real numbers “that one needs to”, one can always justify the consistency of one’s reasoning system, one can reason using truth as if it were a predicate, and so forth. The impossibility results only kick in once one tries to clear away the fog of vagueness and nail down all the definitions and mathematical statements precisely. (To put it another way, the no-self-defeating object argument relies very much on the disconnected, definite, and absolute nature of the boolean truth space in the rigorous mathematical world.)
One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion. For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind. Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd. …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives. But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio $p/q$ of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.
Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives. This can lead to confused exchanges such as the following:
Lecturer: “Theorem. Let $p$ be a prime number. Then…”
Student: “But how do you know that $p$ is a prime number? Couldn’t it be composite?”
or
Lecturer: “Now we see what the function $f$ does when we give it the input of $x+dx$ instead. …”
Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”
This is not to say that counterfactual thinking is not encountered at all outside of mathematics. For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history. Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result. The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing. Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning. For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it. And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.
Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses. Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.
Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics. Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set of real numbers is uncountable. This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics. (The only other two examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to 1, and the solution to the blue-eyed islanders puzzle.)
Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist. Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.
I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction. Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.
It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses. This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter. This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.
For instance, consider the very first no-self-defeating result presented in the previous post:
Proposition 1 (No largest natural number). There does not exist a natural number $N$ that is larger than all the other natural numbers.
This is formulated in the “no self-defeating object” formulation. But it has a logically equivalent “any object can be defeated” form:
Proposition 1′. Given any natural number $N$, one can find another natural number $N'$ which is larger than $N$.
Proof. Take $N' := N+1$. $\Box$
While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $N = 100$, so that the proof gives $N' = 101$), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number). So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.
In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),
Proposition 3. There are infinitely many primes.
can be recast in “all objects can be defeated” form as
Proposition 3′. Let $p_1, \ldots, p_n$ be a collection of primes. Then there exists a prime $q$ which is distinct from any of the primes $p_1, \ldots, p_n$.
Proof. Take $q$ to be any prime factor of $p_1 \cdots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete). Since $p_1 \cdots p_n + 1$ is not divisible by any of the primes $p_1, \ldots, p_n$, $q$ must be distinct from all of these primes. $\Box$
One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1, \ldots, p_n$), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes. Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today. Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$). The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
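The constructive content of Proposition 3′ can be run directly. The following is a minimal sketch that takes the smallest prime factor of the product plus one, as the proof suggests; the function names `smallest_prime_factor` and `new_prime` are hypothetical, and trial division is used only because it is the simplest choice, not because it is efficient.

```python
from math import prod

def smallest_prime_factor(m: int) -> int:
    """Smallest prime factor of an integer m >= 2, found by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m), so m itself is prime

def new_prime(primes) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    return smallest_prime_factor(prod(primes) + 1)

q_small = new_prime([2, 3, 5])                 # 2*3*5 + 1 = 31, which is prime
q_large = new_prime([2, 3, 5, 7, 11, 13])      # 30031 = 59 * 509, so this is 59
```

The second call shows that the product plus one need not itself be prime; the proof only needs one of its prime factors, which is automatically outside the original list.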
[Note: the article below may make more sense if one first reviews the previous blog post on the "no self-defeating object". For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]
One of the most notorious open problems in functional analysis is the invariant subspace problem for Hilbert spaces, which I will state here as a conjecture:
Conjecture 1 (Invariant Subspace Problem, ISP0) Let $H$ be an infinite dimensional complex Hilbert space, and let $T: H \to H$ be a bounded linear operator. Then $H$ contains a proper closed invariant subspace $V$ (thus $TV \subset V$).
As stated this conjecture is quite infinitary in nature. Just for fun, I set myself the task of trying to find an equivalent reformulation of this conjecture that only involved finite-dimensional spaces and operators. This turned out to be somewhat difficult, but not entirely impossible, if one adopts a sufficiently generous version of “finitary” (cf. my discussion of how to finitise the infinitary pigeonhole principle). Unfortunately, the finitary formulation that I arrived at ended up being rather complicated (in particular, involving the concept of a “barrier”), and did not obviously suggest a path to resolving the conjecture; but it did at least provide some simpler finitary consequences of the conjecture which might be worth focusing on as subproblems.
I should point out that the arguments here are quite “soft” in nature and are not really addressing the heart of the invariant subspace problem; but I think it is still of interest to observe that this problem is not purely an infinitary problem, and does have some non-trivial finitary consequences.
I am indebted to Henry Towsner for many discussions on this topic.
In topology, a non-empty set $E$ is said to be connected if $E$ cannot be decomposed into two nontrivial subsets that are both closed and open relative to $E$, and path connected if any two points $x, y$ in $E$ can be connected by a path (i.e. there exists a continuous map $\gamma: [0,1] \to E$ with $\gamma(0) = x$ and $\gamma(1) = y$).
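Path-connected sets are always connected, but the converse is not true, even in the model case of compact subsets of a Euclidean space. The classic counterexample is the set

$$E := \{ (0,y) : -1 \leq y \leq 1 \} \cup \{ (x, \sin(1/x)) : 0 < x \leq 1 \}, \qquad (1)$$

which is connected but not path-connected (there is no continuous path from $(0,1)$ to $(1, \sin(1))$).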
Looking at the definitions of the two concepts, one notices a difference: the notion of path-connectedness is somehow a “positive” one, in the sense that a path-connected set can produce the existence of something (a path connecting two points $x$ and $y$) for a given type of input (in this case, a pair of points $x, y$). On the other hand, the notion of connectedness is a “negative” one, in that it asserts the non-existence of something (a non-trivial partition into clopen sets). To put it another way, it is relatively easy to convince someone that a set is path-connected (by providing a connecting path for every pair of points) or is disconnected (by providing a non-trivial partition into clopen sets) but if a set is not path-connected, or is connected, how can one easily convince someone of this fact? To put it yet another way: is there a reasonable certificate for connectedness (or for path-disconnectedness)?
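In the case of connectedness for compact subsets of Euclidean space, there is an answer as follows. If $\varepsilon > 0$, let us call two points $x, y$ in $E$ $\varepsilon$-connected if one can find a finite sequence $x_0 = x, x_1, \ldots, x_N = y$ of points in $E$, such that $|x_{i+1} - x_i| < \varepsilon$ for all $0 \leq i < N$; informally, one can jump from $x$ to $y$ in $E$ using jumps of length at most $\varepsilon$. Let us call $x_0, \ldots, x_N$ an $\varepsilon$-discrete path.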
Proposition 1 (Connectedness certificate for compact subsets of Euclidean space) Let $E \subset \mathbb{R}^d$ be compact and non-empty. Then $E$ is connected if and only if every pair of points in $E$ is $\varepsilon$-connected for every $\varepsilon > 0$.
Proof: Suppose first that $E$ is disconnected; then $E$ can be partitioned into two non-empty closed subsets $F, G$. Since $E$ is compact, $F, G$ are compact also, and so they are separated by some non-zero distance $\varepsilon > 0$. But then it is clear that points in $F$ cannot be $\varepsilon$-connected to points in $G$, and the claim follows.
Conversely, suppose that there is a pair of points $x, y$ in $E$ and an $\varepsilon > 0$ such that $x, y$ are not $\varepsilon$-connected. Let $F$ be the set of all points in $E$ that are $\varepsilon$-connected to $x$. It is easy to check that $F$ is open, closed, and a proper subset of $E$; thus $E$ is disconnected. $\Box$
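The relation of being $\varepsilon$-connected is easy to compute on a finite sample of points, which gives a feel for how the certificate behaves at a fixed scale. The sketch below is only an illustration under that assumption: a finite sample stands in for the compact set, and a finite set with more than one point is of course never connected in the topological sense, so the code tests the relation for one given $\varepsilon$ and nothing more. The function name `eps_connected` and the sample data are hypothetical.

```python
from math import dist

def eps_connected(points, start, goal, eps):
    """Search for an eps-discrete path from start to goal within a finite point set.

    Each jump must have length strictly less than eps, and start and goal
    are assumed to be members of points.
    """
    reached = {start}
    frontier = [start]
    while frontier:
        p = frontier.pop()
        for q in points:
            if q not in reached and dist(p, q) < eps:
                reached.add(q)
                frontier.append(q)
    return goal in reached

# Two sampled segments on the x-axis, [0, 1] and [2, 3], with spacing 0.1.
left = [(i / 10, 0.0) for i in range(11)]
right = [(2 + i / 10, 0.0) for i in range(11)]
sample = left + right

within_left = eps_connected(sample, (0.0, 0.0), (1.0, 0.0), 0.15)
across_small = eps_connected(sample, (0.0, 0.0), (3.0, 0.0), 0.15)
across_large = eps_connected(sample, (0.0, 0.0), (3.0, 0.0), 1.5)
```

With jumps shorter than 0.15 one can move along each segment but cannot cross the gap of length 1, mirroring the first half of the proof; with jumps shorter than 1.5 the gap is no obstacle, which is why the proposition must quantify over every $\varepsilon > 0$.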
We remark that the above proposition in fact works for any compact metric space. It is instructive to see how the points $(1, \sin(1))$ and $(0,1)$ are $\varepsilon$-connected in the set (1); the $\varepsilon$-discrete path follows the graph of $\sin(1/x)$ backwards until one gets sufficiently close to the $y$-axis, at which point one “jumps” across to the $y$-axis to eventually reach $(0,1)$.
It is also interesting to contrast the above proposition with path connectedness. Clearly, if two points $x, y$ are connected by a path, then they are $\varepsilon$-connected for every $\varepsilon > 0$ (because every continuous map $\gamma: [0,1] \to E$ is uniformly continuous); but from the analysis of the example (1) we see that the converse is not true. Roughly speaking, the various $\varepsilon$-discrete paths from $x$ to $y$ have to be “compatible” with each other in some sense in order to synthesise a continuous path from $x$ to $y$ in the limit (we will not make this precise here).
But this leaves two (somewhat imprecise) questions, which I do not know how to satisfactorily answer:
Question 1: Is there a good certificate for path disconnectedness, say for compact subsets $E$ of $\mathbb{R}^d$? One can construct lousy certificates, for instance one could look at all continuous paths in $\mathbb{R}^d$ joining two particular points $x, y$ in $E$, and verify that each one of them leaves $E$ at some point. But this is an “uncountable” certificate – it requires one to check an uncountable number of paths. In contrast, the certificate in Proposition 1 is basically a countable one (if one describes a compact set $E$ by describing a family of $\varepsilon$-nets for a countable sequence of $\varepsilon$ tending to zero). (Very roughly speaking, I would like a certificate that can somehow be “verified in countable time” in a suitable oracle model, as discussed in my previous post, though I have not tried to make this vague specification more rigorous.)
It is tempting to look at the equivalence classes of $E$ given by the relation of being connected by a path, but these classes need not be closed (as one can see with the example (1)) and it is not obvious to me how to certify that two such classes are not path-connected to each other.
Question 2: Is there a good certificate for connectedness for closed but unbounded subsets of $\mathbb{R}^d$? Proposition 1 fails in this case; consider for instance the set

$$E := \{ (x,0) : x \in \mathbb{R} \} \cup \{ (x, 1/x) : x > 0 \}.$$

Any pair of points in this set is $\varepsilon$-connected for every $\varepsilon > 0$, and yet this set is disconnected.
The problem here is that as $\varepsilon$ gets smaller, the $\varepsilon$-discrete paths connecting a pair of points such as $(1,0)$ and $(1,1)$ have diameter going to infinity. One natural guess is then to require a uniform bound on the diameter, i.e. that for any pair of points $x, y$, there exists an $R > 0$ such that there is an $\varepsilon$-discrete path from $x$ to $y$ of diameter at most $R$ for every $\varepsilon > 0$. This does indeed force connectedness, but unfortunately not all connected sets have this property. Consider for instance the set
, where
is a rectangular ellipse centered at the origin with minor diameter endpoints and major diameter endpoints
, and
is a circle that connects the endpoint of
to the point
in
. One can check that
is a closed connected set, but the
-discrete paths connecting
with
have unbounded diameter as
.
Currently, I do not have any real progress on Question 1. For Question 2, I can only obtain the following strange “second-order” criterion for connectedness, that involves an unspecified gauge function:
Proposition 2 (Second-order connectedness certificate) Let $E$ be a closed non-empty subset of $\mathbb{R}^d$. Then the following are equivalent:
- $E$ is connected.
- For every monotone decreasing, strictly positive function
and every
, there exists a discrete path
in
such that
.
Proof: This is proven in almost the same way as Proposition 1. If can be disconnected into two non-trivial sets
, then one can find a monotone decreasing gauge function
such that for each ball
,
and
are separated by at least
, and then there is no discrete path from
to
in
obeying the condition
.
Conversely, if there exists a gauge function and two points
which cannot be connected by a discrete path in
that obeys the condition
, then if one sets
to be all the points that can be reached from
in this manner, one easily verifies that
and
disconnect
.
It may be that this is somehow the “best” one can do, but I am not sure how to quantify this formally.
Anyway, I was curious if any of the readers here (particularly those with expertise in point-set topology or descriptive set theory) might be able to shed more light on these questions. (I also considered crossposting this to Math Overflow, but I think the question may be a bit too long (and vague) for that.)
(The original motivation for this question, by the way, stems from an attempt to use methods of topological group theory to attack questions in additive combinatorics, in the spirit of the paper of Hrushovski studied previously on this blog. The connection is rather indirect, though; I may discuss this more in a future post.)
The standard modern foundation of mathematics is constructed using set theory. With these foundations, the mathematical universe of objects one studies contains not only the “primitive” mathematical objects such as numbers and points, but also sets of these objects, sets of sets of objects, and so forth. (In a pure set theory, the primitive objects would themselves be sets as well; this is useful for studying the foundations of mathematics, but for most mathematical purposes it is more convenient, and less conceptually confusing, to refrain from modeling primitive objects as sets.) One has to carefully impose a suitable collection of axioms on these sets, in order to avoid paradoxes such as Russell’s paradox; but with a standard axiom system such as Zermelo-Fraenkel-Choice (ZFC), all actual paradoxes that we know of are eliminated. Still, one might be somewhat unnerved by the presence in set theory of statements which, while not genuinely paradoxical in a strict sense, are still highly unintuitive; Cantor’s theorem on the uncountability of the reals, and the Banach-Tarski paradox, are perhaps the two most familiar examples of this.
One may suspect that the reason for this unintuitive behaviour is the presence of infinite sets in one’s mathematical universe. After all, if one deals solely with finite sets, then there is no need to distinguish between countable and uncountable infinities, and Banach-Tarski type paradoxes cannot occur.
On the other hand, many statements in infinitary mathematics can be reformulated into equivalent statements in finitary mathematics (involving only finitely many points or numbers, etc.); I have explored this theme in a number of previous blog posts. So, one may ask: what is the finitary analogue of statements such as Cantor’s theorem or the Banach-Tarski paradox?
The finitary analogue of Cantor’s theorem is well-known: it is the assertion that $2^n > n$ for every natural number $n$, or equivalently that the power set of a finite set $A$ of $n$ elements cannot be enumerated by $A$ itself. Though this is not quite the end of the story; after all, one also has $n+1 > n$ for every natural number $n$, or equivalently that the union $A \cup \{a\}$ of a finite set $A$ and an additional element $a$ cannot be enumerated by $A$ itself, but the former statement extends to the infinite case, while the latter one does not. What causes these two outcomes to be distinct?
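To make the first of these statements concrete, here is a minimal Python sketch of the diagonal argument for a finite set. The names `diagonal_set`, `A` and `f` are illustrative choices and do not come from the original discussion. Given any attempted enumeration `f` of subsets of `A` by elements of `A`, the sketch builds a subset that `f` misses. The construction only queries `f` one element at a time and never counts anything, which is perhaps one way to see why this statement survives the passage to infinite sets while a pure counting statement need not.

```python
def diagonal_set(A, f):
    """Return the subset of A that disagrees with f(a) about the element a, for every a in A."""
    return frozenset(a for a in A if a not in f(a))


# A hypothetical attempt to enumerate subsets of A = {0, 1, 2} by the elements of A.
A = frozenset({0, 1, 2})
attempt = {
    0: frozenset(),
    1: frozenset({0, 1}),
    2: frozenset({1, 2}),
}

# D differs from attempt[a] at the element a, so it cannot equal any attempt[a].
D = diagonal_set(A, attempt.__getitem__)
missed = D not in set(attempt.values())
print(sorted(D), missed)
```

The same function works unchanged for any finite `A` and any map `f` from `A` to subsets of `A`; only the example data above is specific.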
On the other hand, it is less obvious what the finitary version of the Banach-Tarski paradox is. Note that this paradox is available only in three and higher dimensions, but not in one or two dimensions; so presumably a finitary analogue of this paradox should also make the same distinction between low and high dimensions.
I therefore set myself the exercise of trying to phrase Cantor’s theorem and the Banach-Tarski paradox in a more “finitary” language. It seems that the easiest way to accomplish this is to avoid the use of set theory, and replace sets by some other concept. Taking inspiration from theoretical computer science, I decided to replace concepts such as functions and sets by the concepts of algorithms and oracles instead, with various constructions in set theory being replaced instead by computer language pseudocode. The point of doing this is that one can now add a new parameter to the universe, namely the amount of computational resources one is willing to allow one’s algorithms to use. At one extreme, one can enforce a “strict finitist” viewpoint where the total computational resources available (time and memory) are bounded by some numerical constant; roughly speaking, this causes any mathematical construction to break down once its complexity exceeds this number. Or one can take the slightly more permissive “finitist” or “constructivist” viewpoint, where any finite amount of computational resource is permitted; or one can then move up to allowing any construction indexed by a countable ordinal, or the storage of any array of countable size. Finally one can allow constructions indexed by arbitrary ordinals (i.e. transfinite induction) and arrays of arbitrary infinite size, at which point the theory becomes more or less indistinguishable from standard set theory.
I describe this viewpoint, and how statements such as Cantor’s theorem and Banach-Tarski are interpreted with this viewpoint, below the fold. I should caution that this is a conceptual exercise rather than a rigorous one; I have not attempted to formalise these notions to the same extent that set theory is formalised. Thus, for instance, I have no explicit system of axioms that algorithms and oracles are supposed to obey. Of course, these formal issues have been explored in great depth by logicians over the past century or so, but I do not wish to focus on these topics in this post.
A second caveat is that the actual semantic content of this post is going to be extremely low. I am not going to provide any genuinely new proof of Cantor’s theorem, or give a new construction of Banach-Tarski type; instead, I will be reformulating the standard proofs and constructions in a different language. Nevertheless I believe this viewpoint is somewhat clarifying as to the nature of these paradoxes, and as to how they are not as fundamentally tied to the nature of sets or the nature of infinity as one might first expect.
I have blogged a number of times in the past about the relationship between finitary (or “hard”, or “quantitative”) analysis, and infinitary (or “soft”, or “qualitative”) analysis. One way to connect the two types of analysis is via compactness arguments (and more specifically, contradiction and compactness arguments); such arguments can convert qualitative properties (such as continuity) to quantitative properties (such as boundedness), basically because of the fundamental fact that continuous functions on a compact space are bounded (or the closely related fact that sequentially continuous functions on a sequentially compact space are bounded).
A key stage in any such compactness argument is the following: one has a sequence $X_n$ of “quantitative” or “finitary” objects or spaces, and one has to somehow end up with a “qualitative” or “infinitary” limit object $X$ or limit space. One common way to achieve this is to embed everything inside some universal space and then use some weak compactness property of that space, such as the Banach-Alaoglu theorem (or its sequential counterpart). This is for instance the idea behind the Furstenberg correspondence principle relating ergodic theory to combinatorics; see for instance this post of mine on this topic.
However, there is a slightly different approach, which I will call ultralimit analysis, which proceeds via the machinery of ultrafilters and ultraproducts; typically, the limit objects $X$ one constructs are now the ultraproducts (or ultralimits) of the original objects $X_n$. There are two main facts that make ultralimit analysis powerful. The first is that one can take ultralimits of arbitrary sequences of objects, as opposed to more traditional tools such as metric completions, which only allow one to take limits of Cauchy sequences of objects. The second fact is Łoś’s theorem, which tells us that $X$ is an elementary limit of the $X_n$ (i.e. every sentence in first-order logic which is true for the $X_n$ for $n$ large enough, is true for $X$). This existence of elementary limits is a manifestation of the compactness theorem in logic; see this earlier blog post for more discussion. So we see that compactness methods and ultrafilter methods are closely intertwined. (See also my earlier class notes for a related connection between ultrafilters and compactness.)
Ultralimit analysis is very closely related to nonstandard analysis. I already discussed some aspects of this relationship in an earlier post, and will expand upon it at the bottom of this post. Roughly speaking, the relationship between ultralimit analysis and nonstandard analysis is analogous to the relationship between measure theory and probability theory.
To illustrate how ultralimit analysis is actually used in practice, I will show later in this post how to take a qualitative infinitary theory – in this case, basic algebraic geometry – and apply ultralimit analysis to then deduce a quantitative version of this theory, in which the complexity of the various algebraic sets and varieties that appear as outputs are controlled uniformly by the complexity of the inputs. The point of this exercise is to show how ultralimit analysis allows for a relatively painless conversion back and forth between the quantitative and qualitative worlds, though in some cases the quantitative translation of a qualitative result (or vice versa) may be somewhat unexpected. In an upcoming paper of myself, Ben Green, and Emmanuel Breuillard (announced in the previous blog post), we will rely on ultralimit analysis to reduce the messiness of various quantitative arguments by replacing them with a qualitative setting in which the theory becomes significantly cleaner.
For the sake of completeness, I also redo some earlier instances of the correspondence principle via ultralimit analysis, namely the deduction of the quantitative Gromov theorem from the qualitative one, and of Szemerédi’s theorem from the Furstenberg recurrence theorem, to illustrate how close the two techniques are to each other.
One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:
Theorem 1 (Finite generation implies finite basis, infinitary version) Let $V$ be a vector space over the rationals $\mathbb{Q}$, and let $v_1,\dots,v_n$ be a finite collection of vectors in $V$. Then there exists a collection $w_1,\dots,w_k$ of vectors in $V$, with $0 \leq k \leq n$, such that
- ($w$ generates $v$) Every $v_j$ can be expressed as a rational linear combination of the $w_1,\dots,w_k$.
- ($w$ independent) There is no non-trivial linear relation $a_1 w_1 + \dots + a_k w_k = 0$, $a_1,\dots,a_k \in \mathbb{Q}$ among the $w_1,\dots,w_k$ (where non-trivial means that the $a_i$ are not all zero).
In fact, one can take $w_1,\dots,w_k$ to be a subset of the $v_1,\dots,v_n$.
Proof: We perform the following “rank reduction argument”. Start with $k$ initialised to $n$ (so initially we have $w_1,\dots,w_k = v_1,\dots,v_n$). Clearly $w_1,\dots,w_k$ generates $v_1,\dots,v_n$. If the $w_i$ are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the $w_i$, say $w_k$, is a rational linear combination of the $w_1,\dots,w_{k-1}$. In such a case, $w_k$ becomes redundant, and we may delete it (reducing the rank $k$ by one). We repeat this procedure; it can only run for at most $n$ steps and so terminates with $w_1,\dots,w_k$ obeying both of the desired properties.
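The rank reduction argument in the proof above is effectively an algorithm, and can be sketched in Python using exact rational arithmetic. This is only an illustrative sketch: the function names `find_relation` and `rank_reduce`, and the representation of vectors as tuples of rationals, are assumptions made here rather than anything in the original argument. The helper finds a non-trivial linear relation by Gaussian elimination (if one exists), and the main loop deletes a vector that appears in that relation with a non-zero coefficient, exactly as in the proof.

```python
from fractions import Fraction


def find_relation(vectors):
    """Return coefficients a (not all zero) with sum(a[i] * vectors[i]) == 0, or None."""
    k = len(vectors)
    if k == 0:
        return None
    dim = len(vectors[0])
    # The vectors are the columns of the matrix; row r holds the r-th coordinates.
    rows = [[Fraction(v[r]) for v in vectors] for r in range(dim)]
    pivot_cols = []
    rank = 0
    for col in range(k):
        pivot = next((r for r in range(rank, dim) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here, so this column is a free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        lead = rows[rank][col]
        rows[rank] = [x / lead for x in rows[rank]]
        for r in range(dim):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        pivot_cols.append(col)
        rank += 1
    free = [c for c in range(k) if c not in pivot_cols]
    if not free:
        return None  # the vectors are linearly independent
    # Set the first free variable to 1 and solve for the pivot variables.
    a = [Fraction(0)] * k
    a[free[0]] = Fraction(1)
    for r, col in enumerate(pivot_cols):
        a[col] = -rows[r][free[0]]
    return a


def rank_reduce(vectors):
    """Delete redundant vectors one at a time until the rest are independent."""
    w = list(vectors)
    while True:
        relation = find_relation(w)
        if relation is None:
            return w
        # A vector with a non-zero coefficient is a combination of the others.
        redundant = max(i for i, c in enumerate(relation) if c != 0)
        del w[redundant]


print(rank_reduce([(1, 0), (0, 1), (1, 1), (2, 2)]))  # [(1, 0), (0, 1)]
```

Each pass through the loop removes one vector, so the loop runs at most as many times as there are input vectors, and the output is always a subset of the input.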
In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group $\mathbb{Z}/p\mathbb{Z}$ where $p$ is a large prime. Now, technically speaking, $\mathbb{Z}/p\mathbb{Z}$ is not a vector space over $\mathbb{Q}$, because one can only multiply an element of $\mathbb{Z}/p\mathbb{Z}$ by a rational number if the denominator of that rational is not divisible by $p$. But for $p$ very large, $\mathbb{Z}/p\mathbb{Z}$ “behaves” like a vector space over $\mathbb{Q}$, at least if one restricts attention to the rationals of “bounded height”, where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of $\mathbb{Z}/p\mathbb{Z}$ as “vectors” over $\mathbb{Q}$, even though strictly speaking this is not quite the case.
On the other hand, saying that one element of $\mathbb{Z}/p\mathbb{Z}$ is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of $\mathbb{Z}/p\mathbb{Z}$ already generates the entire space! However, if one again restricts attention to rational linear combinations of bounded height, then things become interesting again. For instance, the vector
can generate elements such as
or
using rational linear combinations of bounded height, but will not be able to generate such elements of $\mathbb{Z}/p\mathbb{Z}$
as
without using rational numbers of unbounded height.
For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over $\mathbb{Z}/p\mathbb{Z}$: any two non-zero elements of $\mathbb{Z}/p\mathbb{Z}$ are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance,
and
are independent in this sense.
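The notions of generation and independence “up to bounded height” in a cyclic group of prime order can be explored by brute force. The following Python sketch is purely illustrative: the prime 101, the height bounds 3 and 10, and the test elements 50 and 10 are choices made here, not examples taken from the original text, and the function names are hypothetical. A rational $a/b$ of height at most $M$ is interpreted modulo $p$ as $a b^{-1}$, which makes sense because $b$ is smaller than $p$.

```python
from itertools import product


def bounded_rationals_mod_p(p, M):
    """Residues mod the prime p of all rationals a/b with |a| <= M and 1 <= b <= M."""
    assert M < p, "denominators must be invertible modulo p"
    return {(a * pow(b, -1, p)) % p
            for a in range(-M, M + 1)
            for b in range(1, M + 1)}


def generated_by(v, p, M):
    """Elements of Z/pZ of the form c * v with c a rational of height at most M."""
    return {(c * v) % p for c in bounded_rationals_mod_p(p, M)}


def independent(vectors, p, M):
    """True if no non-trivial relation with coefficients of height at most M holds mod p."""
    coeffs = bounded_rationals_mod_p(p, M)
    for combo in product(coeffs, repeat=len(vectors)):
        if any(combo) and sum(c * v for c, v in zip(combo, vectors)) % p == 0:
            return False
    return True


p = 101  # illustrative prime
print(50 in generated_by(1, p, 3))   # True: 50 is -1/2 modulo 101
print(10 in generated_by(1, p, 3))   # False: 10 needs height 10
print(independent([1, 10], p, 3))    # True
print(independent([1, 10], p, 10))   # False: 10 * 1 - 1 * 10 = 0
```

With height 3 there are at most 21 bounded-height multiples of a given element, far fewer than the 101 elements of the group, which is why most elements are not generated and why independence at that height becomes possible.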
Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as $\mathbb{Z}/p\mathbb{Z}$, which asserts that given any bounded collection $v_1,\dots,v_n$ of elements, one can find another set $w_1,\dots,w_k$ which is linearly independent “over the rationals up to some height”, such that the $v_1,\dots,v_n$ can be generated by the $w_1,\dots,w_k$ “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the height somewhat large:
Theorem 2 (Finite generation implies finite basis, finitary version) Let $n$ be an integer, and let $F: \mathbb{N} \to \mathbb{N}$ be a function. Let $V$ be an abelian group which admits a well-defined division operation by any natural number of size at most $C(F,n)$ for some constant $C(F,n)$ depending only on $F, n$; for instance one can take $V = \mathbb{Z}/p\mathbb{Z}$ for $p$ a prime larger than $C(F,n)$. Let $v_1,\dots,v_n$ be a finite collection of “vectors” in $V$. Then there exists a collection $w_1,\dots,w_k$ of vectors in $V$, with $0 \leq k \leq n$, as well as an integer $M \geq 1$, such that
- (Complexity bound) $M \leq C(F,n)$ for some $C(F,n)$ depending only on $F, n$.
- ($w$ generates $v$) Every $v_j$ can be expressed as a rational linear combination of the $w_1,\dots,w_k$ of height at most $M$ (i.e. the numerator and denominator of the coefficients are at most $M$).
- ($w$ independent) There is no non-trivial linear relation $a_1 w_1 + \dots + a_k w_k = 0$ among the $w_1,\dots,w_k$ in which the $a_1,\dots,a_k$ are rational numbers of height at most $F(M)$.
In fact, one can take $w_1,\dots,w_k$ to be a subset of the $v_1,\dots,v_n$.
Proof: We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with $k$ initialised to $n$ (so initially we have $w_1,\dots,w_k = v_1,\dots,v_n$), and initialise $M = 1$. Clearly $w_1,\dots,w_k$ generates $v_1,\dots,v_n$ at this height. If the $w_i$ are linearly independent up to rationals of height $F(M)$ then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the $w_i$, say $w_k$, is a rational linear combination of the $w_1,\dots,w_{k-1}$, whose height is bounded by some function depending on $F(M)$ and $k$. In such a case, $w_k$ becomes redundant, and we may delete it (reducing the rank $k$ by one), but note that in order for the remaining $w_1,\dots,w_{k-1}$ to generate $v_1,\dots,v_n$ we need to raise the height upper bound for the rationals involved from $M$ to some quantity $M'$ depending on $M, F(M), k$. We then replace $M$ by $M'$ and continue the process. We repeat this procedure; it can only run for at most $n$ steps and so terminates with $w_1,\dots,w_k$ and $M$ obeying all of the desired properties. (Note that the bound on $M$ is quite poor, being essentially an $n$-fold iteration of $F$! Thus, for instance, if $F$ is exponential, then the bound on $M$ is tower-exponential in nature.)
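The remark about tower-exponential growth can be seen numerically with a very small Python sketch. This is a simplified model, not the actual bound from the proof: it assumes that each deletion step simply replaces the current height by the value of the growth function at that height, whereas the proof only says that the new height depends on the old height, the value of the growth function there, and the current rank. The function name `height_bound` and the choice of $2^M$ as the growth function are illustrative assumptions.

```python
def height_bound(F, n, M0=1):
    """Iterate M -> F(M) for n steps starting from M0, recording every value."""
    M = M0
    history = [M]
    for _ in range(n):
        M = F(M)  # simplifying assumption: the new height is exactly F(M)
        history.append(M)
    return history


# With an exponential growth function, four steps already give a tower of exponentials.
print(height_bound(lambda M: 2 ** M, 4))  # [1, 2, 4, 16, 65536]
```

One further step would give $2^{65536}$, a number with nearly twenty thousand digits, which conveys how quickly the bound degrades with the number of steps.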
(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)
Looking at the statements and proofs of these two theorems it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as $\mathbb{Z}/p\mathbb{Z}$. In this note I will show how this equivalence can be made formal using the language of nonstandard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as $F$. This type of transference is discussed at length in this previous blog post of mine.
In this particular case, the amount of effort needed to set up the nonstandard machinery in order to deduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).
The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.
I will also discuss two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the regularity lemma for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when one is dealing not with a single finite collection of vectors, but rather with a family
of such vectors, where
ranges over a large set; this gives rise to what we call the sunflower lemma, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.
This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.
This is an adaptation of a talk I gave recently for a program at IPAM. In this talk, I gave a (very informal and non-rigorous) overview of Hrushovski’s use of model-theoretic techniques to establish new Freiman-type theorems in non-commutative groups, and some recent work in progress of Ben Green, Tom Sanders and myself to establish combinatorial proofs of some of Hrushovski’s results.
