The standard modern foundation of mathematics is constructed using set theory. With these foundations, the mathematical universe of objects one studies contains not only the “primitive” mathematical objects such as numbers and points, but also sets of these objects, sets of sets of objects, and so forth. (In a pure set theory, the primitive objects would themselves be sets as well; this is useful for studying the foundations of mathematics, but for most mathematical purposes it is more convenient, and less conceptually confusing, to refrain from modeling primitive objects as sets.) One has to carefully impose a suitable collection of axioms on these sets, in order to avoid paradoxes such as Russell’s paradox; but with a standard axiom system such as Zermelo-Fraenkel-Choice (ZFC), all actual paradoxes that we know of are eliminated. Still, one might be somewhat unnerved by the presence in set theory of statements which, while not genuinely paradoxical in a strict sense, are still highly unintuitive; Cantor’s theorem on the uncountability of the reals, and the Banach-Tarski paradox, are perhaps the two most familiar examples of this.

One may suspect that the reason for this unintuitive behaviour is the presence of infinite sets in one’s mathematical universe. After all, if one deals solely with finite sets, then there is no need to distinguish between countable and uncountable infinities, and Banach-Tarski type paradoxes cannot occur.

On the other hand, many statements in infinitary mathematics can be reformulated into equivalent statements in finitary mathematics (involving only finitely many points or numbers, etc.); I have explored this theme in a number of previous blog posts. So, one may ask: what is the finitary analogue of statements such as Cantor’s theorem or the Banach-Tarski paradox?

The finitary analogue of Cantor’s theorem is well-known: it is the assertion that {2^n > n} for every natural number {n}, or equivalently that the power set of a finite set {A} of {n} elements cannot be enumerated by {A} itself. However, this is not quite the end of the story; after all, one also has {n+1 > n} for every natural number {n}, or equivalently that the union {A \cup \{a\}} of a finite set {A} and an additional element {a} cannot be enumerated by {A} itself, and yet the former statement extends to the infinite case, while the latter does not. What causes these two outcomes to be so different?

On the other hand, it is less obvious what the finitary version of the Banach-Tarski paradox is. Note that this paradox is available only in three and higher dimensions, but not in one or two dimensions; so presumably a finitary analogue of this paradox should also make the same distinction between low and high dimensions.

I therefore set myself the exercise of trying to phrase Cantor’s theorem and the Banach-Tarski paradox in a more “finitary” language. It seems that the easiest way to accomplish this is to avoid the use of set theory, and replace sets by some other concept. Taking inspiration from theoretical computer science, I decided to replace concepts such as functions and sets by the concepts of algorithms and oracles instead, with various constructions in set theory being replaced instead by computer language pseudocode. The point of doing this is that one can now add a new parameter to the universe, namely the amount of computational resources one is willing to allow one’s algorithms to use. At one extreme, one can enforce a “strict finitist” viewpoint where the total computational resources available (time and memory) are bounded by some numerical constant, such as {10^{100}}; roughly speaking, this causes any mathematical construction to break down once its complexity exceeds this number. Or one can take the slightly more permissive “finitist” or “constructivist” viewpoint, where any finite amount of computational resource is permitted; or one can then move up to allowing any construction indexed by a countable ordinal, or the storage of any array of countable size. Finally one can allow constructions indexed by arbitrary ordinals (i.e. transfinite induction) and arrays of arbitrary infinite size, at which point the theory becomes more or less indistinguishable from standard set theory.

I describe this viewpoint, and how statements such as Cantor’s theorem and Banach-Tarski are interpreted with this viewpoint, below the fold. I should caution that this is a conceptual exercise rather than a rigorous one; I have not attempted to formalise these notions to the same extent that set theory is formalised. Thus, for instance, I have no explicit system of axioms that algorithms and oracles are supposed to obey. Of course, these formal issues have been explored in great depth by logicians over the past century or so, but I do not wish to focus on these topics in this post.

A second caveat is that the actual semantic content of this post is going to be extremely low. I am not going to provide any genuinely new proof of Cantor’s theorem, or give a new construction of Banach-Tarski type; instead, I will be reformulating the standard proofs and constructions in a different language. Nevertheless I believe this viewpoint is somewhat clarifying as to the nature of these paradoxes, and as to how they are not as fundamentally tied to the nature of sets or the nature of infinity as one might first expect.

— 1. A computational perspective on mathematics —

The great advantage of using set theory in mathematics is that all objects in a given set (e.g. all numbers in the real line {{\bf R}}) are available to you at all times; one can take one, many, or all objects in a set and manipulate them as one wishes (cf. the axiom schema of replacement); similarly, one can assign a truth value to a statement that quantifies over an arbitrary number of objects. If one removes sets from the picture, then one no longer has immediate access to arbitrary elements of a set, and one can no longer perform operations en masse on all the elements of a set at once; instead, one must use some (possibly more restrictive) protocol for manipulating objects in a class, or verifying whether a given statement is true or false. (In more philosophical terms, we are focusing more on an epistemological approach to mathematics, based on what we can measure and query, as opposed to an ontological approach, based on what we believe to exist.)

For this, it is convenient to use the conceptual framework that is familiar to us through modern computer languages, such as C. In this paradigm, when dealing with a class of objects (e.g. integers), we do not get access to the entire set {{\bf Z}} of integers directly. Instead, we have to declare an integer variable, such as {n}, and set it equal to some value, e.g. {n := 57}; or, if one is creating a routine that takes input, {n} might be initialised to one of the unspecified inputs of that routine. Later on, we can use existing variables to define new ones, or to redefine existing ones, e.g. one might define {m} to equal {n * n}, or perhaps one can increment {n} to {n+1}. One can then set up various loops and iterations to explore more of the parameter space; for instance, if countably infinite loops are permitted as a computational resource, then one can exhaust the positive integers by starting at {n=1} and incrementing {n} by {1} indefinitely; one can similarly exhaust the negative integers, and by alternating between the two (and also passing through {0}) one can exhaust the entire set of integers with a countably infinite loop. This of course is just the standard demonstration that the integers are countable.
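In a modern language such as Python (used here purely for illustration; nothing in this post depends on the specific language), the alternating enumeration of the integers can be written as a single countably infinite loop:

```python
from itertools import islice

def enumerate_integers():
    """Exhaust all of Z with a single (countably infinite) loop,
    alternating between positives and negatives."""
    yield 0
    n = 1
    while True:          # a countably infinite loop
        yield n
        yield -n
        n += 1

# the first few terms of the enumeration
print(list(islice(enumerate_integers(), 7)))  # -> [0, 1, -1, 2, -2, 3, -3]
```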

Real-world computers, of course, have finite limits of precision; they cannot represent arbitrarily large integers, but only integers up to a certain size (e.g. {2^{16}-1} or {2^{32}-1}). One could think about computational models with such a strict finitary limitation, but let us assume that we are in a more idealised setting in which there are no limitations on how large an integer one can store. Let us then make the even more idealised assumption that we can also store real numbers with unlimited precision; our computer never makes any roundoff errors. (Note that there are indeed arbitrary precision models of computation that can do this, though the catch is that speed of computation depends heavily on how complicated it is to describe any given number.)

Remark 1 A technical point: it may be that the computational model of the real numbers is different from the standard real line {{\bf R}}; for instance, perhaps the computer only implements “constructible” real numbers, which is for instance the case in physical arbitrary precision computers. We will touch upon this point again later.

Note that if one were to expand out a given real number, say {\pi}, as a decimal expansion, then one would obtain an infinite string of digits. But, just as we do not have direct access to the set {{\bf Z}} of all integers, we will not have direct access to the entire decimal expansion of {\pi}. If we have a natural number {n}, we are allowed to inspect the {n^{th}} digit of {\pi} by making a suitable function call (e.g. {\hbox{digit}(\pi,n)}), and if we are allowed to set up an infinite loop in which {n} starts at {1} and increments indefinitely, one can exhaust all the digits of {\pi}, which is good enough for most “practical” mathematical purposes. (For instance, if one were allowed to run programs of countable length using real arithmetic, one could make a program that determined whether {\pi} was normal or not, or to determine the truth of the Riemann hypothesis, or more generally to compute the truth-value of any first-order sentence in the theory of the real line.)
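For concreteness, here is a sketch (in Python, as a stand-in for the idealised computer) of such a digit-inspection routine for {\pi}, implemented via Machin’s formula in exact integer arithmetic; the function name and the choice of ten guard digits are of course incidental:

```python
def pi_digit(n):
    """Return the n-th decimal digit of pi (n = 1 being the first digit
    after the decimal point), playing the role of digit(pi, n)."""
    guard = 10                       # guard digits against truncation error
    unity = 10 ** (n + guard)

    def arccot(x):                   # arccot(x) * unity, by the integer
        total = term = unity // x    # arithmetic alternating series
        k, sign = 3, -1
        while term:
            term //= x * x
            total += sign * (term // k)
            k += 2
            sign = -sign
        return total

    # Machin's formula: pi/4 = 4*arctan(1/5) - arctan(1/239)
    pi_scaled = 4 * (4 * arccot(5) - arccot(239))
    return int(str(pi_scaled)[n])    # digit n of 3.14159...

print([pi_digit(n) for n in range(1, 8)])  # -> [1, 4, 1, 5, 9, 2, 6]
```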

We assume that the usual arithmetic operations can be performed on real numbers in reasonable amounts of time. For instance, given two real numbers {x, y}, one can determine whether they are equal or not in finite time (consulting some sort of “equality oracle” for the reals if necessary). Note that equality can be a subtle issue; if one thinks of real numbers as infinite strings of digits, their equality can only be verified directly by using a countable amount of computing power. But we will sidestep the issue of how exactly the reals are implemented by simply assuming that enough oracles exist to perform real arithmetic at an acceptable computational cost.

As already hinted at in the above discussion, we are assuming that our computer has access to a certain amount of computing resources (e.g. time, memory, random number generation, oracles). We will be rather vague on exactly how to formalise the concept of a resource, but basically the standard definitions used in computer science would be a good approximation here, at least when the resources are finite. But one can consider allowing for certain computational resources to be infinite in some carefully controlled manner; for instance, one could consider a situation in which countably infinite loops are permitted (provided that all variables in the loop that one wants to retain “converge” in some sense at the end of the loop), but for which uncountable loops are not allowed. We will not formalise such concepts here (but roughly speaking, they correspond to allowing transfinite induction up to some specified ordinal, and no further).

We will not be using sets or set-theoretic functions in this computer language. However, we will use as a substitute the concept of an oracle – a “black box” routine {f()} that takes zero or more variables of a given class as input, and returns zero or more variables in various classes as output (usually we will have just a single output, but it will be convenient to allow multiple outputs). Being a black box, we do not know how the oracle obtains the output from the input, but we are able to use the oracle anyway. Let us assume that each invocation of an oracle takes some acceptable amount of time (e.g. bounded time when the computational resources are finite, or countably infinite time if countable time resources are allowed, etc.). All our oracles are consistent in the sense that they always produce the same output for a fixed choice of input; thus, if one calls the oracle again at some later time with the same input, then the oracle will return the same output as it did before. It is important to note that consistency is a subtly weaker assumption than requiring that the oracle be non-adaptive; we allow the oracle to “remember” previous queries, and to use that memory to formulate answers to later queries, as long as it does not contradict the outputs it gave previously.

We will be concerned primarily with membership oracles – an oracle {E()} that takes a variable {x} in a given class and returns an answer {E(x)} that is either “Yes” or “No”. Informally, the oracle {E} is describing some subset of this class, and is answering questions regarding whether any given variable {x} lies in this set or not. Note, however, that this set only exists in a “virtual” or “potential” sense; if the oracle is adaptive, the set of inputs that will give a “Yes” answer may not yet be fixed, but could depend on future queries to the oracle. If the class can be exhausted within the computational resources permitted (e.g. if the parameter space can be countably enumerated by the computer, and countably infinite loops are permitted), then one can query the oracle for every single element and thus pin down the set completely (though one may not be able to store this set if one does not have sufficient memory resources!), but if the parameter space is too large to be exhausted with the available resources, then the set that the oracle is describing will never be completely described.

To illustrate this, let us briefly return to the traditional language of set theory and recall the following textbook example of a non-measurable subset {E} of the reals {{\bf R}}. This set is constructed by partitioning {{\bf R}} into cosets {x+{\bf Q}} of the rationals {{\bf Q}}, and then using the axiom of choice to select a single representative of each coset; the set {E} is the collection of all such representatives. Thus the rational translates {E+q}, {q \in {\bf Q}}, partition {{\bf R}}, and it is then not hard to deduce (using the properties of Lebesgue measure) that {E} cannot be Lebesgue measurable.

We can simulate this non-measurable set by an adaptive oracle {E()}, which remembers all prior queries to itself, and works as follows:

  • {E()} takes a real number {x} as input.
  • If {E()} has previously answered the question {E(x)} of whether {x} lies in {E}, repeat whatever answer was given previously.
  • If {x} has not been queried before, and if furthermore no rational translate {x+q} of {x} has been queried before either (i.e. {x} differs from all previous queries by an irrational number), then answer {E(x)} with “yes”.
  • Finally, if {x} has not been queried before, but {x+q} has been queried before for some rational {q}, then answer {E(x)} with “no”.
  • Store {x} (and {E(x)}) in memory for use in future queries.

Assuming one has a rationality oracle {{\bf Q}()} that can tell (in bounded time) whether a given real number {x} is rational or not, then {E} is a perfectly programmable oracle, which will run in finite time whenever one asks only a finite number of queries of it. (After querying the oracle an infinite number of times, though, it will require an infinite search loop in order to make sure that any subsequent answer it gives is consistent with all previous answers, unless one is allowed to use an infinite amount of memory, e.g. an array indexed by the quotient space {{\bf R}/{\bf Q}}.) It is even completely deterministic – it requires no arbitrary choices on the part of the oracle. If one had the patience to query this oracle for every single real number {x} (which would, of course, require an uncountable number of queries), the oracle would eventually describe completely the non-measurable set {E}. But if one is only permitted a finite or countable number of queries, then the non-measurable set only exists in some “virtual” sense.
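The adaptive oracle above can be simulated concretely in a toy model (a Python sketch; the representation of “reals” as pairs {(r,s)} standing for {r + s\sqrt{2}} with {r,s} rational is an artificial simplification, chosen so that two such numbers differ by a rational exactly when their {s}-components agree, making the rationality oracle trivial):

```python
from fractions import Fraction as Q

class AdaptiveE:
    """Toy simulation of the adaptive oracle E().  A "real" is a pair
    (r, s) of rationals standing for r + s*sqrt(2); two such reals
    differ by a rational iff their s-components agree, which plays
    the role of the rationality oracle."""
    def __init__(self):
        self.answers = {}         # remembered queries: x -> "Yes"/"No"
        self.seen_cosets = set()  # s-components (cosets of Q) queried so far
    def query(self, x):
        if x in self.answers:     # repeat any previously given answer
            return self.answers[x]
        _, s = x
        # "Yes" iff no rational translate of x was queried before
        ans = "Yes" if s not in self.seen_cosets else "No"
        self.seen_cosets.add(s)
        self.answers[x] = ans
        return ans

E = AdaptiveE()
print(E.query((Q(1, 2), Q(1))))   # -> Yes  (fresh coset)
print(E.query((Q(3, 2), Q(1))))   # -> No   (rational translate of the first)
print(E.query((Q(1, 2), Q(1))))   # -> Yes  (consistent with the first answer)
```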

More generally, non-adaptive oracles tend to generate measurable sets, while adaptive oracles are likely to generate non-measurable sets. So we see that non-measurability does not have to be viewed as a quirk arising from the nature of infinity, or from the axiom of choice; it can be viewed instead as arising from the freedom to adapt to previous membership queries, a concept that makes sense even in a strictly finitist setting.

Remark 2 One can think of a non-adaptive oracle as being like a truthful observer, reporting on some objective set that exists independently of the queries, while an adaptive oracle is more like a pathological liar, inventing a previously non-existent set on the fly as needed in order to consistently answer the questions posed of it. It is then not so surprising that the set thus invented is likely to be non-measurable. We thus see that the ability of oracles to adapt is somewhat analogous to the role the axiom of choice plays in traditional set theory.

We will discuss measurability and non-measurability in more detail a little later in this post.

We have seen that membership oracles are sort of like “virtual” sets. Many set operations can be simulated for membership oracles. For instance, given two membership oracles {E(), F()} that apply to variables {x} in some class (e.g. the real numbers), one can form the union oracle {(E \cup F)()}, which works by querying both {E(x)} and {F(x)} and returning “Yes” if at least one of {E(x)} and {F(x)} was “Yes”, and “No” otherwise. More generally, any finite boolean operation of membership oracles gives another membership oracle, and (if countable computational resources are available) the same is true for countable boolean operations also. As one increases the amount of computational resources available, more and more set-theoretic operations become available, and when one allows unlimited resources (or more precisely, transfinite induction up to any ordinal, and storage of arbitrarily sized infinite sets), then all the standard operations in set theory (e.g. invocations of the axiom schema of replacement, the power set axiom, the axiom of choice, etc.) become available.
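In a language with first-class functions, these boolean operations on membership oracles can be implemented directly as higher-order functions (a Python sketch; the oracle names and the particular sets are chosen purely for illustration):

```python
def union(E, F):
    """Union oracle: query both E and F, answer "Yes" if either does."""
    return lambda x: "Yes" if E(x) == "Yes" or F(x) == "Yes" else "No"

def complement(E):
    """Complement oracle: negate E's answer."""
    return lambda x: "No" if E(x) == "Yes" else "Yes"

# membership oracles for the even numbers and the multiples of 3
even = lambda n: "Yes" if n % 2 == 0 else "No"
mult3 = lambda n: "Yes" if n % 3 == 0 else "No"

EorF = union(even, mult3)
print([n for n in range(10) if EorF(n) == "Yes"])  # -> [0, 2, 3, 4, 6, 8, 9]
```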

Remark 3 Membership oracles {E()} only describe a raw, unstructured set. If one wants to place additional structure on a set (e.g. measure structure, topological structure, smooth structure, etc.) then additional oracles would be needed. Of course, the same is true in traditional set theory; for instance, to place a topology on a set {E} one also needs to specify a collection {{\mathcal F}} of open sets on {E} obeying the axioms of a topology, and similarly for other structures one can place on sets. We will see an example of these additional oracles later in this post, when we revisit the concept of measurability.

Remark 4 Membership oracles are weaker than sets in many ways. One of these comes from a breakdown of the law of the excluded middle. In set theory, a statement about a set {E} is either true or false; for instance, {E} is either finite or infinite. If one is instead given an adaptive membership oracle {E()}, questions such as whether {E()} describes a finite set or not are undecidable if one only has finite computational resources. However, one can imagine strengthening the membership oracle {E()} to an oracle that, in addition to answering questions about membership of individual elements, will also answer more general questions about {E} (such as whether {E} is finite), in such a fashion that all the answers given are consistent with each other. In such a way, the law of the excluded middle can be restored; but then programming an adaptive oracle in a way that keeps all the answers consistent becomes quite a challenge. (Note though that the Gödel completeness theorem asserts, in some sense, that this is always possible, provided that one’s initial constraints on {E} are originally consistent.)

— 2. Cantor’s theorem —

Cantor’s theorem (and its proof) transfers easily enough from sets to oracles. The analogue of a (proposed) enumeration of the real numbers is an “enumeration oracle” {f()} that takes a natural number {n} as input, and returns a real number {f(n)} as output; we allow repetitions. The Cantor diagonal argument shows that given any such putative enumeration {f}, and given access to a countable amount of computing resources, one can construct a real number {x} which is guaranteed not to be covered by the enumeration oracle; for instance, one can construct {x} to be a string of decimals {0.x_1 x_2 \ldots = \sum_{j=1}^\infty \frac{x_j}{10^j}}, with {x_1} chosen to be the first digit in {\{1,\ldots,8\}} (say) not equal to the first digit of {f(1)}, {x_2} chosen to be the first digit in {\{1,\ldots,8\}} not equal to the second digit of {f(2)}, and so forth. (I exclude the digits {0} and {9} to avoid the technical and irrelevant {0.999\ldots = 1.000\ldots} issue.)
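As a finite-precision illustration, the diagonal construction can be sketched as follows (a Python sketch, with a toy enumeration oracle {f(n) = n/7} presented through its decimal digits; all names here are illustrative):

```python
from fractions import Fraction

def diagonal(f_digit, N):
    """Given an enumeration oracle presented as f_digit(n, j) -- the j-th
    decimal digit of the n-th real f(n) -- produce the first N digits of
    a real x guaranteed to differ from each of f(1), ..., f(N).  Digits 0
    and 9 are avoided, sidestepping the 0.999... = 1.000... ambiguity."""
    digits = []
    for n in range(1, N + 1):
        d = f_digit(n, n)                          # n-th digit of f(n)
        digits.append(next(k for k in range(1, 9) if k != d))
    return digits

# toy enumeration: f(n) = n / 7, inspected via its decimal expansion
def f_digit(n, j):
    x = Fraction(n, 7)
    return (x.numerator * 10 ** j // x.denominator) % 10

x = diagonal(f_digit, 5)
# x differs from f(n) in the n-th digit, for each n <= 5
assert all(x[n - 1] != f_digit(n, n) for n in range(1, 6))
print(x)
```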

This is of course virtually identical to the usual proof of Cantor’s theorem. But the proof highlights that in order to exhibit a counterexample to the claim that {f} enumerates the reals, one needs a countably infinite amount of computational resources. And indeed if one works in a finitary computational model in which one is only allowed to run programs that are guaranteed to halt, and if one limits one’s real number system to the “finitely constructible” reals – those reals that can be generated by halting programs – then one can indeed enumerate the “real numbers” in this system, by enumerating all the possible programs to generate real numbers as {P_1, P_2, \ldots}, and computing {f(n)} to be the output of {P_m}, where {1 \leq m \leq n} is the first integer such that {P_m} halts in less than {n} steps, while outputting a number different from {f(1),\ldots,f(n-1)} (setting {f(n)=0} if no such integer exists). In this model, the real number {x} given by Cantor’s argument is not finitely constructible, but only countably constructible. It is only because one has access to countably infinite computational resources (in particular, the ability to sum convergent infinite series such as {\sum_{j=1}^\infty \frac{x_j}{10^j}}), that the reals are no longer countable.
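The dovetailing enumeration just described can be sketched with toy “programs”, each modelled (quite artificially) as a pair recording its halting time and its output; this is only a sketch of the bookkeeping, not a serious model of computation:

```python
def make_f(programs):
    """Enumeration f of the outputs of a list of toy 'programs', each a
    pair (steps, value): the program halts after `steps` steps with output
    `value`.  Following the construction in the text, f(n) is the output
    of the first P_m (m <= n) halting in fewer than n steps with an output
    different from f(1), ..., f(n-1); and f(n) = 0 if there is no such m."""
    memo = []
    def f(n):
        while len(memo) < n:
            k = len(memo) + 1
            val = 0
            for steps, value in programs[:k]:
                if steps < k and value not in memo:
                    val = value
                    break
            memo.append(val)
        return memo[n - 1]
    return f

f = make_f([(3, 1.5), (1, 2.0), (100, 7.0)])
print([f(n) for n in range(1, 6)])  # -> [0, 2.0, 0, 1.5, 0]
```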

Let us now consider a slightly different version of Cantor’s theorem, which in the language of set theory asserts that, for a given set {A} (e.g. the natural numbers {{\bf N}}), one cannot enumerate the power set {2^A} by {A} itself. In order to phrase this using the language of oracles rather than sets, one needs to be able to treat oracles themselves as variables; note that many modern computer languages (particularly functional languages such as Lisp and Haskell) already do this. In particular, we allow for the existence of oracles that generate further oracles as output, or themselves take oracles as input.

A proposed enumeration of the power set of the natural numbers (say) by the natural numbers themselves can then be viewed as an enumeration oracle {f()} which takes a natural number {n} as input, and returns a membership oracle {f(n)()} in the natural numbers as output; thus {f(n)} itself is able to accept a natural number {m} as input and return a yes-or-no answer {f(n)(m)} as output.

Cantor’s diagonal argument then asserts that given any proposed enumeration oracle {f()} of the above form, one can always find a membership oracle {E()} which is not enumerated by {f}. Indeed, one simply sets {E(n)} to be the negation of {f(n)(n)} for any given natural number input {n}.
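In a language with first-class functions, this diagonal construction is essentially a one-liner (a Python sketch, with an illustrative toy enumeration oracle):

```python
def diagonal_oracle(f):
    """Given an enumeration oracle f -- where f(n) is itself a membership
    oracle on the naturals -- return a membership oracle E not enumerated
    by f: E(n) is the negation of f(n)(n)."""
    return lambda n: "No" if f(n)(n) == "Yes" else "Yes"

# toy enumeration: f(n) is the membership oracle of the multiples of n+1
f = lambda n: (lambda m: "Yes" if m % (n + 1) == 0 else "No")
E = diagonal_oracle(f)

# E disagrees with every f(n) at the input n, so no f(n) equals E
assert all(E(n) != f(n)(n) for n in range(100))
print([E(n) for n in range(4)])  # -> ['No', 'Yes', 'Yes', 'Yes']
```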

With this version of Cantor’s argument, we no longer need a countably infinite amount of computational resources; the above argument is valid even in the finitary computational model. In that model, the argument shows that the class of oracles that terminate in finite time cannot itself be enumerated by an oracle that terminates in finite time; this is essentially Turing’s halting theorem. Thus we see that Turing’s theorem and Cantor’s theorem are simply different cases of a single general theorem, the former in the case of finitary computational resources and the latter in the case of unlimited computational resources. (See also my previous blog post for further discussion of these “no self-defeating object” arguments.)

Also observe that this version of Cantor’s argument works if the natural numbers are replaced by any other class of variable; for instance, in the finite class of integers between {1} and {n}, the argument demonstrates that the membership oracles in this class cannot be enumerated by the numbers from {1} to {n} itself, thus {2^n > n}.

Let us now discuss the analogous situation with the inequality {n+1>n}. The claim is now that if {A} is a finite class, and {A'} is a strictly larger class (containing at least one additional element {a} outside of {A}), and one is given an enumeration oracle {f()} that takes a variable {x} in {A} as input and returns a variable {f(x)} in {A'} as output, then one should be able to find a variable {y} in {A'} which is not covered by the enumeration oracle.

One way to find such a {y} is by the following algorithm, which is just a painfully disguised version of the proof that {n+1>n} using induction. Let {a} be an element in {A'} that is not in {A}. We first search through all the elements {x} of {A} to see if there is a solution to the equation {f(x)=a}. If no solution exists, then we are done; we can just take {y=a}. So suppose instead that we have found some {x_0} for which {f(x_0)=a}. Then we can delete {x_0}, and all other solutions to {f(x)=a}, from the domain {A} of the oracle {f}, and also delete {a} from the output {A'} of the oracle, giving rise to a modified oracle {f'} which takes as input a variable {x} with {f(x) \neq a}, and returns the output {f'(x) := f(x)}. Note that if we ever find a variable {y} in the range of {f'} that is not enumerated by {f'}, then {y} will not be enumerated by {f} either. So we can descend from {f} to the oracle {f'}, which has a smaller domain and range. Also note that the range remains strictly larger than the domain, as {x_0} lies in the range but not in the domain. We can thus keep iterating this procedure; since we cannot have an infinite descent, the algorithm must eventually terminate. Unpacking the termination condition, we have indeed produced an element {y} in the range of {f} which was not enumerated by {f}.
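The descent can be implemented directly (a Python sketch; the enumeration oracle is modelled, artificially, as a finite dictionary so that the search over {A} is a simple loop):

```python
def uncovered(f, A, a):
    """Find an element of A ∪ {a} not enumerated by f, where f maps the
    finite set A into A ∪ {a} and a lies outside A.  Implements the
    descent in the text: repeatedly delete the solutions of f(x) = target
    from the domain, and take one deleted element as the new target."""
    dom, target = set(A), a
    while True:
        sols = {x for x in dom if f[x] == target}
        if not sols:
            return target          # no preimage left: target is uncovered
        dom -= sols                # shrink the domain...
        target = next(iter(sols))  # ...and chase a deleted element

f = {1: 4, 2: 1, 3: 1}             # maps A = {1,2,3} into A ∪ {4}
y = uncovered(f, {1, 2, 3}, 4)
assert y not in f.values()         # y is indeed not enumerated by f
print(y)
```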

We observe that this algorithm only halts due to the principle of infinite descent, which is only valid when the initial class one starts with was finite. For infinite classes, which admit infinite descents, one can of course find surjective enumerations between such classes and strictly larger classes; for instance, the enumeration oracle that sends a natural number {n} to {n-1} is an enumeration of {{\bf N} \cup \{-1\}} by {{\bf N}}. In contrast, the proof of Cantor’s theorem does not rely on facts specific to finite classes, such as the principle of infinite descent, and is thus valid in arbitrary classes.

— 3. Measurability revisited —

In a previous section, we gave an example of a membership oracle {E} in the real line which was non-measurable, in the sense that the set that it was (virtually) describing necessarily failed to be Lebesgue measurable. But the concept of Lebesgue measurability was still defined in the context of set theory, and not in a purely oracle-theoretic language. It is then natural to ask whether one can define Lebesgue measurability purely in the context of computational models, and in particular whether one can discuss this concept in a finitary computational model.

For simplicity let us now work in a bounded subset of {{\bf R}}, such as the unit interval {[0,1]}, so that we can work with a finite measure rather than a {\sigma}-finite measure.

From classical measure theory, we recall the following characterisation of Lebesgue measurability: a subset {E} of the unit interval is Lebesgue measurable if for every {\epsilon > 0}, one can find an elementary subset {E_\epsilon} of the unit interval (i.e. a finite union of open intervals) whose symmetric difference {E \Delta E_\epsilon} with {E} has outer measure less than {\epsilon}, or equivalently such that {E \Delta E_\epsilon} can be covered by a countable collection of intervals {I_{\epsilon,1}, I_{\epsilon,2}, \ldots} whose lengths add up to less than {\epsilon}.

Thus, if one wants a membership oracle {E()} to “certify” its measurability, one way to do so is to provide an additional “measurability oracle” {M()}, which takes a positive real number {\epsilon > 0} as input, and returns three outputs:

  • A description of an elementary set {E_\epsilon} (which can be done in a finite amount of time and space, by specifying the number of intervals used to form {E_\epsilon}, together with their endpoints);
  • An interval oracle {I()}, that for each natural number input {n} returns an interval {I_{\epsilon,n}};
  • A covering oracle {N()}, which for each real number {x} in {E \Delta E_\epsilon}, returns a natural number {n} for which {x \in I_{\epsilon,n}}. (Actually, this oracle is redundant, as it can be simulated from the previous two outputs by a finite brute force search.)

With these oracles, one can then verify the measurability of {E} by selecting an {\epsilon > 0}, and verifying firstly that for any natural number {N}, the sum of the lengths of {I_{\epsilon,1},\ldots,I_{\epsilon,N}} remains less than {\epsilon}, and secondly that for any real number {x}, if {x} lies in {E} but not in {E_\epsilon} or vice versa, then {x} indeed lies in {I_{\epsilon,N(x)}}. In principle, if one performed this verification procedure an uncountable number of times (once for each choice of {\epsilon, N, x}) one would fully demonstrate the measurability of {E}; but if one instead only had access to finite or countable computational resources, then one could only verify measurability on an “as needed” basis.
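As a concrete (and heavily simplified) illustration, here is a Python sketch of a measurability oracle for the set {E = \{ x \in [0,1]: x^2 < 1/2 \}}, which locates the endpoint {1/\sqrt{2}} by bisection using only membership queries; all names, and the bisection tolerance, are incidental choices:

```python
def E(x):                          # membership oracle for E = [0, 1/sqrt(2))
    return "Yes" if x * x < 0.5 else "No"

def M(eps):
    """Measurability oracle for E(): returns an elementary set E_eps (a
    list of open intervals), an interval oracle I covering E Δ E_eps, and
    a covering oracle N, with the total length of the I(n) less than eps."""
    # locate 1/sqrt(2) to accuracy eps/4 by bisection, querying only E()
    lo, hi = 0.0, 1.0
    while hi - lo > eps / 4:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if E(mid) == "Yes" else (lo, mid)
    E_eps = [(0.0, lo)]            # elementary approximant to E
    def I(n):                      # E Δ E_eps sits inside [lo, hi]
        return (lo - eps / 4, hi + eps / 4) if n == 1 else (0.0, 0.0)
    def N(x):
        return 1
    return E_eps, I, N

E_eps, I, N = M(0.01)
measure_estimate = sum(r - l for (l, r) in E_eps)
print(round(measure_estimate, 3))  # close to 1/sqrt(2) = 0.7071...
```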

So, in the oracle model, a measurable set is not simply a membership oracle {E()}; it must also be supplemented with an additional measurability oracle {M()} that “witnesses” the measurability. This is analogous to how sets must be augmented with (say) topological structure if one wants to perform topology on that set, or algebraic structure if one wants to perform algebra on that set, etc.

If one possesses a measurability oracle {M()} for a set {E} (or more precisely, a membership oracle {E()}), then one can estimate the Lebesgue measure of {E} to within accuracy {\epsilon} by calling {M(\epsilon)} to obtain an approximant {E_\epsilon}, and then computing the measure {|E_\epsilon|} of {E_\epsilon} (which can be done in a finite amount of time, as {E_\epsilon} is simply a finite union of intervals). A key fact (which, not coincidentally, is crucial in the standard construction of Lebesgue measure) is that these approximations to the Lebesgue measure of {E} are compatible with each other in the following sense: if one calls one measurability oracle {M(\epsilon)} for {E()} at accuracy {\epsilon > 0} to get one estimate {|E_\epsilon|} for the Lebesgue measure of {E()}, and if one then calls another (possibly different) measurability oracle {M'(\epsilon')} for the same set {E()} at another accuracy {\epsilon' > 0} to get another estimate {|E'_{\epsilon'}|} for {E()}, then these two estimates can only differ by at most {\epsilon+\epsilon'}; in particular, sending {\epsilon \rightarrow 0}, one obtains a Cauchy sequence and (after a countable number of operations) one can then compute the Lebesgue measure of {E} to infinite precision.
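The compatibility claim is just a triangle inequality for outer measure {m^*} (a sketch, using the facts that {| |A| - |B| | \leq m^*(A \Delta B)} for elementary sets {A, B}, and that {A \Delta B \subseteq (A \Delta C) \cup (C \Delta B)} for any {C}):

```latex
\left| |E_\epsilon| - |E'_{\epsilon'}| \right|
  \le m^*(E_\epsilon \,\Delta\, E'_{\epsilon'})
  \le m^*(E_\epsilon \,\Delta\, E) + m^*(E \,\Delta\, E'_{\epsilon'})
  < \epsilon + \epsilon'.
```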

This key fact boils down (after some standard manipulations) to the fact that an interval such as {[a,b]} has outer measure at least {b-a}; in our oracle based model, this means that if one is given an interval oracle {I()} that generates open intervals {I(n)} for each natural number {n}, in such a way that the total length {\sum_n |I(n)|} of these intervals is less than {b-a}, then one can find a point in {[a,b]} which is not covered by any of the {I(n)}.

This can be done using a countable amount of computational power (basically, the ability to run a single infinite loop; this is roughly equivalent to the theory {RCA_0} in reverse mathematics). The point is that for each finite {N}, the set {S_N := [a,b] \backslash \bigcup_{n=1}^N I(n)} is a computable non-empty finite union of closed intervals in {[a,b]}, which decreases in {N}. The infimum {\inf(S_N)} can be computed in finite time for each {N}, and increases in {N}; the limit {\lim_{N \rightarrow \infty} \inf(S_N)} is then an element of {\bigcap_N S_N}, and thus lies in {[a,b]} but outside all of the {I(n)}. Thus, given a countable amount of computational power, one can consistently define the Lebesgue measure of a measurable set, and verify its basic properties.
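The finite stages of this loop are concrete enough to sketch in code. The following (a minimal illustration, with names of our own choosing, run on finitely many intervals; the actual argument performs one such update per interval of an infinite stream and takes the limit of the infima) maintains {S_N} as a finite union of closed intervals and reads off its infimum.

```python
def subtract_open(pieces, c, d):
    """S \\ (c, d), where S is a finite union of closed intervals [lo, hi]."""
    out = []
    for lo, hi in pieces:
        if c >= lo:            # closed left piece [lo, min(hi, c)] survives
            out.append((lo, min(hi, c)))
        if d <= hi:            # closed right piece [max(lo, d), hi] survives
            out.append((max(lo, d), hi))
    return out

def uncovered_point(a, b, intervals):
    """A point of [a, b] avoiding open intervals of total length < b - a.
    S_N = [a, b] minus the first N intervals stays a non-empty finite
    union of closed intervals; return the infimum of the final S_N."""
    pieces = [(a, b)]
    for c, d in intervals:
        pieces = subtract_open(pieces, c, d)
    return min(lo for lo, _ in pieces)

# Three open intervals of total length 0.95 < 1 cannot cover [0, 1]:
print(uncovered_point(0.0, 1.0, [(-0.1, 0.3), (0.25, 0.5), (0.6, 0.9)]))  # 0.5
```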

It is instructive to apply the above discussion to the non-measurable membership oracle {E()} given in previous sections (trivially modified to lie in {[0,1]} rather than in {{\bf R}}). If one is given a purported measurability oracle {M()} for this oracle {E()}, one can eventually show that this oracle {M()} does not actually certify the measurability of {E()} as claimed, but this requires a countably infinite amount of computation to establish. (Basically, there are two cases, based on whether {M()} asserts that {E()} has positive Lebesgue measure or not (which can be decided after a countable amount of computation). If it has positive measure, then by invoking {M(\epsilon)} with {\epsilon} less than half this measure, one can soon find an interval {I} in which {E()} has density greater than {1/2} (or more precisely, the complement of {E} in {I} has outer measure strictly less than {|I|/2}) and then one can run a variant of the above Bolzano-Weierstrass argument to find two points {x, y} in {E()} and in {I} which differ by a rational, contradicting the construction of {E}. If instead {M()} asserts that {E()} has zero measure, then {M()} can cover {E()} by intervals of arbitrarily small total measure, and then {M()} can do the same for the union of all the rational shifts of {E()}, and one can then find an element {x \in [0,1]} such that no rational shift {x+q} of {x} lies in {E}.)

On the other hand, if one is only allowed to query {E()} and {M()} finitely many times, then one can show that it is possible to adaptively build {M()} and {E()} in response to these queries in such a way that one never obtains a contradiction, while retaining the property that the shifts of {E()} by the rationals partition {{\bf R}}. So a pathological liar can build a non-measurable set but claim that it is measurable; the deception can sustain any finite number of queries about the set and its measurability, but given a countable number of queries one can eventually demonstrate that there is an inconsistency (at least for non-measurable sets coming from the coset partition of {{\bf R}} by the rationals).

Remark 5 Interestingly, the type of adaptive oracle one uses to create non-measurable sets is not compatible with random number generation. If one has access to a source of random real numbers (say in {[0,1]}), then in principle one can (almost surely) compute, in countable time, the Lebesgue measure of any subset {E} of {[0,1]} accessed through a random oracle {E()} by the Monte Carlo method: randomly sampling {N} points in {[0,1]}, counting the proportion that lie in {E}, and then sending {N \rightarrow \infty}. However, this only works properly if the oracle {E()} does not adapt to the various random inputs it is given, and is instead independent of these inputs. (For instance, if one were to use Monte Carlo methods to measure the non-measurable set {E} described above, one would almost surely get that {E} has measure {1}, as each random number is almost certain to lie in a different coset of {{\bf Q}}!)
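The Monte Carlo method of this remark, and the way an adaptive oracle defeats it, can be sketched as follows (a toy model, with names of our own choosing; the adaptive oracle simply answers yes to every fresh sample, which is what the coset construction would do almost surely, since distinct random reals almost surely lie in distinct cosets of {{\bf Q}}).

```python
import random
random.seed(0)  # fixed seed so the sketch is reproducible

def monte_carlo_measure(oracle, n_samples=100_000):
    """Estimate the measure of a subset of [0, 1] by random sampling."""
    hits = sum(oracle(random.random()) for _ in range(n_samples))
    return hits / n_samples

# A fixed (non-adaptive) set, the interval [0, 1/3): the estimate converges
# to the true measure 1/3 as the number of samples grows.
print(monte_carlo_measure(lambda x: x < 1 / 3))  # approximately 0.333

def adaptive_E(x):
    # Each random sample is almost surely the first point of its Q-coset
    # ever queried, so the adaptive oracle is free to admit it into E
    # without ever being caught by finitely many queries.
    return True

print(monte_carlo_measure(adaptive_E))  # exactly 1.0
```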

— 4. The Banach-Tarski paradox —

We now turn to the Banach-Tarski paradox. The usual formulation of this paradox involves a partition of the unit ball into pieces that can be rearranged (after rotations and translations) to form two copies of the ball. To avoid some minor technicalities, we will work instead on the unit sphere {S^2} with an explicit countable set {\Sigma} removed, and establish the following version of the paradox:

Proposition 1 (Banach-Tarski paradox, reduced version) There exists a countable subset {\Sigma} of {S^2} and a partition of {S^2 \backslash \Sigma} into four disjoint pieces {E_1, E_2, E_3, E_4}, such that {E_1} and {E_2} can be rotated to cover {S^2 \backslash \Sigma}, and {E_3} and {E_4} can also be rotated to cover {S^2 \backslash \Sigma}.

Of course, from the rotation-invariant nature of Lebesgue measure on the sphere, such a partition can only occur if at least one of {E_1,E_2,E_3,E_4} is not Lebesgue measurable.

We return briefly to set theory and give the standard proof of this proposition. The first step is to locate two rotations {a, b} in the orthogonal group {SO(3)} which generate the free group {\langle a,b\rangle}. This can be done explicitly; for instance one can take

\displaystyle  a := \begin{pmatrix} 3/5 & 4/5 & 0 \\ -4/5 & 3/5 & 0 \\ 0 & 0 & 1 \end{pmatrix}; \quad b := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3/5 & -4/5 \\ 0 & 4/5 & 3/5 \end{pmatrix}.

(See for instance this post from the Secret Blogging Seminar for more discussion of this example.) Each rotation in {\langle a,b \rangle} has two fixed antipodal points in {S^2}; we let {\Sigma} be the union of all these points. Then the group {\langle a,b \rangle} acts freely on the remainder {S^2 \backslash \Sigma}.
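Freeness of {\langle a,b\rangle} cannot of course be verified by any finite computation, but exact rational arithmetic does let one spot-check that particular reduced words in these two rotations are not the identity. A sketch (the helper names are ours):

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two 3x3 matrices with exact rational entries."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

a = [[F(3, 5), F(4, 5), F(0)], [F(-4, 5), F(3, 5), F(0)], [F(0), F(0), F(1)]]
b = [[F(1), F(0), F(0)], [F(0), F(3, 5), F(-4, 5)], [F(0), F(4, 5), F(3, 5)]]
a_inv = [list(row) for row in zip(*a)]  # for rotations, inverse = transpose
b_inv = [list(row) for row in zip(*b)]

I = [[F(i == j) for j in range(3)] for i in range(3)]
# The commutator a b a^{-1} b^{-1} is a nontrivial reduced word; exact
# arithmetic confirms it is not the identity rotation.
w = mat_mul(mat_mul(a, b), mat_mul(a_inv, b_inv))
print(w != I)  # True
```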

Using the axiom of choice, we can then build a (non-measurable) subset {E} of {S^2 \backslash \Sigma} which consists of a single point from each orbit of {\langle a,b \rangle}. For each {i=1,2,3,4}, we then define {E_i} to be the set of all points of the form {w x}, where {x \in E} and {w \in \langle a,b\rangle} is a word such that

  • {w} is the identity, or begins with {a} (if {i=1});
  • {w} begins with {a^{-1}} (if {i=2});
  • {w} begins with {b} (if {i=3});
  • {w} begins with {b^{-1}} (if {i=4}).

It is then clear that {E_1, E_2, E_3, E_4} partition {S^2 \backslash \Sigma}, while {a^{-1} E_1 \cup a E_2} and {b^{-1} E_3 \cup b E_4} both cover {S^2 \backslash \Sigma}, as claimed.
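The covering claim can also be checked mechanically at the level of reduced words: writing {W_1,\ldots,W_4} for the four classes of words in the list above, one can verify, for all reduced words up to a given length, that each lies in {a^{-1} W_1 \cup a W_2} and in {b^{-1} W_3 \cup b W_4}. A sketch (encoding {a^{-1}, b^{-1}} as the letters A, B; the encoding and names are ours):

```python
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A, B stand for a^{-1}, b^{-1}

def reduce_word(word):
    """Cancel adjacent inverse pairs to bring a word to reduced form."""
    out = []
    for g in word:
        if out and out[-1] == INV[g]:
            out.pop()
        else:
            out.append(g)
    return "".join(out)

def reduced_words(n):
    """All reduced words in a, b, a^{-1}, b^{-1} of length at most n."""
    words, frontier = [""], [""]
    for _ in range(n):
        frontier = [w + g for w in frontier for g in "aAbB"
                    if not (w and INV[w[-1]] == g)]
        words += frontier
    return words

def cls(w):
    """Which class W_1..W_4 a reduced word belongs to (W_1 holds the identity)."""
    if w == "" or w[0] == "a":
        return 1
    return {"A": 2, "b": 3, "B": 4}[w[0]]

# v lies in a^{-1} W_1 iff the reduced form of a*v lies in W_1, and so on;
# check both covering claims for all reduced words of length at most 6.
ok = all((cls(reduce_word("a" + v)) == 1 or cls(reduce_word("A" + v)) == 2) and
         (cls(reduce_word("b" + v)) == 3 or cls(reduce_word("B" + v)) == 4)
         for v in reduced_words(6))
print(ok)  # True
```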

Now let us interpret this example using oracles. The free group {\langle a, b \rangle} is countably enumerable, and so with a countable amount of time and memory, one can construct and store {\Sigma} without difficulty. A membership oracle {E()} for {E} can then be constructed adaptively, much as in the previous sections: {E(x)} returns yes if {x} is the first point in its orbit to be queried, and returns no otherwise. The oracles {E_i()} can then be defined by querying {E()} for all the points in {x}‘s orbit in some arbitrary order until one finds the point {w^{-1} x} which lies in {E}, and then deciding membership in {E_i} depending on the first symbol of {w}. (For instance, if {x}‘s orbit has not been queried before, the first point in the orbit that one queries {E()} for will lie in {E}; thus one sees that the order in which one decides to search through the orbit will in fact influence quite strongly what {E} and the {E_i} will look like.)
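This adaptive oracle can be sketched symbolically, modelling a point of a free orbit as a pair (orbit identifier, reduced word {w} with {x = w x_0} for an unspecified base point {x_0}). All names and the encoding here are ours; a real implementation would of course have to decide orbit equivalence for actual points of {S^2 \backslash \Sigma}.

```python
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}  # A, B stand for a^{-1}, b^{-1}

def reduce_word(word):
    """Cancel adjacent inverse pairs to bring a word to reduced form."""
    out = []
    for g in word:
        if out and out[-1] == INV[g]:
            out.pop()
        else:
            out.append(g)
    return "".join(out)

def inverse(word):
    return "".join(INV[g] for g in reversed(word))

chosen = {}  # orbit id -> word of the point the adaptive oracle adopted into E

def E(point):
    """Adaptive membership oracle: the first queried point of an orbit joins E."""
    orbit, word = point
    chosen.setdefault(orbit, word)
    return chosen[orbit] == word

def E_index(point):
    """Which of E_1..E_4 contains the point: classify the first letter of
    the word w with x = w * (chosen point of the orbit)."""
    orbit, word = point
    E(point)  # ensure the orbit has a chosen representative
    w = reduce_word(word + inverse(chosen[orbit]))
    if w == "" or w[0] == "a":
        return 1
    return {"A": 2, "b": 3, "B": 4}[w[0]]

print(E((0, "ab")))        # True: first point of orbit 0 ever queried joins E
print(E_index((0, "ab")))  # 1: here w is the identity
print(E_index((0, "b")))   # 2: w = reduce("b" + ("ab")^{-1}) = a^{-1}
```

Note how the answers depend on the query order: had the point with word "b" been queried first, it would have been adopted into {E} instead, and the classification of the whole orbit would change.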

It is then not hard to show that {E_1(), E_2(), E_3(), E_4()} do indeed partition {S^2 \backslash \Sigma} in the sense that for each {x \in S^2 \backslash \Sigma}, exactly one of {E_1(x), E_2(x), E_3(x), E_4(x)} will return “yes”. Also, for any {x \in S^2 \backslash \Sigma}, at least one of {E_1(ax), E_2(a^{-1} x)} will return “yes”, and at least one of {E_3(bx), E_4(b^{-1} x)} will return “yes”. Thus, after performing an uncountable number of queries to fully complete all of these sets, we obtain sets obeying the properties in Proposition 1; but the oracles that do this are perfectly deterministic, and run in only a countable amount of time, at least for the first few queries (and one can probably even trim it down to a finite amount of time, if one has an efficient way of deciding whether two points lie in the same orbit of {\langle a,b\rangle}).

Now we discuss how a Banach-Tarski type construction does not work in one or two dimensions. Let us just consider the one-dimensional case:

Proposition 2 There does not exist a decomposition of the unit interval {[0,1)} into finitely many sets {E_1,\ldots,E_k} such that some translates {E_1+x_1,\ldots,E_k+x_k} of these sets cover {[0,2)}.

We briefly review the proof of this proposition. Suppose for contradiction that we have such a decomposition. We can colour the lattice {{\bf Z}^k} in {k} colours {1,\ldots,k}, by giving each {k}-tuple {(n_1,\ldots,n_k)} the colour {i} if {n_1 x_1 + \ldots + n_k x_k \hbox{ mod } 1 \in E_i}. Since the {E_i} partition {[0,1)}, this is a well-defined {k}-colouring of {{\bf Z}^k}. Furthermore, for every {k}-tuple {\vec n}, we see that {\vec n - e_i} has colour {i} for at least two values of {i=1,\ldots,k}, where {e_1,\ldots,e_k} is the standard basis for {{\bf Z}^k}; this is because both points of {[0,2)} lying above {n_1 x_1 + \ldots + n_k x_k \hbox{ mod } 1} are covered by translates {E_i + x_i}, and no single translate (having diameter less than {1}) can cover both. If one then uses double-counting to get two different estimates for the number of coloured points in a box {\{1,\ldots,N\}^k} for a large enough {N}, one obtains a contradiction.
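The double-counting step can be made quantitative. Each point {\vec m} of the box carries exactly one colour, and a pair {(\vec m, i)} with {\vec m} of colour {i} arises as {\vec m = \vec n - e_i} for exactly one {\vec n}; so the box supplies at most {N^k} such pairs, while the {(N-1)^k} tuples {\vec n} with all entries in {\{2,\ldots,N\}} each supply at least two pairs inside the box. Hence {2(N-1)^k \leq N^k}, which fails once {N} is large. A quick computation (the helper name is ours) of the first {N} at which the contradiction appears:

```python
def first_contradictory_N(k):
    """Smallest N with 2 * (N-1)^k > N^k, where the double count fails.
    (Hypothetical helper, not from the text.)"""
    N = 2
    while 2 * (N - 1) ** k <= N ** k:
        N += 1
    return N

print([first_contradictory_N(k) for k in (1, 2, 3, 4)])  # [3, 4, 5, 7]
```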

Note that this proof is quite finitary; given some real numbers {x_1,\ldots,x_k} and some membership oracles {E_1(),\ldots,E_k()}, one could convert the above argument into an algorithm that would be able to demonstrate in finite time that either the oracles {E_1(),\ldots,E_k()} fail to partition {[0,1)}, or that the translates {E_1()+x_1,\ldots,E_k()+x_k} fail to cover {[0,2)}; we leave this as an exercise to the reader.

(The key distinction here between the low and high dimensional cases is that the free group {\langle a,b \rangle} is not amenable, whereas {{\bf Z}^k} is amenable. See these previous blog posts for further discussion.)

— 5. Summary —

The above discussion suggests that it is possible to retain much of the essential mathematical content of set theory without the need for explicitly dealing with large sets (such as uncountable sets), but there is a significant price to pay in doing so, namely that one has to deal with sets on a “virtual” or “incomplete” basis, rather than with the “completed infinities” that one is accustomed to in the standard modern framework of mathematics. Conceptually, this marks quite a different approach to mathematical objects, and to assertions about such objects; such assertions are not simply true or false, but instead require a certain computational cost to be paid before their truth can be ascertained. This approach makes the mathematical reasoning process look rather strange compared to how it is usually presented, but I believe it is still a worthwhile exercise to try to translate mathematical arguments into this computational framework, as it illustrates how some parts of mathematics are in some sense “more infinitary” than others, in that they require a greater amount of computational power in order to be modeled in this fashion. It also illustrates why we adopt the conveniences of infinite set theory in the first place: while it is technically possible to do mathematics without infinite sets, it can be significantly more tedious and painful to do so.

Update, Mar 21: Several edits and additions, including a summary.