This fall (starting Monday, September 26), I will be teaching a graduate topics course which I have entitled “Hilbert’s fifth problem and related topics.” The course is going to focus on three related topics:
- Hilbert’s fifth problem on the topological description of Lie groups, as well as the closely related (local) classification of locally compact groups (the Gleason-Yamabe theorem);
- Approximate groups in nonabelian groups, and their classification via the Gleason-Yamabe theorem (this is very recent work of Emmanuel Breuillard, Ben Green, Tom Sanders, and myself, building upon earlier work of Hrushovski);
- Gromov’s theorem on groups of polynomial growth, as proven via the classification of approximate groups (as well as some consequences for fundamental groups of Riemannian manifolds).
I have already blogged about these topics repeatedly in the past (particularly with regard to Hilbert’s fifth problem), and I intend to recycle some of that material in the lecture notes for this course.
The above three families of results exemplify two broad principles (part of what I like to call “the dichotomy between structure and randomness”):
- (Rigidity) If a group-like object exhibits a weak amount of regularity, then it (or a large portion thereof) often automatically exhibits a strong amount of regularity as well;
- (Structure) This strong regularity manifests itself either as Lie type structure (in continuous settings) or nilpotent type structure (in discrete settings). (In some cases, “nilpotent” should be replaced by sister properties such as “abelian”, “solvable”, or “polycyclic”.)
Let me illustrate what I mean by these two principles with two simple examples, one in the continuous setting and one in the discrete setting. We begin with a continuous example. Given an $n \times n$ complex matrix $A$, define the matrix exponential $\exp(A)$ of $A$ by the formula
$$\exp(A) := \sum_{k=0}^\infty \frac{A^k}{k!} = 1 + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \ldots,$$
which can easily be verified to be an absolutely convergent series.
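As a numerical sanity check, the exponential series can be summed term by term. The sketch below is illustrative only: the helper names (`mat_mul`, `scale`, `mat_exp`, `max_abs_diff`), the truncation at 40 terms, and the choice of a $2 \times 2$ rotation generator are all assumptions made here, not part of the original discussion. It also checks numerically that $t \mapsto \exp(tA)$ behaves like a homomorphism from the reals.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(A, t):
    """Return the matrix t*A."""
    return [[t * x for x in row] for row in A]

def mat_exp(A, terms=40):
    """Partial sum of the exponential series: sum of A^k / k! for k < terms."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now becomes A^k / k!, built from A^(k-1) / (k-1)!
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def max_abs_diff(X, Y):
    """Largest entrywise difference between two matrices."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(len(X)) for j in range(len(X)))

# Illustrative choice: A generates rotations, so exp(tA) is rotation by angle t.
A = [[0j, -1 + 0j], [1 + 0j, 0j]]
s, t = 0.3, 1.1
lhs = mat_mul(mat_exp(scale(A, s)), mat_exp(scale(A, t)))
rhs = mat_exp(scale(A, s + t))
homomorphism_error = max_abs_diff(lhs, rhs)
cosine_error = abs(mat_exp(scale(A, t))[0][0] - math.cos(t))
```

For matrices of moderate norm the truncated series is accurate to roughly machine precision; a production implementation would more likely use scaling and squaring than a raw partial sum.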
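Proposition 1 Let $n$ be a natural number, let $GL_n(\mathbb{C})$ be the group of invertible $n \times n$ complex matrices, and let $\Phi: \mathbb{R} \to GL_n(\mathbb{C})$ be a map obeying the following two properties:
- (Group-like object) $\Phi$ is a homomorphism, thus $\Phi(s) \Phi(t) = \Phi(s+t)$ for all $s, t \in \mathbb{R}$.
- (Weak regularity) The map $t \mapsto \Phi(t)$ is continuous.
Then:
- (Strong regularity) The map $t \mapsto \Phi(t)$ is smooth (i.e. infinitely differentiable). In fact it is even real analytic.
- (Lie-type structure) There exists a (unique) complex $n \times n$ matrix $A$ such that $\Phi(t) = \exp(tA)$ for all $t \in \mathbb{R}$.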
Proof: Let $\Phi$ be as above. Let $\varepsilon > 0$ be a small number (depending only on $n$). By the homomorphism property, $\Phi(0) = 1$ (where we use $1$ here to denote the identity element of $GL_n(\mathbb{C})$), and so by continuity we may find a small $t_0 > 0$ such that $\Phi(t) = 1 + O(\varepsilon)$ for all $t \in [-t_0, t_0]$ (we use some arbitrary norm here on the space of $n \times n$ matrices, and allow implied constants in the $O()$ notation to depend on $n$).
The map $A \mapsto \exp(A)$ is real analytic and (by the inverse function theorem) is a diffeomorphism near $0$. Thus we can (if $\varepsilon$ is small enough) find a matrix $B$ of size $B = O(\varepsilon)$ such that $\Phi(t_0) = \exp(B)$. By the homomorphism property and (1), we thus have
$$\Phi(t_0/2)^2 = \Phi(t_0) = \exp(B) = \exp(B/2)^2.$$
On the other hand, by another application of the inverse function theorem we see that the squaring map $A \mapsto A^2$ is a diffeomorphism near $1$ in $GL_n(\mathbb{C})$, and thus (if $\varepsilon$ is small enough)
$$\Phi(t_0/2) = \exp(B/2).$$
We may iterate this argument (for a fixed, but small, value of $\varepsilon$) and conclude that
$$\Phi(t_0/2^k) = \exp(B/2^k)$$
for all $k = 0, 1, 2, \ldots$. By the homomorphism property and (1) we thus have
$$\Phi(q t_0) = \exp(qB)$$
whenever $q$ is a dyadic rational, i.e. a rational of the form $a/2^k$ for some integer $a$ and natural number $k$. By continuity we thus have
$$\Phi(s t_0) = \exp(sB)$$
for all real $s$. Setting $A := B/t_0$ we conclude that
$$\Phi(t) = \exp(tA)$$
for all real $t$, which gives existence of the representation and also real analyticity and smoothness. Finally, uniqueness of the representation $\Phi(t) = \exp(tA)$ follows from the identity
$$A = \frac{d}{dt} \exp(tA) \Big|_{t=0}.$$
Exercise 2 Generalise Proposition 1 by replacing the hypothesis that $\Phi$ is continuous with the hypothesis that $\Phi$ is Lebesgue measurable. (Hint: use the Steinhaus theorem.) Show that the proposition fails (assuming the axiom of choice) if this hypothesis is omitted entirely.
Note how one needs both the group-like structure and the weak regularity in combination in order to ensure the strong regularity; neither is sufficient on its own. We will see variants of the above basic argument throughout the course. Here, the task of obtaining smooth (or real analytic) structure was relatively easy, because we could borrow the smooth (or real analytic) structure of the domain $\mathbb{R}$ and range $GL_n(\mathbb{C})$; but, somewhat remarkably, we shall see that one can still build such smooth or analytic structures even when none of the original objects have any such structure to begin with.
Now we turn to a second illustration of the above principles, namely Jordan’s theorem, which uses a discreteness hypothesis to upgrade Lie type structure to nilpotent (and in this case, abelian) structure. We shall formulate Jordan’s theorem in a slightly stilted fashion in order to emphasise the adherence to the above-mentioned principles.
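Theorem (Jordan’s theorem) Let $G$ be an object with the following properties:
- (Group-like object) $G$ is a group.
- (Discreteness) $G$ is finite.
- (Lie-type structure) $G$ is contained in $U_n(\mathbb{C})$ (the group of unitary $n \times n$ matrices) for some $n \geq 1$.
Then there is a subgroup $G'$ of $G$ such that
- ($G'$ is close to $G$) The index of $G'$ in $G$ is $O_n(1)$ (i.e. bounded by $C_n$ for some quantity $C_n$ depending only on $n$).
- (Nilpotent-type structure) $G'$ is abelian.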
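A key observation in the proof of Jordan’s theorem is that if two unitary elements $g, h \in U_n(\mathbb{C})$ are close to the identity, then their commutator $[g,h] := g^{-1} h^{-1} g h$ is even closer to the identity (in, say, the operator norm $\| \cdot \|_{op}$). Indeed, since multiplication on the left or right by unitary elements does not affect the operator norm, we have
$$\| [g,h] - 1 \|_{op} = \| gh - hg \|_{op} = \| (g-1)(h-1) - (h-1)(g-1) \|_{op},$$
and so by the triangle inequality
$$\| [g,h] - 1 \|_{op} \leq 2 \| g - 1 \|_{op} \| h - 1 \|_{op}. \qquad (2)$$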
Now we can prove Jordan’s theorem.
Proof: We induct on $n$, the case $n = 1$ being trivial. Suppose first that $G$ contains a central element $g$ which is not a multiple of the identity. Then, by definition, $G$ is contained in the centraliser $Z(g)$ of $g$, which by the spectral theorem is isomorphic to a product $U_{n_1}(\mathbb{C}) \times \ldots \times U_{n_k}(\mathbb{C})$ of smaller unitary groups. Projecting $G$ to each of these factor groups and applying the induction hypothesis, we obtain the claim.
Thus we may assume that $G$ contains no central elements other than multiples of the identity. Now pick a small $\varepsilon > 0$ (an explicit value depending only on $n$ would in fact suffice) and consider the subgroup $G'$ of $G$ generated by those elements of $G$ that are within $\varepsilon$ of the identity (in the operator norm). By considering a maximal $\varepsilon$-net of $G$ we see that $G'$ has index at most $O_{n,\varepsilon}(1)$ in $G$. By arguing as before, we may assume that $G'$ has no central elements other than multiples of the identity.
If $G'$ consists only of multiples of the identity, then we are done. If not, take an element $g$ of $G'$ that is not a multiple of the identity, and which is as close as possible to the identity (here is where we crucially use that $G$ is finite). By (2), we see that if $\varepsilon$ is sufficiently small depending on $n$, and if $h$ is one of the generators of $G'$, then $[g,h]$ lies in $G'$ and is closer to the identity than $g$, and is thus a multiple of the identity. On the other hand, $[g,h]$ has determinant $1$. Given that it is so close to the identity, it must therefore be the identity (if $\varepsilon$ is small enough). In other words, $g$ is central in $G'$, and is thus a multiple of the identity. But this contradicts the choice of $g$ as an element that is not a multiple of the identity, and we are done.
Commutator estimates such as (2) will play a fundamental role in many of the arguments we will see in this course; as we saw above, such estimates combine very well with a discreteness hypothesis, but will also be very useful in the continuous setting.
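The commutator estimate can be sanity-checked numerically. The sketch below assumes the estimate takes the form $\| [g,h] - 1 \|_{op} \leq 2 \| g - 1 \|_{op} \| h - 1 \|_{op}$ with $[g,h] = g^{-1} h^{-1} g h$, and tests it on random $2 \times 2$ special unitary matrices near the identity. The function names, the restriction to $2 \times 2$ matrices, and the sampling scheme are illustrative choices made here.

```python
import cmath
import math
import random

def mul(X, Y):
    """Product of two 2x2 complex matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(X):
    """Conjugate transpose; for a unitary matrix this is the inverse."""
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def sub_identity(X):
    """Return X - 1."""
    return [[X[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]

def op_norm(X):
    """Operator norm of a 2x2 matrix: square root of the top eigenvalue of X* X."""
    G = mul(adjoint(X), X)
    tr = (G[0][0] + G[1][1]).real
    det = (G[0][0] * G[1][1] - G[0][1] * G[1][0]).real
    disc = max(tr * tr - 4.0 * det, 0.0)
    return math.sqrt(max((tr + math.sqrt(disc)) / 2.0, 0.0))

def random_unitary_near_identity(rng, size):
    """Element [[a, -conj(b)], [b, conj(a)]] of SU(2) with |a|^2 + |b|^2 = 1."""
    theta = rng.uniform(-size, size)
    alpha = rng.uniform(-size, size)
    beta = rng.uniform(-math.pi, math.pi)
    a = math.cos(theta) * cmath.exp(1j * alpha)
    b = math.sin(theta) * cmath.exp(1j * beta)
    return [[a, -b.conjugate()], [b, a.conjugate()]]

def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h for unitary g, h."""
    return mul(mul(adjoint(g), adjoint(h)), mul(g, h))

def check_commutator_estimate(trials=200, size=0.1, seed=0):
    """Return True if the estimate holds (up to rounding) on every sampled pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = random_unitary_near_identity(rng, size)
        h = random_unitary_near_identity(rng, size)
        lhs = op_norm(sub_identity(commutator(g, h)))
        rhs = 2.0 * op_norm(sub_identity(g)) * op_norm(sub_identity(h))
        if lhs > rhs + 1e-12:
            return False
    return True

estimate_holds = check_commutator_estimate()
```

A passing check is of course no substitute for the two-line proof; its only purpose is to make the quadratic gain visible, since elements at distance about $0.1$ from the identity have commutators at distance at most about $0.02$.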
Exercise 3 Generalise Jordan’s theorem to the case when $G$ is a finite subgroup of $GL_n(\mathbb{C})$ rather than of $U_n(\mathbb{C})$. (Hint: The elements of $G$ are not necessarily unitary, and thus do not necessarily preserve the standard Hilbert inner product of $\mathbb{C}^n$. However, if one averages that inner product by the finite group $G$, one obtains a new inner product on $\mathbb{C}^n$ that is preserved by $G$, which allows one to conjugate $G$ to a subgroup of $U_n(\mathbb{C})$. This averaging trick is (a small) part of Weyl’s unitary trick in representation theory.)
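The averaging trick in the hint can be seen on a tiny example. The sketch below takes a cyclic group of order 3 generated by a real, non-orthogonal $2 \times 2$ matrix (an illustrative choice made here, as are all the helper names) and averages the standard inner product over the group. For real matrices the adjoint is just the transpose, so the averaged inner product is $\langle x, y \rangle_G = x^T H y$ with $H$ the average of $g^T g$ over the group.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose of a square matrix."""
    n = len(X)
    return [[X[j][i] for j in range(n)] for i in range(n)]

def averaged_form(group):
    """Gram matrix H = (1/|G|) * sum of g^T g over the group elements g."""
    n = len(group[0])
    total = [[Fraction(0)] * n for _ in range(n)]
    for g in group:
        gtg = mat_mul(transpose(g), g)
        total = [[total[i][j] + gtg[i][j] for j in range(n)] for i in range(n)]
    return [[x / len(group) for x in row] for row in total]

# Illustrative generator of order 3 (companion matrix of x^2 + x + 1).
I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
M = [[Fraction(0), Fraction(-1)], [Fraction(1), Fraction(-1)]]
group = [I2, M, mat_mul(M, M)]

H = averaged_form(group)

# M does not preserve the standard inner product...
standard_preserved = mat_mul(transpose(M), M) == I2
# ...but every group element preserves the averaged one: g^T H g = H.
averaged_preserved = all(
    mat_mul(mat_mul(transpose(g), H), g) == H for g in group
)
```

Exact rational arithmetic is used so that the invariance $g^T H g = H$ can be tested with equality rather than a tolerance. Conjugating by a square root of $H$ would then move the group into the orthogonal group, which is the step the exercise asks for.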
Exercise 4 (Inability to discretise nonabelian Lie groups) Show that if $n \geq 3$, then the orthogonal group $O_n(\mathbb{R})$ cannot contain arbitrarily dense finite subgroups, in the sense that there exists an $\varepsilon = \varepsilon_n > 0$ depending only on $n$ such that for every finite subgroup $G$ of $O_n(\mathbb{R})$, there exists a ball of radius $\varepsilon$ in $O_n(\mathbb{R})$ (with, say, the operator norm metric) that is disjoint from $G$. What happens in the $n = 2$ case?
Remark 1 More precise classifications of the finite subgroups of $U_n(\mathbb{C})$ are known, particularly in low dimensions. For instance, one can show that the only finite subgroups of $SO_3(\mathbb{R})$ (which $SU_2(\mathbb{C})$ is a double cover of) are isomorphic to either a cyclic group, a dihedral group, or the symmetry group of one of the Platonic solids.
I have blogged several times in the past about nonstandard analysis, which among other things is useful in allowing one to import tools from infinitary (or qualitative) mathematics in order to establish results in finitary (or quantitative) mathematics. One drawback, though, to using nonstandard analysis methods is that the bounds one obtains by such methods are usually ineffective: in particular, the conclusions of a nonstandard analysis argument may involve an unspecified constant that is known to be finite but for which no explicit bound is obviously available. (In many cases, a bound can eventually be worked out by performing proof mining on the argument, and in particular by carefully unpacking the proofs of all the various results from infinitary mathematics that were used in the argument, as opposed to simply using them as “black boxes”, but this is a time-consuming task and the bounds that one eventually obtains tend to be quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).)
Because of this fact, it would seem that quantitative bounds, such as polynomial type bounds that show that one quantity is controlled in a polynomial fashion by another quantity, are not easily obtainable through the ineffective methods of nonstandard analysis. Actually, this is not the case; as I will demonstrate by an example below, nonstandard analysis can certainly yield polynomial type bounds. The catch is that the exponent in such bounds will be ineffective; but nevertheless such bounds are still good enough for many applications.
Let us now illustrate this by reproving a lemma from this paper of Mei-Chu Chang (Lemma 2.14, to be precise), which was recently pointed out to me by Van Vu. Chang’s paper is focused primarily on the sum-product problem, but she uses a quantitative lemma from algebraic geometry which is of independent interest. To motivate the lemma, let us first establish a qualitative version:
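Lemma 1 (Qualitative solvability) Let $P_1, \ldots, P_r: \mathbb{C}^d \to \mathbb{C}$ be a finite number of polynomials with rational coefficients. If there is a complex solution $z \in \mathbb{C}^d$ to the simultaneous system of equations
$$P_1(z) = \ldots = P_r(z) = 0,$$
then there also exists a solution $z \in \overline{\mathbb{Q}}^d$ whose coefficients are algebraic numbers (i.e. they lie in the algebraic closure $\overline{\mathbb{Q}}$ of the rationals).
Proof: Suppose there was no solution to $P_1(z) = \ldots = P_r(z) = 0$ over $\overline{\mathbb{Q}}$. Applying Hilbert’s nullstellensatz (which is available as $\overline{\mathbb{Q}}$ is algebraically closed), we conclude the existence of some polynomials $Q_1, \ldots, Q_r$ (with coefficients in $\overline{\mathbb{Q}}$) such that
$$P_1 Q_1 + \ldots + P_r Q_r = 1$$
as polynomials. In particular, we have
$$P_1(z) Q_1(z) + \ldots + P_r(z) Q_r(z) = 1$$
for all $z \in \mathbb{C}^d$. This shows that there is no solution to $P_1(z) = \ldots = P_r(z) = 0$ over $\mathbb{C}$, as required.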
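To make the nullstellensatz step concrete, here is a small worked instance. The two polynomials and their multipliers below are chosen here purely for illustration; the names $P_1, P_2, Q_1, Q_2$ are labels for the polynomials and the certificate polynomials in this example only.

```latex
% Illustrative system in two variables with no common zero:
%   P_1(x,y) = xy - 1,   P_2(x,y) = x.
% (P_2 = 0 forces x = 0, and then P_1 = -1 is nonzero.)
% A nullstellensatz certificate is given by Q_1 = -1, Q_2 = y:
\begin{align*}
P_1 Q_1 + P_2 Q_2 &= -(xy - 1) + x \cdot y \\
                  &= 1 .
\end{align*}
% Evaluating at any point of \mathbb{C}^2 gives 1 on the right-hand side,
% so P_1 and P_2 cannot vanish simultaneously over \mathbb{C} either.
```

The point of the example is that the certificate is an identity of polynomials, so it remains valid after enlarging the field in which one looks for solutions.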
The above lemma asserts that if a system of rational equations is solvable at all, then it is solvable with some algebraic solution. But it gives no bound on the complexity of that solution in terms of the complexity of the original equation. Chang’s lemma provides such a bound. If $H \geq 1$ is an integer, let us say that an algebraic number has height at most $H$ if its minimal polynomial (after clearing denominators) consists of integers of magnitude at most $H$.
Lemma 2 (Quantitative solvability) Let $P_1, \ldots, P_r: \mathbb{C}^d \to \mathbb{C}$ be a finite number of polynomials of degree at most $D$ with rational coefficients, each of height at most $H$. If there is a complex solution $z \in \mathbb{C}^d$ to the simultaneous system of equations
$$P_1(z) = \ldots = P_r(z) = 0,$$
then there also exists a solution $z \in \overline{\mathbb{Q}}^d$ whose coefficients are algebraic numbers of degree at most $C$ and height at most $C H^C$, where $C = C_{D,d,r}$ depends only on $D$, $d$ and $r$.
Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in my own blog post). She also notes that one could establish the result through the machinery of Gröbner bases. In each of these arguments, it was not possible to use Lemma 1 (or the closely related nullstellensatz) as a black box; one actually had to unpack one of the proofs of that lemma or nullstellensatz to get the polynomial bound. However, using nonstandard analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant $C$) directly from Lemma 1 (or more precisely, the generalisation in Remark 1) without having to inspect the proof, and instead simply using it as a black box, thus providing a “soft” proof of Lemma 2 that is an alternative to the “hard” proofs mentioned above.
Here’s how the proof works. Informally, the idea is that Lemma 2 should follow from Lemma 1 after replacing the field of rationals with “the field of rationals of polynomially bounded height”. Unfortunately, the latter object does not really make sense as a field in standard analysis; nevertheless, it is a perfectly sensible object in nonstandard analysis, and this allows the above informal argument to be made rigorous.
We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a “compactness and contradiction” argument (or more precisely, an “ultralimit and contradiction” argument). Suppose for contradiction that Lemma 2 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exist $D, d, r$ such that for each natural number $n$, there is a positive integer $H^{(n)}$ and a family $P_1^{(n)}, \ldots, P_r^{(n)}: \mathbb{C}^d \to \mathbb{C}$ of polynomials of degree at most $D$ and rational coefficients of height at most $H^{(n)}$, such that there exists at least one complex solution $z^{(n)} \in \mathbb{C}^d$ to
$$P_1^{(n)}(z^{(n)}) = \ldots = P_r^{(n)}(z^{(n)}) = 0, \qquad (1)$$
but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most $n$ and height at most $n (H^{(n)})^n$.
Now we take ultralimits (see e.g. this previous blog post for a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let $p \in \beta \mathbb{N} \backslash \mathbb{N}$ be a non-principal ultrafilter. For each $i = 1, \ldots, r$, the ultralimit
$$P_i := \lim_{n \to p} P_i^{(n)}$$
of the (standard) polynomials $P_i^{(n)}$ is a nonstandard polynomial $P_i: {}^* \mathbb{C}^d \to {}^* \mathbb{C}$ of degree at most $D$, whose coefficients now lie in the nonstandard rationals ${}^* \mathbb{Q}$. Actually, due to the height restriction, we can say more. Let $H := \lim_{n \to p} H^{(n)} \in {}^* \mathbb{N}$ be the ultralimit of the $H^{(n)}$; this is a nonstandard natural number (which will almost certainly be unbounded, but we will not need to use this). Let us say that a nonstandard integer $a$ is of polynomial size if we have $|a| \leq C H^C$ for some standard natural number $C$, and say that a nonstandard rational number $a/b$ is of polynomial height if $a$, $b$ are of polynomial size. Let $\mathbb{Q}_{poly(H)}$ be the collection of all nonstandard rationals of polynomial height. (In the language of nonstandard analysis, $\mathbb{Q}_{poly(H)}$ is an external set rather than an internal one, because it is not itself an ultraproduct of standard sets; but this will not be relevant for the argument that follows.) It is easy to see that $\mathbb{Q}_{poly(H)}$ is a field, basically because the sum or product of two integers of polynomial size remains of polynomial size. By construction, it is clear that the coefficients of $P_i$ are nonstandard rationals of polynomial height, and thus $P_1, \ldots, P_r$ are defined over $\mathbb{Q}_{poly(H)}$.
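The closure of polynomial-size integers under sums and products can be checked by direct arithmetic. The sketch below assumes that “polynomial size” means a bound of the form $C H^C$, with $H \geq 1$ the height parameter and $C$ a standard natural number; the symbols $a, b, c, d$ and $C'$ are labels introduced here for the illustration.

```latex
% Assumption: |a|, |b|, |c|, |d| \le C H^C for a single standard C \ge 1
% (take the larger of the constants involved), and H \ge 1.
% Set C' := 2C^2 + 2C, so that C' \ge 2C^2 and C' \ge 2C.
\begin{align*}
|ad + bc| &\le 2 C^2 H^{2C} \le C' H^{C'}, \\
|ac|,\ |bd| &\le C^2 H^{2C} \le C' H^{C'} .
\end{align*}
% Hence the numerators and denominators in
%   a/b + c/d = (ad + bc)/(bd),   (a/b)(c/d) = (ac)/(bd),   (a/b)^{-1} = b/a
% are again of polynomial size, which is what the field axioms require.
```

The constant $C'$ is still a standard natural number, which is the only thing the definition asks for; no uniformity in $C$ is needed.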
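Meanwhile, if we let $z := \lim_{n \to p} z^{(n)} \in {}^* \mathbb{C}^d$ be the ultralimit of the solutions $z^{(n)}$ in (1), we have
$$P_1(z) = \ldots = P_r(z) = 0,$$
thus $P_1, \ldots, P_r$ are solvable in ${}^* \mathbb{C}$. Applying Lemma 1 (or more precisely, the generalisation in Remark 1), we see that $P_1, \ldots, P_r$ are also solvable in $\overline{\mathbb{Q}_{poly(H)}}$. (Note that as $\mathbb{C}$ is algebraically closed, ${}^* \mathbb{C}$ is also (by Los’s theorem), and so ${}^* \mathbb{C}$ contains $\overline{\mathbb{Q}_{poly(H)}}$.) Thus, there exists $w \in \overline{\mathbb{Q}_{poly(H)}}^d$ with
$$P_1(w) = \ldots = P_r(w) = 0.$$
As $\overline{\mathbb{Q}_{poly(H)}}^d$ lies in ${}^* \mathbb{C}^d$, we can write $w$ as an ultralimit $w = \lim_{n \to p} w^{(n)}$ of standard complex vectors $w^{(n)} \in \mathbb{C}^d$. By construction, the coefficients of $w$ each obey a non-trivial polynomial equation of degree at most $C$ and whose coefficients are nonstandard integers of magnitude at most $C H^C$, for some standard natural number $C$. Undoing the ultralimit, we conclude that for $n$ sufficiently close to $p$, the coefficients of $w^{(n)}$ obey a non-trivial polynomial equation of degree at most $C$ whose coefficients are standard integers of magnitude at most $C (H^{(n)})^C$. In particular, these coefficients have height at most $C (H^{(n)})^C$. Also, we have
$$P_1^{(n)}(w^{(n)}) = \ldots = P_r^{(n)}(w^{(n)}) = 0.$$
But for $n$ larger than $C$, this contradicts the construction of the $P_i^{(n)}$, and the claim follows. (Note that as $p$ is non-principal, any neighbourhood of $p$ in $\beta \mathbb{N}$ will contain arbitrarily large natural numbers.)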
Remark 2 The same argument actually gives a slightly stronger version of Lemma 2, namely that the integer coefficients used to define the algebraic solution $z$ can be taken to be polynomials in the coefficients of $P_1, \ldots, P_r$, with degree and coefficients bounded by $C_{D,d,r}$.
I recently reposted my favourite logic puzzle, namely the blue-eyed islander puzzle. I am fond of this puzzle because in order to properly understand the correct solution (and to properly understand why the alternative solution is incorrect), one has to think very clearly (but unintuitively) about the nature of knowledge.
There is however an additional subtlety to the puzzle that was pointed out in comments, in that the correct solution to the puzzle has two components, a (necessary) upper bound and a (possible) lower bound (I’ll explain this further below the fold, in order to avoid blatantly spoiling the puzzle here). Only the upper bound is correctly explained in the puzzle (and even then, there are some slight inaccuracies, as will be discussed below). The lower bound, however, is substantially more difficult to establish, in part because the bound is merely possible and not necessary. Ultimately, this is because to demonstrate the upper bound, one merely has to show that a certain statement is logically deducible from an islander’s state of knowledge, which can be done by presenting an appropriate chain of logical deductions. But to demonstrate the lower bound, one needs to show that certain statements are not logically deducible from an islander’s state of knowledge, which is much harder, as one has to rule out all possible chains of deductive reasoning from arriving at this particular conclusion. In fact, to rigorously establish such impossibility statements, one ends up having to leave the “syntactic” side of logic (deductive reasoning), and move instead to the dual “semantic” side of logic (creation of models). As we shall see, semantics requires substantially more mathematical setup than syntax, and the demonstration of the lower bound will therefore be much lengthier than that of the upper bound.
To complicate things further, the particular logic that is used in the blue-eyed islander puzzle is not the same as the logics that are commonly used in mathematics, namely propositional logic and first-order logic. Because the logical reasoning here depends so crucially on the concept of knowledge, one must work instead with an epistemic logic (or more precisely, an epistemic modal logic) which can properly work with, and model, the knowledge of various agents. To add even more complication, the role of time is also important (an islander may not know a certain fact on one day, but learn it on the next day), so one also needs to incorporate the language of temporal logic in order to fully model the situation. This makes both the syntax and semantics of the logic quite intricate; to see this, one only needs to contemplate the task of programming a computer with enough epistemic and temporal deductive reasoning powers that it would be able to solve the islander puzzle (or even a smaller version thereof, say with just three or four islanders) without being deliberately “fed” the solution. (The fact that humans can grasp the correct solution without any formal logical training is therefore quite remarkable.)
As difficult as the syntax of temporal epistemic modal logic is, though, the semantics is more intricate still. For instance, it turns out that in order to completely model the epistemic state of a finite number of agents (such as 1000 islanders), one requires an infinite model, due to the existence of arbitrarily long nested chains of knowledge (e.g. “$A$ knows that $B$ knows that $C$ knows that $D$ has blue eyes”), which cannot be automatically reduced to shorter chains of knowledge. Furthermore, because each agent has only an incomplete knowledge of the world, one must take into account multiple hypothetical worlds, which differ from the real world but which are considered to be possible worlds by one or more agents, thus introducing modality into the logic. More subtly, one must also consider worlds which each agent knows to be impossible, but are not commonly known to be impossible, so that (for instance) one agent is willing to admit the possibility that another agent considers that world to be possible; it is the consideration of such worlds which is crucial to the resolution of the blue-eyed islander puzzle. And this is even before one adds the temporal aspect (e.g. “On Tuesday, $A$ knows that on Monday, $B$ knew that by Wednesday, $C$ will know that $D$ has blue eyes”).
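For a handful of islanders, the possible-worlds picture can be simulated directly. The sketch below is a deliberately simplified finite model, not the hierarchical construction developed in this post: it assumes that after the public announcement it is common knowledge that at least one islander has blue eyes, that each islander sees every eye colour but their own, and that the absence of any departure on a given day is itself publicly observed. The function names and the encoding of a world as a tuple of booleans (`True` meaning blue) are choices made here.

```python
from itertools import product

def knowers(world, worlds):
    """Islanders who, in `world`, can deduce their own eye colour.

    Islander i considers possible exactly those worlds in `worlds` that agree
    with `world` except possibly at position i, so i knows their own colour
    precisely when the world with position i flipped has been ruled out.
    """
    result = set()
    for i in range(len(world)):
        flipped = world[:i] + (not world[i],) + world[i + 1:]
        if flipped not in worlds:
            result.add(i)
    return result

def departure_day(eyes):
    """Return (day, islanders) for the first day on which anyone can act."""
    eyes = tuple(eyes)
    if not any(eyes):
        raise ValueError("the announcement requires at least one blue-eyed islander")
    n = len(eyes)
    # Commonly possible worlds after the announcement: at least one blue-eyed islander.
    worlds = {w for w in product((False, True), repeat=n) if any(w)}
    day = 0
    while True:
        day += 1
        leavers = knowers(eyes, worlds)
        if leavers:
            return day, sorted(leavers)
        # Nobody acted today: publicly discard every world in which somebody
        # would have been able to deduce their own eye colour.
        worlds = {w for w in worlds if not knowers(w, worlds)}

result_two_blue = departure_day((True, True, False))
result_three_blue = departure_day((True, True, True, False))
```

In this simplified model the blue-eyed islanders are the first to deduce their eye colour, on the day equal to their number. The simulation only tracks which worlds remain commonly possible, so it illustrates the semantic side of the reasoning and does not establish the lower bound rigorously.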
Despite all this fearsome complexity, it is still possible to set up both the syntax and semantics of temporal epistemic modal logic in such a way that one can formulate the blue-eyed islander problem rigorously, and in such a way that one has both an upper and a lower bound in the solution. The purpose of this post is to construct such a setup and to explain the lower bound in particular. The same logic is also useful for analysing another well-known paradox, the unexpected hanging paradox, and I will do so at the end of the post. Note though that there is more than one way to set up epistemic logics, and they are not all equivalent to each other.
(On the other hand, for puzzles such as the islander puzzle in which there are only a finite number of atomic propositions and no free variables, one at least can avoid the need to admit predicate logic, in which one has to discuss quantifiers such as $\forall$ and $\exists$. A fully formed predicate temporal epistemic modal logic would indeed be of terrifying complexity.)
Our approach here will be a little different from the approach commonly found in the epistemic logic literature, in which one jumps straight to “arbitrary-order epistemic logic” in which arbitrarily long nested chains of knowledge (“$A$ knows that $B$ knows that $C$ knows that $\ldots$”) are allowed. Instead, we will adopt a hierarchical approach, recursively defining for $k = 0, 1, 2, \ldots$ a “$k^{th}$-order epistemic logic” in which knowledge chains of depth up to $k$, but no greater, are permitted. The arbitrary-order epistemic logic is then obtained as a limit (a direct limit on the syntactic side, and an inverse limit on the semantic side, which is dual to the syntactic side) of the finite order epistemic logics.
I should warn that this is going to be a rather formal and mathematical post. Readers who simply want to know the answer to the islander puzzle would probably be better off reading the discussion at the puzzle’s own blog post instead.
One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.
As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.
Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and suprema, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult. For particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.
Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals $\mathbb{Q}$ or the reals $\mathbb{R}$ are algebraically incomplete, because there are some non-trivial algebraic equations (such as $x^2 = 2$ in the case of the rationals, or $x^2 = -1$ in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as $1 = 0$, just using the laws of algebra), but do not actually have solutions in the specified field.
Similarly, the rationals $\mathbb{Q}$, when viewed now as a metric space rather than as a field, are also metrically incomplete, because there exist sequences in the rationals (e.g. the decimal approximations of a fixed irrational number) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.
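Metric incompleteness is easy to see in exact rational arithmetic. The sketch below uses the decimal truncations of $\sqrt{2}$ as the Cauchy sequence; the choice of $\sqrt{2}$, the helper name `sqrt2_truncation`, and the number of terms are assumptions made here for illustration.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(k):
    """Largest rational with denominator 10^k whose square is at most 2."""
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

# First few decimal truncations: 1, 1.4, 1.41, 1.414, ...
approximations = [sqrt2_truncation(k) for k in range(12)]

# Consecutive terms get closer together (the sequence is Cauchy)...
gaps = [approximations[k + 1] - approximations[k] for k in range(11)]
is_cauchy_like = all(0 <= gaps[k] < Fraction(1, 10 ** k) for k in range(11))

# ...yet no term squares to 2 exactly, and the defects shrink towards zero.
defects = [2 - x * x for x in approximations]
never_exact = all(d > 0 for d in defects)
```

Every term is rational and the defects `2 - x*x` tend to zero, so any limit would have to square to 2. No rational does, which is exactly the failure of convergence inside the rationals.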
A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Fraenkel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because they do not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but are not actually provable in this theory.
A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line $\mathbb{R}$ is elementarily incomplete, because there exists a sequence of statements (such as the statements $x > n$ for natural numbers $n$) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number $x$) but are not actually simultaneously satisfiable in this theory.
In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field $k$, one can take its algebraic completion (or algebraic closure) $\overline{k}$; for instance, $\mathbb{C} = \overline{\mathbb{R}}$ can be viewed as the algebraic completion of $\mathbb{R}$. This field is usually significantly larger than the original field $k$, but contains $k$ as a subfield, and every element of $\overline{k}$ can be described as the solution to some polynomial equation with coefficients in $k$. Furthermore, $\overline{k}$ is now algebraically complete (or algebraically closed): every polynomial equation in $\overline{k}$ which is potentially satisfiable (in the sense that it does not lead to a contradiction such as $1 = 0$ from the laws of algebra), is actually satisfiable in $\overline{k}$.
Similarly, starting with an arbitrary metric space $X$, one can take its metric completion $\overline{X}$; for instance, $\mathbb{R} = \overline{\mathbb{Q}}$ can be viewed as the metric completion of $\mathbb{Q}$. Again, the completion $\overline{X}$ is usually much larger than the original metric space $X$, but contains $X$ as a subspace, and every element of $\overline{X}$ can be described as the limit of some Cauchy sequence in $X$. Furthermore, $\overline{X}$ is now a complete metric space: every sequence in $\overline{X}$ which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convergent in $\overline{X}$.
In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory $T$ for a first-order language $L$, there exists at least one completion $\overline{T}$ of that theory $T$, which is a consistent theory in which every sentence in $L$ which is potentially true in $\overline{T}$ (because it does not lead to a contradiction in $\overline{T}$) is actually true in $\overline{T}$. Indeed, the completeness theorem provides at least one model (or structure) $\mathfrak{U}$ of the consistent theory $T$, and then the completion $\overline{T}$ can be formed by interpreting every sentence in $L$ using $\mathfrak{U}$ to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory $T$ can have multiple inequivalent models $\mathfrak{U}$, giving rise to distinct completions of the same theory.
Finally, if one starts with an arbitrary structure $\mathfrak{U}$, one can form an elementary completion ${}^* \mathfrak{U}$ of it, which is a significantly larger structure which contains $\mathfrak{U}$ as a substructure, and such that every element of ${}^* \mathfrak{U}$ is an elementary limit of a sequence of elements in $\mathfrak{U}$ (I will define this term shortly). Furthermore, ${}^* \mathfrak{U}$ is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in ${}^* \mathfrak{U}$ (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure $\mathfrak{U}$. If $\mathfrak{U}$ is the standard universe of all the standard objects one considers in mathematics, then its elementary completion ${}^* \mathfrak{U}$ is known as the nonstandard universe, and is the setting for nonstandard analysis.
As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as $\mathbb{Q}$, then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers $\mathbb{Z}$, then the resulting completion ${}^* \mathbb{Z}$ will necessarily be uncountable.
However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.
In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.
This is the third in a series of posts on the “no self-defeating object” argument in mathematics – a powerful and useful argument based on formalising the observation that any object or structure that is so powerful that it can “defeat” even itself, cannot actually exist. This argument is used to establish many basic impossibility results in mathematics, such as Gödel’s theorem that it is impossible for any sufficiently sophisticated formal axiom system to prove its own consistency, Turing’s theorem that it is impossible for any sufficiently sophisticated programming language to solve its own halting problem, or Cantor’s theorem that it is impossible for any set to enumerate its own power set (and as a corollary, the natural numbers cannot enumerate the real numbers).
As remarked in the previous posts, many people who encounter these theorems can feel uneasy about their conclusions, and their method of proof; this seems to be particularly the case with regard to Cantor’s result that the reals are uncountable. In the previous post in this series, I focused on one particular aspect of the standard proofs which one might be uncomfortable with, namely their counterfactual nature, and observed that many of these proofs can be largely (though not completely) converted to non-counterfactual form. However, this does not fully dispel the sense that the conclusions of these theorems – that the reals are not countable, that the class of all sets is not itself a set, that truth cannot be captured by a predicate, that consistency is not provable, etc. – are highly unintuitive, and even objectionable to “common sense” in some cases.
How can intuition lead one to doubt the conclusions of these mathematical results? I believe that one reason is because these results are sensitive to the amount of vagueness in one’s mental model of mathematics. In the formal mathematical world, where every statement is either absolutely true or absolutely false with no middle ground, and all concepts require a precise definition (or at least a precise axiomatisation) before they can be used, then one can rigorously state and prove Cantor’s theorem, Gödel’s theorem, and all the other results mentioned in the previous posts without difficulty. However, in the vague and fuzzy world of mathematical intuition, in which one’s impression of the truth or falsity of a statement may be influenced by recent mental reference points, definitions are malleable and blurry with no sharp dividing lines between what is and what is not covered by such definitions, and key mathematical objects may be incompletely specified and thus “moving targets” subject to interpretation, then one can argue with some degree of justification that the conclusions of the above results are incorrect; in the vague world, it seems quite plausible that one can always enumerate all the real numbers “that one needs to”, one can always justify the consistency of one’s reasoning system, one can reason using truth as if it were a predicate, and so forth. The impossibility results only kick in once one tries to clear away the fog of vagueness and nail down all the definitions and mathematical statements precisely. (To put it another way, the no-self-defeating object argument relies very much on the disconnected, definite, and absolute nature of the boolean truth space in the rigorous mathematical world.)
One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion. For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let $n$ be a natural number such that …”), which may or may not apply to the particular value of $n$ one may have in mind. Or, if one ever argues by dividing into separate cases (e.g. “Case 1: $n$ is even. … Case 2: $n$ is odd. …”), then for any given $n$, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives. But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that $\sqrt{2}$ is equal to the ratio of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.
Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives. This can lead to confused exchanges such as the following:
Lecturer: “Theorem. Let $p$ be a prime number. Then…”
Student: “But how do you know that $p$ is a prime number? Couldn’t it be composite?”
Lecturer: “Now we see what the function $f$ does when we give it the input of $x + dx$ instead. …”
Student: “But didn’t you just say that the input was equal to $x$ just a moment ago?”
This is not to say that counterfactual thinking is not encountered at all outside of mathematics. For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history. Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result. The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing. Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning. For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it. And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.
Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses. Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.
Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics. Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set of real numbers is uncountable. This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics. (The only other two examples I know of that come close to doing this are the fact that the real number $0.999\ldots$ is equal to 1, and the solution to the blue-eyed islanders puzzle.)
Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist. Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.
I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction. Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.
It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses. This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter. This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.
For instance, consider the very first no-self-defeating result presented in the previous post:
Proposition 1 (No largest natural number). There does not exist a natural number that is larger than all the other natural numbers.
This is formulated in the “no self-defeating object” formulation. But it has a logically equivalent “any object can be defeated” form:
Proposition 1′. Given any natural number $n$, one can find another natural number $m$ which is larger than $n$.
Proof. Take $m := n+1$.
While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take $n = 100$, so that the proof gives $m = 101$), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number). So there is a sense in which Proposition 1′ is more “non-counterfactual” or “constructive” than the “counterfactual” Proposition 1.
Proposition 3. There are infinitely many primes.
can be recast in “all objects can be defeated” form as
Proposition 3′. Let $p_1, \ldots, p_n$ be a collection of primes. Then there exists a prime $q$ which is distinct from any of the primes $p_1, \ldots, p_n$.
Proof. Take $q$ to be any prime factor of $p_1 \ldots p_n + 1$ (for instance, one could take the smallest prime factor, if one wished to be completely concrete). Since $p_1 \ldots p_n + 1$ is not divisible by any of the primes $p_1, \ldots, p_n$, $q$ must be distinct from all of these primes.
One could argue that there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that $q$ was equal to one of the $p_1, \ldots, p_n$), but the proposition itself is not inherently counterfactual, as it does not make as patently impossible a hypothesis as a finite enumeration of the primes. Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today. Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say $2, 3, 5$, and it will actually exhibit a prime not in that list (in this case, $31$). The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
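To make the “constructive” nature of Proposition 3′ concrete, here is a minimal Python sketch of the recipe in its proof: multiply the given primes together, add one, and take the smallest prime factor of the result. The function names `smallest_prime_factor` and `new_prime` are illustrative choices, and trial division is used purely for simplicity rather than efficiency.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of an integer n >= 2, by trial division."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def new_prime(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    # The product plus one leaves remainder 1 on division by each listed prime,
    # so none of them can be a factor of it.
    return smallest_prime_factor(prod(primes) + 1)


print(new_prime([2, 3, 5]))              # 2*3*5 + 1 = 31, which is prime
print(new_prime([2, 3, 5, 7, 11, 13]))   # 30031 = 59 * 509, so this gives 59
```

Note that the second call shows that the product plus one need not itself be prime; the argument only needs one of its prime factors.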
[Note: the article below may make more sense if one first reviews the previous blog post on the "no self-defeating object". For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]
One of the most notorious open problems in functional analysis is the invariant subspace problem for Hilbert spaces, which I will state here as a conjecture:
Conjecture 1 (Invariant Subspace Problem, ISP0) Let $H$ be an infinite dimensional complex Hilbert space, and let $T: H \to H$ be a bounded linear operator. Then $H$ contains a proper closed invariant subspace $V$ (thus $TV \subset V$).
As stated this conjecture is quite infinitary in nature. Just for fun, I set myself the task of trying to find an equivalent reformulation of this conjecture that only involved finite-dimensional spaces and operators. This turned out to be somewhat difficult, but not entirely impossible, if one adopts a sufficiently generous version of “finitary” (cf. my discussion of how to finitise the infinitary pigeonhole principle). Unfortunately, the finitary formulation that I arrived at ended up being rather complicated (in particular, involving the concept of a “barrier”), and did not obviously suggest a path to resolving the conjecture; but it did at least provide some simpler finitary consequences of the conjecture which might be worth focusing on as subproblems.
I should point out that the arguments here are quite “soft” in nature and are not really addressing the heart of the invariant subspace problem; but I think it is still of interest to observe that this problem is not purely an infinitary problem, and does have some non-trivial finitary consequences.
I am indebted to Henry Towsner for many discussions on this topic.
In topology, a non-empty set $E$ is said to be connected if $E$ cannot be decomposed into two nontrivial subsets that are both closed and open relative to $E$, and path connected if any two points $x, y$ in $E$ can be connected by a path (i.e. there exists a continuous map $\gamma: [0,1] \to E$ with $\gamma(0) = x$ and $\gamma(1) = y$).
Path-connected sets are always connected, but the converse is not true, even in the model case of compact subsets of a Euclidean space. The classic counterexample is the set
$$E := \{ (0,y): -1 \leq y \leq 1 \} \cup \{ (x, \sin(\tfrac{1}{x})): 0 < x \leq 1 \} \qquad (1)$$
which is connected but not path-connected (there is no continuous path from $(0,1)$ to $(1,\sin(1))$).
Looking at the definitions of the two concepts, one notices a difference: the notion of path-connectedness is somehow a “positive” one, in the sense that a path-connected set can produce the existence of something (a path connecting two points $x$ and $y$) for a given type of input (in this case, a pair of points $x, y$). On the other hand, the notion of connectedness is a “negative” one, in that it asserts the non-existence of something (a non-trivial partition into clopen sets). To put it another way, it is relatively easy to convince someone that a set is path-connected (by providing a connecting path for every pair of points) or is disconnected (by providing a non-trivial partition into clopen sets) but if a set is not path-connected, or is connected, how can one easily convince someone of this fact? To put it yet another way: is there a reasonable certificate for connectedness (or for path-disconnectedness)?
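In the case of connectedness for compact subsets $E$ of Euclidean space, there is an answer as follows. If $\varepsilon > 0$, let us call two points $x, y$ in $E$ $\varepsilon$-connected if one can find a finite sequence $x_0 = x, x_1, \ldots, x_N = y$ of points in $E$, such that $|x_{i+1} - x_i| \leq \varepsilon$ for all $0 \leq i < N$; informally, one can jump from $x$ to $y$ in $E$ using jumps of length at most $\varepsilon$. Let us call $x_0, \ldots, x_N$ an $\varepsilon$-discrete path.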
Proposition 1 (Connectedness certificate for compact subsets of Euclidean space) Let $E \subset {\bf R}^d$ be compact and non-empty. Then the following are equivalent:
- $E$ is connected.
- Every pair of points in $E$ is $\varepsilon$-connected for every $\varepsilon > 0$.
Proof: Suppose first that $E$ is disconnected; then $E$ can be partitioned into two non-empty closed subsets $F, G$. Since $E$ is compact, $F, G$ are compact also, and so they are separated by some non-zero distance; choose $\varepsilon > 0$ smaller than this distance. But then it is clear that points in $F$ cannot be $\varepsilon$-connected to points in $G$, and the claim follows.
Conversely, suppose that there is a pair of points $x, y$ in $E$ and an $\varepsilon > 0$ such that $x, y$ are not $\varepsilon$-connected. Let $F$ be the set of all points in $E$ that are $\varepsilon$-connected to $x$. It is easy to check that $F$ is open, closed, and a proper subset of $E$; thus $E$ is disconnected.
We remark that the above proposition in fact works for any compact metric space. It is instructive to see how the points $(0,1)$ and $(1,\sin(1))$ are $\varepsilon$-connected in the set (1); the $\varepsilon$-discrete path follows the graph of $\sin(\frac{1}{x})$ backwards until one gets sufficiently close to the $y$-axis, at which point one “jumps” across to the $y$-axis to eventually reach $(0,1)$.
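For a finite set of sample points, the search for a discrete path with jumps of length at most a given threshold can be carried out mechanically. The following Python sketch uses a breadth-first search to do this; the function name `eps_discrete_path` and the toy sample are assumptions made for illustration, and a finite sample is of course only a stand-in for a genuine compact set.

```python
from collections import deque
from math import dist


def eps_discrete_path(points, start, goal, eps):
    """Breadth-first search for a discrete path with jumps of length at most eps.

    points: list of coordinate tuples (a finite stand-in for the set)
    start, goal: indices into points
    Returns the list of points along one such path, or None if there is none.
    """
    parent = {start: None}  # index -> index it was reached from
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == goal:
            # Walk back through the parents to recover the path.
            path = []
            while i is not None:
                path.append(points[i])
                i = parent[i]
            return path[::-1]
        for j, q in enumerate(points):
            if j not in parent and dist(points[i], q) <= eps:
                parent[j] = i
                queue.append(j)
    return None


sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
print(eps_discrete_path(sample, 0, 2, eps=1.0))  # [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(eps_discrete_path(sample, 0, 3, eps=1.0))  # None: the gap from 2 to 5 is too wide
print(eps_discrete_path(sample, 0, 3, eps=3.0))  # [(0.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
```

The set of indices reached when the search fails plays the role of the clopen set in the proof above: it is closed under jumps of length at most the threshold, and it omits the target point.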
It is also interesting to contrast the above proposition with path connectedness. Clearly, if two points are connected by a path, then they are $\varepsilon$-connected for every $\varepsilon > 0$ (because every continuous map $\gamma: [0,1] \to E$ is uniformly continuous); but from the analysis of the example (1) we see that the converse is not true. Roughly speaking, the various $\varepsilon$-discrete paths from $x$ to $y$ have to be “compatible” with each other in some sense in order to synthesise a continuous path from $x$ to $y$ in the limit (we will not make this precise here).
But this leaves two (somewhat imprecise) questions, which I do not know how to satisfactorily answer:
Question 1: Is there a good certificate for path disconnectedness, say for compact subsets $E$ of ${\bf R}^d$? One can construct lousy certificates, for instance one could look at all continuous paths in ${\bf R}^d$ joining two particular points in $E$, and verify that each one of them leaves $E$ at some point. But this is an “uncountable” certificate – it requires one to check an uncountable number of paths. In contrast, the certificate in Proposition 1 is basically a countable one (if one describes a compact set $E$ by describing a family of $\varepsilon$-nets for a countable sequence of $\varepsilon$ tending to zero). (Very roughly speaking, I would like a certificate that can somehow be “verified in countable time” in a suitable oracle model, as discussed in my previous post, though I have not tried to make this vague specification more rigorous.)
It is tempting to look at the equivalence classes of $E$ given by the relation of being connected by a path, but these classes need not be closed (as one can see with the example (1)) and it is not obvious to me how to certify that two such classes are not path-connected to each other.
Question 2: Is there a good certificate for connectedness for closed but unbounded subsets of ${\bf R}^d$? Proposition 1 fails in this case; consider for instance the set
$$E := \{ (x,0): x \in {\bf R} \} \cup \{ (x, \tfrac{1}{x}): x > 0 \}.$$
Any pair of points $x, y \in E$ is $\varepsilon$-connected for every $\varepsilon > 0$, and yet this set is disconnected.
The problem here is that as $\varepsilon$ gets smaller, the $\varepsilon$-discrete paths connecting a pair of points such as $(1,0)$ and $(1,1)$ have diameter going to infinity. One natural guess is then to require a uniform bound on the diameter, i.e. that for any pair of points $x, y$, there exists an $R > 0$ such that there is an $\varepsilon$-discrete path from $x$ to $y$ of diameter at most $R$ for every $\varepsilon > 0$. This does indeed force connectedness, but unfortunately not all connected sets have this property. Consider for instance the set
in ${\bf R}^2$, where
is a rectangular ellipse centered at the origin with minor diameter endpoints and major diameter endpoints , and
is a circle that connects the endpoint of to the point in . One can check that $E$ is a closed connected set, but the $\varepsilon$-discrete paths connecting with have unbounded diameter as $\varepsilon \to 0$.
Currently, I do not have any real progress on Question 1. For Question 2, I can only obtain the following strange “second-order” criterion for connectedness, that involves an unspecified gauge function $\delta$:
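Proposition 2 (Second-order connectedness certificate) Let $E$ be a closed non-empty subset of ${\bf R}^d$. Then the following are equivalent:
- $E$ is connected.
- For every monotone decreasing, strictly positive function $\delta: {\bf R}^+ \to {\bf R}^+$ and every $x, y \in E$, there exists a discrete path $x_0 = x, x_1, \ldots, x_N = y$ in $E$ such that $|x_{i+1} - x_i| \leq \delta(|x_i|)$ for all $0 \leq i < N$.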
Proof: This is proven in almost the same way as Proposition 1. If $E$ can be disconnected into two non-trivial sets $F, G$, then one can find a monotone decreasing gauge function $\delta$ such that for each ball $B(0,R)$, $F \cap B(0,R)$ and $G \cap B(0,R)$ are separated by at least $\delta(R)$, and then there is no discrete path from $F$ to $G$ in $E$ obeying the condition $|x_{i+1} - x_i| \leq \delta(|x_i|)$.
Conversely, if there exists a gauge function $\delta$ and two points $x, y \in E$ which cannot be connected by a discrete path in $E$ that obeys the condition $|x_{i+1} - x_i| \leq \delta(|x_i|)$, then if one sets $F$ to be all the points that can be reached from $x$ in this manner, one easily verifies that $F$ and $E \backslash F$ disconnect $E$.
It may be that this is somehow the “best” one can do, but I am not sure how to quantify this formally.
Anyway, I was curious if any of the readers here (particularly those with expertise in point-set topology or descriptive set theory) might be able to shed more light on these questions. (I also considered crossposting this to Math Overflow, but I think the question may be a bit too long (and vague) for that.)
(The original motivation for this question, by the way, stems from an attempt to use methods of topological group theory to attack questions in additive combinatorics, in the spirit of the paper of Hrushovski studied previously on this blog. The connection is rather indirect, though; I may discuss this more in a future post.)
The standard modern foundation of mathematics is constructed using set theory. With these foundations, the mathematical universe of objects one studies contains not only the “primitive” mathematical objects such as numbers and points, but also sets of these objects, sets of sets of objects, and so forth. (In a pure set theory, the primitive objects would themselves be sets as well; this is useful for studying the foundations of mathematics, but for most mathematical purposes it is more convenient, and less conceptually confusing, to refrain from modeling primitive objects as sets.) One has to carefully impose a suitable collection of axioms on these sets, in order to avoid paradoxes such as Russell’s paradox; but with a standard axiom system such as Zermelo-Fraenkel-Choice (ZFC), all actual paradoxes that we know of are eliminated. Still, one might be somewhat unnerved by the presence in set theory of statements which, while not genuinely paradoxical in a strict sense, are still highly unintuitive; Cantor’s theorem on the uncountability of the reals, and the Banach-Tarski paradox, are perhaps the two most familiar examples of this.
One may suspect that the reason for this unintuitive behaviour is the presence of infinite sets in one’s mathematical universe. After all, if one deals solely with finite sets, then there is no need to distinguish between countable and uncountable infinities, and Banach-Tarski type paradoxes cannot occur.
On the other hand, many statements in infinitary mathematics can be reformulated into equivalent statements in finitary mathematics (involving only finitely many points or numbers, etc.); I have explored this theme in a number of previous blog posts. So, one may ask: what is the finitary analogue of statements such as Cantor’s theorem or the Banach-Tarski paradox?
The finitary analogue of Cantor’s theorem is well-known: it is the assertion that $2^n > n$ for every natural number $n$, or equivalently that the power set of a finite set $A$ of $n$ elements cannot be enumerated by $A$ itself. Though this is not quite the end of the story; after all, one also has $n + 1 > n$ for every natural number $n$, or equivalently that the union $A \cup \{a\}$ of a finite set $A$ and an additional element $a$ cannot be enumerated by $A$ itself, but the former statement extends to the infinite case, while the latter one does not. What causes these two outcomes to be distinct?
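One standard way to see the first of these assertions is the diagonal construction, which can be run directly on finite sets. The Python sketch below takes an attempted enumeration of subsets of a finite set by its own elements and produces a subset that the enumeration misses; the name `diagonal_set` and the sample data are hypothetical choices made for illustration only.

```python
def diagonal_set(A, f):
    """Return the diagonal set D = {a in A : a not in f[a]}.

    A: a finite set
    f: a dict sending each element of A to a subset of A
    D differs from f[a] at the element a itself, so D is not in the image of f.
    """
    return frozenset(a for a in A if a not in f[a])


A = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {0, 2}}  # an attempted enumeration of subsets of A

D = diagonal_set(A, f)
print(D)                             # frozenset({1})
print(all(D != f[a] for a in A))     # True: D is missed by the enumeration
```

In the spirit of the “any object can be defeated” formulation, the construction does not assume that a full enumeration exists; it simply defeats whichever candidate enumeration it is given.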
On the other hand, it is less obvious what the finitary version of the Banach-Tarski paradox is. Note that this paradox is available only in three and higher dimensions, but not in one or two dimensions; so presumably a finitary analogue of this paradox should also make the same distinction between low and high dimensions.
I therefore set myself the exercise of trying to phrase Cantor’s theorem and the Banach-Tarski paradox in a more “finitary” language. It seems that the easiest way to accomplish this is to avoid the use of set theory, and replace sets by some other concept. Taking inspiration from theoretical computer science, I decided to replace concepts such as functions and sets by the concepts of algorithms and oracles instead, with various constructions in set theory being replaced instead by computer language pseudocode. The point of doing this is that one can now add a new parameter to the universe, namely the amount of computational resources one is willing to allow one’s algorithms to use. At one extreme, one can enforce a “strict finitist” viewpoint where the total computational resources available (time and memory) are bounded by some numerical constant, such as $10^{100}$; roughly speaking, this causes any mathematical construction to break down once its complexity exceeds this number. Or one can take the slightly more permissive “finitist” or “constructivist” viewpoint, where any finite amount of computational resource is permitted; or one can then move up to allowing any construction indexed by a countable ordinal, or the storage of any array of countable size. Finally one can allow constructions indexed by arbitrary ordinals (i.e. transfinite induction) and arrays of arbitrary infinite size, at which point the theory becomes more or less indistinguishable from standard set theory.
I describe this viewpoint, and how statements such as Cantor’s theorem and Banach-Tarski are interpreted with this viewpoint, below the fold. I should caution that this is a conceptual exercise rather than a rigorous one; I have not attempted to formalise these notions to the same extent that set theory is formalised. Thus, for instance, I have no explicit system of axioms that algorithms and oracles are supposed to obey. Of course, these formal issues have been explored in great depth by logicians over the past century or so, but I do not wish to focus on these topics in this post.
A second caveat is that the actual semantic content of this post is going to be extremely low. I am not going to provide any genuinely new proof of Cantor’s theorem, or give a new construction of Banach-Tarski type; instead, I will be reformulating the standard proofs and constructions in a different language. Nevertheless I believe this viewpoint is somewhat clarifying as to the nature of these paradoxes, and as to how they are not as fundamentally tied to the nature of sets or the nature of infinity as one might first expect.