
This fall (starting Monday, September 26), I will be teaching a graduate topics course which I have entitled “Hilbert’s fifth problem and related topics.” The course is going to focus on three related topics:

  • Hilbert’s fifth problem on the topological description of Lie groups, as well as the closely related (local) classification of locally compact groups (the Gleason-Yamabe theorem).
  • Approximate groups in nonabelian groups, and their classification via the Gleason-Yamabe theorem (this is very recent work of Emmanuel Breuillard, Ben Green, Tom Sanders, and myself, building upon earlier work of Hrushovski);
  • Gromov’s theorem on groups of polynomial growth, as proven via the classification of approximate groups (as well as some consequences to fundamental groups of Riemannian manifolds).

I have already blogged about these topics repeatedly in the past (particularly with regard to Hilbert’s fifth problem), and I intend to recycle some of that material in the lecture notes for this course.

The above three families of results exemplify two broad principles (part of what I like to call “the dichotomy between structure and randomness“):

  • (Rigidity) If a group-like object exhibits a weak amount of regularity, then it (or a large portion thereof) often automatically exhibits a strong amount of regularity as well;
  • (Structure) This strong regularity manifests itself either as Lie type structure (in continuous settings) or nilpotent type structure (in discrete settings). (In some cases, “nilpotent” should be replaced by sister properties such as “abelian“, “solvable“, or “polycyclic“.)

Let me illustrate what I mean by these two principles with two simple examples, one in the continuous setting and one in the discrete setting. We begin with a continuous example. Given an {n \times n} complex matrix {A \in M_n({\bf C})}, define the matrix exponential {\exp(A)} of {A} by the formula

\displaystyle  \exp(A) := \sum_{k=0}^\infty \frac{A^k}{k!} = 1 + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \ldots

which can easily be verified to be an absolutely convergent series.

Exercise 1 Show that the map {A \mapsto \exp(A)} is a real analytic (and even complex analytic) map from {M_n({\bf C})} to {M_n({\bf C})}, and obeys the restricted homomorphism property

\displaystyle  \exp(sA) \exp(tA) = \exp((s+t)A) \ \ \ \ \ (1)

for all {A \in M_n({\bf C})} and {s,t \in {\bf C}}.
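
As a quick aside, the series definition of the matrix exponential and the identity (1) are easy to check numerically. Here is a minimal illustrative sketch in Python (assuming the numpy and scipy libraries; the random matrix is just a made-up example), not part of the formal development:

```python
# Illustrative numerical check of the series definition of exp(A) and of the
# homomorphism identity (1).  Assumes numpy and scipy.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Partial sums of sum_k A^k / k! converge rapidly to the matrix exponential.
partial = np.eye(3, dtype=complex)
term = np.eye(3, dtype=complex)
for k in range(1, 30):
    term = term @ A / k          # term is now A^k / k!
    partial += term
print(np.linalg.norm(partial - expm(A)))        # should be tiny

# The restricted homomorphism property exp(sA) exp(tA) = exp((s+t)A).
s, t = 0.7, -1.3
print(np.linalg.norm(expm(s * A) @ expm(t * A) - expm((s + t) * A)))  # tiny
```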

Proposition 1 (Rigidity and structure of matrix homomorphisms) Let {n} be a natural number. Let {GL_n({\bf C})} be the group of invertible {n \times n} complex matrices. Let {\Phi: {\bf R} \rightarrow GL_n({\bf C})} be a map obeying two properties:

  • (Group-like object) {\Phi} is a homomorphism, thus {\Phi(s) \Phi(t) = \Phi(s+t)} for all {s,t \in {\bf R}}.
  • (Weak regularity) The map {t \mapsto \Phi(t)} is continuous.

Then:

  • (Strong regularity) The map {t \mapsto \Phi(t)} is smooth (i.e. infinitely differentiable). In fact it is even real analytic.
  • (Lie-type structure) There exists a (unique) complex {n \times n} matrix {A} such that {\Phi(t) = \exp(tA)} for all {t \in {\bf R}}.

Proof: Let {\Phi} be as above. Let {\epsilon > 0} be a small number (depending only on {n}). By the homomorphism property, {\Phi(0) = 1} (where we use {1} here to denote the identity element of {GL_n({\bf C})}), and so by continuity we may find a small {t_0>0} such that {\Phi(t) = 1 + O(\epsilon)} for all {t \in [-t_0,t_0]} (we use some arbitrary norm here on the space of {n \times n} matrices, and allow implied constants in the {O()} notation to depend on {n}).

The map {A \mapsto \exp(A)} is real analytic and (by the inverse function theorem) is a diffeomorphism near {0}. Thus, if {\epsilon} is small enough, we can find a matrix {B} of size {O(\epsilon)} such that {\Phi(t_0) = \exp(B)}. By the homomorphism property and (1), we thus have

\displaystyle  \Phi(t_0/2)^2 = \Phi(t_0) = \exp(B) = \exp(B/2)^2.

On the other hand, by another application of the inverse function theorem we see that the squaring map {A \mapsto A^2} is a diffeomorphism near {1} in {GL_n({\bf C})}, and thus (if {\epsilon} is small enough)

\displaystyle  \Phi(t_0/2) = \exp(B/2).

We may iterate this argument (for a fixed, but small, value of {\epsilon}) and conclude that

\displaystyle  \Phi(t_0/2^k) = \exp(B/2^k)

for all {k = 0,1,2,\ldots}. By the homomorphism property and (1) we thus have

\displaystyle  \Phi(qt_0) = \exp(qB)

whenever {q} is a dyadic rational, i.e. a rational of the form {a/2^k} for some integer {a} and natural number {k}. By continuity we thus have

\displaystyle  \Phi(st_0) = \exp(sB)

for all real {s}. Setting {A := B/t_0} we conclude that

\displaystyle  \Phi(t) = \exp(tA)

for all real {t}, which gives existence of the representation and also real analyticity and smoothness. Finally, uniqueness of the representation {\Phi(t) = \exp(tA)} follows from the identity

\displaystyle  A = \frac{d}{dt} \exp(tA)|_{t=0}.

\Box
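
As another aside, the conclusion of Proposition 1 can be explored numerically: for a one-parameter group {t \mapsto \Phi(t)}, a principal matrix logarithm of {\Phi(t_0)} with {t_0} small plays the role of the matrix {B} above and recovers the generator {A}. A hedged sketch (assuming numpy and scipy; the particular generator below is chosen purely for illustration):

```python
# Illustrative sketch: recover the generator A of a one-parameter matrix group
# t -> Phi(t) from a single sample Phi(t0) with t0 small, in the spirit of the
# matrix B appearing in the proof above.  Assumes numpy/scipy.
import numpy as np
from scipy.linalg import expm, logm

A_true = np.array([[0.0, 1.0], [-1.0, 0.2]])   # hypothetical generator (illustration only)
Phi = lambda t: expm(t * A_true)               # a continuous homomorphism R -> GL_2

t0 = 0.05                                      # small, so Phi(t0) is close to 1
B = logm(Phi(t0))                              # principal logarithm; here B = t0 * A
A = B / t0
print(np.linalg.norm(A - A_true))              # tiny

# The representation Phi(t) = exp(tA) then holds for all t:
print(np.linalg.norm(Phi(2.7) - expm(2.7 * A)))    # tiny
```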

Exercise 2 Generalise Proposition 1 by replacing the hypothesis that {\Phi} is continuous with the hypothesis that {\Phi} is Lebesgue measurable (Hint: use the Steinhaus theorem). Show that the proposition fails (assuming the axiom of choice) if this hypothesis is omitted entirely.

Note how one needs both the group-like structure and the weak regularity in combination in order to ensure the strong regularity; neither is sufficient on its own. We will see variants of the above basic argument throughout the course. Here, the task of obtaining smooth (or real analytic) structure was relatively easy, because we could borrow the smooth (or real analytic) structure of the domain {{\bf R}} and range {M_n({\bf C})}; but, somewhat remarkably, we shall see that one can still build such smooth or analytic structures even when none of the original objects have any such structure to begin with.

Now we turn to a second illustration of the above principles, namely Jordan’s theorem, which uses a discreteness hypothesis to upgrade Lie type structure to nilpotent (and in this case, abelian) structure. We shall formulate Jordan’s theorem in a slightly stilted fashion in order to emphasise the adherence to the above-mentioned principles.

Theorem 2 (Jordan’s theorem) Let {G} be an object with the following properties:

  • (Group-like object) {G} is a group.
  • (Discreteness) {G} is finite.
  • (Lie-type structure) {G} is contained in {U_n({\bf C})} (the group of unitary {n \times n} matrices) for some {n}.

Then there is a subgroup {G'} of {G} such that

  • ({G'} is close to {G}) The index {|G/G'|} of {G'} in {G} is {O_n(1)} (i.e. bounded by {C_n} for some quantity {C_n} depending only on {n}).
  • (Nilpotent-type structure) {G'} is abelian.

A key observation in the proof of Jordan’s theorem is that if two unitary elements {g, h \in U_n({\bf C})} are close to the identity, then their commutator {[g,h] = g^{-1}h^{-1}gh} is even closer to the identity (in, say, the operator norm {\| \|_{op}}). Indeed, since multiplication on the left or right by unitary elements does not affect the operator norm, we have

\displaystyle  \| [g,h] - 1 \|_{op} = \| gh - hg \|_{op}

\displaystyle  = \| (g-1)(h-1) - (h-1)(g-1) \|_{op}

and so by the triangle inequality

\displaystyle  \| [g,h] - 1 \|_{op} \leq 2 \|g-1\|_{op} \|h-1\|_{op}. \ \ \ \ \ (2)
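
The estimate (2) is easy to confirm by a quick numerical experiment; here is an illustrative sketch (assuming numpy and scipy; the random unitaries are generated just for this purpose) that samples unitary matrices near the identity and checks the inequality:

```python
# Quick numerical check of the commutator estimate (2); illustration only.
# Assumes numpy and scipy.  Unitaries near the identity are produced as
# exp(i * eps * H) with H a small random Hermitian matrix.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
op_norm = lambda M: np.linalg.norm(M, 2)       # operator norm (largest singular value)

def unitary_near_identity(n, eps):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (X + X.conj().T) / 2                   # Hermitian
    return expm(1j * eps * H)                  # unitary, within O(eps) of the identity

n = 4
I = np.eye(n)
for _ in range(5):
    g = unitary_near_identity(n, 0.1)
    h = unitary_near_identity(n, 0.1)
    comm = g.conj().T @ h.conj().T @ g @ h     # [g,h] = g^{-1} h^{-1} g h
    lhs = op_norm(comm - I)
    rhs = 2 * op_norm(g - I) * op_norm(h - I)
    print(lhs <= rhs, lhs, rhs)                # the inequality (2) holds each time
```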

Now we can prove Jordan’s theorem.

Proof: We induct on {n}, the case {n=1} being trivial. Suppose first that {G} contains a central element {g} which is not a multiple of the identity. Then, by definition, {G} is contained in the centraliser {Z(g)} of {g}, which by the spectral theorem is isomorphic to a product {U_{n_1}({\bf C}) \times \ldots \times U_{n_k}({\bf C})} of smaller unitary groups. Projecting {G} to each of these factor groups and applying the induction hypothesis, we obtain the claim.

Thus we may assume that {G} contains no central elements other than multiples of the identity. Now pick a small {\epsilon > 0} (one could take {\epsilon=\frac{1}{10n}} in fact) and consider the subgroup {G'} of {G} generated by those elements of {G} that are within {\epsilon} of the identity (in the operator norm). By considering a maximal {\epsilon}-net of {G} we see that {G'} has index at most {O_{n,\epsilon}(1)} in {G}. By arguing as before, we may assume that {G'} has no central elements other than multiples of the identity.

If {G'} consists only of multiples of the identity, then we are done. If not, take an element {g} of {G'} that is not a multiple of the identity, and which is as close as possible to the identity (here is where we crucially use that {G} is finite). By (2), we see that if {\epsilon} is sufficiently small depending on {n}, and if {h} is one of the generators of {G'}, then {[g,h]} lies in {G'} and is closer to the identity than {g}, and is thus a multiple of the identity. On the other hand, {[g,h]} has determinant {1}. Given that it is so close to the identity, it must therefore be the identity (if {\epsilon} is small enough). In other words, {g} is central in {G'}, and is thus a multiple of the identity. But this contradicts the hypothesis that there are no central elements other than multiples of the identity, and we are done. \Box

Commutator estimates such as (2) will play a fundamental role in many of the arguments we will see in this course; as we saw above, such estimates combine very well with a discreteness hypothesis, but will also be very useful in the continuous setting.

Exercise 3 Generalise Jordan’s theorem to the case when {G} is a finite subgroup of {GL_n({\bf C})} rather than of {U_n({\bf C})}. (Hint: The elements of {G} are not necessarily unitary, and thus do not necessarily preserve the standard Hilbert inner product of {{\bf C}^n}. However, if one averages that inner product by the finite group {G}, one obtains a new inner product on {{\bf C}^n} that is preserved by {G}, which allows one to conjugate {G} to a subgroup of {U_n({\bf C})}. This averaging trick is (a small) part of Weyl’s unitary trick in representation theory.)
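
For readers who like to see the averaging trick in action, here is a small illustrative sketch (assuming numpy; the particular non-unitary copy of {S_3} below is just a made-up example) which averages the standard inner product over a finite matrix group and then conjugates that group into the unitary group:

```python
# Illustrative sketch of the averaging trick: start from a finite but
# non-unitary matrix group, average the standard inner product over the group,
# and conjugate the group into the unitary group.  Assumes numpy.
import numpy as np

def perm_matrix(p):
    n = len(p)
    M = np.zeros((n, n))
    M[np.arange(n), list(p)] = 1.0
    return M

# A copy of S_3 inside GL_3(R), conjugated by an arbitrary invertible matrix
# (a made-up example) so that its elements are no longer unitary.
perms = [(0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0), (1, 2, 0), (2, 0, 1)]
S = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
G = [S @ perm_matrix(p) @ np.linalg.inv(S) for p in perms]

# The averaged form M = (1/|G|) sum_g g^* g is positive definite and
# G-invariant: g^* M g = M for every g in G.
M = sum(g.conj().T @ g for g in G) / len(G)
P = np.linalg.cholesky(M).conj().T             # M = P^* P

# Consequently P g P^{-1} is unitary for every g in G.
for g in G:
    u = P @ g @ np.linalg.inv(P)
    print(np.linalg.norm(u.conj().T @ u - np.eye(3)))   # all tiny
```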

Exercise 4 (Inability to discretise nonabelian Lie groups) Show that if {n \geq 3}, then the orthogonal group {O_n({\bf R})} cannot contain arbitrarily dense finite subgroups, in the sense that there exists an {\epsilon = \epsilon_n > 0} depending only on {n} such that for every finite subgroup {G} of {O_n({\bf R})}, there exists a ball of radius {\epsilon} in {O_n({\bf R})} (with, say, the operator norm metric) that is disjoint from {G}. What happens in the {n=2} case?

Remark 1 More precise classifications of the finite subgroups of {U_n({\bf C})} are known, particularly in low dimensions. For instance, one can show that the only finite subgroups of {SO_3({\bf R})} (which {SU_2({\bf C})} is a double cover of) are isomorphic to either a cyclic group, a dihedral group, or the symmetry group of one of the Platonic solids.

Read the rest of this entry »

I have blogged several times in the past about nonstandard analysis, which among other things is useful in allowing one to import tools from infinitary (or qualitative) mathematics in order to establish results in finitary (or quantitative) mathematics. One drawback, though, to using nonstandard analysis methods is that the bounds one obtains by such methods are usually ineffective: in particular, the conclusions of a nonstandard analysis argument may involve an unspecified constant {C} that is known to be finite but for which no explicit bound is obviously available. (In many cases, a bound can eventually be worked out by performing proof mining on the argument, and in particular by carefully unpacking the proofs of all the various results from infinitary mathematics that were used in the argument, as opposed to simply using them as “black boxes”, but this is a time-consuming task and the bounds that one eventually obtains tend to be quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).)

Because of this fact, it would seem that quantitative bounds, such as polynomial type bounds {X \leq C Y^C} that show that one quantity {X} is controlled in a polynomial fashion by another quantity {Y}, are not easily obtainable through the ineffective methods of nonstandard analysis. Actually, this is not the case; as I will demonstrate by an example below, nonstandard analysis can certainly yield polynomial type bounds. The catch is that the exponent {C} in such bounds will be ineffective; but nevertheless such bounds are still good enough for many applications.

Let us now illustrate this by reproving a lemma from this paper of Mei-Chu Chang (Lemma 2.14, to be precise), which was recently pointed out to me by Van Vu. Chang’s paper is focused primarily on the sum-product problem, but she uses a quantitative lemma from algebraic geometry which is of independent interest. To motivate the lemma, let us first establish a qualitative version:

Lemma 1 (Qualitative solvability) Let {P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}} be a finite number of polynomials in several variables with rational coefficients. If there is a complex solution {z = (z_1,\ldots,z_d) \in {\bf C}^d} to the simultaneous system of equations

\displaystyle  P_1(z) = \ldots = P_r(z) = 0,

then there also exists a solution {z \in \overline{{\bf Q}}^d} whose coefficients are algebraic numbers (i.e. they lie in the algebraic closure {\overline{{\bf Q}}} of the rationals).

Proof: Suppose there was no solution to {P_1(z)=\ldots=P_r(z)=0} over {\overline{{\bf Q}}}. Applying Hilbert’s nullstellensatz (which is available as {\overline{{\bf Q}}} is algebraically closed), we conclude the existence of some polynomials {Q_1,\ldots,Q_r} (with coefficients in {\overline{{\bf Q}}}) such that

\displaystyle  P_1 Q_1 + \ldots + P_r Q_r = 1

as polynomials. In particular, we have

\displaystyle  P_1(z) Q_1(z) + \ldots + P_r(z) Q_r(z) = 1

for all {z \in {\bf C}^d}. This shows that there is no solution to {P_1(z)=\ldots=P_r(z)=0} over {{\bf C}}, as required. \Box
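
As a concrete (and purely illustrative) instance of Lemma 1, one can ask a computer algebra system for the solutions of a small rational system and observe that it reports them as algebraic numbers. A minimal sketch, assuming sympy:

```python
# Purely illustrative check of Lemma 1 (assumes sympy): the solutions of a
# solvable polynomial system with rational coefficients are reported exactly
# as algebraic numbers.
from sympy import symbols, solve

x, y = symbols('x y')
system = [x**2 + y**2 - 1, x - y]              # rational coefficients
print(solve(system, [x, y]))
# [(-sqrt(2)/2, -sqrt(2)/2), (sqrt(2)/2, sqrt(2)/2)]  -- algebraic coordinates
```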

Remark 1 Observe that in the above argument, one could replace {{\bf Q}} and {{\bf C}} by any other pair of fields, with the latter containing the algebraic closure of the former, and still obtain the same result.

The above lemma asserts that if a system of rational equations is solvable at all, then it is solvable with some algebraic solution. But it gives no bound on the complexity of that solution in terms of the complexity of the original system of equations. Chang’s lemma provides such a bound. If {H \geq 1} is an integer, let us say that an algebraic number has height at most {H} if the coefficients of its minimal polynomial (after clearing denominators) are integers of magnitude at most {H}.
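
As a quick illustration of this notion of height (a sketch only, assuming the sympy library), one can compute minimal polynomials and read off heights directly:

```python
# Illustrative computation of heights (assumes sympy): the height of an
# algebraic number is read off from its minimal polynomial over the integers.
from sympy import sqrt, Rational, minimal_polynomial, symbols

x = symbols('x')

# sqrt(2) + sqrt(3) has minimal polynomial x^4 - 10 x^2 + 1, hence height 10.
p = minimal_polynomial(sqrt(2) + sqrt(3), x)
print(p, max(abs(c) for c in p.as_poly(x).all_coeffs()))

# The rational 3/7 has minimal polynomial 7 x - 3 (denominators cleared),
# hence height 7.
q = minimal_polynomial(Rational(3, 7), x)
print(q, max(abs(c) for c in q.as_poly(x).all_coeffs()))
```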

Lemma 2 (Quantitative solvability) Let {P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}} be a finite number of polynomials of degree at most {D} with rational coefficients, each of height at most {H}. If there is a complex solution {z = (z_1,\ldots,z_d) \in {\bf C}^d} to the simultaneous system of equations

\displaystyle  P_1(z) = \ldots = P_r(z) = 0,

then there also exists a solution {z \in \overline{{\bf Q}}^d} whose coefficients are algebraic numbers of degree at most {C} and height at most {CH^C}, where {C = C_{D, d,r}} depends only on {D}, {d} and {r}.

Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in my own blog post). She also notes that one could also establish the result through the machinery of Gröbner bases. In each of these arguments, it was not possible to use Lemma 1 (or the closely related nullstellensatz) as a black box; one actually had to unpack one of the proofs of that lemma or nullstellensatz to get the polynomial bound. However, using nonstandard analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant {C}) directly from Lemma 1 (or more precisely, the generalisation in Remark 1) without having to inspect the proof, and instead simply using it as a black box, thus providing a “soft” proof of Lemma 2 that is an alternative to the “hard” proofs mentioned above.

Here’s how the proof works. Informally, the idea is that Lemma 2 should follow from Lemma 1 after replacing the field of rationals {{\bf Q}} with “the field of rationals of polynomially bounded height”. Unfortunately, the latter object does not really make sense as a field in standard analysis; nevertheless, it is a perfectly sensible object in nonstandard analysis, and this allows the above informal argument to be made rigorous.

We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a “compactness and contradiction” argument (or more precisely, an “ultralimit and contradiction” argument). Suppose for contradiction that Lemma 2 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exist {D, d, r} such that for each natural number {n}, there is a positive integer {H^{(n)}} and a family {P_1^{(n)}, \ldots, P_r^{(n)}: {\bf C}^d \rightarrow {\bf C}} of polynomials of degree at most {D} with rational coefficients of height at most {H^{(n)}}, such that there exists at least one complex solution {z^{(n)} \in {\bf C}^d} to

\displaystyle  P_1^{(n)}(z^{(n)}) = \ldots = P_r^{(n)}(z^{(n)}) = 0, \ \ \ \ \ (1)

but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most {n} and height at most {n (H^{(n)})^n}.

Now we take ultralimits (see e.g. this previous blog post for a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let {p \in \beta {\bf N} \backslash {\bf N}} be a non-principal ultrafilter. For each {i=1,\ldots,r}, the ultralimit

\displaystyle  P_i := \lim_{n \rightarrow p} P_i^{(n)}

of the (standard) polynomials {P_i^{(n)}} is a nonstandard polynomial {P_i: {}^* {\bf C}^d \rightarrow {}^* {\bf C}} of degree at most {D}, whose coefficients now lie in the nonstandard rationals {{}^* {\bf Q}}. Actually, due to the height restriction, we can say more. Let {H := \lim_{n \rightarrow p} H^{(n)} \in {}^* {\bf N}} be the ultralimit of the {H^{(n)}}; this is a nonstandard natural number (which will almost certainly be unbounded, but we will not need to use this). Let us say that a nonstandard integer {a} is of polynomial size if we have {|a| \leq C H^C} for some standard natural number {C}, and say that a nonstandard rational number {a/b} is of polynomial height if {a}, {b} are of polynomial size. Let {{\bf Q}_{poly(H)}} be the collection of all nonstandard rationals of polynomial height. (In the language of nonstandard analysis, {{\bf Q}_{poly(H)}} is an external set rather than an internal one, because it is not itself an ultraproduct of standard sets; but this will not be relevant for the argument that follows.) It is easy to see that {{\bf Q}_{poly(H)}} is a field, basically because the sum or product of two integers of polynomial size remains of polynomial size. By construction, it is clear that the coefficients of {P_i} are nonstandard rationals of polynomial height, and thus {P_1,\ldots,P_r} are defined over {{\bf Q}_{poly(H)}}.

Meanwhile, if we let {z := \lim_{n \rightarrow p} z^{(n)} \in {}^* {\bf C}^d} be the ultralimit of the solutions {z^{(n)}} in (1), we have

\displaystyle  P_1(z) = \ldots = P_r(z) = 0,

thus {P_1,\ldots,P_r} are solvable in {{}^* {\bf C}}. Applying Lemma 1 (or more precisely, the generalisation in Remark 1), we see that {P_1,\ldots,P_r} are also solvable in {\overline{{\bf Q}_{poly(H)}}}. (Note that as {{\bf C}} is algebraically closed, {{}^*{\bf C}} is also (by Los’s theorem), and so {{}^* {\bf C}} contains {\overline{{\bf Q}_{poly(H)}}}.) Thus, there exists {w \in \overline{{\bf Q}_{poly(H)}}^d} with

\displaystyle  P_1(w) = \ldots = P_r(w) = 0.

As {\overline{{\bf Q}_{poly(H)}}^d} lies in {{}^* {\bf C}^d}, we can write {w} as an ultralimit {w = \lim_{n \rightarrow p} w^{(n)}} of standard complex vectors {w^{(n)} \in {\bf C}^d}. By construction, the coefficients of {w} each obey a non-trivial polynomial equation of degree at most {C} and whose coefficients are nonstandard integers of magnitude at most {C H^C}, for some standard natural number {C}. Undoing the ultralimit, we conclude that for {n} sufficiently close to {p}, the coefficients of {w^{(n)}} obey a non-trivial polynomial equation of degree at most {C} whose coefficients are standard integers of magnitude at most {C (H^{(n)})^C}. In particular, these coefficients have height at most {C (H^{(n)})^C}. Also, we have

\displaystyle  P_1^{(n)}(w^{(n)}) = \ldots = P_r^{(n)}(w^{(n)}) = 0.

But for {n} larger than {C}, this contradicts the construction of the {P_i^{(n)}}, and the claim follows. (Note that as {p} is non-principal, any neighbourhood of {p} in {\beta {\bf N}} will contain arbitrarily large natural numbers.)

Remark 2 The same argument actually gives a slightly stronger version of Lemma 2, namely that the integer coefficients used to define the algebraic solution {z} can be taken to be polynomials in the coefficients of {P_1,\ldots,P_r}, with degree and coefficients bounded by {C_{D,d,r}}.

I recently reposted my favourite logic puzzle, namely the blue-eyed islander puzzle. I am fond of this puzzle because in order to properly understand the correct solution (and to properly understand why the alternative solution is incorrect), one has to think very clearly (but unintuitively) about the nature of knowledge.

There is however an additional subtlety to the puzzle that was pointed out in comments, in that the correct solution to the puzzle has two components, a (necessary) upper bound and a (possible) lower bound (I’ll explain this further below the fold, in order to avoid blatantly spoiling the puzzle here). Only the upper bound is correctly explained in the puzzle (and even then, there are some slight inaccuracies, as will be discussed below). The lower bound, however, is substantially more difficult to establish, in part because the bound is merely possible and not necessary. Ultimately, this is because to demonstrate the upper bound, one merely has to show that a certain statement is logically deducible from an islander’s state of knowledge, which can be done by presenting an appropriate chain of logical deductions. But to demonstrate the lower bound, one needs to show that certain statements are not logically deducible from an islander’s state of knowledge, which is much harder, as one has to rule out every possible chain of deductive reasoning that might arrive at this particular conclusion. In fact, to rigorously establish such impossibility statements, one ends up having to leave the “syntactic” side of logic (deductive reasoning), and move instead to the dual “semantic” side of logic (creation of models). As we shall see, semantics requires substantially more mathematical setup than syntax, and the demonstration of the lower bound will therefore be much lengthier than that of the upper bound.

To complicate things further, the particular logic that is used in the blue-eyed islander puzzle is not the same as the logics that are commonly used in mathematics, namely propositional logic and first-order logic. Because the logical reasoning here depends so crucially on the concept of knowledge, one must work instead with an epistemic logic (or more precisely, an epistemic modal logic) which can properly work with, and model, the knowledge of various agents. To add even more complication, the role of time is also important (an islander may not know a certain fact on one day, but learn it on the next day), so one also needs to incorporate the language of temporal logic in order to fully model the situation. This makes both the syntax and semantics of the logic quite intricate; to see this, one only needs to contemplate the task of programming a computer with enough epistemic and temporal deductive reasoning powers that it would be able to solve the islander puzzle (or even a smaller version thereof, say with just three or four islanders) without being deliberately “fed” the solution. (The fact that humans can grasp the correct solution without any formal logical training is therefore quite remarkable.)

As difficult as the syntax of temporal epistemic modal logic is, though, the semantics is more intricate still. For instance, it turns out that in order to completely model the epistemic state of a finite number of agents (such as 1000 islanders), one requires an infinite model, due to the existence of arbitrarily long nested chains of knowledge (e.g. “{A} knows that {B} knows that {C} knows that {D} has blue eyes”), which cannot be automatically reduced to shorter chains of knowledge. Furthermore, because each agent has only an incomplete knowledge of the world, one must take into account multiple hypothetical worlds, which differ from the real world but which are considered to be possible worlds by one or more agents, thus introducing modality into the logic. More subtly, one must also consider worlds which each agent knows to be impossible, but are not commonly known to be impossible, so that (for instance) one agent is willing to admit the possibility that another agent considers that world to be possible; it is the consideration of such worlds which is crucial to the resolution of the blue-eyed islander puzzle. And this is even before one adds the temporal aspect (e.g. “On Tuesday, {A} knows that on Monday, {B} knew that by Wednesday, {C} will know that {D} has blue eyes”).

Despite all this fearsome complexity, it is still possible to set up both the syntax and semantics of temporal epistemic modal logic in such a way that one can formulate the blue-eyed islander problem rigorously, and in such a way that one has both an upper and a lower bound in the solution. The purpose of this post is to construct such a setup and to explain the lower bound in particular. The same logic is also useful for analysing another well-known paradox, the unexpected hanging paradox, and I will do so at the end of the post. Note though that there is more than one way to set up epistemic logics, and they are not all equivalent to each other.

(On the other hand, for puzzles such as the islander puzzle in which there are only a finite number of atomic propositions and no free variables, one at least can avoid the need to admit predicate logic, in which one has to discuss quantifiers such as {\forall} and {\exists}. A fully formed predicate temporal epistemic modal logic would indeed be of terrifying complexity.)

Our approach here will be a little different from the approach commonly found in the epistemic logic literature, in which one jumps straight to “arbitrary-order epistemic logic” in which arbitrarily long nested chains of knowledge (“{A} knows that {B} knows that {C} knows that \ldots”) are allowed. Instead, we will adopt a hierarchical approach, recursively defining for {k=0,1,2,\ldots} a “{k^{th}}-order epistemic logic” in which knowledge chains of depth up to {k}, but no greater, are permitted. The arbitrary-order epistemic logic is then obtained as a limit (a direct limit on the syntactic side, and an inverse limit on the semantic side, which is dual to the syntactic side) of the finite order epistemic logics.

I should warn that this is going to be a rather formal and mathematical post. Readers who simply want to know the answer to the islander puzzle would probably be better off reading the discussion at the puzzle’s own blog post instead.

Read the rest of this entry »

One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.

As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.

Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and suprema, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.

Read the rest of this entry »

Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals {{\bf Q}} or the reals {{\bf R}} are algebraically incomplete, because there are some non-trivial algebraic equations (such as {x^2=2} in the case of the rationals, or {x^2=-1} in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as {1=0}, just using the laws of algebra), but do not actually have solutions in the specified field.

Similarly, the rationals {{\bf Q}}, when viewed now as a metric space rather than as a field, are also metrically incomplete, because there exist sequences in the rationals (e.g. the decimal approximations {3, 3.1, 3.14, 3.141, \ldots} of the irrational number {\pi}) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.

A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Fraenkel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because they do not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but are not actually provable in this theory.

A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line {{\bf R}} is elementarily incomplete, because there exists a sequence of statements (such as the statements {0 < x < 1/n} for natural numbers {n=1,2,\ldots}) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number {x}) but are not actually simultaneously satisfiable in this theory.

In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field {k}, one can take its algebraic completion (or algebraic closure) {\overline{k}}; for instance, {{\bf C} = \overline{{\bf R}}} can be viewed as the algebraic completion of {{\bf R}}. This field is usually significantly larger than the original field {k}, but contains {k} as a subfield, and every element of {\overline{k}} can be described as the solution to some polynomial equation with coefficients in {k}. Furthermore, {\overline{k}} is now algebraically complete (or algebraically closed): every polynomial equation in {\overline{k}} which is potentially satisfiable (in the sense that it does not lead to a contradiction such as {1=0} from the laws of algebra), is actually satisfiable in {\overline{k}}.

Similarly, starting with an arbitrary metric space {X}, one can take its metric completion {\overline{X}}; for instance, {{\bf R} = \overline{{\bf Q}}} can be viewed as the metric completion of {{\bf Q}}. Again, the completion {\overline{X}} is usually much larger than the original metric space {X}, but contains {X} as a subspace, and every element of {\overline{X}} can be described as the limit of some Cauchy sequence in {X}. Furthermore, {\overline{X}} is now a complete metric space: every sequence in {\overline{X}} which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convergent in {\overline{X}}.

In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory {T} for a first-order language {L}, there exists at least one completion {\overline{T}} of that theory {T}, which is a consistent theory in which every sentence in {L} which is potentially true in {\overline{T}} (because it does not lead to a contradiction in {\overline{T}}) is actually true in {\overline{T}}. Indeed, the completeness theorem provides at least one model (or structure) {{\mathfrak U}} of the consistent theory {T}, and then the completion {\overline{T} = \hbox{Th}({\mathfrak U})} can be formed by interpreting every sentence in {L} using {{\mathfrak U}} to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory {T} can have multiple inequivalent models {{\mathfrak U}}, giving rise to distinct completions of the same theory.

Finally, if one starts with an arbitrary structure {{\mathfrak U}}, one can form an elementary completion {{}^* {\mathfrak U}} of it, which is a significantly larger structure which contains {{\mathfrak U}} as a substructure, and such that every element of {{}^* {\mathfrak U}} is an elementary limit of a sequence of elements in {{\mathfrak U}} (I will define this term shortly). Furthermore, {{}^* {\mathfrak U}} is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in {{}^* {\mathfrak U}} (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure {{\mathfrak U}}. If {{\mathfrak U}} is the standard universe of all the standard objects one considers in mathematics, then its elementary completion {{}^* {\mathfrak U}} is known as the nonstandard universe, and is the setting for nonstandard analysis.

As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as {{\bf Q}}, then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers {{\bf Z}}, then the resulting completion {{}^* {\bf Z}} will necessarily be uncountable.

However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.

In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.

Read the rest of this entry »

This is the third in a series of posts on the “no self-defeating object” argument in mathematics – a powerful and useful argument based on formalising the observation that any object or structure that is so powerful that it can “defeat” even itself, cannot actually exist.   This argument is used to establish many basic impossibility results in mathematics, such as Gödel’s theorem that it is impossible for any sufficiently sophisticated formal axiom system to prove its own consistency, Turing’s theorem that it is impossible for any sufficiently sophisticated programming language to solve its own halting problem, or Cantor’s theorem that it is impossible for any set to enumerate its own power set (and as a corollary, the natural numbers cannot enumerate the real numbers).

As remarked in the previous posts, many people who encounter these theorems can feel uneasy about their conclusions, and their method of proof; this seems to be particularly the case with regard to Cantor’s result that the reals are uncountable.   In the previous post in this series, I focused on one particular aspect of the standard proofs which one might be uncomfortable with, namely their counterfactual nature, and observed that many of these proofs can be largely (though not completely) converted to non-counterfactual form.  However, this does not fully dispel the sense that the conclusions of these theorems – that the reals are not countable, that the class of all sets is not itself a set, that truth cannot be captured by a predicate, that consistency is not provable, etc. – are highly unintuitive, and even objectionable to “common sense” in some cases.

How can intuition lead one to doubt the conclusions of these mathematical results?  I believe that one reason is because these results are sensitive to the amount of vagueness in one’s mental model of mathematics.  In the formal mathematical world, where every statement is either absolutely true or absolutely false with no middle ground, and all concepts require a precise definition (or at least a precise axiomatisation) before they can be used, then one can rigorously state and prove Cantor’s theorem, Gödel’s theorem, and all the other results mentioned in the previous posts without difficulty.  However, in the vague and fuzzy world of mathematical intuition, in which one’s impression of the truth or falsity of a statement may be influenced by recent mental reference points, definitions are malleable and blurry with no sharp dividing lines between what is and what is not covered by such definitions, and key mathematical objects may be incompletely specified and thus “moving targets” subject to interpretation, then one can argue with some degree of justification that the conclusions of the above results are incorrect; in the vague world, it seems quite plausible that one can always enumerate all the real numbers “that one needs to”, one can always justify the consistency of one’s reasoning system, one can reason using truth as if it were a predicate, and so forth.    The impossibility results only kick in once one tries to clear away the fog of vagueness and nail down all the definitions and mathematical statements precisely.  (To put it another way, the no-self-defeating object argument relies very much on the disconnected, definite, and absolute nature of the boolean truth space \{\hbox{true},\hbox{ false}\} in the rigorous mathematical world.)

Read the rest of this entry »

One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion.  For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let n be a natural number such that …”), which may or may not apply to the particular value of n one may have in mind.  Or, if one ever argues by dividing into separate cases (e.g. “Case 1: n is even. … Case 2: n is odd.  …”), then for any given n, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives.     But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that \sqrt{2} is equal to the ratio p/q of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.

Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives.  This can lead to confused exchanges such as the following:

Lecturer: “Theorem.  Let p be a prime number.  Then…”

Student: “But how do you know that p is a prime number?  Couldn’t it be composite?”

or

Lecturer: “Now we see what the function f does when we give it the input of x+dx instead.  …”

Student: “But didn’t you just say that the input was equal to x just a moment ago?”

This is not to say that counterfactual thinking is not encountered at all outside of mathematics.  For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history.  Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result.  The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing.  Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning.  For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it.  And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.

Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses.   Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.

Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics.  Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set {\bf R} of real numbers is uncountable.  This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics.  (The only other two examples I know of that come close to doing this are the fact that the real number 0.999\ldots is equal to 1, and the solution to the blue-eyed islanders puzzle.)

Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist.  Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.

I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction.  Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.

It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses.  This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter.  This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.

For instance, consider the very first no-self-defeating result presented in the previous post:

Proposition 1 (No largest natural number). There does not exist a natural number N that is larger than all the other natural numbers.

This is formulated in the “no self-defeating object” formulation.  But it has a logically equivalent “any object can be defeated” form:

Proposition 1′. Given any natural number N, one can find another natural number N' which is larger than N.

Proof. Take N' := N+1. \Box

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take N = 100, so that the proof gives N'=101 ), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number).  So there is a sense in which Proposition 1′ is more “non-counterfactual” or  “constructive” than the “counterfactual” Proposition 1.

In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),

Proposition 3. There are infinitely many primes.

can be recast in “all objects can be defeated” form as

Proposition 3′.  Let p_1,\ldots,p_n be a collection of primes.   Then there exists a prime q which is distinct from any of the primes p_1,\ldots,p_n.

Proof. Take q to be any prime factor of p_1 \ldots p_n + 1 (for instance, one could take the smallest prime factor, if one wished to be completely concrete).   Since p_1 \ldots p_n + 1 is not divisible by any of the primes p_1,\ldots,p_n, q must be distinct from all of these primes.  \Box

One could argue that  there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that q was equal to one of the p_1,\ldots,p_n), but the proposition itself is not inherently counterfactual, as  it does not make as patently impossible a hypothesis as a finite enumeration of the primes.  Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today.  Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say 2, 3, 5, and it will actually exhibit a prime not in that list (in this case, 31).  The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.
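
Proposition 3′ is also directly executable. Here is a small sketch (assuming sympy for the factorisation step; the lists of primes below are just the examples already discussed) that implements the construction in the proof:

```python
# A direct implementation of the construction in Proposition 3' (sketch only;
# assumes sympy for the factorisation): given a finite list of primes, exhibit
# a prime that is not on the list.
from math import prod
from sympy import primefactors

def prime_not_in(primes):
    # Any prime factor of (product of the list) + 1 avoids the list;
    # we take the smallest such factor, to be completely concrete.
    return primefactors(prod(primes) + 1)[0]

print(prime_not_in([2, 3, 5]))              # 31, as in the example above
print(prime_not_in([2, 3, 5, 7]))           # 211 (the product plus one is itself prime)
print(prime_not_in([2, 3, 5, 7, 11, 13]))   # 59, since 30031 = 59 * 509
```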

[Note: the article below may make more sense if one first reviews the previous blog post on the "no self-defeating object".  For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]

Read the rest of this entry »

One of the most notorious open problems in functional analysis is the invariant subspace problem for Hilbert spaces, which I will state here as a conjecture:

Conjecture 1 (Invariant Subspace Problem, ISP0) Let {H} be an infinite dimensional complex Hilbert space, and let {T: H \rightarrow H} be a bounded linear operator. Then {H} contains a non-trivial proper closed invariant subspace {V} (thus {TV \subset V} and {V \neq \{0\}, H}).

As stated this conjecture is quite infinitary in nature. Just for fun, I set myself the task of trying to find an equivalent reformulation of this conjecture that only involved finite-dimensional spaces and operators. This turned out to be somewhat difficult, but not entirely impossible, if one adopts a sufficiently generous version of “finitary” (cf. my discussion of how to finitise the infinitary pigeonhole principle). Unfortunately, the finitary formulation that I arrived at ended up being rather complicated (in particular, involving the concept of a “barrier”), and did not obviously suggest a path to resolving the conjecture; but it did at least provide some simpler finitary consequences of the conjecture which might be worth focusing on as subproblems.

I should point out that the arguments here are quite “soft” in nature and are not really addressing the heart of the invariant subspace problem; but I think it is still of interest to observe that this problem is not purely an infinitary problem, and does have some non-trivial finitary consequences.

I am indebted to Henry Towsner for many discussions on this topic.

Read the rest of this entry »

In topology, a non-empty set {E} is said to be connected if it cannot be decomposed into two nontrivial subsets that are both closed and open relative to {E}, and path connected if any two points {x,y} in {E} can be connected by a path (i.e. there exists a continuous map {\gamma: [0,1] \rightarrow E} with {\gamma(0)=x} and {\gamma(1)=y}).

Path-connected sets are always connected, but the converse is not true, even in the model case of compact subsets of a Euclidean space. The classic counterexample is the set

\displaystyle  E := \{ (0,y): -1 \leq y \leq 1 \} \cup \{ (x, \sin(1/x)): 0 < x \leq 1 \}, \ \ \ \ \ (1)

which is connected but not path-connected (there is no continuous path from {(0,1)} to {(1,\sin(1))}).

Looking at the definitions of the two concepts, one notices a difference: the notion of path-connectedness is somehow a “positive” one, in the sense that a path-connected set can produce the existence of something (a path connecting two points {x} and {y}) for a given type of input (in this case, a pair of points {x, y}). On the other hand, the notion of connectedness is a “negative” one, in that it asserts the non-existence of something (a non-trivial partition into clopen sets). To put it another way, it is relatively easy to convince someone that a set is path-connected (by providing a connecting path for every pair of points) or is disconnected (by providing a non-trivial partition into clopen sets), but if a set is not path-connected, or is connected, how can one easily convince someone of this fact? To put it yet another way: is there a reasonable certificate for connectedness (or for path-disconnectedness)?

In the case of connectedness for compact subsets {E} of Euclidean space, there is an answer as follows. If {\epsilon > 0}, let us call two points {x, y} in {E} {\epsilon}-connected if one can find a finite sequence {x_0 = x, x_1, \ldots, x_N = y} of points in {E}, such that {|x_{i+1}-x_i| < \epsilon} for all {0 \leq i < N}; informally, one can jump from {x} to {y} in {E} using jumps of length at most {\epsilon}. Let us call {x_0,\ldots,x_N} an {\epsilon}-discrete path.

Proposition 1 (Connectedness certificate for compact subsets of Euclidean space) Let {E \subset {\bf R}^d} be compact and non-empty. Then {E} is connected if and only if every pair of points in {E} is {\epsilon}-connected for every {\epsilon > 0}.

Proof: Suppose first that {E} is disconnected, then {E} can be partitioned into two non-empty closed subsets {F, G}. Since {E} is compact, {F, G} are compact also, and so they are separated by some non-zero distance {\epsilon > 0}. But then it is clear that points in {F} cannot be {\epsilon}-connected to points in {G}, and the claim follows.

Conversely, suppose that there is a pair of points {x,y} in {E} and an {\epsilon > 0} such that {x,y} are not {\epsilon}-connected. Let {F} be the set of all points in {E} that are {\epsilon}-connected to {x}. It is easy to check that {F} is open, closed, and a proper subset of {E}; thus {E} is disconnected. \Box

We remark that the above proposition in fact works for any compact metric space. It is instructive to see how the points {(1,\sin(1))} and {(0,1)} are {\epsilon}-connected in the set (1); the {\epsilon}-discrete path follows the graph of {\sin(1/x)} backwards until one gets sufficiently close to the {y}-axis, at which point one “jumps” across to the {y}-axis to eventually reach {(0,1)}.
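
As an aside, {\epsilon}-connectedness is easy to test on a finite sample of a set by a breadth-first search over jumps of length less than {\epsilon}. The following sketch (assuming numpy; the particular discretisation of the set (1) is chosen only for illustration) checks that sample points near {(0,1)} and {(1,\sin(1))} are {\epsilon}-connected for {\epsilon = 0.1}, via exactly the kind of jump across to the {y}-axis just described:

```python
# Illustrative sketch (assumes numpy): test eps-connectedness of two points in
# a finite sample of a set by breadth-first search over jumps of length < eps.
import numpy as np
from collections import deque

def eps_connected(points, i, j, eps):
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    seen, queue = {i}, deque([i])
    while queue:
        k = queue.popleft()
        if k == j:
            return True
        for m in np.nonzero(dist[k] < eps)[0]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return False

# A finite sample of the set (1): part of the y-axis plus the graph of sin(1/x).
ys = np.linspace(-1, 1, 201)
xs = np.linspace(1e-3, 1, 2000)
E = np.vstack([np.column_stack([np.zeros_like(ys), ys]),
               np.column_stack([xs, np.sin(1 / xs)])])

start = 200            # the sample point (0, 1)
end = len(E) - 1       # the sample point (1, sin 1)
print(eps_connected(E, start, end, eps=0.1))   # True: the two points are 0.1-connected
```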

It is also interesting to contrast the above proposition with path connectedness. Clearly, if two points {x, y} are connected by a path, then they are {\epsilon}-connected for every {\epsilon > 0} (because every continuous map {\gamma: [0,1] \rightarrow E} is uniformly continuous); but from the analysis of the example (1) we see that the converse is not true. Roughly speaking, the various {\epsilon}-discrete paths from {x} to {y} have to be “compatible” with each other in some sense in order to synthesise a continuous path from {x} to {y} in the limit (we will not make this precise here).

But this leaves two (somewhat imprecise) questions, which I do not know how to satisfactorily answer:

Question 1: Is there a good certificate for path disconnectedness, say for compact subsets {E} of {{\bf R}^d}? One can construct lousy certificates; for instance, one could look at all continuous paths in {{\bf R}^d} joining two particular points {x, y} in {E}, and verify that each one of them leaves {E} at some point. But this is an “uncountable” certificate – it requires one to check an uncountable number of paths. In contrast, the certificate in Proposition 1 is basically a countable one (if one describes a compact set {E} by describing a family of {\epsilon}-nets for a countable sequence of {\epsilon} tending to zero). (Very roughly speaking, I would like a certificate that can somehow be “verified in countable time” in a suitable oracle model, as discussed in my previous post, though I have not tried to make this vague specification more rigorous.)

It is tempting to look at the equivalence classes of {E} given by the relation of being connected by a path, but these classes need not be closed (as one can see with the example (1)) and it is not obvious to me how to certify that two such classes are not path-connected to each other.

Question 2: Is there a good certificate for connectedness for unbounded closed subsets of {{\bf R}^d}? Proposition 1 fails in this case; consider for instance the set

\displaystyle  E := \{ (x,0): x \in {\bf R} \} \cup \{ (x,\frac{1}{x}): x > 0 \}. \ \ \ \ \ (2)

Any pair of points {x,y \in E} is {\epsilon}-connected for every {\epsilon > 0}, and yet this set is disconnected.
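(To see the failure quantitatively: the distance from a point {(x,1/x)} on the upper branch to the horizontal axis is exactly {1/x}, so a single jump of length less than {\epsilon} from the branch to the axis can only occur at a point with {x > 1/\epsilon}. The following back-of-the-envelope sketch is just this arithmetic, not a proof; it records the resulting lower bound on the diameter of any {\epsilon}-discrete path from {(1,1)} to {(1,0)}.)

    # Any eps-discrete path from (1,1) to (1,0) in the set (2) must reach a branch
    # point (x, 1/x) with 1/x < eps before it can jump to the axis, so its diameter
    # is at least 1/eps - 1.
    for eps in [0.5, 0.1, 0.01, 0.001]:
        print(f"eps = {eps:>6}: diameter of any eps-discrete path >= {1 / eps - 1:g}")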

The problem here is that as {\epsilon} gets smaller, the {\epsilon}-discrete paths connecting a pair of points such as {(1,0)} and {(1,1)} have diameter going to infinity. One natural guess is then to require a uniform bound on the diameter, i.e. that for any pair of points {x, y}, there exists an {R>0} such that there is an {\epsilon}-discrete path from {x} to {y} of diameter at most {R} for every {\epsilon > 0}. This does indeed force connectedness, but unfortunately not all connected sets have this property. Consider for instance the set

\displaystyle  E := \{ (x,y,0): x \in {\bf R}, y \in \{-1,+1\} \} \cup \bigcup_{n=1}^\infty (E_n \cup F_n) \ \ \ \ \ (3)

in {{\bf R}^3}, where

\displaystyle E_n := \{ (x,y,0): \frac{x^2}{n^2} + \frac{y^2}{(1-1/n)^2} = 1\}

is an ellipse in the plane {z=0} centered at the origin, with minor diameter endpoints {(0,1/n-1,0), (0,1-1/n,0)} and major diameter endpoints {(-n,0,0), (n,0,0)}, and

\displaystyle F_n := \{ (n,y,z): (y-1/2)^2+z^2=1/4 \}

is a circle in the plane {x=n} that connects the endpoint {(n,0,0)} of {E_n} to the point {(n,1,0)} in {E}. One can check that {E} is a closed connected set, but the {\epsilon}-discrete paths connecting {(0,-1,0)} with {(0,+1,0)} have unbounded diameter as {\epsilon \rightarrow 0}.

Currently, I do not have any real progress on Question 1. For Question 2, I can only obtain the following strange “second-order” criterion for connectedness, that involves an unspecified gauge function {\delta}:

Proposition 2 (Second-order connectedness certificate) Let {E} be a closed non-empty subset of {{\bf R}^d}. Then the following are equivalent:

  • {E} is connected.
  • For every monotone decreasing, strictly positive function {\delta: {\bf R}^+ \rightarrow {\bf R}^+} and every {x,y \in E}, there exists a discrete path {x_0=x,x_1,\ldots,x_N=y} in {E} such that {|x_{i+1}-x_i| < \delta(|x_i|)} for all {0 \leq i < N}.

Proof: This is proven in almost the same way as Proposition 1. If {E} can be disconnected into two non-trivial sets {F, G}, then one can find a monotone decreasing gauge function {\delta: {\bf R}^+ \rightarrow {\bf R}^+} such that for each ball {B_R := \{ x \in {\bf R}^d: |x| \leq R \}}, {F \cap B_R} and {G} are separated by at least {\delta(R)}, and then there is no discrete path from {F} to {G} in {E} obeying the condition {|x_{i+1}-x_i| < \delta(|x_i|)}.

Conversely, if there exists a gauge function {\delta} and two points {x,y \in E} which cannot be connected by a discrete path in {E} that obeys the condition {|x_{i+1}-x_i| < \delta(|x_i|)}, then if one sets {F} to be all the points that can be reached from {x} in this manner, one easily verifies that {F} and {E \backslash F} disconnect {E}. \Box
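(As with Proposition 1, the condition in the second bullet can be tested mechanically on a finite sample of {E}; the Python sketch below is only illustrative, with the sample and the gauge {\delta} being hypothetical inputs. It modifies the earlier breadth-first search so that the permitted jump length from a point {p} is {\delta(|p|)} rather than a fixed {\epsilon}.)

    import math
    from collections import deque

    def gauge_connected(points, x, y, delta):
        """Test whether x and y in the finite sample `points` are joined by a discrete
        path whose jump from each point p has length < delta(|p|), as in the second
        bullet of Proposition 2.  `delta` should be a monotone decreasing, strictly
        positive gauge function."""
        frontier, visited = deque([x]), {x}
        while frontier:
            p = frontier.popleft()
            if p == y:
                return True
            allowed = delta(math.hypot(*p))        # jumps from p must be shorter than this
            for q in points:
                if q not in visited and math.dist(p, q) < allowed:
                    visited.add(q)
                    frontier.append(q)
        return False

For the set (2), a gauge such as {\delta(r) := \frac{1}{2(r+1)}} already certifies disconnectedness: at a point {(x,1/x)} of the branch the permitted jump length is less than {\frac{1}{2x}}, while the distance from that point to the axis is {1/x}, so no discrete path obeying the gauge condition can ever cross from the branch to the axis, no matter how far out it travels.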

It may be that this is somehow the “best” one can do, but I am not sure how to quantify this formally.

Anyway, I was curious if any of the readers here (particularly those with expertise in point-set topology or descriptive set theory) might be able to shed more light on these questions. (I also considered crossposting this to Math Overflow, but I think the question may be a bit too long (and vague) for that.)

(The original motivation for this question, by the way, stems from an attempt to use methods of topological group theory to attack questions in additive combinatorics, in the spirit of the paper of Hrushovski studied previously on this blog. The connection is rather indirect, though; I may discuss this more in a future post.)

The standard modern foundation of mathematics is constructed using set theory. With these foundations, the mathematical universe of objects one studies contains not only the “primitive” mathematical objects such as numbers and points, but also sets of these objects, sets of sets of objects, and so forth. (In a pure set theory, the primitive objects would themselves be sets as well; this is useful for studying the foundations of mathematics, but for most mathematical purposes it is more convenient, and less conceptually confusing, to refrain from modeling primitive objects as sets.) One has to carefully impose a suitable collection of axioms on these sets, in order to avoid paradoxes such as Russell’s paradox; but with a standard axiom system such as Zermelo-Fraenkel-Choice (ZFC), all actual paradoxes that we know of are eliminated. Still, one might be somewhat unnerved by the presence in set theory of statements which, while not genuinely paradoxical in a strict sense, are still highly unintuitive; Cantor’s theorem on the uncountability of the reals, and the Banach-Tarski paradox, are perhaps the two most familiar examples of this.

One may suspect that the reason for this unintuitive behaviour is the presence of infinite sets in one’s mathematical universe. After all, if one deals solely with finite sets, then there is no need to distinguish between countable and uncountable infinities, and Banach-Tarski type paradoxes cannot occur.

On the other hand, many statements in infinitary mathematics can be reformulated into equivalent statements in finitary mathematics (involving only finitely many points or numbers, etc.); I have explored this theme in a number of previous blog posts. So, one may ask: what is the finitary analogue of statements such as Cantor’s theorem or the Banach-Tarski paradox?

The finitary analogue of Cantor’s theorem is well-known: it is the assertion that {2^n > n} for every natural number {n}, or equivalently that the power set of a finite set {A} of {n} elements cannot be enumerated by {A} itself. This is not quite the end of the story, though; after all, one also has {n+1 > n} for every natural number {n}, or equivalently that the union {A \cup \{a\}} of a finite set {A} and an additional element {a} cannot be enumerated by {A} itself, and yet the former statement extends to the infinite case, while the latter one does not. What causes these two outcomes to be distinct?
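(The finitary diagonal argument underlying {2^n > n} is short enough to write out as executable code; the following Python sketch and its toy input are hypothetical illustrations, not anything drawn from the discussion above. Given any assignment {f} of a subset of {A} to each element of {A}, the set {D := \{ a \in A: a \not\in f(a) \}} differs from {f(a)} at the element {a} itself, and so lies outside the image of {f}.)

    def diagonal_witness(A, f):
        """Given a finite set A and a map f sending each element of A to a subset of A,
        return a subset of A that is not of the form f(a) for any a in A."""
        D = {a for a in A if a not in f(a)}
        assert all(D != f(a) for a in A)   # D and f(a) disagree about the membership of a
        return D

    # Toy usage: a (hypothetical) attempted enumeration of subsets of A = {0, 1, 2}.
    A = {0, 1, 2}
    attempt = {0: {0, 1}, 1: set(), 2: {1, 2}}
    print(diagonal_witness(A, lambda a: attempt[a]))   # prints {1}, which the attempt misses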

On the other hand, it is less obvious what the finitary version of the Banach-Tarski paradox is. Note that this paradox is available only in three and higher dimensions, but not in one or two dimensions; so presumably a finitary analogue of this paradox should also make the same distinction between low and high dimensions.

I therefore set myself the exercise of trying to phrase Cantor’s theorem and the Banach-Tarski paradox in a more “finitary” language. It seems that the easiest way to accomplish this is to avoid the use of set theory, and replace sets by some other concept. Taking inspiration from theoretical computer science, I decided to replace concepts such as functions and sets by the concepts of algorithms and oracles instead, with various constructions in set theory being replaced instead by computer language pseudocode. The point of doing this is that one can now add a new parameter to the universe, namely the amount of computational resources one is willing to allow one’s algorithms to use. At one extreme, one can enforce a “strict finitist” viewpoint where the total computational resources available (time and memory) are bounded by some numerical constant, such as {10^{100}}; roughly speaking, this causes any mathematical construction to break down once its complexity exceeds this number. Or one can take the slightly more permissive “finitist” or “constructivist” viewpoint, where any finite amount of computational resource is permitted; or one can then move up to allowing any construction indexed by a countable ordinal, or the storage of any array of countable size. Finally one can allow constructions indexed by arbitrary ordinals (i.e. transfinite induction) and arrays of arbitrary infinite size, at which point the theory becomes more or less indistinguishable from standard set theory.

I describe this viewpoint, and how statements such as Cantor’s theorem and Banach-Tarski are interpreted with this viewpoint, below the fold. I should caution that this is a conceptual exercise rather than a rigorous one; I have not attempted to formalise these notions to the same extent that set theory is formalised. Thus, for instance, I have no explicit system of axioms that algorithms and oracles are supposed to obey. Of course, these formal issues have been explored in great depth by logicians over the past century or so, but I do not wish to focus on these topics in this post.

A second caveat is that the actual semantic content of this post is going to be extremely low. I am not going to provide any genuinely new proof of Cantor’s theorem, or give a new construction of Banach-Tarski type; instead, I will be reformulating the standard proofs and constructions in a different language. Nevertheless I believe this viewpoint is somewhat clarifying as to the nature of these paradoxes, and as to how they are not as fundamentally tied to the nature of sets or the nature of infinity as one might first expect.

