One notable feature of mathematical reasoning is the reliance on counterfactual thinking – taking a hypothesis (or set of hypotheses) which may or may not be true, and following it (or them) to its logical conclusion.  For instance, most propositions in mathematics start with a set of hypotheses (e.g. “Let n be a natural number such that …”), which may or may not apply to the particular value of n one may have in mind.  Or, if one ever argues by dividing into separate cases (e.g. “Case 1: n is even. … Case 2: n is odd.  …”), then for any given n, at most one of these cases would actually be applicable, with the other cases being counterfactual alternatives.     But the purest example of counterfactual thinking in mathematics comes when one employs a proof by contradiction (or reductio ad absurdum) – one introduces a hypothesis that in fact has no chance of being true at all (e.g. “Suppose for sake of contradiction that \sqrt{2} is equal to the ratio p/q of two natural numbers.”), and proceeds to demonstrate this fact by showing that this hypothesis leads to absurdity.

Experienced mathematicians are so used to this type of counterfactual thinking that it is sometimes difficult for them to realise that it this type of thinking is not automatically intuitive for students or non-mathematicians, who can anchor their thinking on the single, “real” world to the extent that they cannot easily consider hypothetical alternatives.  This can lead to confused exchanges such as the following:

Lecturer: “Theorem.  Let p be a prime number.  Then…”

Student: “But how do you know that p is a prime number?  Couldn’t it be composite?”


Lecturer: “Now we see what the function f does when we give it the input of x+dx instead.  …”

Student: “But didn’t you just say that the input was equal to x just a moment ago?”

This is not to say that counterfactual thinking is not encountered at all outside of mathematics.  For instance, an obvious source of counterfactual thinking occurs in fictional writing or film, particularly in speculative fiction such as science fiction, fantasy, or alternate history.  Here, one can certainly take one or more counterfactual hypotheses (e.g. “what if magic really existed?”) and follow them to see what conclusions would result.  The analogy between this and mathematical counterfactual reasoning is not perfect, of course: in fiction, consequences are usually not logically entailed by their premises, but are instead driven by more contingent considerations, such as the need to advance the plot, to entertain or emotionally affect the reader, or to make some moral or ideological point, and these types of narrative elements are almost completely absent in mathematical writing.  Nevertheless, the analogy can be somewhat helpful when one is first coming to terms with mathematical reasoning.  For instance, the mathematical concept of a proof by contradiction can be viewed as roughly analogous in some ways to such literary concepts as satire, dark humour, or absurdist fiction, in which one takes a premise specifically with the intent to derive absurd consequences from it.  And if the proof of (say) a lemma is analogous to a short story, then the statement of that lemma can be viewed as analogous to the moral of that story.

Another source of counterfactual thinking outside of mathematics comes from simulation, when one feeds some initial data or hypotheses (that may or may not correspond to what actually happens in the real world) into a simulated environment (e.g. a piece of computer software, a laboratory experiment, or even just a thought-experiment), and then runs the simulation to see what consequences result from these hypotheses.   Here, proof by contradiction is roughly analogous to the “garbage in, garbage out” phenomenon that is familiar to anyone who has worked with computers: if one’s initial inputs to a simulation are not consistent with the hypotheses of that simulation, or with each other, one can obtain bizarrely illogical (and sometimes unintentionally amusing) outputs as a result; and conversely, such outputs can be used to detect and diagnose problems with the data, hypotheses, or implementation of the simulation.

Despite the presence of these non-mathematical analogies, though, proofs by contradiction are still often viewed with suspicion and unease by many students of mathematics.  Perhaps the quintessential example of this is the standard proof of Cantor’s theorem that the set {\bf R} of real numbers is uncountable.  This is about as short and as elegant a proof by contradiction as one can have without being utterly trivial, and despite this (or perhaps because of this) it seems to offend the reason of many people when they are first exposed to it, to an extent far greater than most other results in mathematics.  (The only other two examples I know of that come close to doing this are the fact that the real number 0.999\ldots is equal to 1, and the solution to the blue-eyed islanders puzzle.)

Some time ago on this blog, I collected a family of well-known results in mathematics that were proven by contradiction, and specifically by a type of argument that I called the “no self-defeating object” argument; that any object that was so ridiculously overpowered that it could be used to “defeat” its own existence, could not actually exist.  Many basic results in mathematics can be phrased in this manner: not only Cantor’s theorem, but Euclid’s theorem on the infinitude of primes, Gödel’s incompleteness theorem, or the conclusion (from Russell’s paradox) that the class of all sets cannot itself be a set.

I presented each of these arguments in the usual “proof by contradiction” manner; I made the counterfactual hypothesis that the impossibly overpowered object existed, and then used this to eventually derive a contradiction.  Mathematically, there is nothing wrong with this reasoning, but because the argument spends almost its entire duration inside the bizarre counterfactual universe caused by an impossible hypothesis, readers who are not experienced with counterfactual thinking may view these arguments with unease.

It was pointed out to me, though (originally with regards to Euclid’s theorem, but the same point in fact applies to the other results I presented) that one can pull a large fraction of each argument out of this counterfactual world, so that one can see most of the argument directly, without the need for any intrinsically impossible hypotheses.  This is done by converting the “no self-defeating object” argument into a logically equivalent “any object can be defeated” argument, with the former then being viewed as an immediate corollary of the latter.  This change is almost trivial to enact (it is often little more than just taking the contrapositive of the original statement), but it does offer a slightly different “non-counterfactual” (or more precisely, “not necessarily counterfactual”) perspective on these arguments which may assist in understanding how they work.

For instance, consider the very first no-self-defeating result presented in the previous post:

Proposition 1 (No largest natural number). There does not exist a natural number N that is larger than all the other natural numbers.

This is formulated in the “no self-defeating object” formulation.  But it has a logically equivalent “any object can be defeated” form:

Proposition 1′. Given any natural number N, one can find another natural number N' which is larger than N.

Proof. Take N' := N+1. \Box

While Proposition 1 and Proposition 1′ are logically equivalent to each other, note one key difference: Proposition 1′ can be illustrated with examples (e.g. take N = 100, so that the proof gives N'=101 ), whilst Proposition 1 cannot (since there is, after all, no such thing as a largest natural number).  So there is a sense in which Proposition 1′ is more “non-counterfactual” or  “constructive” than the “counterfactual” Proposition 1.

In a similar spirit, Euclid’s theorem (which we give using the numbering from the previous post),

Proposition 3. There are infinitely many primes.

can be recast in “all objects can be defeated” form as

Proposition 3′.  Let p_1,\ldots,p_n be a collection of primes.   Then there exists a prime q which is distinct from any of the primes p_1,\ldots,p_n.

Proof. Take q to be any prime factor of p_1 \ldots p_n + 1 (for instance, one could take the smallest prime factor, if one wished to be completely concrete).   Since p_1 \ldots p_n + 1 is not divisible by any of the primes p_1,\ldots,p_n, q must be distinct from all of these primes.  \Box

One could argue that  there was a slight use of proof by contradiction in the proof of Proposition 3′ (because one had to briefly entertain and then rule out the counterfactual possibility that q was equal to one of the p_1,\ldots,p_n), but the proposition itself is not inherently counterfactual, as  it does not make as patently impossible a hypothesis as a finite enumeration of the primes.  Incidentally, it can be argued that the proof of Proposition 3′ is closer in spirit to Euclid’s original proof of his theorem, than the proof of Proposition 3 that is usually given today.  Again, Proposition 3′ is “constructive”; one can apply it to any finite list of primes, say 2, 3, 5, and it will actually exhibit a prime not in that list (in this case, 31).  The same cannot be said of Proposition 3, despite the logical equivalence of the two statements.

[Note: the article below may make more sense if one first reviews the previous blog post on the “no self-defeating object”.  For instance, the section and theorem numbering here is deliberately chosen to match that of the preceding post.]

— 1. Set theory —

Now we revisit examples of the no-self-defeating object in set  theory.  Take, for instance, Cantor’s theorem:

Proposition 6. The reals are uncountable.

One can easily recast this in a “non-counterfactual” or “all objects can be defeated” form:

Proposition 6′. Let x_1, x_2, x_3, \ldots be a countable sequence of real numbers (possibly with repetition).  Then there exists a real number y that is not equal to any of the x_1, x_2, x_3, \ldots.

Proof. Set y equal to y = 0.a_1a_2a_3\ldots, where

  1. a_1 is the smallest digit in \{0,\ldots,9\} that is not equal to the first digit past the decimal point of any decimal representation of x_1;
  2. a_2 is the smallest digit in \{0,\ldots,9\} that is not equal to the second digit past the decimal point of any decimal representation of x_2;
  3. etc.

(Here we write “any” decimal representation rather than “the” decimal representation to deal with the annoying 0.999\ldots = 1.000\ldots issue mentioned earlier.  As with the proof of Euclid’s theorem, there is nothing special about taking the smallest digit here; this is just for sake of concreteness.)  Note that any real number has at most two decimal representations, and there are ten digits available, so one can always find a_1, a_2, ... with the desired properties.  Then, by construction, the real number y cannot equal x_1 (because it differs in the first digit from any of the decimal representations of x_1), it cannot equal x_2 (because it differs in the second digit), and so forth. \Box

Again, Proposition 6′ is trivially equivalent to Proposition 6, and still briefly uses contradiction in its proof, but in this non-counterfactual form one can actually illustrate it with examples.  For instance, one could start with a list of terminating decimals, starting with the single digit terminating decimals, the two-digit terminating decimals, and so forth:

x_1 := 0.1, \quad x_2 := 0.2, \quad x_3 := 0.3, \quad \ldots \quad x_9 := 0.9,

x_{10} := 0.01, \quad x_{11} := 0.02, \quad \ldots \quad x_{108} := 0.99,

x_{109} := 0.001, \quad x_{110} := 0.002, \quad \ldots \quad x_{1007} := 0.999,


and one then sees that the construction will, in this case, give the number y = 0.21111\ldots, which indeed does not occur on the above list.

It is instructive to try to “outrun” Proposition 6′ by modifying the list to accommodate 0.2111\ldots to the list.  One cannot simply tack on this number “at the end” of this list, as the list is infinite and does not actually have an end.  One can insert it at, say, the beginning of the list, and then move all the other numbers down one, but then Proposition 6′ gives a new number not on the list (in this case, 0.0111\ldots).  One can add that number to the list also, bumping everyone else down one, but then Proposition 6′ gives yet another number not on the list (in this case, 0.10111\ldots).    After doing this a few times, one can begin to appreciate how Proposition 6′ always defeats any attempt to outrun it, much as one cannot obtain a largest natural number by continually adding +1 to one’s previous proposed candidate.

It is also remarkable how inoffensive Proposition 6′ and its proof is, when compared against the reaction one sometimes encounters to Proposition 6, which is logically equivalent.  A single contraposition can dramatically change one’s impression of a result.

In a similar spirit, the result

Proposition 4 (No universal set) There does not exist a set that contains all sets (including itself).

(which, of course, assumes one is working in something like the Zermelo-Frankel axioms of set theory) becomes

Proposition 4′. Given any set A, there exists a set B which is not an element of A.

Proof. Consider the set

B := \{ C \in A: C \not \in C \};

the existence of this set is guaranteed by the axiom schema of specification.  If B was an element of itself, then by construction we would have B \not \in B, a contradiction.  Thus we must have B \not \in B.  From construction, this forces B \not \in A.  \Box

In the usual axiomatic formulation of set theory, the axiom of foundation implies, among other things, that no set is an element of itself.  With that axiom, the set B given by Proposition 4′ is nothing other than A itself, which by the axiom of foundation is not an element of A.  But since the axiom of foundation was not used in the proof of Proposition 4′, one can also explore (counterfactually!) what happens in set theories in which one does not assume the axiom of foundation.  Suppose, for instance, that one managed somehow to produce a set A that contained itself as its only element: A = \{A\}.  (Informally, one could think of A as an infinite nested chain, A : =\{ \{ \{\ldots \} \} \}.)  Then the only element that A has, namely A, is an element of itself, so the set B produced by Proposition 4′ is the empty set B := \emptyset, which is indeed not in A.

One can try outrunning Proposition 4′ again to see what happens.  For instance, let’s add the empty set to the set A produced earlier, to give the new set A' := \{ A, \emptyset\}.   The Proposition 4′ construction then gives the set B = \{\emptyset\}, which is indeed not in A'.  If we then try to add that set in to get a new set A'' := \{ A, \emptyset, \{ \emptyset\} \}, then one gets the set B = \{\emptyset, \{\emptyset\}\}, which is again not in A''.  Iterating this, one in fact begins constructing the von Neumann ordinals!  (Actually, the original set A plays essentially no role in this construction; one could have started with the empty set and it would have generated the same sequence of ordinals.)

— 2. Logic —

One can also convert the no-self-defeating arguments given in the logic section of the previous post into “every object can be defeated” forms, but it took me a while to find them.  (One may wish to pause to try this before moving on.)

We first turn to the result (essentially coming from the liar paradox) that the notion of truth cannot be captured by a predicate.  This was discussed in the previous blog post, but not formalised as a theorem.  I will do so now, starting with the easier “self-referential case”:

Theorem 8.5 (Impredicativity of truth, self-referential case) (Informal statement) Let L be a formal language that contains the concepts of predicates and allows self-reference, and let M be an interpretation of that language (i.e. a way of consistently assigning values to every constant, ranges to every variable, and truth values to every sentence in that language, obeying all the axioms of that language).  Then there does not exist a “truth predicate” T(x) in L that takes a sentence x as input, with the property that for every sentence x in L, that T(x) is true (in M) if and only if x is true (in M).

Here is the non-counterfactual version:

Theorem 8.5′ (Informal statement) Let L be a formal language that contains the concepts of predicates and strings and allows self-reference, and let M be an interpretation of that language.  Let T() be a predicate in L that takes sentences as input.  Then there exists a sentence G such that the truth value of T(G) (in M) is different from the truth value of G (in M).

Proof. We define G be the self-referential “liar sentence”

G := “T(G) is false”.

Then, clearly, G is true if and only if T(G) is false, and the claim follows.  \Box

Using the Quining trick to achieve indirect self-reference, one can remove the need for direct self-reference in the above argument:

Theorem 8.6 (Impredicativity of truth) (Informal statement) Let L be a formal language that contains the concepts of predicates and strings, and let M be an interpretation of that language (i.e. a way of consistently assigning values to every constant, ranges to every variable, and truth values to every sentence in that language) that interprets strings in the standard manner (so in particular, every sentence or predicate  inL can also be viewed as a string constant in M).  Then there does not exist a “truth predicate” T(x) in L that takes a string x as input, with the property that for every sentence x in L, that T(x) is true (in M) if and only if x is true (in M).

A more formal version of the above theorem is Tarski’s undefinability theorem.

Here is the non-counterfactual version:

Theorem 8.6′.  (Informal statement) Let L be a formal language containing the concepts of predicates and strings, and let T(x) be a predicate on strings.  Then there exists a sentence G in L with the property that, for any interpretation M of L that interprets strings in the standard manner,that the truth value of T(G) in M is different from the truth value of G in M.

Proof. (Sketch) We use the “quining” trick.  Let Q(x) be the predicate on strings defined by

Q(x) = “x is a predicate on strings, and T(x(x)) is false”

and let G be the “Gödel sentence” G := Q(Q).  Then, by construction, G is true in M if and only if T(G) is false in M, and the claim follows. \Box

Actually Theorem 8.6′ is marginally stronger than Theorem 8.6 because it makes the sentence G independent of the interpretation M, whereas Theorem 8.6 (when viewed in the contrapositive) allows G to depend on M.  This slight strengthening will be useful shortly.

An important special case of Theorem 8.6′ is the first incompleteness theorem:

Corollary 8.7 (Gödel’s first incompleteness theorem) (Informal statement) Let L be a formal language containing the concepts of predicates and strings that has at least one interpretation M that gives the standard interpretation of strings (in particular, L must be consistent).   Then there exists a sentence G in L that is undecidable in L (or more precisely, in a formal recursive proof system for L).

Proof. (Sketch) A language L that is powerful enough to contain predicates and strings will also be able to contain a provability predicate P(), so that a sentence x in L is provable in L‘s proof system if and only if P(x) is true in the standard interpretation M.  Applying Theorem 8.6′ to this predicate, we obtain a Gödel sentence G such that the truth value of G in M differs from the truth value of P(G) in M.  If P(G) is true in M, then G must be true in M also since L is consistent, so the only remaining option is that P(G) is false in M and G is true in M.  Thus neither G nor its negation can be provable, and hence G is undecidable. \Box

Now we turn to the second incompleteness theorem:

Theorem 9 (Second Gödel incompleteness theorem). (Informal statement) No consistent logical system which has the notion of a predicate and a string, can provide a proof of its own logical consistency.

Here is the non-counterfactual version:

Theorem 9′. (Informal statement) Let L, L' be consistent logical systems that have  the notion of a predicate and a string, such that every sentence in L' is also a sentence in L, and such that the consistency of L' can be proven in L.  Then there exists a sentence G that lies in both L and L' that is provable in L but is not provable in L'.

Proof. (Sketch) In the common language of L and L', let T() be the predicate

T(x) := ” x is provable in L’ “.

Applying Theorem 8.6′, we can find a sentence G (common to both L and L') with the property that in any interpretation M of either L or L’, the truth value of G and the truth value of T(G) differ.

By Corollary 8.7′ (or more precisely, the proof of that corollary), G is not provable in L'.  Now we show that G is provable in L.   Because L can prove the consistency of L’, one can embed the proof Corollary 8.7′ inside the language L, and deduce that the sentence “G is not provable in L'” is also provable in L.    In other words, L can prove that T(G) is false.  On the other hand, embedding the proof of Theorem 8.6′ inside L, L can also prove that the truth value of G and T(G) differ.  Thus L can prove that G is true.  \Box

The advantage of this formulation of the second incompleteness theorem, as opposed to the usual counterfactual one, is that one can actually trace through the argument with a concrete example.  For instance, Zermelo-Frankel-Choice (ZFC)  set theory can prove the consistency of Peano arithmetic (a result of Gentzen), and so one can follow the above argument to show that the Gödel sentence of Peano arithmetic is provably true in ZFC, but not provable in Peano arithmetic.

— 3.  Computability —

By now, it should not be surprising that the no-self-defeating arguments in computability also have a non-counterfactual form, given how close they are to the analogous arguments in set theory and logic.  For sake of completeness, we record this for Turing’s theorem:

Theorem 10 (Turing halting theorem) (Informal statement) There does not exist a program P which takes a string S as input, and determines in finite time whether S  is a program (with no input) that halts in finite time.

Here is the non-counterfactual version:

Theorem 10′. (Informal statement) Let P be a program that takes a string S as input, returns a yes-no answer P(S) as output, and which always halts in finite time.  Then there exists a string G that is a program with no input, such that if P is given G as input, then P does not determine correctly whether G halts in finite time.

Proof. Define Q to be the program taking a string R as input which does the following:

  • If R is not a program that takes a string as input, it halts.
  • Otherwise, it  runs P with input R(R) (which is a program with no input).
  • If P(R(R)) returns “no”, it halts, while if P(R(R)) returns “yes”, it runs forever.

Now, let G be the program Q(Q).  By construction, G halts if and only if P(G) returns “no”, and the claim follows. \Box

One can apply Theorem 10′ to various naive halting algorithms.  For instance, let P(S) be the program that simulates S for (say) 1000 CPU cycles, and then returns “yes” if S halted by that time, or “no” otherwise.  Then the program G generated by the above proof will take more than 1000 CPU cycles to execute, and so P will determine incorrectly whether G halted or not.  (Notice the similarity here with Proposition 1′.)

The same argument also gives a non-counterfactual version of the non-computability of the Busy Beaver function:

Proposition 11′.  Let f: {\bf N} \to {\bf N} be a computable function.  Then there exists a natural number n and a program G of length n (and taking no input) that halts in finite time, but requires more than f(n) CPU cycles before it halts.

Proof. Let P(S) be the program that simulates S for f(n) CPU cycles, where n is the length of S, and returns “yes” if S halted by that time, or “no” otherwise.  Then the program G generated by Theorem 10′ is such that P does not correctly determine if G halts.  Since P is always correct when it returns “yes”, this means that G does halt, but that P(G) returned “no”, which implies that G takes more than f(n) cycles to execute. \Box

Of course, once one has a program of length n that runs for more than f(n) CPU cycles, it is not hard to make a program of length a little bit larger than n that outputs a number greater than f(n), so that one can conclude as a corollary that the Busy Beaver function outgrows any computable function.

— 4. Miscellaneous —

The strategy stealing argument in game theory is already more or less set up in non-counterfactual form: in any game that admits “harmless moves” (such as noughts and crosses), any strategy of the second player can be stolen to be defeated (or at least held to a draw) by the first player.  Similarly for arbitrage strategies in finance (unless there are loopholes due to imperfect information or friction costs).

It is a bit more difficult to recast the no-self-defeating objects in physics in a non-counterfactual form, due to the large number of implicit physical assumptions in these arguments.  I will present just one simple example of this, which is the grandfather paradox that asserts that controlled time travel is impossible because you could use such travel to go back in time to kill your grandfather before you were born.  One can convert this to a slightly less counterfactual format:

“Theorem”.  (Very imprecisely stated!) Suppose that one has a mechanism in universe U to travel back in time and arrive at universe U’.  Then there can exist events in U that occurred differently in universe U’.

The “proof” is, of course, the same: starting from U, go back in time and kill your grandfather in universe U’.  This version of the “theorem” (though not the precise “proof” given here) is of course invoked often in many science fiction stories involving time travel.

It seems possible to also cast the no-immovable-objects and no-controlled-and-detectable-tachyon-particles arguments from the previous blog post in this form, but one would have to consider multiple universes to do this properly, and I will not attempt to do so here, as it appears to be rather complicated.

The “omnipotence paradox” in philosophy (can an omnipotent being create a stone so heavy that He cannot lift it?) can also be rephrased in a non-counterfactual form that does not require consideration of any omnipotent beings:

“Theorem”: If G is a being, then G will be unable to do at least one of the following two tasks:

  1. Create a stone so heavy that G cannot lift it.
  2. Be able to lift any possible stone.

(Of course, most beings will fail at both Task 1 and Task 2.)